OCPEDGE-2727: Add eval-skills presubmit and enhance agent-eval ref #81166

Draft: dhensel-rh wants to merge 2 commits into openshift:main from dhensel-rh:OCPEDGE-2727-eval-ref

dhensel-rh (Contributor) commented Jun 26, 2026

Summary

  • Add EVAL_REPO_DIR env var to openshift-claude-agent-eval ref (backward compatible, defaults to /opt/ai-helpers)
  • Reorder commands.sh: plugins install → setup script → config check (setup script can now override config)
  • Support setup script output: file path overrides EVAL_CONFIG, directory sets EVAL_SNAPSHOT_DIR
  • Add eval-skills optional presubmit for openshift-eng/edge-tooling — triggers on ^plugins/.*/(skills|evals) changes
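The reordering above can be sketched as follows. This is a minimal illustration, not the contents of openshift-claude-agent-eval-commands.sh: `eval_flow` and the plugin-install echo are hypothetical placeholders, and the sketch assumes the setup script prints at most one config file path on stdout (directory output is left out here). It shows why the config check moves last: it validates whichever EVAL_CONFIG the setup script left behind, and EVAL_REPO_DIR falls back to /opt/ai-helpers when a job does not set it.

```bash
#!/usr/bin/env bash
# Sketch only: eval_flow and the plugin-install echo are hypothetical
# placeholders, not the real openshift-claude-agent-eval-commands.sh.
set -euo pipefail

# eval_flow SETUP_CMD: run the three stages in the order the PR introduces.
eval_flow() {
  local setup_cmd="$1"
  # Backward compatible: jobs that do not set EVAL_REPO_DIR keep /opt/ai-helpers.
  local repo_dir="${EVAL_REPO_DIR:-/opt/ai-helpers}"

  # 1. Plugins install (placeholder that only logs).
  echo "plugins install from ${repo_dir}"

  # 2. Setup script. A config file path on stdout overrides EVAL_CONFIG.
  #    (Directory output -> EVAL_SNAPSHOT_DIR is omitted in this sketch.)
  local out
  out="$("${setup_cmd}")"
  if [[ -n "${out}" && -f "${out}" ]]; then
    EVAL_CONFIG="${out}"
  fi

  # 3. Config check runs last, so it validates the final EVAL_CONFIG.
  if [[ ! -f "${EVAL_CONFIG:-}" ]]; then
    echo "EVAL_CONFIG not found: ${EVAL_CONFIG:-<unset>}" >&2
    return 1
  fi
  echo "using config ${EVAL_CONFIG}"
}
```

For example, `EVAL_REPO_DIR=/tmp/edge-tooling eval_flow ./setup.sh` would log the overridden directory, while omitting EVAL_REPO_DIR keeps the /opt/ai-helpers default that existing jobs rely on.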

Test plan

  • Rehearse the eval-skills job on this PR
  • Validate end-to-end with a test PR that changes a SKILL.md in edge-tooling
  • Confirm backward compatibility: existing jobs using the ref still work (no EVAL_REPO_DIR override → defaults to /opt/ai-helpers)

Supersedes #80177.

🤖 Generated with Claude Code

Summary by CodeRabbit

This updates the OpenShift CI eval setup used by openshift-eng/edge-tooling. The shared openshift-claude-agent-eval step can now run from a configurable repository directory (EVAL_REPO_DIR) instead of assuming /opt/ai-helpers, which remains the default. It also changes the eval startup flow so the plugin install runs before the setup script, and the setup script's output can now either override the eval config (a file path) or provide a snapshot directory (a directory path).

In addition, it adds an optional eval-skills presubmit that triggers on changes under plugin skills and evals paths, enabling targeted eval coverage for those areas.
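To see which changed paths the eval-skills trigger would match, the pattern quoted in the PR description can be exercised locally. This sketch uses grep -E as an approximation of the job's change matching (an assumption: Prow applies these patterns to each changed file path with Go regular expressions, which agree with ERE for a pattern this simple). `would_trigger` is a hypothetical helper.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Trigger pattern as quoted in the PR description.
EVAL_SKILLS_PATTERN='^plugins/.*/(skills|evals)'

# would_trigger PATH...: hypothetical helper that prints each changed path the
# pattern matches. grep -E approximates the Go regexp matching used by Prow.
would_trigger() {
  local path
  for path in "$@"; do
    if grep -Eq -- "${EVAL_SKILLS_PATTERN}" <<<"${path}"; then
      printf '%s\n' "${path}"
    fi
  done
}

would_trigger \
  plugins/foo/skills/bar/SKILL.md \
  plugins/foo/README.md \
  docs/skills/intro.md \
  plugins/foo/evals/case.yaml
# prints:
# plugins/foo/skills/bar/SKILL.md
# plugins/foo/evals/case.yaml
```

Because the pattern has no trailing slash, a path such as plugins/foo/skillsets/x.md would also trigger the job, while a skills directory directly under plugins/ (plugins/skills/x.md) would not, since `.*/` requires a second path segment.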

- Add EVAL_REPO_DIR env var to ref (backward compatible, defaults to
  /opt/ai-helpers)
- Reorder commands.sh: plugins install → setup script → config check
- Support setup script overriding EVAL_CONFIG or setting EVAL_SNAPSHOT_DIR
- Add eval-skills presubmit for edge-tooling (optional, triggers on
  plugins skill/eval changes)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
openshift-ci-robot added the jira/valid-reference label on Jun 26, 2026 (Indicates that this PR references a valid Jira ticket of any type.)
openshift-ci (Bot) added the do-not-merge/work-in-progress label on Jun 26, 2026 (Indicates that a PR should not merge because it is a work in progress.)
openshift-ci-robot (Contributor) commented Jun 26, 2026

@dhensel-rh: This pull request references OCPEDGE-2727 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci (Bot, Contributor) commented Jun 26, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai (Bot, Contributor) commented Jun 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 34adfa72-89ec-4091-be8f-ff4dc5da6d65

📥 Commits

Reviewing files that changed from the base of the PR and between 19f82bf and 16dd02b.

📒 Files selected for processing (1)
  • ci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-commands.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • ci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-commands.sh

Walkthrough

A new optional eval-skills CI job was added for plugin skills/evals changes. The eval step now exposes EVAL_REPO_DIR, and the eval command uses that directory while resolving setup-script output into either EVAL_CONFIG or EVAL_SNAPSHOT_DIR.

Changes

Eval-skills job and step flow

| Layer | File(s) | Summary |
|---|---|---|
| Job and step wiring | ci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-ref.yaml, ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml | The eval step gains EVAL_REPO_DIR, and a new optional eval-skills job is added with change matching and eval-step environment wiring. |
| Eval setup command flow | ci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-commands.sh | The command script changes into EVAL_REPO_DIR, captures setup-script output, and uses it to update EVAL_CONFIG or EVAL_SNAPSHOT_DIR before the config check. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • stbenjam
  • enxebre
  • pmtk
🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Ipv6 And Disconnected Network Test Compatibility | ⚠️ Warning | The new eval-skills job clones a public GitHub repo, so it requires internet access and is not disconnected-compatible. | Mirror/vendor the harness internally or skip this job on disconnected clusters (for example with [Skipped:Disconnected]). |
✅ Passed checks (14 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly captures the two main changes: the new eval-skills presubmit and the agent-eval ref enhancements. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PASS: The PR only changes CI YAML and a shell script; no Ginkgo It/Describe titles were added, and the only test name is a static JUnit case string. |
| Test Structure And Quality | ✅ Passed | No Ginkgo test code was changed; the PR only updates CI config and shell step-registry scripts, so the test-structure checklist is not applicable. |
| Microshift Test Compatibility | ✅ Passed | No new Ginkgo/e2e tests were added; the PR only changes CI YAML and shell scripts, and the modified files contain no MicroShift-unsupported APIs or test declarations. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added; the PR changes CI config and shell scripts, and none of the 41 changed Go files contain Ginkgo test defs. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR only updates CI config and eval scripts; no affinity, nodeSelector, PDB, replicas, or topology-based scheduling logic was added. |
| Ote Binary Stdout Contract | ✅ Passed | PR only changes CI config and a shell step script; no Go main/TestMain/init code or stdout logging contract changes were introduced. |
| No-Weak-Crypto | ✅ Passed | Changed YAML and shell script only add eval config/setup flow; no weak ciphers, custom crypto, or secret/token comparisons found. |
| Container-Privileges | ✅ Passed | No privileged/hostNetwork/hostIPC/allowPrivilegeEscalation flags appear in the changed CI config or agent-eval ref/commands; the PR only adds env vars and script logic. |
| No-Sensitive-Data-In-Logs | ✅ Passed | No raw secrets/PII are logged; token contents aren’t echoed, xtrace is disabled before token loading, and new logs only print config/path/model metadata. |

openshift-ci (Bot, Contributor) commented Jun 26, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dhensel-rh
Once this PR has been reviewed and has the lgtm label, please assign prashanth684 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai (Bot, Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-commands.sh`:
- Around line 58-66: The setup-script output handling in
openshift-claude-agent-eval-commands.sh is too permissive and can misinterpret
bad stdout as a snapshot directory. Update the EVAL_SETUP_SCRIPT result handling
in the EVAL_SETUP_OUTPUT block to accept only valid existing directories (using
the existing EVAL_SNAPSHOT_DIR path), and explicitly fail fast for any other
non-empty output; keep the EVAL_CONFIG override path in the same flow and use
the existing EVAL_SETUP_OUTPUT, EVAL_SNAPSHOT_DIR, and EVAL_CONFIG symbols to
locate the change.
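A minimal sketch of the stricter handling the review asks for, assuming the setup script prints at most one path on stdout and sends any logging to stderr. `apply_setup_output` is a hypothetical helper; EVAL_CONFIG and EVAL_SNAPSHOT_DIR are the variables named in the review, and the real block in openshift-claude-agent-eval-commands.sh may be structured differently.

```bash
#!/usr/bin/env bash
set -euo pipefail

# apply_setup_output OUTPUT
# Hypothetical helper sketching the stricter handling requested in review.
# Assumes the setup script prints at most one path on stdout.
apply_setup_output() {
  local output="$1"
  if [[ -z "${output}" ]]; then
    return 0                         # no output: keep current settings
  elif [[ -f "${output}" ]]; then
    EVAL_CONFIG="${output}"          # existing file overrides EVAL_CONFIG
  elif [[ -d "${output}" ]]; then
    EVAL_SNAPSHOT_DIR="${output}"    # existing directory becomes the snapshot dir
  else
    # Anything else fails fast instead of being treated as a snapshot directory.
    echo "setup script output is neither a file nor a directory: ${output}" >&2
    return 1
  fi
}
```

In the step this would be called as `EVAL_SETUP_OUTPUT="$("${EVAL_SETUP_SCRIPT}")"` followed by `apply_setup_output "${EVAL_SETUP_OUTPUT}"`, so that under `set -e` both a failing setup script and unexpected output (for example a stray log line) stop the job before the config check.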

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3f31e6a5-d159-4800-8b20-5fe860f14de7

📥 Commits

Reviewing files that changed from the base of the PR and between 53106f6 and 19f82bf.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main-presubmits.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (3)
  • ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml
  • ci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-commands.sh
  • ci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-ref.yaml

openshift-merge-bot (Contributor) commented
[REHEARSALNOTIFIER]
@dhensel-rh: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
|---|---|---|---|
| pull-ci-openshift-eng-ai-helpers-main-eval-payload-analysis | openshift-eng/ai-helpers | presubmit | Registry content changed |
| pull-ci-openshift-eng-ai-helpers-main-eval-payload-analysis-changed | openshift-eng/ai-helpers | presubmit | Registry content changed |
| pull-ci-openshift-eng-ai-helpers-main-eval-payload-analysis-minimal | openshift-eng/ai-helpers | presubmit | Registry content changed |
| pull-ci-openshift-eng-ai-helpers-main-eval-classify-review-comment | openshift-eng/ai-helpers | presubmit | Registry content changed |
| pull-ci-openshift-eng-ai-helpers-main-eval-address-reviews | openshift-eng/ai-helpers | presubmit | Registry content changed |
| pull-ci-openshift-eng-edge-tooling-main-eval-skills | openshift-eng/edge-tooling | presubmit | Presubmit changed |
Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.
