OCPEDGE-2727: Add eval-skills presubmit and enhance agent-eval ref by dhensel-rh · Pull Request #81166 · openshift/release

dhensel-rh · 2026-06-26T19:06:45Z

Summary

- Add EVAL_REPO_DIR env var to the openshift-claude-agent-eval ref (backward compatible, defaults to /opt/ai-helpers)
- Reorder commands.sh: plugins install → setup script → config check (the setup script can now override the config)
- Support setup-script output: a file path overrides EVAL_CONFIG; a directory sets EVAL_SNAPSHOT_DIR
- Add an optional eval-skills presubmit for openshift-eng/edge-tooling that triggers on changes matching ^plugins/.*/(skills|evals)
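A minimal bash sketch of the two ref changes listed above, assuming the real logic lives in openshift-claude-agent-eval-commands.sh (not shown on this page). The helpers `install_plugins`, `run_setup_script` and `check_config` are hypothetical placeholders that only echo their step name; the point is the `${EVAL_REPO_DIR:-/opt/ai-helpers}` default and the new step order.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the actual openshift-claude-agent-eval-commands.sh.
set -euo pipefail

# Backward-compatible default: a job that does not set EVAL_REPO_DIR
# (or sets it to an empty string) still resolves to /opt/ai-helpers.
resolve_eval_repo_dir() {
  printf '%s\n' "${EVAL_REPO_DIR:-/opt/ai-helpers}"
}

# Placeholder steps (hypothetical names); each one just reports itself.
install_plugins()  { echo "plugins install"; }
run_setup_script() { echo "setup script"; }
check_config()     { echo "config check"; }

# Order after this PR: the setup script runs before the config check,
# so any EVAL_CONFIG override it produces is what the check sees.
run_eval_steps() {
  install_plugins
  run_setup_script
  check_config
}
```

For example, `EVAL_REPO_DIR=/path/to/edge-tooling resolve_eval_repo_dir` prints the override, while an unset variable falls back to /opt/ai-helpers, which matches the backward-compatibility item in the test plan.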

Test plan

- Rehearse the eval-skills job on this PR
- Validate end-to-end with a test PR that changes a SKILL.md in edge-tooling
- Confirm backward compatibility: existing jobs using the ref still work (no EVAL_REPO_DIR override → defaults to /opt/ai-helpers)

Supersedes #80177.

🤖 Generated with Claude Code

Summary by CodeRabbit

This updates the OpenShift CI eval setup for openshift-eng/edge-tooling so the openshift-claude-agent-eval job can run from a configurable repository directory instead of assuming /opt/ai-helpers, while keeping that path as the default. It also changes the eval startup flow so the plugin install runs before the setup script, and the setup script can now override the eval config or provide a snapshot directory through its output.

In addition, it adds an optional eval-skills presubmit that triggers on changes under plugin skills and evals paths, enabling targeted eval coverage for those areas.
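The trigger pattern can be sanity-checked locally. The sketch below assumes the pattern is used as the job's `run_if_changed` value and that, as in Prow, the job runs when any changed file path matches it; bash's `=~` (POSIX extended regex) stands in for Go's regexp, which agrees with it for this simple pattern. The function names and example paths are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper for checking which paths would trigger eval-skills.
set -euo pipefail

# Pattern quoted from the PR description.
EVAL_SKILLS_TRIGGER='^plugins/.*/(skills|evals)'

# Succeeds if a single path matches the trigger pattern.
matches_eval_skills_trigger() {
  [[ $1 =~ $EVAL_SKILLS_TRIGGER ]]
}

# Succeeds if any of the given changed paths matches.
any_path_triggers() {
  local path
  for path in "$@"; do
    if matches_eval_skills_trigger "$path"; then
      return 0
    fi
  done
  return 1
}
```

With these helpers, hypothetical paths such as `plugins/cluster-diagnostic/skills/SKILL.md` and `plugins/cluster-diagnostic/evals/config.yaml` match, while `docs/README.md` does not. Because the pattern has no trailing `/`, a sibling directory such as `plugins/foo/skills-archive/` would also match.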

- Add EVAL_REPO_DIR env var to ref (backward compatible, defaults to /opt/ai-helpers)
- Reorder commands.sh: plugins install → setup script → config check
- Support setup script overriding EVAL_CONFIG or setting EVAL_SNAPSHOT_DIR
- Add eval-skills presubmit for edge-tooling (optional, triggers on plugins skill/eval changes)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

openshift-ci-robot · 2026-06-26T19:06:49Z

@dhensel-rh: This pull request references OCPEDGE-2727 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2026-06-26T19:06:49Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai · 2026-06-26T19:07:12Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 34adfa72-89ec-4091-be8f-ff4dc5da6d65

📥 Commits

Reviewing files that changed from the base of the PR and between 19f82bf and 16dd02b.

📒 Files selected for processing (1)

ci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-commands.sh

🚧 Files skipped from review as they are similar to previous changes (1)

ci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-commands.sh

Walkthrough

A new optional eval-skills CI job was added for plugin skills/evals changes. The eval step now exposes EVAL_REPO_DIR, and the eval command uses that directory while resolving setup-script output into either EVAL_CONFIG or EVAL_SNAPSHOT_DIR.

Changes

Eval-skills job and step flow

Layer / File(s)	Summary
Job and step wiring `ci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-ref.yaml`, `ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml`	The eval step gains `EVAL_REPO_DIR`, and a new optional `eval-skills` job is added with change matching and eval-step environment wiring.
Eval setup command flow `ci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-commands.sh`	The command script changes into `EVAL_REPO_DIR`, captures setup-script output, and uses it to update `EVAL_CONFIG` or `EVAL_SNAPSHOT_DIR` before the config check.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

openshift/release#81069: Both changes affect how the eval harness resolves EVAL_CONFIG in the agent-eval flow.

Suggested reviewers

stbenjam
enxebre
pmtk

🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Ipv6 And Disconnected Network Test Compatibility	⚠️ Warning	The new eval-skills job clones a public GitHub repo, so it requires internet access and is not disconnected-compatible.	Mirror/vendor the harness internally or skip this job on disconnected clusters (for example with `[Skipped:Disconnected]`).

✅ Passed checks (14 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly captures the two main changes: the new eval-skills presubmit and the agent-eval ref enhancements.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	PASS: The PR only changes CI YAML and a shell script; no Ginkgo It/Describe titles were added, and the only test name is a static JUnit case string.
Test Structure And Quality	✅ Passed	No Ginkgo test code was changed; the PR only updates CI config and shell step-registry scripts, so the test-structure checklist is not applicable.
Microshift Test Compatibility	✅ Passed	No new Ginkgo/e2e tests were added; the PR only changes CI YAML and shell scripts, and the modified files contain no MicroShift-unsupported APIs or test declarations.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	No new Ginkgo e2e tests were added; the PR changes CI config and shell scripts, and none of the 41 changed Go files contain Ginkgo test defs.
Topology-Aware Scheduling Compatibility	✅ Passed	PR only updates CI config and eval scripts; no affinity, nodeSelector, PDB, replicas, or topology-based scheduling logic was added.
Ote Binary Stdout Contract	✅ Passed	PR only changes CI config and a shell step script; no Go main/TestMain/init code or stdout logging contract changes were introduced.
No-Weak-Crypto	✅ Passed	Changed YAML and shell script only add eval config/setup flow; no weak ciphers, custom crypto, or secret/token comparisons found.
Container-Privileges	✅ Passed	No privileged/hostNetwork/hostIPC/allowPrivilegeEscalation flags appear in the changed CI config or agent-eval ref/commands; the PR only adds env vars and script logic.
No-Sensitive-Data-In-Logs	✅ Passed	No raw secrets/PII are logged; token contents aren’t echoed, xtrace is disabled before token loading, and new logs only print config/path/model metadata.


Comment @coderabbitai help to get the list of available commands.

openshift-ci · 2026-06-26T19:07:18Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dhensel-rh
Once this PR has been reviewed and has the lgtm label, please assign prashanth684 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.


Needs approval from an approver in each of these files:

~~ci-operator/config/openshift-eng/edge-tooling/OWNERS~~ [dhensel-rh]
~~ci-operator/jobs/openshift-eng/edge-tooling/OWNERS~~ [dhensel-rh]
ci-operator/step-registry/openshift/claude/agent-eval/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-commands.sh`:
- Around line 58-66: The setup-script output handling in
openshift-claude-agent-eval-commands.sh is too permissive and can misinterpret
bad stdout as a snapshot directory. Update the EVAL_SETUP_SCRIPT result handling
in the EVAL_SETUP_OUTPUT block to accept only valid existing directories (using
the existing EVAL_SNAPSHOT_DIR path), and explicitly fail fast for any other
non-empty output; keep the EVAL_CONFIG override path in the same flow and use
the existing EVAL_SETUP_OUTPUT, EVAL_SNAPSHOT_DIR, and EVAL_CONFIG symbols to
locate the change.
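The stricter handling requested above can be sketched as follows. This illustrates the requested behavior and is not the code pushed in 16dd02b; `resolve_setup_output` is a hypothetical name, and the sketch assumes the setup script's stdout has already been captured into EVAL_SETUP_OUTPUT (for example with `EVAL_SETUP_OUTPUT=$("$EVAL_SETUP_SCRIPT")`). Empty output leaves the configuration alone, an existing file overrides EVAL_CONFIG, an existing directory sets EVAL_SNAPSHOT_DIR, and anything else is treated as an error.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of fail-fast handling for setup-script output.
set -euo pipefail

resolve_setup_output() {
  local out=$1
  if [[ -z $out ]]; then
    return 0                          # no output: keep the configured EVAL_CONFIG
  elif [[ -f $out ]]; then
    export EVAL_CONFIG="$out"         # existing file: overrides the eval config
  elif [[ -d $out ]]; then
    export EVAL_SNAPSHOT_DIR="$out"   # existing directory: snapshot location
  else
    echo "setup script output is neither a file nor a directory: $out" >&2
    return 1                          # unexpected stdout: fail fast
  fi
}
```

Called as `resolve_setup_output "$EVAL_SETUP_OUTPUT"` in a script running under `set -e`, the non-zero return aborts the step, so stray log lines on the setup script's stdout surface as an error instead of being mistaken for a snapshot directory.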


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3f31e6a5-d159-4800-8b20-5fe860f14de7

📥 Commits

Reviewing files that changed from the base of the PR and between 53106f6 and 19f82bf.

⛔ Files ignored due to path filters (1)

ci-operator/jobs/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main-presubmits.yaml is excluded by !ci-operator/jobs/**

📒 Files selected for processing (3)

ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml
ci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-commands.sh
ci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-ref.yaml

openshift-merge-bot · 2026-06-26T19:51:14Z

[REHEARSALNOTIFIER]
@dhensel-rh: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name	Repo	Type	Reason
pull-ci-openshift-eng-ai-helpers-main-eval-payload-analysis	openshift-eng/ai-helpers	presubmit	Registry content changed
pull-ci-openshift-eng-ai-helpers-main-eval-payload-analysis-changed	openshift-eng/ai-helpers	presubmit	Registry content changed
pull-ci-openshift-eng-ai-helpers-main-eval-payload-analysis-minimal	openshift-eng/ai-helpers	presubmit	Registry content changed
pull-ci-openshift-eng-ai-helpers-main-eval-classify-review-comment	openshift-eng/ai-helpers	presubmit	Registry content changed
pull-ci-openshift-eng-ai-helpers-main-eval-address-reviews	openshift-eng/ai-helpers	presubmit	Registry content changed
pull-ci-openshift-eng-edge-tooling-main-eval-skills	openshift-eng/edge-tooling	presubmit	Presubmit changed

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 26, 2026

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 26, 2026

dhensel-rh mentioned this pull request Jun 26, 2026

OCPEDGE-2727 Add eval harness config for cluster-diagnostic skill #80177

Closed

coderabbitai Bot reviewed Jun 26, 2026


Comment thread ci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-commands.sh

coderabbit change to protect Fail fast on invalid setup-script output.

16dd02b

dhensel-rh wants to merge 2 commits into openshift:main from dhensel-rh:OCPEDGE-2727-eval-ref
