OCPEDGE-2727 Add eval harness config for cluster-diagnostic skill by dhensel-rh · Pull Request #80177 · openshift/release

dhensel-rh · 2026-06-06T10:10:59Z

Summary

Add evaluation config and test cases for the two-node:cluster-diagnostic skill
Covers validate and recovery-guide modes (3 validate cases, 2 recovery-guide cases)
5 judges: budget check, severity classification, procedure completeness, forbidden
recommendations, knowledge base accuracy
Enables /test eval-cluster-diagnostic in CI once the release repo entry is added
Test cases

Test cases

Five cases covering both testable modes:

validate (3 cases)

001: Sequential shutdown with shutdown -h — expects BLOCKER
002: Simultaneous Redfish GracefulShutdown — expects clean pass
005: pcs node standby --all — expects BLOCKER

recovery-guide (2 cases)

003: Full cluster shutdown procedure
004: Standby recovery via SSH (no BMC involved)

Summary by CodeRabbit

This pull request updates the OpenShift Edge Tooling CI configuration (ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml) by adding two new optional evaluation jobs:

eval-cluster-diagnostic
eval-threat-model-tnf

Both jobs run the openshift-claude-agent-eval workflow with the same settings:

EVAL_MODEL=claude-opus-4-6
EVAL_PARALLELISM="3"
EVAL_CONFIG pointing to a per-skill YAML under plugins/two-node/evals/

eval-cluster-diagnostic is additionally configured with run_if_changed: ^plugins/two-node/(skills|evals) so it triggers when the two-node skill/eval inputs change.

For eval-cluster-diagnostic, the PR aims to add CI-exercised evaluation coverage for two modes—validate (cases 001/002/005, with a mix of shutdown command and expected failure/pass expectations) and recovery-guide (cases 003/004)—evaluated by five judges: budget check, severity classification, procedure completeness, forbidden recommendations, and knowledge base accuracy.

Note: the referenced evaluation config files (plugins/two-node/evals/cluster-diagnostic.yaml and plugins/two-node/evals/threat-model-tnf.yaml) are not present in the current repo checkout, consistent with the draft review feedback that generated files may still be missing to make the jobs runnable end-to-end.

openshift-ci · 2026-06-06T10:11:02Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai · 2026-06-06T10:11:21Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

This PR adds two optional CI test jobs (eval-cluster-diagnostic and eval-threat-model-tnf) to the openshift-eng-edge-tooling build configuration. Both jobs invoke the openshift-claude-agent-eval workflow and share the same model and parallelism settings, each pointing to its own evaluation configuration file.

Changes

Optional CI Evaluation Test Jobs

Layer / File(s)	Summary
Eval test job definitions `ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml`	Two optional test jobs are introduced: `eval-cluster-diagnostic` and `eval-threat-model-tnf`. Both are wired to the `openshift-claude-agent-eval` workflow with `EVAL_MODEL` set to `claude-opus-4-6` and `EVAL_PARALLELISM` set to `"3"`, each with its own `EVAL_CONFIG` path under `plugins/two-node/evals/`. The `eval-cluster-diagnostic` job includes a `run_if_changed` trigger for `plugins/two-node/(skills

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested labels

jira/valid-reference

🚥 Pre-merge checks | ✅ 15

✅ Passed checks (15 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically summarizes the main change: adding evaluation harness configuration for the cluster-diagnostic skill, which aligns with the core content of the pull request.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	No Ginkgo test files were modified or added in this PR. The PR only adds CI/YAML test job configurations (eval-cluster-diagnostic, eval-threat-model-tnf), not Ginkgo tests.
Test Structure And Quality	✅ Passed	PR contains no Ginkgo test code to review. The check is not applicable as the PR only modifies CI/operator YAML configuration files and documentation.
Microshift Test Compatibility	✅ Passed	This PR does not add any Ginkgo e2e tests. It only modifies CI/operator configuration YAML files to add workflow job entries. The check is not applicable since there are no It(), Describe(), Contex...
Single Node Openshift (Sno) Test Compatibility	✅ Passed	This PR adds only CI/YAML configuration and evaluation workflow entries, not Ginkgo e2e tests. The check applies specifically to new Go test code with It(), Describe(), Context(), When() patterns,...
Topology-Aware Scheduling Compatibility	✅ Passed	File is CI/operator configuration only, not deployment manifests or operator code. No scheduling constraints, pod affinity rules, topology assumptions, or cluster topologies are introduced.
Ote Binary Stdout Contract	✅ Passed	The PR only modifies CI/operator YAML configuration files and contains no OTE binary Go source code. The check requires flagging stdout writes in process-level Go code, which are not present in thi...
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	PR adds CI workflow configuration for AI agent evaluation harness, not Ginkgo e2e tests. Custom check for Ginkgo IPv6/disconnected network compatibility is not applicable.
No-Weak-Crypto	✅ Passed	PR adds only CI configuration files for OpenShift edge-tooling evaluation tests; no weak crypto patterns (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-t...
Container-Privileges	✅ Passed	No privileged container settings (privileged, hostPID, hostNetwork, hostIPC, SYS_ADMIN, allowPrivilegeEscalation, runAsRoot) found in the CI/operator configuration file added in this PR.
No-Sensitive-Data-In-Logs	✅ Passed	The PR adds two optional CI test entries to the openshift-eng-edge-tooling-main.yaml file. The new entries (eval-cluster-diagnostic and eval-threat-model-tnf) only set non-sensitive environment var...

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

ci-operator/projects/OCPEDGE-2505/findings.md (1)
69-81: ⚡ Quick win

Clarify interface naming: NodeController vs NodePowerController.

The document proposes two similar but distinct interfaces:

Lines 69-76: NodeController with 4 methods

Lines 144-160: NodePowerController with 4 methods plus FindNode()

The later interface (lines 144-160) adds the FindNode() method and uses more explicit type names (PowerState enum). It appears to be the refined version.

Consider removing the earlier interface definition (lines 69-81) and keeping only the detailed design in section 2, or explicitly note that lines 69-76 show the initial concept and lines 144-160 show the final design.

Also applies to: 143-160
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ci-operator/projects/OCPEDGE-2505/findings.md` around lines 69 - 81, The file
defines two overlapping interfaces—NodeController (with
PowerOff/PowerOn/GetNodeState/WaitForNodeState) and a refined
NodePowerController (with the same methods plus FindNode and using PowerState);
remove the earlier NodeController block or mark it explicitly as
deprecated/initial concept so only the refined NodePowerController (including
FindNode and PowerState) remains as the canonical interface; update any
surrounding text to reference NodePowerController and its methods (PowerOff,
PowerOn, GetNodeState, WaitForNodeState, FindNode) to avoid
duplication/confusion.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@ci-operator/projects/OCPEDGE-2505/findings.md`:
- Around line 69-81: The file defines two overlapping interfaces—NodeController
(with PowerOff/PowerOn/GetNodeState/WaitForNodeState) and a refined
NodePowerController (with the same methods plus FindNode and using PowerState);
remove the earlier NodeController block or mark it explicitly as
deprecated/initial concept so only the refined NodePowerController (including
FindNode and PowerState) remains as the canonical interface; update any
surrounding text to reference NodePowerController and its methods (PowerOff,
PowerOn, GetNodeState, WaitForNodeState, FindNode) to avoid
duplication/confusion.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 1a778987-f31c-4a7d-b3af-656bc5ff49bb

📥 Commits

Reviewing files that changed from the base of the PR and between ccef783 and c3c9640.

📒 Files selected for processing (8)

ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml
ci-operator/projects/OCPEDGE-2505/.gitignore
ci-operator/projects/OCPEDGE-2505/CLAUDE.md
ci-operator/projects/OCPEDGE-2505/findings.md
ci-operator/projects/tnf-upgrade-ci-lane/.gitignore
ci-operator/projects/tnf-upgrade-ci-lane/CLAUDE.md
ci-operator/projects/tnf-upgrade-ci-lane/ci-runs.md
ci-operator/projects/tnf-upgrade-ci-lane/test-failures.md

✅ Files skipped from review due to trivial changes (6)

ci-operator/projects/tnf-upgrade-ci-lane/.gitignore
ci-operator/projects/OCPEDGE-2505/.gitignore
ci-operator/projects/OCPEDGE-2505/CLAUDE.md
ci-operator/projects/tnf-upgrade-ci-lane/test-failures.md
ci-operator/projects/tnf-upgrade-ci-lane/ci-runs.md
ci-operator/projects/tnf-upgrade-ci-lane/CLAUDE.md

🚧 Files skipped from review as they are similar to previous changes (1)

ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml

ggiguash · 2026-06-07T06:53:04Z

    from: root
  run_if_changed: (SKILL\.md|^scripts/lint-skills\.py|^Makefile|^plugins/.*/skills/)
+- always_run: false
+  as: eval-cluster-diagnostic


Should we adopt the name of eval-<plugin> for the test?
There's also a question if we're going to run these evals separately or inside the same job.

The name is set when you run this command

/eval-analyze --skill <skill> --config <eval config name>

I am ok with the naming convention, starting with eval-<plugin> since they will be easy to recognize in openshift-eng-edge-tooling-main.yaml

I think the evals should run as part of CI for new skills. for existing skills, we should probably go back and revisit what we choose to do.

On a related note, I worry about
openshift-eng-edge-tooling-main.yaml growing a little unwieldy because the number of skills we currently have and might have in the future.

kasturinarra · 2026-06-09T08:17:31Z

    clone: true
    from: root
  run_if_changed: (SKILL\.md|^scripts/lint-skills\.py|^Makefile|^plugins/.*/skills/)
+- always_run: false


Worth considering adding run_if_changed: ^plugins/two-node/ here so these evals auto-trigger when the two-node skill code changes, rather than relying on someone remembering to type
/test eval-cluster-diagnostic manually.

Adding this line to
run_if_changed: ^plugins/two-node/(skills|evals)
will properly match files under skills or evals.

The main thing to be mindful of is that each triggered eval uses Claude Opus and can run up to 2 hours, so you'll consume API credits on every matching PR. If that's acceptable for the team, auto-triggering is fine.

kasturinarra · 2026-06-09T08:18:50Z

+  optional: true
+  steps:
+    env:
+      EVAL_CONFIG: plugins/two-node/evals/cluster-diagnostic.yaml


Instead of a separate test entry per eval, have you considered supporting a directory-based approach — something like EVAL_CONFIG_DIR: plugins/two-node/evals/(Not sure if this is feasible as of today, but asking) — where the commands.sh discovers and runs all eval configs in the directory? That way new evals are picked up automatically without needing a ci-operator config change each time. Would require a small update to the commands.sh loop logic from #79925.

This is possible to add. As a proof of concept to get buy-in from the team, I think I'll stick to the manual approach for now. This could be future enhancement.

kasturinarra · 2026-06-09T08:20:17Z

I think you are already aware of this, but the PR is missing generated files can you help add that please ? thanks !!

…eat-model skills Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

openshift-ci · 2026-06-16T19:06:19Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhensel-rh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~ci-operator/config/openshift-eng/edge-tooling/OWNERS~~ [dhensel-rh]
~~ci-operator/jobs/openshift-eng/edge-tooling/OWNERS~~ [dhensel-rh]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-merge-bot · 2026-06-16T19:08:37Z

[REHEARSALNOTIFIER]
@dhensel-rh: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name	Repo	Type	Reason
pull-ci-openshift-eng-edge-tooling-main-eval-cluster-diagnostic	openshift-eng/edge-tooling	presubmit	Presubmit changed
pull-ci-openshift-eng-edge-tooling-main-eval-threat-model-tnf	openshift-eng/edge-tooling	presubmit	Presubmit changed

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

openshift-ci · 2026-06-16T19:14:19Z

@dhensel-rh: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

dhensel-rh · 2026-06-26T19:07:12Z

Superseded by #81166 — fresh branch from upstream/main with all changes consolidated.

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 6, 2026

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 6, 2026

openshift-merge-bot Bot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Jun 6, 2026

dhensel-rh mentioned this pull request Jun 6, 2026

OCPEDGE-2727: Add agent-eval-harness integration config for cluster-diagnostic skill openshift-eng/edge-tooling#178

Open

openshift-ci Bot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 7, 2026

dhensel-rh force-pushed the OCPEDGE-2727 branch from c3c9640 to 8733ea2 Compare June 7, 2026 01:39

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 7, 2026

coderabbitai Bot reviewed Jun 7, 2026

View reviewed changes

ggiguash reviewed Jun 7, 2026

View reviewed changes

kasturinarra reviewed Jun 9, 2026

View reviewed changes

dhensel-rh and others added 2 commits June 16, 2026 14:40

OCPEDGE-2727: Add eval harness CI jobs for cluster-diagnostic and thr…

9a2ba48

…eat-model skills Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Added a line for run_if_changed

a8bfc87

dhensel-rh force-pushed the OCPEDGE-2727 branch from 8733ea2 to a8bfc87 Compare June 16, 2026 18:41

dhensel-rh marked this pull request as ready for review June 16, 2026 18:43

openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 16, 2026

openshift-ci Bot requested review from fonta-rh and jerpeter1 June 16, 2026 18:44

Adding presubmit file and updating run_if_changed to both TNF skills

3dbd3be

openshift-merge-bot Bot removed the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Jun 16, 2026

dhensel-rh marked this pull request as draft June 26, 2026 18:39

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 26, 2026

dhensel-rh mentioned this pull request Jun 26, 2026

OCPEDGE-2727: Add eval-skills presubmit and enhance agent-eval ref #81166

Draft

3 tasks

dhensel-rh closed this Jun 26, 2026

Uh oh!

Conversation

dhensel-rh commented Jun 6, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test cases

Summary by CodeRabbit

Uh oh!

openshift-ci Bot commented Jun 6, 2026

Uh oh!

coderabbitai Bot commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Suggested labels

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

ggiguash Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

dhensel-rh Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kasturinarra Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dhensel-rh Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

kasturinarra Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

dhensel-rh Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

kasturinarra commented Jun 9, 2026

Uh oh!

openshift-ci Bot commented Jun 16, 2026

Uh oh!

openshift-merge-bot Bot commented Jun 16, 2026

Uh oh!

openshift-ci Bot commented Jun 16, 2026

Uh oh!

dhensel-rh commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dhensel-rh commented Jun 6, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 6, 2026 •

edited

Loading

dhensel-rh Jun 8, 2026 •

edited

Loading

kasturinarra Jun 9, 2026 •

edited

Loading