-
Notifications
You must be signed in to change notification settings - Fork 2.3k
OCPEDGE-2727 Add eval harness config for cluster-diagnostic skill #80177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -55,6 +55,26 @@ tests: | |
| clone: true | ||
| from: root | ||
| run_if_changed: (SKILL\.md|^scripts/lint-skills\.py|^Makefile|^plugins/.*/skills/) | ||
| - always_run: false | ||
| as: eval-cluster-diagnostic | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we adopt the name of
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The name is set when you run this command
I am ok with the naming convention, starting with I think the evals should run as part of CI for new skills. for existing skills, we should probably go back and revisit what we choose to do. On a related note, I worry about |
||
| optional: true | ||
| run_if_changed: ^plugins/two-node/(skills|evals) | ||
| steps: | ||
| env: | ||
| EVAL_CONFIG: plugins/two-node/evals/cluster-diagnostic.yaml | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Instead of a separate test entry per eval, have you considered supporting a directory-based approach — something like EVAL_CONFIG_DIR: plugins/two-node/evals/(Not sure if this is feasible as of today, but asking) — where the commands.sh discovers and runs all eval configs in the directory? That way new evals are picked up automatically without needing a ci-operator config change each time. Would require a small update to the commands.sh loop logic from #79925.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is possible to add. As a proof of concept to get buy-in from the team, I think I'll stick to the manual approach for now. This could be future enhancement. |
||
| EVAL_MODEL: claude-opus-4-6 | ||
| EVAL_PARALLELISM: "3" | ||
| workflow: openshift-claude-agent-eval | ||
| - always_run: false | ||
| as: eval-threat-model-tnf | ||
| optional: true | ||
| run_if_changed: ^plugins/two-node/(skills|evals) | ||
| steps: | ||
| env: | ||
| EVAL_CONFIG: plugins/two-node/evals/threat-model-tnf.yaml | ||
| EVAL_MODEL: claude-opus-4-6 | ||
| EVAL_PARALLELISM: "3" | ||
| workflow: openshift-claude-agent-eval | ||
| - as: ocp-ci-monitor | ||
| cron: 0 7 * * 1-5 | ||
| reporter_config: | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth considering adding run_if_changed: ^plugins/two-node/ here so these evals auto-trigger when the two-node skill code changes, rather than relying on someone remembering to type
/test eval-cluster-diagnostic manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this line to
run_if_changed: ^plugins/two-node/(skills|evals)will properly match files under skills or evals.
The main thing to be mindful of is that each triggered eval uses Claude Opus and can run up to 2 hours, so you'll consume API credits on every matching PR. If that's acceptable for the team, auto-triggering is fine.