Closed
@@ -55,6 +55,26 @@ tests:
    clone: true
    from: root
  run_if_changed: (SKILL\.md|^scripts/lint-skills\.py|^Makefile|^plugins/.*/skills/)
- always_run: false

@kasturinarra kasturinarra Jun 9, 2026

Contributor

Worth considering adding run_if_changed: ^plugins/two-node/ here so these evals auto-trigger when the two-node skill code changes, rather than relying on someone remembering to type /test eval-cluster-diagnostic manually.

Contributor Author

Adding this line:
run_if_changed: ^plugins/two-node/(skills|evals)
will match files under plugins/two-node/skills or plugins/two-node/evals.

The main thing to be mindful of is that each triggered eval uses Claude Opus and can run up to 2 hours, so you'll consume API credits on every matching PR. If that's acceptable for the team, auto-triggering is fine.
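
To make the scope of that pattern concrete, here is a minimal sketch that checks a few changed-file lists against it. It assumes the job triggers when any changed path matches the regex with an unanchored search, and it uses Python's re module as a stand-in for the Go regexp engine that Prow uses; the two agree for a pattern this simple. Apart from plugins/two-node/evals/cluster-diagnostic.yaml, every path below is hypothetical, as is the helper name would_trigger.

```python
import re

# Pattern copied from the run_if_changed line in this thread.
RUN_IF_CHANGED = re.compile(r"^plugins/two-node/(skills|evals)")


def would_trigger(changed_files):
    """Return True if any changed path matches the pattern (unanchored search)."""
    return any(RUN_IF_CHANGED.search(path) for path in changed_files)


# Hypothetical changed-file lists, for illustration only.
examples = {
    "skill change": ["plugins/two-node/skills/cluster-diagnostic/SKILL.md"],
    "eval change": ["plugins/two-node/evals/cluster-diagnostic.yaml"],
    "other two-node file": ["plugins/two-node/README.md"],
    "other plugin": ["plugins/other/skills/foo/SKILL.md"],
}

for label, files in examples.items():
    print(f"{label}: {would_trigger(files)}")
```

Compared with the broader ^plugins/two-node/ suggestion, the narrower pattern skips changes elsewhere under plugins/two-node/, which limits how often the Opus-backed evals run.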

  as: eval-cluster-diagnostic

Contributor

Should we adopt the naming pattern eval-<plugin> for the test?
There's also the question of whether we're going to run these evals as separate jobs or inside the same job.

@dhensel-rh dhensel-rh Jun 8, 2026

Contributor Author

The name is set when you run this command:

/eval-analyze --skill <skill> --config <eval config name>

I am OK with a naming convention that starts with eval-<plugin>, since those tests will be easy to recognize in openshift-eng-edge-tooling-main.yaml.

I think the evals should run as part of CI for new skills. For existing skills, we should probably go back and revisit what we choose to do.

On a related note, I worry about openshift-eng-edge-tooling-main.yaml growing a little unwieldy because of the number of skills we currently have and might have in the future.

  optional: true
  run_if_changed: ^plugins/two-node/(skills|evals)
  steps:
    env:
      EVAL_CONFIG: plugins/two-node/evals/cluster-diagnostic.yaml

Contributor

Instead of a separate test entry per eval, have you considered a directory-based approach, something like EVAL_CONFIG_DIR: plugins/two-node/evals/ (not sure whether this is feasible today, but asking), where commands.sh discovers and runs all eval configs in the directory? That way new evals are picked up automatically without needing a ci-operator config change each time. It would require a small update to the commands.sh loop logic from #79925.

Contributor Author

This is possible to add. As a proof of concept to get buy-in from the team, I think I'll stick to the manual approach for now. This could be a future enhancement.
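
For reference, the directory-based discovery discussed above might look roughly like the sketch below. It is written in Python purely to illustrate the logic; the actual commands.sh is a shell script whose contents are not shown in this thread. EVAL_CONFIG_DIR is the name proposed in the review comment and is not an existing setting, the function names are hypothetical, and treating every .yaml or .yml file directly under the directory as an eval config is an assumption.

```python
import os
from pathlib import Path


def discover_eval_configs(config_dir):
    """Return every .yaml / .yml file directly under config_dir, sorted by path."""
    root = Path(config_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix in {".yaml", ".yml"}
    )


def resolve_eval_configs(env):
    """Prefer EVAL_CONFIG_DIR (proposed, hypothetical); fall back to EVAL_CONFIG."""
    config_dir = env.get("EVAL_CONFIG_DIR")
    if config_dir:
        return discover_eval_configs(config_dir)
    single = env.get("EVAL_CONFIG")
    return [Path(single)] if single else []


if __name__ == "__main__":
    for config in resolve_eval_configs(os.environ):
        # A real runner would launch one eval per config here.
        print(config)
```

Keeping the EVAL_CONFIG fallback means the existing per-eval test entries would continue to work while a directory-based entry is tried out.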

      EVAL_MODEL: claude-opus-4-6
      EVAL_PARALLELISM: "3"
    workflow: openshift-claude-agent-eval
- always_run: false
  as: eval-threat-model-tnf
  optional: true
  run_if_changed: ^plugins/two-node/(skills|evals)
  steps:
    env:
      EVAL_CONFIG: plugins/two-node/evals/threat-model-tnf.yaml
      EVAL_MODEL: claude-opus-4-6
      EVAL_PARALLELISM: "3"
    workflow: openshift-claude-agent-eval
- as: ocp-ci-monitor
  cron: 0 7 * * 1-5
  reporter_config:
@@ -1,5 +1,159 @@
presubmits:
  openshift-eng/edge-tooling:
  - agent: kubernetes
    always_run: false
    branches:
    - ^main$
    - ^main-
    cluster: build12
    context: ci/prow/eval-cluster-diagnostic
    decorate: true
    decoration_config:
      sparse_checkout_files:
      - images/Containerfile.ci
      - images/Containerfile.markdownlint
    labels:
      ci.openshift.io/generator: prowgen
      pj-rehearse.openshift.io/can-be-rehearsed: "true"
    name: pull-ci-openshift-eng-edge-tooling-main-eval-cluster-diagnostic
    optional: true
    rerun_command: /test eval-cluster-diagnostic
    run_if_changed: ^plugins/two-node/(skills|evals)
    spec:
      containers:
      - args:
        - --gcs-upload-secret=/secrets/gcs/service-account.json
        - --image-import-pull-secret=/etc/pull-secret/.dockerconfigjson
        - --lease-server-credentials-file=/etc/boskos/credentials
        - --report-credentials-file=/etc/report/credentials
        - --target=eval-cluster-diagnostic
        command:
        - ci-operator
        env:
        - name: HTTP_SERVER_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        image: quay-proxy.ci.openshift.org/openshift/ci:ci_ci-operator_latest
        imagePullPolicy: Always
        name: ""
        ports:
        - containerPort: 8080
          name: http
        resources:
          requests:
            cpu: 10m
        volumeMounts:
        - mountPath: /etc/boskos
          name: boskos
          readOnly: true
        - mountPath: /secrets/gcs
          name: gcs-credentials
          readOnly: true
        - mountPath: /secrets/manifest-tool
          name: manifest-tool-local-pusher
          readOnly: true
        - mountPath: /etc/pull-secret
          name: pull-secret
          readOnly: true
        - mountPath: /etc/report
          name: result-aggregator
          readOnly: true
      serviceAccountName: ci-operator
      volumes:
      - name: boskos
        secret:
          items:
          - key: credentials
            path: credentials
          secretName: boskos-credentials
      - name: manifest-tool-local-pusher
        secret:
          secretName: manifest-tool-local-pusher
      - name: pull-secret
        secret:
          secretName: registry-pull-credentials
      - name: result-aggregator
        secret:
          secretName: result-aggregator
    trigger: (?m)^/test( | .* )eval-cluster-diagnostic,?($|\s.*)
  - agent: kubernetes
    always_run: false
    branches:
    - ^main$
    - ^main-
    cluster: build12
    context: ci/prow/eval-threat-model-tnf
    decorate: true
    decoration_config:
      sparse_checkout_files:
      - images/Containerfile.ci
      - images/Containerfile.markdownlint
    labels:
      ci.openshift.io/generator: prowgen
      pj-rehearse.openshift.io/can-be-rehearsed: "true"
    name: pull-ci-openshift-eng-edge-tooling-main-eval-threat-model-tnf
    optional: true
    rerun_command: /test eval-threat-model-tnf
    run_if_changed: ^plugins/two-node/(skills|evals)
    spec:
      containers:
      - args:
        - --gcs-upload-secret=/secrets/gcs/service-account.json
        - --image-import-pull-secret=/etc/pull-secret/.dockerconfigjson
        - --lease-server-credentials-file=/etc/boskos/credentials
        - --report-credentials-file=/etc/report/credentials
        - --target=eval-threat-model-tnf
        command:
        - ci-operator
        env:
        - name: HTTP_SERVER_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        image: quay-proxy.ci.openshift.org/openshift/ci:ci_ci-operator_latest
        imagePullPolicy: Always
        name: ""
        ports:
        - containerPort: 8080
          name: http
        resources:
          requests:
            cpu: 10m
        volumeMounts:
        - mountPath: /etc/boskos
          name: boskos
          readOnly: true
        - mountPath: /secrets/gcs
          name: gcs-credentials
          readOnly: true
        - mountPath: /secrets/manifest-tool
          name: manifest-tool-local-pusher
          readOnly: true
        - mountPath: /etc/pull-secret
          name: pull-secret
          readOnly: true
        - mountPath: /etc/report
          name: result-aggregator
          readOnly: true
      serviceAccountName: ci-operator
      volumes:
      - name: boskos
        secret:
          items:
          - key: credentials
            path: credentials
          secretName: boskos-credentials
      - name: manifest-tool-local-pusher
        secret:
          secretName: manifest-tool-local-pusher
      - name: pull-secret
        secret:
          secretName: registry-pull-credentials
      - name: result-aggregator
        secret:
          secretName: result-aggregator
    trigger: (?m)^/test( | .* )eval-threat-model-tnf,?($|\s.*)
  - agent: kubernetes
    always_run: true
    branches: