OCPEDGE-2727: Add opt-in eval pre-push hook for skill quality checks by dhensel-rh · Pull Request #206 · openshift-eng/edge-tooling

dhensel-rh · 2026-06-24T17:05:44Z

What this does

Adds an opt-in pre-push hook that evaluates changed skills locally using
the agent-eval-harness. Push completes immediately — the eval runs in
the background and posts results back to the PR.

On the PR you get:

A commit status in the checks list (pending → success/failure)
A rolling comment with the full eval report (scores, judge explanations, root causes)

This replaces the baseline-comparison approach from #178. Instead of
storing scenarios and diffing against them, each run generates everything
fresh and judges on safety, completeness, and quality.

How to use it

# Install hooks (one-time)
make setup-githooks

# Push with eval enabled
RUN_EVAL=1 git push

The eval log is written to /tmp/skill-eval-<sha>.log. Without
RUN_EVAL=1, the hook is a no-op.

Files

File	Purpose
`.githooks/pre-push`	Backgrounds the eval, passes SHA and repo to the script
`scripts/eval-skill.sh`	Detects changed skills, runs analyze → dataset → run, posts status + report
`.gitignore`	Excludes `eval/runs/` from version control
`README.md`	Documents both hooks (pre-commit + pre-push)

openshift-ci · 2026-06-24T17:05:53Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhensel-rh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [dhensel-rh]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-06-24T17:05:58Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Adds a pre-push hook that can run skill evaluation, updates the evaluation script to discover changed skills and report results, and refreshes setup docs. .gitignore now ignores eval/runs/.

Changes

Skill evaluation hook flow

Layer / File(s)	Summary
Hook bootstrap and log path `.githooks/pre-push`, `.gitignore`	The pre-push hook enables strict Bash mode, checks `RUN_EVAL`, derives repo and SHA values, starts the evaluation script in the background, and ignores `eval/runs/`.
Setup and change detection `scripts/eval-skill.sh`	The script sets runtime defaults, creates a temp workspace, detects changed `SKILL.md` files from `git diff`, and posts an initial pending status when work is found.
Per-skill evaluation and reporting `scripts/eval-skill.sh`	The script runs the three evaluation stages, records outcomes, finds the newest analysis output, prints the totals table, and posts final status and PR comment data through `gh`.
Setup docs `README.md`	The `make setup-githooks` description now lists the shared `pre-commit` and `pre-push` hooks.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 11

✅ Passed checks (11 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely describes the main change: an opt-in pre-push eval hook for skill quality checks.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
No-Weak-Crypto	✅ Passed	No weak-crypto primitives or custom crypto found in the touched files; the SHA references are commit hashes, not crypto usage.
Container-Privileges	✅ Passed	No container/K8s manifests were changed, and the modified hook/script/docs contain no privileged settings or allowPrivilegeEscalation fields.
No-Sensitive-Data-In-Logs	✅ Passed	Only non-secret data is logged (skill names, file paths, counts, and GitHub status); no passwords, tokens, PII, or hostnames are echoed.
No-Hardcoded-Secrets	✅ Passed	Reviewed the touched files; none contain hardcoded secrets, credential URLs, or api_key/secret/token/password string assignments.
No-Injection-Vectors	✅ Passed	Added shell scripts only use quoted variable interpolation; no eval/exec, SQL, YAML load, pickle, or unsafe DOM sinks found.
Ai-Attribution	✅ Passed	The PR mentions Claude Code, and the commit includes an `Assisted-by:` trailer; no commit messages contain `Co-Authored-By`.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.githooks/pre-push:
- Around line 1-10: The .githooks/pre-push hook does not follow the required
Shell standards because it uses the wrong shebang and is missing strict mode.
Update the script header in pre-push to use /usr/bin/bash and add set -euo
pipefail near the top, keeping the existing RUN_EVAL gate and trailing exit 0 so
the hook remains non-blocking.

In `@scripts/eval-skill.sh`:
- Line 26: The unquoted expansion of CHANGED_SKILLS in eval-skill.sh is relying
on word splitting and will break paths with spaces, so update the iteration used
by the printf and the later loop that consumes the same variable to avoid direct
expansion. Refactor the logic around CHANGED_SKILLS to read items safely with a
newline- or NUL-delimited loop using mapfile or while read, and keep the
existing per-file behavior without shell splitting.
- Around line 1-2: The shell script has the wrong shebang and is missing the
`-e` option. Update the header in `eval-skill.sh` to use `#!/usr/bin/bash` and
change the `set` line to include `-e` alongside `-u` and `pipefail`. Keep the
existing guarded `claude -p` flow and `exit 0` behavior intact while ensuring
unexpected command failures are caught.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 022cbed0-5eab-4342-baeb-6c30c897d0f4

📥 Commits

Reviewing files that changed from the base of the PR and between 6d5fd24 and 4188e4b.

📒 Files selected for processing (3)

.githooks/pre-push
.gitignore
scripts/eval-skill.sh

coderabbitai

🧹 Nitpick comments (1)

scripts/eval-skill.sh (1)
30-43: 📐 Maintainability & Code Quality | 🔵 Trivial

Avoid piling up eval comments in scripts/eval-skill.sh:30-42
post_pr_comment always creates a new PR comment. Switch to gh pr comment --edit-last --create-if-none --body ... (gh 2.19.0+) or update a marker comment so the eval report stays as a single rolling comment.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/eval-skill.sh` around lines 30 - 43, The post_pr_comment helper
currently adds a brand-new PR comment every run, causing duplicate eval reports.
Update post_pr_comment to reuse a single rolling comment by switching the gh pr
comment call to --edit-last --create-if-none with the same body, or otherwise
implement a marker-based update approach. Keep the change localized to
post_pr_comment and preserve the existing checks around EVAL_SHA, EVAL_REPO,
report_file, and gh availability.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@scripts/eval-skill.sh`:
- Around line 30-43: The post_pr_comment helper currently adds a brand-new PR
comment every run, causing duplicate eval reports. Update post_pr_comment to
reuse a single rolling comment by switching the gh pr comment call to
--edit-last --create-if-none with the same body, or otherwise implement a
marker-based update approach. Keep the change localized to post_pr_comment and
preserve the existing checks around EVAL_SHA, EVAL_REPO, report_file, and gh
availability.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3030db68-9892-4049-ac21-142353c96e33

📥 Commits

Reviewing files that changed from the base of the PR and between ffb90c6 and 4c17a87.

📒 Files selected for processing (2)

.githooks/pre-push
scripts/eval-skill.sh

🚧 Files skipped from review as they are similar to previous changes (1)

.githooks/pre-push

ggiguash · 2026-06-25T12:45:50Z

@@ -1,4 +1,5 @@
 .reports
+eval/runs/


I think this should also go to /tmp.

Is the suggestion to place the report outputs to /tmp so as not to have an entry in .gitignore ? The AGENT_EVAL_RUNS_DIR could be used here to satisfy this ask. WDYS @ggiguash ?

Adds a pre-push hook and eval-skill.sh script that detects changed skills and runs the full eval pipeline (analyze → dataset → run) locally. Push completes immediately — eval runs in the background via nohup/disown. Skill detection is scoped to the current user's commits only, so other people's merged work on the branch does not trigger eval. Results are posted to the PR as a commit status (pending → success/ failure in the checks list) and a rolling PR comment with the full analysis.md report. Gated on RUN_EVAL=1 — without it the hook is a no-op. Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dhensel-rh · 2026-06-25T14:40:55Z

Skill Eval Report

Test run (updated) — rolling comment verified.

Skill	Result
threat-model:lvms	PASS
threat-model:sno	PASS
two-node:cluster-diagnostic	PASS

Rolling update works — this replaced the previous comment.

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 24, 2026

coderabbitai Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 24, 2026

coderabbitai Bot requested changes Jun 24, 2026

View reviewed changes

Comment thread .githooks/pre-push Outdated

Comment thread scripts/eval-skill.sh Outdated

Comment thread scripts/eval-skill.sh Outdated

coderabbitai Bot approved these changes Jun 24, 2026

View reviewed changes

dhensel-rh force-pushed the OCPEDGE-2727-eval-hooks branch from 5931315 to f10090b Compare June 24, 2026 18:30

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

dhensel-rh force-pushed the OCPEDGE-2727-eval-hooks branch from 4c17a87 to b1ee040 Compare June 25, 2026 11:22

ggiguash reviewed Jun 25, 2026

View reviewed changes

dhensel-rh force-pushed the OCPEDGE-2727-eval-hooks branch from b1ee040 to 2c4a800 Compare June 25, 2026 13:36

dhensel-rh marked this pull request as draft June 26, 2026 18:39

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 26, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

OCPEDGE-2727: Add opt-in eval pre-push hook for skill quality checks#206

OCPEDGE-2727: Add opt-in eval pre-push hook for skill quality checks#206
dhensel-rh wants to merge 1 commit into
openshift-eng:mainfrom
dhensel-rh:OCPEDGE-2727-eval-hooks

dhensel-rh commented Jun 24, 2026 •

edited

Loading

Uh oh!

openshift-ci Bot commented Jun 24, 2026

Uh oh!

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading

Reviews paused

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

ggiguash Jun 25, 2026

Uh oh!

dhensel-rh Jun 25, 2026

Uh oh!

dhensel-rh commented Jun 25, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

dhensel-rh commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this does

How to use it

Files

Uh oh!

openshift-ci Bot commented Jun 24, 2026

Uh oh!

coderabbitai Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

ggiguash Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

dhensel-rh Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

dhensel-rh commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Skill Eval Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dhensel-rh commented Jun 24, 2026 •

edited

Loading

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading

dhensel-rh commented Jun 25, 2026 •

edited

Loading