Skip to content

OCPEDGE-2727: Add opt-in eval pre-push hook for skill quality checks#206

Draft
dhensel-rh wants to merge 1 commit into
openshift-eng:mainfrom
dhensel-rh:OCPEDGE-2727-eval-hooks
Draft

OCPEDGE-2727: Add opt-in eval pre-push hook for skill quality checks#206
dhensel-rh wants to merge 1 commit into
openshift-eng:mainfrom
dhensel-rh:OCPEDGE-2727-eval-hooks

Conversation

@dhensel-rh

@dhensel-rh dhensel-rh commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

What this does

Adds an opt-in pre-push hook that evaluates changed skills locally using
the agent-eval-harness. Push completes immediately — the eval runs in
the background and posts results back to the PR.

On the PR you get:

  • A commit status in the checks list (pending → success/failure)
  • A rolling comment with the full eval report (scores, judge explanations, root causes)

This replaces the baseline-comparison approach from #178. Instead of
storing scenarios and diffing against them, each run generates everything
fresh and judges on safety, completeness, and quality.

How to use it

# Install hooks (one-time)
make setup-githooks

# Push with eval enabled
RUN_EVAL=1 git push

The eval log is written to /tmp/skill-eval-<sha>.log. Without
RUN_EVAL=1, the hook is a no-op.

Files

File Purpose
.githooks/pre-push Backgrounds the eval, passes SHA and repo to the script
scripts/eval-skill.sh Detects changed skills, runs analyze → dataset → run, posts status + report
.gitignore Excludes eval/runs/ from version control
README.md Documents both hooks (pre-commit + pre-push)

@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhensel-rh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 24, 2026
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a pre-push hook that can run skill evaluation, updates the evaluation script to discover changed skills and report results, and refreshes setup docs. .gitignore now ignores eval/runs/.

Changes

Skill evaluation hook flow

Layer / File(s) Summary
Hook bootstrap and log path
.githooks/pre-push, .gitignore
The pre-push hook enables strict Bash mode, checks RUN_EVAL, derives repo and SHA values, starts the evaluation script in the background, and ignores eval/runs/.
Setup and change detection
scripts/eval-skill.sh
The script sets runtime defaults, creates a temp workspace, detects changed SKILL.md files from git diff, and posts an initial pending status when work is found.
Per-skill evaluation and reporting
scripts/eval-skill.sh
The script runs the three evaluation stages, records outcomes, finds the newest analysis output, prints the totals table, and posts final status and PR comment data through gh.
Setup docs
README.md
The make setup-githooks description now lists the shared pre-commit and pre-push hooks.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 11
✅ Passed checks (11 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: an opt-in pre-push eval hook for skill quality checks.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
No-Weak-Crypto ✅ Passed No weak-crypto primitives or custom crypto found in the touched files; the SHA references are commit hashes, not crypto usage.
Container-Privileges ✅ Passed No container/K8s manifests were changed, and the modified hook/script/docs contain no privileged settings or allowPrivilegeEscalation fields.
No-Sensitive-Data-In-Logs ✅ Passed Only non-secret data is logged (skill names, file paths, counts, and GitHub status); no passwords, tokens, PII, or hostnames are echoed.
No-Hardcoded-Secrets ✅ Passed Reviewed the touched files; none contain hardcoded secrets, credential URLs, or api_key/secret/token/password string assignments.
No-Injection-Vectors ✅ Passed Added shell scripts only use quoted variable interpolation; no eval/exec, SQL, YAML load, pickle, or unsafe DOM sinks found.
Ai-Attribution ✅ Passed The PR mentions Claude Code, and the commit includes an Assisted-by: trailer; no commit messages contain Co-Authored-By.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 24, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.githooks/pre-push:
- Around line 1-10: The .githooks/pre-push hook does not follow the required
Shell standards because it uses the wrong shebang and is missing strict mode.
Update the script header in pre-push to use /usr/bin/bash and add set -euo
pipefail near the top, keeping the existing RUN_EVAL gate and trailing exit 0 so
the hook remains non-blocking.

In `@scripts/eval-skill.sh`:
- Line 26: The unquoted expansion of CHANGED_SKILLS in eval-skill.sh is relying
on word splitting and will break paths with spaces, so update the iteration used
by the printf and the later loop that consumes the same variable to avoid direct
expansion. Refactor the logic around CHANGED_SKILLS to read items safely with a
newline- or NUL-delimited loop using mapfile or while read, and keep the
existing per-file behavior without shell splitting.
- Around line 1-2: The shell script has the wrong shebang and is missing the
`-e` option. Update the header in `eval-skill.sh` to use `#!/usr/bin/bash` and
change the `set` line to include `-e` alongside `-u` and `pipefail`. Keep the
existing guarded `claude -p` flow and `exit 0` behavior intact while ensuring
unexpected command failures are caught.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 022cbed0-5eab-4342-baeb-6c30c897d0f4

📥 Commits

Reviewing files that changed from the base of the PR and between 6d5fd24 and 4188e4b.

📒 Files selected for processing (3)
  • .githooks/pre-push
  • .gitignore
  • scripts/eval-skill.sh

Comment thread .githooks/pre-push Outdated
Comment thread scripts/eval-skill.sh Outdated
Comment thread scripts/eval-skill.sh Outdated
@dhensel-rh dhensel-rh force-pushed the OCPEDGE-2727-eval-hooks branch from 5931315 to f10090b Compare June 24, 2026 18:30

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
scripts/eval-skill.sh (1)

30-43: 📐 Maintainability & Code Quality | 🔵 Trivial

Avoid piling up eval comments in scripts/eval-skill.sh:30-42
post_pr_comment always creates a new PR comment. Switch to gh pr comment --edit-last --create-if-none --body ... (gh 2.19.0+) or update a marker comment so the eval report stays as a single rolling comment.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/eval-skill.sh` around lines 30 - 43, The post_pr_comment helper
currently adds a brand-new PR comment every run, causing duplicate eval reports.
Update post_pr_comment to reuse a single rolling comment by switching the gh pr
comment call to --edit-last --create-if-none with the same body, or otherwise
implement a marker-based update approach. Keep the change localized to
post_pr_comment and preserve the existing checks around EVAL_SHA, EVAL_REPO,
report_file, and gh availability.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@scripts/eval-skill.sh`:
- Around line 30-43: The post_pr_comment helper currently adds a brand-new PR
comment every run, causing duplicate eval reports. Update post_pr_comment to
reuse a single rolling comment by switching the gh pr comment call to
--edit-last --create-if-none with the same body, or otherwise implement a
marker-based update approach. Keep the change localized to post_pr_comment and
preserve the existing checks around EVAL_SHA, EVAL_REPO, report_file, and gh
availability.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3030db68-9892-4049-ac21-142353c96e33

📥 Commits

Reviewing files that changed from the base of the PR and between ffb90c6 and 4c17a87.

📒 Files selected for processing (2)
  • .githooks/pre-push
  • scripts/eval-skill.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • .githooks/pre-push

@dhensel-rh dhensel-rh force-pushed the OCPEDGE-2727-eval-hooks branch from 4c17a87 to b1ee040 Compare June 25, 2026 11:22
Comment thread .gitignore
@@ -1,4 +1,5 @@
.reports
eval/runs/

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should also go to /tmp.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the suggestion to place the report outputs to /tmp so as not to have an entry in .gitignore ? The AGENT_EVAL_RUNS_DIR could be used here to satisfy this ask. WDYS @ggiguash ?

Adds a pre-push hook and eval-skill.sh script that detects changed
skills and runs the full eval pipeline (analyze → dataset → run)
locally. Push completes immediately — eval runs in the background
via nohup/disown.

Skill detection is scoped to the current user's commits only, so
other people's merged work on the branch does not trigger eval.

Results are posted to the PR as a commit status (pending → success/
failure in the checks list) and a rolling PR comment with the full
analysis.md report.

Gated on RUN_EVAL=1 — without it the hook is a no-op.

Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dhensel-rh dhensel-rh force-pushed the OCPEDGE-2727-eval-hooks branch from b1ee040 to 2c4a800 Compare June 25, 2026 13:36
@dhensel-rh

dhensel-rh commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Skill Eval Report

Test run (updated) — rolling comment verified.

Skill Result
threat-model:lvms PASS
threat-model:sno PASS
two-node:cluster-diagnostic PASS

Rolling update works — this replaced the previous comment.

@dhensel-rh dhensel-rh marked this pull request as draft June 26, 2026 18:39
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants