OCPEDGE-2727: Add opt-in eval pre-push hook for skill quality checks#206
OCPEDGE-2727: Add opt-in eval pre-push hook for skill quality checks#206dhensel-rh wants to merge 1 commit into
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dhensel-rh The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a pre-push hook that can run skill evaluation, updates the evaluation script to discover changed skills and report results, and refreshes setup docs. ChangesSkill evaluation hook flow
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 11✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.githooks/pre-push:
- Around line 1-10: The .githooks/pre-push hook does not follow the required
Shell standards because it uses the wrong shebang and is missing strict mode.
Update the script header in pre-push to use /usr/bin/bash and add set -euo
pipefail near the top, keeping the existing RUN_EVAL gate and trailing exit 0 so
the hook remains non-blocking.
In `@scripts/eval-skill.sh`:
- Line 26: The unquoted expansion of CHANGED_SKILLS in eval-skill.sh is relying
on word splitting and will break paths with spaces, so update the iteration used
by the printf and the later loop that consumes the same variable to avoid direct
expansion. Refactor the logic around CHANGED_SKILLS to read items safely with a
newline- or NUL-delimited loop using mapfile or while read, and keep the
existing per-file behavior without shell splitting.
- Around line 1-2: The shell script has the wrong shebang and is missing the
`-e` option. Update the header in `eval-skill.sh` to use `#!/usr/bin/bash` and
change the `set` line to include `-e` alongside `-u` and `pipefail`. Keep the
existing guarded `claude -p` flow and `exit 0` behavior intact while ensuring
unexpected command failures are caught.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 022cbed0-5eab-4342-baeb-6c30c897d0f4
📒 Files selected for processing (3)
.githooks/pre-push.gitignorescripts/eval-skill.sh
5931315 to
f10090b
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
scripts/eval-skill.sh (1)
30-43: 📐 Maintainability & Code Quality | 🔵 TrivialAvoid piling up eval comments in
scripts/eval-skill.sh:30-42
post_pr_commentalways creates a new PR comment. Switch togh pr comment --edit-last --create-if-none --body ...(gh 2.19.0+) or update a marker comment so the eval report stays as a single rolling comment.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/eval-skill.sh` around lines 30 - 43, The post_pr_comment helper currently adds a brand-new PR comment every run, causing duplicate eval reports. Update post_pr_comment to reuse a single rolling comment by switching the gh pr comment call to --edit-last --create-if-none with the same body, or otherwise implement a marker-based update approach. Keep the change localized to post_pr_comment and preserve the existing checks around EVAL_SHA, EVAL_REPO, report_file, and gh availability.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@scripts/eval-skill.sh`:
- Around line 30-43: The post_pr_comment helper currently adds a brand-new PR
comment every run, causing duplicate eval reports. Update post_pr_comment to
reuse a single rolling comment by switching the gh pr comment call to
--edit-last --create-if-none with the same body, or otherwise implement a
marker-based update approach. Keep the change localized to post_pr_comment and
preserve the existing checks around EVAL_SHA, EVAL_REPO, report_file, and gh
availability.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 3030db68-9892-4049-ac21-142353c96e33
📒 Files selected for processing (2)
.githooks/pre-pushscripts/eval-skill.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- .githooks/pre-push
4c17a87 to
b1ee040
Compare
| @@ -1,4 +1,5 @@ | |||
| .reports | |||
| eval/runs/ | |||
There was a problem hiding this comment.
I think this should also go to /tmp.
There was a problem hiding this comment.
Is the suggestion to place the report outputs to /tmp so as not to have an entry in .gitignore ? The AGENT_EVAL_RUNS_DIR could be used here to satisfy this ask. WDYS @ggiguash ?
Adds a pre-push hook and eval-skill.sh script that detects changed skills and runs the full eval pipeline (analyze → dataset → run) locally. Push completes immediately — eval runs in the background via nohup/disown. Skill detection is scoped to the current user's commits only, so other people's merged work on the branch does not trigger eval. Results are posted to the PR as a commit status (pending → success/ failure in the checks list) and a rolling PR comment with the full analysis.md report. Gated on RUN_EVAL=1 — without it the hook is a no-op. Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
b1ee040 to
2c4a800
Compare
Skill Eval ReportTest run (updated) — rolling comment verified.
Rolling update works — this replaced the previous comment. |
What this does
Adds an opt-in pre-push hook that evaluates changed skills locally using
the agent-eval-harness. Push completes immediately — the eval runs in
the background and posts results back to the PR.
On the PR you get:
This replaces the baseline-comparison approach from #178. Instead of
storing scenarios and diffing against them, each run generates everything
fresh and judges on safety, completeness, and quality.
How to use it
The eval log is written to
/tmp/skill-eval-<sha>.log. WithoutRUN_EVAL=1, the hook is a no-op.Files
.githooks/pre-pushscripts/eval-skill.sh.gitignoreeval/runs/from version controlREADME.md