Inline Gemini reviewer for calibration#87
Conversation
Inlines _gemini-review.yml at pin 128656e40 + the 5 scripts and 4 layer files it depends on, replacing the thin caller of the reusable in HarperFast/ai-review-prompts with a self-contained workflow. Why: 3 logged Gemini production runs, 0 useful (2 short-circuited to "no blockers" on non-trivial CI changes; 1 false-positive finding). Iterating on the prompt through ai-review-prompts requires PR + merge + pin bump + CI wait per change. Inlining gives a single-repo edit-push-observe loop until output is comparable to Claude's. Surgery applied to the workflow file: - on: workflow_call -> on: pull_request + workflow_dispatch - dropped both ai-review-prompts checkouts (sparse + full) - dropped the HarperFast/skills checkout (verified: layers reference skills paths as text only, no filesystem reads) - inlined the review-layers list and the OAuth repo-specific-checks block as literals (previously workflow_call inputs from the caller) - bash steps reference .github/scripts/<name>.sh instead of .ai-review-prompts/.github/scripts/<name>.sh - workflow_dispatch inputs expose model + gemini-cli-version for per-run tuning from the Actions UI One non-cosmetic patch to compose-review-scope.sh: LAYERS_DIR env var indirection (default .github/review-layers) so the script can be driven from either the inlined or reusable shape with the same code. Claude reviewer is unchanged - still consumes the ai-review-prompts reusable at f22bf7d. Both reviewers run in parallel on this PR; Claude will review the inlined workflow itself, Gemini will exercise the inlined logic on its own diff. Promotion path: once Gemini output reliably matches Claude's quality, fold the refined prompt + any script tweaks back to ai-review-prompts and revert this file to its thin-caller form. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Reviewed; no blockers found. |
| gemini-cli-version: | ||
| description: 'Gemini CLI version pin (tag / branch / SHA / latest)' | ||
| type: string | ||
| default: 'latest' |
There was a problem hiding this comment.
workflow_dispatch missing pr-number input — runs are non-functional from the Actions UI
workflow_dispatch doesn't carry github.event.pull_request.number, so every downstream use of that context is empty:
Find prior review commentcallsgh api "…/issues//comments"(empty issue path) — fails, butset -uo pipefailwithout-eswallows it, soprior_review.outputs.idis silently empty.- The Gemini prompt says "reviewing pull request #" (blank number) — the model has no useful PR context.
Post Gemini review commentcallsgh pr comment "" --body-file …— fails silently for the same reason. Nothing is posted.
Net result: an Actions UI run shows green but does nothing.
The fix depends on intent. If workflow_dispatch is meant for ad-hoc review of an existing PR:
| default: 'latest' | |
| gemini-cli-version: | |
| description: 'Gemini CLI version pin (tag / branch / SHA / latest)' | |
| type: string | |
| default: 'latest' | |
| pr-number: | |
| description: 'PR number to review (required for workflow_dispatch)' | |
| type: string | |
| required: true |
…and then use github.event.pull_request.number || inputs.pr-number wherever PR_NUMBER is set. If workflow_dispatch is only meant to test workflow plumbing (not actually post a review), a guard at the top of the review job and a clear comment explaining the limitation would do.
|
Reviewed; no blockers found. |
Added in the previous commit without a real use case ("for ad-hoc
tuning from the Actions UI") — every calibration tweak goes through
a commit on the branch anyway. Both reviewers caught the cost: the
trigger had no pr-number input, so any manual dispatch would have
queried API paths with an empty PR number and failed silently.
Hardcodes the two values that were behind the inputs fallback
(gemini_model + gemini_cli_version) at their previous default
values.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single-shot Gemini has no shell/API access from inside the prompt, so any awareness of prior review state has to be pre-fetched and injected. Adds a new workflow step between `find-prior-review-comment` and the gemini-cli step that fetches the prior comment body when one exists, and exposes it via a step output the prompt interpolates. Updates the prompt with a new `## Prior review` section that: - Documents the empty-vs-non-empty signal (first review vs update) - Instructs that findings present in the prior review but absent from the current diff should be acknowledged with the resolving commit SHA, not re-posted - Instructs "no blockers" summaries on follow-up runs to briefly note what changed (matching Claude's continuity pattern) Closes the one usability gap observed in PR #87's first calibration cycle — Claude noted "prior blocker resolved by commit 52b3a24", Gemini just said "no blockers" without continuity. find-prior-review-comment.sh is unchanged (byte-identical to its ai-review-prompts@128656e40 source) — the continuity logic lives in the workflow file, which is what's being calibrated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 3 of PR #87 calibration regressed: injecting a "no blockers" prior review primed Gemini to hunt for issues without giving it anything real to evaluate, and it manufactured two weak findings against the prompt itself. Two changes addressing the priming: 1. The prior-body fetch step now checks for `### N.` finding headers in the body before emitting. Empty / "no blockers" priors produce no output, so the prompt sees an empty continuity section and reviews the current diff normally. 2. The "## Prior review" prompt section is reframed as a "Continuity reference" — neutral about whether the section exists ("ignore it and review the current diff normally" if empty), permissive rather than prescriptive about referencing it ("your summary line can cite..." instead of "for each finding ... decide whether ..."). Drops the checklist-y enumeration that pushed the model into completing-a-checklist mode against an empty input. Drops the `git log` reference that contributed to the "do tools or don't I" confusion in single-shot mode. Continuity behavior is preserved when it's actually useful (a real prior finding exists) and suppressed when it isn't (clean prior or first run). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both findings from PR #87 rounds 3-4 are legitimate latent bugs in the prompt + post-script that were carried over from the ai-review-prompts@128656e40 source. Rounds 1-2 happened to not surface them because the model was focused on the diff content; once raised, they keep recurring because they remain unfixed. Finding 1 (prompt contradiction): the prompt forbade "shell commands, gh, or any API" while also instructing the model to "read agent context files first" and apply a layered review scope that requires reading the diff. In our configuration gemini-cli has default tools.core (full shell), so the model DOES use shell to research the diff — the prohibition was overreaching. Reworded to allow research tools (read files, git log/blame/diff, codebase search) while keeping the actual intent: do not attempt to post the review yourself, the workflow handles that. Finding 2 (marker check on indented output): the existing sed expression in post-review-comment.sh removed leading blank lines but preserved leading whitespace on the first non-blank line. If the model ever indented the marker line (e.g. " <!-- gemini- review:v1 -->"), the case check would fail and a valid review wouldn't post. Added a second sed pass to strip leading whitespace from the first non-blank line. The defense against mystery content (refuse if marker absent) is unchanged. post-review-comment.sh is no longer byte-identical to its source at 128656e40 — first divergence from the inlined-as-copy state. Note for promotion: this fix belongs in ai-review-prompts when the calibration cycle wraps up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Bump both reviewer pins to ai-review-prompts@ea19009 Adopts the promoted Gemini calibration (ai-review-prompts#41) and restores both reviewers to a single shared pin. Gemini changes vs the prior pin (128656e40): - Single-shot architecture validated empirically over 5 iteration rounds in PR #87 (gemini/inline-calibrate) - Prior-review body fetched and injected as continuity context when the prior had findings (### N. headers); skipped on no-blockers priors so the model doesn't get primed to hunt for issues against uninformative context - Prompt's "your output IS the PR comment" section reworded to allow shell/file research tools while still forbidding self- posting (the workflow handles posting) - post-review-comment.sh now strips leading whitespace from the marker line so a non-whitespace-stable provider can't break the marker check Claude reviewer is byte-identical between f22bf7d and ea19009 in ai-review-prompts; bumping the pin is a no-op for Claude behavior but restores the two reviewers to a single shared upgrade motion. The "intentionally NOT in lockstep" comment from the divergence period is replaced with a note that lockstep is restored. PR #87 (gemini/inline-calibrate) is now superseded by this PR and will be closed without merging. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add caller-side permissions block to gemini-review.yml Mirrors the hardening on claude-review.yml — explicit grant at the calling-job level for the union the reusable's authorize + review jobs need (contents: read + pull-requests: write + id-token: write). The workflow happens to work without this on oauth today because the repo default is permissive enough, but the explicit grant survives repo-default changes and matches claude-review.yml's shape and discipline. Surfaced as a finding by Gemini's own review of this PR (#88) — exactly the loop the calibration was built for. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Superseded by #88 (which has merged). The calibration this branch was running is now in production via HarperFast/ai-review-prompts#41; oauth's gemini-review.yml caller is back to the thin-caller form pointing at Calibration summary for the record (rounds 1-5 on this branch):
Closing without merging — the inlined files served their purpose as a fast iteration loop; the changes have promoted upstream. |
Summary
Replaces oauth's thin caller of the
_gemini-review.ymlreusable inHarperFast/ai-review-promptswith a self-contained inlined copy of the same logic at pin128656e40(2026-05-12). The 5 scripts + 4 layer files the reusable depends on come along into.github/scripts/and.github/review-layers/.Why
3 logged Gemini production runs to date, 0 useful — two short-circuited to "no blockers" on non-trivial CI/permissions changes; one posted a false-positive finding (claimed CI pin "regression" on a release branch that pre-dated the "regressed" commit). The output gap from Claude is in the prompt and posting discipline, not the architecture.
Iterating on those through
ai-review-promptsis PR + merge + pin bump + CI wait per change — too slow when none of the prior iterations produced useful output. Inlining gives a single-repo edit-push-observe loop. Once Gemini output reliably matches Claude's quality, the refined prompt promotes back toai-review-promptsand this file reverts to its thin-caller form.What changed
Workflow file (
.github/workflows/gemini-review.yml):on: workflow_call→on: pull_request+workflow_dispatch(the latter exposesmodelandgemini-cli-versionfor ad-hoc tuning from the Actions UI)ai-review-promptscheckouts (sparse-checkout for the auth script + full clone for layers/scripts)HarperFast/skillscheckout — verified layer files referenceharper-best-practices/paths as text only, not filesystem readsworkflow_callinputs).github/scripts/<name>.shScripts (
.github/scripts/): 5 files, verbatim copy exceptcompose-review-scope.sh, which gets aLAYERS_DIRenv-var indirection (default.github/review-layers) so it works from either the inlined or reusable shape.Layer files (
.github/review-layers/): 4 files copied verbatim fromai-review-prompts@128656e40—universal.md,harper/common.md,harper/v5.md,repo-type/plugin.md.Claude reviewer (
.github/workflows/claude-review.yml) is unchanged — still consumes theai-review-promptsreusable atf22bf7d. Both reviewers run on this PR: Claude reviews the inlined workflow change itself; Gemini exercises the inlined logic on its own diff.Test plan
compose-review-scope.shemits a non-empty composed scope (≥ ~5 KB) underLAYERS_DIR=.github/review-layers<!-- gemini-review:v1 -->markerprovider:geminicommentOnce those land, iterate on the prompt in subsequent commits to this branch and observe.
🤖 Generated with Claude Code