From 5a90679fb8d66dcf816c690fed0f023897f50b6a Mon Sep 17 00:00:00 2001 From: Original Gary <276612211+OpenGaryBot@users.noreply.github.com> Date: Sat, 25 Apr 2026 19:11:24 +1000 Subject: [PATCH 1/8] =?UTF-8?q?feat(skills):=20add=20/merge=20operator=20c?= =?UTF-8?q?ommand=20=E2=80=94=20ranked=20merge=20queue?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Read-only operator HUD. Lists every PR currently at stage:ready-for-merge (or all-gates-green), ranks by calibrated HIGH/MED/LOW confidence, emits copy-paste merge commands the operator runs themselves. Consumes /run's report (~/.claude/orchestrator-log/run-*.md, <30min old) or derives directly via gh when log is stale. Confidence calibration is the load-bearing part. MED-cap signals (never HIGH): override:* labels, deployed-surface-no-live-check, sensitivity:private files, CodeRabbit findings above informational, skip:flaky tests, stale verifier (>24h), UI changes without persona-qa run. LOW signals: stacked MED-caps, adversarial needs-human-review, decision conflict, scope creep across >3 bounded contexts. Calibration intent: median outcome should be MED, not HIGH. If a first-day run surfaces 80% HIGH, the column adds no signal — tighten caps. Hard rules: - Never merges (read-only — copy-paste commands only) - Never invokes subagents (driving the pipeline is /run's job) - Never softens calibration; em-dash stays em-dash for HIGH Args: --risky inverts sort (LOW first); --repo scopes to one repo. disable-model-invocation: true — operator-only. --- claude-code/.claude/skills/merge/SKILL.md | 119 ++++++++++++++++++++++ 1 file changed, 119 insertions(+) create mode 100644 claude-code/.claude/skills/merge/SKILL.md diff --git a/claude-code/.claude/skills/merge/SKILL.md b/claude-code/.claude/skills/merge/SKILL.md new file mode 100644 index 0000000..28f6741 --- /dev/null +++ b/claude-code/.claude/skills/merge/SKILL.md @@ -0,0 +1,119 @@ +--- +name: merge +description: Ranked merge queue with honest confidence scoring. Replaces "open 12 PR tabs" with one table. Reads the latest /run report (if <30min old) or derives ready-for-merge state via gh, computes a calibrated HIGH/MED/LOW confidence per PR, emits a markdown table with copy-paste merge commands. Read-only — never merges, never invokes subagents. Operator-only. +disable-model-invocation: true +argument-hint: "[--risky] [--repo ]" +allowed-tools: Bash(gh:*), Bash(ls:*), Bash(date:*), Bash(stat:*), Read, Grep, Glob +model: opus +--- + +# /merge — ranked merge queue, no auto-merge + +Read-only operator HUD. Lists every PR currently at `stage:ready-for-merge` (or all-gates-green with no stage label blocking), ranks by calibrated confidence, emits copy-paste merge commands the operator runs themselves. + +`/merge` does NOT merge. It does NOT drive the pipeline (that's `/run`). It does NOT invoke subagents. It reads existing pipeline state and surfaces it. + +Read these every fire (auto-load via `InstructionsLoaded`; cite by name in any followup question): + +- `~/.claude/rules/pipeline-nevers.md` — `override:skip-adversarial`, `override:allow-score-drop` are MED-cap signals (never HIGH); never merges to main directly +- `~/.claude/rules/context-repo.md` — sensitivity:private files in a PR cap confidence at MED +- `~/.claude/rules/advocacy-domain.md` — bounded contexts list (used to detect scope-creep) + +## Argument parsing + +`$ARGUMENTS`: +- `--risky` — invert sort to ascending (LOW first); default sort is descending (HIGH first) +- `--repo ` — single-repo scope (`Open-Paws/`) + +## Algorithm + +### 1. Find the input + +```bash +LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1) +``` + +If `LATEST` exists AND its mtime is within the last 30 minutes: +- Read it. Pull every PR mentioned in `### Advanced` (where `stage to == stage:ready-for-merge`) and `### No action needed` (where row contains `stage:ready-for-merge`). +- Also pull every PR from `### Stopped at human gate` whose `needs:` reason is "human approval (non-last-pusher branch protection)" — those are gate-clear from the bot's side. + +Otherwise (no recent log, or `--repo` set forcing fresh derivation): +- Derive directly via `gh pr list --repo Open-Paws/ --state open --label 'stage:ready-for-merge' --json number,title,labels,url,statusCheckRollup,reviewDecision,headRefName,additions,deletions,changedFiles --limit 100` +- Plus a sweep of all-gates-green PRs without a blocking stage label: `gh pr list --repo Open-Paws/ --state open --json number,title,labels,url,statusCheckRollup,reviewDecision --limit 200` then filter client-side for `statusCheckRollup` all-SUCCESS and labels containing none of `stage:plan-in-progress`/`stage:tests-in-progress`/`stage:impl-in-progress`/`stage:fix-needed`/`stage:adversarial-pending`. + +If `--repo` not set, walk all Open-Paws repos for which OpenGaryBot has push access. + +### 2. Per-PR signals to gather + +For each candidate PR, fetch: + +```bash +gh pr view / --json number,title,labels,url,statusCheckRollup,reviewDecision,headRefName,additions,deletions,changedFiles,files,comments,body +``` + +Plus, if the PR touches a deployed surface, look for "live HTTP check" evidence in PR comments — search for `gh pr view --comments` output containing patterns like `curl -i`, `health check`, `200 OK`, `live check passed`. If none, the deployed-surface MED-cap fires. + +### 3. Confidence calibration (read carefully — calibration is half the value of this command) + +**A PR is NEVER HIGH if any of the following are true** (these always cap at MED or lower): + +- Label `override:skip-adversarial` is present +- Label `override:allow-score-drop` is present +- PR touches a deployed service (Cloud Run, Vercel project, Edge Function, etc.) AND no live HTTP check post-build is recorded in PR comments + - Detect deployed service via files matched: `Dockerfile`, `cloudrun*.yaml`, `vercel.json`, `supabase/functions/`, `.github/workflows/deploy*.yml`, etc. +- Files touched include any path tagged `sensitivity:private` per `context-repo.md` +- CodeRabbit found anything above informational severity (parse PR comments for `**Issue:**`/`**Refactor:**`/`**Bug:**`/`**Note:**` blocks; the `chill` profile uses `Note:` for informational, so anything else is above-informational) +- Any test in the PR was classified as `skip:flaky` by test-reviewer (look for that label OR comment from `test-reviewer` mentioning the classification) +- Time since `stage:verified` was applied (or last `verifier` completion comment) exceeds 24h (stale verification) +- PR touches a UI surface (`*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `app/`, `pages/`, `components/`) AND no `persona-qa` completion comment is present on the PR + +**A PR is LOW if any of the following are true:** + +- Multiple of the MED-cap conditions above stack (count ≥ 2) +- The `adversarial` subagent flagged anything as `needs-human-review` (look for that phrase in adversarial completion comment) +- PR conflicts with a closed decision in `$OP_CONTEXT_REPO/decisions.md` (cross-reference any decision-conflict entry from the run log; on direct derivation, scan PR body + diff for keywords against `$OP_CONTEXT_REPO/decisions.md` headings) +- Files touched span more than 3 of the bounded contexts in `~/.claude/rules/advocacy-domain.md` (Investigation Operations / Public Campaigns / Coalition Coordination / Legal Defense). Use heuristic: any file under `*/investigations/*` → Investigation Operations; under `*/campaigns/*` or `*/petitions/*` → Public Campaigns; under `*/coalitions/*` or `*/partners/*` → Coalition Coordination; under `*/legal/*` or `*/cases/*` → Legal Defense. Probable scope creep. + +**Otherwise: HIGH.** + +### 4. Calibration intent (this is why the score is useful) + +If the first day of running this surfaces 80% HIGH, the calibration is too lax — the operator stops reading the column because everything looks identical. The MED-cap conditions exist specifically to push the median to MED, not HIGH. Most PRs in flight will hit at least one MED-cap (deployed-surface-no-live-check, or UI-no-persona-qa, or stale-verifier). When the operator sees HIGH, it should mean "this one really is clean — the column actually distinguishes". + +If you find yourself reasoning "this PR is fine, let's call it HIGH despite a MED-cap firing" — STOP. The score is mechanical. Don't soften it. Surface the cap reason in the "Top reason not HIGH" column and let the operator decide. + +## Output format + +```markdown +## /merge queue — — N PRs ready + +| Conf | Repo | PR | Title | Top reason not HIGH | Merge command | +|------|------|----|----|---------------------|---------------| +| HIGH | | # | | — | gh pr merge --squash --delete-branch <url> | +| MED | <repo> | #<num> | <title 60ch> | <one-line reason> | gh pr merge --squash --delete-branch <url> | +| LOW | <repo> | #<num> | <title 60ch> | <one-line reason> | gh pr merge --squash --delete-branch <url> | + +Distribution: HIGH: N | MED: N | LOW: N +Source: <log path> (mtime: <H>m ago) OR Source: live gh derivation (no recent run log) +``` + +Default sort: descending — HIGH first, MED second, LOW last. With `--risky`: ascending — LOW first. + +Title truncation: hard 60-char limit. Suffix with `…` if truncated. Don't try to be clever; truncation is fine because the merge command has the full URL. + +The "Top reason not HIGH" column for HIGH PRs is **always em-dash (`—`), never a sycophantic comment**. If you find yourself writing "looks clean!" or "all gates green ✓" — that's wrong. Em-dash is the contract. + +If only one cap fires for a MED, write the one cap. If multiple cap conditions fire on a LOW, write the most decision-relevant one (typically: adversarial flag > decision conflict > scope creep > stacked MED-caps). Reserve the long explanation for follow-up questions; the column is a glance. + +Merge command always uses `--squash --delete-branch` per `pipeline-reference.md` STAGE 14. + +## Hard rules + +- **Never actually merge.** `/merge` is read-only. It produces copy-paste commands the operator runs. +- **Never invoke subagents.** This skill reads existing pipeline state. Driving the pipeline is `/run`'s job. +- **Never apply labels.** Including `ready-for-merge` itself. +- **HIGH means HIGH.** Don't soften the calibration to make the report feel friendlier. The whole value of the column is that it discriminates. + +## Followup questions + +After surfacing the table, the operator can ask "why is platform#118 LOW?" — answer from the data already gathered, citing the specific cap conditions that fired. Stay in main session for these followups; that's why this skill has no `agent:` field. From 717ea417c4932dbe09c46cb50a32d19189df8242 Mon Sep 17 00:00:00 2001 From: Original Gary <276612211+OpenGaryBot@users.noreply.github.com> Date: Sat, 25 Apr 2026 22:02:05 +1000 Subject: [PATCH 2/8] =?UTF-8?q?fix(merge):=20add=20Bash(head:*)=20to=20all?= =?UTF-8?q?owed-tools=20=E2=80=94=20CodeRabbit=20feedback?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- claude-code/.claude/skills/merge/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/claude-code/.claude/skills/merge/SKILL.md b/claude-code/.claude/skills/merge/SKILL.md index 28f6741..a1723b7 100644 --- a/claude-code/.claude/skills/merge/SKILL.md +++ b/claude-code/.claude/skills/merge/SKILL.md @@ -3,7 +3,7 @@ name: merge description: Ranked merge queue with honest confidence scoring. Replaces "open 12 PR tabs" with one table. Reads the latest /run report (if <30min old) or derives ready-for-merge state via gh, computes a calibrated HIGH/MED/LOW confidence per PR, emits a markdown table with copy-paste merge commands. Read-only — never merges, never invokes subagents. Operator-only. disable-model-invocation: true argument-hint: "[--risky] [--repo <name>]" -allowed-tools: Bash(gh:*), Bash(ls:*), Bash(date:*), Bash(stat:*), Read, Grep, Glob +allowed-tools: Bash(gh:*), Bash(head:*), Bash(ls:*), Bash(date:*), Bash(stat:*), Read, Grep, Glob model: opus --- From d4d06c040b9dda36bfd220743594952e0a6fa8b4 Mon Sep 17 00:00:00 2001 From: Original Gary <276612211+OpenGaryBot@users.noreply.github.com> Date: Sat, 25 Apr 2026 22:07:49 +1000 Subject: [PATCH 3/8] fix(merge): calibration treats pending operator approval as score event, not cap MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The /merge confidence score is meant to answer "is this PR ready for the operator's decision," not "have you decided yet." Previously, every PR sitting at mergeStateStatus == BLOCKED hit a blanket MED-cap — but for bot-authored PRs that's almost always "review approval pending," which is precisely the decision the operator runs /merge to make. The cap was circular. Changes: - Drop the blanket BLOCKED cap. Replace with finer-grained caps that catch real content concerns: reviewDecision == CHANGES_REQUESTED, mergeStateStatus == BEHIND (rebase needed before any merge command succeeds). - Add explicit note: "BLOCKED waiting for operator approval" is NOT a cap. Score answers content readiness, not approval state. - Update merge command to chained "gh pr review --approve <url> && gh pr merge ..." so a single paste both satisfies branch protection's review-required gate and merges. On already-approved PRs, the first half is a benign re-approval. - Rewrite §4 calibration intent to reflect the new philosophy: a clean bot-authored PR IS allowed to be HIGH; the column distinguishes content concerns, not approval state. Also catches up the canonical repo file with iterations made in the local copy since #33 was opened (algorithm clarifications, gh as source of truth, output format details). Preserves Bash(head:*) in allowed-tools per #717ea41. --- claude-code/.claude/skills/merge/SKILL.md | 80 +++++++++++++++++------ 1 file changed, 61 insertions(+), 19 deletions(-) diff --git a/claude-code/.claude/skills/merge/SKILL.md b/claude-code/.claude/skills/merge/SKILL.md index a1723b7..2a51b61 100644 --- a/claude-code/.claude/skills/merge/SKILL.md +++ b/claude-code/.claude/skills/merge/SKILL.md @@ -1,6 +1,6 @@ --- name: merge -description: Ranked merge queue with honest confidence scoring. Replaces "open 12 PR tabs" with one table. Reads the latest /run report (if <30min old) or derives ready-for-merge state via gh, computes a calibrated HIGH/MED/LOW confidence per PR, emits a markdown table with copy-paste merge commands. Read-only — never merges, never invokes subagents. Operator-only. +description: Ranked merge queue with honest confidence scoring. Replaces "open 12 PR tabs" with one table. Pulls open PRs from gh, filters to `stage:ready-for-merge` (plus all-gates-green PRs without a blocking stage label), runs its own per-PR evaluation, computes calibrated HIGH/MED/LOW confidence, emits a markdown table with clickable PR links and copy-paste merge commands. Read-only — never merges, never invokes subagents. Operator-only. disable-model-invocation: true argument-hint: "[--risky] [--repo <name>]" allowed-tools: Bash(gh:*), Bash(head:*), Bash(ls:*), Bash(date:*), Bash(stat:*), Read, Grep, Glob @@ -9,9 +9,9 @@ model: opus # /merge — ranked merge queue, no auto-merge -Read-only operator HUD. Lists every PR currently at `stage:ready-for-merge` (or all-gates-green with no stage label blocking), ranks by calibrated confidence, emits copy-paste merge commands the operator runs themselves. +Read-only operator HUD. Pulls open PRs directly from gh, filters to `stage:ready-for-merge` (and all-gates-green PRs without a blocking stage label), runs its own per-PR evaluation against the calibration rules below, emits clickable PR links and copy-paste merge commands the operator runs themselves. -`/merge` does NOT merge. It does NOT drive the pipeline (that's `/run`). It does NOT invoke subagents. It reads existing pipeline state and surfaces it. +`/merge` does NOT merge. It does NOT drive the pipeline (that's `/run`). It does NOT invoke subagents. It does NOT depend on the latest `/run` report — gh is the source of truth, and this command does its own eval. Read these every fire (auto-load via `InstructionsLoaded`; cite by name in any followup question): @@ -27,28 +27,57 @@ Read these every fire (auto-load via `InstructionsLoaded`; cite by name in any f ## Algorithm -### 1. Find the input +### 1. Pull candidate PRs from gh (primary path — always run this) + +The canonical input is `gh pr list`, not the run log. Always derive live. + +**Step A — labelled `stage:ready-for-merge` PRs:** + +```bash +gh pr list --repo Open-Paws/<repo> --state open \ + --label 'stage:ready-for-merge' \ + --json number,title,labels,url,statusCheckRollup,reviewDecision,mergeStateStatus,mergeable,headRefName,additions,deletions,changedFiles \ + --limit 100 +``` + +**Step B — all-gates-green sweep** (catches PRs that are functionally ready but missing the label, e.g. label-application drift): + +```bash +gh pr list --repo Open-Paws/<repo> --state open \ + --json number,title,labels,url,statusCheckRollup,reviewDecision,mergeStateStatus,mergeable \ + --limit 200 +``` + +Client-side filter: keep PRs where `statusCheckRollup` is all-SUCCESS AND labels contain none of `stage:plan-in-progress`/`stage:tests-in-progress`/`stage:impl-in-progress`/`stage:fix-needed`/`stage:adversarial-pending`. Tag these as "no label, all green" in the source column so the operator knows they came in via the sweep, not the official label. + +**Repo scope:** + +- `--repo <name>` set → just that repo +- Not set → walk every Open-Paws repo where `OpenGaryBot` has push access. Get the list with `gh repo list Open-Paws --limit 100 --json name,viewerPermission` and keep entries where `viewerPermission` is `WRITE` or `ADMIN`. + +**Deduplicate** between Step A and Step B by `(repo, number)`. + +### 1b. Optional: cross-reference the latest run log (secondary signal only) ```bash LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1) ``` -If `LATEST` exists AND its mtime is within the last 30 minutes: -- Read it. Pull every PR mentioned in `### Advanced` (where `stage to == stage:ready-for-merge`) and `### No action needed` (where row contains `stage:ready-for-merge`). -- Also pull every PR from `### Stopped at human gate` whose `needs:` reason is "human approval (non-last-pusher branch protection)" — those are gate-clear from the bot's side. +If `LATEST` exists AND its mtime is within the last 30 minutes, read it ONLY to harvest two specific signals that are otherwise expensive to recompute live: + +- Adversarial subagent verdicts mentioned per-PR (cheap signal for the LOW caps below) +- Decision-conflict flags raised by the planner / plan-reviewer -Otherwise (no recent log, or `--repo` set forcing fresh derivation): -- Derive directly via `gh pr list --repo Open-Paws/<repo> --state open --label 'stage:ready-for-merge' --json number,title,labels,url,statusCheckRollup,reviewDecision,headRefName,additions,deletions,changedFiles --limit 100` -- Plus a sweep of all-gates-green PRs without a blocking stage label: `gh pr list --repo Open-Paws/<repo> --state open --json number,title,labels,url,statusCheckRollup,reviewDecision --limit 200` then filter client-side for `statusCheckRollup` all-SUCCESS and labels containing none of `stage:plan-in-progress`/`stage:tests-in-progress`/`stage:impl-in-progress`/`stage:fix-needed`/`stage:adversarial-pending`. +Do NOT use the run log to determine which PRs are candidates. Do NOT inherit the run log's HIGH/MED/LOW classification — recompute from scratch against the rules below. The run log is a hint store; gh is the source of truth. -If `--repo` not set, walk all Open-Paws repos for which OpenGaryBot has push access. +If `LATEST` is stale or missing, skip this step entirely. The eval still works; you just lose the prior-context shortcut. ### 2. Per-PR signals to gather For each candidate PR, fetch: ```bash -gh pr view <repo>/<num> --json number,title,labels,url,statusCheckRollup,reviewDecision,headRefName,additions,deletions,changedFiles,files,comments,body +gh pr view <repo>/<num> --json number,title,labels,url,statusCheckRollup,reviewDecision,mergeStateStatus,mergeable,headRefName,additions,deletions,changedFiles,files,comments,body ``` Plus, if the PR touches a deployed surface, look for "live HTTP check" evidence in PR comments — search for `gh pr view --comments` output containing patterns like `curl -i`, `health check`, `200 OK`, `live check passed`. If none, the deployed-surface MED-cap fires. @@ -66,9 +95,14 @@ Plus, if the PR touches a deployed surface, look for "live HTTP check" evidence - Any test in the PR was classified as `skip:flaky` by test-reviewer (look for that label OR comment from `test-reviewer` mentioning the classification) - Time since `stage:verified` was applied (or last `verifier` completion comment) exceeds 24h (stale verification) - PR touches a UI surface (`*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `app/`, `pages/`, `components/`) AND no `persona-qa` completion comment is present on the PR +- `reviewDecision == CHANGES_REQUESTED` (a reviewer — CodeRabbit or human — flagged a change still unaddressed; merging now ignores that feedback) +- `mergeStateStatus == BEHIND` (head ref out of date; merge command will bounce until the branch is rebased onto base — content may be clean but the operation can't complete) + +**Important: "BLOCKED waiting for operator approval" is NOT a cap.** When `mergeStateStatus == BLOCKED` and the only reason is "review approval pending" (i.e. `reviewDecision == REVIEW_REQUIRED` with no CHANGES_REQUESTED, branch is up-to-date, all checks green, no other cap fires), the PR is HIGH-eligible. The operator runs `/merge` *to decide which PR to approve* — circular-flagging "you haven't approved this yet" as a content concern makes the score useless. The score answers "is this ready for your decision," not "have you decided yet." If the merge command bounces because branch protection wants the operator's approval first, that's the expected workflow, not a calibration failure — the operator runs `gh pr review --approve <url>` immediately before the emitted `gh pr merge` command. **A PR is LOW if any of the following are true:** +- `mergeable == CONFLICTING` or `mergeStateStatus == DIRTY` (actual merge conflict — needs manual rebase + conflict resolution before any merge command will work, regardless of content quality) - Multiple of the MED-cap conditions above stack (count ≥ 2) - The `adversarial` subagent flagged anything as `needs-human-review` (look for that phrase in adversarial completion comment) - PR conflicts with a closed decision in `$OP_CONTEXT_REPO/decisions.md` (cross-reference any decision-conflict entry from the run log; on direct derivation, scan PR body + diff for keywords against `$OP_CONTEXT_REPO/decisions.md` headings) @@ -78,7 +112,11 @@ Plus, if the PR touches a deployed surface, look for "live HTTP check" evidence ### 4. Calibration intent (this is why the score is useful) -If the first day of running this surfaces 80% HIGH, the calibration is too lax — the operator stops reading the column because everything looks identical. The MED-cap conditions exist specifically to push the median to MED, not HIGH. Most PRs in flight will hit at least one MED-cap (deployed-surface-no-live-check, or UI-no-persona-qa, or stale-verifier). When the operator sees HIGH, it should mean "this one really is clean — the column actually distinguishes". +The score answers ONE question: *is the content of this PR ready for the operator's decision?* It does not answer "has the operator decided yet." Pending review approval is the operator's job to provide; flagging "you haven't approved this" as a content concern makes the score circular and useless. + +A clean bot-authored PR — CI green, no CodeRabbit issues, adversarial cleared, no merge conflict, no stale verifier, no UI-without-persona-qa, no deploy-without-live-check — is HIGH. The merge command bouncing on "review approval pending" is not a calibration failure, it's the expected workflow: operator reviews HIGH PRs, approves, merges. + +The MED-cap conditions exist to surface real content concerns: reviewer flagged unaddressed changes, deployed surface lacks a live HTTP check, UI lacks persona-qa, override label bypassed adversarial. When the operator sees MED, it means "look at this — there's something the pipeline noticed but couldn't resolve." When they see HIGH, it means "the pipeline thinks this is clean; your call." If you find yourself reasoning "this PR is fine, let's call it HIGH despite a MED-cap firing" — STOP. The score is mechanical. Don't soften it. Surface the cap reason in the "Top reason not HIGH" column and let the operator decide. @@ -87,16 +125,18 @@ If you find yourself reasoning "this PR is fine, let's call it HIGH despite a ME ```markdown ## /merge queue — <ISO timestamp> — N PRs ready -| Conf | Repo | PR | Title | Top reason not HIGH | Merge command | -|------|------|----|----|---------------------|---------------| -| HIGH | <repo> | #<num> | <title 60ch> | — | gh pr merge --squash --delete-branch <url> | -| MED | <repo> | #<num> | <title 60ch> | <one-line reason> | gh pr merge --squash --delete-branch <url> | -| LOW | <repo> | #<num> | <title 60ch> | <one-line reason> | gh pr merge --squash --delete-branch <url> | +| Conf | Repo | PR | Title | Top reason not HIGH | Approve+merge command | +|------|------|----|----|---------------------|-----------------------| +| HIGH | <repo> | [#<num>](<url>) | <title 60ch> | — | gh pr review --approve <url> && gh pr merge --squash --delete-branch <url> | +| MED | <repo> | [#<num>](<url>) | <title 60ch> | <one-line reason> | gh pr review --approve <url> && gh pr merge --squash --delete-branch <url> | +| LOW | <repo> | [#<num>](<url>) | <title 60ch> | <one-line reason> | gh pr review --approve <url> && gh pr merge --squash --delete-branch <url> | Distribution: HIGH: N | MED: N | LOW: N -Source: <log path> (mtime: <H>m ago) OR Source: live gh derivation (no recent run log) +Source: live gh derivation (<N> repos walked, <M> PRs evaluated). Run-log cross-reference: <used / skipped — stale / skipped — none>. ``` +**PR column always renders as `[#<num>](<url>)`, never bare `#<num>`** — clickable links per the standing operator-output rule. Same goes for any followup answer that names a PR. + Default sort: descending — HIGH first, MED second, LOW last. With `--risky`: ascending — LOW first. Title truncation: hard 60-char limit. Suffix with `…` if truncated. Don't try to be clever; truncation is fine because the merge command has the full URL. @@ -107,6 +147,8 @@ If only one cap fires for a MED, write the one cap. If multiple cap conditions f Merge command always uses `--squash --delete-branch` per `pipeline-reference.md` STAGE 14. +The chained `gh pr review --approve <url> && gh pr merge ...` lets the operator paste once. If they've already approved the PR via the UI, the first half is a benign re-approval; if they haven't, it satisfies branch protection's "review required" gate immediately before the merge attempt. Caveat: on a MED with `reviewDecision == CHANGES_REQUESTED`, the operator pasting this is explicitly overriding the reviewer's flagged concerns — the cap reason is in column 5 to make that visible. If they want to handle the change first, they don't paste; that's the whole point of MED. + ## Hard rules - **Never actually merge.** `/merge` is read-only. It produces copy-paste commands the operator runs. From 2fa857a6756c5d611b6025f235fe8fe3718376cd Mon Sep 17 00:00:00 2001 From: Original Gary <276612211+OpenGaryBot@users.noreply.github.com> Date: Sat, 25 Apr 2026 22:38:24 +1000 Subject: [PATCH 4/8] fix(merge): workflow-only deploy PRs don't trip the live-HTTP-check cap MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The deploy-surface MED-cap currently false-positives on PRs that add or modify a deploy workflow file (.github/workflows/deploy*.yml, Dockerfile) without touching any application code. Adding the deploy mechanism itself doesn't deploy anything until the next code-bearing PR merges, so a live HTTP check pre-merge is structurally impossible (chicken-and-egg) and the cap is misapplied. Carve-out: if the only deploy-surface files in the PR are the deploy mechanism files themselves AND no application code is touched (no src/, app/, lib/, pages/, components/, etc. outside tests/), the cap does NOT fire. The next PR that ships actual code through this deploy path will correctly trip the cap if it lacks live verification. Concrete example caught in the queue: slingshot-uk-phase1#324 adds .github/workflows/deploy-pipeline.yml only, no service code, was being flagged MED for "no live HTTP check" — impossible to satisfy pre-merge. Under the refined cap it correctly evaluates to HIGH. --- claude-code/.claude/skills/merge/SKILL.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/claude-code/.claude/skills/merge/SKILL.md b/claude-code/.claude/skills/merge/SKILL.md index 2a51b61..0fe69c2 100644 --- a/claude-code/.claude/skills/merge/SKILL.md +++ b/claude-code/.claude/skills/merge/SKILL.md @@ -88,8 +88,9 @@ Plus, if the PR touches a deployed surface, look for "live HTTP check" evidence - Label `override:skip-adversarial` is present - Label `override:allow-score-drop` is present -- PR touches a deployed service (Cloud Run, Vercel project, Edge Function, etc.) AND no live HTTP check post-build is recorded in PR comments +- PR deploys application code AND no live HTTP check post-build is recorded in PR comments - Detect deployed service via files matched: `Dockerfile`, `cloudrun*.yaml`, `vercel.json`, `supabase/functions/`, `.github/workflows/deploy*.yml`, etc. + - **Workflow-only carve-out:** If the ONLY deploy-surface files in the PR are themselves the deploy mechanism (e.g., the PR adds or modifies `.github/workflows/deploy*.yml` or a `Dockerfile`) AND no application code files are touched (no `src/`, `app/`, `lib/`, `pages/`, `components/`, `*.ts`/`*.tsx`/`*.py`/`*.go` etc. outside `tests/`), the cap does NOT fire. Adding the deploy mechanism doesn't deploy anything until the next code change merges. Live HTTP checks become relevant when the PR after this one ships actual code. - Files touched include any path tagged `sensitivity:private` per `context-repo.md` - CodeRabbit found anything above informational severity (parse PR comments for `**Issue:**`/`**Refactor:**`/`**Bug:**`/`**Note:**` blocks; the `chill` profile uses `Note:` for informational, so anything else is above-informational) - Any test in the PR was classified as `skip:flaky` by test-reviewer (look for that label OR comment from `test-reviewer` mentioning the classification) From bb59fd90d18b2ae4eb37a502a60f4e75bcd7d0a5 Mon Sep 17 00:00:00 2001 From: Original Gary <276612211+OpenGaryBot@users.noreply.github.com> Date: Sun, 26 Apr 2026 13:17:46 +1000 Subject: [PATCH 5/8] =?UTF-8?q?feat(skills):=20/merge=20=E2=80=94=20cap=20?= =?UTF-8?q?BLOCKED=20PRs=20at=20MED=20to=20flag=20branch=20protection=20fr?= =?UTF-8?q?iction?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes #36 --- claude-code/.claude/skills/merge/SKILL.md | 1 + 1 file changed, 1 insertion(+) diff --git a/claude-code/.claude/skills/merge/SKILL.md b/claude-code/.claude/skills/merge/SKILL.md index 0fe69c2..3603c31 100644 --- a/claude-code/.claude/skills/merge/SKILL.md +++ b/claude-code/.claude/skills/merge/SKILL.md @@ -98,6 +98,7 @@ Plus, if the PR touches a deployed surface, look for "live HTTP check" evidence - PR touches a UI surface (`*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `app/`, `pages/`, `components/`) AND no `persona-qa` completion comment is present on the PR - `reviewDecision == CHANGES_REQUESTED` (a reviewer — CodeRabbit or human — flagged a change still unaddressed; merging now ignores that feedback) - `mergeStateStatus == BEHIND` (head ref out of date; merge command will bounce until the branch is rebased onto base — content may be clean but the operation can't complete) +- `mergeStateStatus == BLOCKED` (branch protection blocks direct merge — content may be clean, but a required check is missing, the branch needs an admin override, or branch-protection criteria aren't met). See the carve-out below for the approval-pending sub-case. **Important: "BLOCKED waiting for operator approval" is NOT a cap.** When `mergeStateStatus == BLOCKED` and the only reason is "review approval pending" (i.e. `reviewDecision == REVIEW_REQUIRED` with no CHANGES_REQUESTED, branch is up-to-date, all checks green, no other cap fires), the PR is HIGH-eligible. The operator runs `/merge` *to decide which PR to approve* — circular-flagging "you haven't approved this yet" as a content concern makes the score useless. The score answers "is this ready for your decision," not "have you decided yet." If the merge command bounces because branch protection wants the operator's approval first, that's the expected workflow, not a calibration failure — the operator runs `gh pr review --approve <url>` immediately before the emitted `gh pr merge` command. From 0994033a396bb0fc8561d4651193abe9ab11ce36 Mon Sep 17 00:00:00 2001 From: Original Gary <276612211+OpenGaryBot@users.noreply.github.com> Date: Sun, 26 Apr 2026 13:44:30 +1000 Subject: [PATCH 6/8] =?UTF-8?q?fix(skills):=20/merge=20=E2=80=94=20drop=20?= =?UTF-8?q?BLOCKED+REVIEW=5FREQUIRED=20HIGH-eligibility=20carve-out=20per?= =?UTF-8?q?=20CodeRabbit?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The carve-out contradicted the MED-cap rule the prior commit added. CodeRabbit (review #PRR_kwDORVDiQs747-60) flagged it as Major. Operator sided with CodeRabbit's reading. Carve-out deleted; MED-cap for BLOCKED is now mechanical with no exceptions. Refs #36 --- claude-code/.claude/skills/merge/SKILL.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/claude-code/.claude/skills/merge/SKILL.md b/claude-code/.claude/skills/merge/SKILL.md index 3603c31..2f03dc1 100644 --- a/claude-code/.claude/skills/merge/SKILL.md +++ b/claude-code/.claude/skills/merge/SKILL.md @@ -98,9 +98,7 @@ Plus, if the PR touches a deployed surface, look for "live HTTP check" evidence - PR touches a UI surface (`*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `app/`, `pages/`, `components/`) AND no `persona-qa` completion comment is present on the PR - `reviewDecision == CHANGES_REQUESTED` (a reviewer — CodeRabbit or human — flagged a change still unaddressed; merging now ignores that feedback) - `mergeStateStatus == BEHIND` (head ref out of date; merge command will bounce until the branch is rebased onto base — content may be clean but the operation can't complete) -- `mergeStateStatus == BLOCKED` (branch protection blocks direct merge — content may be clean, but a required check is missing, the branch needs an admin override, or branch-protection criteria aren't met). See the carve-out below for the approval-pending sub-case. - -**Important: "BLOCKED waiting for operator approval" is NOT a cap.** When `mergeStateStatus == BLOCKED` and the only reason is "review approval pending" (i.e. `reviewDecision == REVIEW_REQUIRED` with no CHANGES_REQUESTED, branch is up-to-date, all checks green, no other cap fires), the PR is HIGH-eligible. The operator runs `/merge` *to decide which PR to approve* — circular-flagging "you haven't approved this yet" as a content concern makes the score useless. The score answers "is this ready for your decision," not "have you decided yet." If the merge command bounces because branch protection wants the operator's approval first, that's the expected workflow, not a calibration failure — the operator runs `gh pr review --approve <url>` immediately before the emitted `gh pr merge` command. +- `mergeStateStatus == BLOCKED` (branch protection blocks direct merge — content may be clean, but a required check is missing, the branch needs an admin override, or branch-protection criteria aren't met). **A PR is LOW if any of the following are true:** From bce80b426131e54169ce840327b80d1a6cf7c3b5 Mon Sep 17 00:00:00 2001 From: Original Gary <276612211+OpenGaryBot@users.noreply.github.com> Date: Sun, 26 Apr 2026 13:57:42 +1000 Subject: [PATCH 7/8] =?UTF-8?q?fix(skills):=20/merge=20=E2=80=94=20align?= =?UTF-8?q?=20lines=20115-118=20narrative=20with=20no-exceptions=20BLOCKED?= =?UTF-8?q?=20MED-cap?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Prior commit 0994033 deleted the structural line-102 carve-out but left the same conceptual contradiction in narrative form at 115-118. CodeRabbit re-flagged on re-review. Cleaning up the drift. Refs #36 --- claude-code/.claude/skills/merge/SKILL.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/claude-code/.claude/skills/merge/SKILL.md b/claude-code/.claude/skills/merge/SKILL.md index 2f03dc1..f8a0f50 100644 --- a/claude-code/.claude/skills/merge/SKILL.md +++ b/claude-code/.claude/skills/merge/SKILL.md @@ -112,9 +112,9 @@ Plus, if the PR touches a deployed surface, look for "live HTTP check" evidence ### 4. Calibration intent (this is why the score is useful) -The score answers ONE question: *is the content of this PR ready for the operator's decision?* It does not answer "has the operator decided yet." Pending review approval is the operator's job to provide; flagging "you haven't approved this" as a content concern makes the score circular and useless. +The score answers ONE question: *is this PR safe to merge right now with no further pipeline action?* If a required check is missing, branch protection is unsatisfied, or the merge command will bounce, the answer is no — and that drops confidence by design. The score is mechanical, not aspirational. -A clean bot-authored PR — CI green, no CodeRabbit issues, adversarial cleared, no merge conflict, no stale verifier, no UI-without-persona-qa, no deploy-without-live-check — is HIGH. The merge command bouncing on "review approval pending" is not a calibration failure, it's the expected workflow: operator reviews HIGH PRs, approves, merges. +A clean bot-authored PR — CI green, no CodeRabbit issues, adversarial cleared, no merge conflict, no stale verifier, no UI-without-persona-qa, no deploy-without-live-check, AND `mergeStateStatus` is mergeable (not BLOCKED, BEHIND, or DIRTY) — is HIGH. If `mergeStateStatus == BLOCKED` for any reason — missing required check, missing required approval, branch-protection criteria unmet — that's a MED cap, no exceptions. The cap fires on the mechanical state regardless of why it's blocked. The MED-cap conditions exist to surface real content concerns: reviewer flagged unaddressed changes, deployed surface lacks a live HTTP check, UI lacks persona-qa, override label bypassed adversarial. When the operator sees MED, it means "look at this — there's something the pipeline noticed but couldn't resolve." When they see HIGH, it means "the pipeline thinks this is clean; your call." From 5f7788fb9d26ecf8ada2f901d8419735be8b8b54 Mon Sep 17 00:00:00 2001 From: Original Gary <276612211+OpenGaryBot@users.noreply.github.com> Date: Sun, 26 Apr 2026 14:24:53 +1000 Subject: [PATCH 8/8] fix(merge): newest-of-both fixture vs orch log selection MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CodeRabbit asked for fixture-first precedence on the optional run-log secondary signal. Apply the same newest-of-both pattern just landed in unblock — a stale fixture must not shadow a fresh real run. Fixture pattern: run-*-fixture.md. Real pattern: run-*.md (with -fixture.md excluded). Pick whichever is newer; fall back to whichever exists alone. --- claude-code/.claude/skills/merge/SKILL.md | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/claude-code/.claude/skills/merge/SKILL.md b/claude-code/.claude/skills/merge/SKILL.md index f8a0f50..a76d132 100644 --- a/claude-code/.claude/skills/merge/SKILL.md +++ b/claude-code/.claude/skills/merge/SKILL.md @@ -60,9 +60,21 @@ Client-side filter: keep PRs where `statusCheckRollup` is all-SUCCESS AND labels ### 1b. Optional: cross-reference the latest run log (secondary signal only) ```bash -LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1) +FIXTURE_LATEST=$(ls -t ~/.claude/orchestrator-log/run-*-fixture.md 2>/dev/null | head -1) +ORCH_LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null | grep -v -- '-fixture\.md$' | head -1) +if [ -z "$FIXTURE_LATEST" ]; then + LATEST="$ORCH_LATEST" +elif [ -z "$ORCH_LATEST" ]; then + LATEST="$FIXTURE_LATEST" +elif [ "$FIXTURE_LATEST" -nt "$ORCH_LATEST" ]; then + LATEST="$FIXTURE_LATEST" +else + LATEST="$ORCH_LATEST" +fi ``` +Newest-of-both, not fixture-first. A stale fixture must not shadow a fresh real run report. + If `LATEST` exists AND its mtime is within the last 30 minutes, read it ONLY to harvest two specific signals that are otherwise expensive to recompute live: - Adversarial subagent verdicts mentioned per-PR (cheap signal for the LOW caps below)