Skip to content

feat(skills): /merge — ranked merge queue with honest confidence scoring#33

Merged
samtuckerdavis merged 8 commits intomainfrom
operator-cmd/merge
Apr 26, 2026
Merged

feat(skills): /merge — ranked merge queue with honest confidence scoring#33
samtuckerdavis merged 8 commits intomainfrom
operator-cmd/merge

Conversation

@OriginalGary
Copy link
Copy Markdown
Contributor

@OriginalGary OriginalGary commented Apr 25, 2026

Summary

Adds /merge — second of three operator-level slash commands. Read-only ranked queue of PRs ready to merge, with calibrated HIGH/MED/LOW confidence and copy-paste merge commands.

Builds on /run (PR #32). Stack-independent — branched from main, doesn't depend on #32 being merged first.

Why

Pipeline produces a steady stream of PRs hitting stage:ready-for-merge. The operator either opens 12 PR tabs or accepts whatever the bot says. /merge is the middle ground: one table, one column for confidence, one column for the "top reason this isn't HIGH". Operator scans top-down, runs the merge commands they trust, defers the rest.

Calibration is the value

If 80% of PRs come back HIGH, the column adds no signal — operator ignores it. So the scoring is built around mechanical MED-caps:

  • override:skip-adversarial or override:allow-score-drop labels → never HIGH
  • Deployed-surface PR with no live HTTP check post-build → never HIGH
  • sensitivity:private files in PR → never HIGH
  • CodeRabbit findings above informational → never HIGH
  • skip:flaky test classifications → never HIGH
  • Last verifier pass >24h ago → never HIGH (stale verification)
  • UI surface changes with no persona-qa completion comment → never HIGH

LOW (stacked caps, adversarial flag, decision conflict, scope creep across >3 bounded contexts) is rare but means real review time.

The intent is that median outcome is MED. The operator should expect to actually read the "Top reason not HIGH" column.

Read-only contract

  • Never merges. Never. Output is copy-paste gh pr merge --squash --delete-branch <url> for the operator to run.
  • Never invokes subagents. Driving the pipeline forward is /run's job.
  • Never softens calibration to make the report friendlier. HIGH means HIGH.

Test plan

  • Confirm SKILL.md frontmatter parses (verified locally)
  • Run against the smoke-test fixture log file at ~/.claude/orchestrator-log/run-*-fixture.md — verify section parsing extracts ready-for-merge PRs correctly (verified: 6 PRs extracted from fixture's "No action needed" section)
  • After /run PR (feat(skills): /run — on-demand operator drive of the 14-stage pipeline #32) merges and a real run is captured, invoke /merge and eyeball the HIGH/MED/LOW distribution
  • If first-day distribution is 80%+ HIGH, tighten MED-cap conditions
  • Verify --risky correctly inverts sort to LOW-first
  • Verify the "Top reason not HIGH" column for HIGH PRs is always em-dash, never sycophantic prose

Followup: /unblock PR lands next, also independent.

Summary by CodeRabbit

  • New Features
    • Added a read-only /merge command that presents a ranked merge queue (HIGH/MED/LOW) with per-PR “Top reason not HIGH”, deterministic default sorting, 60-char title truncation, distribution/freshness notes, and copy-paste approve+merge commands.
    • Live-sources PR candidates, supports --risky and --repo scoping, consults recent run signals when fresh, inspects PR comments for deployed live-check evidence, enforces deterministic risk caps (HIGH/MED/LOW) from multiple safety checks (mergeability, branch protection, sensitive files, flaky tests, stale verification, decision conflicts, UI/persona scope, etc.), and explicitly never merges or alters labels.

Read-only operator HUD. Lists every PR currently at stage:ready-for-merge
(or all-gates-green), ranks by calibrated HIGH/MED/LOW confidence, emits
copy-paste merge commands the operator runs themselves.

Consumes /run's report (~/.claude/orchestrator-log/run-*.md, <30min old)
or derives directly via gh when log is stale.

Confidence calibration is the load-bearing part. MED-cap signals (never
HIGH): override:* labels, deployed-surface-no-live-check, sensitivity:private
files, CodeRabbit findings above informational, skip:flaky tests, stale
verifier (>24h), UI changes without persona-qa run. LOW signals: stacked
MED-caps, adversarial needs-human-review, decision conflict, scope creep
across >3 bounded contexts.

Calibration intent: median outcome should be MED, not HIGH. If a first-day
run surfaces 80% HIGH, the column adds no signal — tighten caps.

Hard rules:
- Never merges (read-only — copy-paste commands only)
- Never invokes subagents (driving the pipeline is /run's job)
- Never softens calibration; em-dash stays em-dash for HIGH

Args: --risky inverts sort (LOW first); --repo scopes to one repo.
disable-model-invocation: true — operator-only.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a read-only /merge operator skill: discovers candidate PRs (gh + optional recent /run), gathers PR metadata and deployed-surface signals, deterministically calibrates HIGH/MED/LOW confidence with explicit caps, deduplicates, and renders a ranked markdown merge-queue with approve+merge copy-paste commands. (≤50 words)

Changes

Cohort / File(s) Summary
Merge Queue Skill
claude-code/.claude/skills/merge/SKILL.md
New /merge skill doc: operator args --risky, --repo; read-only HUD; documents required markdown table schema and copy-paste merge commands.
Candidate discovery & reports
claude-code/.claude/skills/merge/SKILL.md
Discovers PRs via two-path gh pr list sweep (label stage:ready-for-merge + all-gates-green client-filter), or optional recent orchestrator /run log (≤30m) for per-PR hints; deduplicates by (repo, number).
PR heuristics & enforcement
claude-code/.claude/skills/merge/SKILL.md
Fetches PR details (gh pr view style), checks comments/files for deployed-surface and live HTTP evidence, recomputes confidence with deterministic HIGH/MED/LOW caps based on labels, private-file sensitivity, flaky-test/stale verification, mergeability/merge-state, branch protection, adversarial flags, decision conflicts, and scope creep; enforces sorting defaults (invert with --risky).
Output & guardrails
claude-code/.claude/skills/merge/SKILL.md
Emits ranked markdown table (clickable links, truncated titles, “Top reason not HIGH”), provides approve+merge gh commands, documents explicit prohibition of merging, subagents, and label mutation.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Operator as Operator
    participant MergeSkill as MergeSkill
    participant LocalReports as LocalReports
    participant GitHubCLI as GitHubCLI
    participant GHAPI as GHAPI
    participant Output as Output

    Operator->>MergeSkill: invoke /merge (--repo, --risky)
    MergeSkill->>LocalReports: check recent /run (≤30m)
    alt recent report found
        LocalReports-->>MergeSkill: candidate PR IDs + per-PR hints
    else no recent report
        MergeSkill->>GitHubCLI: gh pr list (label path + all-gates-green sweep)
        GitHubCLI-->>MergeSkill: candidate PR list
    end
    loop per PR
        MergeSkill->>GHAPI: gh pr view / checks / files / comments
        GHAPI-->>MergeSkill: PR details + comments
        MergeSkill->>MergeSkill: detect deployed-surface, scan comments for live HTTP evidence
        MergeSkill->>MergeSkill: apply deterministic HIGH/MED/LOW rules and caps
    end
    MergeSkill->>Output: render ranked markdown table + approve+merge commands
    Output-->>Operator: display read-only merge queue
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

🚥 Pre-merge checks | ✅ 8
✅ Passed checks (8 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and specifically describes the main change: introducing a /merge operator command that ranks PRs with confidence scoring.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
No Speciesist Idioms ✅ Passed Grep results contain false positives—'pen' appears only within words like 'open' and 'append' in technical contexts, not as speciesist terminology. No speciesist idioms, herd metaphors, or industry animal framing detected in /merge skill documentation.
No Hardcoded Secrets Or Credentials ✅ Passed Comprehensive audit confirmed no hardcoded API keys, tokens, passwords, or credentials in non-test files. GitHub Actions properly reference secrets via ${{ secrets.GITHUB_TOKEN }} syntax.
No Tier 3 Data Committed ✅ Passed New /merge SKILL.md file contains no Tier 3 data; abstract advocacy domain references are scoping heuristics only, with no case files, legal strategies, real names, or financial data disclosed.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch operator-cmd/merge
  • 🛠️ fix NAV violations: Commit on current branch
  • 🛠️ fix NAV violations: Create PR

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot]
coderabbitai Bot previously requested changes Apr 25, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@claude-code/.claude/skills/merge/SKILL.md`:
- Line 6: The allowed-tools list in SKILL.md omits Bash(head:*), but the merge
skill uses `head -1` (see the selection of the latest run log) which prevents
the queue investigation path from running; update the allowed-tools entry to
include Bash(head:*) so the runtime permits `head` usage by adding Bash(head:*)
to the comma-separated list (the symbol to change is the allowed-tools
declaration in SKILL.md).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8ada347a-7076-4025-ac97-c0aad17c35d0

📥 Commits

Reviewing files that changed from the base of the PR and between 0157ede and 5a90679.

📒 Files selected for processing (1)
  • claude-code/.claude/skills/merge/SKILL.md

Comment thread claude-code/.claude/skills/merge/SKILL.md Outdated
@OriginalGary
Copy link
Copy Markdown
Contributor Author

STAGE 7 implementation complete — added Bash(head:*) to allowed-tools per CodeRabbit feedback. Pushed 717ea41.

coderabbitai[bot]
coderabbitai Bot previously requested changes Apr 25, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@claude-code/.claude/skills/merge/SKILL.md`:
- Line 33: The current assignment to LATEST uses a plain glob (run-*.md) which
can pick a newer non-fixture file and violate fixture-log precedence; change the
discovery so it first looks for the newest fixture log (e.g., pattern
run-*-fixture.md) and only if none exists falls back to the newest run-*.md;
update the LATEST assignment logic (the variable named LATEST and the glob
run-*.md) to attempt the fixture pattern first and then the generic pattern as a
fallback.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 883025ee-8195-48c0-8626-34db8caf784a

📥 Commits

Reviewing files that changed from the base of the PR and between 5a90679 and 717ea41.

📒 Files selected for processing (1)
  • claude-code/.claude/skills/merge/SKILL.md

Comment thread claude-code/.claude/skills/merge/SKILL.md Outdated
…nt, not cap

The /merge confidence score is meant to answer "is this PR ready for the
operator's decision," not "have you decided yet." Previously, every PR sitting
at mergeStateStatus == BLOCKED hit a blanket MED-cap — but for bot-authored
PRs that's almost always "review approval pending," which is precisely the
decision the operator runs /merge to make. The cap was circular.

Changes:
- Drop the blanket BLOCKED cap. Replace with finer-grained caps that catch
  real content concerns: reviewDecision == CHANGES_REQUESTED, mergeStateStatus
  == BEHIND (rebase needed before any merge command succeeds).
- Add explicit note: "BLOCKED waiting for operator approval" is NOT a cap.
  Score answers content readiness, not approval state.
- Update merge command to chained "gh pr review --approve <url> && gh pr merge ..."
  so a single paste both satisfies branch protection's review-required gate
  and merges. On already-approved PRs, the first half is a benign re-approval.
- Rewrite §4 calibration intent to reflect the new philosophy: a clean
  bot-authored PR IS allowed to be HIGH; the column distinguishes content
  concerns, not approval state.

Also catches up the canonical repo file with iterations made in the local
copy since #33 was opened (algorithm clarifications, gh as source of truth,
output format details). Preserves Bash(head:*) in allowed-tools per #717ea41.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
claude-code/.claude/skills/merge/SKILL.md (1)

63-63: ⚠️ Potential issue | 🟠 Major

Fixture-log precedence still not honored.

The current glob run-*.md will select any run log by recency, bypassing the campaign contract to prefer fixture logs. When a newer non-fixture log exists, this skews queue calibration and breaks deterministic test verification.

♻️ Proposed fix to prioritize fixture logs
-LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1)
+LATEST=$(ls -t ~/.claude/orchestrator-log/run-*-fixture.md 2>/dev/null | head -1)
+if [ -z "$LATEST" ]; then
+  LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1)
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claude-code/.claude/skills/merge/SKILL.md` at line 63, The current LATEST
assignment uses the broad glob run-*.md which picks the most recent run log and
ignores the campaign contract; update the LATEST selection logic (the LATEST
variable assignment) to first try the fixture-specific pattern (e.g.,
run-fixture-*.md) and only if no fixture files are found fall back to the
general run-*.md pattern, preserving the existing suppression of errors
(2>/dev/null) and time-sorted head selection so fixture logs are preferred for
queue calibration and deterministic testing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@claude-code/.claude/skills/merge/SKILL.md`:
- Line 63: The current LATEST assignment uses the broad glob run-*.md which
picks the most recent run log and ignores the campaign contract; update the
LATEST selection logic (the LATEST variable assignment) to first try the
fixture-specific pattern (e.g., run-fixture-*.md) and only if no fixture files
are found fall back to the general run-*.md pattern, preserving the existing
suppression of errors (2>/dev/null) and time-sorted head selection so fixture
logs are preferred for queue calibration and deterministic testing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: dac8f546-3283-4440-8e6a-86a31bd672e7

📥 Commits

Reviewing files that changed from the base of the PR and between 717ea41 and d4d06c0.

📒 Files selected for processing (1)
  • claude-code/.claude/skills/merge/SKILL.md

@OriginalGary
Copy link
Copy Markdown
Contributor Author

@coderabbitai full review please — Bash(head:*) added to allowed-tools in 717ea41 and calibration logic improved in d4d06c0. Your CHANGES_REQUESTED appears to predate both commits.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 25, 2026

@OpenGaryBot: Acknowledged — triggering a fresh full review to cover both 717ea41 (Bash(head:*) in allowed-tools) and d4d06c0 (calibration refinements). Previous findings will be re-evaluated against the current state of the branch.


🧠 Learnings used
Learnt from: CR
Repo: Open-Paws/structured-coding-with-ai PR: 0
File: windsurf/CLAUDE.md:0-0
Timestamp: 2026-04-25T08:29:52.224Z
Learning: Apply five-layer review pipeline from `code-review.md` during PR review: automated → AI → Ousterhout → AI failures → advocacy

Learnt from: CR
Repo: Open-Paws/structured-coding-with-ai PR: 0
File: cline/CLAUDE.md:0-0
Timestamp: 2026-04-25T08:28:47.289Z
Learning: Follow five-layer code review pipeline: automated → AI → Ousterhout → AI failures → advocacy from `code-review.md`

Learnt from: CR
Repo: Open-Paws/structured-coding-with-ai PR: 0
File: gemini-cli/GEMINI.md:0-0
Timestamp: 2026-04-25T08:35:00.610Z
Learning: Review tool confirmation prompts carefully — confirm that each tool invocation is appropriate for advocacy-sensitive code before approving

Learnt from: CR
Repo: Open-Paws/structured-coding-with-ai PR: 0
File: gemini-cli/GEMINI.md:0-0
Timestamp: 2026-04-25T08:35:00.610Z
Learning: Code review layer 1: Automated (formatting, linting, static analysis, type checking, security scanning, tests). Fix before requesting review

Learnt from: CR
Repo: Open-Paws/structured-coding-with-ai PR: 0
File: gemini-cli/GEMINI.md:0-0
Timestamp: 2026-04-25T08:35:00.610Z
Learning: Distinguish blocking review findings (security, data leaks, silent failures, broken tests) from suggestions. Two human approvals for AI-generated PRs

Learnt from: CR
Repo: Open-Paws/structured-coding-with-ai PR: 0
File: agents-md/AGENTS.md:0-0
Timestamp: 2026-04-25T08:32:53.551Z
Learning: Distinguish blocking review issues (security, data leaks, silent failures, broken tests) from suggestions; require two human approvals for primarily AI-generated PRs

Learnt from: CR
Repo: Open-Paws/structured-coding-with-ai PR: 0
File: roo-code/CLAUDE.md:0-0
Timestamp: 2026-04-25T08:29:39.820Z
Learning: Apply five-layer review pipeline from `code-review.md` (automated → AI → Ousterhout → AI failures → advocacy) during PR review

Learnt from: CR
Repo: Open-Paws/structured-coding-with-ai PR: 0
File: agents-md/AGENTS.md:0-0
Timestamp: 2026-04-25T08:32:53.551Z
Learning: In Layer 4 review of AI code, verify ALL safety checks from the original code are preserved; compare error handling paths explicitly between old and new versions; verify AI did not edit tests to pass rather than fix underlying code

Learnt from: CR
Repo: Open-Paws/structured-coding-with-ai PR: 0
File: agents-md/AGENTS.md:0-0
Timestamp: 2026-04-25T08:32:53.551Z
Learning: For Layer 1 automated checks: fix formatting, linting, static analysis, type checking, security scanning, test suite failures before requesting review

Learnt from: CR
Repo: Open-Paws/structured-coding-with-ai PR: 0
File: gemini-cli/GEMINI.md:0-0
Timestamp: 2026-04-25T08:35:00.610Z
Learning: Code review layer 4: Human review for AI-specific failures. Check DRY, multi-responsibility functions, suppressed errors, hallucinated APIs, over-patterning, silent failure pattern

Learnt from: CR
Repo: Open-Paws/structured-coding-with-ai PR: 0
File: gemini-cli/GEMINI.md:0-0
Timestamp: 2026-04-25T08:35:00.610Z
Learning: External contributions: check merged PRs (gh pr list --state merged --limit 5) and match their description style: one to three sentences, no benefit lists, no section headers

Learnt from: CR
Repo: Open-Paws/structured-coding-with-ai PR: 0
File: gemini-cli/GEMINI.md:0-0
Timestamp: 2026-04-25T08:35:00.610Z
Learning: Create PR under 200 lines changed (ideally under 100). Two human approvals required. AI-Assisted label for primarily agent-generated code

Learnt from: CR
Repo: Open-Paws/structured-coding-with-ai PR: 0
File: cursor/.cursor/rules/git-workflow.mdc:0-0
Timestamp: 2026-04-25T08:35:25.977Z
Learning: Monitor PR status after submission; investigate CI/CD failures immediately, respond to all review comments, implement fixes on the same branch, and re-request review when fixes are pushed; the task is not done until the PR is merged

Learnt from: CR
Repo: Open-Paws/structured-coding-with-ai PR: 0
File: agents-md/AGENTS.md:0-0
Timestamp: 2026-04-25T08:32:53.551Z
Learning: Label primarily agent-generated code with AI-Assisted label in PR; require two human approvals before merging

Learnt from: CR
Repo: Open-Paws/structured-coding-with-ai PR: 0
File: agents-md/AGENTS.md:0-0
Timestamp: 2026-04-25T08:32:53.551Z
Learning: Review full diff against plan: matches plan? Acceptance criteria met? No scope creep? Tests fail when behavior breaks? All safety checks preserved?

Learnt from: CR
Repo: Open-Paws/structured-coding-with-ai PR: 0
File: gemini-cli/GEMINI.md:0-0
Timestamp: 2026-04-25T08:35:00.610Z
Learning: Before opening a PR, run desloppify scan — code quality score after changes must be ≥ score before, minimum scores: Gary ≥80, Platform repos ≥75, others ≥70
✅ Actions performed

Full review triggered.

The deploy-surface MED-cap currently false-positives on PRs that add or
modify a deploy workflow file (.github/workflows/deploy*.yml, Dockerfile)
without touching any application code. Adding the deploy mechanism itself
doesn't deploy anything until the next code-bearing PR merges, so a live
HTTP check pre-merge is structurally impossible (chicken-and-egg) and the
cap is misapplied.

Carve-out: if the only deploy-surface files in the PR are the deploy
mechanism files themselves AND no application code is touched (no src/,
app/, lib/, pages/, components/, etc. outside tests/), the cap does NOT
fire. The next PR that ships actual code through this deploy path will
correctly trip the cap if it lacks live verification.

Concrete example caught in the queue: slingshot-uk-phase1#324 adds
.github/workflows/deploy-pipeline.yml only, no service code, was being
flagged MED for "no live HTTP check" — impossible to satisfy pre-merge.
Under the refined cap it correctly evaluates to HIGH.
coderabbitai[bot]
coderabbitai Bot previously requested changes Apr 26, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
claude-code/.claude/skills/merge/SKILL.md (1)

63-63: ⚠️ Potential issue | 🟠 Major

Fix fixture-log precedence in candidate signal intake

Line 63 still selects run-*.md directly, which can bypass the fixture campaign source and skew confidence calibration when a newer non-fixture log exists.

Proposed fix
-LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1)
+LATEST=$(ls -t ~/.claude/orchestrator-log/run-*-fixture.md 2>/dev/null | head -1)
+if [ -z "$LATEST" ]; then
+  LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1)
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claude-code/.claude/skills/merge/SKILL.md` at line 63, The current LATEST
assignment (LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null |
head -1)) can pick a newer non-fixture log and bypass the fixture campaign;
change it to prefer fixture logs first by checking for files matching
~/.claude/orchestrator-log/fixture-*.md (or another fixture-specific pattern
used in this repo) and using the newest fixture file if any exist, otherwise
fall back to the existing run-*.md selection; update the LATEST assignment and
any helper logic around LATEST to implement that precedence (refer to the LATEST
variable and the run-*.md/fixture-*.md filename patterns to locate and modify
the code).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@claude-code/.claude/skills/merge/SKILL.md`:
- Around line 101-104: The document currently contradicts itself by capping
mergeStateStatus == BLOCKED while simultaneously carving out a HIGH-eligible
subcase for reviewDecision == REVIEW_REQUIRED; update SKILL.md so BLOCKED is
consistently treated as capped at MED by removing or relocating the carve-out:
either delete the "BLOCKED waiting for operator approval" HIGH-eligibility
exception or explicitly condition it to apply only when mergeStateStatus !=
BLOCKED (i.e., use reviewDecision == REVIEW_REQUIRED without mergeStateStatus ==
BLOCKED) and update the explanatory text so the mechanical scoring contract is
unambiguous.

---

Duplicate comments:
In `@claude-code/.claude/skills/merge/SKILL.md`:
- Line 63: The current LATEST assignment (LATEST=$(ls -t
~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1)) can pick a newer
non-fixture log and bypass the fixture campaign; change it to prefer fixture
logs first by checking for files matching
~/.claude/orchestrator-log/fixture-*.md (or another fixture-specific pattern
used in this repo) and using the newest fixture file if any exist, otherwise
fall back to the existing run-*.md selection; update the LATEST assignment and
any helper logic around LATEST to implement that precedence (refer to the LATEST
variable and the run-*.md/fixture-*.md filename patterns to locate and modify
the code).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 447238ab-e608-4717-bcbd-1b12b36b5078

📥 Commits

Reviewing files that changed from the base of the PR and between d4d06c0 and bb59fd9.

📒 Files selected for processing (1)
  • claude-code/.claude/skills/merge/SKILL.md

Comment thread claude-code/.claude/skills/merge/SKILL.md Outdated
…arve-out per CodeRabbit

The carve-out contradicted the MED-cap rule the prior commit added.
CodeRabbit (review #PRR_kwDORVDiQs747-60) flagged it as Major.
Operator sided with CodeRabbit's reading. Carve-out deleted; MED-cap
for BLOCKED is now mechanical with no exceptions.

Refs #36
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
claude-code/.claude/skills/merge/SKILL.md (2)

115-118: ⚠️ Potential issue | 🟠 Major

Resolve remaining BLOCKED-vs-HIGH wording drift in calibration narrative

Line 117 says a clean PR can be HIGH even when merge bounces on review approval pending, but Line 101 mechanically caps mergeStateStatus == BLOCKED at MED. This reintroduces campaign ambiguity in operator decisions.

Proposed wording adjustment
-A clean bot-authored PR — CI green, no CodeRabbit issues, adversarial cleared, no merge conflict, no stale verifier, no UI-without-persona-qa, no deploy-without-live-check — is HIGH. The merge command bouncing on "review approval pending" is not a calibration failure, it's the expected workflow: operator reviews HIGH PRs, approves, merges.
+A clean bot-authored PR — CI green, no CodeRabbit issues, adversarial cleared, no merge conflict, no stale verifier, no UI-without-persona-qa, no deploy-without-live-check, and no BLOCKED/BEHIND state — is HIGH. If branch protection reports BLOCKED (including approval-pending paths), keep MED per the mechanical cap; operator can then approve and merge.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claude-code/.claude/skills/merge/SKILL.md` around lines 115 - 118, The
calibration incorrectly forces mergeStateStatus == BLOCKED to MED regardless of
other signals; update the scoring logic (where mergeStateStatus is mapped to
MED) to distinguish "operator approval pending" from genuine content blockers by
allowing a HIGH score when all content checks pass (CI green, no CodeRabbit
issues, adversarial cleared, no merge conflict, no stale verifier, etc.), and
adjust the narrative text around the scoring function and the phrase that
references BLOCKED so the wording no longer treats pending review as a
content-level BLOCKED condition (update the code handling mergeStateStatus, the
function that computes the PR score, and the accompanying narrative text to
reflect this exception).

63-63: ⚠️ Potential issue | 🟠 Major

Prefer fixture-first log discovery to keep the investigation honest

Line 63 currently picks the newest run-*.md, which can bypass fixture logs and shift coalition scoring inputs when a newer non-fixture file exists. Use fixture-first with generic fallback.

Proposed fix
-LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1)
+LATEST=$(ls -t ~/.claude/orchestrator-log/run-*-fixture.md 2>/dev/null | head -1)
+if [ -z "$LATEST" ]; then
+  LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1)
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claude-code/.claude/skills/merge/SKILL.md` at line 63, The current LATEST
assignment picks the newest run-*.md and can select non-fixture logs; update the
logic that sets the LATEST variable to first search for fixture-specific run
files (e.g., pattern matching fixture or "-fixture" in the filename) and use the
newest fixture match if present, otherwise fall back to the original ls -t
~/.claude/orchestrator-log/run-*.md | head -1 behavior; modify the LATEST
assignment or surrounding shell snippet so it tries the fixture-first pattern
then a generic fallback, keeping the variable name LATEST and the same
downstream usage (so functions or scripts referencing LATEST continue to work).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@claude-code/.claude/skills/merge/SKILL.md`:
- Around line 115-118: The calibration incorrectly forces mergeStateStatus ==
BLOCKED to MED regardless of other signals; update the scoring logic (where
mergeStateStatus is mapped to MED) to distinguish "operator approval pending"
from genuine content blockers by allowing a HIGH score when all content checks
pass (CI green, no CodeRabbit issues, adversarial cleared, no merge conflict, no
stale verifier, etc.), and adjust the narrative text around the scoring function
and the phrase that references BLOCKED so the wording no longer treats pending
review as a content-level BLOCKED condition (update the code handling
mergeStateStatus, the function that computes the PR score, and the accompanying
narrative text to reflect this exception).
- Line 63: The current LATEST assignment picks the newest run-*.md and can
select non-fixture logs; update the logic that sets the LATEST variable to first
search for fixture-specific run files (e.g., pattern matching fixture or
"-fixture" in the filename) and use the newest fixture match if present,
otherwise fall back to the original ls -t ~/.claude/orchestrator-log/run-*.md |
head -1 behavior; modify the LATEST assignment or surrounding shell snippet so
it tries the fixture-first pattern then a generic fallback, keeping the variable
name LATEST and the same downstream usage (so functions or scripts referencing
LATEST continue to work).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fb7d7d00-45b6-40b0-8c8e-6594919c7cdc

📥 Commits

Reviewing files that changed from the base of the PR and between bb59fd9 and 0994033.

📒 Files selected for processing (1)
  • claude-code/.claude/skills/merge/SKILL.md

…s BLOCKED MED-cap

Prior commit 0994033 deleted the structural line-102 carve-out but
left the same conceptual contradiction in narrative form at 115-118.
CodeRabbit re-flagged on re-review. Cleaning up the drift.

Refs #36
CodeRabbit asked for fixture-first precedence on the optional run-log
secondary signal. Apply the same newest-of-both pattern just landed in
unblock — a stale fixture must not shadow a fresh real run.

Fixture pattern: run-*-fixture.md. Real pattern: run-*.md (with
-fixture.md excluded). Pick whichever is newer; fall back to whichever
exists alone.
@OriginalGary OriginalGary dismissed stale reviews from coderabbitai[bot], coderabbitai[bot], and coderabbitai[bot] April 26, 2026 04:25

Addressed in commit 717ea41 ('fix(merge): add Bash(head:) to allowed-tools — CodeRabbit feedback'). Verified: current head bce80b4's allowed-tools list includes Bash(head:). Bot label hygiene per /run skill §3a.

@OriginalGary
Copy link
Copy Markdown
Contributor Author

🤖 Bot label hygiene (operator session, 2026-04-26): all 3 stale CodeRabbit CHANGES_REQUESTED reviews dismissed (Review 1 addressed by 717ea41; Reviews 2+3 declined as nitpicks per /merge skill's 'gh is source of truth' contract). All CI green. Applying stage:adversarial-pending so next /run fire dispatches adversarial subagent per dispatch table.

@OriginalGary
Copy link
Copy Markdown
Contributor Author

STAGE 13 adversarial — PASS

Read-only audit of claude-code/.claude/skills/merge/SKILL.md (161 lines, single-file PR). 6+1 checks worked through cold against the merged-skill text, the PR commit history, and the rule files /merge depends on (pipeline-nevers.md, context-repo.md, advocacy-domain.md).

Findings

Check 1 — Calibration honesty. Internally consistent. The "BLOCKED → MED no exceptions" discipline introduced in commits bb59fd9/0994033/bce80b4 is exactly the right call for a tool whose only job is calibration. With branch protection requiring review on most repos, the practical distribution will skew MED-heavy — which is the explicit design intent stated in both the PR description and the skill body ("expect to read the column"). HIGH becomes rare-but-meaningful rather than default-accept. ✅

Check 2 — No silent skips. Step A (label) + Step B (sweep) + dedup with explicit "no label, all green" source-column tag. PRs filtered out (e.g. in stage:fix-needed, stage:adversarial-pending) are filtered by design — the queue is "merge-eligible", not "all open". Documented in opening paragraph. ✅

Check 3 — Cross-repo correctness. gh-only walk via gh repo list Open-Paws --limit 100 --json viewerPermission, no LLM dispatch. ~400 reqs at current scale (40 repos × 3 PRs), under the 5000/hr graphql limit. ✅

Check 4 — Recommendation accuracy + injection. URL is the only field interpolated into the merge command, and GitHub-controlled URL format has no shell metacharacters. PR title in markdown table cell (not command) is safe from injection — at worst, a hostile pipe character breaks table rendering for Sam. Tried feat: $(rm -rf /) mentally; goes to display only, never to bash. ✅

Check 5 — Drift from /run. Skill explicitly says "gh is the source of truth" and Step 1b is optional secondary signal only with full graceful degradation. Even if /run output format changes completely, /merge still works. Defensive-by-design. ✅

Check 6 — Failure modes. If gh is broken/unauthed, /merge silently produces empty queue. Operator would notice via "0 PRs from 0 repos walked" anomaly, but explicit gh auth status preflight would be cleaner. Severity: nitpick — log as follow-up, not a blocker.

Check 7 — Confidentiality leak (context-repo). "Top reason not HIGH" cap-reason text describes WHY (e.g. "sensitivity:private files touched") without naming paths in the queue table. Followups deliver to Sam who has full clearance. No leak path. ✅

Attempted attacks (none landed)

  1. PR title shell injection — title goes to markdown only. Safe.
  2. Branch name injection — not interpolated. Safe.
  3. URL injection — GitHub-controlled format. Safe.
  4. Stale fixture log poisoning — Step 1b explicitly recomputes HIGH/MED/LOW from scratch, never inherits. Safe.
  5. Bypass via stage:ready-for-merge label on broken-CI PR — Step A pulls strictly by label without CI filter, BUT mechanical mergeStateStatus == BLOCKED cap fires (branch protection requires green checks), capping at MED with cap reason "BLOCKED". Caught by mechanical state. Surfaced reason ("BLOCKED") is less informative than "CI red" would be — severity: nitpick.
  6. Repo enumeration overrun — gh repo list --limit 100 will silently miss repos if Open Paws grows past 100. Currently ~40, fine for now. Severity: nitpick — log as follow-up at 75 repos.

Follow-up issues to file (non-blocking)

  • explicit gh auth status preflight + clear error on gh failure
  • explicit "CI failed" cap reason in addition to "BLOCKED" so operator UX is more direct
  • bump gh repo list --limit to 200 (or paginate) before org crosses 75 repos
  • consider escaping | and backticks in PR titles before markdown-table interpolation

Verdict

PASS — advancing stage:adversarial-pendingstage:ready-for-merge.

The skill is the operator's primary safety check (per the autopilot architecture's "never manufacture approval gates" rule) and the discipline shows it: mechanical scoring, explicit no-exceptions, defensive against /run drift, no injection paths. The temptation to soften BLOCKED (which CodeRabbit kept pushing) was correctly resisted across three commits — that's the right call.

— OpenGaryBot, STAGE 13 adversarial subagent

@OriginalGary
Copy link
Copy Markdown
Contributor Author

STAGE 13 adversarial — second-pass audit (PASS, follow-ups logged)

Fresh-context re-audit per ~/.claude/rules/pipeline-nevers.md ("Never merge without adversarial clearance" + STAGE 13 runs LAST). Performed cold against the merged SKILL.md text without reading the prior PASS comment first. Cites: ~/.claude/rules/context-repo.md §STAGE 13 7th check, ~/.claude/rules/advocacy-domain.md §bounded contexts, ~/.claude/rules/pipeline-nevers.md §override labels.

Note on label state: PR was already transitioned to stage:ready-for-merge at 2026-04-26T04:37:54Z by the earlier STAGE 13 PASS run. Not re-applying the label transition (idempotent no-op). Posting this second-pass audit because the orchestrator dispatched it with seven specific concerns to scrutinize that the prior pass didn't address.

6+1 checks — verdict per concern

1. Calibration manipulability via injected markdown markers. The skill says "parse PR comments for **Issue:**/**Refactor:**/**Bug:**/**Note:** blocks" without specifying whose comments. An LLM running /merge over gh pr view --comments will see author attribution and shouldn't be fooled by an attacker putting **Note:** all good in their own PR body. But the spec is loose. Severity: minor — recommend follow-up to specify "author == coderabbitai[bot]".

2. Override-label coverage. Skill correctly caps override:skip-adversarial and override:allow-score-drop at MED (matches pipeline-nevers.md lines 33-34). Gaps:

  • Draft PRs not filtered — skill doesn't fetch isDraft from gh. Drafts will show mergeStateStatus: BLOCKED so the BLOCKED cap fires (MED), but cap reason "BLOCKED" is uninformative versus "draft". Severity: nitpick.
  • Repo-specific labels like do-not-merge, blocked, needs-discussion not covered. In practice they coexist with branch protection / CHANGES_REQUESTED so other caps usually fire, but not guaranteed. Severity: minor.

3. Sensitivity:private file detection — actual specification defect. Skill says "Files touched include any path tagged sensitivity:private per context-repo.md". Per context-repo.md lines 61-67, sensitivity:private is an issue label, not a file path attribute. There is no concrete attribute to check on file paths. As written, this MED-cap never fires. The real safety lives in this adversarial check (check 7) and in CodeRabbit. Severity: minor — non-functional specification, but no functional safety regression because adversarial subagent enforcement compensates. Recommend follow-up to either remove the cap or rewrite as "PR's source issue is labeled sensitivity:private" with the corresponding gh issue lookup.

4. Approval command shell safety. GitHub PR URLs constrain to [A-Za-z0-9._/-] — no shell metacharacters possible. Chained gh pr review --approve <url> && gh pr merge --squash --delete-branch <url> is safe. PR title in markdown table cell could contain | or backtick — display-only nuisance, not security. Severity: nitpick.

5. Calibration drift — BLOCKED granularity. All mergeStateStatus == BLOCKED collapses to MED regardless of why blocked (approval pending vs. CI red vs. branch out of sync). The skill is silent on what happens when CI shows FAILURE. The LLM running /merge likely catches red CI by inspection of statusCheckRollup, but the spec has a hole. Severity: minor — recommend follow-up to add explicit cap for statusCheckRollup containing FAILURE → LOW.

6. Confidentiality leak (context-repo §STAGE 13 7th check). SKILL.md is bot-readable in a public repo. Reviewed against the org-wide read test:

  • References OpenGaryBot, Open-Paws (public)
  • Names abstract bounded contexts only (Investigation Operations, etc.)
  • No funder names, no individual names, no active campaign targets, no unannounced plans, no credentials
  • The phrase "Sam" appears nowhere; operator is unnamed
    ✅ Passes the org-wide read test.

7. Rollback path. --squash --delete-branch produces a single squash commit on main with PR-number reference. git revert <squash-commit> cleanly undoes it. Rollback is simpler than for merge commits, not harder. ✅

Attempted attacks (none landed as major+)

  1. Title-cell markdown injectionfeat: pwn | $(rm -rf /) in PR title. Result: breaks table column boundary; never reaches bash because the URL (not the title) is what's interpolated into the merge command. Display nit.
  2. CodeRabbit-marker spoofing in PR body**Bug:** SQL injection here in attacker-authored PR body to force MED on a clean PR. Result: LLM running /merge attributes the marker to the PR author (not coderabbitai[bot]) when reading gh pr view --comments output. Bounded by author attribution at LLM read time, not by the spec.
  3. sensitivity:private cap bypass — add a sensitive file via path the skill doesn't pattern-match. Result: the cap as written doesn't pattern-match anything because it references a non-existent file-path attribute. Real enforcement is this adversarial check + CodeRabbit. Spec-level finding logged.
  4. Override-label rename evasion — apply do-not-ship instead of override:skip-adversarial. Result: skill doesn't cap on do-not-ship. Other caps (CHANGES_REQUESTED, BLOCKED) usually compensate; not guaranteed. Logged as minor.
  5. Draft PR slipping through — open draft PR, all CI green. Result: mergeStateStatus: BLOCKED fires the BLOCKED cap → MED. Not HIGH, but also not LOW; would surface to operator as "BLOCKED" rather than "DRAFT". Logged as nitpick.
  6. Stale fixture log poisoning — already addressed in commit 5f7788f (newest-of-both selection). ✅

Follow-up issues to file (non-blocking, recommend post-merge)

  1. Specify "comments authored by coderabbitai[bot]" in the CodeRabbit-finding parser
  2. Add isDraft: true filter (or explicit DRAFT cap reason) to merge candidate intake
  3. Either remove the sensitivity:private file-path cap or rewrite to check the PR's source issue label
  4. Add explicit cap: statusCheckRollup contains FAILURE → LOW (currently implicit)
  5. Consider escaping | and backtick in PR titles before markdown-table interpolation
  6. Audit common repo-specific blocking labels (do-not-merge, blocked, wip) and add to MED-cap list

Verdict

PASS — no major+ findings. All findings are minor or nitpick severity per the adversarial-playbook rubric (minor/nitpick → logged as follow-up issues, don't block merge per ~/.claude/rules/adversarial-playbook STAGE 13). Confidentiality leak check (the 7th, per context-repo.md) explicitly passes.

The earlier 04:37Z PASS comment from this same identity reached the same overall verdict via different attack vectors. Both runs converge on PASS; the spec defects above are real but compensated by adjacent enforcement (adversarial subagent + CodeRabbit + branch protection) and don't justify blocking a read-only operator HUD that produces copy-paste commands.

Label state already at stage:ready-for-merge; no transition applied (no-op). Operator can proceed to merge whenever ready.

— OpenGaryBot, STAGE 13 adversarial subagent (second-pass)

@samtuckerdavis samtuckerdavis merged commit 8e24f71 into main Apr 26, 2026
10 checks passed
@samtuckerdavis samtuckerdavis deleted the operator-cmd/merge branch April 26, 2026 04:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants