diff --git a/review/SKILL.md b/review/SKILL.md
index 4d134d1751..9558a60af9 100644
--- a/review/SKILL.md
+++ b/review/SKILL.md
@@ -815,7 +815,7 @@ You are running the `/review` workflow. Analyze the current branch's diff agains
1. Run `git branch --show-current` to get the current branch.
2. If on the base branch, output: **"Nothing to review — you're on the base branch or have no changes against it."** and stop.
-3. Run `git fetch origin --quiet && git diff origin/ --stat` to check if there's a diff. If no diff, output the same message and stop.
+3. Run `git fetch origin --quiet && DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE" --stat` to check if there's a diff. If no diff, output the same message and stop.
---
@@ -827,7 +827,7 @@ Before reviewing code quality, check: **did they build what was requested — no
Read commit messages (`git log origin/..HEAD --oneline`).
**If no PR exists:** rely on commit messages and TODOS.md for stated intent — this is the common case since /review runs before /ship creates the PR.
2. Identify the **stated intent** — what was this branch supposed to accomplish?
-3. Run `git diff origin/...HEAD --stat` and compare the files changed against the stated intent.
+3. Run `DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE" --stat` and compare the files changed against the stated intent.
4. Evaluate with skepticism (incorporating plan completion results if available from an earlier step or adjacent section):
@@ -934,7 +934,7 @@ Before judging completion, classify HOW each item can be verified. The diff alon
### Cross-Reference Against Diff
-Run `git diff origin/...HEAD` and `git log origin/..HEAD --oneline` to understand what was implemented.
+Run `DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE"` and `git log origin/..HEAD --oneline` to understand what was implemented.
For each extracted plan item, run the verification dispatch from the previous section, then classify:
@@ -1082,7 +1082,14 @@ Fetch the latest base branch to avoid false positives from stale local state:
git fetch origin --quiet
```
-Run `git diff origin/` to get the full diff. This includes both committed and uncommitted changes against the latest base branch.
+Compute the merge base, then diff the working tree against that point:
+
+```bash
+DIFF_BASE=$(git merge-base origin/ HEAD)
+git diff "$DIFF_BASE"
+```
+
+This includes both committed and uncommitted changes while excluding commits that landed on the base branch after this branch was created.
## Step 3.4: Workspace-aware queue status (advisory)
@@ -1218,8 +1225,9 @@ STACK=""
[ -f go.mod ] && STACK="${STACK}go "
[ -f Cargo.toml ] && STACK="${STACK}rust "
echo "STACK: ${STACK:-unknown}"
-DIFF_INS=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
-DIFF_DEL=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
+DIFF_BASE=$(git merge-base origin/ HEAD)
+DIFF_INS=$(git diff "$DIFF_BASE" --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
+DIFF_DEL=$(git diff "$DIFF_BASE" --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
DIFF_LINES=$((DIFF_INS + DIFF_DEL))
echo "DIFF_LINES: $DIFF_LINES"
# Detect test framework for specialist test stub generation
@@ -1293,7 +1301,7 @@ If learnings are found, include them: "Past learnings for this domain: {learning
4. Instructions:
"You are a specialist code reviewer. Read the checklist below, then run
-`git diff origin/` to get the full diff. Apply the checklist against the diff.
+`DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE"` to get the full diff. Apply the checklist against the diff.
For each finding, output a JSON object on its own line:
{\"severity\":\"CRITICAL|INFORMATIONAL\",\"confidence\":N,\"path\":\"file\",\"line\":N,\"category\":\"category\",\"summary\":\"description\",\"fix\":\"recommended fix\",\"fingerprint\":\"path:line:category\",\"specialist\":\"name\"}
@@ -1396,7 +1404,7 @@ The Red Team subagent receives:
Prompt: "You are a red team reviewer. The code has already been reviewed by N specialists
who found the following issues: {merged findings summary}. Your job is to find what they
-MISSED. Read the checklist, run `git diff origin/`, and look for gaps.
+MISSED. Read the checklist, run `DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE"`, and look for gaps.
Output findings as JSON objects (same schema as the specialists). Focus on cross-cutting
concerns, integration boundary issues, and failure modes that specialist checklists
don't cover."
@@ -1568,8 +1576,9 @@ Every diff gets adversarial review from both Claude and Codex. LOC is not a prox
**Detect diff size and tool availability:**
```bash
-DIFF_INS=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
-DIFF_DEL=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
+DIFF_BASE=$(git merge-base origin/ HEAD)
+DIFF_INS=$(git diff "$DIFF_BASE" --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
+DIFF_DEL=$(git diff "$DIFF_BASE" --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
DIFF_TOTAL=$((DIFF_INS + DIFF_DEL))
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
# Legacy opt-out — only gates Codex passes, Claude always runs
@@ -1589,7 +1598,7 @@ If `OLD_CFG` is `disabled`: skip Codex passes only. Claude adversarial subagent
Dispatch via the Agent tool. The subagent has fresh context — no checklist bias from the structured review. This genuine independence catches things the primary reviewer is blind to.
Subagent prompt:
-"Read the diff for this branch with `git diff origin/`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment). After listing findings, end your output with ONE line in the canonical format `Recommendation: because ` — examples: `Recommendation: Fix the unbounded retry at queue.ts:78 because it'll DoS the worker pool under sustained 429s` or `Recommendation: Ship as-is because the strongest finding is a theoretical race that requires conditions we can't trigger in production`. The reason must point to a specific finding (or no-fix rationale). Generic reasons like 'because it's safer' do not qualify."
+"Read the diff for this branch with `DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE"`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment). After listing findings, end your output with ONE line in the canonical format `Recommendation: because ` — examples: `Recommendation: Fix the unbounded retry at queue.ts:78 because it'll DoS the worker pool under sustained 429s` or `Recommendation: Ship as-is because the strongest finding is a theoretical race that requires conditions we can't trigger in production`. The reason must point to a specific finding (or no-fix rationale). Generic reasons like 'because it's safer' do not qualify."
Present findings under an `ADVERSARIAL REVIEW (Claude subagent):` header. **FIXABLE findings** flow into the same Fix-First pipeline as the structured review. **INVESTIGATE findings** are presented as informational.
@@ -1604,7 +1613,7 @@ If Codex is available AND `OLD_CFG` is NOT `disabled`:
```bash
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
-codex exec "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems. End your output with ONE line in the canonical format `Recommendation: because `. Generic reasons like 'because it's safer' do not qualify; the reason must point to a specific finding or no-fix rationale." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR_ADV"
+codex exec "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the changes on this branch against the base branch. Run DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE" to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems. End your output with ONE line in the canonical format `Recommendation: because `. Generic reasons like 'because it's safer' do not qualify; the reason must point to a specific finding or no-fix rationale." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR_ADV"
```
Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. After the command completes, read stderr:
diff --git a/review/SKILL.md.tmpl b/review/SKILL.md.tmpl
index fada691125..ae480da3d2 100644
--- a/review/SKILL.md.tmpl
+++ b/review/SKILL.md.tmpl
@@ -38,7 +38,7 @@ You are running the `/review` workflow. Analyze the current branch's diff agains
1. Run `git branch --show-current` to get the current branch.
2. If on the base branch, output: **"Nothing to review — you're on the base branch or have no changes against it."** and stop.
-3. Run `git fetch origin --quiet && git diff origin/ --stat` to check if there's a diff. If no diff, output the same message and stop.
+3. Run `git fetch origin --quiet && DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE" --stat` to check if there's a diff. If no diff, output the same message and stop.
---
@@ -72,7 +72,14 @@ Fetch the latest base branch to avoid false positives from stale local state:
git fetch origin --quiet
```
-Run `git diff origin/` to get the full diff. This includes both committed and uncommitted changes against the latest base branch.
+Compute the merge base, then diff the working tree against that point:
+
+```bash
+DIFF_BASE=$(git merge-base origin/ HEAD)
+git diff "$DIFF_BASE"
+```
+
+This includes both committed and uncommitted changes while excluding commits that landed on the base branch after this branch was created.
## Step 3.4: Workspace-aware queue status (advisory)
diff --git a/scripts/resolvers/review-army.ts b/scripts/resolvers/review-army.ts
index 516ce3c8d4..5c8766e302 100644
--- a/scripts/resolvers/review-army.ts
+++ b/scripts/resolvers/review-army.ts
@@ -30,8 +30,9 @@ STACK=""
[ -f go.mod ] && STACK="\${STACK}go "
[ -f Cargo.toml ] && STACK="\${STACK}rust "
echo "STACK: \${STACK:-unknown}"
-DIFF_INS=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
-DIFF_DEL=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
+DIFF_BASE=$(git merge-base origin/ HEAD)
+DIFF_INS=$(git diff "$DIFF_BASE" --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
+DIFF_DEL=$(git diff "$DIFF_BASE" --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
DIFF_LINES=$((DIFF_INS + DIFF_DEL))
echo "DIFF_LINES: $DIFF_LINES"
# Detect test framework for specialist test stub generation
@@ -105,7 +106,7 @@ If learnings are found, include them: "Past learnings for this domain: {learning
4. Instructions:
"You are a specialist code reviewer. Read the checklist below, then run
-\`git diff origin/\` to get the full diff. Apply the checklist against the diff.
+\`DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE"\` to get the full diff. Apply the checklist against the diff.
For each finding, output a JSON object on its own line:
{\\"severity\\":\\"CRITICAL|INFORMATIONAL\\",\\"confidence\\":N,\\"path\\":\\"file\\",\\"line\\":N,\\"category\\":\\"category\\",\\"summary\\":\\"description\\",\\"fix\\":\\"recommended fix\\",\\"fingerprint\\":\\"path:line:category\\",\\"specialist\\":\\"name\\"}
@@ -217,7 +218,7 @@ The Red Team subagent receives:
Prompt: "You are a red team reviewer. The code has already been reviewed by N specialists
who found the following issues: {merged findings summary}. Your job is to find what they
-MISSED. Read the checklist, run \`git diff origin/\`, and look for gaps.
+MISSED. Read the checklist, run \`DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE"\`, and look for gaps.
Output findings as JSON objects (same schema as the specialists). Focus on cross-cutting
concerns, integration boundary issues, and failure modes that specialist checklists
don't cover."
diff --git a/scripts/resolvers/review.ts b/scripts/resolvers/review.ts
index 263767d699..b39f4088e5 100644
--- a/scripts/resolvers/review.ts
+++ b/scripts/resolvers/review.ts
@@ -396,7 +396,7 @@ Before reviewing code quality, check: **did they build what was requested — no
Read commit messages (\`git log origin/..HEAD --oneline\`).
**If no PR exists:** rely on commit messages and TODOS.md for stated intent — this is the common case since /review runs before /ship creates the PR.
2. Identify the **stated intent** — what was this branch supposed to accomplish?
-3. Run \`git diff origin/...HEAD --stat\` and compare the files changed against the stated intent.
+3. Run \`DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE" --stat\` and compare the files changed against the stated intent.
4. Evaluate with skepticism (incorporating plan completion results if available from an earlier step or adjacent section):
@@ -440,8 +440,9 @@ Every diff gets adversarial review from both Claude and Codex. LOC is not a prox
**Detect diff size and tool availability:**
\`\`\`bash
-DIFF_INS=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
-DIFF_DEL=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
+DIFF_BASE=$(git merge-base origin/ HEAD)
+DIFF_INS=$(git diff "$DIFF_BASE" --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
+DIFF_DEL=$(git diff "$DIFF_BASE" --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
DIFF_TOTAL=$((DIFF_INS + DIFF_DEL))
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
# Legacy opt-out — only gates Codex passes, Claude always runs
@@ -461,7 +462,7 @@ If \`OLD_CFG\` is \`disabled\`: skip Codex passes only. Claude adversarial subag
Dispatch via the Agent tool. The subagent has fresh context — no checklist bias from the structured review. This genuine independence catches things the primary reviewer is blind to.
Subagent prompt:
-"Read the diff for this branch with \`git diff origin/\`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment). After listing findings, end your output with ONE line in the canonical format \`Recommendation: because \` — examples: \`Recommendation: Fix the unbounded retry at queue.ts:78 because it'll DoS the worker pool under sustained 429s\` or \`Recommendation: Ship as-is because the strongest finding is a theoretical race that requires conditions we can't trigger in production\`. The reason must point to a specific finding (or no-fix rationale). Generic reasons like 'because it's safer' do not qualify."
+"Read the diff for this branch with \`DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE"\`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment). After listing findings, end your output with ONE line in the canonical format \`Recommendation: because \` — examples: \`Recommendation: Fix the unbounded retry at queue.ts:78 because it'll DoS the worker pool under sustained 429s\` or \`Recommendation: Ship as-is because the strongest finding is a theoretical race that requires conditions we can't trigger in production\`. The reason must point to a specific finding (or no-fix rationale). Generic reasons like 'because it's safer' do not qualify."
Present findings under an \`ADVERSARIAL REVIEW (Claude subagent):\` header. **FIXABLE findings** flow into the same Fix-First pipeline as the structured review. **INVESTIGATE findings** are presented as informational.
@@ -476,7 +477,7 @@ If Codex is available AND \`OLD_CFG\` is NOT \`disabled\`:
\`\`\`bash
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
-codex exec "${CODEX_BOUNDARY}Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems. End your output with ONE line in the canonical format \`Recommendation: because \`. Generic reasons like 'because it's safer' do not qualify; the reason must point to a specific finding or no-fix rationale." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR_ADV"
+codex exec "${CODEX_BOUNDARY}Review the changes on this branch against the base branch. Run DIFF_BASE=\$(git merge-base origin/ HEAD) && git diff \"\$DIFF_BASE\" to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems. End your output with ONE line in the canonical format \`Recommendation: because \`. Generic reasons like 'because it's safer' do not qualify; the reason must point to a specific finding or no-fix rationale." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR_ADV"
\`\`\`
Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. After the command completes, read stderr:
@@ -800,7 +801,7 @@ Before judging completion, classify HOW each item can be verified. The diff alon
sections.push(`
### Cross-Reference Against Diff
-Run \`git diff origin/...HEAD\` and \`git log origin/..HEAD --oneline\` to understand what was implemented.
+Run \`DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE"\` and \`git log origin/..HEAD --oneline\` to understand what was implemented.
For each extracted plan item, run the verification dispatch from the previous section, then classify:
diff --git a/ship/SKILL.md b/ship/SKILL.md
index 25119fb391..b1600ef5a6 100644
--- a/ship/SKILL.md
+++ b/ship/SKILL.md
@@ -1658,7 +1658,7 @@ Before judging completion, classify HOW each item can be verified. The diff alon
### Cross-Reference Against Diff
-Run `git diff origin/...HEAD` and `git log origin/..HEAD --oneline` to understand what was implemented.
+Run `DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE"` and `git log origin/..HEAD --oneline` to understand what was implemented.
For each extracted plan item, run the verification dispatch from the previous section, then classify:
@@ -1862,7 +1862,7 @@ Before reviewing code quality, check: **did they build what was requested — no
Read commit messages (`git log origin/..HEAD --oneline`).
**If no PR exists:** rely on commit messages and TODOS.md for stated intent — this is the common case since /review runs before /ship creates the PR.
2. Identify the **stated intent** — what was this branch supposed to accomplish?
-3. Run `git diff origin/...HEAD --stat` and compare the files changed against the stated intent.
+3. Run `DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE" --stat` and compare the files changed against the stated intent.
4. Evaluate with skepticism (incorporating plan completion results if available from an earlier step or adjacent section):
@@ -1897,7 +1897,14 @@ Review the diff for structural issues that tests don't catch.
1. Read `.claude/skills/review/checklist.md`. If the file cannot be read, **STOP** and report the error.
-2. Run `git diff origin/` to get the full diff (scoped to feature changes against the freshly-fetched base branch).
+2. Compute the merge base and diff the working tree against it:
+
+ ```bash
+ DIFF_BASE=$(git merge-base origin/ HEAD)
+ git diff "$DIFF_BASE"
+ ```
+
+ This keeps uncommitted work in scope while excluding commits that landed on the base branch after this branch was created.
3. Apply the review checklist in two passes:
- **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary
@@ -2000,8 +2007,9 @@ STACK=""
[ -f go.mod ] && STACK="${STACK}go "
[ -f Cargo.toml ] && STACK="${STACK}rust "
echo "STACK: ${STACK:-unknown}"
-DIFF_INS=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
-DIFF_DEL=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
+DIFF_BASE=$(git merge-base origin/ HEAD)
+DIFF_INS=$(git diff "$DIFF_BASE" --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
+DIFF_DEL=$(git diff "$DIFF_BASE" --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
DIFF_LINES=$((DIFF_INS + DIFF_DEL))
echo "DIFF_LINES: $DIFF_LINES"
# Detect test framework for specialist test stub generation
@@ -2075,7 +2083,7 @@ If learnings are found, include them: "Past learnings for this domain: {learning
4. Instructions:
"You are a specialist code reviewer. Read the checklist below, then run
-`git diff origin/` to get the full diff. Apply the checklist against the diff.
+`DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE"` to get the full diff. Apply the checklist against the diff.
For each finding, output a JSON object on its own line:
{\"severity\":\"CRITICAL|INFORMATIONAL\",\"confidence\":N,\"path\":\"file\",\"line\":N,\"category\":\"category\",\"summary\":\"description\",\"fix\":\"recommended fix\",\"fingerprint\":\"path:line:category\",\"specialist\":\"name\"}
@@ -2178,7 +2186,7 @@ The Red Team subagent receives:
Prompt: "You are a red team reviewer. The code has already been reviewed by N specialists
who found the following issues: {merged findings summary}. Your job is to find what they
-MISSED. Read the checklist, run `git diff origin/`, and look for gaps.
+MISSED. Read the checklist, run `DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE"`, and look for gaps.
Output findings as JSON objects (same schema as the specialists). Focus on cross-cutting
concerns, integration boundary issues, and failure modes that specialist checklists
don't cover."
@@ -2314,8 +2322,9 @@ Every diff gets adversarial review from both Claude and Codex. LOC is not a prox
**Detect diff size and tool availability:**
```bash
-DIFF_INS=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
-DIFF_DEL=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
+DIFF_BASE=$(git merge-base origin/ HEAD)
+DIFF_INS=$(git diff "$DIFF_BASE" --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
+DIFF_DEL=$(git diff "$DIFF_BASE" --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
DIFF_TOTAL=$((DIFF_INS + DIFF_DEL))
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
# Legacy opt-out — only gates Codex passes, Claude always runs
@@ -2335,7 +2344,7 @@ If `OLD_CFG` is `disabled`: skip Codex passes only. Claude adversarial subagent
Dispatch via the Agent tool. The subagent has fresh context — no checklist bias from the structured review. This genuine independence catches things the primary reviewer is blind to.
Subagent prompt:
-"Read the diff for this branch with `git diff origin/`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment). After listing findings, end your output with ONE line in the canonical format `Recommendation: because ` — examples: `Recommendation: Fix the unbounded retry at queue.ts:78 because it'll DoS the worker pool under sustained 429s` or `Recommendation: Ship as-is because the strongest finding is a theoretical race that requires conditions we can't trigger in production`. The reason must point to a specific finding (or no-fix rationale). Generic reasons like 'because it's safer' do not qualify."
+"Read the diff for this branch with `DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE"`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment). After listing findings, end your output with ONE line in the canonical format `Recommendation: because ` — examples: `Recommendation: Fix the unbounded retry at queue.ts:78 because it'll DoS the worker pool under sustained 429s` or `Recommendation: Ship as-is because the strongest finding is a theoretical race that requires conditions we can't trigger in production`. The reason must point to a specific finding (or no-fix rationale). Generic reasons like 'because it's safer' do not qualify."
Present findings under an `ADVERSARIAL REVIEW (Claude subagent):` header. **FIXABLE findings** flow into the same Fix-First pipeline as the structured review. **INVESTIGATE findings** are presented as informational.
@@ -2350,7 +2359,7 @@ If Codex is available AND `OLD_CFG` is NOT `disabled`:
```bash
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
-codex exec "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems. End your output with ONE line in the canonical format `Recommendation: because `. Generic reasons like 'because it's safer' do not qualify; the reason must point to a specific finding or no-fix rationale." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR_ADV"
+codex exec "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the changes on this branch against the base branch. Run DIFF_BASE=$(git merge-base origin/ HEAD) && git diff "$DIFF_BASE" to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems. End your output with ONE line in the canonical format `Recommendation: because `. Generic reasons like 'because it's safer' do not qualify; the reason must point to a specific finding or no-fix rationale." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR_ADV"
```
Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. After the command completes, read stderr:
diff --git a/ship/SKILL.md.tmpl b/ship/SKILL.md.tmpl
index 5a7c34661d..7e92aa7633 100644
--- a/ship/SKILL.md.tmpl
+++ b/ship/SKILL.md.tmpl
@@ -295,7 +295,14 @@ Review the diff for structural issues that tests don't catch.
1. Read `.claude/skills/review/checklist.md`. If the file cannot be read, **STOP** and report the error.
-2. Run `git diff origin/` to get the full diff (scoped to feature changes against the freshly-fetched base branch).
+2. Compute the merge base and diff the working tree against it:
+
+ ```bash
+ DIFF_BASE=$(git merge-base origin/ HEAD)
+ git diff "$DIFF_BASE"
+ ```
+
+ This keeps uncommitted work in scope while excluding commits that landed on the base branch after this branch was created.
3. Apply the review checklist in two passes:
- **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary
diff --git a/test/gen-skill-docs.test.ts b/test/gen-skill-docs.test.ts
index 4bf8abeeeb..9357fbd81d 100644
--- a/test/gen-skill-docs.test.ts
+++ b/test/gen-skill-docs.test.ts
@@ -1735,6 +1735,19 @@ describe('Codex generation (--host codex)', () => {
expect(reviewContent).not.toContain('CODEX_REVIEWS');
});
+ test('/review and /ship use merge-base diffs for pre-landing review scope', () => {
+ const reviewContent = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
+ const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+
+ expect(reviewContent).toContain('DIFF_BASE=$(git merge-base origin/ HEAD)');
+ expect(reviewContent).toContain('git diff "$DIFF_BASE"');
+ expect(reviewContent).toContain('excluding commits that landed on the base branch after this branch was created');
+
+ expect(shipContent).toContain('DIFF_BASE=$(git merge-base origin/ HEAD)');
+ expect(shipContent).toContain('git diff "$DIFF_BASE"');
+ expect(shipContent).toContain('excluding commits that landed on the base branch after this branch was created');
+ });
+
test('--host codex --dry-run freshness', () => {
const result = Bun.spawnSync(['bun', 'run', 'scripts/gen-skill-docs.ts', '--host', 'codex', '--dry-run'], {
cwd: ROOT,