diff --git a/internal/scaffold/fullsend-repo/agents/review.md b/internal/scaffold/fullsend-repo/agents/review.md index 7a736e7b3..eed98a711 100644 --- a/internal/scaffold/fullsend-repo/agents/review.md +++ b/internal/scaffold/fullsend-repo/agents/review.md @@ -2,7 +2,7 @@ name: review description: >- Code review specialist. Reviews for correctness, security, intent - alignment, and style. + alignment, style, and documentation currency. tools: >- Read, Grep, Glob, Bash disallowedTools: >- @@ -11,6 +11,7 @@ model: opus skills: - code-review - pr-review + - docs-review --- # Review Agent @@ -33,7 +34,7 @@ push commits, or merge PRs — you evaluate and report. ## Identity -You evaluate code changes across six review dimensions: +You evaluate code changes across seven review dimensions: 1. **Correctness** — logic errors, edge cases, test adequacy, test integrity @@ -47,21 +48,31 @@ You evaluate code changes across six review dimensions: non-rendering Unicode, bidirectional overrides 6. **Style/conventions** — naming, patterns, documentation beyond what linters catch +7. **Documentation currency** — whether the PR's code changes have + made in-repo documentation stale, incomplete, or misleading -The `code-review` skill defines the full evaluation procedure for each -dimension. +The `code-review` skill defines the evaluation procedure for dimensions +1–6. The `docs-review` skill handles dimension 7 (documentation +currency). ## Skill routing -This agent has two skills. Select based on invocation context: +This agent has three skills. Select based on invocation context: - **`pr-review`** — the prompt references a PR number, PR URL, or GitHub PR context. This skill gathers PR metadata, delegates code - evaluation to `code-review`, adds PR-specific checks, and posts a - review via the GitHub API. + evaluation to `code-review` and documentation staleness checks to + `docs-review`, adds PR-specific checks, and posts a review via + the GitHub API. - **`code-review`** — the prompt is about a local branch diff with no PR, or another skill is delegating code evaluation. This skill evaluates the diff and source files directly. +- **`docs-review`** — delegated by `pr-review` after code evaluation + completes. Evaluates whether in-repo documentation has been made + stale by the code changes. Follow the skill's checklist and + two-pass evaluation process completely — do not skip entries or + shortcut the evaluation. Read-only — produces findings but does + not update docs. When invoked via `--print` for pre-push review, use `code-review`. When invoked for a GitHub PR, use `pr-review`. diff --git a/internal/scaffold/fullsend-repo/harness/review.yaml b/internal/scaffold/fullsend-repo/harness/review.yaml index 71ddc4df2..d3c958adf 100644 --- a/internal/scaffold/fullsend-repo/harness/review.yaml +++ b/internal/scaffold/fullsend-repo/harness/review.yaml @@ -6,6 +6,7 @@ policy: policies/review.yaml skills: - skills/pr-review - skills/code-review + - skills/docs-review pre_script: scripts/pre-review.sh diff --git a/internal/scaffold/fullsend-repo/skills/docs-review/SKILL.md b/internal/scaffold/fullsend-repo/skills/docs-review/SKILL.md new file mode 100644 index 000000000..196a1afe7 --- /dev/null +++ b/internal/scaffold/fullsend-repo/skills/docs-review/SKILL.md @@ -0,0 +1,178 @@ +--- +name: docs-review +description: >- + Evaluates whether a PR's code changes have made in-repo documentation + stale. Discovers documentation files, matches changed identifiers + against doc content, and produces findings for stale or missing + documentation. Read-only — flags staleness but does not update docs. + Can be delegated to by the pr-review skill. +--- + +# Docs Review + +Code changes can silently invalidate documentation. A renamed function, +a changed API signature, a removed configuration option — each can leave +docs describing behavior that no longer exists. This skill detects that +drift by matching the PR's code changes against in-repo documentation +and flagging docs whose descriptions contradict the new code. + +## Context management + +This skill involves scanning documentation files across the repository, +which can consume significant context. Dispatch a subagent to carry out +the process below, keeping the main review context free for code-review +and PR-specific checks. The subagent should return a list of findings +(or an empty list if no stale docs were found). + +## Process + +Follow these steps in order. Do not skip steps. + +### 1. Receive the change context + +Use the diff and changed file list provided by the invoking skill +(typically `pr-review`). If invoked standalone, obtain the diff: + +```bash +DEFAULT_BRANCH=$(git rev-parse --abbrev-ref origin/HEAD | cut -d/ -f2) +git diff $(git merge-base HEAD "$DEFAULT_BRANCH")..HEAD +``` + +Record the list of files changed in the PR: + +```bash +git diff --name-only $(git merge-base HEAD "$DEFAULT_BRANCH")..HEAD +``` + +If no change can be identified, stop and report — do not guess. + +### 2. Build the identifier checklist + +Go through **every** changed file in the PR. For each file, extract +identifiers from the modified lines (lines starting with `+` or `-`) +and from diff hunk headers (`@@` lines). Write them down as a numbered +checklist — one entry per changed file, with all identifiers from that +file. + +Use the most specific form of each identifier. CLI flag names, +configuration keys, full function names, and type names are good — +they match only relevant docs. Avoid generic short words that would +match hundreds of unrelated files. If a generic term is the only +identifier available for a change, include it, but prefer specific +forms when they exist. + +Do not skip files. Do not prioritize some files over others. Every +changed file gets an entry in the checklist. This checklist is your +contract — you will search docs for every identifier on it. + +### 3. Discover documentation files + +Find all documentation files in the repository. This includes files in +dedicated documentation directories and standalone documentation files +like README.md at any level. Explore the repository structure to +understand where documentation lives — different repos organize docs +differently. + +Search for documentation files (`.md`, `.rst`, `.adoc`) across the +repository. + +If the repository has no documentation files, produce zero findings +and exit — there is nothing to check. + +### 4. Search docs for every identifier + +Write a shell script that takes the identifiers from step 2 and +greps for each one across the documentation files from step 3. +Run the script in a single Bash call: + +```bash +for id in "identifier1" "identifier2" "identifier3"; do + matches=$(grep -rl "$id" --include="*.md" --include="*.rst" --include="*.adoc" . 2>/dev/null) + if [ -n "$matches" ]; then + echo "MATCH: $id -> $matches" + else + echo "NO_MATCH: $id" + fi +done +``` + +Include every identifier from every checklist entry in the `for` +loop. The script handles the searching mechanically — no identifiers +are skipped. + +From the script output, collect all matched doc files into a +candidate list. Exclude documentation files that are already modified +in the same PR — those are being actively updated. + +### 5. Evaluate every candidate (two passes) + +**Pass 1 — Quick scan.** For each candidate doc file from step 4, +view only the lines that matched the grep (use `grep -n` to see them +in context). Based on the matching lines alone, decide whether the +doc might be stale. Record a verdict for every candidate: + +``` +- path/to/doc.md → possibly stale (describes behavior that changed) +- path/to/other.md → not stale (mentions identifier in passing) +- path/to/another.md → not stale (changelog entry) +... +``` + +Every candidate must have a verdict. Do not skip candidates. + +**Pass 2 — Deep read.** For each candidate marked "possibly stale" +in pass 1, read the full file alongside the relevant section of the +diff. Confirm whether the doc is actually stale and write a specific +remediation describing what should be updated. + +When evaluating: + +- **Only flag docs whose content is now incorrect.** A doc that + mentions an identifier is not stale if the described behavior is + unchanged. It is stale only if the behavior, signature, or semantics + changed in a way that makes the doc misleading. +- **Do not flag changelog entries or release notes that describe past + releases.** Historical entries are not stale because the code evolved. + +### 6. Compile findings + +For each confirmed stale doc, record a separate finding. Do not +consolidate multiple stale files into one finding — each stale file +gets its own finding with its own file path: + +- **Severity:** `high` if the doc describes behavior that is now + *incorrect* and would mislead a reader (e.g., wrong API signature, + removed config key still documented). `medium` if the doc is + *incomplete* rather than incorrect (e.g., new parameter not + documented, but existing content is still accurate). `low` if the + doc is slightly outdated but not misleading. Never use `critical` — + stale docs do not constitute a security or correctness risk in the + code itself. Do not produce `info`-level findings — if the staleness + is not worth flagging at `low`, do not flag it at all. +- **Category:** a descriptive label for the type of staleness, e.g., + `stale-doc`, `missing-doc`. Use whatever category best describes the + finding. +- **File:** path to the stale doc file. +- **Line:** line number of the stale reference, if identifiable. +- **Description:** what is stale and why — reference the specific code + change that caused the staleness. +- **Remediation:** what should be updated in the doc. + +If no stale docs are found, produce zero findings. Do not generate +findings to justify having run the skill. + +## Constraints + +The agent definition (`agents/review.md`) is the authoritative list of +prohibitions. This skill does not restate them. If a step in this skill +appears to conflict with the agent definition, the agent definition +wins. + +- **Do not flag docs modified in the same PR.** If the PR already + updates a doc file, that file is being actively maintained and should + not appear in findings. +- **Severity is capped at high.** Stale documentation does not + constitute a critical security or correctness risk in the code. + A `high` finding means the doc would actively mislead readers. +- **This skill is read-only.** It produces findings but does not + modify documentation files. diff --git a/internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md b/internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md index d2a8a5942..9920a1b1a 100644 --- a/internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md +++ b/internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md @@ -2,15 +2,17 @@ name: pr-review description: >- PR-specific review procedure. Gathers GitHub context, delegates code - evaluation to the code-review skill, adds PR-specific checks, and - writes a structured review result. + evaluation to the code-review skill, delegates documentation + staleness checks to the docs-review skill, adds PR-specific checks, + and writes a structured review result. --- # PR Review This skill orchestrates a pull request review by gathering GitHub -context, delegating code evaluation to the `code-review` skill, adding -PR-specific checks, and producing a structured result. In pipeline mode +context, delegating code evaluation to the `code-review` skill, +delegating documentation staleness checks to the `docs-review` skill, +adding PR-specific checks, and producing a structured result. In pipeline mode (`$FULLSEND_OUTPUT_DIR` set), it writes JSON for the post-script to post. In interactive mode, it posts directly via `gh pr review`. It does not evaluate code directly — that is the `code-review` skill's @@ -84,10 +86,23 @@ Pass the diff obtained in step 2 and use the PR metadata and linked issues as additional context for the intent-alignment dimension. The `code-review` skill produces findings and an outcome. Carry those -forward to steps 4 and 5. Proceed to step 4 regardless of outcome — -it covers PR-specific inputs not examined by code-review. +forward to steps 4, 5, and 6. Proceed to step 4 regardless of outcome. -### 4. PR-specific checks +### 4. Check documentation currency + +Invoke the `docs-review` skill to evaluate whether the code changes +in this PR have made any in-repo documentation stale. The docs-review +skill has its own multi-step process (build identifier checklist, +grep for every identifier, two-pass evaluation). Follow that process +completely — do not substitute ad-hoc grep searches. + +Merge the docs-review findings into the findings list from step 3. +Documentation staleness findings are capped at `high` severity (never +`critical`), so they contribute to the outcome but do not dominate it. + +Proceed to step 5 regardless of outcome. + +### 5. PR-specific checks These checks apply only in the PR context and augment the findings from step 3. @@ -114,10 +129,10 @@ Verify the change scope matches the linked issue's authorization. A PR labeled "bug fix" that adds new capability is a feature, regardless of the label. Add a finding if the scope exceeds authorization. -Merge any new findings into the findings list from step 3 and -re-evaluate the overall outcome. +Merge any new findings into the findings list from steps 3 and 4, +and re-evaluate the overall outcome. -### 5. Produce the review result +### 6. Produce the review result Compose the review comment using this structure: @@ -254,13 +269,14 @@ wins. - **Never approve with unresolved critical or high findings.** If any critical or high finding exists, the outcome must be `request-changes`. -- **Never post without completing the `code-review` skill first.** - Partial reviews miss context and produce unreliable verdicts. +- **Never post without completing the `code-review` and `docs-review` + skills first.** Partial reviews miss context and produce unreliable + verdicts. - **Always include the PR head SHA in the review comment.** The review is only valid for the SHA evaluated; new pushes require a new review. - **Report failure rather than posting a partial review.** If you cannot - complete all six dimensions (tool failure, missing context, ambiguous - findings), produce a failure result (see step 5) rather than posting + complete all seven dimensions (tool failure, missing context, ambiguous + findings), produce a failure result (see step 6) rather than posting an incomplete result. - **In pipeline mode, `gh pr review` is reserved for the post-script.** The sandbox token is read-only. Write JSON to