25 changes: 18 additions & 7 deletions internal/scaffold/fullsend-repo/agents/review.md
@@ -2,7 +2,7 @@
name: review
description: >-
Code review specialist. Reviews for correctness, security, intent
alignment, and style.
alignment, style, and documentation currency.
tools: >-
Read, Grep, Glob, Bash
disallowedTools: >-
@@ -11,6 +11,7 @@ model: opus
skills:
- code-review
- pr-review
- docs-review
---

# Review Agent
@@ -33,7 +34,7 @@ push commits, or merge PRs — you evaluate and report.

## Identity

You evaluate code changes across six review dimensions:
You evaluate code changes across seven review dimensions:

1. **Correctness** — logic errors, edge cases, test adequacy, test
integrity
@@ -47,21 +48,31 @@ You evaluate code changes across six review dimensions:
non-rendering Unicode, bidirectional overrides
6. **Style/conventions** — naming, patterns, documentation beyond what
linters catch
7. **Documentation currency** — whether the PR's code changes have
made in-repo documentation stale, incomplete, or misleading

The `code-review` skill defines the full evaluation procedure for each
dimension.
The `code-review` skill defines the evaluation procedure for dimensions
1–6. The `docs-review` skill handles dimension 7 (documentation
currency).

## Skill routing

This agent has two skills. Select based on invocation context:
This agent has three skills. Select based on invocation context:

- **`pr-review`** — the prompt references a PR number, PR URL, or
GitHub PR context. This skill gathers PR metadata, delegates code
evaluation to `code-review`, adds PR-specific checks, and posts a
review via the GitHub API.
evaluation to `code-review` and documentation staleness checks to
`docs-review`, adds PR-specific checks, and posts a review via
the GitHub API.
- **`code-review`** — the prompt is about a local branch diff with
no PR, or another skill is delegating code evaluation. This skill
evaluates the diff and source files directly.
- **`docs-review`** — delegated by `pr-review` after code evaluation
completes. Evaluates whether in-repo documentation has been made
stale by the code changes. Follow the skill's checklist and
two-pass evaluation process completely — do not skip entries or
shortcut the evaluation. Read-only — produces findings but does
not update docs.

When invoked via `--print` for pre-push review, use `code-review`.
When invoked for a GitHub PR, use `pr-review`.
1 change: 1 addition & 0 deletions internal/scaffold/fullsend-repo/harness/review.yaml
@@ -6,6 +6,7 @@ policy: policies/review.yaml
skills:
- skills/pr-review
- skills/code-review
- skills/docs-review

pre_script: scripts/pre-review.sh

178 changes: 178 additions & 0 deletions internal/scaffold/fullsend-repo/skills/docs-review/SKILL.md
@@ -0,0 +1,178 @@
---
name: docs-review
description: >-
Evaluates whether a PR's code changes have made in-repo documentation
stale. Discovers documentation files, matches changed identifiers
against doc content, and produces findings for stale or missing
documentation. Read-only — flags staleness but does not update docs.
Can be delegated to by the pr-review skill.
---

# Docs Review

Code changes can silently invalidate documentation. A renamed function,
a changed API signature, a removed configuration option — each can leave
docs describing behavior that no longer exists. This skill detects that
drift by matching the PR's code changes against in-repo documentation
and flagging docs whose descriptions contradict the new code.

## Context management

This skill involves scanning documentation files across the repository,
which can consume significant context. Dispatch a subagent to carry out
the process below, keeping the main review context free for code-review
and PR-specific checks. The subagent should return a list of findings
(or an empty list if no stale docs were found).

## Process

Follow these steps in order. Do not skip steps.

### 1. Receive the change context

Use the diff and changed file list provided by the invoking skill
(typically `pr-review`). If invoked standalone, obtain the diff:

```bash
DEFAULT_BRANCH=$(git rev-parse --abbrev-ref origin/HEAD | cut -d/ -f2)
git diff $(git merge-base HEAD "$DEFAULT_BRANCH")..HEAD
```

Record the list of files changed in the PR:

```bash
git diff --name-only $(git merge-base HEAD "$DEFAULT_BRANCH")..HEAD
```

If no change can be identified, stop and report — do not guess.

### 2. Build the identifier checklist

Go through **every** changed file in the PR. For each file, extract
identifiers from the modified lines (lines starting with `+` or `-`)
and from diff hunk headers (`@@` lines). Write them down as a numbered
checklist — one entry per changed file, with all identifiers from that
file.

Use the most specific form of each identifier. CLI flag names,
configuration keys, full function names, and type names are good —
they match only relevant docs. Avoid generic short words that would
match hundreds of unrelated files. If a generic term is the only
identifier available for a change, include it, but prefer specific
forms when they exist.

Do not skip files. Do not prioritize some files over others. Every
changed file gets an entry in the checklist. This checklist is your
contract — you will search docs for every identifier on it.
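The extraction above can be sketched as a small helper. This is an illustrative assumption, not a tokenizer the skill prescribes — the `extract_identifiers` name and the 4-character token pattern are choices for this sketch; tune the regex to the repository's languages (and to flag syntax like `--foo-bar`):

```bash
# Hypothetical helper: pull candidate identifiers from a unified diff on
# stdin. Takes tokens of 4+ identifier characters from added/removed
# lines and @@ hunk headers, deduplicated via sort -u.
extract_identifiers() {
  awk '
    /^[+-][^+-]/ || /^@@/ {
      line = $0
      # Repeatedly match the next identifier-like token on this line.
      while (match(line, /[A-Za-z_][A-Za-z0-9_]{3,}/)) {
        print substr(line, RSTART, RLENGTH)
        line = substr(line, RSTART + RLENGTH)
      }
    }' | sort -u
}

# Usage:
#   git diff "$(git merge-base HEAD "$DEFAULT_BRANCH")"..HEAD | extract_identifiers
```

Note that `/^[+-][^+-]/` deliberately skips the `+++`/`---` file-header lines so file paths do not pollute the checklist.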

### 3. Discover documentation files

Find all documentation files in the repository. This includes files in
dedicated documentation directories and standalone documentation files
like README.md at any level. Explore the repository structure to
understand where documentation lives — different repos organize docs
differently.
Comment on lines +70 to +74

**Contributor:**

So, this might blow the context window of the agent. At the very top of the skill, WDYT about adding an exhortation to encourage the agent to spin up a sub-agent to carry out this entire skill?

**Author:**

@ralphbean - Great suggestion! Added a "Context management" section at the top of the skill encouraging the agent to dispatch a subagent

Search for documentation files (`.md`, `.rst`, `.adoc`) across the
repository.
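One way to sketch the discovery — an assumed helper for illustration, since repos may keep docs anywhere from a `docs/` tree to per-package READMEs:

```bash
# Hypothetical helper: list documentation files under a root directory,
# pruning .git. Extend the -name patterns if the repo uses other
# documentation formats.
find_docs() {
  find "${1:-.}" -name .git -prune -o \
    \( -name '*.md' -o -name '*.rst' -o -name '*.adoc' \) -print
}
```

The `-prune` keeps `find` from descending into `.git`, where packed refs and hooks would otherwise produce noise.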

If the repository has no documentation files, produce zero findings
and exit — there is nothing to check.

### 4. Search docs for every identifier

Write a shell script that takes the identifiers from step 2 and
greps for each one across the documentation files from step 3.
Run the script in a single Bash call:

```bash
for id in "identifier1" "identifier2" "identifier3"; do
matches=$(grep -rl "$id" --include="*.md" --include="*.rst" --include="*.adoc" . 2>/dev/null)
if [ -n "$matches" ]; then
echo "MATCH: $id -> $matches"
else
echo "NO_MATCH: $id"
fi
done
```

Include every identifier from every checklist entry in the `for`
loop. The script handles the searching mechanically — no identifiers
are skipped.

From the script output, collect all matched doc files into a
candidate list. Exclude documentation files that are already modified
in the same PR — those are being actively updated.
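The exclusion can be done mechanically with `comm(1)`. A sketch, assuming both lists use the same path form (e.g. both relative to the repo root) — the `filter_candidates` name and file arguments are hypothetical:

```bash
# Hypothetical sketch: print candidate docs NOT already modified in the
# PR. $1 = file listing candidate doc paths, $2 = file listing the PR's
# changed files. comm -23 keeps lines unique to the first (sorted) input.
filter_candidates() {
  comm -23 <(sort -u "$1") <(sort -u "$2")
}
```

Because `comm` compares sorted lines textually, a path-form mismatch (`./docs/a.md` vs `docs/a.md`) would silently defeat the exclusion — normalize first.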

### 5. Evaluate every candidate (two passes)

**Pass 1 — Quick scan.** For each candidate doc file from step 4,
view only the lines that matched the grep (use `grep -n` to see them
in context). Based on the matching lines alone, decide whether the
doc might be stale. Record a verdict for every candidate:

```
- path/to/doc.md → possibly stale (describes behavior that changed)
- path/to/other.md → not stale (mentions identifier in passing)
- path/to/another.md → not stale (changelog entry)
...
```

Every candidate must have a verdict. Do not skip candidates.

**Pass 2 — Deep read.** For each candidate marked "possibly stale"
in pass 1, read the full file alongside the relevant section of the
diff. Confirm whether the doc is actually stale and write a specific
remediation describing what should be updated.

When evaluating:

- **Only flag docs whose content is now incorrect.** A doc that
mentions an identifier is not stale if the described behavior is
unchanged. It is stale only if the behavior, signature, or semantics
changed in a way that makes the doc misleading.
- **Do not flag changelog entries or release notes that describe past
releases.** Historical entries are not stale because the code evolved.

### 6. Compile findings

For each confirmed stale doc, record a separate finding. Do not
consolidate multiple stale files into one finding — each stale file
gets its own finding with its own file path:

- **Severity:** `high` if the doc describes behavior that is now
*incorrect* and would mislead a reader (e.g., wrong API signature,
removed config key still documented). `medium` if the doc is
*incomplete* rather than incorrect (e.g., new parameter not
documented, but existing content is still accurate). `low` if the
doc is slightly outdated but not misleading. Never use `critical` —
stale docs do not constitute a security or correctness risk in the
code itself. Do not produce `info`-level findings — if the staleness
is not worth flagging at `low`, do not flag it at all.
- **Category:** a descriptive label for the type of staleness, e.g.,
`stale-doc`, `missing-doc`. Use whatever category best describes the
finding.
- **File:** path to the stale doc file.
- **Line:** line number of the stale reference, if identifiable.
- **Description:** what is stale and why — reference the specific code
change that caused the staleness.
- **Remediation:** what should be updated in the doc.
**Contributor:**

[moderate] This schema is missing the actionable field that code-review findings include. When pr-review merges findings from both skills, the mismatch may cause inconsistent output.

Deferred to #944.

If no stale docs are found, produce zero findings. Do not generate
findings to justify having run the skill.
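For concreteness, a single confirmed finding assembled from these fields might read as follows — the file paths, flag name, and layout here are hypothetical; the exact output format is whatever the invoking skill expects:

```
- severity: high
  category: stale-doc
  file: docs/configuration.md
  line: 42
  description: docs/configuration.md documents the `--legacy-mode` flag,
    which this PR removes from the CLI parser.
  remediation: Delete the `--legacy-mode` section, or replace it with
    documentation for the flag that supersedes it.
```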

## Constraints

The agent definition (`agents/review.md`) is the authoritative list of
prohibitions. This skill does not restate them. If a step in this skill
appears to conflict with the agent definition, the agent definition
wins.

- **Do not flag docs modified in the same PR.** If the PR already
updates a doc file, that file is being actively maintained and should
not appear in findings.
- **Severity is capped at high.** Stale documentation does not
constitute a critical security or correctness risk in the code.
A `high` finding means the doc would actively mislead readers.
- **This skill is read-only.** It produces findings but does not
modify documentation files.
44 changes: 30 additions & 14 deletions internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md
@@ -2,15 +2,17 @@
name: pr-review
description: >-
PR-specific review procedure. Gathers GitHub context, delegates code
evaluation to the code-review skill, adds PR-specific checks, and
writes a structured review result.
evaluation to the code-review skill, delegates documentation
staleness checks to the docs-review skill, adds PR-specific checks,
and writes a structured review result.
---

# PR Review

This skill orchestrates a pull request review by gathering GitHub
context, delegating code evaluation to the `code-review` skill, adding
PR-specific checks, and producing a structured result. In pipeline mode
context, delegating code evaluation to the `code-review` skill,
delegating documentation staleness checks to the `docs-review` skill,
adding PR-specific checks, and producing a structured result. In pipeline mode
(`$FULLSEND_OUTPUT_DIR` set), it writes JSON for the post-script to
post. In interactive mode, it posts directly via `gh pr review`. It
does not evaluate code directly — that is the `code-review` skill's
@@ -84,10 +86,23 @@ Pass the diff obtained in step 2 and use the PR metadata and linked
issues as additional context for the intent-alignment dimension.

The `code-review` skill produces findings and an outcome. Carry those
forward to steps 4 and 5. Proceed to step 4 regardless of outcome —
it covers PR-specific inputs not examined by code-review.
forward to steps 4, 5, and 6. Proceed to step 4 regardless of outcome.

### 4. PR-specific checks
### 4. Check documentation currency

Invoke the `docs-review` skill to evaluate whether the code changes
in this PR have made any in-repo documentation stale. The docs-review
skill has its own multi-step process (build identifier checklist,
grep for every identifier, two-pass evaluation). Follow that process
completely — do not substitute ad-hoc grep searches.

Merge the docs-review findings into the findings list from step 3.
Documentation staleness findings are capped at `high` severity (never
`critical`), so they contribute to the outcome but do not dominate it.

Proceed to step 5 regardless of outcome.

### 5. PR-specific checks

These checks apply only in the PR context and augment the findings from
step 3.
@@ -114,10 +129,10 @@ Verify the change scope matches the linked issue's authorization. A PR
labeled "bug fix" that adds new capability is a feature, regardless of
the label. Add a finding if the scope exceeds authorization.

Merge any new findings into the findings list from step 3 and
re-evaluate the overall outcome.
Merge any new findings into the findings list from steps 3 and 4,
and re-evaluate the overall outcome.

### 5. Produce the review result
### 6. Produce the review result

Compose the review comment using this structure:

@@ -254,13 +269,14 @@ wins.
- **Never approve with unresolved critical or high findings.** If any
critical or high finding exists, the outcome must be
`request-changes`.
- **Never post without completing the `code-review` skill first.**
Partial reviews miss context and produce unreliable verdicts.
- **Never post without completing the `code-review` and `docs-review`
skills first.** Partial reviews miss context and produce unreliable
verdicts.
- **Always include the PR head SHA in the review comment.** The review
is only valid for the SHA evaluated; new pushes require a new review.
- **Report failure rather than posting a partial review.** If you cannot
complete all six dimensions (tool failure, missing context, ambiguous
findings), produce a failure result (see step 5) rather than posting
complete all seven dimensions (tool failure, missing context, ambiguous
findings), produce a failure result (see step 6) rather than posting
**Contributor:**

[important] "Seven dimensions" conflates pr-review orchestration steps with code-review's dimension framework. code-review owns dimensions 1–6; docs-review is a separate skill invocation, not a 7th dimension within code-review. Works today, but couples the two skills' internals.

Deferred to #943.
an incomplete result.
- **In pipeline mode, `gh pr review` is reserved for the post-script.**
The sandbox token is read-only. Write JSON to