Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion .github/workflows/validate.yml
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,9 @@ jobs:
- name: Check shell scripts
run: bash -n skills/autoreview/scripts/test-review-harness
- name: Check Python scripts
run: python3 -m py_compile skills/autoreview/scripts/autoreview skills/autoreview/scripts/test-review-harness.py
run: python3 -m py_compile skills/autoreview/scripts/autoreview skills/autoreview/scripts/test-review-harness.py skills/autoreview/scripts/autoreview_test.py
- name: Run Python tests
run: python3 -m unittest skills/autoreview/scripts/autoreview_test.py
- name: Check Node scripts
run: node --check skills/agent-transcript/scripts/agent-transcript
- name: Run Node tests
Expand Down
54 changes: 46 additions & 8 deletions skills/autoreview/SKILL.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
name: autoreview
description: "Run a structured code review (Codex default, Claude optional) as a closeout check on a local or PR branch before commit or ship."
description: "Run a structured code review (Codex default, optional Claude, Droid, Copilot, or Cursor) as a closeout check on a local or PR branch before commit or ship."
---

# Auto Review
Expand All @@ -11,7 +11,7 @@ Codex review is the default when no engine is set. It usually delivers the best

Use when:

- user asks for Codex review / Claude review / autoreview / second-model review
- user asks for Codex review / Claude review / Cursor review / autoreview / second-model review
- after non-trivial code edits, before final/commit/ship
- reviewing a local branch or PR branch after fixes

Expand All @@ -29,7 +29,7 @@ Use when:
- For security-audit suppression changes, verify accepted findings remain auditable: suppressed findings stay in structured output, active output keeps an unsuppressible suppression notice, and aggregate findings cannot hide unrelated active risk.
- Never switch or override the requested review engine/model. If the review hits model capacity, retry the same command a few times with the same engine/model.
- Be patient with large bundles. Structured review can take up to 30 minutes while the model call is active, especially with Codex tools or web search.
- Treat heartbeat lines like `review still running: ... elapsed=... pid=...` as healthy progress, not a hang. Let the helper continue while heartbeats are advancing. Pass `--stream-engine-output` when live engine text is useful; Codex and Claude filter tool/file chatter, other engines pass raw output through.
- Treat heartbeat lines like `review still running: ... elapsed=... pid=...` as healthy progress, not a hang. Let the helper continue while heartbeats are advancing. Pass `--stream-engine-output` when live engine text is useful; Codex, Claude, and Cursor filter noisy tool/file chatter, other engines pass raw output through.
- Do not kill a review just because it has been quiet for 2-5 minutes, or because it is still running under the 30-minute window. Inspect the process only after missing multiple expected heartbeats, after 30 minutes, or after an obviously failed subprocess; prefer letting the same helper command finish.
- Tools are useful in review mode. The helper allows read-only inspection tools and web search by default so reviewers can check dependency contracts, upstream docs, and current behavior.
- Security perspective is always included, but it should not cripple legitimate functionality. Report security findings only when the change creates a concrete, actionable risk or removes an important safety check.
Expand Down Expand Up @@ -144,6 +144,10 @@ Run multiple reviewers against one frozen bundle:
"$AUTOREVIEW" --panel
```

`--reviewers all` intentionally excludes Cursor. Cursor review can require
additional trust flags when workspace instructions or local MCP config are
present, so opt into Cursor explicitly.

Set reviewer models and thinking/effort explicitly:

```bash
Expand All @@ -158,7 +162,41 @@ Inline syntax is also supported:

Codex maps thinking to `model_reasoning_effort` and accepts `low`, `medium`,
`high`, or `xhigh`. Claude maps thinking to `--effort` and also accepts `max`.
Engines without a real thinking knob reject `--thinking`.
Cursor supports `--model` but rejects `--thinking`. Engines without a real
thinking knob reject `--thinking`.

## Cursor review

Run Cursor explicitly:

```bash
"$AUTOREVIEW" --engine cursor --model auto
```

Cursor review uses `cursor-agent` in `--mode ask` with `--workspace <repo>`,
`--trust`, and `--sandbox enabled` by default. Authenticate with
`cursor-agent login` or `CURSOR_API_KEY`.

Optional Cursor-specific controls:

- `CURSOR_BIN` or `--cursor-bin` to select the CLI binary
- `AUTOREVIEW_CURSOR_SANDBOX=enabled|disabled` or `--cursor-sandbox ...`
- `AUTOREVIEW_CURSOR_APPROVE_MCPS=1`, `--cursor-approve-mcps`, or `--no-cursor-approve-mcps` for trusted MCP-enabled environments
- `AUTOREVIEW_CURSOR_ALLOW_WORKSPACE_INSTRUCTIONS=1`, `--cursor-allow-workspace-instructions`, or `--no-cursor-allow-workspace-instructions` when the repo contains Cursor rules, `AGENTS.md`, `CLAUDE.md`, or local MCP config

Cursor caveats:

- `--thinking`, `--no-tools`, and `--no-web-search` are not supported
- Cursor CLI currently documents no `--ignore-rules` / safe-mode equivalent for
review runs
- Cursor CLI also does not document a schema flag like Codex/Claude, so the
helper validates Cursor JSON locally and retries malformed structured output a
few times before failing
- by default the helper fails closed when workspace `.cursor/rules`,
`AGENTS.md`, `CLAUDE.md`, or local MCP config would be loaded; opt in with
`--cursor-allow-workspace-instructions` only for trusted repos
- local MCP config also fails closed unless `--cursor-approve-mcps` is set
- keep MCP auto-approval deliberate, even in trusted repos

## Context Efficiency

Expand Down Expand Up @@ -196,15 +234,15 @@ The helper:
- accepts `--mode uncommitted` as an alias for `--mode local`
- otherwise uses current PR base if `gh pr view` works
- otherwise uses `origin/main` for non-main branches
- supports `--engine codex`, `claude`, `droid`, and `copilot`; default is `AUTOREVIEW_ENGINE` or `codex`; Codex should remain the default when nothing is set
- supports `--engine codex`, `claude`, `droid`, `copilot`, and `cursor`; default is `AUTOREVIEW_ENGINE` or `codex`; Codex should remain the default when nothing is set
- resolves bare `git`, `gh`, reviewer, and PowerShell shell commands from absolute `PATH` entries only, never from the reviewed checkout; explicit relative `--*-bin` paths are resolved from the reviewed repository root
- use `--mode commit --commit <ref>` for already-committed work, especially clean `main` after landing
- should be left in `--mode auto` or forced to `--mode branch` for PR/branch work; do not force `--mode local` after committing
- writes only to stdout unless `--output`, `--json-output`, or live streamed engine stderr is set
- supports `--dry-run`, `--parallel-tests`, `--parallel-tests-shell`, `--prompt`, `--prompt-file`, `--dataset`, `--no-tools`, `--no-web-search`, and commit refs
- supports `--stream-engine-output` or `AUTOREVIEW_STREAM_ENGINE_OUTPUT=1` for live engine text while preserving structured validation; Codex and Claude hide tool/file event details, emit compact activity summaries, and report usage at turn completion
- supports opt-in review panels with `--panel` / `--reviewers`, plus per-engine `--model` and `--thinking`
- allows read-only tools and web search by default where the selected CLI supports them; forbids nested review in the prompt; Codex is run through `codex exec` with read-only sandbox and structured output
- supports `--stream-engine-output` or `AUTOREVIEW_STREAM_ENGINE_OUTPUT=1` for live engine text while preserving structured validation; Codex, Claude, and Cursor hide noisy tool/status events, emit compact activity summaries, and report usage at turn completion
- supports opt-in review panels with `--panel` / `--reviewers`, plus per-engine `--model` and `--thinking` where the selected engine supports it
- allows read-only tools and web search by default where the selected CLI supports them; forbids nested review in the prompt; Codex is run through `codex exec` with read-only sandbox and structured output, and Cursor is run through `cursor-agent --mode ask --workspace <repo> --sandbox ...` with session/request metadata echoed for traceability and a fail-closed workspace-instruction gate
- prints `review still running: <engine> elapsed=<seconds>s pid=<pid>` to stderr at long-running intervals while waiting for the selected review engine, unless streamed output or compact Codex activity has been visible recently
- prints `autoreview clean: no accepted/actionable findings reported` when the selected review command exits 0
- exits nonzero when accepted/actionable findings are present
Expand Down
Loading