Skip to content

feat(autoreview): add Cursor review engine#21

Open
coygeek wants to merge 3 commits into
openclaw:mainfrom
coygeek:feat/autoreview-cursor-cli
Open

feat(autoreview): add Cursor review engine#21
coygeek wants to merge 3 commits into
openclaw:mainfrom
coygeek:feat/autoreview-cursor-cli

Conversation

@coygeek

@coygeek coygeek commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Summary

  • add Cursor as an autoreview engine with cursor-agent ask-mode execution
  • fail closed on workspace rules and local MCP config unless explicitly trusted
  • add Cursor parsing tests and wire them into CI validation

Validation

  • scripts/validate-skills
  • ruby -c scripts/install-skills
  • ruby -c scripts/validate-skills
  • ruby - <<'RUBY' ... load \"scripts/install-skills.test.rb\" ... RUBY (local Ruby 2.6 compatibility shim for assert_path_exists; CI uses Ruby 3.3)
  • bash -n skills/autoreview/scripts/test-review-harness
  • python3 -m py_compile skills/autoreview/scripts/autoreview skills/autoreview/scripts/test-review-harness.py skills/autoreview/scripts/autoreview_test.py
  • python3 -m unittest skills/autoreview/scripts/autoreview_test.py
  • node --check skills/agent-transcript/scripts/agent-transcript
  • node --test skills/agent-transcript/scripts/agent-transcript.test.mjs skills/session-viewer/scripts/session-viewer.test.ts
  • python3 skills/autoreview/scripts/autoreview --mode branch --base origin/main --engine cursor --model auto --cursor-allow-workspace-instructions --stream-engine-output

I will add a follow-up PR comment with live proof logs from the local Cursor runs.

coygeek and others added 3 commits June 10, 2026 14:31
Add a Cursor-backed autoreview path with fail-closed workspace and MCP gates, stream/json parsing, and deterministic parser tests. This lets trusted repos use Cursor for closeout review while preserving structured validation and live proof workflows.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Limit the new Cursor extraction logic to explicit report candidates, keep assistant-message fallback behind terminal-result precedence, and keep reviewer-all opt-in behavior compatible with existing panels.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Prefer terminal stream payloads over assistant drafts, raise on malformed terminal results so Cursor retries can fire, and cover the fallback behavior with a regression test.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@coygeek

coygeek commented Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

Live proof from the final local Cursor-backed implementation.

Validators

Command set:

  • scripts/validate-skills
  • ruby -c scripts/install-skills
  • ruby -c scripts/validate-skills
  • ruby - <<'RUBY' ... load "scripts/install-skills.test.rb" ... RUBY (local Ruby 2.6 compatibility shim for assert_path_exists; CI uses Ruby 3.3)
  • bash -n skills/autoreview/scripts/test-review-harness
  • python3 -m py_compile skills/autoreview/scripts/autoreview skills/autoreview/scripts/test-review-harness.py skills/autoreview/scripts/autoreview_test.py
  • python3 -m unittest skills/autoreview/scripts/autoreview_test.py
  • node --check skills/agent-transcript/scripts/agent-transcript
  • node --test skills/agent-transcript/scripts/agent-transcript.test.mjs skills/session-viewer/scripts/session-viewer.test.ts
validated 5 skills
Syntax OK
Syntax OK
2 runs, 12 assertions, 0 failures, 0 errors, 0 skips
Ran 9 tests in 0.001s
OK
ℹ tests 30
ℹ pass 30
ℹ fail 0

Current branch reviewed with Cursor

Command:

  • python3 skills/autoreview/scripts/autoreview --mode branch --base origin/main --engine cursor --model auto --cursor-allow-workspace-instructions --stream-engine-output
autoreview target: branch
branch: feat/autoreview-cursor-cli
engine: cursor
model: auto
ref: origin/main
cursor metadata: session_id=a1d8084d-e9a9-478e-97b0-835c816cb052 request_id=f892992e-2c2e-4774-8c97-f6576159ad17
autoreview clean: no accepted/actionable findings reported
overall: patch is correct (0.86)

Malicious fixture, Cursor finds real issues

Command:

  • python3 "$AUTOREVIEW" --mode local --engine cursor --model auto --prompt "...real security bugs..."
autoreview findings: 4
[P0] Shell command injection in deleteUpload via unsanitized name
[P0] Path traversal in uploadPath after removing slash sanitization
[P0] Arbitrary directory deletion via deleteUpload path traversal
[P0] Password exposed in publicUser API response
overall: patch is incorrect (1)
cursor metadata: session_id=c7e2d4e5-20e8-4064-b665-0d62a0aeedd5 request_id=b6dcaf9b-3b7f-47ad-a477-1fae229313b0

Benign fixture, Cursor stays clean

Command:

  • python3 "$AUTOREVIEW" --mode local --engine cursor --model auto --cursor-allow-workspace-instructions --stream-engine-output --prompt "...intentionally safe..."
cursor isolation warning: workspace .cursor/rules will be loaded; workspace AGENTS.md will be loaded.
autoreview clean: no accepted/actionable findings reported
overall: patch is correct (0.92)
cursor metadata: session_id=76b8bd65-70a8-4751-9a64-fd7a4ab69037 request_id=42f91f86-b020-42b7-bb12-eb08fc32e730

@clawsweeper clawsweeper Bot added proof: sufficient Contributor real behavior proof is sufficient. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. merge-risk: 🚨 security-boundary 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data. labels Jun 18, 2026
@clawsweeper

clawsweeper Bot commented Jun 18, 2026

Copy link
Copy Markdown

Codex review: found issues before merge. Reviewed June 18, 2026, 9:55 AM ET / 13:55 UTC.

Summary
The PR adds Cursor as an optional autoreview engine with cursor-agent execution, trust gates for workspace instructions/MCP config, parsing tests, harness wiring, and CI validation updates.

Reproducibility: not applicable. This PR adds a new optional review engine rather than fixing a reproducible bug. The contributor did provide live terminal output showing Cursor-backed branch review and fixture behavior after the implementation.

Review metrics: 2 noteworthy metrics.

  • PR size: 6 files changed, 537 additions, 26 deletions. The feature is bounded but touches review execution, docs, tests, harnesses, and CI.
  • Current-main drift: 15 commits after PR base. The branch predates multiple autoreview engine and isolation changes, which makes conflict resolution behaviorally important.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🐚 platinum hermit
Patch quality: 🦪 silver shellfish
Result: blocked by patch quality or review findings.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Rebase onto current main while preserving Pi/OpenCode support and newer hardening.
  • Run Cursor through the current sanitized engine environment and refresh validation after that rebase.

Risk before merge

  • [P1] The PR cannot merge cleanly against current main and must be rebased without dropping newer Pi/OpenCode support, model-default behavior, heartbeat metrics, or bundle/input hardening.
  • [P1] The Cursor path introduces another trusted local agent execution boundary; the current PR head runs cursor-agent without the newer safe_engine_env pattern used by current main's external engines.
  • [P1] Maintainers still need to decide whether Cursor's workspace instruction and MCP trust model belongs in this shared public skill.

Maintainer options:

  1. Rebase and harden Cursor execution (recommended)
    Resolve the conflicts by layering Cursor onto current main and run cursor-agent with the sanitized engine environment used by the other external reviewers.
  2. Accept the Cursor trust model
    Maintainers can intentionally accept Cursor as an optional trusted review engine after confirming the workspace-instruction and MCP approval gates match repository policy.
  3. Pause until engine policy is settled
    If Cursor's sandbox/rule-loading behavior is not an accepted shared-skill boundary yet, pause this PR rather than landing another engine path.

Next step before merge

  • [P2] This needs maintainer review for the Cursor trust model and a conflict-aware rebase; the concrete env-isolation defect should be fixed before merge.

Security
Needs attention: The diff adds a new external agent execution path, and the PR head does not yet use the sanitized engine environment pattern now present on current main.

Review findings

  • [P1] Run Cursor with the sanitized engine environment — skills/autoreview/scripts/autoreview:732-739
Review details

Best possible solution:

Rebase Cursor support onto current main, run cursor-agent through the same sanitized engine environment as the other external reviewers, preserve the explicit fail-closed trust gates, and then let maintainers decide whether to accept Cursor as an optional engine.

Do we have a high-confidence way to reproduce the issue?

Not applicable; this PR adds a new optional review engine rather than fixing a reproducible bug. The contributor did provide live terminal output showing Cursor-backed branch review and fixture behavior after the implementation.

Is this the best way to solve the issue?

No, not as submitted. The direction is plausible, but the branch must be rebased onto current main and the Cursor execution path should adopt the current sanitized engine environment before maintainers decide on the feature.

Full review comments:

  • [P1] Run Cursor with the sanitized engine environment — skills/autoreview/scripts/autoreview:732-739
    Current main now runs external reviewer CLIs through safe_engine_env(...) so repo-local PATH/config/env state cannot influence the reviewer process. The new Cursor path invokes cursor-agent from the reviewed repo with the ambient environment, so a reviewed workspace can still affect this trusted execution path outside the explicit workspace-instruction/MCP gates. Rebase onto the current helper and pass the same sanitized engine env for Cursor before merging.
    Confidence: 0.82

Overall correctness: patch is incorrect
Overall confidence: 0.84

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against 844dfb22a969.

Label changes

Label changes:

  • add P2: This is a bounded shared-skill feature with useful proof but no urgent regression or release-blocking impact.
  • add merge-risk: 🚨 compatibility: A careless conflict resolution could remove or regress newer autoreview engines and helper contracts added on current main after the PR base.
  • add merge-risk: 🚨 security-boundary: The PR adds a new local agent execution path whose environment, workspace-instruction, and MCP trust boundaries need maintainer-visible review.
  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR comment includes after-fix validator output, a live Cursor-backed branch review, and malicious/benign fixture output with Cursor session/request metadata.
  • add rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🐚 platinum hermit and patch quality is 🦪 silver shellfish.
  • add status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Sufficient (live_output): The PR comment includes after-fix validator output, a live Cursor-backed branch review, and malicious/benign fixture output with Cursor session/request metadata.

Label justifications:

  • P2: This is a bounded shared-skill feature with useful proof but no urgent regression or release-blocking impact.
  • merge-risk: 🚨 compatibility: A careless conflict resolution could remove or regress newer autoreview engines and helper contracts added on current main after the PR base.
  • merge-risk: 🚨 security-boundary: The PR adds a new local agent execution path whose environment, workspace-instruction, and MCP trust boundaries need maintainer-visible review.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🐚 platinum hermit and patch quality is 🦪 silver shellfish.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Sufficient (live_output): The PR comment includes after-fix validator output, a live Cursor-backed branch review, and malicious/benign fixture output with Cursor session/request metadata.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR comment includes after-fix validator output, a live Cursor-backed branch review, and malicious/benign fixture output with Cursor session/request metadata.
Evidence reviewed

Security concerns:

  • [medium] Cursor inherits ambient engine environment — skills/autoreview/scripts/autoreview:732
    run_cursor starts cursor-agent without the current-main safe_engine_env(...) wrapper used by other external review engines, leaving environment and command-resolution boundary hardening incomplete for a trusted reviewer path.
    Confidence: 0.82

What I checked:

  • Repository policy read: AGENTS.md was read fully and applies because this PR changes the shared autoreview skill and helper scripts; it requires canonical skills under skills//SKILL.md and validation with scripts/validate-skills. (AGENTS.md:1, 844dfb22a969)
  • Current main lacks Cursor: Current main's engine list includes codex, claude, droid, copilot, pi, and opencode, but not cursor. (skills/autoreview/scripts/autoreview:22, 844dfb22a969)
  • PR adds Cursor execution path: The PR head adds run_cursor with fail-closed workspace instruction and MCP checks, then invokes cursor-agent through run_with_heartbeat. (skills/autoreview/scripts/autoreview:687, 9e49f219c908)
  • Current main hardened engine env: Current main runs other external review engines with safe_engine_env and newer bundle/input hardening that the Cursor branch predates. (skills/autoreview/scripts/autoreview:1325, 844dfb22a969)
  • PR is conflicting: GitHub reports the PR as CONFLICTING; its merge base is 3446a70, and current main has 15 commits after that base. (9e49f219c908)
  • Contributor proof: The PR comment includes validator output, a live Cursor-backed branch review, and malicious/benign fixture output with Cursor session/request metadata. (9e49f219c908)

Likely related people:

  • coygeek: Authored merged autoreview isolation and OpenCode engine work in the same helper area, and the PR's branch also comes from this contributor. (role: recent area contributor; confidence: high; commits: f94cbd7db456, 720c704bfe92; files: skills/autoreview/scripts/autoreview, skills/autoreview/SKILL.md)
  • steipete: Blame and history show the shared skill and autoreview helper originating from Peter Steinberger with many subsequent updates and conflict-resolution work in adjacent PRs. (role: original and frequent area contributor; confidence: high; commits: f793ead2f692, 834ce50c9ad3, 78daf7e0d271; files: skills/autoreview/scripts/autoreview, skills/autoreview/SKILL.md)
  • Dizesales: Recent current-main hardening changed the review bundle input and safety path that a rebased Cursor engine needs to preserve. (role: recent hardening contributor; confidence: medium; commits: 51f2918af908; files: skills/autoreview/scripts/autoreview)
  • Vincent Koc: Most recent current-main autoreview scope-governor changes touch the same helper and docs that the PR must rebase across. (role: recent adjacent contributor; confidence: medium; commits: 844dfb22a969; files: skills/autoreview/scripts/autoreview, skills/autoreview/SKILL.md)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. merge-risk: 🚨 security-boundary 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data. P2 Normal priority bug or improvement with limited blast radius. proof: sufficient Contributor real behavior proof is sufficient. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant