diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index 6ce73d00..f5874d36 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -24,7 +24,7 @@ { "name": "bitwarden-code-review", "source": "./plugins/bitwarden-code-review", - "version": "1.11.0", + "version": "1.12.0", "description": "Comprehensive code review system with organization-wide standards." }, { diff --git a/README.md b/README.md index bfc8c8f6..71971007 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@ A curated collection of plugins for AI-assisted development at Bitwarden. Enable | [bitwarden-tech-lead](plugins/bitwarden-tech-lead/) | 2.3.1 | Tech lead for technical planning, architecture coherence, and surfacing patterns to Technical Strategy Ideas | | [bitwarden-shepherd](plugins/bitwarden-shepherd/) | 1.0.0 | Champion of a technical strategy — shepherds a TSI through evaluation into the funnel, then through to adoption | | [bitwarden-atlassian-tools](plugins/bitwarden-atlassian-tools/) | 2.2.7 | Read-only Atlassian access via MCP server with deep Jira issue research skill | -| [bitwarden-code-review](plugins/bitwarden-code-review/) | 1.11.0 | Autonomous code review agent following Bitwarden engineering standards with GitHub integration | +| [bitwarden-code-review](plugins/bitwarden-code-review/) | 1.12.0 | Autonomous code review agent following Bitwarden engineering standards with GitHub integration | | [bitwarden-delivery-tools](plugins/bitwarden-delivery-tools/) | 1.5.0 | Delivery lifecycle skills: initiative funnel navigation, work transitions, tech breakdowns and cross-team signoffs, commits, PRs, preflight, labeling | | [bitwarden-designer](plugins/bitwarden-designer/) | 0.1.0 | Product designer persona: Code of Conduct and 30/60/90 critique, critique facilitation; dispatches into bitwarden-design-tools | | [bitwarden-design-tools](plugins/bitwarden-design-tools/) | 0.1.0 | Design toolkit: content style guide, Figma Dev Mode MCP, Bitwarden brand application, handoff prep, Design System governance, Product and Design Jira | diff --git a/plugins/bitwarden-code-review/.claude-plugin/plugin.json b/plugins/bitwarden-code-review/.claude-plugin/plugin.json index 51349959..212a5b57 100644 --- a/plugins/bitwarden-code-review/.claude-plugin/plugin.json +++ b/plugins/bitwarden-code-review/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "bitwarden-code-review", - "version": "1.11.0", + "version": "1.12.0", "description": "Comprehensive code review system with organization-wide standards.", "author": { "name": "Bitwarden", diff --git a/plugins/bitwarden-code-review/CHANGELOG.md b/plugins/bitwarden-code-review/CHANGELOG.md index 0e9c590c..dbd84627 100644 --- a/plugins/bitwarden-code-review/CHANGELOG.md +++ b/plugins/bitwarden-code-review/CHANGELOG.md @@ -5,6 +5,17 @@ All notable changes to the Bitwarden Code Review Plugin will be documented in th The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [1.12.0] - 2026-06-18 + +### Changed + +- `performing-multi-agent-code-review`: per-stage model flags (`--model-analysis`, `--model-security`, `--model-validation`, `--model-audit`) with a security floor rule; subagents inherit the session model instead of forcing Opus; severity audit defaults to Sonnet. +- `performing-multi-agent-code-review`: updated the architecture subagent to be a general agent type with a stronger prompt. Reduce complexity by not requiring engineers to install plugins they don't need. Also found minimal to zero actual benefit to using the tech-lead agent. +- `performing-multi-agent-code-review`: `{model}` in file names and report headers is the resolved model nickname; `-mixed` suffix when stage flags differ +- `performing-multi-agent-code-review`: enhanced the reference documents to provide example shapes of the DTOs that pass data between the subagents and the orchestration agent. +- `performing-multi-agent-code-review`: removed duplicate and unnecessarily verbose instructions +- Improved the README.md to better describe the purpose and usage of the multi-agent review. + ## [1.11.0] - 2026-05-12 ### Added diff --git a/plugins/bitwarden-code-review/README.md b/plugins/bitwarden-code-review/README.md index 64b77d20..3a8d61b2 100644 --- a/plugins/bitwarden-code-review/README.md +++ b/plugins/bitwarden-code-review/README.md @@ -1,11 +1,13 @@ # Bitwarden Code Review Plugin -Comprehensive AI-powered code review agent following Bitwarden engineering standards. +AI-powered code review for Bitwarden — an autonomous review agent for everyday PRs, plus a rigorous multi-agent pipeline for the changes that warrant a deeper look. ## Overview This plugin provides an autonomous code review agent that conducts thorough, professional code reviews following Bitwarden's organizational standards. The agent focuses on security, correctness, and high-value feedback while maintaining a high signal-to-noise ratio. +It offers two complementary lenses. The autonomous `bitwarden-code-reviewer` agent reviews a pull request the way a human reviewer would and posts inline comments to GitHub. The `performing-multi-agent-code-review` skill takes a different approach — it has Claude evaluate code _as Claude_ across a pipeline of specialized sub-agents, trading human-style commentary for depth and high-signal findings written to a local report. + ## Features - **Autonomous Review Agent**: Single agent handles all code review tasks without manual invocation @@ -14,6 +16,7 @@ This plugin provides an autonomous code review agent that conducts thorough, pro - **Security-First Approach**: Prioritizes security vulnerabilities, data exposure, and authentication issues - **Structured Thinking**: Uses explicit reasoning blocks to improve review quality and consistency - **Confidence Scoring**: Pre-filters findings with a 0-100 confidence score (≥75 threshold) before validation to reduce false positives +- **Multi-Agent Review Pipeline**: A separate `performing-multi-agent-code-review` skill runs six specialized sub-agents — architecture compliance, code quality, bug analysis, security & logic, validation, and severity audit — for depth on complex changes ## Skills @@ -39,7 +42,7 @@ See [`classifying-review-findings`](./skills/classifying-review-findings/SKILL.m ### Directory Structure -``` +```bash bitwarden-code-review/ ├── .claude/ │ └── settings.json # Security boundaries @@ -84,6 +87,50 @@ The agent is automatically invoked by Claude when: Use the bitwarden-code-reviewer agent to review this PR ``` +### Multi-agent code review skill + +The `performing-multi-agent-code-review` skill is built for complex changes where one reviewer — human or AI — can't hold every concern in view at once. Instead of a single pass, it puts a team of specialized agents on the same diff, each reviewing from its own perspective — architecture and pattern compliance, code quality, bugs, and security & logic — then validates and severity-audits everything they surface. It started from [Anthropic's `code-review` command](https://github.com/anthropics/claude-code/blob/main/plugins/code-review/commands/code-review.md) and was rebuilt for the control that command lacks: it wires in our own `bitwarden-security-engineer` plugin, runs on any model you choose, reviews **draft or published** PRs — plus local changes, branch comparisons, and commit ranges — and writes its report to a local file instead of posting to GitHub. + +It's a deliberately different lens from the `bitwarden-code-reviewer` agent. That agent reviews the way a human reviewer would; this skill has **Claude evaluate code as Claude** — optimized for signal, not parity. It spotlights blockers, real bugs, and known-bad patterns, and stays out of the nit-picking lane. Each potential finding is scored 0–100, and only those clearing an 80-confidence bar are raised; every one it keeps is cited with file, line, and the agent that caught it. What clears the bar still gets challenged — the validation and severity-audit agents can overturn a finding — but an overturned finding is never dropped: it moves into a collapsed **Reviewed and Dismissed** section, tagged with its original severity, original confidence, and the reason it was set aside. That section is deliberate — a human sees everything the first pass surfaced and can judge when a dismissal was wrong and the original call was right; without it, that signal is lost. + +Agents start cold — a sub-agent inherits none of the main session's context — so the skill briefs every one of them explicitly. Each receives Bitwarden's zero-knowledge invariant and the P01–P06 threat-model directive verbatim. Feature context — the PR's intent, the ticket, the product framing — is handed out deliberately: the architecture and security agents get the full "why" so they can reason adversarially from intent, while the quality and bug agents see only the diff, so their first read stays unbiased. + +The `performing-multi-agent-code-review` skill slots cleanly into a [Claude Code agent-teams](https://code.claude.com/docs/en/agent-teams) loop — run it at the end of a coding session, address what it finds, refactor — or as a depth-of-review gate on a draft PR before you publish. + +#### Examples + +Invoke the skill explicitly with the slash command, or with natural language — it triggers on phrasing like "thorough", "deep", "multi-pass", or "multi-agent" review, and on commit-range framing like "review the last week of commits". With no model flags, the review runs on your session's model, with the severity audit defaulting to sonnet. + +**1. Local changes, all defaults.** The simplest run — review your uncommitted work before you commit: + +> Perform a thorough multi-agent code review of my local changes. + +**2. A specific pull request.** Pass a PR number or URL; it reviews draft and published PRs alike: + +```markdown +/bitwarden-code-review:performing-multi-agent-code-review https://github.com/bitwarden/ios/pull/1234567 --model opus +``` + +**3. Changes over a period of time.** Commit-range mode reviews the cumulative diff across a time window, commit count, or explicit ref pair. Run it from inside the target repo; it confirms the resolved range with you before spending tokens: + +```markdown +Review the last week of commits using /bitwarden-code-review:performing-multi-agent-code-review --model opus +``` + +**4. Full control, per stage.** Tune each stage independently — opus for analysis and validation, opus for security, sonnet for the audit — and send the report to a specific directory: + +```bash +/bitwarden-code-review:performing-multi-agent-code-review \ + https://github.com/bitwarden/gh-actions/pull/604 \ + --model-analysis opus \ + --model-security opus \ + --model-validation opus \ + --model-audit sonnet \ + --output-dir ./reviews +``` + +A **security floor** keeps `--model-security` from ever dropping below your global model, so threat-model evaluation never silently degrades. Omit `--output-dir` and the report lands in `${CLAUDE_PLUGIN_DATA}/code-reviews/`, organized across projects and never git-tracked. + ### In GitHub Actions See the production implementation: [bitwarden/gh-actions `_review-code.yml`](https://github.com/bitwarden/gh-actions/blob/main/.github/workflows/_review-code.yml) diff --git a/plugins/bitwarden-code-review/agents/bitwarden-code-reviewer/AGENT.md b/plugins/bitwarden-code-review/agents/bitwarden-code-reviewer/AGENT.md index db1c2c13..6afff3b0 100644 --- a/plugins/bitwarden-code-review/agents/bitwarden-code-reviewer/AGENT.md +++ b/plugins/bitwarden-code-review/agents/bitwarden-code-reviewer/AGENT.md @@ -1,6 +1,6 @@ --- name: bitwarden-code-reviewer -version: 1.10.1 +version: 1.12.0 description: Conducts thorough code reviews following Bitwarden standards. Finds all issues first pass, avoids false positives, respects codebase conventions. Invoke when user mentions "code review", "review code", "review", "PR", or "pull request". model: opus skills: avoiding-false-positives, classifying-review-findings, posting-bitwarden-review-comments, posting-review-summary, reviewing-dependency-changes diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md index 8412ccd4..a1e04173 100644 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md @@ -1,8 +1,8 @@ --- name: performing-multi-agent-code-review -description: Perform a rigorous, multi-agent code review with architecture-compliance, parallel quality/security analysis, finding validation, and severity audit. Use whenever the user asks for a structured, deep, thorough, multi-pass, or multi-agent code review — or a review that includes architecture/pattern compliance, confidence-scored findings, or a severity audit — even if they don't say the exact phrase "multi-agent". Prefer this over a single-agent review when the user wants high-signal findings with validation. Also use whenever the user asks for a code review across a commit range, time window, or N most recent commits in a locally checked-out repo (e.g. "review the last week of commits in bitwarden/server", "review the last 20 commits", "review changes since 2026-04-23") — these route to the commit-range mode below. +description: Perform a rigorous, multi-agent code review with architecture-compliance, parallel quality/security analysis, finding validation, and severity audit. Use when the user asks for a structured, deep, thorough, multi-pass, or multi-agent code review — or a review that includes architecture/pattern compliance, confidence-scored findings, or a severity audit. Use when the user asks for a code review across a commit range, time window, or N most recent commits in a locally checked-out repo. allowed-tools: "Bash(gh pr diff:*), Bash(gh pr view:*), Bash(git diff:*), Bash(git status:*), Bash(git rev-parse:*), Bash(git log:*), Read, Write, Grep, Glob, Skill, AskUserQuestion" -argument-hint: "[pr-number | commit-range] [--model ] [--output-dir ]" +argument-hint: "[pr-number | commit-range] [--model ] [--model-analysis ] [--model-security ] [--model-validation ] [--model-audit ] [--output-dir ]" --- # Overview @@ -11,42 +11,53 @@ Execute a structured, multi-agent code review on a set of code changes. Follow t ## Prerequisites -This skill depends on the following sibling plugins. If any are not installed, **abort the review with a clear error message** identifying the missing plugin — do not attempt to proceed with a degraded pipeline. +This skill depends on the following sibling plugins. -- **`bitwarden-tech-lead`** — provides the architecture review subagent. -- **`bitwarden-security-engineer`** — provides security context and analysis skills. +- **`bitwarden-security-engineer`** -Before Step 1, verify each prerequisite is resolvable. If a prerequisite is missing, print: +Before Step 1, verify each prerequisite plugin is installed. The signal is resolvability — a required subagent type or skill that does not appear in your available tooling means the plugin is missing. If any is missing, **abort with the message below** — do not proceed with a degraded pipeline. > Prerequisite plugin `` is not installed. Install it and retry. Review aborted. -…and stop. +## Output Location -## Mode +If `--output-dir ` is present in `$ARGUMENTS`, resolve immediately upon invocation and use that path verbatim. Otherwise, default to `${CLAUDE_PLUGIN_DATA}/code-reviews/`. +Do not test whether it exists, prompt the user to confirm, nor offer alternatives. +If the caller passed a bad path, the write in Step 9 will fail and surface the error. -Read `references/modes.md`. Loaded in Step 1; the orchestrator determines the mode from the invocation, runs the resolution sequence (commit-range mode only), and uses the matching diff-source commands to populate Step 1's gathered context. Modes are orchestrator-only and not propagated to subagents. +## Model Selection -## Output Location +Resolve per-stage models upon invocation before Step 1 begins. + +Flag values are the Agent tool's model nicknames, in ascending tier order: `haiku` < `sonnet` < `opus`. +The **global model** is `--model` if specified, otherwise the session's model. + +| Stage | Agents | Flag | Default | +| -------------- | ----------------------------------- | -------------------- | ---------- | +| Analysis | Step 2 architect; Step 3 Agents 1–2 | `--model-analysis` | global | +| Security | Step 3 Agent 3 (security & logic) | `--model-security` | global | +| Validation | Step 4 | `--model-validation` | global | +| Severity audit | Step 5 | `--model-audit` | **sonnet** | -Resolve immediately upon invocation — before Step 1 begins. The resolved path is used verbatim in Step 9. +Each stage resolves to its flag if present, otherwise its default; an explicit `--model` also overrides the audit's sonnet default. -If `--output-dir ` is present in `$ARGUMENTS`, use that path verbatim — do not test whether it exists, do not prompt the user to confirm, and do not offer alternatives. If the caller passed a bad path, the write in Step 9 will fail and surface the error; that is the intended behavior. +**Security floor.** `--model-security` may only pin at or above the global model. On a lower pin, run security at the global model and note the ignored pin in the announcement. Rationale: P01–P06 evaluation quality must not silently degrade. -Otherwise, default to `${CLAUDE_PLUGIN_DATA}/code-reviews/` — organized across projects, never git-tracked. +**Analysis downgrade caveat.** Bugs missed by a cheaper analysis model cannot be recovered by downstream validation. + +**Announce** the resolved stage → model table before starting the review. + +**Steps 6–9 run in the main agent** — the merge needs the full finding state and reference content in context. One model per stage; parallel same-stage runs collide finding IDs. ## Operating Rules Applies to all agents and subagents. -- Model: Default to the opus model unless `--model` is specified. -- Announce which model is being used before starting the review. - Don't write to GitHub. All findings go to a local markdown file. - Tool discipline (see Orchestration → Tool Discipline) applies to the main agent and is propagated verbatim to every subagent. Rationale for the WebFetch/WebSearch ban: bypasses `gh` auth, skips audit trails, can return stale cached pages. ## Orchestration -Applied when launching subagents. - ### Project Preamble Propagation Subagents do not inherit the main agent's CLAUDE.md context. Every subagent prompt in Steps 2–5 MUST open with the two required blocks below, in order, followed by the conditional block if it applies. @@ -61,7 +72,7 @@ Subagents do not inherit the main agent's CLAUDE.md context. Every subagent prom > > **Threat-model directive.** Evaluate every change against P01–P06 and the requirements under VD/EK/AT/SC/TC (loaded via the `bitwarden-security-context` skill per the preceding block). For each finding that touches vault data, keys, auth tokens, or user authenticity, name the principle or category it implicates. -**Conditional — repo-specific forwarding.** A repo's checked-in `CLAUDE.md` may contain a section that explicitly instructs you to forward it to subagents (e.g., _"when spawning subagents, include..."_ or _"propagate this to subagents"_). If so, paste that section verbatim. If not, the two required blocks alone suffice. +**Conditional — repo-specific forwarding.** A repo's checked-in `CLAUDE.md` may contain a section that explicitly instructs you to forward it to subagents. If so, paste that section verbatim. ### Tool Discipline @@ -86,15 +97,15 @@ Feature context — issue descriptions, Jira tickets, PR history, removed-predec - **Context-allowed** (Step 2 architecture agent; Step 3 Agent 3 security & logic): pass full feature context. These agents think adversarially from intent. - **Context-forbidden** (Step 3 Agent 1 code quality; Step 3 Agent 2 bug analysis): **ONLY** pass the diff and the Review Rules. **DO NOT** paste issue summaries, Jira tickets, or PR description prose into these prompts. -- **Style-matching requirement.** The main agent's tone and framing across parallel agents leaks — a rich-context prompt for the security agent alongside a bare prompt for the bug agent still implicitly biases the bug agent through the shared authored reality. When drafting context-forbidden prompts, match the terse style of the diff-only sibling prompts; do not echo the framing of the context-allowed siblings. +- **Style-matching requirement.** The main agent's tone and framing across parallel agents leaks — a rich-context prompt for the security agent alongside a bare prompt for the bug agent still implicitly frames how the bug agent reads the diff. When drafting context-forbidden prompts, match the terse style of the diff-only sibling prompts; do not echo the framing of the context-allowed siblings. ## Discovery Standards -Read `references/discovery-standards.md`. Referenced by Step 2 (architect doc/code consistency pass) and Step 3 Agent 1 (Hygiene Sweep). The Line Number Accuracy rule is propagated verbatim into every Step 2–5 subagent prompt. +Read `references/discovery-standards.md`. Referenced by Step 2 (architect — doc/code consistency pass and Hygiene Sweep) and Step 3 Agent 1 (Hygiene Sweep). ## Evaluation Standards -Read `references/evaluation-standards.md`. Severity Levels, Do Not Flag, and Confidence Scoring are propagated verbatim into every Step 2–5 subagent prompt; the Finding Shape schema lives in `references/finding-shape.md` and is also propagated verbatim. +Read `references/evaluation-standards.md`. Defines Severity Levels, Do Not Flag, and Confidence Scoring; the Finding Shape schema lives in `references/finding-shape.md`. ## Review Rules @@ -118,11 +129,11 @@ Execute these steps in order. Do not skip, reorder, or combine steps. - Determine the mode per `references/modes.md`. Fetch the list of changed files with the mode's command: `gh pr diff {number} --name-only` (PR), `git diff HEAD --name-only` (local), `git diff origin/HEAD...HEAD --name-only` (branch comparison), or `git diff .. --name-only` (commit range). In PR mode, also fetch the title and description with `gh pr view`. - **READ** CLAUDE.md, README.md, and any other relevant .md files in or near the directories containing modified files. - **READ** `references/report-template.md` for formatting the final report in Step 7. - - **READ** `references/finding-shape.md`. Its contents are pasted verbatim into every Step 2–5 subagent prompt. - - **READ** `references/discovery-standards.md`. The Hygiene Sweep is referenced by name in the Step 3 Agent 1 prompt; Line Number Accuracy is propagated verbatim into every Step 2–5 subagent prompt. - - **READ** `references/evaluation-standards.md`. Severity Levels, Do Not Flag, and Confidence Scoring are propagated verbatim into every Step 2–5 subagent prompt. + - **READ** `references/finding-shape.md`. + - **READ** `references/discovery-standards.md`. The Hygiene Sweep is referenced by name in the Step 2 architect and Step 3 Agent 1 prompts. + - **READ** `references/evaluation-standards.md`. -2. Launch a single architecture & pattern compliance agent using the `bitwarden-tech-lead:bitwarden-tech-lead` subagent type. Give it the diff, the list of changed file paths, and — in PR mode only — the PR title and description. +2. Launch a single architecture & pattern compliance agent using the `general-purpose` subagent type, with the resolved analysis model (see Model Selection). Open the subagent prompt with: "You are a software architect reviewing code changes for architectural and pattern compliance." Give it the diff, the list of changed file paths, and — in PR mode only — the PR title and description. Unlike the diff agents in Step 3, this agent reads BEYOND the diff to check whether changes fit the codebase. @@ -130,13 +141,13 @@ Execute these steps in order. Do not skip, reorder, or combine steps. - Read the full files being modified (not just diff hunks) to understand surrounding context. - Read CLAUDE.md, README.md, and other relevant .md files in or near the modified directories; verify each change complies with explicit project rules. - Use Glob and Grep to find how similar code is structured elsewhere in the codebase. - - **Doc/code consistency pass** — flag contradictions this diff creates between the code and same-repo documentation, configuration, or agent-facing files (e.g., a `CLAUDE.md` entry describing handler behavior the diff now changes; a README example that no longer matches the new signature; `.claude/` agent instructions referencing behavior the PR removes). Only flag divergence this change creates or worsens — do not audit pre-existing drift. + - **Doc/code consistency pass** — flag contradictions this diff creates between the code and same-repo documentation, configuration, or agent-facing files — README.md and CLAUDE.md most of all. Only flag divergence this change creates or worsens — do not audit pre-existing drift. **Scope.** Raise pattern inconsistencies, architectural boundary violations, duplicated abstractions, and new conventions introduced where an established one applies. Do NOT raise correctness bugs, security issues, or code-quality concerns — those belong to Step 3. - Apply the Review Rules. Threshold ≥ 80. Emit findings as a JSON array per the Finding Shape schema. + Apply the Review Rules. Also include the **Hygiene Sweep** definition from `references/discovery-standards.md` — its lenses are within the architect's scope. Threshold ≥ 80. Emit findings as a JSON array per the Finding Shape schema. -3. Launch 3 agents as instructed below. Each receives the diff and the Review Rules; each emits findings as a JSON array per the Finding Shape schema. Confidence Scoring from `references/evaluation-standards.md` applies to all three — threshold ≥ 80. In PR mode, pass the PR title and description only to Agent 3 per Context Partitioning — Agents 1 and 2 receive diff + Review Rules only. Send all 3 Agent tool calls in a single message (do NOT use run_in_background). +3. Send all 3 Agent tool calls in a single message (**DO NOT** use run_in_background because the agents must run synchronously to guarantee findings are validated together at Step 4). Agents 1–2 use the resolved analysis model, Agent 3 uses the resolved security model (see Model Selection). Each receives the diff and the Review Rules; each emits findings as a JSON array per the Finding Shape schema. Confidence Scoring from `references/evaluation-standards.md` applies to all three — threshold ≥ 80. In PR mode, pass the PR title and description only to Agent 3 per Context Partitioning — Agents 1 and 2 receive diff + Review Rules only. **Agent 1: Code quality agent** Use the `general-purpose` subagent type. Read the diff as a senior engineer seeing it for the first time — surface anything that hurts correctness, clarity, or long-term maintainability, including code duplication, missing critical error handling, and inadequate test coverage. @@ -155,16 +166,16 @@ Execute these steps in order. Do not skip, reorder, or combine steps. - **Consent gates** — are authorization actions clearly labeled with sufficient context? - **Output authenticity** — are responses distinguishable from attacker-forged messages? -4. Launch a single `general-purpose` validation subagent for all findings from Steps 2 and 3. The subagent receives the diff fetched with the mode's diff command from Step 1, the full array of finding objects, the Review Rules, and — in PR mode only — the PR title and description. The subagent returns an array of Step 4 objects (one per input finding) per the Finding Shape schema. +4. Launch a single `general-purpose` validation subagent for all findings from Steps 2 and 3, with the resolved validation model (see Model Selection). The subagent receives the diff fetched with the mode's diff command from Step 1, the full array of finding objects, the Review Rules, and — in PR mode only — the PR title and description. The subagent returns an array of Step 4 objects (one per input finding) per the Finding Shape schema. - **Chunking escape hatch.** If raw findings from Steps 2 and 3 number more than 25, partition them into chunks of ≤ 15 (preserving collateral context within each chunk; do not split a `source_agent` group across chunks if it would put related findings on opposite sides) and launch one validation subagent per chunk in a single message (do NOT use run_in_background). + **Chunking escape hatch.** If raw findings from Steps 2 and 3 number more than 25, partition them into chunks of ≤ 15 (preserving collateral context within each chunk; do not split a `source_agent` group across chunks if it would put related findings on opposite sides) and launch one validation subagent per chunk in a single message (**DO NOT** use run_in_background because the agents must run synchronously to guarantee accuracy). A finding is **dismissed** if ANY of the following are true: - It is a pre-existing finding, not introduced by this change. In commit-range mode, treat the cumulative diff of `..` as "this change" and the parent of `` as the pre-existing baseline. - **Bugs**: The problem does not actually exist in the code (e.g., the variable is not truly undefined, the logic error does not actually produce wrong results) - It is a nitpick that a senior engineer would not flag in a real code review - It would be caught by a linter (**do not run** the linter to verify) - - It is a general code quality concern that wouldn't be flagged in a real code review. In other words, do not state generics. All findings **MUST** be specific and actionable. + - It is a vague code quality concern — findings **MUST** be specific and actionable. **Collateral-change check.** When a finding is about to be dismissed as "deliberate divergence from an established pattern" or "documented exception," before dismissing it check whether supporting code was updated _consistent with_ the divergence. Specifically, scan the diff for: - Allowlist, registry, or lookup-table entries that assume the old pattern and are now stale or dead. @@ -173,7 +184,7 @@ Execute these steps in order. Do not skip, reorder, or combine steps. If the divergence is deliberate but its collateral was not updated, the collateral is a new finding (typically ♻️ Refactor) — do not dismiss the original finding silently; route the collateral problem as its own finding instead. -5. Launch a single `general-purpose` severity-audit agent. Give it all validated findings from step 4, the diff, and the Review Rules. For each finding, the agent must: +5. Launch a single `general-purpose` severity-audit agent, with the resolved audit model — sonnet unless overridden (see Model Selection). Give it all validated findings from step 4, the diff, and the Review Rules. For each finding, the agent must: - Confirm the severity assigned by the review agent, or - Downgrade it to a lower severity if the evidence doesn't support the original rating, or - Dismiss it entirely if it does not meet the bar for any severity level. @@ -182,10 +193,12 @@ Execute these steps in order. Do not skip, reorder, or combine steps. 6. Merge all Step 4 and Step 5 returns by `id` into the master finding map. Before merging Step 5 returns, insert the full Finding object for each Step 4 collateral finding (`source_agent: "validation"`, `id: "val-N"`) into the master map — their creation-time fields come from those Finding objects, not from Step 4's status returns. Creation-time fields are immutable (see `references/finding-shape.md`). For dismissed findings, set `dismissal_stage` to `"Step 4 validation"` or `"Step 5 severity audit"` based on which step set the dismissal status — it renders as `**Dismissed at:**`. Partition by final status: validated (Step 5 `confirmed` or `downgraded`) becomes the main Findings section; dismissed (Step 4 `dismissed` or Step 5 `dismissed`) preserves original severity, original confidence, dismissal stage, and dismissal reason for rendering in the Dismissed block. -7. Format the report using the template in `references/report-template.md`. Cite every validated AND dismissed finding with full file path and line: `file/path.ext:{line}` (or `:{start}-{end}` for ranges). Omit any severity section with zero findings. If zero findings total, replace the Findings section with: "No findings found." For every rendered finding (validated and dismissed), populate the `**Caught by:**` line from the finding's `source_agent` field, translated to the friendly label per the table in `references/report-template.md`. Dismissed findings additionally render `**Original severity:**`, `**Original confidence:**`, `**Dismissed at:**`, and `**Dismissed because:**` per the template — past runs have silently dropped these, so do not omit any of them; per-finding traceability requires the full set. +7. Format the report using the template in `references/report-template.md`; `examples/sample-report.md` shows a complete rendered example, including the dismissed-finding stanza. Cite every validated AND dismissed finding with full file path and line: `file/path.ext:{line}` (or `:{start}-{end}` for ranges). Omit any severity section with zero findings. If zero findings total, replace the Findings section with: "No findings found." For every rendered finding (validated and dismissed), populate the `**Caught by:**` line from the finding's `source_agent` field, translated to the friendly label per the table in `references/report-template.md`. Dismissed findings additionally render `**Original severity:**`, `**Original confidence:**`, `**Dismissed at:**`, and `**Dismissed because:**` per the template — past runs have silently dropped these, so do not omit any of them. 8. Print the full formatted report to the terminal. -9. Write the formatted report to the output directory resolved in **Output Location**. Do not test whether the directory exists, do not create it, and do not prompt the user — write directly. If the write fails because the caller-supplied path is invalid, surface the error as-is. After a successful write, print the full resolved path. +9. Write the formatted report to the output directory resolved in **Output Location**. Do not test if the directory exists. Do not attempt to create the directory. Write the file directly. If the write fails then surface the error as-is. After a successful write, print the full resolved path. File name: `code-review-{model}-PR-{number}.md` (PR mode), `code-review-{model}-{YYYY-MM-DD}.md` (local mode), `code-review-{model}-{branch}-{YYYY-MM-DD}.md` (branch comparison mode), or `code-review-{model}-{from-short}..{to-short}.md` (commit-range mode, where `{from-short}`/`{to-short}` are 7-char SHAs or shorter ref names). + + `{model}` is the resolved global model's nickname, never a dated model ID. Append `-mixed` when an explicit stage flag differs from the global model; the audit's sonnet default does not count. The report's Model Header follows its own rule — see `references/report-template.md`. diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/sample-report.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/sample-report.md new file mode 100644 index 00000000..fea3f449 --- /dev/null +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/sample-report.md @@ -0,0 +1,52 @@ +# Code Review: Rename example plugin and refresh manifests (#123) + +**Date:** 2026-06-10 | **Reviewed by:** Claude Code | **Model:** opus (audit: sonnet) + +## Summary + +| Severity | Count | +| ------------ | ----- | +| 🛑 Blocker | 0 | +| ⚠️ Important | 1 | +| ♻️ Refactor | 0 | + +The rename is structurally sound — the new manifest, marketplace entry, and AGENT.md are internally consistent and correctly wired together. One issue holds it back: the renamed README retains the old plugin identity throughout its body (title, install command, usage examples), so users following it will reference a plugin name that no longer exists. + +## Findings + +### ⚠️ Important + +#### Renamed README retains old plugin identity throughout body + +`plugins/example/README.md:11` +**Caught by:** Architecture agent + +
Details + +The diff only edits the Overview sentence. Every other line retains the old plugin identity: the H1 title (line 1), the agent table row (line 11), and the install command (line 30) still reference the deleted plugin name. These contradict the renamed `plugin.json` and the new `AGENT.md`. Update the title, table row, and install command to the new name in this PR, or add the README to the documented deferral list. + +
+ +## Reviewed and Dismissed + +
🔍 2 initial findings dismissed after validation + +#### README overview still describes old plugin identity + +`plugins/example/README.md:5` +**Caught by:** Code quality agent +**Original severity:** ♻️ Refactor +**Original confidence:** 90/100 +**Dismissed at:** Step 4 validation +**Dismissed because:** Substantively covered by the architecture finding at higher severity; no distinct actionable scope beyond what that finding already requires. + +#### quality findings cite wrong file line for plugin.json description field + +`plugins/example/.claude-plugin/plugin.json:4` +**Caught by:** Validation agent (collateral) +**Original severity:** ♻️ Refactor +**Original confidence:** 100/100 +**Dismissed at:** Step 5 severity audit +**Dismissed because:** A meta-observation about sibling findings is not a code issue in the change under review. + +
diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/discovery-standards.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/discovery-standards.md index 742cc0ce..2dabdb56 100644 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/discovery-standards.md +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/discovery-standards.md @@ -1,10 +1,10 @@ # Discovery Standards -Loaded by the orchestrator in Step 1. The **Hygiene Sweep** is invoked by name from the Step 3 Agent 1 (code quality) prompt. The **Line Number Accuracy** rule is propagated verbatim into every Step 2–5 subagent prompt. +Loaded by the orchestrator in Step 1. The **Hygiene Sweep** is invoked by name from the Step 2 architect and Step 3 Agent 1 (code quality) prompts. The **Line Number Accuracy** rule is propagated verbatim into every Step 2–5 subagent prompt. ## Hygiene Sweep -Agent 1 (code quality) performs a hygiene sweep of the diff before submitting findings; the Step 2 architect performs an analogous doc/code consistency pass per its own directive. When referenced, look specifically for: +Agent 1 (code quality) performs a hygiene sweep of the diff before submitting findings; the Step 2 architect applies the same sweep within its scope. When referenced, look specifically for: - **Dead code added by this PR** — allowlist/registry/lookup-table entries added for features that don't flow through the validated entry point; unused imports; unreachable branches. - **Stale references** — documentation, comments, error messages, or assertions in this diff that contradict the same diff's implementation. diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/finding-shape.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/finding-shape.md index f6e77c32..4491de9d 100644 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/finding-shape.md +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/finding-shape.md @@ -33,6 +33,30 @@ One entry per incoming finding, keyed by `id`: **Collateral findings** produced during Step 4 (per the collateral-change check) use the full **Finding object** schema above with `source_agent: "validation"` and `id: "val-N"`. They append to Step 5's input. +Extra fields beyond this schema are ignored by the merge — creation-time fields come only from the original Finding object, never from Step 4 or Step 5 returns. + +```json +// Example Step 4 return +[ + { "id": "arch-1", "status": "validated" }, + { + "id": "quality-3", + "status": "dismissed", + "dismissal_reason": "Substantively covered by arch-1 at higher severity." + }, + { + "id": "val-1", + "file": "plugins/example/.claude-plugin/plugin.json", + "line": "4", + "severity": "refactor", + "confidence": 100, + "title": "quality findings cite wrong file line for plugin.json description field", + "detail": "Both citations are diff-offset artifacts; the description field is at file line 4.", + "source_agent": "validation" + } +] +``` + ## Step 5 return (severity audit) One entry per incoming finding, keyed by `id`: @@ -44,6 +68,18 @@ One entry per incoming finding, keyed by `id`: | `final_severity` | string | Severity value. Omit when `status = "dismissed"`. | | `dismissal_reason` | string | Present only when `status = "dismissed"`. | +```json +// Example Step 5 return +[ + { "id": "arch-1", "status": "confirmed", "final_severity": "important" }, + { + "id": "val-1", + "status": "dismissed", + "dismissal_reason": "A meta-observation about sibling findings is not a code issue in the change under review." + } +] +``` + ## Orchestrator behavior - Maintains a master finding map keyed by `id`. diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/report-template.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/report-template.md index edfa538f..0294c719 100644 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/report-template.md +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/report-template.md @@ -1,5 +1,14 @@ # Report Template +## Model Header + +`{model}` is the resolved global model's nickname, never a dated model ID. When any stage's resolved model differs from the global model — including the audit's sonnet default — list each differing stage in parentheses: + +``` +**Model:** opus +**Model:** opus (audit: sonnet) +``` + ## Severity Icons - 🛑 **Blocker** — Must fix before merge