From b85c8e05c1d12e79239499b1094778ba31076d98 Mon Sep 17 00:00:00 2001 From: Mick Letofsky Date: Mon, 15 Jun 2026 11:31:24 +0200 Subject: [PATCH 01/13] Initial commit of improvements to multi-agent review --- .claude-plugin/marketplace.json | 2 +- README.md | 4 +- .../.claude-plugin/plugin.json | 2 +- plugins/bitwarden-code-review/CHANGELOG.md | 13 +++++ plugins/bitwarden-code-review/README.md | 51 +++++++++++++++++- .../agents/bitwarden-code-reviewer/AGENT.md | 2 +- .../SKILL.md | 44 ++++++++++++---- .../examples/merge-walkthrough.md | 25 +++++++++ .../examples/sample-report.md | 52 +++++++++++++++++++ .../examples/step4-validation-return.json | 18 +++++++ .../examples/step5-severity-audit-return.json | 8 +++ .../references/finding-shape.md | 4 ++ .../references/report-template.md | 9 ++++ 13 files changed, 219 insertions(+), 15 deletions(-) create mode 100644 plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/merge-walkthrough.md create mode 100644 plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/sample-report.md create mode 100644 plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/step4-validation-return.json create mode 100644 plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/step5-severity-audit-return.json diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index 63902851..23729dc3 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -24,7 +24,7 @@ { "name": "bitwarden-code-review", "source": "./plugins/bitwarden-code-review", - "version": "1.11.0", + "version": "1.12.0", "description": "Comprehensive code review system with organization-wide standards." }, { diff --git a/README.md b/README.md index 3597c415..ed929738 100644 --- a/README.md +++ b/README.md @@ -9,8 +9,8 @@ A curated collection of plugins for AI-assisted development at Bitwarden. Enable | [bitwarden-tech-lead](plugins/bitwarden-tech-lead/) | 2.3.0 | Tech lead for technical planning, architecture coherence, and surfacing patterns to Technical Strategy Ideas | | [bitwarden-shepherd](plugins/bitwarden-shepherd/) | 1.0.0 | Champion of a technical strategy — shepherds a TSI through evaluation into the funnel, then through to adoption | | [bitwarden-atlassian-tools](plugins/bitwarden-atlassian-tools/) | 2.2.7 | Read-only Atlassian access via MCP server with deep Jira issue research skill | -| [bitwarden-code-review](plugins/bitwarden-code-review/) | 1.11.0 | Autonomous code review agent following Bitwarden engineering standards with GitHub integration | -| [bitwarden-delivery-tools](plugins/bitwarden-delivery-tools/) | 1.4.0 | Delivery lifecycle skills: initiative funnel navigation, work transitions, tech breakdowns and cross-team signoffs, commits, PRs, preflight, labeling | +| [bitwarden-code-review](plugins/bitwarden-code-review/) | 1.12.0 | Autonomous code review agent following Bitwarden engineering standards with GitHub integration | +| [bitwarden-delivery-tools](plugins/bitwarden-delivery-tools/) | 1.4.0 | Delivery lifecycle skills: initiative funnel navigation, work transitions, tech breakdowns and cross-team signoffs, commits, PRs, preflight, labeling | | [bitwarden-designer](plugins/bitwarden-designer/) | 0.1.0 | Product designer persona: Code of Conduct and 30/60/90 critique, critique facilitation; dispatches into bitwarden-design-tools | | [bitwarden-design-tools](plugins/bitwarden-design-tools/) | 0.1.0 | Design toolkit: content style guide, Figma Dev Mode MCP, Bitwarden brand application, handoff prep, Design System governance, Product and Design Jira | | [bitwarden-devops-engineer](plugins/bitwarden-devops-engineer/) | 0.1.3 | DevOps engineering assistant: workflow compliance linting, action security auditing, and org-wide CI/CD remediation | diff --git a/plugins/bitwarden-code-review/.claude-plugin/plugin.json b/plugins/bitwarden-code-review/.claude-plugin/plugin.json index 51349959..212a5b57 100644 --- a/plugins/bitwarden-code-review/.claude-plugin/plugin.json +++ b/plugins/bitwarden-code-review/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "bitwarden-code-review", - "version": "1.11.0", + "version": "1.12.0", "description": "Comprehensive code review system with organization-wide standards.", "author": { "name": "Bitwarden", diff --git a/plugins/bitwarden-code-review/CHANGELOG.md b/plugins/bitwarden-code-review/CHANGELOG.md index 0e9c590c..1aa18b86 100644 --- a/plugins/bitwarden-code-review/CHANGELOG.md +++ b/plugins/bitwarden-code-review/CHANGELOG.md @@ -5,6 +5,19 @@ All notable changes to the Bitwarden Code Review Plugin will be documented in th The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [1.12.0] - 2026-06-10 + +### Added + +- `performing-multi-agent-code-review`: per-stage model flags (`--model-analysis`, `--model-security`, `--model-validation`, `--model-audit`) with a security floor rule +- `performing-multi-agent-code-review`: `examples/` (sample report + merge walkthrough) + +### Changed + +- `performing-multi-agent-code-review`: subagents inherit the session model instead of forcing Opus; severity audit defaults to Sonnet +- `performing-multi-agent-code-review`: `{model}` in file names and report headers is the resolved model nickname; `-mixed` suffix when stage flags differ +- Improved the README.md to better describe the purpose and usage of the multi-agent review. + ## [1.11.0] - 2026-05-12 ### Added diff --git a/plugins/bitwarden-code-review/README.md b/plugins/bitwarden-code-review/README.md index 64b77d20..38fa6dbb 100644 --- a/plugins/bitwarden-code-review/README.md +++ b/plugins/bitwarden-code-review/README.md @@ -1,11 +1,13 @@ # Bitwarden Code Review Plugin -Comprehensive AI-powered code review agent following Bitwarden engineering standards. +AI-powered code review for Bitwarden — an autonomous review agent for everyday PRs, plus a rigorous multi-agent pipeline for the changes that warrant a deeper look. ## Overview This plugin provides an autonomous code review agent that conducts thorough, professional code reviews following Bitwarden's organizational standards. The agent focuses on security, correctness, and high-value feedback while maintaining a high signal-to-noise ratio. +It offers two complementary lenses. The autonomous `bitwarden-code-reviewer` agent reviews a pull request the way a human reviewer would and posts inline comments to GitHub. The `performing-multi-agent-code-review` skill takes a different approach — it has Claude evaluate code _as Claude_ across a pipeline of specialized sub-agents, trading human-style commentary for depth and high-signal findings written to a local report. Use the agent for everyday PR review; reach for the skill when a change is complex or security-sensitive enough that one reviewer can't hold every concern in view at once. + ## Features - **Autonomous Review Agent**: Single agent handles all code review tasks without manual invocation @@ -14,6 +16,9 @@ This plugin provides an autonomous code review agent that conducts thorough, pro - **Security-First Approach**: Prioritizes security vulnerabilities, data exposure, and authentication issues - **Structured Thinking**: Uses explicit reasoning blocks to improve review quality and consistency - **Confidence Scoring**: Pre-filters findings with a 0-100 confidence score (≥75 threshold) before validation to reduce false positives +- **Multi-Agent Review Pipeline**: A separate `performing-multi-agent-code-review` skill runs six specialized sub-agents — architecture compliance, code quality, bug analysis, security & logic, validation, and severity audit — for depth on complex or security-sensitive changes +- **Any Model, Per Stage**: Per-stage model flags dial cost against depth; a security floor prevents the security pass from ever running below your global model +- **Local-Only, Multi-Mode**: Reviews draft or published PRs, local changes, branch comparisons, or commit ranges — and writes findings to a local report, never to GitHub ## Skills @@ -84,6 +89,50 @@ The agent is automatically invoked by Claude when: Use the bitwarden-code-reviewer agent to review this PR ``` +### Multi-agent code review skill + +The `performing-multi-agent-code-review` skill is built for changes complex or security-sensitive enough that one reviewer — human or AI — can't hold every concern in view at once. Instead of a single pass, it puts a team of specialized agents on the same diff, each reviewing from its own perspective — architecture and pattern compliance, code quality, bugs, and security & logic — then validates and severity-audits everything they surface. It started from [Anthropic's `code-review` command](https://github.com/anthropics/claude-code/blob/main/plugins/code-review/commands/code-review.md) and was rebuilt for the control that command lacks: it wires in our own `bitwarden-security-engineer` and `bitwarden-tech-lead` plugins, runs on any model you choose, reviews **draft or published** PRs — plus local changes, branch comparisons, and commit ranges — and writes its report to a local file instead of posting to GitHub. + +It's a deliberately different lens from the `bitwarden-code-reviewer` agent. That agent reviews the way a human reviewer would; this skill has **Claude evaluate code as Claude** — optimized for signal, not parity. It spotlights blockers, real bugs, and known-bad patterns, and stays out of the nit-picking lane. Each potential finding is scored 0–100, and only those clearing an 80-confidence bar are raised; every one it keeps is cited with file, line, and the agent that caught it. What clears the bar still gets challenged — the validation and severity-audit agents can overturn a finding — but an overturned finding is never dropped: it moves into a collapsed **Reviewed and Dismissed** section, tagged with its original severity, original confidence, and the reason it was set aside. That section is deliberate — a human sees everything the first pass surfaced and can judge when a dismissal was wrong and the original call was right; without it, that signal is lost. + +**Relaying the "why" to each teammate.** Those agents start cold — a sub-agent inherits none of the main session's context — so the skill briefs every one of them explicitly. Each receives Bitwarden's zero-knowledge invariant and the P01–P06 threat-model directive verbatim. Feature context — the PR's intent, the ticket, the product framing — is handed out deliberately: the architecture and security agents get the full "why" so they can reason adversarially from intent, while the quality and bug agents see only the diff, so their first read stays unbiased. + +It slots cleanly into a [Claude Code agent-teams](https://code.claude.com/docs/en/agent-teams) loop — run it at the end of a coding session, address what it finds, refactor — or as a depth-of-review gate on a draft PR before you publish. + +#### Examples + +Invoke the skill explicitly with the slash command, or with natural language — it triggers on phrasing like "thorough", "deep", "multi-pass", or "multi-agent" review, and on commit-range framing like "review the last week of commits". With no model flags, the review runs on your session's model, with the severity audit defaulting to sonnet. + +**1. Local changes, all defaults.** The simplest run — review your uncommitted work before you commit: + +> Perform a thorough multi-agent code review of my local changes. + +**2. A specific pull request.** Pass a PR number or URL; it reviews draft and published PRs alike: + +```markdown +/bitwarden-code-review:performing-multi-agent-code-review https://github.com/bitwarden/ios/pulls/1234567 --model opus +``` + +**3. Changes over a period of time.** Commit-range mode reviews the cumulative diff across a time window, commit count, or explicit ref pair. Run it from inside the target repo; it confirms the resolved range with you before spending tokens: + +```markdown +Review the last week of commits using /bitwarden-code-review:performing-multi-agent-code-review --model-analysis opus --model-security fable --model-validation opus --model-audit sonnet --output-dir ./code-reviews +``` + +**4. Full control, per stage.** Tune each stage independently — opus for analysis and validation, fable for security, sonnet for the audit — and send the report to a specific directory: + +```bash +/bitwarden-code-review:performing-multi-agent-code-review \ + https://github.com/bitwarden/gh-actions/pull/604 \ + --model-analysis opus \ + --model-security fable \ + --model-validation opus \ + --model-audit sonnet \ + --output-dir ./reviews +``` + +A **security floor** keeps `--model-security` from ever dropping below your global model, so threat-model evaluation never silently degrades. Omit `--output-dir` and the report lands in `${CLAUDE_PLUGIN_DATA}`, organized across projects and never git-tracked. + ### In GitHub Actions See the production implementation: [bitwarden/gh-actions `_review-code.yml`](https://github.com/bitwarden/gh-actions/blob/main/.github/workflows/_review-code.yml) diff --git a/plugins/bitwarden-code-review/agents/bitwarden-code-reviewer/AGENT.md b/plugins/bitwarden-code-review/agents/bitwarden-code-reviewer/AGENT.md index db1c2c13..6afff3b0 100644 --- a/plugins/bitwarden-code-review/agents/bitwarden-code-reviewer/AGENT.md +++ b/plugins/bitwarden-code-review/agents/bitwarden-code-reviewer/AGENT.md @@ -1,6 +1,6 @@ --- name: bitwarden-code-reviewer -version: 1.10.1 +version: 1.12.0 description: Conducts thorough code reviews following Bitwarden standards. Finds all issues first pass, avoids false positives, respects codebase conventions. Invoke when user mentions "code review", "review code", "review", "PR", or "pull request". model: opus skills: avoiding-false-positives, classifying-review-findings, posting-bitwarden-review-comments, posting-review-summary, reviewing-dependency-changes diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md index 8412ccd4..d2e7508a 100644 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md @@ -2,7 +2,7 @@ name: performing-multi-agent-code-review description: Perform a rigorous, multi-agent code review with architecture-compliance, parallel quality/security analysis, finding validation, and severity audit. Use whenever the user asks for a structured, deep, thorough, multi-pass, or multi-agent code review — or a review that includes architecture/pattern compliance, confidence-scored findings, or a severity audit — even if they don't say the exact phrase "multi-agent". Prefer this over a single-agent review when the user wants high-signal findings with validation. Also use whenever the user asks for a code review across a commit range, time window, or N most recent commits in a locally checked-out repo (e.g. "review the last week of commits in bitwarden/server", "review the last 20 commits", "review changes since 2026-04-23") — these route to the commit-range mode below. allowed-tools: "Bash(gh pr diff:*), Bash(gh pr view:*), Bash(git diff:*), Bash(git status:*), Bash(git rev-parse:*), Bash(git log:*), Read, Write, Grep, Glob, Skill, AskUserQuestion" -argument-hint: "[pr-number | commit-range] [--model ] [--output-dir ]" +argument-hint: "[pr-number | commit-range] [--model ] [--model-analysis ] [--model-security ] [--model-validation ] [--model-audit ] [--output-dir ]" --- # Overview @@ -16,7 +16,7 @@ This skill depends on the following sibling plugins. If any are not installed, * - **`bitwarden-tech-lead`** — provides the architecture review subagent. - **`bitwarden-security-engineer`** — provides security context and analysis skills. -Before Step 1, verify each prerequisite is resolvable. If a prerequisite is missing, print: +Before Step 1, verify each prerequisite is resolvable (verification is heuristic — if a subagent type fails to resolve, treat that as the signal). If a prerequisite is missing, print: > Prerequisite plugin `` is not installed. Install it and retry. Review aborted. @@ -34,12 +34,36 @@ If `--output-dir ` is present in `$ARGUMENTS`, use that path verbatim — Otherwise, default to `${CLAUDE_PLUGIN_DATA}/code-reviews/` — organized across projects, never git-tracked. +## Model Selection + +Resolve per-stage models upon invocation before Step 1 begins. + +Flag values are the Agent tool's model nicknames, in ascending tier order: `haiku` < `sonnet` < `opus` < `fable`. +The **global model** is `--model` if specified, otherwise the session's model. + +| Stage | Agents | Flag | Default | +| -------------- | ----------------------------------- | -------------------- | ---------- | +| Analysis | Step 2 architect; Step 3 Agents 1–2 | `--model-analysis` | global | +| Security | Step 3 Agent 3 (security & logic) | `--model-security` | global | +| Validation | Step 4 | `--model-validation` | global | +| Severity audit | Step 5 | `--model-audit` | **sonnet** | + +Each stage resolves to its flag if present, otherwise its default. +Exception: an explicit `--model` overrides the audit's sonnet default. + +**Security floor.** `--model-security` may only pin at or above the global model. On a lower pin, run security at the global model and note the ignored pin in the announcement. Rationale: P01–P06 evaluation quality must not silently degrade. + +**Analysis downgrade caveat.** Bugs missed by a cheaper analysis model cannot be recovered by downstream validation. + +**Announce** the resolved stage → model table before starting the review. + +**Steps 6–9 run in the main agent** — the merge needs the full finding state and reference content in context. One model per stage; parallel same-stage runs collide finding IDs. + ## Operating Rules Applies to all agents and subagents. -- Model: Default to the opus model unless `--model` is specified. -- Announce which model is being used before starting the review. +- Models: resolved per stage in Model Selection above. - Don't write to GitHub. All findings go to a local markdown file. - Tool discipline (see Orchestration → Tool Discipline) applies to the main agent and is propagated verbatim to every subagent. Rationale for the WebFetch/WebSearch ban: bypasses `gh` auth, skips audit trails, can return stale cached pages. @@ -122,7 +146,7 @@ Execute these steps in order. Do not skip, reorder, or combine steps. - **READ** `references/discovery-standards.md`. The Hygiene Sweep is referenced by name in the Step 3 Agent 1 prompt; Line Number Accuracy is propagated verbatim into every Step 2–5 subagent prompt. - **READ** `references/evaluation-standards.md`. Severity Levels, Do Not Flag, and Confidence Scoring are propagated verbatim into every Step 2–5 subagent prompt. -2. Launch a single architecture & pattern compliance agent using the `bitwarden-tech-lead:bitwarden-tech-lead` subagent type. Give it the diff, the list of changed file paths, and — in PR mode only — the PR title and description. +2. Launch a single architecture & pattern compliance agent using the `bitwarden-tech-lead:bitwarden-tech-lead` subagent type, with the resolved analysis model (see Model Selection). Give it the diff, the list of changed file paths, and — in PR mode only — the PR title and description. Unlike the diff agents in Step 3, this agent reads BEYOND the diff to check whether changes fit the codebase. @@ -136,7 +160,7 @@ Execute these steps in order. Do not skip, reorder, or combine steps. Apply the Review Rules. Threshold ≥ 80. Emit findings as a JSON array per the Finding Shape schema. -3. Launch 3 agents as instructed below. Each receives the diff and the Review Rules; each emits findings as a JSON array per the Finding Shape schema. Confidence Scoring from `references/evaluation-standards.md` applies to all three — threshold ≥ 80. In PR mode, pass the PR title and description only to Agent 3 per Context Partitioning — Agents 1 and 2 receive diff + Review Rules only. Send all 3 Agent tool calls in a single message (do NOT use run_in_background). +3. Send all 3 Agent tool calls in a single message (do NOT use run_in_background). Launch the 3 agents as instructed below — Agents 1–2 with the resolved analysis model, Agent 3 with the resolved security model (see Model Selection). Each receives the diff and the Review Rules; each emits findings as a JSON array per the Finding Shape schema. Confidence Scoring from `references/evaluation-standards.md` applies to all three — threshold ≥ 80. In PR mode, pass the PR title and description only to Agent 3 per Context Partitioning — Agents 1 and 2 receive diff + Review Rules only. **Agent 1: Code quality agent** Use the `general-purpose` subagent type. Read the diff as a senior engineer seeing it for the first time — surface anything that hurts correctness, clarity, or long-term maintainability, including code duplication, missing critical error handling, and inadequate test coverage. @@ -155,7 +179,7 @@ Execute these steps in order. Do not skip, reorder, or combine steps. - **Consent gates** — are authorization actions clearly labeled with sufficient context? - **Output authenticity** — are responses distinguishable from attacker-forged messages? -4. Launch a single `general-purpose` validation subagent for all findings from Steps 2 and 3. The subagent receives the diff fetched with the mode's diff command from Step 1, the full array of finding objects, the Review Rules, and — in PR mode only — the PR title and description. The subagent returns an array of Step 4 objects (one per input finding) per the Finding Shape schema. +4. Launch a single `general-purpose` validation subagent for all findings from Steps 2 and 3, with the resolved validation model (see Model Selection). The subagent receives the diff fetched with the mode's diff command from Step 1, the full array of finding objects, the Review Rules, and — in PR mode only — the PR title and description. The subagent returns an array of Step 4 objects (one per input finding) per the Finding Shape schema. **Chunking escape hatch.** If raw findings from Steps 2 and 3 number more than 25, partition them into chunks of ≤ 15 (preserving collateral context within each chunk; do not split a `source_agent` group across chunks if it would put related findings on opposite sides) and launch one validation subagent per chunk in a single message (do NOT use run_in_background). @@ -173,7 +197,7 @@ Execute these steps in order. Do not skip, reorder, or combine steps. If the divergence is deliberate but its collateral was not updated, the collateral is a new finding (typically ♻️ Refactor) — do not dismiss the original finding silently; route the collateral problem as its own finding instead. -5. Launch a single `general-purpose` severity-audit agent. Give it all validated findings from step 4, the diff, and the Review Rules. For each finding, the agent must: +5. Launch a single `general-purpose` severity-audit agent, with the resolved audit model — sonnet unless overridden (see Model Selection). Give it all validated findings from step 4, the diff, and the Review Rules. For each finding, the agent must: - Confirm the severity assigned by the review agent, or - Downgrade it to a lower severity if the evidence doesn't support the original rating, or - Dismiss it entirely if it does not meet the bar for any severity level. @@ -182,10 +206,12 @@ Execute these steps in order. Do not skip, reorder, or combine steps. 6. Merge all Step 4 and Step 5 returns by `id` into the master finding map. Before merging Step 5 returns, insert the full Finding object for each Step 4 collateral finding (`source_agent: "validation"`, `id: "val-N"`) into the master map — their creation-time fields come from those Finding objects, not from Step 4's status returns. Creation-time fields are immutable (see `references/finding-shape.md`). For dismissed findings, set `dismissal_stage` to `"Step 4 validation"` or `"Step 5 severity audit"` based on which step set the dismissal status — it renders as `**Dismissed at:**`. Partition by final status: validated (Step 5 `confirmed` or `downgraded`) becomes the main Findings section; dismissed (Step 4 `dismissed` or Step 5 `dismissed`) preserves original severity, original confidence, dismissal stage, and dismissal reason for rendering in the Dismissed block. -7. Format the report using the template in `references/report-template.md`. Cite every validated AND dismissed finding with full file path and line: `file/path.ext:{line}` (or `:{start}-{end}` for ranges). Omit any severity section with zero findings. If zero findings total, replace the Findings section with: "No findings found." For every rendered finding (validated and dismissed), populate the `**Caught by:**` line from the finding's `source_agent` field, translated to the friendly label per the table in `references/report-template.md`. Dismissed findings additionally render `**Original severity:**`, `**Original confidence:**`, `**Dismissed at:**`, and `**Dismissed because:**` per the template — past runs have silently dropped these, so do not omit any of them; per-finding traceability requires the full set. +7. Format the report using the template in `references/report-template.md`; `examples/sample-report.md` shows a complete rendered example, including the dismissed-finding stanza. Cite every validated AND dismissed finding with full file path and line: `file/path.ext:{line}` (or `:{start}-{end}` for ranges). Omit any severity section with zero findings. If zero findings total, replace the Findings section with: "No findings found." For every rendered finding (validated and dismissed), populate the `**Caught by:**` line from the finding's `source_agent` field, translated to the friendly label per the table in `references/report-template.md`. Dismissed findings additionally render `**Original severity:**`, `**Original confidence:**`, `**Dismissed at:**`, and `**Dismissed because:**` per the template — past runs have silently dropped these, so do not omit any of them; per-finding traceability requires the full set. 8. Print the full formatted report to the terminal. 9. Write the formatted report to the output directory resolved in **Output Location**. Do not test whether the directory exists, do not create it, and do not prompt the user — write directly. If the write fails because the caller-supplied path is invalid, surface the error as-is. After a successful write, print the full resolved path. File name: `code-review-{model}-PR-{number}.md` (PR mode), `code-review-{model}-{YYYY-MM-DD}.md` (local mode), `code-review-{model}-{branch}-{YYYY-MM-DD}.md` (branch comparison mode), or `code-review-{model}-{from-short}..{to-short}.md` (commit-range mode, where `{from-short}`/`{to-short}` are 7-char SHAs or shorter ref names). + + `{model}` is the resolved global model's nickname, never a dated model ID. Append `-mixed` when an explicit stage flag differs from the global model; the audit's sonnet default does not count. The report's Model Header follows its own rule — see `references/report-template.md`. diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/merge-walkthrough.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/merge-walkthrough.md new file mode 100644 index 00000000..e8d354bd --- /dev/null +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/merge-walkthrough.md @@ -0,0 +1,25 @@ +# Merge Walkthrough + +Condensed from a real run. Traces three findings through the Step 6 merge, including the trickiest path: a collateral finding created at Step 4 and dismissed at Step 5. + +Steps 2/3 produced two findings: `arch-1` (important, confidence 90, architect) and `quality-3` (refactor, confidence 90, quality). + +## After the Step 4 merge + +Step 4's return is [`step4-validation-return.json`](step4-validation-return.json). It dismisses `quality-3` as a duplicate and emits one collateral finding, `val-1`, as a full Finding object in the same array. The orchestrator inserts `val-1` into the master map — its creation-time fields come from that Finding object — and it joins Step 5's input. `quality-3` gets `dismissal_stage: "Step 4 validation"`. + +| id | severity | status | dismissal_stage | goes to Step 5? | +| --------- | --------- | --------- | ------------------- | --------------- | +| arch-1 | important | validated | — | yes | +| quality-3 | refactor | dismissed | "Step 4 validation" | no | +| val-1 | refactor | — | — | yes | + +## After the Step 5 merge + +Step 5's return is [`step5-severity-audit-return.json`](step5-severity-audit-return.json). It confirms `arch-1` and dismisses `val-1`, which gets `dismissal_stage: "Step 5 severity audit"`. Creation-time fields are untouched throughout. + +| id | severity | final status | dismissal_stage | renders in | +| --------- | --------- | ------------ | ----------------------- | --------------- | +| arch-1 | important | confirmed | — | Findings | +| quality-3 | refactor | dismissed | "Step 4 validation" | Dismissed block | +| val-1 | refactor | dismissed | "Step 5 severity audit" | Dismissed block | diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/sample-report.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/sample-report.md new file mode 100644 index 00000000..fea3f449 --- /dev/null +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/sample-report.md @@ -0,0 +1,52 @@ +# Code Review: Rename example plugin and refresh manifests (#123) + +**Date:** 2026-06-10 | **Reviewed by:** Claude Code | **Model:** opus (audit: sonnet) + +## Summary + +| Severity | Count | +| ------------ | ----- | +| 🛑 Blocker | 0 | +| ⚠️ Important | 1 | +| ♻️ Refactor | 0 | + +The rename is structurally sound — the new manifest, marketplace entry, and AGENT.md are internally consistent and correctly wired together. One issue holds it back: the renamed README retains the old plugin identity throughout its body (title, install command, usage examples), so users following it will reference a plugin name that no longer exists. + +## Findings + +### ⚠️ Important + +#### Renamed README retains old plugin identity throughout body + +`plugins/example/README.md:11` +**Caught by:** Architecture agent + +
Details + +The diff only edits the Overview sentence. Every other line retains the old plugin identity: the H1 title (line 1), the agent table row (line 11), and the install command (line 30) still reference the deleted plugin name. These contradict the renamed `plugin.json` and the new `AGENT.md`. Update the title, table row, and install command to the new name in this PR, or add the README to the documented deferral list. + +
+ +## Reviewed and Dismissed + +
🔍 2 initial findings dismissed after validation + +#### README overview still describes old plugin identity + +`plugins/example/README.md:5` +**Caught by:** Code quality agent +**Original severity:** ♻️ Refactor +**Original confidence:** 90/100 +**Dismissed at:** Step 4 validation +**Dismissed because:** Substantively covered by the architecture finding at higher severity; no distinct actionable scope beyond what that finding already requires. + +#### quality findings cite wrong file line for plugin.json description field + +`plugins/example/.claude-plugin/plugin.json:4` +**Caught by:** Validation agent (collateral) +**Original severity:** ♻️ Refactor +**Original confidence:** 100/100 +**Dismissed at:** Step 5 severity audit +**Dismissed because:** A meta-observation about sibling findings is not a code issue in the change under review. + +
diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/step4-validation-return.json b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/step4-validation-return.json new file mode 100644 index 00000000..34da6ac1 --- /dev/null +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/step4-validation-return.json @@ -0,0 +1,18 @@ +[ + { "id": "arch-1", "status": "validated" }, + { + "id": "quality-3", + "status": "dismissed", + "dismissal_reason": "Substantively covered by arch-1 at higher severity." + }, + { + "id": "val-1", + "file": "plugins/example/.claude-plugin/plugin.json", + "line": "4", + "severity": "refactor", + "confidence": 100, + "title": "quality findings cite wrong file line for plugin.json description field", + "detail": "Both citations are diff-offset artifacts; the description field is at file line 4.", + "source_agent": "validation" + } +] diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/step5-severity-audit-return.json b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/step5-severity-audit-return.json new file mode 100644 index 00000000..dfbc3162 --- /dev/null +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/step5-severity-audit-return.json @@ -0,0 +1,8 @@ +[ + { "id": "arch-1", "status": "confirmed", "final_severity": "important" }, + { + "id": "val-1", + "status": "dismissed", + "dismissal_reason": "A meta-observation about sibling findings is not a code issue in the change under review." + } +] diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/finding-shape.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/finding-shape.md index f6e77c32..9fee1eee 100644 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/finding-shape.md +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/finding-shape.md @@ -33,6 +33,8 @@ One entry per incoming finding, keyed by `id`: **Collateral findings** produced during Step 4 (per the collateral-change check) use the full **Finding object** schema above with `source_agent: "validation"` and `id: "val-N"`. They append to Step 5's input. +Extra fields beyond this schema are ignored by the merge — creation-time fields come only from the original Finding object, never from Step 4 or Step 5 returns. + ## Step 5 return (severity audit) One entry per incoming finding, keyed by `id`: @@ -51,3 +53,5 @@ One entry per incoming finding, keyed by `id`: - Creation-time fields — `severity`, `confidence`, `source_agent`, `title`, `detail`, `file`, `line` — are set by the Step 2/3 agent and **MUST NOT** be rewritten in Step 4, Step 5, or Step 6 merge. Step 4 and Step 5 returns carry only `id`, `status`, and disposition fields by design; the merge MUST preserve all creation-time fields from the original Step 2/3 finding. - For dismissed findings, the orchestrator records a `dismissal_stage` field on the master-map entry: `"Step 4 validation"` if Step 4 set the dismissal status, or `"Step 5 severity audit"` if Step 5 did. This field is rendered in the final report as `**Dismissed at:**`. - Step 6 partitions the master map by final status (validated vs dismissed); Steps 7–9 format, print, and write the report. + +A worked merge trace — including the collateral-finding path (`val-1` created at Step 4, dismissed at Step 5) — is in `examples/merge-walkthrough.md` with its Step 4/5 return JSON alongside. diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/report-template.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/report-template.md index edfa538f..5aef2f8d 100644 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/report-template.md +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/report-template.md @@ -1,5 +1,14 @@ # Report Template +## Model Header + +`{model}` is the resolved global model's nickname, never a dated model ID. When any stage's resolved model differs from the global model — including the audit's sonnet default — list each differing stage in parentheses: + +``` +**Model:** opus +**Model:** opus (security: fable, audit: sonnet) +``` + ## Severity Icons - 🛑 **Blocker** — Must fix before merge From 6c35599c7007178fc0a48928411d5aeacaca9692 Mon Sep 17 00:00:00 2001 From: Todd Martin <106564991+trmartin4@users.noreply.github.com> Date: Mon, 15 Jun 2026 11:47:13 -0400 Subject: [PATCH 02/13] feat(breakdowns): Breakdown initialization and spec definition skills (#141) * Initial skils for new breakdown. * PR feedback. * Clarifications on gate. * PR feedback. * Allowing MCP for querying Jira. * PR feedback. * More explicitly handle reading content. * Cleaned up process flow. * Remove spec-kit reference Co-authored-by: Mick Letofsky * Removed "the" from skill names. * Removed duplicate folder. * Removing Spec-Kit, again. * Updated to use folder structure to support separate files for tasks. * Prompt for breakdown location. * Added required tools. * PR feedback. * PR feedback. * Updated template reference. --------- Co-authored-by: Mick Letofsky --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index ed929738..c71eed83 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,7 @@ A curated collection of plugins for AI-assisted development at Bitwarden. Enable | [bitwarden-shepherd](plugins/bitwarden-shepherd/) | 1.0.0 | Champion of a technical strategy — shepherds a TSI through evaluation into the funnel, then through to adoption | | [bitwarden-atlassian-tools](plugins/bitwarden-atlassian-tools/) | 2.2.7 | Read-only Atlassian access via MCP server with deep Jira issue research skill | | [bitwarden-code-review](plugins/bitwarden-code-review/) | 1.12.0 | Autonomous code review agent following Bitwarden engineering standards with GitHub integration | -| [bitwarden-delivery-tools](plugins/bitwarden-delivery-tools/) | 1.4.0 | Delivery lifecycle skills: initiative funnel navigation, work transitions, tech breakdowns and cross-team signoffs, commits, PRs, preflight, labeling | +| [bitwarden-delivery-tools](plugins/bitwarden-delivery-tools/) | 1.4.0 | Delivery lifecycle skills: initiative funnel navigation, work transitions, tech breakdowns and cross-team signoffs, commits, PRs, preflight, labeling | | [bitwarden-designer](plugins/bitwarden-designer/) | 0.1.0 | Product designer persona: Code of Conduct and 30/60/90 critique, critique facilitation; dispatches into bitwarden-design-tools | | [bitwarden-design-tools](plugins/bitwarden-design-tools/) | 0.1.0 | Design toolkit: content style guide, Figma Dev Mode MCP, Bitwarden brand application, handoff prep, Design System governance, Product and Design Jira | | [bitwarden-devops-engineer](plugins/bitwarden-devops-engineer/) | 0.1.3 | DevOps engineering assistant: workflow compliance linting, action security auditing, and org-wide CI/CD remediation | From daf8220007835cc9ef0b42e18d6f5a3f911cfea2 Mon Sep 17 00:00:00 2001 From: Mick Letofsky Date: Mon, 15 Jun 2026 11:31:24 +0200 Subject: [PATCH 03/13] Initial commit of improvements to multi-agent review --- README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/README.md b/README.md index c71eed83..88b55912 100644 --- a/README.md +++ b/README.md @@ -14,8 +14,7 @@ A curated collection of plugins for AI-assisted development at Bitwarden. Enable | [bitwarden-designer](plugins/bitwarden-designer/) | 0.1.0 | Product designer persona: Code of Conduct and 30/60/90 critique, critique facilitation; dispatches into bitwarden-design-tools | | [bitwarden-design-tools](plugins/bitwarden-design-tools/) | 0.1.0 | Design toolkit: content style guide, Figma Dev Mode MCP, Bitwarden brand application, handoff prep, Design System governance, Product and Design Jira | | [bitwarden-devops-engineer](plugins/bitwarden-devops-engineer/) | 0.1.3 | DevOps engineering assistant: workflow compliance linting, action security auditing, and org-wide CI/CD remediation | -| [bitwarden-init](plugins/bitwarden-init/) | 1.2.0 | Initialize and enhance CLAUDE.md files with Bitwarden's standardized template format | -| [bitwarden-product-analyst](plugins/bitwarden-product-analyst/) | 0.1.5 | Product analyst agent for creating comprehensive Bitwarden requirements documents from multiple sources | +| [bitwarden-delivery-tools](plugins/bitwarden-delivery-tools/) | 1.4.0 | Delivery lifecycle skills: initiative funnel navigation, work transitions, tech breakdowns and cross-team signoffs, commits, PRs, preflight, labeling | | [bitwarden-security-engineer](plugins/bitwarden-security-engineer/) | 1.2.0 | Application security engineering: vulnerability triage, threat modeling, and secure code analysis | | [bitwarden-software-engineer](plugins/bitwarden-software-engineer/) | 1.0.0 | Software engineer agent for a Bitwarden product team. Implements stories, tasks, and bugs with code quality, performance, security, and team comms in mind. | | [claude-config-validator](plugins/claude-config-validator/) | 1.1.1 | Validates Claude Code configuration files for security, structure, and quality | From 34f8e5393ee8af5ddfc8808d7eec4aa138c9ae2a Mon Sep 17 00:00:00 2001 From: Mick Letofsky Date: Wed, 17 Jun 2026 13:47:17 +0200 Subject: [PATCH 04/13] Fix strange merge issue --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 88b55912..c71eed83 100644 --- a/README.md +++ b/README.md @@ -14,7 +14,8 @@ A curated collection of plugins for AI-assisted development at Bitwarden. Enable | [bitwarden-designer](plugins/bitwarden-designer/) | 0.1.0 | Product designer persona: Code of Conduct and 30/60/90 critique, critique facilitation; dispatches into bitwarden-design-tools | | [bitwarden-design-tools](plugins/bitwarden-design-tools/) | 0.1.0 | Design toolkit: content style guide, Figma Dev Mode MCP, Bitwarden brand application, handoff prep, Design System governance, Product and Design Jira | | [bitwarden-devops-engineer](plugins/bitwarden-devops-engineer/) | 0.1.3 | DevOps engineering assistant: workflow compliance linting, action security auditing, and org-wide CI/CD remediation | -| [bitwarden-delivery-tools](plugins/bitwarden-delivery-tools/) | 1.4.0 | Delivery lifecycle skills: initiative funnel navigation, work transitions, tech breakdowns and cross-team signoffs, commits, PRs, preflight, labeling | +| [bitwarden-init](plugins/bitwarden-init/) | 1.2.0 | Initialize and enhance CLAUDE.md files with Bitwarden's standardized template format | +| [bitwarden-product-analyst](plugins/bitwarden-product-analyst/) | 0.1.5 | Product analyst agent for creating comprehensive Bitwarden requirements documents from multiple sources | | [bitwarden-security-engineer](plugins/bitwarden-security-engineer/) | 1.2.0 | Application security engineering: vulnerability triage, threat modeling, and secure code analysis | | [bitwarden-software-engineer](plugins/bitwarden-software-engineer/) | 1.0.0 | Software engineer agent for a Bitwarden product team. Implements stories, tasks, and bugs with code quality, performance, security, and team comms in mind. | | [claude-config-validator](plugins/claude-config-validator/) | 1.1.1 | Validates Claude Code configuration files for security, structure, and quality | From 5c63702f807bd0d44497ebac42a8ba3654ef7ab2 Mon Sep 17 00:00:00 2001 From: Mick Letofsky Date: Wed, 17 Jun 2026 17:12:31 +0200 Subject: [PATCH 05/13] Skill creator cleanup and remove tech-lead referencing --- .../SKILL.md | 9 +++-- .../examples/merge-walkthrough.md | 25 ------------- .../examples/step4-validation-return.json | 18 ---------- .../examples/step5-severity-audit-return.json | 8 ----- .../references/finding-shape.md | 36 +++++++++++++++++-- 5 files changed, 38 insertions(+), 58 deletions(-) delete mode 100644 plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/merge-walkthrough.md delete mode 100644 plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/step4-validation-return.json delete mode 100644 plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/step5-severity-audit-return.json diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md index d2e7508a..4895e425 100644 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md @@ -13,7 +13,6 @@ Execute a structured, multi-agent code review on a set of code changes. Follow t This skill depends on the following sibling plugins. If any are not installed, **abort the review with a clear error message** identifying the missing plugin — do not attempt to proceed with a degraded pipeline. -- **`bitwarden-tech-lead`** — provides the architecture review subagent. - **`bitwarden-security-engineer`** — provides security context and analysis skills. Before Step 1, verify each prerequisite is resolvable (verification is heuristic — if a subagent type fails to resolve, treat that as the signal). If a prerequisite is missing, print: @@ -146,7 +145,7 @@ Execute these steps in order. Do not skip, reorder, or combine steps. - **READ** `references/discovery-standards.md`. The Hygiene Sweep is referenced by name in the Step 3 Agent 1 prompt; Line Number Accuracy is propagated verbatim into every Step 2–5 subagent prompt. - **READ** `references/evaluation-standards.md`. Severity Levels, Do Not Flag, and Confidence Scoring are propagated verbatim into every Step 2–5 subagent prompt. -2. Launch a single architecture & pattern compliance agent using the `bitwarden-tech-lead:bitwarden-tech-lead` subagent type, with the resolved analysis model (see Model Selection). Give it the diff, the list of changed file paths, and — in PR mode only — the PR title and description. +2. Launch a single architecture & pattern compliance agent using the `general-purpose` subagent type, with the resolved analysis model (see Model Selection). Open the subagent prompt with: "You are a software architect reviewing code changes for architectural and pattern compliance." Give it the diff, the list of changed file paths, and — in PR mode only — the PR title and description. Unlike the diff agents in Step 3, this agent reads BEYOND the diff to check whether changes fit the codebase. @@ -158,9 +157,9 @@ Execute these steps in order. Do not skip, reorder, or combine steps. **Scope.** Raise pattern inconsistencies, architectural boundary violations, duplicated abstractions, and new conventions introduced where an established one applies. Do NOT raise correctness bugs, security issues, or code-quality concerns — those belong to Step 3. - Apply the Review Rules. Threshold ≥ 80. Emit findings as a JSON array per the Finding Shape schema. + Apply the Review Rules. Also include the **Hygiene Sweep** definition from `references/discovery-standards.md` in the subagent prompt — the Hygiene Sweep's stale-references and cross-site-inconsistency lenses complement the doc/code consistency pass and are within the architect's scope. Threshold ≥ 80. Emit findings as a JSON array per the Finding Shape schema. -3. Send all 3 Agent tool calls in a single message (do NOT use run_in_background). Launch the 3 agents as instructed below — Agents 1–2 with the resolved analysis model, Agent 3 with the resolved security model (see Model Selection). Each receives the diff and the Review Rules; each emits findings as a JSON array per the Finding Shape schema. Confidence Scoring from `references/evaluation-standards.md` applies to all three — threshold ≥ 80. In PR mode, pass the PR title and description only to Agent 3 per Context Partitioning — Agents 1 and 2 receive diff + Review Rules only. +3. Send all 3 Agent tool calls in a single message (**DO NOT** use run_in_background because the agents must run synchronously to guarantee findings are validated together at Step 4). Launch the 3 agents as instructed below — Agents 1–2 with the resolved analysis model, Agent 3 with the resolved security model (see Model Selection). Each receives the diff and the Review Rules; each emits findings as a JSON array per the Finding Shape schema. Confidence Scoring from `references/evaluation-standards.md` applies to all three — threshold ≥ 80. In PR mode, pass the PR title and description only to Agent 3 per Context Partitioning — Agents 1 and 2 receive diff + Review Rules only. **Agent 1: Code quality agent** Use the `general-purpose` subagent type. Read the diff as a senior engineer seeing it for the first time — surface anything that hurts correctness, clarity, or long-term maintainability, including code duplication, missing critical error handling, and inadequate test coverage. @@ -181,7 +180,7 @@ Execute these steps in order. Do not skip, reorder, or combine steps. 4. Launch a single `general-purpose` validation subagent for all findings from Steps 2 and 3, with the resolved validation model (see Model Selection). The subagent receives the diff fetched with the mode's diff command from Step 1, the full array of finding objects, the Review Rules, and — in PR mode only — the PR title and description. The subagent returns an array of Step 4 objects (one per input finding) per the Finding Shape schema. - **Chunking escape hatch.** If raw findings from Steps 2 and 3 number more than 25, partition them into chunks of ≤ 15 (preserving collateral context within each chunk; do not split a `source_agent` group across chunks if it would put related findings on opposite sides) and launch one validation subagent per chunk in a single message (do NOT use run_in_background). + **Chunking escape hatch.** If raw findings from Steps 2 and 3 number more than 25, partition them into chunks of ≤ 15 (preserving collateral context within each chunk; do not split a `source_agent` group across chunks if it would put related findings on opposite sides) and launch one validation subagent per chunk in a single message (**DO NOT** use run_in_background because the agents must run synchronously to guarantee accuracy). A finding is **dismissed** if ANY of the following are true: - It is a pre-existing finding, not introduced by this change. In commit-range mode, treat the cumulative diff of `..` as "this change" and the parent of `` as the pre-existing baseline. diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/merge-walkthrough.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/merge-walkthrough.md deleted file mode 100644 index e8d354bd..00000000 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/merge-walkthrough.md +++ /dev/null @@ -1,25 +0,0 @@ -# Merge Walkthrough - -Condensed from a real run. Traces three findings through the Step 6 merge, including the trickiest path: a collateral finding created at Step 4 and dismissed at Step 5. - -Steps 2/3 produced two findings: `arch-1` (important, confidence 90, architect) and `quality-3` (refactor, confidence 90, quality). - -## After the Step 4 merge - -Step 4's return is [`step4-validation-return.json`](step4-validation-return.json). It dismisses `quality-3` as a duplicate and emits one collateral finding, `val-1`, as a full Finding object in the same array. The orchestrator inserts `val-1` into the master map — its creation-time fields come from that Finding object — and it joins Step 5's input. `quality-3` gets `dismissal_stage: "Step 4 validation"`. - -| id | severity | status | dismissal_stage | goes to Step 5? | -| --------- | --------- | --------- | ------------------- | --------------- | -| arch-1 | important | validated | — | yes | -| quality-3 | refactor | dismissed | "Step 4 validation" | no | -| val-1 | refactor | — | — | yes | - -## After the Step 5 merge - -Step 5's return is [`step5-severity-audit-return.json`](step5-severity-audit-return.json). It confirms `arch-1` and dismisses `val-1`, which gets `dismissal_stage: "Step 5 severity audit"`. Creation-time fields are untouched throughout. - -| id | severity | final status | dismissal_stage | renders in | -| --------- | --------- | ------------ | ----------------------- | --------------- | -| arch-1 | important | confirmed | — | Findings | -| quality-3 | refactor | dismissed | "Step 4 validation" | Dismissed block | -| val-1 | refactor | dismissed | "Step 5 severity audit" | Dismissed block | diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/step4-validation-return.json b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/step4-validation-return.json deleted file mode 100644 index 34da6ac1..00000000 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/step4-validation-return.json +++ /dev/null @@ -1,18 +0,0 @@ -[ - { "id": "arch-1", "status": "validated" }, - { - "id": "quality-3", - "status": "dismissed", - "dismissal_reason": "Substantively covered by arch-1 at higher severity." - }, - { - "id": "val-1", - "file": "plugins/example/.claude-plugin/plugin.json", - "line": "4", - "severity": "refactor", - "confidence": 100, - "title": "quality findings cite wrong file line for plugin.json description field", - "detail": "Both citations are diff-offset artifacts; the description field is at file line 4.", - "source_agent": "validation" - } -] diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/step5-severity-audit-return.json b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/step5-severity-audit-return.json deleted file mode 100644 index dfbc3162..00000000 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/examples/step5-severity-audit-return.json +++ /dev/null @@ -1,8 +0,0 @@ -[ - { "id": "arch-1", "status": "confirmed", "final_severity": "important" }, - { - "id": "val-1", - "status": "dismissed", - "dismissal_reason": "A meta-observation about sibling findings is not a code issue in the change under review." - } -] diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/finding-shape.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/finding-shape.md index 9fee1eee..4491de9d 100644 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/finding-shape.md +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/finding-shape.md @@ -35,6 +35,28 @@ One entry per incoming finding, keyed by `id`: Extra fields beyond this schema are ignored by the merge — creation-time fields come only from the original Finding object, never from Step 4 or Step 5 returns. +```json +// Example Step 4 return +[ + { "id": "arch-1", "status": "validated" }, + { + "id": "quality-3", + "status": "dismissed", + "dismissal_reason": "Substantively covered by arch-1 at higher severity." + }, + { + "id": "val-1", + "file": "plugins/example/.claude-plugin/plugin.json", + "line": "4", + "severity": "refactor", + "confidence": 100, + "title": "quality findings cite wrong file line for plugin.json description field", + "detail": "Both citations are diff-offset artifacts; the description field is at file line 4.", + "source_agent": "validation" + } +] +``` + ## Step 5 return (severity audit) One entry per incoming finding, keyed by `id`: @@ -46,6 +68,18 @@ One entry per incoming finding, keyed by `id`: | `final_severity` | string | Severity value. Omit when `status = "dismissed"`. | | `dismissal_reason` | string | Present only when `status = "dismissed"`. | +```json +// Example Step 5 return +[ + { "id": "arch-1", "status": "confirmed", "final_severity": "important" }, + { + "id": "val-1", + "status": "dismissed", + "dismissal_reason": "A meta-observation about sibling findings is not a code issue in the change under review." + } +] +``` + ## Orchestrator behavior - Maintains a master finding map keyed by `id`. @@ -53,5 +87,3 @@ One entry per incoming finding, keyed by `id`: - Creation-time fields — `severity`, `confidence`, `source_agent`, `title`, `detail`, `file`, `line` — are set by the Step 2/3 agent and **MUST NOT** be rewritten in Step 4, Step 5, or Step 6 merge. Step 4 and Step 5 returns carry only `id`, `status`, and disposition fields by design; the merge MUST preserve all creation-time fields from the original Step 2/3 finding. - For dismissed findings, the orchestrator records a `dismissal_stage` field on the master-map entry: `"Step 4 validation"` if Step 4 set the dismissal status, or `"Step 5 severity audit"` if Step 5 did. This field is rendered in the final report as `**Dismissed at:**`. - Step 6 partitions the master map by final status (validated vs dismissed); Steps 7–9 format, print, and write the report. - -A worked merge trace — including the collateral-finding path (`val-1` created at Step 4, dismissed at Step 5) — is in `examples/merge-walkthrough.md` with its Step 4/5 return JSON alongside. From d29ac3c8879707248e11f739368d8749ec1ee96e Mon Sep 17 00:00:00 2001 From: Mick Letofsky Date: Thu, 18 Jun 2026 10:04:55 +0200 Subject: [PATCH 06/13] Enhance readme and clarified changelog --- plugins/bitwarden-code-review/CHANGELOG.md | 11 ++++------- plugins/bitwarden-code-review/README.md | 14 ++++++-------- 2 files changed, 10 insertions(+), 15 deletions(-) diff --git a/plugins/bitwarden-code-review/CHANGELOG.md b/plugins/bitwarden-code-review/CHANGELOG.md index 1aa18b86..fd2c10a0 100644 --- a/plugins/bitwarden-code-review/CHANGELOG.md +++ b/plugins/bitwarden-code-review/CHANGELOG.md @@ -5,17 +5,14 @@ All notable changes to the Bitwarden Code Review Plugin will be documented in th The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## [1.12.0] - 2026-06-10 - -### Added - -- `performing-multi-agent-code-review`: per-stage model flags (`--model-analysis`, `--model-security`, `--model-validation`, `--model-audit`) with a security floor rule -- `performing-multi-agent-code-review`: `examples/` (sample report + merge walkthrough) +## [1.12.0] - 2026-06-18 ### Changed -- `performing-multi-agent-code-review`: subagents inherit the session model instead of forcing Opus; severity audit defaults to Sonnet +- `performing-multi-agent-code-review`: per-stage model flags (`--model-analysis`, `--model-security`, `--model-validation`, `--model-audit`) with a security floor rule; subagents inherit the session model instead of forcing Opus; severity audit defaults to Sonnet. +- `performing-multi-agent-code-review`: updated the architecture subagent to be a general agent type with a stronger prompt. Reduce complexity by not requiring engineers to install plugins they don't need. Also found minimal to zero actual benefit to using the tech-lead agent. - `performing-multi-agent-code-review`: `{model}` in file names and report headers is the resolved model nickname; `-mixed` suffix when stage flags differ +- `performing-multi-agent-code-review`: enhanced the reference documents to provide example shapes of the DTOs that pass data between the subagents and the orchestration agent. - Improved the README.md to better describe the purpose and usage of the multi-agent review. ## [1.11.0] - 2026-05-12 diff --git a/plugins/bitwarden-code-review/README.md b/plugins/bitwarden-code-review/README.md index 38fa6dbb..d0f04a5f 100644 --- a/plugins/bitwarden-code-review/README.md +++ b/plugins/bitwarden-code-review/README.md @@ -6,7 +6,7 @@ AI-powered code review for Bitwarden — an autonomous review agent for everyday This plugin provides an autonomous code review agent that conducts thorough, professional code reviews following Bitwarden's organizational standards. The agent focuses on security, correctness, and high-value feedback while maintaining a high signal-to-noise ratio. -It offers two complementary lenses. The autonomous `bitwarden-code-reviewer` agent reviews a pull request the way a human reviewer would and posts inline comments to GitHub. The `performing-multi-agent-code-review` skill takes a different approach — it has Claude evaluate code _as Claude_ across a pipeline of specialized sub-agents, trading human-style commentary for depth and high-signal findings written to a local report. Use the agent for everyday PR review; reach for the skill when a change is complex or security-sensitive enough that one reviewer can't hold every concern in view at once. +It offers two complementary lenses. The autonomous `bitwarden-code-reviewer` agent reviews a pull request the way a human reviewer would and posts inline comments to GitHub. The `performing-multi-agent-code-review` skill takes a different approach — it has Claude evaluate code _as Claude_ across a pipeline of specialized sub-agents, trading human-style commentary for depth and high-signal findings written to a local report. ## Features @@ -16,9 +16,7 @@ It offers two complementary lenses. The autonomous `bitwarden-code-reviewer` age - **Security-First Approach**: Prioritizes security vulnerabilities, data exposure, and authentication issues - **Structured Thinking**: Uses explicit reasoning blocks to improve review quality and consistency - **Confidence Scoring**: Pre-filters findings with a 0-100 confidence score (≥75 threshold) before validation to reduce false positives -- **Multi-Agent Review Pipeline**: A separate `performing-multi-agent-code-review` skill runs six specialized sub-agents — architecture compliance, code quality, bug analysis, security & logic, validation, and severity audit — for depth on complex or security-sensitive changes -- **Any Model, Per Stage**: Per-stage model flags dial cost against depth; a security floor prevents the security pass from ever running below your global model -- **Local-Only, Multi-Mode**: Reviews draft or published PRs, local changes, branch comparisons, or commit ranges — and writes findings to a local report, never to GitHub +- **Multi-Agent Review Pipeline**: A separate `performing-multi-agent-code-review` skill runs six specialized sub-agents — architecture compliance, code quality, bug analysis, security & logic, validation, and severity audit — for depth on complex changes ## Skills @@ -44,7 +42,7 @@ See [`classifying-review-findings`](./skills/classifying-review-findings/SKILL.m ### Directory Structure -``` +```bash bitwarden-code-review/ ├── .claude/ │ └── settings.json # Security boundaries @@ -91,13 +89,13 @@ Use the bitwarden-code-reviewer agent to review this PR ### Multi-agent code review skill -The `performing-multi-agent-code-review` skill is built for changes complex or security-sensitive enough that one reviewer — human or AI — can't hold every concern in view at once. Instead of a single pass, it puts a team of specialized agents on the same diff, each reviewing from its own perspective — architecture and pattern compliance, code quality, bugs, and security & logic — then validates and severity-audits everything they surface. It started from [Anthropic's `code-review` command](https://github.com/anthropics/claude-code/blob/main/plugins/code-review/commands/code-review.md) and was rebuilt for the control that command lacks: it wires in our own `bitwarden-security-engineer` and `bitwarden-tech-lead` plugins, runs on any model you choose, reviews **draft or published** PRs — plus local changes, branch comparisons, and commit ranges — and writes its report to a local file instead of posting to GitHub. +The `performing-multi-agent-code-review` skill is built for complex changes where one reviewer — human or AI — can't hold every concern in view at once. Instead of a single pass, it puts a team of specialized agents on the same diff, each reviewing from its own perspective — architecture and pattern compliance, code quality, bugs, and security & logic — then validates and severity-audits everything they surface. It started from [Anthropic's `code-review` command](https://github.com/anthropics/claude-code/blob/main/plugins/code-review/commands/code-review.md) and was rebuilt for the control that command lacks: it wires in our own `bitwarden-security-engineer` plugin, runs on any model you choose, reviews **draft or published** PRs — plus local changes, branch comparisons, and commit ranges — and writes its report to a local file instead of posting to GitHub. It's a deliberately different lens from the `bitwarden-code-reviewer` agent. That agent reviews the way a human reviewer would; this skill has **Claude evaluate code as Claude** — optimized for signal, not parity. It spotlights blockers, real bugs, and known-bad patterns, and stays out of the nit-picking lane. Each potential finding is scored 0–100, and only those clearing an 80-confidence bar are raised; every one it keeps is cited with file, line, and the agent that caught it. What clears the bar still gets challenged — the validation and severity-audit agents can overturn a finding — but an overturned finding is never dropped: it moves into a collapsed **Reviewed and Dismissed** section, tagged with its original severity, original confidence, and the reason it was set aside. That section is deliberate — a human sees everything the first pass surfaced and can judge when a dismissal was wrong and the original call was right; without it, that signal is lost. -**Relaying the "why" to each teammate.** Those agents start cold — a sub-agent inherits none of the main session's context — so the skill briefs every one of them explicitly. Each receives Bitwarden's zero-knowledge invariant and the P01–P06 threat-model directive verbatim. Feature context — the PR's intent, the ticket, the product framing — is handed out deliberately: the architecture and security agents get the full "why" so they can reason adversarially from intent, while the quality and bug agents see only the diff, so their first read stays unbiased. +Agents start cold — a sub-agent inherits none of the main session's context — so the skill briefs every one of them explicitly. Each receives Bitwarden's zero-knowledge invariant and the P01–P06 threat-model directive verbatim. Feature context — the PR's intent, the ticket, the product framing — is handed out deliberately: the architecture and security agents get the full "why" so they can reason adversarially from intent, while the quality and bug agents see only the diff, so their first read stays unbiased. -It slots cleanly into a [Claude Code agent-teams](https://code.claude.com/docs/en/agent-teams) loop — run it at the end of a coding session, address what it finds, refactor — or as a depth-of-review gate on a draft PR before you publish. +The `performing-multi-agent-code-review` skill slots cleanly into a [Claude Code agent-teams](https://code.claude.com/docs/en/agent-teams) loop — run it at the end of a coding session, address what it finds, refactor — or as a depth-of-review gate on a draft PR before you publish. #### Examples From 2765fd741a8c7a9c92ce087b6c35d8c29b43f409 Mon Sep 17 00:00:00 2001 From: Mick Letofsky Date: Thu, 18 Jun 2026 10:25:29 +0200 Subject: [PATCH 07/13] Tighten the skill description as advised --- .../skills/performing-multi-agent-code-review/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md index 4895e425..e00d6a56 100644 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md @@ -1,6 +1,6 @@ --- name: performing-multi-agent-code-review -description: Perform a rigorous, multi-agent code review with architecture-compliance, parallel quality/security analysis, finding validation, and severity audit. Use whenever the user asks for a structured, deep, thorough, multi-pass, or multi-agent code review — or a review that includes architecture/pattern compliance, confidence-scored findings, or a severity audit — even if they don't say the exact phrase "multi-agent". Prefer this over a single-agent review when the user wants high-signal findings with validation. Also use whenever the user asks for a code review across a commit range, time window, or N most recent commits in a locally checked-out repo (e.g. "review the last week of commits in bitwarden/server", "review the last 20 commits", "review changes since 2026-04-23") — these route to the commit-range mode below. +description: Perform a rigorous, multi-agent code review with architecture-compliance, parallel quality/security analysis, finding validation, and severity audit. Use when the user asks for a structured, deep, thorough, multi-pass, or multi-agent code review — or a review that includes architecture/pattern compliance, confidence-scored findings, or a severity audit. Use when the user asks for a code review across a commit range, time window, or N most recent commits in a locally checked-out repo. allowed-tools: "Bash(gh pr diff:*), Bash(gh pr view:*), Bash(git diff:*), Bash(git status:*), Bash(git rev-parse:*), Bash(git log:*), Read, Write, Grep, Glob, Skill, AskUserQuestion" argument-hint: "[pr-number | commit-range] [--model ] [--model-analysis ] [--model-security ] [--model-validation ] [--model-audit ] [--output-dir ]" --- From 402406ef1381b853c94d4def0ec0e78e8e1d6f8d Mon Sep 17 00:00:00 2001 From: Mick Letofsky Date: Thu, 18 Jun 2026 18:26:05 +0200 Subject: [PATCH 08/13] Tighten wording, reduce duplicates, surgical changes --- plugins/bitwarden-code-review/CHANGELOG.md | 1 + plugins/bitwarden-code-review/README.md | 2 +- .../SKILL.md | 52 +++++++------------ 3 files changed, 22 insertions(+), 33 deletions(-) diff --git a/plugins/bitwarden-code-review/CHANGELOG.md b/plugins/bitwarden-code-review/CHANGELOG.md index fd2c10a0..dbd84627 100644 --- a/plugins/bitwarden-code-review/CHANGELOG.md +++ b/plugins/bitwarden-code-review/CHANGELOG.md @@ -13,6 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - `performing-multi-agent-code-review`: updated the architecture subagent to be a general agent type with a stronger prompt. Reduce complexity by not requiring engineers to install plugins they don't need. Also found minimal to zero actual benefit to using the tech-lead agent. - `performing-multi-agent-code-review`: `{model}` in file names and report headers is the resolved model nickname; `-mixed` suffix when stage flags differ - `performing-multi-agent-code-review`: enhanced the reference documents to provide example shapes of the DTOs that pass data between the subagents and the orchestration agent. +- `performing-multi-agent-code-review`: removed duplicate and unnecessarily verbose instructions - Improved the README.md to better describe the purpose and usage of the multi-agent review. ## [1.11.0] - 2026-05-12 diff --git a/plugins/bitwarden-code-review/README.md b/plugins/bitwarden-code-review/README.md index d0f04a5f..9c8a1152 100644 --- a/plugins/bitwarden-code-review/README.md +++ b/plugins/bitwarden-code-review/README.md @@ -108,7 +108,7 @@ Invoke the skill explicitly with the slash command, or with natural language — **2. A specific pull request.** Pass a PR number or URL; it reviews draft and published PRs alike: ```markdown -/bitwarden-code-review:performing-multi-agent-code-review https://github.com/bitwarden/ios/pulls/1234567 --model opus +/bitwarden-code-review:performing-multi-agent-code-review https://github.com/bitwarden/ios/pull/1234567 --model opus ``` **3. Changes over a period of time.** Commit-range mode reviews the cumulative diff across a time window, commit count, or explicit ref pair. Run it from inside the target repo; it confirms the resolved range with you before spending tokens: diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md index e00d6a56..e4d1b012 100644 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md @@ -11,27 +11,19 @@ Execute a structured, multi-agent code review on a set of code changes. Follow t ## Prerequisites -This skill depends on the following sibling plugins. If any are not installed, **abort the review with a clear error message** identifying the missing plugin — do not attempt to proceed with a degraded pipeline. +This skill depends on the following sibling plugins. -- **`bitwarden-security-engineer`** — provides security context and analysis skills. +- **`bitwarden-security-engineer`** -Before Step 1, verify each prerequisite is resolvable (verification is heuristic — if a subagent type fails to resolve, treat that as the signal). If a prerequisite is missing, print: +Before Step 1, verify each prerequisite plugin is installed. The signal is resolvability — a required subagent type or skill that does not appear in your available tooling means the plugin is missing. If any is missing, **abort with the message below** — do not proceed with a degraded pipeline. > Prerequisite plugin `` is not installed. Install it and retry. Review aborted. -…and stop. - -## Mode - -Read `references/modes.md`. Loaded in Step 1; the orchestrator determines the mode from the invocation, runs the resolution sequence (commit-range mode only), and uses the matching diff-source commands to populate Step 1's gathered context. Modes are orchestrator-only and not propagated to subagents. - ## Output Location -Resolve immediately upon invocation — before Step 1 begins. The resolved path is used verbatim in Step 9. - -If `--output-dir ` is present in `$ARGUMENTS`, use that path verbatim — do not test whether it exists, do not prompt the user to confirm, and do not offer alternatives. If the caller passed a bad path, the write in Step 9 will fail and surface the error; that is the intended behavior. - -Otherwise, default to `${CLAUDE_PLUGIN_DATA}/code-reviews/` — organized across projects, never git-tracked. +If `--output-dir ` is present in `$ARGUMENTS`, resolve immediately upon invocation and use that path verbatim. Otherwise, default to `${CLAUDE_PLUGIN_DATA}/code-reviews/`. +Do not test whether it exists, prompt the user to confirm, nor offer alternatives. +If the caller passed a bad path, the write in Step 9 will fail and surface the error. ## Model Selection @@ -47,8 +39,7 @@ The **global model** is `--model` if specified, otherwise the session's model. | Validation | Step 4 | `--model-validation` | global | | Severity audit | Step 5 | `--model-audit` | **sonnet** | -Each stage resolves to its flag if present, otherwise its default. -Exception: an explicit `--model` overrides the audit's sonnet default. +Each stage resolves to its flag if present, otherwise its default; an explicit `--model` also overrides the audit's sonnet default. **Security floor.** `--model-security` may only pin at or above the global model. On a lower pin, run security at the global model and note the ignored pin in the announcement. Rationale: P01–P06 evaluation quality must not silently degrade. @@ -62,14 +53,11 @@ Exception: an explicit `--model` overrides the audit's sonnet default. Applies to all agents and subagents. -- Models: resolved per stage in Model Selection above. - Don't write to GitHub. All findings go to a local markdown file. - Tool discipline (see Orchestration → Tool Discipline) applies to the main agent and is propagated verbatim to every subagent. Rationale for the WebFetch/WebSearch ban: bypasses `gh` auth, skips audit trails, can return stale cached pages. ## Orchestration -Applied when launching subagents. - ### Project Preamble Propagation Subagents do not inherit the main agent's CLAUDE.md context. Every subagent prompt in Steps 2–5 MUST open with the two required blocks below, in order, followed by the conditional block if it applies. @@ -84,7 +72,7 @@ Subagents do not inherit the main agent's CLAUDE.md context. Every subagent prom > > **Threat-model directive.** Evaluate every change against P01–P06 and the requirements under VD/EK/AT/SC/TC (loaded via the `bitwarden-security-context` skill per the preceding block). For each finding that touches vault data, keys, auth tokens, or user authenticity, name the principle or category it implicates. -**Conditional — repo-specific forwarding.** A repo's checked-in `CLAUDE.md` may contain a section that explicitly instructs you to forward it to subagents (e.g., _"when spawning subagents, include..."_ or _"propagate this to subagents"_). If so, paste that section verbatim. If not, the two required blocks alone suffice. +**Conditional — repo-specific forwarding.** A repo's checked-in `CLAUDE.md` may contain a section that explicitly instructs you to forward it to subagents. If so, paste that section verbatim. ### Tool Discipline @@ -109,15 +97,15 @@ Feature context — issue descriptions, Jira tickets, PR history, removed-predec - **Context-allowed** (Step 2 architecture agent; Step 3 Agent 3 security & logic): pass full feature context. These agents think adversarially from intent. - **Context-forbidden** (Step 3 Agent 1 code quality; Step 3 Agent 2 bug analysis): **ONLY** pass the diff and the Review Rules. **DO NOT** paste issue summaries, Jira tickets, or PR description prose into these prompts. -- **Style-matching requirement.** The main agent's tone and framing across parallel agents leaks — a rich-context prompt for the security agent alongside a bare prompt for the bug agent still implicitly biases the bug agent through the shared authored reality. When drafting context-forbidden prompts, match the terse style of the diff-only sibling prompts; do not echo the framing of the context-allowed siblings. +- **Style-matching requirement.** The main agent's tone and framing across parallel agents leaks — a rich-context prompt for the security agent alongside a bare prompt for the bug agent still implicitly frames how the bug agent reads the diff. When drafting context-forbidden prompts, match the terse style of the diff-only sibling prompts; do not echo the framing of the context-allowed siblings. ## Discovery Standards -Read `references/discovery-standards.md`. Referenced by Step 2 (architect doc/code consistency pass) and Step 3 Agent 1 (Hygiene Sweep). The Line Number Accuracy rule is propagated verbatim into every Step 2–5 subagent prompt. +Read `references/discovery-standards.md`. Referenced by Step 2 (architect doc/code consistency pass) and Step 3 Agent 1 (Hygiene Sweep). ## Evaluation Standards -Read `references/evaluation-standards.md`. Severity Levels, Do Not Flag, and Confidence Scoring are propagated verbatim into every Step 2–5 subagent prompt; the Finding Shape schema lives in `references/finding-shape.md` and is also propagated verbatim. +Read `references/evaluation-standards.md`. Defines Severity Levels, Do Not Flag, and Confidence Scoring; the Finding Shape schema lives in `references/finding-shape.md`. ## Review Rules @@ -141,9 +129,9 @@ Execute these steps in order. Do not skip, reorder, or combine steps. - Determine the mode per `references/modes.md`. Fetch the list of changed files with the mode's command: `gh pr diff {number} --name-only` (PR), `git diff HEAD --name-only` (local), `git diff origin/HEAD...HEAD --name-only` (branch comparison), or `git diff .. --name-only` (commit range). In PR mode, also fetch the title and description with `gh pr view`. - **READ** CLAUDE.md, README.md, and any other relevant .md files in or near the directories containing modified files. - **READ** `references/report-template.md` for formatting the final report in Step 7. - - **READ** `references/finding-shape.md`. Its contents are pasted verbatim into every Step 2–5 subagent prompt. - - **READ** `references/discovery-standards.md`. The Hygiene Sweep is referenced by name in the Step 3 Agent 1 prompt; Line Number Accuracy is propagated verbatim into every Step 2–5 subagent prompt. - - **READ** `references/evaluation-standards.md`. Severity Levels, Do Not Flag, and Confidence Scoring are propagated verbatim into every Step 2–5 subagent prompt. + - **READ** `references/finding-shape.md`. + - **READ** `references/discovery-standards.md`. The Hygiene Sweep is referenced by name in the Step 3 Agent 1 prompt. + - **READ** `references/evaluation-standards.md`. 2. Launch a single architecture & pattern compliance agent using the `general-purpose` subagent type, with the resolved analysis model (see Model Selection). Open the subagent prompt with: "You are a software architect reviewing code changes for architectural and pattern compliance." Give it the diff, the list of changed file paths, and — in PR mode only — the PR title and description. @@ -153,13 +141,13 @@ Execute these steps in order. Do not skip, reorder, or combine steps. - Read the full files being modified (not just diff hunks) to understand surrounding context. - Read CLAUDE.md, README.md, and other relevant .md files in or near the modified directories; verify each change complies with explicit project rules. - Use Glob and Grep to find how similar code is structured elsewhere in the codebase. - - **Doc/code consistency pass** — flag contradictions this diff creates between the code and same-repo documentation, configuration, or agent-facing files (e.g., a `CLAUDE.md` entry describing handler behavior the diff now changes; a README example that no longer matches the new signature; `.claude/` agent instructions referencing behavior the PR removes). Only flag divergence this change creates or worsens — do not audit pre-existing drift. + - **Doc/code consistency pass** — flag contradictions this diff creates between the code and same-repo documentation, configuration, or agent-facing files — README.md and CLAUDE.md most of all. Only flag divergence this change creates or worsens — do not audit pre-existing drift. **Scope.** Raise pattern inconsistencies, architectural boundary violations, duplicated abstractions, and new conventions introduced where an established one applies. Do NOT raise correctness bugs, security issues, or code-quality concerns — those belong to Step 3. - Apply the Review Rules. Also include the **Hygiene Sweep** definition from `references/discovery-standards.md` in the subagent prompt — the Hygiene Sweep's stale-references and cross-site-inconsistency lenses complement the doc/code consistency pass and are within the architect's scope. Threshold ≥ 80. Emit findings as a JSON array per the Finding Shape schema. + Apply the Review Rules. Also include the **Hygiene Sweep** definition from `references/discovery-standards.md` — its lenses are within the architect's scope. Threshold ≥ 80. Emit findings as a JSON array per the Finding Shape schema. -3. Send all 3 Agent tool calls in a single message (**DO NOT** use run_in_background because the agents must run synchronously to guarantee findings are validated together at Step 4). Launch the 3 agents as instructed below — Agents 1–2 with the resolved analysis model, Agent 3 with the resolved security model (see Model Selection). Each receives the diff and the Review Rules; each emits findings as a JSON array per the Finding Shape schema. Confidence Scoring from `references/evaluation-standards.md` applies to all three — threshold ≥ 80. In PR mode, pass the PR title and description only to Agent 3 per Context Partitioning — Agents 1 and 2 receive diff + Review Rules only. +3. Send all 3 Agent tool calls in a single message (**DO NOT** use run_in_background because the agents must run synchronously to guarantee findings are validated together at Step 4). Agents 1–2 use the resolved analysis model, Agent 3 uses the resolved security model (see Model Selection). Each receives the diff and the Review Rules; each emits findings as a JSON array per the Finding Shape schema. Confidence Scoring from `references/evaluation-standards.md` applies to all three — threshold ≥ 80. In PR mode, pass the PR title and description only to Agent 3 per Context Partitioning — Agents 1 and 2 receive diff + Review Rules only. **Agent 1: Code quality agent** Use the `general-purpose` subagent type. Read the diff as a senior engineer seeing it for the first time — surface anything that hurts correctness, clarity, or long-term maintainability, including code duplication, missing critical error handling, and inadequate test coverage. @@ -187,7 +175,7 @@ Execute these steps in order. Do not skip, reorder, or combine steps. - **Bugs**: The problem does not actually exist in the code (e.g., the variable is not truly undefined, the logic error does not actually produce wrong results) - It is a nitpick that a senior engineer would not flag in a real code review - It would be caught by a linter (**do not run** the linter to verify) - - It is a general code quality concern that wouldn't be flagged in a real code review. In other words, do not state generics. All findings **MUST** be specific and actionable. + - It is a vague code quality concern — findings **MUST** be specific and actionable. **Collateral-change check.** When a finding is about to be dismissed as "deliberate divergence from an established pattern" or "documented exception," before dismissing it check whether supporting code was updated _consistent with_ the divergence. Specifically, scan the diff for: - Allowlist, registry, or lookup-table entries that assume the old pattern and are now stale or dead. @@ -205,11 +193,11 @@ Execute these steps in order. Do not skip, reorder, or combine steps. 6. Merge all Step 4 and Step 5 returns by `id` into the master finding map. Before merging Step 5 returns, insert the full Finding object for each Step 4 collateral finding (`source_agent: "validation"`, `id: "val-N"`) into the master map — their creation-time fields come from those Finding objects, not from Step 4's status returns. Creation-time fields are immutable (see `references/finding-shape.md`). For dismissed findings, set `dismissal_stage` to `"Step 4 validation"` or `"Step 5 severity audit"` based on which step set the dismissal status — it renders as `**Dismissed at:**`. Partition by final status: validated (Step 5 `confirmed` or `downgraded`) becomes the main Findings section; dismissed (Step 4 `dismissed` or Step 5 `dismissed`) preserves original severity, original confidence, dismissal stage, and dismissal reason for rendering in the Dismissed block. -7. Format the report using the template in `references/report-template.md`; `examples/sample-report.md` shows a complete rendered example, including the dismissed-finding stanza. Cite every validated AND dismissed finding with full file path and line: `file/path.ext:{line}` (or `:{start}-{end}` for ranges). Omit any severity section with zero findings. If zero findings total, replace the Findings section with: "No findings found." For every rendered finding (validated and dismissed), populate the `**Caught by:**` line from the finding's `source_agent` field, translated to the friendly label per the table in `references/report-template.md`. Dismissed findings additionally render `**Original severity:**`, `**Original confidence:**`, `**Dismissed at:**`, and `**Dismissed because:**` per the template — past runs have silently dropped these, so do not omit any of them; per-finding traceability requires the full set. +7. Format the report using the template in `references/report-template.md`; `examples/sample-report.md` shows a complete rendered example, including the dismissed-finding stanza. Cite every validated AND dismissed finding with full file path and line: `file/path.ext:{line}` (or `:{start}-{end}` for ranges). Omit any severity section with zero findings. If zero findings total, replace the Findings section with: "No findings found." For every rendered finding (validated and dismissed), populate the `**Caught by:**` line from the finding's `source_agent` field, translated to the friendly label per the table in `references/report-template.md`. Dismissed findings additionally render `**Original severity:**`, `**Original confidence:**`, `**Dismissed at:**`, and `**Dismissed because:**` per the template — past runs have silently dropped these, so do not omit any of them. 8. Print the full formatted report to the terminal. -9. Write the formatted report to the output directory resolved in **Output Location**. Do not test whether the directory exists, do not create it, and do not prompt the user — write directly. If the write fails because the caller-supplied path is invalid, surface the error as-is. After a successful write, print the full resolved path. +9. Write the formatted report to the output directory resolved in **Output Location**. Do not test if the directory exists. Do not attempt to create the directory. Write the file directly. If the write fails then surface the error as-is. After a successful write, print the full resolved path. File name: `code-review-{model}-PR-{number}.md` (PR mode), `code-review-{model}-{YYYY-MM-DD}.md` (local mode), `code-review-{model}-{branch}-{YYYY-MM-DD}.md` (branch comparison mode), or `code-review-{model}-{from-short}..{to-short}.md` (commit-range mode, where `{from-short}`/`{to-short}` are 7-char SHAs or shorter ref names). From 4a8fee4279ed37a12b713cc5c35042b6ef065eda Mon Sep 17 00:00:00 2001 From: Mick Letofsky Date: Thu, 18 Jun 2026 18:52:05 +0200 Subject: [PATCH 09/13] Minor local code review changes --- plugins/bitwarden-code-review/README.md | 2 +- .../references/discovery-standards.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/plugins/bitwarden-code-review/README.md b/plugins/bitwarden-code-review/README.md index 9c8a1152..71c7aedb 100644 --- a/plugins/bitwarden-code-review/README.md +++ b/plugins/bitwarden-code-review/README.md @@ -114,7 +114,7 @@ Invoke the skill explicitly with the slash command, or with natural language — **3. Changes over a period of time.** Commit-range mode reviews the cumulative diff across a time window, commit count, or explicit ref pair. Run it from inside the target repo; it confirms the resolved range with you before spending tokens: ```markdown -Review the last week of commits using /bitwarden-code-review:performing-multi-agent-code-review --model-analysis opus --model-security fable --model-validation opus --model-audit sonnet --output-dir ./code-reviews +Review the last week of commits using /bitwarden-code-review:performing-multi-agent-code-review --model opus ``` **4. Full control, per stage.** Tune each stage independently — opus for analysis and validation, fable for security, sonnet for the audit — and send the report to a specific directory: diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/discovery-standards.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/discovery-standards.md index 742cc0ce..2dabdb56 100644 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/discovery-standards.md +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/discovery-standards.md @@ -1,10 +1,10 @@ # Discovery Standards -Loaded by the orchestrator in Step 1. The **Hygiene Sweep** is invoked by name from the Step 3 Agent 1 (code quality) prompt. The **Line Number Accuracy** rule is propagated verbatim into every Step 2–5 subagent prompt. +Loaded by the orchestrator in Step 1. The **Hygiene Sweep** is invoked by name from the Step 2 architect and Step 3 Agent 1 (code quality) prompts. The **Line Number Accuracy** rule is propagated verbatim into every Step 2–5 subagent prompt. ## Hygiene Sweep -Agent 1 (code quality) performs a hygiene sweep of the diff before submitting findings; the Step 2 architect performs an analogous doc/code consistency pass per its own directive. When referenced, look specifically for: +Agent 1 (code quality) performs a hygiene sweep of the diff before submitting findings; the Step 2 architect applies the same sweep within its scope. When referenced, look specifically for: - **Dead code added by this PR** — allowlist/registry/lookup-table entries added for features that don't flow through the validated entry point; unused imports; unreachable branches. - **Stale references** — documentation, comments, error messages, or assertions in this diff that contradict the same diff's implementation. From 0028c1514bbed6e94fbb5304a189ac82038f415d Mon Sep 17 00:00:00 2001 From: Mick Letofsky Date: Thu, 18 Jun 2026 19:27:23 +0200 Subject: [PATCH 10/13] Remove fable --- .../skills/performing-multi-agent-code-review/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md index e4d1b012..042f6717 100644 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md @@ -29,7 +29,7 @@ If the caller passed a bad path, the write in Step 9 will fail and surface the e Resolve per-stage models upon invocation before Step 1 begins. -Flag values are the Agent tool's model nicknames, in ascending tier order: `haiku` < `sonnet` < `opus` < `fable`. +Flag values are the Agent tool's model nicknames, in ascending tier order: `haiku` < `sonnet` < `opus`. The **global model** is `--model` if specified, otherwise the session's model. | Stage | Agents | Flag | Default | From 55ca053339163452c7c40a81aabdf1dbbd0a28c6 Mon Sep 17 00:00:00 2001 From: Mick Letofsky Date: Thu, 18 Jun 2026 19:57:32 +0200 Subject: [PATCH 11/13] Remove all fable references --- plugins/bitwarden-code-review/README.md | 4 ++-- .../references/report-template.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/plugins/bitwarden-code-review/README.md b/plugins/bitwarden-code-review/README.md index 71c7aedb..19cc5e1a 100644 --- a/plugins/bitwarden-code-review/README.md +++ b/plugins/bitwarden-code-review/README.md @@ -117,13 +117,13 @@ Invoke the skill explicitly with the slash command, or with natural language — Review the last week of commits using /bitwarden-code-review:performing-multi-agent-code-review --model opus ``` -**4. Full control, per stage.** Tune each stage independently — opus for analysis and validation, fable for security, sonnet for the audit — and send the report to a specific directory: +**4. Full control, per stage.** Tune each stage independently — opus for analysis and validation, opus for security, sonnet for the audit — and send the report to a specific directory: ```bash /bitwarden-code-review:performing-multi-agent-code-review \ https://github.com/bitwarden/gh-actions/pull/604 \ --model-analysis opus \ - --model-security fable \ + --model-security opus \ --model-validation opus \ --model-audit sonnet \ --output-dir ./reviews diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/report-template.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/report-template.md index 5aef2f8d..e1044802 100644 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/report-template.md +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/report-template.md @@ -6,7 +6,7 @@ ``` **Model:** opus -**Model:** opus (security: fable, audit: sonnet) +**Model:** opus (security: opus, audit: sonnet) ``` ## Severity Icons From ffd1fb9e747119fdf279793a1b365f3844909040 Mon Sep 17 00:00:00 2001 From: Mick Letofsky Date: Thu, 18 Jun 2026 20:50:16 +0200 Subject: [PATCH 12/13] Last couple minor changes --- plugins/bitwarden-code-review/README.md | 2 +- .../skills/performing-multi-agent-code-review/SKILL.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/plugins/bitwarden-code-review/README.md b/plugins/bitwarden-code-review/README.md index 19cc5e1a..3a8d61b2 100644 --- a/plugins/bitwarden-code-review/README.md +++ b/plugins/bitwarden-code-review/README.md @@ -129,7 +129,7 @@ Review the last week of commits using /bitwarden-code-review:performing-multi-ag --output-dir ./reviews ``` -A **security floor** keeps `--model-security` from ever dropping below your global model, so threat-model evaluation never silently degrades. Omit `--output-dir` and the report lands in `${CLAUDE_PLUGIN_DATA}`, organized across projects and never git-tracked. +A **security floor** keeps `--model-security` from ever dropping below your global model, so threat-model evaluation never silently degrades. Omit `--output-dir` and the report lands in `${CLAUDE_PLUGIN_DATA}/code-reviews/`, organized across projects and never git-tracked. ### In GitHub Actions diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md index 042f6717..a1e04173 100644 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/SKILL.md @@ -101,7 +101,7 @@ Feature context — issue descriptions, Jira tickets, PR history, removed-predec ## Discovery Standards -Read `references/discovery-standards.md`. Referenced by Step 2 (architect doc/code consistency pass) and Step 3 Agent 1 (Hygiene Sweep). +Read `references/discovery-standards.md`. Referenced by Step 2 (architect — doc/code consistency pass and Hygiene Sweep) and Step 3 Agent 1 (Hygiene Sweep). ## Evaluation Standards @@ -130,7 +130,7 @@ Execute these steps in order. Do not skip, reorder, or combine steps. - **READ** CLAUDE.md, README.md, and any other relevant .md files in or near the directories containing modified files. - **READ** `references/report-template.md` for formatting the final report in Step 7. - **READ** `references/finding-shape.md`. - - **READ** `references/discovery-standards.md`. The Hygiene Sweep is referenced by name in the Step 3 Agent 1 prompt. + - **READ** `references/discovery-standards.md`. The Hygiene Sweep is referenced by name in the Step 2 architect and Step 3 Agent 1 prompts. - **READ** `references/evaluation-standards.md`. 2. Launch a single architecture & pattern compliance agent using the `general-purpose` subagent type, with the resolved analysis model (see Model Selection). Open the subagent prompt with: "You are a software architect reviewing code changes for architectural and pattern compliance." Give it the diff, the list of changed file paths, and — in PR mode only — the PR title and description. From 63d07186bdd7b8ec72631bd9f72e4498d760853f Mon Sep 17 00:00:00 2001 From: Mick Letofsky Date: Fri, 19 Jun 2026 15:22:17 +0200 Subject: [PATCH 13/13] Final claude nit pick --- .../references/report-template.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/report-template.md b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/report-template.md index e1044802..0294c719 100644 --- a/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/report-template.md +++ b/plugins/bitwarden-code-review/skills/performing-multi-agent-code-review/references/report-template.md @@ -6,7 +6,7 @@ ``` **Model:** opus -**Model:** opus (security: opus, audit: sonnet) +**Model:** opus (audit: sonnet) ``` ## Severity Icons