6 changes: 6 additions & 0 deletions .claude-plugin/marketplace.json
Original file line number Diff line number Diff line change
@@ -92,6 +92,12 @@
"source": "./plugins/bitwarden-design-tools",
"version": "0.1.0",
"description": "Design toolkit for Bitwarden: non-persona skills for the design lifecycle. Content style guide reference, Figma Dev Mode MCP usage, Bitwarden brand application, design-to-engineering handoff prep, Design System governance, and the Product and Design Jira workflow. Composed by the bitwarden-designer agent and usable standalone."
},
{
"name": "bitwarden-test-engineer",
"source": "./plugins/bitwarden-test-engineer",
"version": "1.0.0",
"description": "Test engineering toolkit for Bitwarden. Hosts role-specific testing agents, currently a test strategist that recommends what to test, at which layer, and why (risk-weighted, shaped to each repo) and inventories existing coverage. Designed to grow additional roles such as an SDET or a QA engineer."
}
]
}
21 changes: 21 additions & 0 deletions .cspell.json
@@ -3,6 +3,7 @@
"version": "0.2",
"words": [
"accum",
"actioned",
"adf",
"AKIA",
"anthropics",
@@ -12,6 +13,7 @@
"askable",
"ASVS",
"atlassian",
"automatable",
"Bitwarden",
"blocklist",
"blogposts",
@@ -25,11 +27,13 @@
"codeBlock",
"CODEOWNERS",
"Confluence",
"Consolas",
"CQL",
"customfield",
"cvss",
"Dashlane",
"dast",
"detekt",
"docstrings",
"dread",
"duedate",
@@ -50,6 +54,7 @@
"Gatekeeping",
"GHAS",
"ghsa",
"getline",
"gofmt",
"gradlew",
"grype",
@@ -60,24 +65,29 @@
"hotspots",
"IDOR",
"inclusivity",
"inlines",
"issueIdOrKey",
"issuelinks",
"issuetype",
"Jira",
"JQL",
"keyserver",
"ktlint",
"lockdown",
"lockfiles",
"maxResults",
"mcp",
"Menlo",
"metacharacters",
"mockall",
"modelcontextprotocol",
"msword",
"MVVM",
"myapp",
"mypassword",
"myproject",
"Newtonsoft",
"nextest",
"nextPageToken",
"numstat",
"NVARCHAR",
@@ -94,11 +104,14 @@
"remotelink",
"Rescope",
"resolutiondate",
"Robolectric",
"rustdoc",
"sarif",
"SDET",
"SDLC",
"sast",
"sbom",
"Segoe",
"semver",
"shellcheck",
"shortlog",
@@ -117,15 +130,22 @@
"startswith",
"stride",
"structurizr",
"stylesheet",
"subdirs",
"tablist",
"tabpanel",
"tarpit",
"thumbsup",
"tinyui",
"tnum",
"touchpoint",
"touchpoints",
"triaging",
"unassigning",
"unassigns",
"unfound",
"ungroup",
"unlinkable",
"unresponded",
"unsanitized",
"userflow",
@@ -139,6 +159,7 @@
"wordprocessingml",
"worktree",
"worktrees",
"XCUI",
"xoxb",
"Zeroize",
"zeroization",
1 change: 1 addition & 0 deletions README.md
@@ -18,6 +18,7 @@ A curated collection of plugins for AI-assisted development at Bitwarden. Enable
| [bitwarden-product-analyst](plugins/bitwarden-product-analyst/) | 0.1.5 | Product analyst agent for creating comprehensive Bitwarden requirements documents from multiple sources |
| [bitwarden-security-engineer](plugins/bitwarden-security-engineer/) | 1.2.0 | Application security engineering: vulnerability triage, threat modeling, and secure code analysis |
| [bitwarden-software-engineer](plugins/bitwarden-software-engineer/) | 1.0.0 | Software engineer agent for a Bitwarden product team. Implements stories, tasks, and bugs with code quality, performance, security, and team comms in mind. |
| [bitwarden-test-engineer](plugins/bitwarden-test-engineer/) | 1.0.0 | Test engineering toolkit: role-specific testing agents spanning the test lifecycle, starting with risk-weighted test strategy and coverage planning. |
| [claude-config-validator](plugins/claude-config-validator/) | 1.1.1 | Validates Claude Code configuration files for security, structure, and quality |
| [claude-retrospective](plugins/claude-retrospective/) | 1.1.1 | Analyze Claude Code sessions to identify successful patterns and improvement opportunities |

22 changes: 22 additions & 0 deletions plugins/bitwarden-test-engineer/.claude-plugin/plugin.json
@@ -0,0 +1,22 @@
{
"name": "bitwarden-test-engineer",
"version": "1.0.0",
"description": "Test engineering toolkit for Bitwarden. Hosts role-specific testing agents, currently a test strategist that recommends what to test, at which layer, and why (risk-weighted, shaped to each repo) and inventories existing coverage. Designed to grow additional roles such as an SDET or a QA engineer.",
"author": {
"name": "Bitwarden",
"url": "https://github.com/bitwarden"
},
"homepage": "https://github.com/bitwarden/ai-plugins/tree/main/plugins/bitwarden-test-engineer",
"repository": "https://github.com/bitwarden/ai-plugins",
"keywords": [
"testing",
"test-engineering",
"quality-engineering",
"test-strategy",
"test-automation",
"exploratory-testing",
"test-layers",
"qa",
"orchestrator"
]
}
15 changes: 15 additions & 0 deletions plugins/bitwarden-test-engineer/CHANGELOG.md
@@ -0,0 +1,15 @@
# Changelog

All notable changes to the Bitwarden Test Engineer Plugin will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.0.0] - 2026-06-15

### Added

- Initial release of the `bitwarden-test-engineer` plugin.
- `test-strategist` agent: classifies a change's inputs (Jira ticket, GitHub PR, tech breakdown, test-case CSV, plain-language description), fans out subagents to gather evidence, and presents a test recommendation.
- `assessing-test-coverage` skill: inventories the tests that already cover a change, buckets observed tests by layer, cites them as stable GitHub permalinks, and writes a self-contained HTML coverage report.
- `analyzing-test-stack` skill: maps a change's testable behaviors to the cheapest sufficient test layer per platform, surfaces coverage gaps and shape-wrong tests, and emits a self-contained HTML report.
- Shared plugin-level `references/` and a `build-report.sh` script that splices the single shared stylesheet into each report so the two reports can't drift.
99 changes: 99 additions & 0 deletions plugins/bitwarden-test-engineer/README.md
@@ -0,0 +1,99 @@
# Bitwarden Test Engineer Plugin

## Overview

A test engineering toolkit for Bitwarden. It hosts role-specific testing agents. Today it
ships one: the **test strategist** (`test-strategist`), the test-_planning_ role.
It recommends what to test, at which layer, and why, and inventories what is already tested.
It does not author, run, or maintain tests, and it does not do exploratory or manual QA. The
plugin is designed to grow additional roles over time (for example, an SDET or a QA engineer).

### First role: the test strategist

Given a change (a feature, bugfix, refactor, or migration), the agent recommends
**what to test, at which layer, and why**, shaped to **each repo's actual test practice**.
Two ideas drive it. First, each behavior is tested at the cheapest layer that buys the
confidence it needs (unit, integration, or E2E). Second, how those layers are weighted is
decided per repo: a unit-heavy pyramid (`server`, `clients`, `sdk-internal`, `android`), an
integration/snapshot trophy (`ios`), or an all-E2E repo (the dedicated `test` repo,
`browser-interactions-testing`). E2E is "thin" only _within_ a platform repo; the dedicated
`test` repo is entirely E2E by design.
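
As a rough illustration of those two ideas, the sketch below pairs each repo named above with its shape and picks the cheapest layer that is judged sufficient. It is not the plugin's implementation: `Shape`, `REPO_SHAPES`, and `cheapest_sufficient_layer` are hypothetical names, and the per-shape weighting of pyramid versus trophy is deliberately not modeled.

```python
from enum import Enum


class Shape(Enum):
    PYRAMID = "unit-heavy pyramid"
    TROPHY = "integration/snapshot trophy"
    ALL_E2E = "all-E2E"


# Repo names come from the README text; the lookup table itself is hypothetical.
REPO_SHAPES = {
    "server": Shape.PYRAMID,
    "clients": Shape.PYRAMID,
    "sdk-internal": Shape.PYRAMID,
    "android": Shape.PYRAMID,
    "ios": Shape.TROPHY,
    "test": Shape.ALL_E2E,
    "browser-interactions-testing": Shape.ALL_E2E,
}

# Layers ordered from cheapest to most expensive.
LAYERS = ["unit", "integration", "e2e"]


def cheapest_sufficient_layer(repo: str, sufficient_layers: set[str]) -> str:
    """Pick the cheapest layer that buys enough confidence for one behavior.

    `sufficient_layers` is the caller's judgment of which layers would give
    enough confidence; an all-E2E repo always answers "e2e".
    """
    if REPO_SHAPES[repo] is Shape.ALL_E2E:
        return "e2e"
    for layer in LAYERS:
        if layer in sufficient_layers:
            return layer
    raise ValueError("no sufficient layer was given")
```

For example, a behavior in `server` that either an integration or an E2E test could verify would land at the integration layer, because it is the cheaper of the two.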

It ingests whatever evidence is available: a Jira ticket (via the Atlassian MCP), a GitHub
PR (via `gh`), an exported test-case CSV, and/or a plain-language description. It fans out
subagents to gather that evidence, assesses what is **already tested** (the
`assessing-test-coverage` skill, which inventories existing tests, cites each as a GitHub
permalink, and writes a coverage report), then runs the analyst skill
(`analyzing-test-stack`), which produces the test-stack recommendation. Both skills emit a
self-contained HTML report.
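
The input-classification step could look something like the sketch below. It only guesses the kind of evidence from the shape of a single string; the function name, the regular expressions, and the returned labels are assumptions made for illustration, not the agent's actual logic.

```python
import re

# Hypothetical patterns: a Jira key such as PM-12345 and a PR reference
# such as bitwarden/server#5821.
JIRA_KEY = re.compile(r"^[A-Z][A-Z0-9]+-\d+$")
GITHUB_PR = re.compile(r"^[\w.-]+/[\w.-]+#\d+$")


def classify_input(text: str) -> str:
    """Return 'jira', 'github_pr', 'csv', or 'description' for one input."""
    token = text.strip()
    if JIRA_KEY.match(token):
        return "jira"
    if GITHUB_PR.match(token):
        return "github_pr"
    if token.lower().endswith(".csv"):
        return "csv"
    # Anything else is treated as a plain-language description.
    return "description"
```

Each classified input would then be handed to whichever subagent gathers that kind of evidence.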

## Where each layer lives

Unit and integration tests live alongside the code inside each platform repo
(e.g. `bitwarden/server`, `bitwarden/clients`, `bitwarden/ios`). **End-to-end tests live
in a dedicated, private `test` repository**, not inside the platform repos, so E2E
recommendations target that separate repo, and existing E2E coverage is treated as
unverified when that repo isn't checked out.
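
A minimal sketch of that last rule, assuming the check is simply whether a local checkout of the `test` repo exists. The function name, the returned labels, and the `.git` test are illustrative assumptions, not the skill's real detection logic.

```python
from pathlib import Path


def e2e_coverage_status(test_repo_checkout: Path) -> str:
    """Report whether existing E2E coverage can be verified locally.

    A directory counts as a checkout when it contains a `.git` entry,
    which is a directory in a normal clone and a file in a worktree.
    """
    if (test_repo_checkout / ".git").exists():
        return "verifiable"
    return "unverified"
```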

## Agents

| Agent | What It Does |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `test-strategist` | Classifies the inputs for a change (Jira, PR, CSV, description), fans out subagents to gather evidence, assesses existing coverage (`assessing-test-coverage`), then runs `analyzing-test-stack`, emitting a self-contained coverage report and a self-contained test-stack report. |

## Skills

| Skill | What It Does |
| ------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `assessing-test-coverage` | The backward-looking inventory. Determines what is **already tested** for a change (scoped to the change surface, PR-first, then a targeted lookup), buckets each observed test by layer, cites it as a stable GitHub permalink, flags untested behaviors as gaps, and writes a self-contained HTML coverage report. Feeds `analyzing-test-stack`; usable standalone to audit current coverage. |
| `analyzing-test-stack` | The recommender. Consumes the coverage inventory, then maps each testable behavior in a change to the cheapest sufficient test layer per platform, inside each repo's actual shape, names concrete tooling, surfaces coverage gaps and shape-wrong tests (ice-cream-cone, over-testing, missing platform layers), and writes a self-contained HTML report into a per-change report directory. |

## Cross-Plugin Integration

| Plugin | How It's Used |
| --------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `bitwarden-atlassian-tools` | Optional but recommended. Provides the `mcp__plugin_bitwarden-atlassian-tools_bitwarden-atlassian__*` server used to read Jira tickets and linked Confluence requirements. If absent, the plugin degrades gracefully: paste requirements or rely on the PR/CSV/description. |

## Installation

```bash
/plugin install bitwarden-test-engineer@bitwarden-marketplace
```

For Jira-backed analysis, install the Atlassian tools alongside it:

```bash
/plugin install bitwarden-atlassian-tools@bitwarden-marketplace
```

## Usage

The agent activates when you ask what test coverage a change needs, which
automation layers to add, how to shape a test plan, or whether existing tests are at the
right level:

```
I'm picking up PM-12345 next sprint. What test coverage should this feature have?
```

```
Does bitwarden/server#5821 have the right tests, or is it leaning too hard on end-to-end?
```

```
Here's our exported test cases CSV for the new item types import/export work (PM-32009).
Which of these should be automated and at what layer?
```

Each run produces a per-change directory `test-engineer-report-<slug>-<date>/` holding the
self-contained HTML reports: `coverage.html` (what is already tested: observed tests per layer,
each cited as a GitHub permalink, plus gaps), `recommended.html` (the per-platform recommendation
and its coverage-gap findings), and `combined.html` (the primary deliverable: both on one two-tab
page). Re-running on the same change and date refreshes the reports in that directory. They share
one off-brand data-report visual system so they read as the same instrument.
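
To make the directory layout concrete, here is a small sketch that derives the report paths for a change. The directory pattern and the three file names come from the text above; the slug rule (lowercase, runs of non-alphanumerics collapsed to a hyphen) and the ISO date format are assumptions, since the README does not specify either.

```python
import re
from datetime import date
from pathlib import Path

# File names as listed in the README.
REPORT_FILES = ("coverage.html", "recommended.html", "combined.html")


def slugify(change: str) -> str:
    """Hypothetical slug rule: lowercase, non-alphanumeric runs become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", change.lower()).strip("-")


def report_dir(change: str, on: date, root: Path = Path(".")) -> Path:
    """Build the per-change directory test-engineer-report-<slug>-<date>."""
    return root / f"test-engineer-report-{slugify(change)}-{on.isoformat()}"


def report_paths(change: str, on: date, root: Path = Path(".")) -> list[Path]:
    """List the three report files expected inside the per-change directory."""
    directory = report_dir(change, on, root)
    return [directory / name for name in REPORT_FILES]
```

Because the directory name depends only on the change and the date, a second run on the same day resolves to the same paths, which matches the refresh behavior described above.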

## References

- [Claude Code Agents](https://code.claude.com/docs/en/agents)
- [Claude Code Skills](https://code.claude.com/docs/en/skills)
- [The Testing Trophy](https://kentcdodds.com/blog/the-testing-trophy-and-testing-classifications)
- [Bitwarden Contributing Guidelines](https://contributing.bitwarden.com/contributing/)