Initial version of playwright-testing plugin by kdenney · Pull Request #140 · bitwarden/ai-plugins

kdenney · 2026-06-09T16:51:11Z

🎟️ Tracking

N/A — new plugin contribution.

📔 Objective

Adds the bitwarden-playwright-testing plugin (v1.0.0): an automated end-to-end UI testing framework for Bitwarden web changes, driven by Playwright.

The plugin exposes a single user-facing skill, test-web-changes, that takes a Jira ticket ID, an implementation-plan file, or a free-form feature description and turns it into a complete Playwright test run — gathering context, exploring the affected codebases, building grounded test cases, verifying the local dev environment, executing the tests through the browser UI, and compiling an HTML report with full-page screenshots.

How it works

test-web-changes runs as a team lead over an eight-task pipeline, dispatching a seven-agent team and persisting each artifact to .playwright-testing-artifacts/<slug>/:

Agent	Responsibility
`context-gatherer`	Acquires feature source content and extracts structured context
`code-explorer`	Explores affected repos → Application Context (changed files, routes, selectors, verification points)
`service-mapper`	Maps changed paths to the local services that must be running
`test-planner`	Builds grounded test cases via `build-test-cases`
`service-manager`	Verifies Docker containers, `/alive` endpoints, and Angular bootstrap; halts on failure
`test-runner`	Executes test cases through the `playwright-cli` agent with guardrails and screenshots
`report-compiler`	Compiles the HTML report

Notable design points

Web-first policy — all setup and test actions go through the browser UI; no direct DB queries, out-of-band REST calls, or CLI shortcuts.
Verify-only environment handling — the plugin confirms the dev environment is ready but never starts, builds, or stops services; it halts with a hint if something's missing.
Self-host not supported — the plugin currently only supports testing cloud-hosted setups. Self-host support could be added as a fast follow.
Billing support — build-test-cases bakes Stripe test values into UI steps; a billing-related 400 during execution halts the run immediately.
Out of scope — browser extension, desktop, and CLI surfaces (no Playwright UI surface).

Prerequisites

The playwright-cli Claude Code plugin (render verification and browser execution depend on it).
A running Bitwarden local dev environment (containers + the relevant .NET/web services).

Notes for reviewers

Registered in marketplace.json and README.md; passes validate-marketplace.sh and validate-plugin-structure.sh (0 errors, 0 warnings).
New plugin only — no changes to existing plugins.

Screenshots

github-actions · 2026-06-09T16:51:37Z

Validation Summary — `bitwarden-playwright-testing`

PR #140 — plugin first release at version 1.0.0.

Three validations run: plugin-validator, skill-reviewer, and security review (claude-config-validator). No critical findings overall. The plugin is structurally sound and contains no committed secrets, but two issues warrant attention before release: (1) two SKILL.md files use command-only frontmatter fields, and (2) the test-runner agent has overly broad Bash(curl:*) and Bash(ls *) allowlist entries.

Issues by severity

Critical — must fix

None.

Major — should fix

#	Area	File	Finding	Remediation
M1	Skill misclassified as orchestration command	`plugins/bitwarden-playwright-testing/skills/test-web-changes/SKILL.md:1-6`	Frontmatter declares `argument-hint` and `allowed-tools` (slash-command-only fields). Body is a step-by-step team-lead runbook ("Step 0 — Parse input", "--confirm flag", "Create team named...", "Dispatch ... Wait for completion") written in second person — i.e., a slash-command runbook rather than a progressive-disclosure skill.	Either relocate to `commands/test-web-changes.md`, or strip the command-only frontmatter and rewrite as a third-person skill. README likely advertises `/test-web-changes`; reconcile with `.claude-plugin/plugin.json` (no `commands[]` entry today).
M2	Skill frontmatter has command-only fields	`plugins/bitwarden-playwright-testing/skills/reading-mailcatcher-api/SKILL.md:4-5`	`argument-hint` and `allowed-tools` are present in SKILL frontmatter and the body contains a "User invocation" section. Description is ~720 chars (target ~500) and opens in second person ("Use this skill whenever you...").	Remove `argument-hint`/`allowed-tools` from the SKILL or split into a co-located command. Tighten the description to ~300–400 chars and convert to third-person infinitive ("Read an email from the local Bitwarden Mailcatcher inbox..."). Remove the duplicated "Invoke whenever a workflow needs to read..." sentence.
M3	Overly broad Bash permission — curl	`plugins/bitwarden-playwright-testing/agents/test-runner/AGENT.md:11`	`Bash(curl:)` permits curl to any URL on the internet* with any method/body. `references/tool-policy.md` constrains intent, but the harness allowlist is what is actually enforced. Combined with `Read` (unscoped) this forms a read-then-exfil chain reachable via prompt-injection through Jira ticket bodies or plan files.	Scope to local hosts only: `Bash(curl * http://localhost:)`, `Bash(curl https://localhost:)`, `Bash(curl http://127.0.0.1:*)`. Or wrap legitimate Category-3 curls in a co-located script (same pattern as `read-mailcatcher.sh`) and allow only the script path.
M4	Overly broad Bash permission — ls	`plugins/bitwarden-playwright-testing/agents/test-runner/AGENT.md:11`	`Bash(ls )` permits `ls /etc`, `ls ~/.ssh`, `ls /home//.aws`, etc. Only artifacts-dir listing is documented (`executing-web-tests/SKILL.md:204`).	Scope to artifacts dir: `Bash(ls /.playwright-testing-artifacts/)`. Or replace with `Glob`/`Read` if those are already allowed.
M5	Long skills lack progressive disclosure	`plugins/bitwarden-playwright-testing/skills/executing-web-tests/SKILL.md` (2,017 words) and `plugins/bitwarden-playwright-testing/skills/exploring-application-context/SKILL.md` (2,129 words)	Both are dense single files with no `references/` or `examples/`. Candidate splits: adaptive-assertion, screenshot/toast policy, pause-resume protocol (`executing-web-tests`); state-recording rules, output-schema (`exploring-application-context`).	Move heavy reference sections into `references/*.md`; add an `examples/sample-context.md` to `exploring-application-context` so downstream skills have a canonical anchor.

Minor — recommended polish

#	Area	File	Finding	Remediation
m1	Missing agent `<example>` blocks	All 7 `agents/*/AGENT.md`	None of the 7 agent `description` frontmatter fields contain `<example>` blocks. All agents are `user-invocable: false` and dispatched programmatically by `test-web-changes`, so routing examples are optional, but they document the dispatch pattern for future maintainers.	Optional: add 1–2 `<example>` blocks to each description showing a representative team-lead dispatch.
m2	Lean skills below 1,000-word target	`compiling-test-report/SKILL.md` (586 words), `determining-required-services/SKILL.md` (368 words), `verifying-environment-health/SKILL.md` (591 words)	Below the 1,000-word floor, but each delegates appropriately to a template, reference, or script. Acceptable as intentional minimalism.	No action required; flagging only for visibility.
m3	Missing examples directory	`build-test-cases/SKILL.md`	Strict output format described inline but no `examples/sample-test-case.md` to anchor on.	Optional: add `examples/sample-test-case.md` with a complete TC.
m4	Frontmatter–body separation	`compiling-test-report/SKILL.md:5`, `determining-required-services/SKILL.md:5`	No blank line between closing `---` and the first body line. Render-safe but YAML convention prefers a blank separator.	Insert a blank line after frontmatter close.
m5	Mixed path conventions	`build-test-cases/SKILL.md:20,22,104`	Mixes `${CLAUDE_PLUGIN_ROOT}` and `${CLAUDE_SKILL_DIR}` for references. Current usage is functionally correct (cross-skill vs in-skill) but worth standardizing.	Document the convention in the plugin README or keep as-is.
m6	URL duplication	`build-test-cases/SKILL.md:24`	Hardcodes `http://localhost:62911` (Admin Portal).	Link to `references/services.md` as the URL of record.
m7	Trailing whitespace / missing trailing newline	`executing-web-tests/SKILL.md:17` (cell trailing whitespace); `verifying-environment-health/SKILL.md` (no trailing newline at line 67 EOF)	Cosmetic.	`pnpm run format` should resolve.
m8	`ignoreHTTPSErrors: true` lacks in-file annotation	`plugins/bitwarden-playwright-testing/scripts/playwright.config.json:1-7`	The config disables TLS verification, which is correct for local dev (self-signed cert at `https://localhost:8080`) and documented in `executing-web-tests/SKILL.md`. JSON has no comments, so this constraint is invisible to someone reading the file alone.	Optional: add a `"_comment": "Local dev only — disables TLS verification because localhost uses self-signed certs"` key.
m9	Stripe write allowlist — narrow but worth annotating	`agents/test-runner/AGENT.md:11`	`Bash(stripe post /v1/test_helpers/test_clocks//advance:)` is appropriately pinned to test-clock advance. A future maintainer could widen the wildcard accidentally.	Add an inline note in the AGENT.md that the wildcard must not be widened.
m10	Untrusted input lacks documented sanitization posture	`skills/test-web-changes/SKILL.md:10-25`, `agents/context-gatherer/AGENT.md:22-30`	Pipeline ingests Jira ticket bodies, plan markdown, and free-form feature descriptions — all data, but currently fed into agent prompts without an explicit "treat as data, not instructions" note.	Defense-in-depth: add a one-line note in `context-gatherer/AGENT.md` that imperative phrases inside the source are to be ignored. Made effectively safe once M3 (curl scoping) is fixed.
m11	Mailcatcher DELETE relies on instruction-level gate	`skills/reading-mailcatcher-api/SKILL.md:153`, `references/email-patterns.md:95`	The skill says "ALWAYS ask the user before running this command" for `curl -X DELETE http://localhost:1080/messages`. Once M3 is fixed (curl scoped to local hosts), this remains an LLM-instruction gate but with bounded blast radius.	No action required after M3 is fixed.

What was validated

1. Plugin structure (plugin-validator) — PASS

plugin.json is valid JSON; includes name, version, description, author, homepage, repository, keywords, agents[]
name: bitwarden-playwright-testing matches the directory name
version: 1.0.0 matches marketplace.json entry at line 99
All 7 agents[] paths resolve to existing AGENT.md files
All 7 agents: lowercase-hyphenated names within 3–50 chars; valid model: sonnet; valid colors (green, orange, blue, yellow, purple, cyan); system prompts >200 chars each; tool allowlists present and (mostly) scoped
README.md is comprehensive (162 lines); CHANGELOG.md follows Keep a Changelog format with ## [1.0.0] - 2026-05-28 and ### Added
No commands or hooks declared (consistent with the plugin's design as agents + skills + orchestration)
No MCP server definitions (the plugin consumes tools from other plugins like bitwarden-atlassian-tools and playwright-cli)

2. Skill review (skill-reviewer) — 6 of 8 pass cleanly

Skill	Verdict
`build-test-cases`	Pass — 1,946 words; minor mixed-path note
`compiling-test-report`	Pass — 586 words; lean but delegates to template
`determining-required-services`	Pass — 368 words; lean but delegates to `references/services.md`
`executing-web-tests`	Pass with split recommendation — 2,017 words; no subdirs despite size
`exploring-application-context`	Pass with split recommendation — 2,129 words; no `examples/`
`reading-mailcatcher-api`	Needs revision — description voice + command-only frontmatter (M2)
`test-web-changes`	Needs major revision — misclassified, should be a slash command (M1)
`verifying-environment-health`	Pass — 591 words; delegates to two scripts

All referenced files exist: references/services.md, references/known-flows.md, references/email-patterns.md, references/billing-test-data.md, templates/report-template.html, scripts/read-mailcatcher.sh, scripts/preflight-check.sh, scripts/health-check.sh, plugin-level references/tool-policy.md.

3. Security review (reviewing-claude-config) — 0 critical, 2 major (M3/M4), 5 minor

No committed secrets. Regex sweep for Stripe (sk_test_/sk_live_, pk_*, rk_*, whsec_), AWS (AKIA...), GitHub PATs (ghp_/ghs_/github_pat_), Slack tokens, and generic password = "..." assignments — zero matches.
No settings.local.json, .env, or credentials* files committed.
References to server/dev/secrets.json are documentation pointers to a runtime config file the user owns locally — not committed secrets.
Shell scripts (health-check.sh, preflight-check.sh, read-mailcatcher.sh) are well-bounded, parameterized, and use set -u. No eval/exec/rm -rf/chmod 777/curl | sh patterns.
references/tool-policy.md defines a strong allowlist policy (no DB queries, no API-for-UI substitution, Stripe read-only except test-clock advance).
Agent tool allowlists are mostly narrow (e.g., service-manager only allows the two specific scripts in this plugin). Notable exceptions: Bash(curl:*) and Bash(ls *) on test-runner — see M3/M4.
ignoreHTTPSErrors: true in scripts/playwright.config.json is appropriate for local dev (self-signed cert) — see m8.

Disposition

Recommended actions before merge:

Resolve M1 (relocate or restructure test-web-changes/SKILL.md) and M2 (clean up reading-mailcatcher-api/SKILL.md frontmatter and description).
Tighten M3 and M4 (scope Bash(curl:*) and Bash(ls *) in the test-runner agent).
Address M5 (progressive disclosure for the two large skills) — can be deferred to a follow-up if scope is too large.
Minor items can land in a polish pass.

Once M1–M4 are addressed, the plugin is ready to ship. Remember to bump the version (likely PATCH 1.0.1) and add a CHANGELOG.md entry if any substantive changes land beyond this initial release.

github-actions · 2026-06-09T16:52:02Z

🤖 Bitwarden Claude Code Review

Overall Assessment: APPROVE

Reviewed the initial v1.0.0 contribution of the bitwarden-playwright-testing plugin — a new, additive plugin with 30 files including 7 agents, 8 skills, 3 shell scripts, an HTML report template, README, CHANGELOG, and the plugin manifest. The plugin is registered correctly in .claude-plugin/marketplace.json and the root README.md table. The orchestration skill (test-web-changes), agent allowlists, and four-category tool policy are internally consistent, and the included shell scripts (preflight-check.sh, health-check.sh, read-mailcatcher.sh) handle untrusted input safely — service-name whitelist validation in the health check, env-var-passed args into the embedded Python in the mailcatcher script, and hardcoded grep patterns in preflight. No new dependencies are introduced (no manifest files in the diff), so no dependency approval flow is required.

Code Review Details

No findings to report.

Initial version of playwright-testing plugin

5532280

kdenney requested a review from a team as a code owner June 9, 2026 16:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Initial version of playwright-testing plugin#140

Initial version of playwright-testing plugin#140
kdenney wants to merge 1 commit into
mainfrom
add/bitwarden-playwright-testing

kdenney commented Jun 9, 2026

Uh oh!

github-actions Bot commented Jun 9, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 9, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kdenney commented Jun 9, 2026

🎟️ Tracking

📔 Objective

How it works

Notable design points

Prerequisites

Notes for reviewers

Screenshots

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Validation Summary — bitwarden-playwright-testing

Issues by severity

Critical — must fix

Major — should fix

Minor — recommended polish

What was validated

1. Plugin structure (plugin-validator) — PASS

2. Skill review (skill-reviewer) — 6 of 8 pass cleanly

3. Security review (reviewing-claude-config) — 0 critical, 2 major (M3/M4), 5 minor

Disposition

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🤖 Bitwarden Claude Code Review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented Jun 9, 2026 •

edited

Loading

Validation Summary — `bitwarden-playwright-testing`

github-actions Bot commented Jun 9, 2026 •

edited

Loading