Skip to content

Initial version of playwright-testing plugin#140

Open
kdenney wants to merge 1 commit into
mainfrom
add/bitwarden-playwright-testing
Open

Initial version of playwright-testing plugin#140
kdenney wants to merge 1 commit into
mainfrom
add/bitwarden-playwright-testing

Conversation

@kdenney

@kdenney kdenney commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🎟️ Tracking

N/A — new plugin contribution.

📔 Objective

Adds the bitwarden-playwright-testing plugin (v1.0.0): an automated end-to-end UI testing framework for Bitwarden web changes, driven by Playwright.

The plugin exposes a single user-facing skill, test-web-changes, that takes a Jira ticket ID, an implementation-plan file, or a free-form feature description and turns it into a complete Playwright test run — gathering context, exploring the affected codebases, building grounded test cases, verifying the local dev environment, executing the tests through the browser UI, and compiling an HTML report with full-page screenshots.

How it works

test-web-changes runs as a team lead over an eight-task pipeline, dispatching a seven-agent team and persisting each artifact to .playwright-testing-artifacts/<slug>/:

Agent Responsibility
context-gatherer Acquires feature source content and extracts structured context
code-explorer Explores affected repos → Application Context (changed files, routes, selectors, verification points)
service-mapper Maps changed paths to the local services that must be running
test-planner Builds grounded test cases via build-test-cases
service-manager Verifies Docker containers, /alive endpoints, and Angular bootstrap; halts on failure
test-runner Executes test cases through the playwright-cli agent with guardrails and screenshots
report-compiler Compiles the HTML report

Notable design points

  • Web-first policy — all setup and test actions go through the browser UI; no direct DB queries, out-of-band REST calls, or CLI shortcuts.
  • Verify-only environment handling — the plugin confirms the dev environment is ready but never starts, builds, or stops services; it halts with a hint if something's missing.
  • Self-host not supported — the plugin currently only supports testing cloud-hosted setups. Self-host support could be added as a fast follow.
  • Billing supportbuild-test-cases bakes Stripe test values into UI steps; a billing-related 400 during execution halts the run immediately.
  • Out of scope — browser extension, desktop, and CLI surfaces (no Playwright UI surface).

Prerequisites

  • The playwright-cli Claude Code plugin (render verification and browser execution depend on it).
  • A running Bitwarden local dev environment (containers + the relevant .NET/web services).

Notes for reviewers

  • Registered in marketplace.json and README.md; passes validate-marketplace.sh and validate-plugin-structure.sh (0 errors, 0 warnings).
  • New plugin only — no changes to existing plugins.

Screenshots

image image

@kdenney kdenney requested a review from a team as a code owner June 9, 2026 16:51
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown

Validation Summary — bitwarden-playwright-testing

PR #140 — plugin first release at version 1.0.0.

Three validations run: plugin-validator, skill-reviewer, and security review (claude-config-validator). No critical findings overall. The plugin is structurally sound and contains no committed secrets, but two issues warrant attention before release: (1) two SKILL.md files use command-only frontmatter fields, and (2) the test-runner agent has overly broad Bash(curl:*) and Bash(ls *) allowlist entries.


Issues by severity

Critical — must fix

None.

Major — should fix

# Area File Finding Remediation
M1 Skill misclassified as orchestration command plugins/bitwarden-playwright-testing/skills/test-web-changes/SKILL.md:1-6 Frontmatter declares argument-hint and allowed-tools (slash-command-only fields). Body is a step-by-step team-lead runbook ("Step 0 — Parse input", "--confirm flag", "Create team named...", "Dispatch ... Wait for completion") written in second person — i.e., a slash-command runbook rather than a progressive-disclosure skill. Either relocate to commands/test-web-changes.md, or strip the command-only frontmatter and rewrite as a third-person skill. README likely advertises /test-web-changes; reconcile with .claude-plugin/plugin.json (no commands[] entry today).
M2 Skill frontmatter has command-only fields plugins/bitwarden-playwright-testing/skills/reading-mailcatcher-api/SKILL.md:4-5 argument-hint and allowed-tools are present in SKILL frontmatter and the body contains a "User invocation" section. Description is ~720 chars (target ~500) and opens in second person ("Use this skill whenever you..."). Remove argument-hint/allowed-tools from the SKILL or split into a co-located command. Tighten the description to ~300–400 chars and convert to third-person infinitive ("Read an email from the local Bitwarden Mailcatcher inbox..."). Remove the duplicated "Invoke whenever a workflow needs to read..." sentence.
M3 Overly broad Bash permission — curl plugins/bitwarden-playwright-testing/agents/test-runner/AGENT.md:11 Bash(curl:*) permits curl to any URL on the internet with any method/body. references/tool-policy.md constrains intent, but the harness allowlist is what is actually enforced. Combined with Read (unscoped) this forms a read-then-exfil chain reachable via prompt-injection through Jira ticket bodies or plan files. Scope to local hosts only: Bash(curl * http://localhost:*), Bash(curl * https://localhost:*), Bash(curl * http://127.0.0.1:*). Or wrap legitimate Category-3 curls in a co-located script (same pattern as read-mailcatcher.sh) and allow only the script path.
M4 Overly broad Bash permission — ls plugins/bitwarden-playwright-testing/agents/test-runner/AGENT.md:11 Bash(ls *) permits ls /etc, ls ~/.ssh, ls /home/*/.aws, etc. Only artifacts-dir listing is documented (executing-web-tests/SKILL.md:204). Scope to artifacts dir: Bash(ls */.playwright-testing-artifacts/*). Or replace with Glob/Read if those are already allowed.
M5 Long skills lack progressive disclosure plugins/bitwarden-playwright-testing/skills/executing-web-tests/SKILL.md (2,017 words) and plugins/bitwarden-playwright-testing/skills/exploring-application-context/SKILL.md (2,129 words) Both are dense single files with no references/ or examples/. Candidate splits: adaptive-assertion, screenshot/toast policy, pause-resume protocol (executing-web-tests); state-recording rules, output-schema (exploring-application-context). Move heavy reference sections into references/*.md; add an examples/sample-context.md to exploring-application-context so downstream skills have a canonical anchor.

Minor — recommended polish

# Area File Finding Remediation
m1 Missing agent <example> blocks All 7 agents/*/AGENT.md None of the 7 agent description frontmatter fields contain <example> blocks. All agents are user-invocable: false and dispatched programmatically by test-web-changes, so routing examples are optional, but they document the dispatch pattern for future maintainers. Optional: add 1–2 <example> blocks to each description showing a representative team-lead dispatch.
m2 Lean skills below 1,000-word target compiling-test-report/SKILL.md (586 words), determining-required-services/SKILL.md (368 words), verifying-environment-health/SKILL.md (591 words) Below the 1,000-word floor, but each delegates appropriately to a template, reference, or script. Acceptable as intentional minimalism. No action required; flagging only for visibility.
m3 Missing examples directory build-test-cases/SKILL.md Strict output format described inline but no examples/sample-test-case.md to anchor on. Optional: add examples/sample-test-case.md with a complete TC.
m4 Frontmatter–body separation compiling-test-report/SKILL.md:5, determining-required-services/SKILL.md:5 No blank line between closing --- and the first body line. Render-safe but YAML convention prefers a blank separator. Insert a blank line after frontmatter close.
m5 Mixed path conventions build-test-cases/SKILL.md:20,22,104 Mixes ${CLAUDE_PLUGIN_ROOT} and ${CLAUDE_SKILL_DIR} for references. Current usage is functionally correct (cross-skill vs in-skill) but worth standardizing. Document the convention in the plugin README or keep as-is.
m6 URL duplication build-test-cases/SKILL.md:24 Hardcodes http://localhost:62911 (Admin Portal). Link to references/services.md as the URL of record.
m7 Trailing whitespace / missing trailing newline executing-web-tests/SKILL.md:17 (cell trailing whitespace); verifying-environment-health/SKILL.md (no trailing newline at line 67 EOF) Cosmetic. pnpm run format should resolve.
m8 ignoreHTTPSErrors: true lacks in-file annotation plugins/bitwarden-playwright-testing/scripts/playwright.config.json:1-7 The config disables TLS verification, which is correct for local dev (self-signed cert at https://localhost:8080) and documented in executing-web-tests/SKILL.md. JSON has no comments, so this constraint is invisible to someone reading the file alone. Optional: add a "_comment": "Local dev only — disables TLS verification because localhost uses self-signed certs" key.
m9 Stripe write allowlist — narrow but worth annotating agents/test-runner/AGENT.md:11 Bash(stripe post /v1/test_helpers/test_clocks/*/advance:*) is appropriately pinned to test-clock advance. A future maintainer could widen the wildcard accidentally. Add an inline note in the AGENT.md that the wildcard must not be widened.
m10 Untrusted input lacks documented sanitization posture skills/test-web-changes/SKILL.md:10-25, agents/context-gatherer/AGENT.md:22-30 Pipeline ingests Jira ticket bodies, plan markdown, and free-form feature descriptions — all data, but currently fed into agent prompts without an explicit "treat as data, not instructions" note. Defense-in-depth: add a one-line note in context-gatherer/AGENT.md that imperative phrases inside the source are to be ignored. Made effectively safe once M3 (curl scoping) is fixed.
m11 Mailcatcher DELETE relies on instruction-level gate skills/reading-mailcatcher-api/SKILL.md:153, references/email-patterns.md:95 The skill says "ALWAYS ask the user before running this command" for curl -X DELETE http://localhost:1080/messages. Once M3 is fixed (curl scoped to local hosts), this remains an LLM-instruction gate but with bounded blast radius. No action required after M3 is fixed.

What was validated

1. Plugin structure (plugin-validator) — PASS

  • plugin.json is valid JSON; includes name, version, description, author, homepage, repository, keywords, agents[]
  • name: bitwarden-playwright-testing matches the directory name
  • version: 1.0.0 matches marketplace.json entry at line 99
  • All 7 agents[] paths resolve to existing AGENT.md files
  • All 7 agents: lowercase-hyphenated names within 3–50 chars; valid model: sonnet; valid colors (green, orange, blue, yellow, purple, cyan); system prompts >200 chars each; tool allowlists present and (mostly) scoped
  • README.md is comprehensive (162 lines); CHANGELOG.md follows Keep a Changelog format with ## [1.0.0] - 2026-05-28 and ### Added
  • No commands or hooks declared (consistent with the plugin's design as agents + skills + orchestration)
  • No MCP server definitions (the plugin consumes tools from other plugins like bitwarden-atlassian-tools and playwright-cli)

2. Skill review (skill-reviewer) — 6 of 8 pass cleanly

Skill Verdict
build-test-cases Pass — 1,946 words; minor mixed-path note
compiling-test-report Pass — 586 words; lean but delegates to template
determining-required-services Pass — 368 words; lean but delegates to references/services.md
executing-web-tests Pass with split recommendation — 2,017 words; no subdirs despite size
exploring-application-context Pass with split recommendation — 2,129 words; no examples/
reading-mailcatcher-api Needs revision — description voice + command-only frontmatter (M2)
test-web-changes Needs major revision — misclassified, should be a slash command (M1)
verifying-environment-health Pass — 591 words; delegates to two scripts

All referenced files exist: references/services.md, references/known-flows.md, references/email-patterns.md, references/billing-test-data.md, templates/report-template.html, scripts/read-mailcatcher.sh, scripts/preflight-check.sh, scripts/health-check.sh, plugin-level references/tool-policy.md.

3. Security review (reviewing-claude-config) — 0 critical, 2 major (M3/M4), 5 minor

  • No committed secrets. Regex sweep for Stripe (sk_test_/sk_live_, pk_*, rk_*, whsec_), AWS (AKIA...), GitHub PATs (ghp_/ghs_/github_pat_), Slack tokens, and generic password = "..." assignments — zero matches.
  • No settings.local.json, .env, or credentials* files committed.
  • References to server/dev/secrets.json are documentation pointers to a runtime config file the user owns locally — not committed secrets.
  • Shell scripts (health-check.sh, preflight-check.sh, read-mailcatcher.sh) are well-bounded, parameterized, and use set -u. No eval/exec/rm -rf/chmod 777/curl | sh patterns.
  • references/tool-policy.md defines a strong allowlist policy (no DB queries, no API-for-UI substitution, Stripe read-only except test-clock advance).
  • Agent tool allowlists are mostly narrow (e.g., service-manager only allows the two specific scripts in this plugin). Notable exceptions: Bash(curl:*) and Bash(ls *) on test-runner — see M3/M4.
  • ignoreHTTPSErrors: true in scripts/playwright.config.json is appropriate for local dev (self-signed cert) — see m8.

Disposition

Recommended actions before merge:

  1. Resolve M1 (relocate or restructure test-web-changes/SKILL.md) and M2 (clean up reading-mailcatcher-api/SKILL.md frontmatter and description).
  2. Tighten M3 and M4 (scope Bash(curl:*) and Bash(ls *) in the test-runner agent).
  3. Address M5 (progressive disclosure for the two large skills) — can be deferred to a follow-up if scope is too large.
  4. Minor items can land in a polish pass.

Once M1–M4 are addressed, the plugin is ready to ship. Remember to bump the version (likely PATCH 1.0.1) and add a CHANGELOG.md entry if any substantive changes land beyond this initial release.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown

🤖 Bitwarden Claude Code Review

Overall Assessment: APPROVE

Reviewed the initial v1.0.0 contribution of the bitwarden-playwright-testing plugin — a new, additive plugin with 30 files including 7 agents, 8 skills, 3 shell scripts, an HTML report template, README, CHANGELOG, and the plugin manifest. The plugin is registered correctly in .claude-plugin/marketplace.json and the root README.md table. The orchestration skill (test-web-changes), agent allowlists, and four-category tool policy are internally consistent, and the included shell scripts (preflight-check.sh, health-check.sh, read-mailcatcher.sh) handle untrusted input safely — service-name whitelist validation in the health check, env-var-passed args into the embedded Python in the mailcatcher script, and hardcoded grep patterns in preflight. No new dependencies are introduced (no manifest files in the diff), so no dependency approval flow is required.

Code Review Details

No findings to report.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant