Initial version of playwright-testing plugin#140
Conversation
Validation Summary —
|
| # | Area | File | Finding | Remediation |
|---|---|---|---|---|
| M1 | Skill misclassified as orchestration command | plugins/bitwarden-playwright-testing/skills/test-web-changes/SKILL.md:1-6 |
Frontmatter declares argument-hint and allowed-tools (slash-command-only fields). Body is a step-by-step team-lead runbook ("Step 0 — Parse input", "--confirm flag", "Create team named...", "Dispatch ... Wait for completion") written in second person — i.e., a slash-command runbook rather than a progressive-disclosure skill. |
Either relocate to commands/test-web-changes.md, or strip the command-only frontmatter and rewrite as a third-person skill. README likely advertises /test-web-changes; reconcile with .claude-plugin/plugin.json (no commands[] entry today). |
| M2 | Skill frontmatter has command-only fields | plugins/bitwarden-playwright-testing/skills/reading-mailcatcher-api/SKILL.md:4-5 |
argument-hint and allowed-tools are present in SKILL frontmatter and the body contains a "User invocation" section. Description is ~720 chars (target ~500) and opens in second person ("Use this skill whenever you..."). |
Remove argument-hint/allowed-tools from the SKILL or split into a co-located command. Tighten the description to ~300–400 chars and convert to third-person infinitive ("Read an email from the local Bitwarden Mailcatcher inbox..."). Remove the duplicated "Invoke whenever a workflow needs to read..." sentence. |
| M3 | Overly broad Bash permission — curl | plugins/bitwarden-playwright-testing/agents/test-runner/AGENT.md:11 |
Bash(curl:*) permits curl to any URL on the internet with any method/body. references/tool-policy.md constrains intent, but the harness allowlist is what is actually enforced. Combined with Read (unscoped) this forms a read-then-exfil chain reachable via prompt-injection through Jira ticket bodies or plan files. |
Scope to local hosts only: Bash(curl * http://localhost:*), Bash(curl * https://localhost:*), Bash(curl * http://127.0.0.1:*). Or wrap legitimate Category-3 curls in a co-located script (same pattern as read-mailcatcher.sh) and allow only the script path. |
| M4 | Overly broad Bash permission — ls | plugins/bitwarden-playwright-testing/agents/test-runner/AGENT.md:11 |
Bash(ls *) permits ls /etc, ls ~/.ssh, ls /home/*/.aws, etc. Only artifacts-dir listing is documented (executing-web-tests/SKILL.md:204). |
Scope to artifacts dir: Bash(ls */.playwright-testing-artifacts/*). Or replace with Glob/Read if those are already allowed. |
| M5 | Long skills lack progressive disclosure | plugins/bitwarden-playwright-testing/skills/executing-web-tests/SKILL.md (2,017 words) and plugins/bitwarden-playwright-testing/skills/exploring-application-context/SKILL.md (2,129 words) |
Both are dense single files with no references/ or examples/. Candidate splits: adaptive-assertion, screenshot/toast policy, pause-resume protocol (executing-web-tests); state-recording rules, output-schema (exploring-application-context). |
Move heavy reference sections into references/*.md; add an examples/sample-context.md to exploring-application-context so downstream skills have a canonical anchor. |
Minor — recommended polish
| # | Area | File | Finding | Remediation |
|---|---|---|---|---|
| m1 | Missing agent <example> blocks |
All 7 agents/*/AGENT.md |
None of the 7 agent description frontmatter fields contain <example> blocks. All agents are user-invocable: false and dispatched programmatically by test-web-changes, so routing examples are optional, but they document the dispatch pattern for future maintainers. |
Optional: add 1–2 <example> blocks to each description showing a representative team-lead dispatch. |
| m2 | Lean skills below 1,000-word target | compiling-test-report/SKILL.md (586 words), determining-required-services/SKILL.md (368 words), verifying-environment-health/SKILL.md (591 words) |
Below the 1,000-word floor, but each delegates appropriately to a template, reference, or script. Acceptable as intentional minimalism. | No action required; flagging only for visibility. |
| m3 | Missing examples directory | build-test-cases/SKILL.md |
Strict output format described inline but no examples/sample-test-case.md to anchor on. |
Optional: add examples/sample-test-case.md with a complete TC. |
| m4 | Frontmatter–body separation | compiling-test-report/SKILL.md:5, determining-required-services/SKILL.md:5 |
No blank line between closing --- and the first body line. Render-safe but YAML convention prefers a blank separator. |
Insert a blank line after frontmatter close. |
| m5 | Mixed path conventions | build-test-cases/SKILL.md:20,22,104 |
Mixes ${CLAUDE_PLUGIN_ROOT} and ${CLAUDE_SKILL_DIR} for references. Current usage is functionally correct (cross-skill vs in-skill) but worth standardizing. |
Document the convention in the plugin README or keep as-is. |
| m6 | URL duplication | build-test-cases/SKILL.md:24 |
Hardcodes http://localhost:62911 (Admin Portal). |
Link to references/services.md as the URL of record. |
| m7 | Trailing whitespace / missing trailing newline | executing-web-tests/SKILL.md:17 (cell trailing whitespace); verifying-environment-health/SKILL.md (no trailing newline at line 67 EOF) |
Cosmetic. | pnpm run format should resolve. |
| m8 | ignoreHTTPSErrors: true lacks in-file annotation |
plugins/bitwarden-playwright-testing/scripts/playwright.config.json:1-7 |
The config disables TLS verification, which is correct for local dev (self-signed cert at https://localhost:8080) and documented in executing-web-tests/SKILL.md. JSON has no comments, so this constraint is invisible to someone reading the file alone. |
Optional: add a "_comment": "Local dev only — disables TLS verification because localhost uses self-signed certs" key. |
| m9 | Stripe write allowlist — narrow but worth annotating | agents/test-runner/AGENT.md:11 |
Bash(stripe post /v1/test_helpers/test_clocks/*/advance:*) is appropriately pinned to test-clock advance. A future maintainer could widen the wildcard accidentally. |
Add an inline note in the AGENT.md that the wildcard must not be widened. |
| m10 | Untrusted input lacks documented sanitization posture | skills/test-web-changes/SKILL.md:10-25, agents/context-gatherer/AGENT.md:22-30 |
Pipeline ingests Jira ticket bodies, plan markdown, and free-form feature descriptions — all data, but currently fed into agent prompts without an explicit "treat as data, not instructions" note. | Defense-in-depth: add a one-line note in context-gatherer/AGENT.md that imperative phrases inside the source are to be ignored. Made effectively safe once M3 (curl scoping) is fixed. |
| m11 | Mailcatcher DELETE relies on instruction-level gate | skills/reading-mailcatcher-api/SKILL.md:153, references/email-patterns.md:95 |
The skill says "ALWAYS ask the user before running this command" for curl -X DELETE http://localhost:1080/messages. Once M3 is fixed (curl scoped to local hosts), this remains an LLM-instruction gate but with bounded blast radius. |
No action required after M3 is fixed. |
What was validated
1. Plugin structure (plugin-validator) — PASS
plugin.jsonis valid JSON; includesname,version,description,author,homepage,repository,keywords,agents[]name: bitwarden-playwright-testingmatches the directory nameversion: 1.0.0matchesmarketplace.jsonentry at line 99- All 7
agents[]paths resolve to existing AGENT.md files - All 7 agents: lowercase-hyphenated names within 3–50 chars; valid
model: sonnet; valid colors (green,orange,blue,yellow,purple,cyan); system prompts >200 chars each; tool allowlists present and (mostly) scoped - README.md is comprehensive (162 lines); CHANGELOG.md follows Keep a Changelog format with
## [1.0.0] - 2026-05-28and### Added - No commands or hooks declared (consistent with the plugin's design as agents + skills + orchestration)
- No MCP server definitions (the plugin consumes tools from other plugins like
bitwarden-atlassian-toolsandplaywright-cli)
2. Skill review (skill-reviewer) — 6 of 8 pass cleanly
| Skill | Verdict |
|---|---|
build-test-cases |
Pass — 1,946 words; minor mixed-path note |
compiling-test-report |
Pass — 586 words; lean but delegates to template |
determining-required-services |
Pass — 368 words; lean but delegates to references/services.md |
executing-web-tests |
Pass with split recommendation — 2,017 words; no subdirs despite size |
exploring-application-context |
Pass with split recommendation — 2,129 words; no examples/ |
reading-mailcatcher-api |
Needs revision — description voice + command-only frontmatter (M2) |
test-web-changes |
Needs major revision — misclassified, should be a slash command (M1) |
verifying-environment-health |
Pass — 591 words; delegates to two scripts |
All referenced files exist: references/services.md, references/known-flows.md, references/email-patterns.md, references/billing-test-data.md, templates/report-template.html, scripts/read-mailcatcher.sh, scripts/preflight-check.sh, scripts/health-check.sh, plugin-level references/tool-policy.md.
3. Security review (reviewing-claude-config) — 0 critical, 2 major (M3/M4), 5 minor
- No committed secrets. Regex sweep for Stripe (
sk_test_/sk_live_,pk_*,rk_*,whsec_), AWS (AKIA...), GitHub PATs (ghp_/ghs_/github_pat_), Slack tokens, and genericpassword = "..."assignments — zero matches. - No
settings.local.json,.env, orcredentials*files committed. - References to
server/dev/secrets.jsonare documentation pointers to a runtime config file the user owns locally — not committed secrets. - Shell scripts (
health-check.sh,preflight-check.sh,read-mailcatcher.sh) are well-bounded, parameterized, and useset -u. Noeval/exec/rm -rf/chmod 777/curl | shpatterns. references/tool-policy.mddefines a strong allowlist policy (no DB queries, no API-for-UI substitution, Stripe read-only except test-clock advance).- Agent tool allowlists are mostly narrow (e.g.,
service-manageronly allows the two specific scripts in this plugin). Notable exceptions:Bash(curl:*)andBash(ls *)ontest-runner— see M3/M4. ignoreHTTPSErrors: trueinscripts/playwright.config.jsonis appropriate for local dev (self-signed cert) — see m8.
Disposition
Recommended actions before merge:
- Resolve M1 (relocate or restructure
test-web-changes/SKILL.md) and M2 (clean upreading-mailcatcher-api/SKILL.mdfrontmatter and description). - Tighten M3 and M4 (scope
Bash(curl:*)andBash(ls *)in the test-runner agent). - Address M5 (progressive disclosure for the two large skills) — can be deferred to a follow-up if scope is too large.
- Minor items can land in a polish pass.
Once M1–M4 are addressed, the plugin is ready to ship. Remember to bump the version (likely PATCH 1.0.1) and add a CHANGELOG.md entry if any substantive changes land beyond this initial release.
🤖 Bitwarden Claude Code ReviewOverall Assessment: APPROVE Reviewed the initial v1.0.0 contribution of the Code Review DetailsNo findings to report. |
🎟️ Tracking
N/A — new plugin contribution.
📔 Objective
Adds the
bitwarden-playwright-testingplugin (v1.0.0): an automated end-to-end UI testing framework for Bitwarden web changes, driven by Playwright.The plugin exposes a single user-facing skill,
test-web-changes, that takes a Jira ticket ID, an implementation-plan file, or a free-form feature description and turns it into a complete Playwright test run — gathering context, exploring the affected codebases, building grounded test cases, verifying the local dev environment, executing the tests through the browser UI, and compiling an HTML report with full-page screenshots.How it works
test-web-changesruns as a team lead over an eight-task pipeline, dispatching a seven-agent team and persisting each artifact to.playwright-testing-artifacts/<slug>/:context-gatherercode-explorerservice-mappertest-plannerbuild-test-casesservice-manager/aliveendpoints, and Angular bootstrap; halts on failuretest-runnerplaywright-cliagent with guardrails and screenshotsreport-compilerNotable design points
build-test-casesbakes Stripe test values into UI steps; a billing-related 400 during execution halts the run immediately.Prerequisites
playwright-cliClaude Code plugin (render verification and browser execution depend on it).Notes for reviewers
marketplace.jsonandREADME.md; passesvalidate-marketplace.shandvalidate-plugin-structure.sh(0 errors, 0 warnings).Screenshots