Most AI plugins hope they work. ProductKit proves it.
Opinionated Claude plugins for PMs, designers, engineers, and researchers — built with the same eval-driven development practices used by serious AI product teams. Each plugin ships with a behavioral eval harness: an LLM-as-judge test suite that verifies the skill actually changes Claude's behavior, not just that the file parses.
Most Claude plugin repos have zero behavioral testing. Their CI proves files parse. That's it.
ProductKit ships a two-call LLM-as-judge eval harness: every plugin has a test suite of behavioral cases. A subject call loads the skill and generates a response. A grader call scores the response against falsifiable criteria — "did Claude refuse to write the PRD before asking clarifying questions?" "did it flag the vanity metric?" Each case gets a weighted pass/fail score with a 75% threshold.
31 cases across 3 plugins. Every case you can read, run, and extend.
| Plugin | For | Description | Status |
|---|---|---|---|
| Strategic PM | Product Managers | PRDs, roadmaps, competitive analysis, user research, AI product playbook | ✅ Live |
| Product Writing Studio | All roles | Exec comms, strategy memos, board decks, stakeholder emails | ✅ Live |
| PM Interview Prep | PMs (job seekers) | Mock interviews, structured answer coaching, product sense & execution questions | ✅ Live |
| UX Strategy | Designers | UX heuristics, design system thinking, usability audits, interaction patterns | 🔜 Coming soon |
| Metrics & Analytics | PMs, Data, Eng | Dashboard design, metric trees, experiment analysis, statistical rigor checks | 📋 Planned |
Want to build a plugin? See CONTRIBUTING.md — we're actively looking for contributors across product, design, engineering, and research.
Open your terminal and run these commands inside Claude Code:
Step 1: Add the marketplace
```
/plugin marketplace add shahcolate/Product-Kit
```

This clones the repo locally. You only do this once.
Step 2: Install the plugin
```
/plugin add strategic-pm
```

When prompted, choose your scope:
- User scope — available across all your projects
- Project scope — only available in the current project
Step 3: Enable auto-updates (recommended)

When prompted during install, select auto-update. New versions will sync automatically every time Claude Code starts.
To update manually at any time:
```
/plugin marketplace update
```

That's it. Claude will automatically load Strategic PM whenever you're doing product work. No extra commands needed — just start working.
- Download the latest ZIP from the Releases page
- Go to Settings → Capabilities → Skills
- Click "Upload skill" and upload the ZIP
- Claude auto-loads it whenever the relevant work is detected
Claude.ai doesn't support auto-updates yet. ⭐ Star this repo and watch releases to get notified when new versions ship.
Organization Owners can provision plugins centrally from admin settings. Admin-provisioned skills are enabled by default for all users — one upload covers your entire product team.
Skills are supported via the /v1/skills API endpoint. See Anthropic's Skills documentation for integration details.
Turn Claude from a template-filler into an opinionated PM co-pilot.
500+ lines of product strategy instruction: the Five Laws, Devil's Advocate Protocol, Decision Journal, PM Maturity Adapter, AI Product Management Playbook, and anti-pattern detection for 13 common PM failure modes.
Claude becomes an expert product communicator — not a generic writer.
Audience-First Protocol, Pyramid Principle enforcement, SCQA structuring, Clarity Laws, and document type intelligence for exec updates, strategy memos, board decks, stakeholder emails, product announcements, one-pagers, design briefs, and launch comms.
Your personal PM interview coach — not a flashcard app.
Mock interview mode with scoring rubrics, JTBD-framed product sense coaching, structured decomposition for estimation questions, STAR format behavioral coaching, and anti-pattern detection for common interview mistakes. Calibrated for FAANG, growth-stage, enterprise, and consumer companies.
See what ProductKit produces before installing it. These teardowns were generated by the AI Product Teardown Tool using Strategic PM as the system prompt.
| Product | Category | Key Finding |
|---|---|---|
| Notion | Productivity | Strong switching costs, but "everything tool" positioning creates value prop clarity risk |
| Linear | Dev Tools | Counter-positioning against Jira works — but narrow ICP limits TAM |
| Figma | Design | Network effects + multiplayer = strongest moat in the set, post-Adobe risk resolved |
| ChatGPT | AI | Fastest adoption in history, but retention architecture is the weakest dimension |
Run any product through Strategic PM's full framework battery and get a structured teardown across 6 dimensions: JTBD & Value Prop, Competitive Moat (7 Powers), Growth Model, Anti-Pattern Scan (all 13), Monetization, and Strategic Verdict.
This is a standalone CLI tool that calls the Anthropic API directly — not a Claude plugin. Clone the repo and run it with your own API key:
```
git clone https://github.com/shahcolate/Product-Kit.git
cd Product-Kit
pip install anthropic
export ANTHROPIC_API_KEY=sk-ant-...
python scripts/teardown.py "Notion"
python scripts/teardown.py "Linear" --output markdown --save
```

Compare two products head-to-head across all 6 dimensions plus a 7th "Head-to-Head Verdict":
```
python scripts/teardown.py "Notion" --vs "Coda"
python scripts/teardown.py "Figma" --vs "Canva" --output markdown --save
```

Generate a thread-ready summary — a punchy hook, 5 bullet verdicts, and a bottom line:
```
python scripts/teardown.py "Notion" --output social
```

All dimensions run in parallel via async API calls (~15s vs ~60s sequential).
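The parallel fan-out described above can be sketched with `asyncio.gather`. This is an illustrative sketch, not the script's actual code — the dimension names come from the framework list above, and the sleep stands in for a real Anthropic API call:

```python
import asyncio

# The six teardown dimensions named above.
DIMENSIONS = [
    "JTBD & Value Prop", "Competitive Moat (7 Powers)", "Growth Model",
    "Anti-Pattern Scan", "Monetization", "Strategic Verdict",
]

async def run_dimension(product: str, dimension: str) -> tuple[str, str]:
    # Stand-in for one API call; the sleep simulates network latency.
    await asyncio.sleep(0.01)
    return dimension, f"{dimension} analysis of {product}"

async def run_teardown(product: str) -> dict[str, str]:
    # gather() launches all six calls concurrently, so total wall time
    # is roughly one call's latency instead of six in sequence.
    pairs = await asyncio.gather(
        *(run_dimension(product, d) for d in DIMENSIONS)
    )
    return dict(pairs)

report = asyncio.run(run_teardown("Notion"))
print(len(report))  # 6
```

With six dimensions at ~10s of model latency each, concurrent execution is what turns a ~60s sequential run into roughly one call's worth of wall time.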
Standalone CLI tools that do real work for product teams. Each calls the Anthropic API directly — clone the repo, set your API key, and run.
Ingest user feedback from CSV/JSON, cluster by theme, and produce structured synthesis with quotes, sentiment, urgency, and recommended actions. Multi-phase pipeline: extract themes per batch, merge across batches, assess actionability.
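The batch-then-merge shape of that pipeline can be sketched as follows. In the real tool each phase is a Claude call; here a trivial keyword tally stands in for theme extraction, and every name is illustrative rather than the script's actual API:

```python
from collections import Counter

KEYWORDS = ("pricing", "onboarding", "performance")  # illustrative themes

def extract_themes(batch: list[str]) -> Counter:
    # Phase 1 (per batch): tally theme mentions. The real tool asks
    # Claude to extract themes; this keyword scan is only a stand-in.
    themes = Counter()
    for comment in batch:
        for kw in KEYWORDS:
            if kw in comment.lower():
                themes[kw] += 1
    return themes

def synthesize(comments: list[str], batch_size: int = 2) -> Counter:
    # Phase 2: merge per-batch counts into one ranked, cross-batch view.
    merged = Counter()
    for i in range(0, len(comments), batch_size):
        merged += extract_themes(comments[i:i + batch_size])
    return merged

feedback = [
    "Pricing is confusing",
    "Onboarding took forever",
    "Love it, but pricing tiers are unclear",
    "Performance on large docs is slow",
]
top = synthesize(feedback).most_common(1)
print(top)  # [('pricing', 2)]
```

Batching keeps each model call within context limits; merging afterwards is what lets a theme surface even when its mentions are scattered across batches.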
```
pip install -r requirements.txt
python scripts/feedback_synth.py feedback.csv --text-column "comment" --source "NPS Q1 2026"
python scripts/feedback_synth.py reviews.json --format json --text-field "body" --max-themes 8 --output markdown --save
```

Read git log (commit range, tag range, or date range), classify changes, and generate polished release notes. Produces internal (eng-facing) and external (customer-facing) versions — or both.
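A rough sketch of the git-reading half, assuming a conventional-commit-style history. The prefix routing below is a naive stand-in for the model call, and the helper names are illustrative, not the script's real interface:

```python
import subprocess

def commits_since(repo: str, since_tag: str) -> list[str]:
    # `git log <tag>..HEAD --pretty=format:%s` lists each commit subject
    # after the tag, one per line.
    out = subprocess.run(
        ["git", "log", f"{since_tag}..HEAD", "--pretty=format:%s"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def classify(subject: str) -> str:
    # Naive prefix routing; the real tool has Claude classify each change
    # and rewrite it for the chosen audience.
    if subject.startswith("fix"):
        return "Bug fixes"
    if subject.startswith("feat"):
        return "New features"
    return "Other changes"

print(classify("feat: add --baseline flag"))  # New features
```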
```
python scripts/release_notes.py --repo . --since v2.3.0 --audience external
python scripts/release_notes.py --repo . --range v2.3.0..v2.4.0 --output markdown --save
```

Capture competitor pages via headless browser, then use Claude vision to detect and summarize semantic changes vs. previous captures. Run weekly/monthly via cron. Semantic diff ("they removed the free tier"), not pixel diff.
```
pip install -r requirements-browser.txt && python -m playwright install chromium
python scripts/competitor_watch.py --url "https://competitor.com/pricing" --name "Competitor X"
python scripts/competitor_watch.py --config competitors.json --output markdown --save
```

Navigate a signup/onboarding flow via headless browser, capture each screen, and produce a structured UX audit: friction scoring per screen, time-to-value assessment, drop-off risk identification, and an overall A-F grade.
```
python scripts/onboarding_audit.py "https://app.example.com/signup" --product "Example App" --category "B2B SaaS"
python scripts/onboarding_audit.py "https://linear.app/signup" --max-steps 15 --output markdown --save
```

Take a URL plus a natural-language goal, have the AI plan the navigation steps, capture screenshots at each step, and generate an annotated step-by-step tutorial. Supports pre-defined steps JSON and cookie-based auth for authenticated flows.
```
python scripts/tutorial.py "https://app.linear.dev" --goal "Create a new project and add a task"
python scripts/tutorial.py "https://notion.so" --steps steps.json --auth-cookie cookie.txt --output markdown --save
```

The existing CI (validate.yml) proves plugin files parse and exist. That's table stakes. The eval harness answers the harder question: does loading this skill actually change what Claude does?
Each plugin ships with an evals/<plugin>/cases.json — a suite of behavioral test cases with falsifiable criteria. The runner makes two API calls per case:
- Subject call — Claude responds with the skill as its system prompt, simulating a real user interaction
- Grader call — a separate Claude call scores the response against each criterion and returns structured JSON
Example criteria from strategic-pm:
- "Response does NOT immediately begin writing a PRD" — passes if the behavior is absent
- "Response asks about the underlying problem, user need, or metric" — passes if the behavior is present
- "Questions are batched rather than asked one at a time" — weighted 2, critical criteria weighted 3
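To make the scoring rule concrete, here is a minimal sketch of the weighted pass/fail math and the 75% threshold. The field names are illustrative; the authoritative schema lives in evals/<plugin>/cases.json:

```python
# Hypothetical shape of one behavioral eval case.
case = {
    "id": "spm-001",
    "prompt": "Write me a PRD for adding dark mode.",
    "criteria": [
        {"text": "Does NOT immediately begin writing a PRD", "weight": 3},
        {"text": "Asks about the underlying problem or metric", "weight": 2},
        {"text": "Questions are batched, not asked one at a time", "weight": 1},
    ],
}

def weighted_score(verdicts: list[bool], criteria: list[dict]) -> float:
    # Each criterion the grader marks as passed contributes its weight.
    total = sum(c["weight"] for c in criteria)
    earned = sum(c["weight"] for c, ok in zip(criteria, verdicts) if ok)
    return earned / total

# Suppose the grader passed the two heavier criteria and failed the third.
score = weighted_score([True, True, False], case["criteria"])
passed = score >= 0.75  # the harness's pass threshold
print(round(score, 2), passed)  # 0.83 True
```

Weighting means a case can fail a low-stakes criterion and still pass, but missing a weight-3 "critical" criterion alone drops it to 50% — below threshold.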
Run locally:
```
git clone https://github.com/shahcolate/Product-Kit.git
cd Product-Kit
pip install anthropic
export ANTHROPIC_API_KEY=sk-ant-...
python scripts/run_evals.py
# or: --plugin strategic-pm --output json --model claude-haiku-4-5-20251001
```

Prove the skill actually matters. Run every eval case twice — with the skill and without — and see the behavioral lift:

```
python scripts/run_evals.py --plugin strategic-pm --baseline
```

```
spm-001 · Reverse Brief trigger on PRD request
WITH skill:     10/10 (100%) ✅
WITHOUT skill:   3/10 (30%)  ❌
Skill lift:     +70 points
```
Run in CI: Actions → Behavioral Eval → Run workflow (manual trigger — ~$1–2/run at Opus pricing).
Want a teardown without cloning the repo? Open a GitHub issue using the Product Teardown Request template, and a GitHub Action will run the teardown and post results as a comment.
- Go to Issues → New Issue
- Select Product Teardown Request
- Enter the product name in the title: `Teardown: Stripe`
- Optionally add context in the body
- Results appear as a comment within ~2 minutes
Rate-limited to 5 teardowns/day to manage API costs.
```
productkit/
├── README.md                      # You're here
├── CONTRIBUTING.md                # How to contribute plugins and skills
├── CHANGELOG.md                   # Version history
├── LICENSE                        # MIT
├── marketplace.json               # Plugin marketplace catalog
├── .claude-plugin/
│   └── marketplace.json           # Marketplace marker
├── plugins/
│   ├── strategic-pm/              # ✅ Live
│   │   ├── README.md
│   │   ├── .claude-plugin/plugin.json
│   │   └── skills/strategic-pm/SKILL.md
│   ├── product-writing-studio/    # ✅ Live
│   │   ├── README.md
│   │   ├── .claude-plugin/plugin.json
│   │   └── skills/product-writing-studio/SKILL.md
│   └── pm-interview-prep/         # ✅ Live
│       ├── README.md
│       ├── .claude-plugin/plugin.json
│       └── skills/pm-interview-prep/SKILL.md
├── evals/
│   ├── README.md                  # Eval methodology and how to run
│   ├── strategic-pm/cases.json    # 12 behavioral eval cases
│   ├── product-writing-studio/cases.json  # 13 behavioral eval cases
│   └── pm-interview-prep/cases.json       # 6 behavioral eval cases
├── examples/
│   └── teardowns/                 # Example teardown outputs
│       ├── notion.md
│       ├── linear.md
│       ├── figma.md
│       └── chatgpt.md
├── scripts/
│   ├── _common.py                 # Shared utilities across CLI tools
│   ├── feedback_synth.py          # Feedback Synthesizer (CSV/JSON → themed synthesis)
│   ├── release_notes.py           # Release Notes Generator (git log → polished notes)
│   ├── competitor_watch.py        # Competitive Screenshot Monitor (Playwright + vision)
│   ├── onboarding_audit.py        # Onboarding Flow Auditor (Playwright + vision)
│   ├── tutorial.py                # Tutorial Creator (Playwright + vision + annotation)
│   ├── run_evals.py               # Eval runner (--baseline for skill vs vanilla)
│   └── teardown.py                # Teardown CLI (--vs, --output social, async)
├── requirements.txt               # Base deps (anthropic)
├── requirements-browser.txt       # Browser tool deps (playwright, Pillow)
├── .github/
│   ├── workflows/teardown.yml     # Teardown-on-issue Action
│   └── ISSUE_TEMPLATE/teardown_request.md
├── teardowns/                     # Saved teardown reports (gitignored)
├── feedback-synthesis/            # Saved feedback synthesis reports (gitignored)
├── release-notes/                 # Saved release notes (gitignored)
├── competitor-watch/              # Competitor screenshots and reports (gitignored)
├── onboarding-audits/             # Onboarding audit reports (gitignored)
└── tutorials/                     # Generated tutorials with screenshots (gitignored)
```
| Status | Plugin | Audience | Description |
|---|---|---|---|
| ✅ Live | Strategic PM | PMs | Full-stack PM co-pilot with AI Playbook |
| ✅ Live | Product Writing Studio | All roles | Exec comms, strategy memos, board decks, stakeholder emails |
| ✅ Live | PM Interview Prep | PMs | Mock interviews, structured answer coaching |
| 🔜 Next | UX Strategy | Designers | Heuristics, usability audits, design system thinking |
| 📋 Planned | Metrics & Analytics | PMs, Data, Eng | Metric trees, experiment analysis, dashboard design |
| 💡 Exploring | Research Ops | Researchers, PMs | Automated user research synthesis with web search |
| 💡 Exploring | Jira/Linear Bridge | PMs, Eng | Roadmap-to-ticket workflows via MCP |
Have a plugin idea? Open an issue or read CONTRIBUTING.md to propose one.
ProductKit gets better when people who build real products contribute to it. We're looking for PMs, designers, engineers, researchers — anyone who's shipped and learned from it.
- Improve existing plugins — add frameworks, edge cases, anti-patterns
- Build new plugins — bring your discipline's expertise
- Report bugs — when Claude misapplies something, tell us
Read the full guide → CONTRIBUTING.md
Q: Does this work on the free Claude plan? A: Yes. Skills are available on free, Pro, Max, Team, and Enterprise plans. Code execution must be enabled.
Q: How do I get updates automatically?
A: Claude Code users can enable auto-update during install — new versions sync at startup. Or run /plugin marketplace update manually. Claude.ai users re-download from Releases.
Q: Will Claude always use this skill? A: Claude auto-detects when the skill is relevant based on your task. Any matching work should trigger it automatically.
Q: Can I use multiple plugins at once? A: Yes. Claude loads multiple skills simultaneously. ProductKit plugins are designed to compose well with each other and with third-party skills.
Q: I'm on a Team/Enterprise plan. Can I deploy this org-wide? A: Yes. Organization Owners can provision skills centrally. One upload covers your entire team.
Q: I want to add a plugin for my discipline (design, engineering, research). Can I? A: Yes — that's the point. See CONTRIBUTING.md for the plugin proposal process.
Q: Something isn't working right. A: Open an issue with what you asked Claude and what went wrong. Include the output if possible.
MIT — use it, fork it, improve it, share it.
Thinking partners for the people who build products.
⭐ Star this repo if it makes your work sharper. It helps others find it.