Commit 8fd2cd4

ryanbas21 and claude committed
chore: add CLAUDE.md, agent skills, and project configuration
- Add CLAUDE.md with agent skills config (issue tracker, triage labels, domain docs)
- Add docs/agents/ with GitHub issue tracker, triage labels, and domain doc conventions
- Add mattpocock/skills (.agents/skills/) for diagnose, tdd, triage, etc.
- Add .claude/skills/ symlinks for Claude Code skill discovery
- Ignore .claude/settings.local.json (machine-specific permissions)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent b914ec2 commit 8fd2cd4

52 files changed

Lines changed: 2247 additions & 0 deletions

.agents/skills/caveman/SKILL.md

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
---
name: caveman
description: >
  Ultra-compressed communication mode. Cuts token usage ~75% by dropping
  filler, articles, and pleasantries while keeping full technical accuracy.
  Use when user says "caveman mode", "talk like caveman", "use caveman",
  "less tokens", "be brief", or invokes /caveman.
---

Respond terse like smart caveman. All technical substance stay. Only fluff die.

## Persistence

ACTIVE EVERY RESPONSE once triggered. No revert after many turns. No filler drift. Still active if unsure. Off only when user says "stop caveman" or "normal mode".

## Rules

Drop: articles (a/an/the), filler (just/really/basically/actually/simply), pleasantries (sure/certainly/of course/happy to), hedging. Fragments OK. Short synonyms (big not extensive, fix not "implement a solution for"). Abbreviate common terms (DB/auth/config/req/res/fn/impl). Strip conjunctions. Use arrows for causality (X -> Y). One word when one word enough.

Technical terms stay exact. Code blocks unchanged. Errors quoted exact.

Pattern: `[thing] [action] [reason]. [next step].`

Not: "Sure! I'd be happy to help you with that. The issue you're experiencing is likely caused by..."
Yes: "Bug in auth middleware. Token expiry check use `<` not `<=`. Fix:"

### Examples

**"Why React component re-render?"**

> Inline obj prop -> new ref -> re-render. `useMemo`.

**"Explain database connection pooling."**

> Pool = reuse DB conn. Skip handshake -> fast under load.

## Auto-Clarity Exception

Drop caveman temporarily for: security warnings, irreversible action confirmations, multi-step sequences where fragment order risks misread, user asks to clarify or repeats question. Resume caveman after clear part done.

Example -- destructive op:

> **Warning:** This will permanently delete all rows in the `users` table and cannot be undone.
>
> ```sql
> DROP TABLE users;
> ```
>
> Caveman resume. Verify backup exist first.

.agents/skills/diagnose/SKILL.md

Lines changed: 117 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,117 @@
---
name: diagnose
description: Disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when user says "diagnose this" / "debug this", reports a bug, says something is broken/throwing/failing, or describes a performance regression.
---

# Diagnose

A discipline for hard bugs. Skip phases only when explicitly justified.

When exploring the codebase, use the project's domain glossary to get a clear mental model of the relevant modules, and check ADRs in the area you're touching.

## Phase 1 — Build a feedback loop

**This is the skill.** Everything else is mechanical. If you have a fast, deterministic, agent-runnable pass/fail signal for the bug, you will find the cause — bisection, hypothesis-testing, and instrumentation all just consume that signal. If you don't have one, no amount of staring at code will save you.

Spend disproportionate effort here. **Be aggressive. Be creative. Refuse to give up.**

### Ways to construct one — try them in roughly this order

1. **Failing test** at whatever seam reaches the bug — unit, integration, e2e.
2. **Curl / HTTP script** against a running dev server.
3. **CLI invocation** with a fixture input, diffing stdout against a known-good snapshot.
4. **Headless browser script** (Playwright / Puppeteer) — drives the UI, asserts on DOM/console/network.
5. **Replay a captured trace.** Save a real network request / payload / event log to disk; replay it through the code path in isolation.
6. **Throwaway harness.** Spin up a minimal subset of the system (one service, mocked deps) that exercises the bug code path with a single function call.
7. **Property / fuzz loop.** If the bug is "sometimes wrong output", run 1000 random inputs and look for the failure mode.
8. **Bisection harness.** If the bug appeared between two known states (commit, dataset, version), automate "boot at state X, check, repeat" so you can `git bisect run` it.
9. **Differential loop.** Run the same input through old-version vs new-version (or two configs) and diff outputs.
10. **HITL bash script.** Last resort. If a human must click, drive _them_ with `scripts/hitl-loop.template.sh` so the loop is still structured. Captured output feeds back to you.
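
The bisection harness in item 8 could be sketched roughly as follows. This is a hypothetical helper, not a file from this commit: the name `bisect_check` and its two command arguments are assumptions made for illustration. It relies on the `git bisect run` convention that exit code 0 marks a commit as good, 125 asks git to skip the commit, and other codes from 1 to 127 mark it as bad.

```bash
# Hypothetical check for `git bisect run`; names and arguments are illustrative.
#
# bisect_check BUILD_CMD TEST_CMD
#   returns 125 if the build fails (commit cannot be tested, so skip it),
#   returns 0   if the test passes (good commit),
#   returns 1   if the test fails  (bad commit).
bisect_check() {
  local build_cmd="$1" test_cmd="$2"
  # A broken build says nothing about the bug, so ask git to skip.
  if ! bash -c "$build_cmd" >/dev/null 2>&1; then
    return 125
  fi
  # The test command is the pass/fail signal for this commit.
  if bash -c "$test_cmd" >/dev/null 2>&1; then
    return 0
  fi
  return 1
}
```

A script that defines this function and ends with a call such as `bisect_check '<build command>' '<test command>'` exits with the function's status, so it can be handed to `git bisect run` once the good and bad endpoints are marked.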

Build the right feedback loop, and the bug is 90% fixed.

### Iterate on the loop itself

Treat the loop as a product. Once you have _a_ loop, ask:

- Can I make it faster? (Cache setup, skip unrelated init, narrow the test scope.)
- Can I make the signal sharper? (Assert on the specific symptom, not "didn't crash".)
- Can I make it more deterministic? (Pin time, seed RNG, isolate filesystem, freeze network.)

A 30-second flaky loop is barely better than no loop. A 2-second deterministic loop is a debugging superpower.

### Non-deterministic bugs

The goal is not a clean repro but a **higher reproduction rate**. Loop the trigger 100×, parallelise, add stress, narrow timing windows, inject sleeps. A 50%-flake bug is debuggable; 1% is not — keep raising the rate until it's debuggable.
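
One way to put a number on the reproduction rate is a small counting loop. The sketch below is an assumption for illustration, not part of this skill: `repro_rate` is a hypothetical helper that treats any non-zero exit status of the given command as one reproduction of the failure.

```bash
# repro_rate RUNS CMD [ARGS...]
#   Hypothetical helper: runs CMD the given number of times and prints
#   "failures/runs", counting every non-zero exit status as a failure.
repro_rate() {
  local runs="$1" failures=0 i
  shift
  for ((i = 0; i < runs; i++)); do
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
  done
  printf '%d/%d\n' "$failures" "$runs"
}
```

Re-running it after each change to the trigger (more stress, a narrower timing window, an injected sleep) shows whether the rate is actually moving toward something debuggable.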

### When you genuinely cannot build a loop

Stop and say so explicitly. List what you tried. Ask the user for: (a) access to whatever environment reproduces it, (b) a captured artifact (HAR file, log dump, core dump, screen recording with timestamps), or (c) permission to add temporary production instrumentation. Do **not** proceed to hypothesise without a loop.

Do not proceed to Phase 2 until you have a loop you believe in.

## Phase 2 — Reproduce

Run the loop. Watch the bug appear.

Confirm:

- [ ] The loop produces the failure mode the **user** described — not a different failure that happens to be nearby. Wrong bug = wrong fix.
- [ ] The failure is reproducible across multiple runs (or, for non-deterministic bugs, reproducible at a high enough rate to debug against).
- [ ] You have captured the exact symptom (error message, wrong output, slow timing) so later phases can verify the fix actually addresses it.

Do not proceed until you reproduce the bug.

## Phase 3 — Hypothesise

Generate **3–5 ranked hypotheses** before testing any of them. Single-hypothesis generation anchors on the first plausible idea.

Each hypothesis must be **falsifiable**: state the prediction it makes.

> Format: "If <X> is the cause, then <changing Y> will make the bug disappear / <changing Z> will make it worse."

If you cannot state the prediction, the hypothesis is a vibe — discard or sharpen it.

**Show the ranked list to the user before testing.** They often have domain knowledge that re-ranks instantly ("we just deployed a change to #3"), or know hypotheses they've already ruled out. Cheap checkpoint, big time saver. Don't block on it — proceed with your ranking if the user is AFK.

## Phase 4 — Instrument

Each probe must map to a specific prediction from Phase 3. **Change one variable at a time.**

Tool preference:

1. **Debugger / REPL inspection** if the env supports it. One breakpoint beats ten logs.
2. **Targeted logs** at the boundaries that distinguish hypotheses.
3. Never "log everything and grep".

**Tag every debug log** with a unique prefix, e.g. `[DEBUG-a4f2]`. Cleanup at the end becomes a single grep. Untagged logs survive; tagged logs die.
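
The "single grep" cleanup might look like the sketch below. `list_debug_tags` is a hypothetical helper introduced here for illustration; it passes `-F` to `grep` so the square brackets in a tag like `[DEBUG-a4f2]` are matched literally instead of being read as a character class.

```bash
# list_debug_tags DIR TAG
#   Hypothetical helper: prints every line under DIR that still contains
#   the literal TAG, as file:line:text. Returns 0 if any matches remain
#   and 1 if the tree is clean (grep's own exit status).
list_debug_tags() {
  local dir="$1" tag="$2"
  grep -rnF -- "$tag" "$dir"
}
```

A call such as `list_debug_tags src '[DEBUG-a4f2]'` (the `src` directory is an assumption) lists what is left to delete, and a non-zero status means nothing with that tag survived.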

**Perf branch.** For performance regressions, logs are usually wrong. Instead: establish a baseline measurement (timing harness, `performance.now()`, profiler, query plan), then bisect. Measure first, fix second.

## Phase 5 — Fix + regression test

Write the regression test **before the fix** — but only if there is a **correct seam** for it.

A correct seam is one where the test exercises the **real bug pattern** as it occurs at the call site. If the only available seam is too shallow (single-caller test when the bug needs multiple callers, unit test that can't replicate the chain that triggered the bug), a regression test there gives false confidence.

**If no correct seam exists, that itself is the finding.** Note it. The codebase architecture is preventing the bug from being locked down. Flag this for the next phase.

If a correct seam exists:

1. Turn the minimised repro into a failing test at that seam.
2. Watch it fail.
3. Apply the fix.
4. Watch it pass.
5. Re-run the Phase 1 feedback loop against the original (un-minimised) scenario.

## Phase 6 — Cleanup + post-mortem

Required before declaring done:

- [ ] Original repro no longer reproduces (re-run the Phase 1 loop)
- [ ] Regression test passes (or absence of seam is documented)
- [ ] All `[DEBUG-...]` instrumentation removed (`grep` the prefix)
- [ ] Throwaway prototypes deleted (or moved to a clearly-marked debug location)
- [ ] The hypothesis that turned out correct is stated in the commit / PR message — so the next debugger learns

**Then ask: what would have prevented this bug?** If the answer involves architectural change (no good test seam, tangled callers, hidden coupling) hand off to the `/improve-codebase-architecture` skill with the specifics. Make the recommendation **after** the fix is in, not before — you have more information now than when you started.
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
#!/usr/bin/env bash
# Human-in-the-loop reproduction loop.
# Copy this file, edit the steps below, and run it.
# The agent runs the script; the user follows prompts in their terminal.
#
# Usage:
#   bash hitl-loop.template.sh
#
# Two helpers:
#   step "<instruction>"       → show instruction, wait for Enter
#   capture VAR "<question>"   → show question, read response into VAR
#
# At the end, captured values are printed as KEY=VALUE for the agent to parse.

set -euo pipefail

step() {
  printf '\n>>> %s\n' "$1"
  read -r -p " [Enter when done] " _
}

capture() {
  local var="$1" question="$2" answer
  printf '\n>>> %s\n' "$question"
  read -r -p " > " answer
  printf -v "$var" '%s' "$answer"
}

# --- edit below ---------------------------------------------------------

step "Open the app at http://localhost:3000 and sign in."

capture ERRORED "Click the 'Export' button. Did it throw an error? (y/n)"

capture ERROR_MSG "Paste the error message (or 'none'):"

# --- edit above ---------------------------------------------------------

printf '\n--- Captured ---\n'
printf 'ERRORED=%s\n' "$ERRORED"
printf 'ERROR_MSG=%s\n' "$ERROR_MSG"

.agents/skills/grill-me/SKILL.md

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
---
name: grill-me
description: Interview the user relentlessly about a plan or design until reaching shared understanding, resolving each branch of the decision tree. Use when user wants to stress-test a plan, get grilled on their design, or mentions "grill me".
---

Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.

Ask the questions one at a time.

If a question can be answered by exploring the codebase, explore the codebase instead.
Lines changed: 47 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,47 @@
# ADR Format

ADRs live in `docs/adr/` and use sequential numbering: `0001-slug.md`, `0002-slug.md`, etc.

Create the `docs/adr/` directory lazily — only when the first ADR is needed.

## Template

```md
# {Short title of the decision}

{1-3 sentences: what's the context, what did we decide, and why.}
```

That's it. An ADR can be a single paragraph. The value is in recording *that* a decision was made and *why* — not in filling out sections.

## Optional sections

Only include these when they add genuine value. Most ADRs won't need them.

- **Status** frontmatter (`proposed | accepted | deprecated | superseded by ADR-NNNN`) — useful when decisions are revisited
- **Considered Options** — only when the rejected alternatives are worth remembering
- **Consequences** — only when non-obvious downstream effects need to be called out

## Numbering

Scan `docs/adr/` for the highest existing number and increment by one.
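
The numbering rule could be sketched as below. `next_adr_number` is a hypothetical helper, not something this commit ships; it assumes the four-digit `NNNN-slug.md` naming shown above and prints `0001` when the directory is empty or does not exist yet, which fits the lazy-creation rule.

```bash
# next_adr_number DIR
#   Hypothetical helper: prints the next zero-padded ADR number for DIR,
#   assuming files are named NNNN-slug.md.
next_adr_number() {
  local dir="$1" max=0 file base num
  for file in "$dir"/[0-9][0-9][0-9][0-9]-*.md; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    [ -e "$file" ] || continue
    base="${file##*/}"   # strip the directory part
    num="${base%%-*}"    # keep the leading NNNN
    # Force base 10 so a number like 0008 is not parsed as invalid octal.
    if (( 10#$num > max )); then
      max=$((10#$num))
    fi
  done
  printf '%04d\n' $((max + 1))
}
```

Taking the highest existing number rather than counting files keeps the sequence increasing even when earlier ADRs have been deleted or renumbered.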

## When to offer an ADR

All three of these must be true:

1. **Hard to reverse** — the cost of changing your mind later is meaningful
2. **Surprising without context** — a future reader will look at the code and wonder "why on earth did they do it this way?"
3. **The result of a real trade-off** — there were genuine alternatives and you picked one for specific reasons

If a decision is easy to reverse, skip it — you'll just reverse it. If it's not surprising, nobody will wonder why. If there was no real alternative, there's nothing to record beyond "we did the obvious thing."

### What qualifies

- **Architectural shape.** "We're using a monorepo." "The write model is event-sourced, the read model is projected into Postgres."
- **Integration patterns between contexts.** "Ordering and Billing communicate via domain events, not synchronous HTTP."
- **Technology choices that carry lock-in.** Database, message bus, auth provider, deployment target. Not every library — just the ones that would take a quarter to swap out.
- **Boundary and scope decisions.** "Customer data is owned by the Customer context; other contexts reference it by ID only." The explicit no-s are as valuable as the yes-s.
- **Deliberate deviations from the obvious path.** "We're using manual SQL instead of an ORM because X." Anything where a reasonable reader would assume the opposite. These stop the next engineer from "fixing" something that was deliberate.
- **Constraints not visible in the code.** "We can't use AWS because of compliance requirements." "Response times must be under 200ms because of the partner API contract."
- **Rejected alternatives when the rejection is non-obvious.** If you considered GraphQL and picked REST for subtle reasons, record it — otherwise someone will suggest GraphQL again in six months.
Lines changed: 77 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,77 @@
# CONTEXT.md Format

## Structure

```md
# {Context Name}

{One or two sentence description of what this context is and why it exists.}

## Language

**Order**:
{A concise description of the term}
_Avoid_: Purchase, transaction

**Invoice**:
A request for payment sent to a customer after delivery.
_Avoid_: Bill, payment request

**Customer**:
A person or organization that places orders.
_Avoid_: Client, buyer, account

## Relationships

- An **Order** produces one or more **Invoices**
- An **Invoice** belongs to exactly one **Customer**

## Example dialogue

> **Dev:** "When a **Customer** places an **Order**, do we create the **Invoice** immediately?"
> **Domain expert:** "No — an **Invoice** is only generated once a **Fulfillment** is confirmed."

## Flagged ambiguities

- "account" was used to mean both **Customer** and **User** — resolved: these are distinct concepts.
```

## Rules

- **Be opinionated.** When multiple words exist for the same concept, pick the best one and list the others as aliases to avoid.
- **Flag conflicts explicitly.** If a term is used ambiguously, call it out in "Flagged ambiguities" with a clear resolution.
- **Keep definitions tight.** One sentence max. Define what it IS, not what it does.
- **Show relationships.** Use bold term names and express cardinality where obvious.
- **Only include terms specific to this project's context.** General programming concepts (timeouts, error types, utility patterns) don't belong even if the project uses them extensively. Before adding a term, ask: is this a concept unique to this context, or a general programming concept? Only the former belongs.
- **Group terms under subheadings** when natural clusters emerge. If all terms belong to a single cohesive area, a flat list is fine.
- **Write an example dialogue.** A conversation between a dev and a domain expert that demonstrates how the terms interact naturally and clarifies boundaries between related concepts.

## Single vs multi-context repos

**Single context (most repos):** One `CONTEXT.md` at the repo root.

**Multiple contexts:** A `CONTEXT-MAP.md` at the repo root lists the contexts, where they live, and how they relate to each other:

```md
# Context Map

## Contexts

- [Ordering](./src/ordering/CONTEXT.md) — receives and tracks customer orders
- [Billing](./src/billing/CONTEXT.md) — generates invoices and processes payments
- [Fulfillment](./src/fulfillment/CONTEXT.md) — manages warehouse picking and shipping

## Relationships

- **Ordering → Fulfillment**: Ordering emits `OrderPlaced` events; Fulfillment consumes them to start picking
- **Fulfillment → Billing**: Fulfillment emits `ShipmentDispatched` events; Billing consumes them to generate invoices
- **Ordering ↔ Billing**: Shared types for `CustomerId` and `Money`
```

The skill infers which structure applies:

- If `CONTEXT-MAP.md` exists, read it to find contexts
- If only a root `CONTEXT.md` exists, single context
- If neither exists, create a root `CONTEXT.md` lazily when the first term is resolved

When multiple contexts exist, infer which one the current topic relates to. If unclear, ask.
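
The three-way inference above can be expressed as a small file check. This is only a sketch under stated assumptions: `context_layout` and its `multi` / `single` / `none` labels are hypothetical names chosen for illustration, and the skill itself is prose instructions for an agent rather than a script.

```bash
# context_layout ROOT
#   Hypothetical helper: prints "multi" if ROOT/CONTEXT-MAP.md exists,
#   "single" if only ROOT/CONTEXT.md exists, and "none" if neither does.
context_layout() {
  local root="$1"
  if [ -f "$root/CONTEXT-MAP.md" ]; then
    printf 'multi\n'
  elif [ -f "$root/CONTEXT.md" ]; then
    printf 'single\n'
  else
    printf 'none\n'
  fi
}
```

Checking for the map first mirrors the order of the rules: a repo that has both files is treated as multi-context.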
