From 04b8fe58facf455684bffb20f8533b83cc9426a6 Mon Sep 17 00:00:00 2001
From: Dominik Simonik
Date: Wed, 15 Apr 2026 15:19:18 +0200
Subject: [PATCH 1/4] feat: add /debug skill for structured hypothesis-driven debugging

Implements a new user-invocable skill that guides users through a structured cycle: hypothesize, instrument, reproduce, analyze, fix, verify. All debug instrumentation is tagged with WINGSPAN-DEBUG and removed at the end, leaving only the fix.

Closes #141

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 CLAUDE.md                                   |   2 +
 skills/debug/SKILL.md                       | 239 ++++++++++++++++++++
 skills/debug/references/validate-and-fix.md |   1 +
 3 files changed, 242 insertions(+)
 create mode 100644 skills/debug/SKILL.md
 create mode 120000 skills/debug/references/validate-and-fix.md

diff --git a/CLAUDE.md b/CLAUDE.md
index 6abc6a6..8725c7d 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -26,6 +26,8 @@ Standalone Skills:
 
 - **`/debrief`** — Produce a structured, blameless debrief document after an incident, failed release, or significant bug.
 
+- **`/debug`** — Structured, interactive hypothesis-driven debugging. Instruments code to test hypotheses, then removes all debug code after fixing.
+
 Each phase persists its output to `docs/` so the next phase can discover it from a cold start.
 
 **Fast path:** **`/hotfix`** — Streamlined workflow for emergency fixes. Skips brainstorm and planning but enforces review and testing. Use when speed matters but quality is still non-negotiable.
diff --git a/skills/debug/SKILL.md b/skills/debug/SKILL.md
new file mode 100644
index 0000000..c3e7a68
--- /dev/null
+++ b/skills/debug/SKILL.md
@@ -0,0 +1,239 @@
+---
+name: debug
+user-invocable: true
+description: Finds and fixes bugs through structured hypothesis testing and code instrumentation. Use when user says "debug", "debug this", "find the bug", "what's causing this", "troubleshoot", or "why is this broken".
+argument-hint: bug description, error message, or reproduction steps
+effort: high
+compatibility: Designed for Claude Code (or similar products with agent support)
+---
+
+# Debug — structured hypothesis-driven debugging
+
+Find and fix bugs through a structured cycle: hypothesize, instrument, reproduce, analyze, fix, verify. All debug instrumentation is removed at the end, leaving only the fix.
+
+## Bug Description
+
+$ARGUMENTS
+
+**If the bug description above is empty, ask the user**: "What's the bug? Describe the symptoms, paste an error message, or explain how to reproduce it. Include any hypotheses you have about the cause."
+
+DO NOT proceed until you have a bug description.
+
+## Phase 0 — Understand
+
+Summarize the bug in one sentence. Identify:
+
+- **Symptom**: what the user observes (error message, wrong behavior, crash)
+- **Suspected area**: which layer or component is likely involved
+- **User hypotheses**: any theories the user provided about the cause
+
+If the user did not provide hypotheses, use **AskUserQuestion**: "Do you have any guesses about what might be causing this? Even rough intuitions help narrow the search."
+
+1. **Yes** — let the user describe their hypothesis
+2. **No, just investigate** — proceed without user hypotheses
+
+## Phase 1 — Hypothesize
+
+### 1.1. Explore the codebase
+
+Run a focused codebase exploration to understand the area around the bug:
+
+- Task @codebase-review-agent("Locate the code responsible for this behavior. Focus on the symptom described — do not survey the entire codebase. Bug: ")
+
+After the agent returns, read the identified files and their immediate context.
+
+### 1.2. Formulate hypotheses
+
+Based on the code exploration and any user hypotheses, formulate **2-5 concrete hypotheses** about what could cause the bug. Each hypothesis must be:
+
+- **Specific**: points to a concrete code path or condition
+- **Testable**: can be confirmed or ruled out with debug output
+- **Tagged**: assigned an identifier (H1, H2, H3, ...)
+
+Present the hypotheses to the user:
+
+```
+## Hypotheses
+
+- **H1**: [description] — test by checking [what to observe]
+- **H2**: [description] — test by checking [what to observe]
+- **H3**: [description] — test by checking [what to observe]
+```
+
+Use **AskUserQuestion**: "Do these hypotheses look right? I'll instrument the code to test them."
+
+1. **Proceed (Recommended)** — instrument and test these hypotheses
+2. **Adjust** — modify or add hypotheses before proceeding
+
+### 1.3. Set up workspace
+
+Before writing any files, ensure the session is on a working branch:
+
+- Call @create-branch to check and optionally create a working branch or worktree.
+
+## Phase 2 — Instrument
+
+Add targeted debug instrumentation to test each hypothesis. Follow these rules:
+
+- **Tag every addition** with a comment containing `WINGSPAN-DEBUG` so it can be reliably found and removed later. Use the language's comment syntax (e.g., `// WINGSPAN-DEBUG`, `# WINGSPAN-DEBUG`, `<!-- WINGSPAN-DEBUG -->`).
+- **Label output by hypothesis**: each log statement must identify which hypothesis it tests (e.g., `[H1]`, `[H2]`).
+- **Log values, not just markers**: capture the actual state (variable values, conditions, return values) that confirms or rules out each hypothesis.
+- **Minimize invasiveness**: add logging only — do not change control flow, add dependencies, or modify behavior.
+- **Keep it simple**: use the project's existing logging mechanism or basic print/console output. Do not introduce new dependencies.
+
+If the project uses a build or compilation step, run it to confirm the instrumentation compiles. Fix any build failures before proceeding.
+
+After instrumenting, tell the user:
+
+1. Exactly what steps to take to reproduce the bug
+2. Where the debug output will appear (console, log file, etc.)
+3. What to copy/paste back (or which file to point to)
+
+Use **AskUserQuestion**: "I've added debug instrumentation. Please reproduce the bug and share the output."
+
+1. **Here's the output** — user provides the debug output
+2. **Output is in a file** — user provides a path to read
+3. **Adjust instrumentation** — the instrumentation needs changes before reproducing
+
+**If "Adjust instrumentation"**: ask what needs to change, update, and re-present instructions.
+
+## Phase 3 — Analyze
+
+Read the debug output. For each hypothesis, determine:
+
+- **Confirmed**: the output shows the suspected condition is true
+- **Ruled out**: the output shows the suspected condition is false
+- **Inconclusive**: not enough information to decide
+
+### If a hypothesis is confirmed
+
+Summarize the root cause to the user in 2-3 sentences, referencing the specific debug output that confirms it. Proceed to Phase 4.
+
+### If all hypotheses are ruled out
+
+Explain what was learned from ruling them out. Use **AskUserQuestion**:
+
+1. **New hypotheses (Recommended)** — formulate new hypotheses based on what was eliminated, return to Phase 1.2
+2. **Add more instrumentation** — keep existing instrumentation and add more, return to Phase 2
+3. **Stop** — clean up instrumentation and end (go to Phase 6)
+
+### If inconclusive
+
+Explain what was learned and what remains unclear. Use **AskUserQuestion**:
+
+1. **Refine instrumentation (Recommended)** — adjust logging to get clearer signal, return to Phase 2
+2. **New hypotheses** — start fresh with new hypotheses, return to Phase 1.2
+
+## Phase 4 — Fix
+
+### 4.1. Implement the fix
+
+Write the minimal change that addresses the confirmed root cause:
+
+- Change only what is necessary — no drive-by refactors.
+- Match surrounding code style.
+- If the fix grows beyond the original scope, stop and suggest `/plan` → `/build`.
+
+### 4.2. Remove debug instrumentation
+
+Remove ALL debug instrumentation added in Phase 2. Search for `WINGSPAN-DEBUG` across the codebase and remove every tagged line or block. Verify none remain:
+
+```bash
+grep -r "WINGSPAN-DEBUG" .
+```
+
+If any remain, remove them. The codebase must contain only the fix — no debug instrumentation.
+
+### 4.3. Validate
+
+Follow the [validation and fix procedure](references/validate-and-fix.md).
+
+## Phase 5 — Verify
+
+Use **AskUserQuestion**: "I've implemented the fix and removed all debug instrumentation. Please reproduce the original bug to confirm it's resolved."
+
+1. **Fixed** — the bug is resolved
+2. **Still broken** — describe what happened
+3. **New issue introduced** — describe the new problem
+
+**If "Fixed"**: proceed to Phase 7.
+
+**If "Still broken"**: read the user's description. Use **AskUserQuestion**:
+
+1. **Adjust the fix** — root cause was right but fix was incomplete, revise it and return to Phase 4.1
+2. **Re-investigate** — root cause hypothesis may be wrong, return to Phase 1 with new information
+3. **Stop** — revert all changes and end
+
+**If "New issue introduced"**: fix the regression, re-validate, and ask again.
+
+## Phase 6 — Cleanup Only
+
+Reached when the user stops debugging without a fix.
+
+Remove ALL debug instrumentation. Search for `WINGSPAN-DEBUG` across the codebase:
+
+```bash
+grep -r "WINGSPAN-DEBUG" .
+```
+
+Remove every match. Confirm to the user that all debug instrumentation has been removed and the codebase is clean.
+
+## Phase 7 — Complete
+
+### Final validation
+
+Run the project's formatter, linter, and test runner one last time. Fix any failures.
+
+### Handoff
+
+Use **AskUserQuestion**: "Bug fixed and verified! What would you like to do next?"
+
+1. **Commit the fix (Recommended)** — create a commit with the fix
+2. **Add a regression test first** — write a test that reproduces the original bug, then commit
+3. **Done** — end the session
+
+**If "Commit the fix"**: create a single commit:
+
+```text
+fix:
+
+Root cause: <1-2 sentence explanation>
+```
+
+**If "Add a regression test first"**: write a test that fails without the fix and passes with it. Run validation, then create the commit.
+
+## Gotchas
+
+- The `WINGSPAN-DEBUG` tag is the single source of truth for cleanup. Every debug addition must include it — no exceptions.
+- If the bug is non-deterministic (race condition, flaky test), instrumentation may need multiple reproduction attempts. Ask the user to reproduce several times if needed.
+- If more than 3 hypothesis-test cycles pass without progress, suggest a different approach: `/brainstorm` for deeper analysis or pairing with a teammate.
+- Debug instrumentation must never be committed. Phase 4.2 and Phase 6 exist specifically to prevent this.
+- If the user's reproduction environment differs from the development environment (production, staging, specific device), note that instrumentation must be compatible with that environment.
+
+## Evaluation queries
+
+### Should trigger
+1. "Debug this crash — the app dies when I tap the submit button."
+2. "What's causing the test failure in the auth module?"
+3. "Help me find the bug — users report stale data after refresh."
+4. "Troubleshoot why the API returns 500 on this specific payload."
+5. "Why is this broken? It worked yesterday before the deploy."
+
+### Should NOT trigger
+1. "Add a dark mode toggle to the settings screen."
+2. "Review this PR before I merge it."
+3. "Write unit tests for the checkout flow."
+4. "Refactor the data layer to use the repository pattern."
+5. "Create a plan for the new onboarding feature."
+
+### Edge cases
+1. "This test is flaky — passes locally, fails in CI." (debugging-adjacent; should trigger)
+2. "The build is broken." (may be a config issue, not a code bug; should trigger)
+3. "Performance is slow on the dashboard page." (performance investigation, not a bug; should not trigger — suggest `/brainstorm` instead)
+
+## Important
+
+- This skill is interactive. It requires the user to reproduce bugs and report results between phases.
+- Keep debug instrumentation minimal and tagged. Every addition gets `WINGSPAN-DEBUG`.
+- The goal is to narrow down the root cause systematically, not to guess-and-check.
+- Remove all debug instrumentation before finishing — the only lasting change should be the fix itself.
diff --git a/skills/debug/references/validate-and-fix.md b/skills/debug/references/validate-and-fix.md
new file mode 120000
index 0000000..173ee62
--- /dev/null
+++ b/skills/debug/references/validate-and-fix.md
@@ -0,0 +1 @@
+../../shared/references/validate-and-fix.md
\ No newline at end of file

From 054d30a66bd3116a29991f526e10bc36f780725b Mon Sep 17 00:00:00 2001
From: Dominik Simonik
Date: Thu, 16 Apr 2026 14:20:53 +0200
Subject: [PATCH 2/4] fix: ensure debug skill defers to companion plugin logging conventions

Adds explicit guidance for companion plugins to override default instrumentation behavior, keeping the skill tech-agnostic while enabling stack-specific debug support (e.g., Flutter plugin).

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 skills/debug/SKILL.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/skills/debug/SKILL.md b/skills/debug/SKILL.md
index c3e7a68..24cca40 100644
--- a/skills/debug/SKILL.md
+++ b/skills/debug/SKILL.md
@@ -79,7 +79,8 @@ Add targeted debug instrumentation to test each hypothesis. Follow these rules:
 - **Label output by hypothesis**: each log statement must identify which hypothesis it tests (e.g., `[H1]`, `[H2]`).
 - **Log values, not just markers**: capture the actual state (variable values, conditions, return values) that confirms or rules out each hypothesis.
 - **Minimize invasiveness**: add logging only — do not change control flow, add dependencies, or modify behavior.
-- **Keep it simple**: use the project's existing logging mechanism or basic print/console output. Do not introduce new dependencies.
+- **Keep it simple**: use the project's existing logging mechanism or standard output. Do not introduce new dependencies.
+- **Prefer companion plugin guidance**: if a companion plugin provides debug or logging conventions for the project's stack, follow those over generic defaults.
 
 If the project uses a build or compilation step, run it to confirm the instrumentation compiles. Fix any build failures before proceeding.
 

From 7282c8793608eb1e46f8c9bdfaad0456b33eec93 Mon Sep 17 00:00:00 2001
From: Dominik Simonik
Date: Thu, 16 Apr 2026 14:41:30 +0200
Subject: [PATCH 3/4] fix: resolve markdown lint errors in debug skill

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 skills/debug/SKILL.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/skills/debug/SKILL.md b/skills/debug/SKILL.md
index 24cca40..baf2909 100644
--- a/skills/debug/SKILL.md
+++ b/skills/debug/SKILL.md
@@ -52,7 +52,7 @@ Based on the code exploration and any user hypotheses, formulate **2-5 concrete
 
 Present the hypotheses to the user:
 
-```
+```text
 ## Hypotheses
 
 - **H1**: [description] — test by checking [what to observe]
@@ -214,6 +214,7 @@ Root cause: <1-2 sentence explanation>
 ## Evaluation queries
 
 ### Should trigger
+
 1. "Debug this crash — the app dies when I tap the submit button."
 2. "What's causing the test failure in the auth module?"
 3. "Help me find the bug — users report stale data after refresh."
@@ -221,6 +222,7 @@ Root cause: <1-2 sentence explanation>
 5. "Why is this broken? It worked yesterday before the deploy."
 
 ### Should NOT trigger
+
 1. "Add a dark mode toggle to the settings screen."
 2. "Review this PR before I merge it."
 3. "Write unit tests for the checkout flow."
@@ -228,6 +230,7 @@ Root cause: <1-2 sentence explanation>
 5. "Create a plan for the new onboarding feature."
 
 ### Edge cases
+
 1. "This test is flaky — passes locally, fails in CI." (debugging-adjacent; should trigger)
 2. "The build is broken." (may be a config issue, not a code bug; should trigger)
 3. "Performance is slow on the dashboard page." (performance investigation, not a bug; should not trigger — suggest `/brainstorm` instead)

From 4989bda3e544644d3c114369a0aab1bfd8209f63 Mon Sep 17 00:00:00 2001
From: Dominik Simonik
Date: Mon, 25 May 2026 12:26:00 +0200
Subject: [PATCH 4/4] fix(debug): align frontmatter and refine cleanup grep

- Split when_to_use out of description to match #187 standards.
- Use /create-branch (skill) instead of @create-branch (agent sigil).
- Switch cleanup search to git grep so it skips .git and respects .gitignore.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 skills/debug/SKILL.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/skills/debug/SKILL.md b/skills/debug/SKILL.md
index baf2909..f73d013 100644
--- a/skills/debug/SKILL.md
+++ b/skills/debug/SKILL.md
@@ -1,7 +1,8 @@
 ---
 name: debug
 user-invocable: true
-description: Finds and fixes bugs through structured hypothesis testing and code instrumentation. Use when user says "debug", "debug this", "find the bug", "what's causing this", "troubleshoot", or "why is this broken".
+description: Finds and fixes bugs through structured hypothesis testing and code instrumentation.
+when_to_use: Use when user says "debug", "debug this", "find the bug", "what's causing this", "troubleshoot", or "why is this broken".
 argument-hint: bug description, error message, or reproduction steps
 effort: high
 compatibility: Designed for Claude Code (or similar products with agent support)
@@ -69,7 +70,7 @@ Use **AskUserQuestion**: "Do these hypotheses look right? I'll instrument the co
 
 Before writing any files, ensure the session is on a working branch:
 
-- Call @create-branch to check and optionally create a working branch or worktree.
+- Call /create-branch to check and optionally create a working branch or worktree.
 
 ## Phase 2 — Instrument
 
@@ -140,7 +141,7 @@ Write the minimal change that addresses the confirmed root cause:
 Remove ALL debug instrumentation added in Phase 2. Search for `WINGSPAN-DEBUG` across the codebase and remove every tagged line or block. Verify none remain:
 
 ```bash
-grep -r "WINGSPAN-DEBUG" .
+git grep -n "WINGSPAN-DEBUG"
 ```
 
 If any remain, remove them. The codebase must contain only the fix — no debug instrumentation.
@@ -174,7 +175,7 @@ Reached when the user stops debugging without a fix.
 Remove ALL debug instrumentation. Search for `WINGSPAN-DEBUG` across the codebase:
 
 ```bash
-grep -r "WINGSPAN-DEBUG" .
+git grep -n "WINGSPAN-DEBUG"
 ```
 
 Remove every match. Confirm to the user that all debug instrumentation has been removed and the codebase is clean.
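
Note on the cleanup search in PATCH 4/4: by default `git grep` looks only at tracked files, so instrumentation placed in a newly created file that has not been added yet could slip past the check. The sketch below is one possible stricter variant, not something this series contains. It assumes the command runs from the repository root, and the function name `check_debug_tags` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of this patch series.
# Assumes the current directory is the root of a git work tree.
check_debug_tags() {
    local status=0
    # --untracked also searches files that are not yet in the index;
    # files matched by .gitignore are still skipped by default.
    git grep -n --untracked -e "WINGSPAN-DEBUG" || status=$?
    case "$status" in
        0) echo "WINGSPAN-DEBUG tags remain" >&2; return 1 ;;
        1) echo "no WINGSPAN-DEBUG tags found"; return 0 ;;
        *) echo "git grep failed with status $status" >&2; return "$status" ;;
    esac
}
```

`git grep` exits with 0 when it finds a match and 1 when it finds none, so the helper inverts that for the "clean" case and passes any other status through as an error instead of reporting a false "clean" result.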