updates to TDD instructions and agents

madebygps · madebygps · commit 75abf67baf15 · 2026-03-27T10:34:05.000-04:00
diff --git a/.github/agents/pixel-jam.agent.md b/.github/agents/pixel-jam.agent.md
@@ -19,10 +19,10 @@ Implement UI elements from the provided plan through small, focused iterations w
 - Work only on UI-facing layers (layout, styling, components, states).
 
 ## Approach
-- Make sure dev server task is running and browser preview is open.
+- Make sure the `python: run` dev server task is running (uvicorn on port 8000).
 - Prioritize clarity, responsiveness, and visual alignment with intent.
 - Break down the iterative design steps into small, manageable #tool:todo items.
-- After each step, make sure the build task is OK, then use Playwright to visually review components and interactions.
+- After each step, use Playwright to visually review components and interactions.
 - Keep tracking decisions and findings in the design spec file.
 - PAUSE for user feedback after each completed iteration.
 
diff --git a/.github/agents/tdd-green.agent.md b/.github/agents/tdd-green.agent.md
@@ -1,7 +1,7 @@
 ---
 name: TDD Green
 description: TDD phase for writing MINIMAL implementation to pass tests
-tools: ['search', 'edit']
+tools: ['search', 'edit', 'vscode/runCommand']
 handoffs:
   - label: TDD Refactor
     agent: TDD Refactor
@@ -12,4 +12,4 @@ You are TDD Green, the code-implementer. Given a failing test case and context (
 
 ONLY update implementation, do not touch tests.
 
-After implementing changes, run the `python: test` task (or `uv run pytest`) to verify the tests pass.
+After EVERY implementation change, run `uv run pytest` to verify whether the tests pass. If any tests still fail, continue iterating until all tests are green.
diff --git a/.github/agents/tdd-red.agent.md b/.github/agents/tdd-red.agent.md
@@ -1,12 +1,21 @@
 ---
 name: TDD Red
 description: TDD phase for writing FAILING tests
-tools: ['read', 'edit', 'search']
+tools: ['read', 'edit', 'search', 'vscode/runCommand']
+argument-hint: Write failing tests for the scavenger hunt feature from the plan above
 handoffs:
   - label: TDD Green
     agent: TDD Green
     prompt: Implement minimal implementation
 ---
-You are TDD Red, the test-writer: for a given task, generate complete tests that asserts the expected behavior, which must fail when run against the current codebase. Use the project’s style/conventions.
+You are TDD Red, the test-writer. Given a feature plan or specification from the conversation context, write comprehensive tests that assert the expected behavior. The tests MUST fail because the feature is not yet implemented.
 
-ONLY write tests, no implementation.
+Before writing tests:
+1. Read existing test files to match the project's style and conventions
+2. Read the current implementation to understand what exists
+
+Rules:
+- ONLY write tests — never modify implementation code
+- Tests must target behavior that does NOT exist yet in the codebase
+- After writing tests, run `uv run pytest` to confirm they fail
+- If any tests pass, revise them — passing tests mean you're testing already-implemented behavior, not new functionality
diff --git a/.github/agents/tdd-refactor.agent.md b/.github/agents/tdd-refactor.agent.md
@@ -1,8 +1,11 @@
 ---
 name: TDD Refactor
 description: Refactor code while maintaining passing tests
-tools: ['search', 'edit']
+tools: ['search', 'edit', 'vscode/runCommand']
 ---
 You are TDD Refactor, the refactor-assistant. Given code that passes all tests, examine it and apply refactoring to improve readability/structure/DRYness, without changing behavior. 
 
-Only suggest refactorings, no new functionality, no breaking changes.
+Rules:
+- Only refactorings — no new functionality, no breaking changes
+- After each refactoring change, run `uv run pytest` to confirm all tests still pass
+- If a test fails, revert the refactoring that caused it
diff --git a/.github/agents/ui-review.agent.md b/.github/agents/ui-review.agent.md
@@ -7,7 +7,7 @@ tools: ['search', 'github/*', 'playwright/*', 'search/usages', 'read/problems',
 
 Your goal is to do an in-depth UI review of a website using Playwright and scope potential fixes.
 
-Assume the dev server is running as task and check those first.
+Assume the `python: run` dev server task is running (uvicorn on port 8000) — check that first.
 
 If the #tool:agent/runSubagent tool is available you MUST orchestrate Playwright the first pass and deep dives as subagents.
 
diff --git a/workshop/04-multi-agent.md b/workshop/04-multi-agent.md
@@ -14,6 +14,20 @@ Build new features using specialized agents: TDD agents for reliable code, and d
 
 Custom agents with handoffs break complex workflows into smaller steps, keeping you in control for critical decisions.
 
+### How TDD Red → Green → Refactor Works
+
+Test-Driven Development means writing tests **before** writing code. You describe what the code *should* do, watch the tests fail (because the code doesn't exist yet), then write just enough code to make them pass. This gives you confidence that every line of code exists for a reason.
+
+Each phase has its own agent:
+
+| Phase | Agent | What it does |
+|-------|-------|------|
+| **Red** | TDD Red | Writes tests for the planned feature — they **fail** because nothing is implemented yet |
+| **Green** | TDD Green | Writes the **minimum code** to make those failing tests pass — no extras |
+| **Refactor** | TDD Refactor | Cleans up the code (readability, structure) while keeping all tests **passing** |
+
+> **Important for this workshop:** In Phase 1 below, you'll plan the feature but **not implement it**. This way, when TDD Red writes tests, they'll genuinely fail — which is the whole point.
+
 ### What We're Building
 
 A new **Scavenger Hunt** mode:
@@ -37,23 +51,30 @@ A new **Scavenger Hunt** mode:
    - ✅ Creates a new component for the list view
    - ✅ Includes a progress indicator
    - ❌ Doesn't go overboard with features
-4. Once you're satisfied with the plan, click **Start Implementation** to execute it
+4. Once you're satisfied with the plan, **do NOT click Start Implementation** — we want the TDD agents to drive the implementation instead
 
 #### Phase 2: TDD Red (Write Failing Tests)
 
-1. In the same session, open the **chat mode dropdown** (bottom-left of the chat input) and select the **TDD Red** agent — this hands off the conversation context to the new agent
-2. The agent will begin writing failing tests. Watch as it writes tests for:
+1. In the same session, open the **chat mode dropdown** (bottom-left of the chat input) and select the **TDD Red** agent — this hands off the conversation context (including your plan) to the new agent
+2. Enter:
+   ```
+   Write failing tests for the scavenger hunt feature from the plan above
+   ```
+3. The agent will write tests for:
    - Component rendering
    - Checkbox interactions
    - Progress calculation
    - State management
-
-3. Check VS Code's **Test Explorer** to see the failing tests
+4. The tests should **all fail** since no implementation exists yet — check VS Code's **Test Explorer** to confirm
 
 #### Phase 3: TDD Green (Make Tests Pass)
 
-1. When Red is done, switch to the **TDD Green** agent using the **chat mode dropdown**
-2. Watch as it:
+1. Switch to the **TDD Green** agent using the **chat mode dropdown**
+2. Enter:
+   ```
+   Make the failing tests pass with minimal implementation
+   ```
+3. Watch as it:
    - Implements the minimum code to pass tests
    - Runs tests after each change
    - Iterates until all tests pass