diff --git a/README.md b/README.md index 045c016..7a06ece 100644 --- a/README.md +++ b/README.md @@ -42,7 +42,7 @@ See the [full setup guide](docs/setup.md) for authentication, environment variab ## Architecture -A CEO agent orchestrates eight specialists — Researcher, Strategist, Builder, Reviewer, Evaluator, Archivist, Distiller, and Failure Analyst — each running as an independent [Claude Code](https://docs.anthropic.com/en/docs/claude-code) subprocess. +A CEO agent orchestrates nine specialists — Researcher, Strategist, Builder, Reviewer, Evaluator, Archivist, Distiller, Failure Analyst, and Scrum Master — each running as an independent [Claude Code](https://docs.anthropic.com/en/docs/claude-code) subprocess. ![Architecture](docs/diagrams/architecture.svg) @@ -373,12 +373,13 @@ factory dashboard # Live web dashboard on :8420 factory detect # Print project state factory discover # Introspect + generate eval profile factory export # Full project snapshot as JSON -factory checkpoint # Save CEO state for crash recovery -factory resume # Resume from checkpoint +factory log [--data JSON] # Record milestone to event log ``` See `factory --help` for the complete list. +**Crash-resilient resume:** The factory supports automatic recovery from interrupted runs. At the start of each cycle, the Scrum Master agent runs a standup: it reads the event log and `.factory/` state to reconstruct context, so the CEO can resume where the previous run left off. + --- ## Plugin Agents @@ -395,7 +396,7 @@ claude --agent factory-researcher "study the auth system" claude --agent factory-builder "add dark mode support" ``` -Available agents: `factory-researcher`, `factory-strategist`, `factory-builder`, `factory-reviewer`, `factory-evaluator`, `factory-archivist`, `factory-distiller`, `factory-ceo`, `factory-failure_analyst`. 
+Available agents: `factory-researcher`, `factory-strategist`, `factory-builder`, `factory-reviewer`, `factory-evaluator`, `factory-archivist`, `factory-distiller`, `factory-ceo`, `factory-failure_analyst`, `factory-scrum_master`. Agent metadata (model, tools, descriptions) is defined in `factory/agents/agents.yml`. Source prompts live in `factory/agents/prompts/`. A CI workflow auto-generates a `plugin` branch with ready-to-use agent files on every push to main. diff --git a/docs/ace.md b/docs/ace.md index 2938f36..c78cef3 100644 --- a/docs/ace.md +++ b/docs/ace.md @@ -17,7 +17,7 @@ across all projects Generate Merge & Auto-append Analyzes experiment outcomes across all factory-managed projects (discovered via the global registry at `~/.factory/registry.json`, with directory scanning as fallback): - Loads data from performance reports (`.factory/performance_report.json`) with TSV fallback - Computes category success rates (which types of changes get kept vs reverted) -- Generates candidate playbook bullets for all 7 agent roles from experiment outcomes, CEO verdict patterns, and observation coverage +- Generates candidate playbook bullets for all agent roles from experiment outcomes, CEO verdict patterns, and observation coverage - Each bullet is a behavioral rule: DO (reinforced pattern) or DON'T (anti-pattern) ### 2. Curate (`factory/ace/curator.py`) @@ -69,7 +69,7 @@ factory ace ~/my-project factory ceo ~/my-project --mode meta ``` -Meta mode runs the full improvement loop, then reflects on the outcomes to evolve all 7 agent playbooks. See [Self-Improvement Loop](self-improvement.md) for the full picture — including cross-project learning, CEO self-evaluation, and how the pieces fit together. +Meta mode runs the full improvement loop, then reflects on the outcomes to evolve all agent playbooks. See [Self-Improvement Loop](self-improvement.md) for the full picture — including cross-project learning, CEO self-evaluation, and how the pieces fit together. 
## When to Run @@ -81,7 +81,7 @@ ACE produces meaningful playbook updates only when there is enough experiment da ## What Gets Evolved -All 7 agent roles have playbooks: +The following agent roles have playbooks: | Role | What ACE learns | |------|----------------| @@ -92,6 +92,7 @@ All 7 agent roles have playbooks: | Reviewer | What to focus on in code review, false positive patterns | | Evaluator | Score interpretation, when to flag anomalies | | Archivist | What to record, archive organization patterns | +| Scrum Master | Standup patterns, sprint resume effectiveness | ## Design Principles diff --git a/docs/architecture.md b/docs/architecture.md index ab229e1..172689e 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -18,13 +18,13 @@ A dedicated Claude Code agent that owns the full workflow. Spawned via `factory - Spawns specialist agents as subprocesses - Makes keep/revert decisions based on eval scores - Ensures mandatory archival after every cycle -- Maintains a checkpoint for crash-resilient resume +- Runs a Scrum Master standup at the start of each cycle for crash-resilient resume from the event log Prompt: `factory/agents/prompts/ceo.md` ### Layer 3: Specialist Agents -Eight specialist Claude Code subprocesses, each with a narrow responsibility: +Nine specialist Claude Code subprocesses, each with a narrow responsibility: | Agent | Role | Invoked via | |-------|------|------------| @@ -36,6 +36,7 @@ Eight specialist Claude Code subprocesses, each with a narrow responsibility: | **Archivist** | Write learnings to `.factory/archive/`, update performance reports | `factory agent archivist --task "..."` | | **Distiller** | Synthesize research + raw idea into a buildable project spec | `factory agent distiller --task "..."` | | **Failure Analyst** | Classify run failures by root cause (research mode only) | `factory agent failure_analyst --task "..."` | +| **Scrum Master** | Standup: read the event log + project state, produce a sprint status report for the CEO | `factory agent scrum_master --task "..."` 
| Agent prompts are resolved via two-tier lookup in `factory/agents/runner.py`: 1. Project-specific override: `/.factory/agents/.md` @@ -182,7 +183,7 @@ Stuck detection activates after 3+ consecutive same-category reverts, forcing ca | `factory/strategy.py` | FEEC priority heuristic | | `factory/study.py` | Interaction log analysis | | `factory/insights.py` | Cross-project pattern analysis | -| `factory/checkpoint.py` | CEO checkpoint save/load | +| `factory/checkpoint_hook.py` | Sprint standup state reconstruction | | `factory/analysis.py` | Experiment comparison (diff, explain) | | `factory/registry.py` | Global project registry (`~/.factory/registry.json`) | | `factory/report.py` | Performance report generation and loading | diff --git a/docs/contributing.md b/docs/contributing.md index e64a1b3..ab195a5 100644 --- a/docs/contributing.md +++ b/docs/contributing.md @@ -77,7 +77,7 @@ factory ceo ~/remote-factory --focus "shell completions for the factory CLI" | **Notifications (Telegram, Slack, etc.)** | Real-time push notifications on keep/revert decisions, cycle completions, and score regressions. 
A basic `TelegramNotifier` skeleton exists in `factory/notify/telegram.py` but isn't wired into the CEO loop — needs proper integration and multi-provider support | | **Parallel experiments** | Run multiple hypotheses concurrently on separate branches, evaluate in parallel | | **GitHub Actions integration** | Run the factory as a GitHub Action on push/PR events | -| **Custom agent roles** | Allow users to define new specialist agents beyond the 7 built-in roles | +| **Custom agent roles** | Allow users to define new specialist agents beyond the built-in roles | | **Dashboard auth** | Add basic authentication to the live dashboard for shared deployments | ### Hard / Research @@ -104,11 +104,11 @@ factory/ ├── strategy.py # FEEC priority heuristic ├── study.py # Code analysis + observations ├── insights.py # Cross-project patterns -├── checkpoint.py # CEO state save/load +├── checkpoint_hook.py # Sprint standup state reconstruction ├── analysis.py # Experiment comparison ├── agents/ │ ├── runner.py # Agent subprocess spawner -│ ├── prompts/ # Agent role prompts (7 roles) +│ ├── prompts/ # Agent role prompts (10 roles) │ └── playbooks/ # ACE-evolved playbooks ├── registry.py # Global project registry ├── report.py # Performance report generation diff --git a/docs/diagrams/architecture.d2 b/docs/diagrams/architecture.d2 index 0487fdf..14313ec 100644 --- a/docs/diagrams/architecture.d2 +++ b/docs/diagrams/architecture.d2 @@ -130,6 +130,10 @@ agents: { label: "Failure Analyst\nClassify" style: { fill: "#80CBC4"; stroke: "#00695C"; font-color: "#0D1B2A"; font-size: 13; border-radius: 6 } } + scrummaster: { + label: "Scrum Master\nStandup" + style: { fill: "#A5D6A7"; stroke: "#2E7D32"; font-color: "#0D1B2A"; font-size: 13; border-radius: 6 } + } } stores: { diff --git a/docs/index.md b/docs/index.md index e55a1a2..da82137 100644 --- a/docs/index.md +++ b/docs/index.md @@ -46,7 +46,7 @@ graph LR style G fill:#e53935,color:#fff,stroke:#c62828 ``` -A CEO agent orchestrates 
eight specialists — Researcher, Strategist, Builder, Reviewer, Evaluator, Archivist, Distiller, and Failure Analyst — each running as an independent [Claude Code](https://docs.anthropic.com/en/docs/claude-code) subprocess. The Researcher searches the web and reads prior knowledge from the archive. The Strategist generates ranked hypotheses. The Builder implements one on an experiment branch. The Evaluator scores before and after. The CEO decides keep or revert. The Archivist records everything to `.factory/archive/` and regenerates performance reports for cross-project learning. In interactive mode, the Distiller synthesizes research into a buildable spec through user feedback. In research mode, the Failure Analyst classifies run failures to guide targeted hypothesis generation. +A CEO agent orchestrates nine specialists — Researcher, Strategist, Builder, Reviewer, Evaluator, Archivist, Distiller, Failure Analyst, and Scrum Master — each running as an independent [Claude Code](https://docs.anthropic.com/en/docs/claude-code) subprocess. The Researcher searches the web and reads prior knowledge from the archive. The Strategist generates ranked hypotheses. The Builder implements one on an experiment branch. The Evaluator scores before and after. The CEO decides keep or revert. The Archivist records everything to `.factory/archive/` and regenerates performance reports for cross-project learning. At the start of each cycle, the Scrum Master reads the event log and project state and reports sprint status to the CEO, so an interrupted run can resume where it left off. In interactive mode, the Distiller synthesizes research into a buildable spec through user feedback. In research mode, the Failure Analyst classifies run failures to guide targeted hypothesis generation. 
## Workflows diff --git a/docs/self-improvement.md b/docs/self-improvement.md index 6f988e1..38ee103 100644 --- a/docs/self-improvement.md +++ b/docs/self-improvement.md @@ -121,7 +121,7 @@ ACE analyzes this across **all factory-managed projects** (discovered via the gl ### What ACE Produces -For each of the 7 agent roles, ACE generates DO and DON'T rules backed by empirical evidence: +For each agent role, ACE generates DO and DON'T rules backed by empirical evidence: ```markdown ### DO @@ -266,7 +266,7 @@ This is the factory eating its own dogfood — the same process it uses on targe After the improve cycle, the CEO runs ACE across all managed projects: 1. **Update counters**: Load all experiment records, update `helpful`/`harmful` counters on existing playbook bullets -2. **Reflect**: Analyze cross-project experiment data, generate candidate bullets for all 7 roles +2. **Reflect**: Analyze cross-project experiment data, generate candidate bullets for all agent roles 3. **Curate**: Merge candidates with existing playbooks, deduplicate (75% similarity threshold), prune net-negative rules, cap at 15 items per role ```bash