9 changes: 5 additions & 4 deletions README.md
@@ -42,7 +42,7 @@ See the [full setup guide](docs/setup.md) for authentication, environment variab

## Architecture

A CEO agent orchestrates eight specialists — Researcher, Strategist, Builder, Reviewer, Evaluator, Archivist, Distiller, and Failure Analyst — each running as an independent [Claude Code](https://docs.anthropic.com/en/docs/claude-code) subprocess.
A CEO agent orchestrates nine specialists — Researcher, Strategist, Builder, Reviewer, Evaluator, Archivist, Distiller, Failure Analyst, and Scrum Master — each running as an independent [Claude Code](https://docs.anthropic.com/en/docs/claude-code) subprocess.

![Architecture](docs/diagrams/architecture.svg)

@@ -373,12 +373,13 @@ factory dashboard # Live web dashboard on :8420
factory detect <path> # Print project state
factory discover <path> # Introspect + generate eval profile
factory export <path> # Full project snapshot as JSON
factory checkpoint <path> # Save CEO state for crash recovery
factory resume <path> # Resume from checkpoint
factory log <path> <event> [--data JSON] # Record milestone to event log
```

See `factory --help` for the complete list.

**Crash-resilient resume:** The factory supports automatic recovery from interrupted runs. The Scrum Master agent performs a standup at the start of each cycle, reading the event log and `.factory/` state to reconstruct context and resume from where the previous run left off.
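
As a rough illustration of the standup step described above, the sketch below shows one way an event log could be condensed into "where did the last run stop". It is not the factory's implementation: the JSON-lines record shape (`{"event": ..., "data": ...}`) and the function name `summarize_event_log` are assumptions made for this example only.

```python
import json
from collections import Counter


def summarize_event_log(lines):
    """Return (last_event, counts) for an iterable of JSON-lines records.

    Assumed (hypothetical) record shape: {"event": "<name>", "data": {...}}.
    Blank or malformed lines are skipped so that a line left half-written
    by a crash cannot block the resume.
    """
    counts = Counter()
    last_event = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written final line
        if not isinstance(record, dict) or "event" not in record:
            continue
        counts[record["event"]] += 1
        last_event = record
    return last_event, dict(counts)
```

The last well-formed record gives the point to resume from, and the per-event counts give a quick picture of how far the interrupted cycle got.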

---

## Plugin Agents
@@ -395,7 +396,7 @@ claude --agent factory-researcher "study the auth system"
claude --agent factory-builder "add dark mode support"
```

Available agents: `factory-researcher`, `factory-strategist`, `factory-builder`, `factory-reviewer`, `factory-evaluator`, `factory-archivist`, `factory-distiller`, `factory-ceo`, `factory-failure_analyst`.
Available agents: `factory-researcher`, `factory-strategist`, `factory-builder`, `factory-reviewer`, `factory-evaluator`, `factory-archivist`, `factory-distiller`, `factory-ceo`, `factory-failure_analyst`, `factory-scrummaster`.

Agent metadata (model, tools, descriptions) is defined in `factory/agents/agents.yml`. Source prompts live in `factory/agents/prompts/`. A CI workflow auto-generates a `plugin` branch with ready-to-use agent files on every push to main.

7 changes: 4 additions & 3 deletions docs/ace.md
@@ -17,7 +17,7 @@ across all projects Generate Merge & Auto-append
Analyzes experiment outcomes across all factory-managed projects (discovered via the global registry at `~/.factory/registry.json`, with directory scanning as fallback):
- Loads data from performance reports (`.factory/performance_report.json`) with TSV fallback
- Computes category success rates (which types of changes get kept vs reverted)
- Generates candidate playbook bullets for all 7 agent roles from experiment outcomes, CEO verdict patterns, and observation coverage
- Generates candidate playbook bullets for all agent roles from experiment outcomes, CEO verdict patterns, and observation coverage
- Each bullet is a behavioral rule: DO (reinforced pattern) or DON'T (anti-pattern)
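
The category success rate mentioned in the list above can be sketched as follows. The record fields `category` and `verdict`, and the verdict values `"keep"` and `"revert"`, are assumptions for illustration; the real report schema may differ.

```python
from collections import defaultdict


def category_success_rates(experiments):
    """Map each change category to the fraction of its experiments that were kept.

    Each experiment is assumed (hypothetically) to be a dict with a
    "category" string and a "verdict" of either "keep" or "revert".
    """
    totals = defaultdict(int)
    kept = defaultdict(int)
    for exp in experiments:
        totals[exp["category"]] += 1
        if exp["verdict"] == "keep":
            kept[exp["category"]] += 1
    # Every category in totals has at least one experiment, so no division by zero.
    return {category: kept[category] / totals[category] for category in totals}
```

A category with a consistently low rate is a natural source of DON'T bullets, and a high rate of DO bullets.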

### 2. Curate (`factory/ace/curator.py`)
@@ -69,7 +69,7 @@ factory ace ~/my-project
factory ceo ~/my-project --mode meta
```

Meta mode runs the full improvement loop, then reflects on the outcomes to evolve all 7 agent playbooks. See [Self-Improvement Loop](self-improvement.md) for the full picture — including cross-project learning, CEO self-evaluation, and how the pieces fit together.
Meta mode runs the full improvement loop, then reflects on the outcomes to evolve all agent playbooks. See [Self-Improvement Loop](self-improvement.md) for the full picture — including cross-project learning, CEO self-evaluation, and how the pieces fit together.

## When to Run

@@ -81,7 +81,7 @@ ACE produces meaningful playbook updates only when there is enough experiment da

## What Gets Evolved

All 7 agent roles have playbooks:
Agent roles with playbooks:

| Role | What ACE learns |
|------|----------------|
@@ -92,6 +92,7 @@ All 7 agent roles have playbooks:
| Reviewer | What to focus on in code review, false positive patterns |
| Evaluator | Score interpretation, when to flag anomalies |
| Archivist | What to record, archive organization patterns |
| Scrum Master | Standup patterns, sprint resume effectiveness |

## Design Principles

7 changes: 4 additions & 3 deletions docs/architecture.md
@@ -18,13 +18,13 @@ A dedicated Claude Code agent that owns the full workflow. Spawned via `factory
- Spawns specialist agents as subprocesses
- Makes keep/revert decisions based on eval scores
- Ensures mandatory archival after every cycle
- Maintains a checkpoint for crash-resilient resume
- Runs Scrum Master standup for crash-resilient resume via event log

Prompt: `factory/agents/prompts/ceo.md`

### Layer 3: Specialist Agents

Eight specialist Claude Code subprocesses, each with a narrow responsibility:
Nine specialist Claude Code subprocesses, each with a narrow responsibility:

| Agent | Role | Invoked via |
|-------|------|------------|
@@ -36,6 +36,7 @@ Eight specialist Claude Code subprocesses, each with a narrow responsibility:
| **Archivist** | Write learnings to `.factory/archive/`, update performance reports | `factory agent archivist --task "..."` |
| **Distiller** | Synthesize research + raw idea into a buildable project spec | `factory agent distiller --task "..."` |
| **Failure Analyst** | Classify run failures by root cause (research mode only) | `factory agent failure_analyst --task "..."` |
| **Scrum Master** | Standup: read event log + project state, produce sprint status report for CEO | `factory agent scrum_master --task "..."` |

Agent prompts are resolved via two-tier lookup in `factory/agents/runner.py`:
1. Project-specific override: `<project>/.factory/agents/<role>.md`
@@ -182,7 +183,7 @@ Stuck detection activates after 3+ consecutive same-category reverts, forcing ca
| `factory/strategy.py` | FEEC priority heuristic |
| `factory/study.py` | Interaction log analysis |
| `factory/insights.py` | Cross-project pattern analysis |
| `factory/checkpoint.py` | CEO checkpoint save/load |
| `factory/checkpoint_hook.py` | Sprint standup state reconstruction |
| `factory/analysis.py` | Experiment comparison (diff, explain) |
| `factory/registry.py` | Global project registry (`~/.factory/registry.json`) |
| `factory/report.py` | Performance report generation and loading |
6 changes: 3 additions & 3 deletions docs/contributing.md
@@ -77,7 +77,7 @@ factory ceo ~/remote-factory --focus "shell completions for the factory CLI"
| **Notifications (Telegram, Slack, etc.)** | Real-time push notifications on keep/revert decisions, cycle completions, and score regressions. A basic `TelegramNotifier` skeleton exists in `factory/notify/telegram.py` but isn't wired into the CEO loop — needs proper integration and multi-provider support |
| **Parallel experiments** | Run multiple hypotheses concurrently on separate branches, evaluate in parallel |
| **GitHub Actions integration** | Run the factory as a GitHub Action on push/PR events |
| **Custom agent roles** | Allow users to define new specialist agents beyond the 7 built-in roles |
| **Custom agent roles** | Allow users to define new specialist agents beyond the built-in roles |
| **Dashboard auth** | Add basic authentication to the live dashboard for shared deployments |

### Hard / Research
@@ -104,11 +104,11 @@ factory/
├── strategy.py # FEEC priority heuristic
├── study.py # Code analysis + observations
├── insights.py # Cross-project patterns
├── checkpoint.py # CEO state save/load
├── checkpoint_hook.py # Sprint standup state reconstruction
├── analysis.py # Experiment comparison
├── agents/
│ ├── runner.py # Agent subprocess spawner
│ ├── prompts/ # Agent role prompts (7 roles)
│ ├── prompts/ # Agent role prompts (10 roles)
│ └── playbooks/ # ACE-evolved playbooks
├── registry.py # Global project registry
├── report.py # Performance report generation
4 changes: 4 additions & 0 deletions docs/diagrams/architecture.d2
@@ -130,6 +130,10 @@ agents: {
label: "Failure Analyst\nClassify"
style: { fill: "#80CBC4"; stroke: "#00695C"; font-color: "#0D1B2A"; font-size: 13; border-radius: 6 }
}
scrummaster: {
label: "Scrum Master\nStandup"
style: { fill: "#A5D6A7"; stroke: "#2E7D32"; font-color: "#0D1B2A"; font-size: 13; border-radius: 6 }
}
}

stores: {
2 changes: 1 addition & 1 deletion docs/index.md
@@ -46,7 +46,7 @@ graph LR
style G fill:#e53935,color:#fff,stroke:#c62828
```

A CEO agent orchestrates eight specialists — Researcher, Strategist, Builder, Reviewer, Evaluator, Archivist, Distiller, and Failure Analyst — each running as an independent [Claude Code](https://docs.anthropic.com/en/docs/claude-code) subprocess. The Researcher searches the web and reads prior knowledge from the archive. The Strategist generates ranked hypotheses. The Builder implements one on an experiment branch. The Evaluator scores before and after. The CEO decides keep or revert. The Archivist records everything to `.factory/archive/` and regenerates performance reports for cross-project learning. In interactive mode, the Distiller synthesizes research into a buildable spec through user feedback. In research mode, the Failure Analyst classifies run failures to guide targeted hypothesis generation.
A CEO agent orchestrates nine specialists — Researcher, Strategist, Builder, Reviewer, Evaluator, Archivist, Distiller, Failure Analyst, and Scrum Master — each running as an independent [Claude Code](https://docs.anthropic.com/en/docs/claude-code) subprocess. The Researcher searches the web and reads prior knowledge from the archive. The Strategist generates ranked hypotheses. The Builder implements one on an experiment branch. The Evaluator scores before and after. The CEO decides keep or revert. The Archivist records everything to `.factory/archive/` and regenerates performance reports for cross-project learning. In interactive mode, the Distiller synthesizes research into a buildable spec through user feedback. In research mode, the Failure Analyst classifies run failures to guide targeted hypothesis generation.

## Workflows

4 changes: 2 additions & 2 deletions docs/self-improvement.md
@@ -121,7 +121,7 @@ ACE analyzes this across **all factory-managed projects** (discovered via the gl

### What ACE Produces

For each of the 7 agent roles, ACE generates DO and DON'T rules backed by empirical evidence:
For each of the agent roles, ACE generates DO and DON'T rules backed by empirical evidence:

```markdown
### DO
@@ -266,7 +266,7 @@ This is the factory eating its own dogfood — the same process it uses on targe
After the improve cycle, the CEO runs ACE across all managed projects:

1. **Update counters**: Load all experiment records, update `helpful`/`harmful` counters on existing playbook bullets
2. **Reflect**: Analyze cross-project experiment data, generate candidate bullets for all 7 roles
2. **Reflect**: Analyze cross-project experiment data, generate candidate bullets for all agent roles
3. **Curate**: Merge candidates with existing playbooks, deduplicate (75% similarity threshold), prune net-negative rules, cap at 15 items per role
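
The Curate step above can be sketched roughly as below. The 75% threshold and the cap of 15 come from the text; everything else is an assumption for illustration, including the bullet shape (`text`, `helpful`, `harmful`), the use of `difflib.SequenceMatcher` as the similarity measure, and ranking by net score before applying the cap.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.75  # from the text: 75% similarity threshold
MAX_ITEMS_PER_ROLE = 15      # from the text: cap at 15 items per role


def curate(existing, candidates):
    """Merge candidate bullets into an existing playbook for one role.

    Bullets are assumed (hypothetically) to be dicts with "text",
    "helpful" and "harmful" keys. Existing bullets are visited first,
    so they win over near-duplicate candidates.
    """
    merged = []
    for bullet in list(existing) + list(candidates):
        if bullet["harmful"] > bullet["helpful"]:
            continue  # prune net-negative rules
        is_duplicate = any(
            SequenceMatcher(
                None, bullet["text"].lower(), kept["text"].lower()
            ).ratio() >= SIMILARITY_THRESHOLD
            for kept in merged
        )
        if not is_duplicate:
            merged.append(bullet)
    # Assumption: keep the highest net-score bullets when over the cap.
    merged.sort(key=lambda b: b["helpful"] - b["harmful"], reverse=True)
    return merged[:MAX_ITEMS_PER_ROLE]
```

Checking duplicates against the already-merged list keeps the pass simple and lets counters accumulated on existing bullets survive the merge.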

```bash