
Commit 4fe7212

fix(flywheel): phase-1 repairs for broken flywheel loop
* feat(pipeline): autonomous pipeline v0.1.1-v0.1.3

v0.1.1: Production-quality stage prompts
- Add getRepoContext() helper (file tree, language, deps detection)
- ResearchPlan: architect-grade prompt with repo awareness
- Decompose: wave ordering by import deps, self-contained tasks
- Verify: actionable errors with file paths and line numbers
- Verify→Execute feedback: generate fix tasks from errors

v0.1.2: Inter-task coherence + observability
- allowLongPrompt option in scheduler.submit()
- Wave tasks get plan context ("read .cc-pipeline/plan.md")
- verifyResults field on PipelineRun for dashboard
- markStaleRunsFailed() crash recovery on startup

v0.1.3: API polish + test coverage
- Pipeline config overrides from POST /api/pipeline
- Pipeline endpoints added to /api/docs
- 3 new tests: verify-fix grouping, crash recovery, plan context
- Per-run PipelineConfig (stage methods use cfg param)

329/329 tests pass, TSC clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(pipeline): memory leak, stale ctx, redundant map lookups

- Add _runConfigs.delete() to drive().catch() error handler
- Refresh RepoContext before doVerify() (execute stage modifies repo)
- Cache cfg() locally in doResearchPlan and doExecute inner loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(pipeline): error context retry, wave validation, budget-based loop (v0.1.4)

P0-a: Retry with error context — inject previous error into prompt on retry instead of clearing it. Both auto-retry (executeAndRelease) and manual requeue() now append error context (capped at 500 chars).

P0-b: Wave file conflict validation — extractFilePaths() extracts file paths from task prompts, validateWaves() detects intra-wave file conflicts and moves conflicting tasks to subsequent waves. Prevents parallel agents from editing the same file.

P1-a: Budget-based retry loop with dead-loop detection — verify stage now considers three stop conditions: budget exhausted, same errors repeated (dead loop), or max iterations reached. Added totalBudget field to PipelineConfig (default $50).

Tests: 341 pass (was 329), TSC clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* task(f816b860e3c946ac): feat(classifier): add classifyTask pure function

* task(b24b8f37e8d64ede): feat(types): add optional model field to Task

* task(654d29040498495e): feat(agent-runner): pickFallbackAgent + per-task model override

* feat(scheduler): task classifier + model fallback on retry (v0.1.5)

Task Classifier: classifyTask() pure function auto-assigns model/timeout/budget based on prompt analysis:
- quick (<200 chars, ≤1 file) → haiku/120s/$1
- deep (refactor/redesign/architect, 3+ files) → opus/600s/$10
- standard → sonnet/300s/$5
Integrated into scheduler.submit() with caller-override priority.

Model Fallback: on retry, swap agent (claude↔codex) via AgentRunner.pickFallbackAgent() for better chance of success.

Pipeline note: task-classifier.ts, types.ts model field, and agent-runner.ts pickFallbackAgent were created by cc-manager's own pipeline (first successful self-hosted run). Scheduler integration, tests, and classifier bug fixes done manually.

Tests: 357 pass (was 341), TSC clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* task(6f72353ece8b4c67): feat(worktree-pool): add getActiveWorkers() and rebaseOnMain()

* task(6986f62d76ef4228): feat(store): persist _originalPrompt; serialize dependsOn array as JSON

* task(71e20b9544914d85): test(scheduler): prompt accumulation fix, model escalation, array

* feat(scheduler): prompt accumulation fix, model escalation, staged rebase, multi-dep DAG (v0.1.6)

Phase 1 Critical Path:
- Fix prompt accumulation: use _originalPrompt to rebuild from original + latest error
- Model escalation: retryCount >= 2 upgrades to opus via modelOverride
- Staged rebase: after merge, rebase all active worktrees onto new main
- Dependency DAG: check ALL deps in array, not just first element
- agent-runner respects task.modelOverride in both CLI and SDK paths

Pipeline generated: types, store, worktree-pool, tests
Manual fixes: scheduler integration, multi-dep check, prompt accumulation, rebase wiring

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(v0.1.7): empty commit detection, GPT-5.4 routing, pricing table, session resume, codex config

F1: Detect empty commits after agent exits — fail task instead of silent success
F2: CRITICAL commit enforcement in task prompt
F3: Post-merge working directory sync (syncMainWorktree)
F4: Complete pricing table with all 6 models (haiku, sonnet, opus, gpt-5.4, gpt-5.4-wide, o4-mini)
F5: Capture sessionId from Claude stream-json, --resume on retry
F6: --json-schema structured output for review agent
F7: Codex GPT-5.4 routing for deep/integration tasks, classifier outputs agent+contextProfile
F9: Codex config.toml profile management (default + wide 1M context)

372 tests pass, TSC clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add strategy, research, plans, and pipeline artifacts

- docs/STRATEGY.md: competitive landscape, four pillars, borrowed patterns
- docs/3-agents-reference.md: Claude CLI, SDK, Codex features + gaps
- docs/research/: agent landscape, model pricing, NeurIPS findings
- docs/plans/: v0.1.6, v0.1.7, v0.2 implementation plans
- docs/ROADMAP.md, GAP-ANALYSIS.md, COMPETITIVE-ANALYSIS.md, etc.
- .cc-pipeline/: pipeline artifacts from self-hosting runs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: fix stale versions, model names, honest flywheel status

- CLAUDE.md: v0.1.0→v0.1.7, 282→372 tests, add pipeline modules, honest flywheel status (NOT WORKING)
- CONFIGURATION.md: claude-opus-4-5→claude-opus-4-6
- ROADMAP.md: update current state to v0.1.7, mark completed features, honest assessment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(flywheel): phase-1 repairs for broken flywheel loop

8 bug fixes from the full flywheel audit (B1-B8):
- fix(worktree): remove hardcoded v1/ from node_modules symlink (P0-W1)
- fix(worktree): release() defaults to merged:false (P0-W2)
- fix(agent-runner): meta tasks skip commit instructions and build verify (P0-A1)
- fix(pipeline): ensureMetaTaskSucceeded / runMetaTask helper (P0-A2)
- fix(scheduler): review rejection and merge conflict set status="failed" (P0-S1)
- fix(scheduler): populate mergeGate on all review/merge paths
- fix(pipeline): cancel aborts running tasks, cleans all Maps (P0-S2)
- fix(pipeline): per-run state isolation with Maps + scoped dirs (P0-S3)

Additional quality fixes from simplify review:
- cancelledTasks guard: don't overwrite "failed" with "cancelled"
- N→1 DB writes: pipelineStore.save per-wave not per-task
- Extract runMetaTask() to deduplicate 3x meta-task pattern
- MergeGateState type + persistence in store

384 tests pass, 0 fail, TSC clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): add git config for CI in pipeline full-flow test

GitHub Actions runners lack git user.name/email config, causing `git commit --allow-empty` to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
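The retry-prompt behaviour described in v0.1.4 (P0-a) and fixed in v0.1.6 (prompt accumulation, model escalation) can be sketched as a single retry step. This is a minimal illustration, not the scheduler's actual code: `RetryableTask`, `prepareRetry`, and `MAX_ERROR_CHARS` are hypothetical names. The fields `_originalPrompt`, `retryCount`, and `modelOverride`, the 500-character cap, and the `retryCount >= 2` escalation to `claude-opus-4-6` come from this commit; applying the cap by truncating the error string is an assumption.

```typescript
// Sketch only: RetryableTask, MAX_ERROR_CHARS, and prepareRetry are
// hypothetical names; the field names follow the commit message.
interface RetryableTask {
  prompt: string;
  retryCount: number;
  _originalPrompt?: string;
  modelOverride?: string;
}

// v0.1.4 caps the injected error context at 500 characters.
const MAX_ERROR_CHARS = 500;

function prepareRetry(task: RetryableTask, prevError?: string): RetryableTask {
  task.retryCount += 1;

  // Save the pristine prompt once, before any error text is appended.
  if (!task._originalPrompt) {
    task._originalPrompt = task.prompt;
  }

  if (prevError) {
    // Assumption: the cap is applied by truncating the error string.
    const errorContext = prevError.slice(0, MAX_ERROR_CHARS);
    // Rebuild from the original so only the latest error appears (no accumulation).
    task.prompt =
      `${task._originalPrompt}\n\n---\n` +
      `## Previous Attempt Failed (attempt ${task.retryCount})\n` +
      `Error: ${errorContext}\nFix the error above and try again.`;
  } else {
    task.prompt = task._originalPrompt;
  }

  // Model escalation: from the second retry onward, force the strongest model.
  if (task.retryCount >= 2) {
    task.modelOverride = 'claude-opus-4-6';
  }
  return task;
}
```

Because every retry rebuilds from `_originalPrompt` rather than from the current `prompt`, two consecutive retries leave exactly one `## Previous Attempt Failed` section, carrying only the most recent error.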
1 parent 0ce807d commit 4fe7212

38 files changed

Lines changed: 8543 additions & 58 deletions

.cc-pipeline/plan.md

Lines changed: 961 additions & 0 deletions
Large diffs are not rendered by default.

.cc-pipeline/tasks.json

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+{
+  "waves": [
+    {
+      "waveIndex": 0,
+      "tasks": [
+        "Modify src/types.ts only. (1) In the Task interface, after the line `model?: string;`, add two new optional fields: `modelOverride?: string;` and `_originalPrompt?: string;`. (2) Change the existing `dependsOn?: string;` field to `dependsOn?: string | string[];`. No other changes are needed — createTask() already handles the widened type via opts?.dependsOn assignment. Run `npx tsc --noEmit` and verify no errors. Then commit: `git add -A && git commit -m \"feat(types): add modelOverride, _originalPrompt; widen dependsOn to string|string[]\"`"
+      ]
+    },
+    {
+      "waveIndex": 1,
+      "tasks": [
+        "Modify src/store.ts only. Apply four changes in sequence: (1) In migrate(), after the existing `review` column ALTER TABLE try/catch block, add: `try { this.db.exec(\"ALTER TABLE tasks ADD COLUMN original_prompt TEXT\"); } catch {}`. (2) In taskToParams(), append `task._originalPrompt ?? null` as the 26th element of the returned array. (3) Update all INSERT SQL strings in save(), updateBatch() insertStmt, and saveBatch() insertStmt to add `original_prompt` to the column list and a 26th `?` to VALUES. Update all UPDATE SQL strings to add `original_prompt=?` before `WHERE id=?` (the params array is already 26 elements from step 2). (4) In update() fieldMap, replace the dependsOn entry with: `dependsOn: { col: 'depends_on', serialize: (v) => v == null ? null : Array.isArray(v) ? JSON.stringify(v as unknown[]) : v as string }`, and add `_originalPrompt: { col: 'original_prompt' }`. In rowToTask(), replace the dependsOn line with: `dependsOn: (() => { const raw = row.depends_on as string|null|undefined; if (!raw) return undefined; if (raw.startsWith('[')) { try { return JSON.parse(raw) as string[]; } catch { return raw; } } return raw; })()`, and add `_originalPrompt: (row.original_prompt as string | null) ?? undefined`. Run `npx tsc --noEmit`. Commit: `git add -A && git commit -m \"feat(store): persist _originalPrompt; serialize dependsOn array as JSON\"`",
+        "Modify src/worktree-pool.ts only. Insert two new public methods after the getWorkerStats() method, before the private git() helper: (1) `getActiveWorkers(exclude?: string): string[]` — iterate `this.workers.values()`, push `w.name` into result array when `w.busy && w.name !== exclude`, return result. (2) `async rebaseOnMain(workerName: string): Promise<boolean>` — get worker via `this.workers.get(workerName)`, return false if not found; call `const { stdout } = await this.git('rev-parse', 'main')` to get mainSha (trim it); call `await this.gitIn(w.path, 'rebase', mainSha)`; on any error, call `await this.gitIn(w.path, 'rebase', '--abort').catch(() => {})`, log a warn with `log('warn', '[pool] rebaseOnMain: conflict, aborted', { worker: workerName })`, and return false; return true on success. Use the existing `log` import and `this.git`/`this.gitIn` helpers already in the file. Run `npx tsc --noEmit`. Commit: `git add -A && git commit -m \"feat(worktree-pool): add getActiveWorkers() and rebaseOnMain()\"`"
+      ]
+    },
+    {
+      "waveIndex": 2,
+      "tasks": [
+        "Modify src/agent-runner.ts only. Make exactly two line changes: (1) In runClaudeSDK(), find the line `model: task.model ?? this.model,` and change it to `model: task.modelOverride ?? task.model ?? this.model,`. (2) In runClaude(), find the line `\"--model\", task.model ?? this.model,` and change it to `\"--model\", task.modelOverride ?? task.model ?? this.model,`. No other changes. Run `npx tsc --noEmit` and verify no errors. Commit: `git add -A && git commit -m \"feat(agent-runner): honour task.modelOverride in runClaude and runClaudeSDK\"`"
+      ]
+    },
+    {
+      "waveIndex": 3,
+      "tasks": [
+        "Modify src/scheduler.ts only. Apply four sub-changes then compile and commit. (1) In executeAndRelease(), in the retry block where task.prompt is modified: after `task.retryCount++`, add `if (!task._originalPrompt) { task._originalPrompt = task.prompt; }`, then replace the existing prompt-mutation line with `task.prompt = prevError ? \\`${task._originalPrompt}\\n\\n---\\n## Previous Attempt Failed (attempt ${task.retryCount})\\nError: ${errorContext}\\nFix the error above and try again.\\` : task._originalPrompt;`, then after that add `if (task.retryCount >= 2) { task.modelOverride = 'claude-opus-4-6'; }`. (2) Same fix in requeue(): before the existing prompt mutation, add `if (!task._originalPrompt) { task._originalPrompt = task.prompt; }`, replace the prompt line to rebuild from task._originalPrompt, and after `task.retryCount += 1` add `if (task.retryCount >= 2) { task.modelOverride = 'claude-opus-4-6'; }`. (3) In loop(), replace the single-ID dependency check block with array-aware logic: `const depIds = Array.isArray(task.dependsOn) ? task.dependsOn : [task.dependsOn]; let anyFailed = false, failedDepId: string|undefined, failedDepStatus: string|undefined, allSuccess = true; for (const depId of depIds) { const dep = this.tasks.get(depId) ?? this.store.get(depId) ?? undefined; if (!dep || ['failed','timeout','cancelled'].includes(dep.status)) { anyFailed = true; failedDepId = depId; failedDepStatus = dep?.status ?? 'missing'; break; } if (dep.status !== 'success') allSuccess = false; }` then fail-fast if anyFailed, re-queue if !allSuccess. (4) After `const mergeResult = await this.pool.release(...)`, add: `if (shouldMerge && mergeResult.merged) { for (const w of this.pool.getActiveWorkers(workerName)) { this.pool.rebaseOnMain(w).catch((err: unknown) => { log('warn','staged rebase failed',{worker:w,error:String(err)}); }); } }`. Also update submit() opts signature: change `dependsOn?: string` to `dependsOn?: string | string[]`. Run `npx tsc --noEmit`. Commit: `git add -A && git commit -m \"feat(scheduler): fix prompt accumulation, model escalation, array dependsOn, staged rebase\"`"
+      ]
+    },
+    {
+      "waveIndex": 4,
+      "tasks": [
+        "Modify src/__tests__/worktree-pool.test.ts only. Append two describe blocks at end of file. Block 1 'WorktreePool.getActiveWorkers': (a) test returns [] when no workers busy — init pool, call getActiveWorkers(), assert empty; (b) test returns both acquired worker names — acquire two workers, assert getActiveWorkers().length===2 and includes both names; (c) test excludes named worker — acquire two, call getActiveWorkers(w1.name), assert length===1 and result[0]===w2.name. Block 2 'WorktreePool.rebaseOnMain': (a) test returns false for unknown name — init pool, call rebaseOnMain('nonexistent'), assert false; (b) test returns true when up-to-date — acquire worker, call rebaseOnMain(worker.name), assert true; (c) test returns true after rebasing onto new main commits — acquire w0, write a file in w0.path, git add+commit in w0's worktree; acquire w1, write a different file in w1.path, git add+commit in w1's worktree, capture HEAD sha, call `git update-ref refs/heads/main <sha>` in repoPath to advance main; then call rebaseOnMain(w0.name) and assert true. Each test uses makeTempRepo() and cleanup() in finally. Run `node --import tsx --test src/__tests__/worktree-pool.test.ts`. Commit: `git add -A && git commit -m \"test(worktree-pool): getActiveWorkers and rebaseOnMain coverage\"`",
+        "Modify src/__tests__/scheduler.test.ts only. (1) Update makePool() to add two stubs: `getActiveWorkers: (_exclude?: string) => [] as string[]` and `rebaseOnMain: async (_name: string) => true`. (2) Append three describe blocks. 'Scheduler retry — prompt accumulation fix': (a) test that on 3 attempts (2 failures then success), each retry prompt contains exactly one '## Previous Attempt Failed' section — use regex match count; (b) test that task._originalPrompt equals the original submitted prompt string after first retry. 'Scheduler retry — model escalation': test that across 3 failed attempts (maxRetries:2), modelOverride captured at attempt index 0 and 1 is undefined, and at index 2 is 'claude-opus-4-6'. 'Scheduler dependency DAG — array dependsOn': (a) test task with dependsOn:[dep1.id,dep2.id] succeeds only after both deps complete, using completion order array to verify ordering; (b) test task fails immediately with error referencing the failed dep ID when one dep in the array fails; (c) backward-compat test: string dependsOn (not array) still works without errors. Use makeStore() and inline runner mocks with setTimeout delays. Run `node --import tsx --test src/__tests__/scheduler.test.ts` then `node --import tsx --test src/__tests__/*.test.ts`. Commit: `git add -A && git commit -m \"test(scheduler): prompt accumulation fix, model escalation, array dependsOn coverage\"`"
+      ]
+    }
+  ],
+  "totalTasks": 7
+}
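Step (3) of the wave 3 task replaces the single-ID dependency check with array-aware logic. The sketch below restates those rules as a pure function, assuming a simplified status union and a `lookup` callback that stands in for `this.tasks.get(id) ?? this.store.get(id)`; `checkDependencies`, `DepTask`, `DepVerdict`, and `TaskStatus` are hypothetical names, not cc-manager's API. A missing dependency, or one that ended `failed`, `timeout`, or `cancelled`, fails the task immediately; any dependency still in progress re-queues it; only all-`success` dependencies let it run.

```typescript
// Simplified, hypothetical status union; the real Task type may differ.
type TaskStatus = 'pending' | 'running' | 'success' | 'failed' | 'timeout' | 'cancelled';

interface DepTask {
  id: string;
  status: TaskStatus;
}

type DepVerdict =
  | { kind: 'ready' }
  | { kind: 'wait' }
  | { kind: 'fail'; depId: string; depStatus: string };

const TERMINAL_FAILURES: ReadonlySet<string> = new Set(['failed', 'timeout', 'cancelled']);

function checkDependencies(
  dependsOn: string | string[] | undefined,
  lookup: (id: string) => DepTask | undefined,
): DepVerdict {
  if (dependsOn === undefined) return { kind: 'ready' };
  // Normalize the legacy single-ID form so both shapes share one code path.
  const depIds = Array.isArray(dependsOn) ? dependsOn : [dependsOn];
  let allSuccess = true;
  for (const depId of depIds) {
    const dep = lookup(depId);
    // Fail fast on the first missing or terminally failed dependency.
    if (!dep || TERMINAL_FAILURES.has(dep.status)) {
      return { kind: 'fail', depId, depStatus: dep?.status ?? 'missing' };
    }
    if (dep.status !== 'success') allSuccess = false;
  }
  // Run only when every dependency succeeded; otherwise re-queue and check later.
  return allSuccess ? { kind: 'ready' } : { kind: 'wait' };
}
```

Wrapping a plain string in a one-element array keeps the old `dependsOn: string` form working, which is what the backward-compatibility test in wave 4 exercises.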

AGENTS.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+# Repository Guidelines
+
+## Project Structure & Module Organization
+`src/` contains the TypeScript application code. Core runtime modules include `scheduler.ts`, `agent-runner.ts`, `store.ts`, `server.ts`, and the CLI entrypoints `index.ts` and `cli.ts`. Tests live in `src/__tests__/` and follow the source module names, for example `src/__tests__/scheduler.test.ts`. The web dashboard is a single static file at `src/web/index.html`; built output goes to `dist/`. Longer-form design and API docs live in `docs/`, and `ARCHITECTURE.md` explains the dependency flow between modules. Runtime artifacts such as `.cc-manager.db` and `.worktrees/` should not be treated as source.
+
+## Build, Test, and Development Commands
+Use Node.js 20+.
+
+- `npm install`: install dependencies and enable the repo’s git hooks via `prepare`.
+- `npm run dev`: run the app from source with `tsx`.
+- `npm run build`: compile TypeScript to `dist/` and copy the web UI asset.
+- `npm test`: run the Node test runner against `src/__tests__/*.test.ts`.
+- `npx tsc --noEmit`: run the strict type check used by CI and the pre-commit hook.
+- `npm run start -- --repo /path/to/repo`: run the built server locally.
+
+## Coding Style & Naming Conventions
+Follow `.editorconfig`: UTF-8, LF, and 2-space indentation. Keep source in TypeScript under `src/`; do not add plain `.js` source files there. Use Node ESM import paths with explicit `.js` extensions, for example `import { Store } from './store.js';`. Match existing file naming: kebab-case module files and `*.test.ts` for tests. Prefer explicit types and keep `strict`-mode compatibility. For the dashboard, keep `src/web/index.html` framework-free and self-contained.
+
+## Testing Guidelines
+Add or update targeted tests in `src/__tests__/` whenever behavior changes. Keep test filenames aligned with the module under test and cover both success and failure paths for scheduler, store, server, or worktree behavior. Before opening a PR, run `npx tsc --noEmit`, `npm test`, and `npm run build` if your change affects runtime packaging.
+
+## Commit & Pull Request Guidelines
+Recent history uses Conventional Commits, usually with a scope: `feat(scheduler): ...`, `fix(pipeline): ...`, `docs: ...`. Keep commits focused and descriptive. PRs should answer: what changed, why it changed, and how to test it. Follow `.github/pull_request_template.md`: confirm tests pass, types compile, `console.log` calls are removed, and docs are updated when needed.

CLAUDE.md

Lines changed: 14 additions & 2 deletions
@@ -13,6 +13,10 @@ Multi-agent orchestrator that runs parallel Claude Code agents in git worktrees.
 - **src/worktree-pool.ts** — Git worktree lifecycle, parallel init, merge
 - **src/store.ts** — SQLite persistence (better-sqlite3, WAL mode)
 - **src/types.ts** — Shared TypeScript types
+- **src/pipeline.ts** — 5-stage autonomous pipeline (research→decompose→execute→verify→done)
+- **src/pipeline-types.ts** — Pipeline type definitions
+- **src/pipeline-store.ts** — Pipeline run persistence
+- **src/task-classifier.ts** — Task routing (quick/standard/deep → model/agent/contextProfile)
 - **src/logger.ts** — Structured JSON logger
 - **src/web/index.html** — Dashboard (vanilla HTML/JS, dark theme)
 
@@ -30,7 +34,7 @@ node dist/index.js --repo /path/to/repo --workers 5 --port 8080
 ```
 
 ```bash
-# Run tests (282 tests across 8 suites)
+# Run tests (372 tests across 10 suites)
 node --import tsx --test src/__tests__/*.test.ts
 ```
 
@@ -45,6 +49,14 @@ node --import tsx --test src/__tests__/*.test.ts
 ## Agent Flywheel Strategy
 The cc-manager improves itself by running agents against its own codebase.
 
+### Current Status: NOT WORKING (v0.1.7)
+Two self-hosting runs (v0.1.5, v0.1.6) achieved only 43-50% commit rate. All "successful" runs required manual fixes. The flywheel loop does not yet produce reliable, mergeable code autonomously.
+
+**Root causes identified**:
+- Agents exit 0 without committing (fixed in v0.1.7: F1 empty commit detection)
+- Complex files (scheduler.ts 618 LOC) always fail multi-point integration
+- System prompt commit instruction too weak (fixed in v0.1.7: F2 CRITICAL warning)
+
 ### Proven Best Practices
 - **240s timeout** — sweet spot (120s = 80% failure, 180s = occasional timeout)
 - **One file per task** — prevents merge conflicts between concurrent agents
@@ -146,7 +158,7 @@ pending → running → success (branch merged to main)
 
 ## Repository
 - **GitHub**: `agent-next/cc-manager` (private)
-- **Version**: v0.1.0
+- **Version**: v0.1.7
 
 ## Security Notes
 - **No authentication**: cc-manager has no auth. It is a local dev tool — do NOT expose to the public internet.
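The first root cause listed in the CLAUDE.md diff (agents exiting 0 without committing) is what F1 empty-commit detection targets. A minimal sketch of such a check, assuming it counts commits on the worktree's `HEAD` that are not on a base ref recorded before the agent started: `commitsAhead`, `classifyExit`, `ExitVerdict`, and the `baseRef` parameter are hypothetical, and `git rev-list --count <base>..HEAD` is one standard way to get that count, not necessarily the one cc-manager uses.

```typescript
import { execFileSync } from 'node:child_process';

// Hypothetical helper: counts commits reachable from HEAD but not from baseRef.
function commitsAhead(worktreePath: string, baseRef: string): number {
  const out = execFileSync('git', ['rev-list', '--count', `${baseRef}..HEAD`], {
    cwd: worktreePath,
    encoding: 'utf8',
  });
  return Number.parseInt(out.trim(), 10);
}

type ExitVerdict = 'success' | 'failed';

// Hypothetical decision step: a zero exit code is necessary but not sufficient.
function classifyExit(exitCode: number, newCommits: number): ExitVerdict {
  if (exitCode !== 0) return 'failed';
  // An agent that exits cleanly without committing has nothing to merge.
  return newCommits > 0 ? 'success' : 'failed';
}
```

In use, a runner would record the base SHA when it hands a worktree to the agent, then call `classifyExit(code, commitsAhead(path, baseSha))` after the process exits, so a clean exit with no commits is reported as a failure instead of a silent success.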
