
[codex] add reviewed parallel worktree starter #9

Draft
ChaosRealmsAI wants to merge 37 commits into main from fix/runtime-resource-leaks


@ChaosRealmsAI ChaosRealmsAI commented Jun 3, 2026

What changed

  • Added the built-in odw starter parallel-review-apply workflow for large project work: parallel implementation worktrees, review gate, approve-only atomic apply, final read-only verification, and report visibility.
  • Added optional low-decision-cost args.request / args.spec planning: when explicit args.tasks are omitted, the starter asks a structured planner for owned task files, then runs the same preflight, parallel implementation, review, repair, apply, and verify gates.
  • Added worktree diff apply/review helpers, main worktree snapshot guard/restore helpers, richer report events, and odw runs show summaries for review/apply/snapshot outcomes.
  • odw runs show now summarizes final workflow history so request-only runs show plan, review reject/approve, targeted repair, and verify status without opening the full HTML report.
  • odw runs list now defaults to a compact human-readable run list, while odw runs list --json preserves the raw machine-readable records.
  • Failed odw runs show views now surface state.result.error in the header, so failed planner/implementation runs show the actionable cause before the event tail.
  • CI now uses Node 24-compatible official action versions (actions/checkout@v6, actions/setup-node@v6, actions/cache@v5) ahead of the June 2026 GitHub Actions Node 20 deprecation warning.
  • PandaCode task/session CLI help now explains common inputs like --task, --task-file, --cd, --runtime, --timeout-ms, and --json instead of showing blank option descriptions.
  • PandaCode top-level inspection commands now default to compact human-readable summaries for doctor, models, and list; --json preserves the full machine-readable reports for automation.
  • PandaCode Codex sessions are now saved before long execute turns, and status/logs fall back to saved session state plus local logs when a completed per-session Codex daemon has already been reaped.
  • PandaCode also saves starting-phase Codex sessions before codexctl session start returns, so status/list can see a just-launched long task instead of reporting no latest session.
  • HTML reports now show the same compact workflow history in the overview, and successful repair flows are not marked failed just because an earlier review gate rejected before repair.
  • Added starter docs, examples, API contract updates, and selftests for atomic apply, review gates, repair flow, task ID validation, task prompt validation, task file ownership/path validation, dirty task-file guard, duplicate task-file guard, final verification guard, request planning, workflow-history summaries, and run listing UX.
  • Fixed PandaCode Codex text auto-answer by preserving pending question context even when Codex returns needs_input with a non-zero exit code.
  • Fixed schema-node extraction so long structured final JSON is parsed from the untruncated PandaCode final assistant message before log/report truncation.
  • Improved the starter repair loop so multi-file review blockers target the primary blocker file instead of rewriting every task mentioned as evidence; 3-task batches default to 3 review rounds and 4+ task batches default to 4 review rounds.
  • Improved review-repair task selection again so blocker text that starts with test-failure evidence but names a source-file root cause repairs the source task instead of repeatedly rewriting the tests/docs task.
  • Preserved exact captured git patch text when combining worktree diffs, so blank context lines are not corrupted before review/apply.
  • Review preflight failures now become structured blockers and include preflight category/message in review gate events and odw runs show, so patch conflicts or review-workspace failures are actionable without scraping stderr.
  • The starter repair selector now matches root-cause symbols defined in candidate diffs, with task-prompt symbols only as a fallback, so function-named blockers such as createDecisionDigest repair the owning core task without broadly rerunning docs/tests tasks.
  • odw runs show and HTML report workflow history now include the first review blocker sample, so a reject line shows the actionable cause instead of only blockers=1.
  • Clarified reviewer prompts so new files appearing untracked in the temporary review worktree are not mistaken for missing landing coverage; approval still lands the captured patch, including new files.
  • Added preflight guards so parallel tasks must have stable unique IDs, declare non-empty string prompts, declare normalized repo-relative owned files, avoid internal/generated directories, and cannot accidentally claim the same declared file without explicit allowDuplicateTaskFiles:true intent.
  • Hardened schema-node and worktree-review prompts so reviewers return final JSON only, put reject evidence into structured fields, and tests/docs tasks do not invent undeclared public entrypoints or skip verification just to pass an isolated worktree.
  • The starter now injects the run context and full planned task list into every implementation/repair prompt so parallel tasks share one product/API contract instead of each worktree inferring sibling contracts independently.
  • Constrained planner-produced task runtime / permission values so a planner cannot route implementation work to unsupported PandaCode commands such as node.
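
The preflight guards listed above (stable unique IDs, non-empty string prompts, normalized repo-relative owned files, no internal/generated directories, no shared files without explicit intent) can be pictured with a small sketch. This is a minimal Python illustration under stated assumptions, not the starter's code: the function name `preflight_tasks`, the task shape (`id`, `prompt`, `files`), and the `INTERNAL_DIRS` list are hypothetical; only `allowDuplicateTaskFiles` is named in this PR.

```python
import posixpath

# Hypothetical list of internal/generated directories; the real list may differ.
INTERNAL_DIRS = (".git", ".odw", "node_modules")


def preflight_tasks(tasks, allow_duplicate_task_files=False):
    """Return a list of preflight error strings; an empty list means the batch may start."""
    errors = []
    seen_ids = set()
    file_owner = {}
    for index, task in enumerate(tasks):
        task_id = task.get("id")
        if not isinstance(task_id, str) or not task_id.strip():
            errors.append(f"task[{index}]: missing or empty id")
            task_id = f"<task {index}>"
        elif task_id in seen_ids:
            errors.append(f"{task_id}: duplicate task id")
        seen_ids.add(task_id)

        prompt = task.get("prompt")
        if not isinstance(prompt, str) or not prompt.strip():
            errors.append(f"{task_id}: prompt must be a non-empty string")

        files = task.get("files")
        if not isinstance(files, list) or not files:
            errors.append(f"{task_id}: no owned files declared")
            continue
        for path in files:
            if not isinstance(path, str) or not path:
                errors.append(f"{task_id}: owned file must be a non-empty string")
                continue
            normalized = posixpath.normpath(path)
            escapes = normalized == ".." or normalized.startswith("../")
            if (path.startswith("/") or "\\" in path or normalized != path
                    or normalized == "." or escapes):
                errors.append(f"{task_id}: {path!r} is not a normalized repo-relative path")
                continue
            if normalized.split("/")[0] in INTERNAL_DIRS:
                errors.append(f"{task_id}: {path!r} is inside an internal directory")
                continue
            # Remember the first task that declared the file.
            owner = file_owner.setdefault(path, task_id)
            if owner != task_id and not allow_duplicate_task_files:
                errors.append(f"{task_id}: {path!r} is already owned by {owner}")
    return errors
```

Collecting every error before returning, rather than stopping at the first one, lets an owner fix a whole plan in one pass. The point that matters is that all of this runs before any worktree is created.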

Why

This supports the intended remote product-work loop: owner comments/specs can fan out into isolated AI tasks, be reviewed adversarially as a combined candidate, and only land after an explicit approve gate and verification evidence. The request/spec planner reduces owner decision cost when the owner is working from a short doc comment or iPad note instead of hand-authoring task/file decomposition.

Root causes fixed

  • Real dogfood exposed that Codex can return needs_input with question context during start and exit non-zero. PandaCode previously only persisted pending input on ok executor output, so ODW auto-answer could fail with cannot infer Codex question id for --text.
  • A dependent three-task dogfood run showed that repair feedback mentioning docs/src/tests caused all tasks to be redone from clean worktrees. That let unrelated implementation/test candidates drift. The starter now treats secondary file mentions as evidence and repairs the primary blocker task when possible.
  • The same dogfood run showed reviewer uncertainty when newly created files appear untracked in the temporary review workspace. The review prompt now explains that this is expected for captured patches and not a landing blocker by itself.
  • Large-project decomposition can accidentally assign the same ID/file to multiple parallel tasks, leave file ownership undeclared, declare unsafe/non-normalized paths, or leave prompts empty/non-string. The starter now fails before creating worktrees instead of entering node/session collisions, weak ownership checks, path escape risks, ambiguous task execution, or predictable patch-conflict/retry loops.
  • A four-task real Codex dogfood run showed reviewers often found the right blockers but answered schema nodes with prose like reject, forcing repeated schema_mismatch retries. The same run showed a tests/docs task inventing an undeclared index.js public entrypoint and trying to skip tests in an isolated worktree. The runtime and starter prompts now make those failure modes explicit.
  • A second four-task dogfood run with index.js ownership fixed showed tests/docs still invented a different schema (runId, complete, pending) because implementation nodes only saw their own task prompt. The starter now gives every task the shared run context and full planned task contracts.
  • Request-only planner dogfood showed the planner could return valid task-plan JSON longer than ODW's compact report text limit; schema validation saw the truncated envelope instead of the final JSON. ODW now parses structured output before truncating report text.
  • Request-only planner dogfood also showed a planner can choose an implementation runtime like node; the starter now rejects/normalizes unsupported planner runtimes before implementation fan-out.
  • Report dogfood on the successful request-only run showed the HTML overview had no workflow history and marked the whole run failed because an earlier review gate rejected before repair. The report overview now shows compact history and only marks the run failed when final workflow status/result failed.
  • Dogfooding the run-journal entrypoint showed odw runs list was raw JSON by default, which is hard to scan during remote/iPad-style project checks. The default list is now compact, with --json for automation.
  • Replaying failed request-only dogfood runs showed runs show buried terminal causes like planning_failed and no captured worktree changes in the event tail. Failed run headers now extract object/string result errors directly from state.
  • Dogfooding PandaCode's top-level inspection commands showed doctor, models, and list all printed large raw JSON by default, so the common "is this usable right now?" check was harder than necessary. The defaults now summarize runtime health, model counts/defaults, and session counts, while scripts keep the old full shape via --json.
  • A new request-only remote-product-loop dogfood run showed review blockers like node test.mjs exits ... src/annotations.js does not infer... were repaired as public-api-tests-docs because test.mjs appeared first. The repair selector now scores root-cause file mentions and penalizes test-failure evidence, so the next repair targets src/annotations.js.
  • A second-slice remote-product-loop dogfood run repeatedly failed review preflight with error: corrupt patch at line ... even though each candidate diff applied individually. Root cause: ODW used trimEnd() before concatenating git patches, which stripped the single-space line that represents a blank context line in unified diffs. The combiner now preserves patch text and only appends a missing newline.
  • The same failed run showed review r1/r2/r3: reject ... blockers=0 while the actionable corrupt patch cause only appeared in stderr. Review preflight failures now emit a structured blocker plus preflight category/message so repair prompts, runs show, and reports can surface the cause directly.
  • A third-slice remote-review dogfood run (odw-exec-1780521870647-88709) showed repeated integration-only repairs while reviewers identified the exported createDecisionDigest core contract as the root cause. The repair selector now uses candidate-defined symbols to map function-named blockers back to the owning task.
  • The next dogfood run (odw-exec-1780523246667-9602) showed prompt-symbol matching was too broad and caused full-batch repairs, which made independent task contracts churn. Symbol matching now prefers definitions found in captured diffs and only falls back to task prompts when no definition evidence exists.
  • A fourth-slice artifact-diff/handoff dogfood run first failed safely at the third review round with a clear, still-repairable contract mismatch. The successful rerun used 5 planned tasks and converged after three reject/repair cycles under a 4-round ceiling. The starter now gives 4+ task batches the full 4-round default so complex contract-churn slices get one more targeted repair opportunity before safe failure.
  • Direct PandaCode Codex dogfood showed status could report a successful completed session as failed after PandaCode intentionally stopped the per-session daemon: codexctl session read returned ok:false unknown run id, while list and artifacts still had the session record. PandaCode now persists last state/summary and treats that read failure as a local-record fallback instead of a task failure.
  • A follow-up long Codex dogfood run showed the opposite edge: during codexctl session start, prompt/log files already existed but no session record had been saved yet, so status and list still showed no latest session. PandaCode now writes a provisional starting session record with prompt/log/socket artifacts before invoking start.
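
The corrupt-patch root cause above is easy to reproduce in isolation. The sketch below is a Python analogue, not ODW's code: `combine_patches_lossy` uses `rstrip()` in place of the `trimEnd()` call named in this PR, and both function names and the sample patches are made up for the illustration.

```python
# A patch whose hunk ends with a blank context line: a single space plus newline.
PATCH_A = (
    "--- a/a.txt\n"
    "+++ b/a.txt\n"
    "@@ -1,2 +1,3 @@\n"
    " first\n"
    "+second\n"
    " \n"
)
PATCH_B = (
    "--- a/b.txt\n"
    "+++ b/b.txt\n"
    "@@ -1 +1 @@\n"
    "-old\n"
    "+new\n"
)


def combine_patches_lossy(patches):
    """Buggy variant: stripping trailing whitespace also removes the ' ' context line."""
    return "\n".join(patch.rstrip() for patch in patches) + "\n"


def combine_patches(patches):
    """Keep each patch exactly as captured; only append a newline when one is missing."""
    parts = []
    for patch in patches:
        if not patch:
            continue
        parts.append(patch if patch.endswith("\n") else patch + "\n")
    return "".join(parts)


lossy = combine_patches_lossy([PATCH_A, PATCH_B])
exact = combine_patches([PATCH_A, PATCH_B])
```

In `lossy`, the first hunk still declares two old lines and three new lines but now has one line fewer than that, which is the kind of mismatch a patch tool reports as a corrupt patch. Each patch applies on its own, so the failure only shows up after concatenation.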

Validation

  • node scripts/selftest.mjs in odw: 87/87 passed, including structured review preflight blocker/event assertions, symbol-named root-cause repair selection, blocker sample history, and 4+ task default review-round coverage
  • cargo test at workspace root: ODW unit tests + parity selftest, PandaCode 122 unit tests, and 16 fake runtime tests passed
  • cargo clippy --workspace --all-targets -- -D warnings: passed
  • Verified with the patched local pandacode binary that pandacode status --runtime codex --cd /Users/Zhuanz/workspace/pandacode-direct-dogfood-20260604 now returns ok:true, state:"completed", and live_read_unavailable:true instead of surfacing the dead-daemon unknown run id as the session state.
  • Verified patched pandacode logs --runtime codex --json falls back to .pandacode/codex/runs/<session>/logs/latest.jsonl and returns a local log tail when live read is unavailable.
  • Added fake-runtime regression coverage for pandacode codex status --session latest while codexctl session start is still sleeping; the latest pointer exists, status.state is starting, and top-level list shows one Codex session.
  • cargo fmt --package open-dynamic-workflow --check: passed
  • Installed local CLIs: odw 0.3.1, pandacode 0.3.1
  • Verified latest pandacode run --help shows descriptions for inline task text, task files, workspace directory, runtime selection, machine-readable JSON, and timeout.
  • Verified pandacode doctor, pandacode models, and pandacode list default to compact summaries, and --json output for all three still parses as JSON.
  • Verified installed odw runs show odw-exec-1780515008596-9952 --path /Users/Zhuanz/workspace/odw-request-planner-dogfood-20260604-v3 prints a compact workflow history: plan → reject → targeted repair → approve → verify.
  • Regenerated /Users/Zhuanz/workspace/odw-request-planner-dogfood-20260604-v3/.odw/runs/odw-exec-1780515008596-9952/report.html with the latest odw: report overview contains plan → reject → targeted repair → approve → verify and "failed":false.
  • Verified latest odw runs list --path /Users/Zhuanz/workspace/odw-request-planner-dogfood-20260604-v3 prints a compact one-line run summary and odw runs list --json still parses as raw run records.
  • Verified latest odw runs show on failed dogfood runs odw-exec-1780514257522-79981 and odw-exec-1780514812502-99690 surfaces Failure: planning_failed: ... and Failure: no captured worktree changes in the header.
  • Verified GitHub Actions warning surfaced on the PR and upgraded the official checkout/setup-node/cache actions to Node 24-compatible major versions; remote CI is used as the final validation for that workflow change.
  • Dogfooded pandacode run --help and found key task/session flags had empty descriptions. Added help text plus a fake-runtime regression test for the common run options.
  • Real Codex dogfood odw-exec-1780505513941-97703: parallel docs run approved, applied atomically, final npm test passed, verify snapshot clean.
  • Real Codex dependent dogfood odw-exec-1780507859710-45655: implementation/tests/docs split across 3 worktrees, Codex needs_input auto-answered, first review rejected docs only, docs-only repair retained implementation/test candidates, second review approved, 3 patches applied atomically, final npm test passed, verify snapshot clean.
  • Real Codex four-task dogfood odw-exec-1780510655854-15704: schema/analysis/report/tests-docs split across 4 worktrees, review rejected failing/incoherent tests, repair stayed local to implicated tasks, Codex needs_input auto-answer was exercised, max review rounds stopped without applying unsafe patches, and the final gate retained blocker evidence.
  • Real Codex four-task dogfood odw-exec-1780512381288-43153: explicit index.js ownership, no reviewer schema_mismatch after prompt hardening, tests/docs no longer skipped isolated verification, final gate safely rejected an invented API contract and retained evidence for the new shared-context fix.
  • Real Codex shared-context dogfood odw-exec-1780513476465-57923: mobile planning-board module split across 4 worktrees, every implementation prompt included batch context/full planned task contracts, review approved, 4 captured patches applied atomically, final node test.mjs passed, verify snapshot clean, and output aligned on index.js plus canonical planned/running/blocked/done statuses instead of the prior invented contract.
  • Real Codex request-only planner dogfood odw-exec-1780514257522-79981: exposed long structured JSON extraction failure before any unsafe implementation landed.
  • Real Codex request-only planner dogfood odw-exec-1780514812502-99690: planner passed schema but selected unsupported runtime:"node"; implementation failed safely before applying changes.
  • Real Codex request-only planner dogfood odw-exec-1780515008596-9952: high-level request only, planner produced 4 owned tasks, first review rejected a failing public test, targeted repair reran only public-fixtures-tests-docs, second review approved, 4 captured patches applied atomically, final node test.mjs passed, verify snapshot clean.
  • Real Codex request-only remote-product-loop dogfood odw-exec-1780518366711-1253: high-level request only, planner produced 4 owned tasks, implementation completed 4 worktrees, review correctly rejected failing source/test behavior, and the run safely refused to land after exposing that root-cause blockers were repeatedly routed to the tests/docs task. Added a mock regression test for that repair-targeting failure.
  • Real Codex request-only remote-product-loop dogfood odw-exec-1780519403300-26020 after the root-cause repair fix: same high-level owner request, planner produced 5 owned tasks, implementation completed 5 worktrees, dual review approved in round 1 after node test.mjs and npm test, 5 patches applied atomically, final node test.mjs passed, verify snapshot clean.
  • Real Codex second-slice remote-product-loop dogfood odw-exec-1780519926354-29730: existing first-slice project plus a high-level request for Feishu-friendly Markdown specs; planner produced 4 owned tasks and implementation completed, but review preflight repeatedly failed safely with corrupt patch before landing. Added a mock regression test covering combined patches with trailing blank context lines.
  • Real Codex second-slice remote-product-loop dogfood odw-exec-1780520642507-54665 after the patch combiner fix: planner produced 3 owned tasks, implementation completed 3 worktrees, review workspace applied the combined candidate without corrupt patch, first review rejected missing prompt content, targeted repair reran only generator/docs while retaining API/tests, second review approved, 3 patches applied atomically, final node test.mjs passed, verify snapshot clean.
  • Real Codex third-slice remote-review dogfood odw-exec-1780521870647-88709: planner produced 2 owned tasks, review found failing digest source-id behavior and batch-only input bugs, targeted repair repeatedly reran only integration and safely refused to land when the root cause belonged to createDecisionDigest.
  • Real Codex third-slice remote-review dogfood odw-exec-1780523246667-9602: after initial symbol matching, planner produced 4 owned tasks and review found a batch metadata mismatch, but prompt-symbol matching was too broad and all tasks were repaired; final review safely rejected inconsistent trace.batch.id/trace.batch.batchId contract churn without landing.
  • Real Codex third-slice remote-review dogfood odw-exec-1780524017351-26812 after preferring diff-defined symbols: planner produced 4 owned tasks, first review rejected over-broad open-question classification in src/decision-digest.js, targeted repair reran only create-decision-digest|render-decision-digest while retaining 4 files, second review approved, 4 patches applied atomically, final node test.mjs passed, verify snapshot clean. Re-running odw runs show with the latest CLI now shows that first blocker sample directly in workflow history.
  • Real Codex fourth-slice remote-review dogfood odw-exec-1780525079576-47490: artifact snapshot diff + owner handoff packet request, planner produced 4 owned tasks, review/repair targeted the right core/Markdown/docs tasks while retaining tests, and the run failed safely at max review round 3 with concrete blocker evidence instead of landing a contract mismatch.
  • Real Codex fourth-slice remote-review dogfood rerun odw-exec-1780525853321-53849 with a 4-round ceiling: planner produced 5 owned tasks, first review rejected inconsistent digest/source-comment expectations, targeted repair reran 4 tasks; second review rejected sourceComments.changed contract mismatch, targeted repair reran only 3 tasks while retaining README/tests; third review approved, 5 patches applied atomically, final node test.mjs passed, verify snapshot clean. Dogfood project commit: 748355c add owner handoff packet dogfood slice.
  • Real Codex fifth-slice remote-review dogfood odw-exec-1780527063497-83594 using the new default with no explicit maxReviewRounds: request-only planner produced 4 owned tasks, first reject showed the real default as round 2/4, targeted repair reran only tests/core; second reject reran tests/core/Markdown-API while retaining docs; third review approved, 4 captured patches applied atomically, final node test.mjs passed, verify snapshot clean. Dogfood project commit: 432c57b add owner review queue dogfood slice.
  • Direct PandaCode Codex dogfood in /Users/Zhuanz/workspace/pandacode-direct-dogfood-20260604: pandacode run --runtime codex built a dependency-free ESM remote product-comment inbox library, generated tests/docs, and reported npm test passing. During the long run, status could not see latest until the session record was saved; after completion, old status showed unknown run id despite successful artifacts. Added record-save/fallback behavior and fake-runtime coverage.
  • Direct PandaCode Codex long-run dogfood in /Users/Zhuanz/workspace/pandacode-active-status-dogfood-20260604: while execute was running, patched status returned state:"running" and the last agent message; after completion, patched status returned ok:true, state:"completed", live_read_unavailable:true, and the saved last summary. The generated remote-spec workflow package passed npm test and was committed as 342221a add remote spec workflow pandacode dogfood.
  • Added regression coverage for review preflight failures carrying blockers plus preflight_category/preflight_message, for runs show rendering those fields in compact recent events, for symbol-named root-cause repair selection, and for review blocker samples in CLI/HTML workflow history.
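
The status fallback verified above (a completed session whose daemon has been reaped, or a session that is still starting) can be summarized in a few lines. This is a hedged Python sketch, not PandaCode's implementation: `resolve_status`, the record shape, and the `live_read` callable are assumptions; the `ok`, `state`, and `live_read_unavailable` fields are the ones quoted in this PR.

```python
def resolve_status(saved_record, live_read):
    """Prefer a live daemon read; otherwise fall back to the locally saved record.

    saved_record: dict persisted locally, or None if nothing has been saved yet.
    live_read: callable returning a dict; {"ok": False, ...} models a failed read.
    """
    live = live_read()
    if live.get("ok"):
        return {"ok": True, "state": live["state"], "live_read_unavailable": False}
    if saved_record is not None:
        # The daemon is gone (for example "unknown run id"), but the local record
        # still describes the session, so report it instead of a task failure.
        return {
            "ok": True,
            "state": saved_record["state"],
            "summary": saved_record.get("summary"),
            "live_read_unavailable": True,
        }
    return {"ok": False, "error": live.get("error", "no session record")}


def reaped_daemon():
    """Models a live read against a daemon that has already been stopped."""
    return {"ok": False, "error": "unknown run id"}


completed = resolve_status({"state": "completed", "summary": "npm test passed"}, reaped_daemon)
starting = resolve_status({"state": "starting"}, reaped_daemon)
```

The same branch covers both edges described in this PR: a record saved before start returns makes `starting` visible, and a record saved with the last summary keeps `completed` visible after the daemon stops.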

Reports:

  • /Users/Zhuanz/workspace/odw-real-codex-dogfood-20260603/.odw/runs/odw-exec-1780505513941-97703/report.html
  • /Users/Zhuanz/workspace/odw-dependent-task-dogfood-20260603/.odw/runs/odw-exec-1780507859710-45655/report.html
  • /Users/Zhuanz/workspace/odw-four-task-dogfood-20260604/.odw/runs/odw-exec-1780510655854-15704/report.html
  • /Users/Zhuanz/workspace/odw-four-task-success-dogfood-20260604/.odw/runs/odw-exec-1780512381288-43153/report.html
  • /Users/Zhuanz/workspace/odw-shared-context-success-dogfood-20260604/.odw/runs/odw-exec-1780513476465-57923/report.html
  • /Users/Zhuanz/workspace/odw-request-planner-dogfood-20260604/.odw/runs/odw-exec-1780514257522-79981/report.html
  • /Users/Zhuanz/workspace/odw-request-planner-dogfood-20260604-v2/.odw/runs/odw-exec-1780514812502-99690/report.html
  • /Users/Zhuanz/workspace/odw-request-planner-dogfood-20260604-v3/.odw/runs/odw-exec-1780515008596-9952/report.html
  • /Users/Zhuanz/workspace/odw-remote-product-loop-dogfood-20260604-v1/.odw/runs/odw-exec-1780518366711-1253/report.html
  • /Users/Zhuanz/workspace/odw-remote-product-loop-dogfood-20260604-v1/.odw/runs/odw-exec-1780519403300-26020/report.html
  • /Users/Zhuanz/workspace/odw-remote-product-loop-dogfood-20260604-v1/.odw/runs/odw-exec-1780519926354-29730/report.html
  • /Users/Zhuanz/workspace/odw-remote-product-loop-dogfood-20260604-v1/.odw/runs/odw-exec-1780520642507-54665/report.html
  • /Users/Zhuanz/workspace/odw-remote-product-loop-dogfood-20260604-v1/.odw/runs/odw-exec-1780521870647-88709/report.html
  • /Users/Zhuanz/workspace/odw-remote-product-loop-dogfood-20260604-v1/.odw/runs/odw-exec-1780523246667-9602/report.html
  • /Users/Zhuanz/workspace/odw-remote-product-loop-dogfood-20260604-v1/.odw/runs/odw-exec-1780524017351-26812/report.html
  • /Users/Zhuanz/workspace/odw-remote-product-loop-dogfood-20260604-v1/.odw/runs/odw-exec-1780525079576-47490/report.html
  • /Users/Zhuanz/workspace/odw-remote-product-loop-dogfood-20260604-v1/.odw/runs/odw-exec-1780525853321-53849/report.html
  • /Users/Zhuanz/workspace/odw-remote-product-loop-dogfood-20260604-v1/.odw/runs/odw-exec-1780527063497-83594/report.html
  • Installed CLI slow-start smoke in /Users/Zhuanz/workspace/pandacode-installed-start-smoke-20260604: env FAKE_CODEX_START_SLEEP=30 pandacode codex exec ... used the installed /Users/Zhuanz/.cargo/bin/pandacode; while codexctl session start was still sleeping, pandacode codex status returned ok:true, state:"starting", live_read_unavailable:true, and top-level pandacode list showed codex: 1; after start completed, status and logs returned the completed fake session successfully.
  • Installed ODW/PandaCode complex request-only dogfood in /Users/Zhuanz/workspace/odw-installed-complex-dogfood-20260604: odw exec --backend pandacode --input-file odw-input.json --effort medium used the installed CLI on a fresh repo, planner produced 4 owned tasks, 4 Codex worktrees implemented 9 files, two review agents approved in round 1 after npm test and export/integration smoke checks, 4 patches landed atomically, final npm test passed, verify guard was clean, and dogfood project commit 8dc4311 add remote product ops dogfood slice captured the result. Report: /Users/Zhuanz/workspace/odw-installed-complex-dogfood-20260604/.odw/runs/odw-exec-1780530282012-8494/report.html.
  • Second installed ODW/PandaCode incremental dogfood in /Users/Zhuanz/workspace/odw-installed-complex-dogfood-20260604: run odw-exec-1780530747351-12325 iterated an existing package with owner inbox, Markdown handoff, and batch progress APIs. Review round 1 correctly rejected a Markdown/progress integration bug, then the old repair targeting overmatched all 4 tasks (markdown-handoff,batch-progress,owner-inbox,public-api-docs) even though the blocker named src/markdown.js/src/progress.js. Added b238689 fix(odw): avoid prompt-symbol repair overmatching, so explicit blocker file paths now take precedence over the broad prompt-symbol fallback; the new selftest "does not let prompt API mentions broaden file-path repair" covers the dogfood failure. The dogfood run still converged after repair: final npm test passed, the verify guard was clean, and the result was committed to the dogfood project as 435544b add owner inbox and markdown handoff dogfood slice. Report: /Users/Zhuanz/workspace/odw-installed-complex-dogfood-20260604/.odw/runs/odw-exec-1780530747351-12325/report.html.
  • Validation after b238689: node scripts/selftest.mjs passed 88/88, ODW=$(which odw) node scripts/selftest.mjs passed 88/88 against the installed CLI, cargo test passed, cargo clippy --workspace --all-targets -- -D warnings passed, cargo fmt --all --check passed, and git diff --check passed.
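
The repair-targeting order that this PR converges on (explicit blocker file paths first, then symbols defined in candidate diffs, then prompt symbols only as a fallback) can be sketched as a tiered lookup. This is a simplified Python illustration, not the starter's selector: the function name, the task keys (`files`, `diff_symbols`, `prompt_symbols`), and the sample symbols are hypothetical, and the scoring that penalizes test-failure evidence is left out.

```python
import re


def select_repair_tasks(blocker_text, tasks):
    """Pick task ids to repair, trying the most specific evidence first."""
    # Tier 1: explicit owned-file paths named in the blocker text.
    by_file = [t["id"] for t in tasks if any(f in blocker_text for f in t["files"])]
    if by_file:
        return by_file

    words = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", blocker_text))
    # Tier 2: symbols defined in a task's captured diff.
    by_diff = [t["id"] for t in tasks if words & set(t["diff_symbols"])]
    if by_diff:
        return by_diff

    # Tier 3: symbols that only appear in task prompts (broad, so used last).
    return [t["id"] for t in tasks if words & set(t["prompt_symbols"])]


TASKS = [
    {"id": "markdown-handoff", "files": ["src/markdown.js"],
     "diff_symbols": ["renderHandoff"], "prompt_symbols": ["createInbox"]},
    {"id": "batch-progress", "files": ["src/progress.js"],
     "diff_symbols": ["summarizeProgress"], "prompt_symbols": ["createInbox"]},
    {"id": "owner-inbox", "files": ["src/inbox.js"],
     "diff_symbols": ["createInbox"], "prompt_symbols": ["createInbox"]},
    {"id": "public-api-docs", "files": ["README.md"],
     "diff_symbols": [], "prompt_symbols": ["createInbox", "formatBadge"]},
]

by_path = select_repair_tasks(
    "src/markdown.js and src/progress.js disagree on createInbox output", TASKS)
by_symbol = select_repair_tasks("createInbox drops unread items", TASKS)
```

Here `by_path` names only the two tasks that own the mentioned files, even though every task prompt mentions `createInbox`, and `by_symbol` names only the task whose diff defines that function. Returning early at each tier is what keeps a broad prompt mention from widening a repair that already has a specific target.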

claude added 30 commits June 3, 2026 20:14
Why: each codex node lazily starts a codexctl daemon via --session-socket but never stopped it, so finished sessions left a daemon + codex child resident (dogfooding accumulated 200+ orphaned daemons, ~3GB) until codexctl's idle-timeout reaped them.

What: exec/resume/answer stop the daemon once a session is terminal — guarded by session_is_terminal (not awaiting input, not running async), so a needs_input / answer --no-wait continuation is never orphaned. resume() resolves a live run by mirroring that rule: a parked (needs_input) session's daemon was kept, so its run is still live and is continued directly; any other session was reaped, so its ephemeral run_id is dead and the thread is resurrected from codex's persisted rollout (thread_id) into a fresh run used for send AND execute. This keeps resume working across daemon reaping (verified: exec -> daemon=0 -> resume recalls context, incl. multi-hop).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
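
The stop rule and the matching resume rule from this commit message can be written down compactly. This is a Python sketch under assumptions, not the PandaCode source: only `session_is_terminal`, `needs_input`, `run_id`, and `thread_id` come from the text above; the `Session` fields, the other state names, and `resume_plan` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Session:
    state: str           # "needs_input" is from the PR; other values are illustrative
    running_async: bool  # True for an `answer --no-wait` style continuation
    run_id: str
    thread_id: str


def session_is_terminal(session):
    """Terminal means not awaiting input and not running async; only then stop the daemon."""
    return session.state != "needs_input" and not session.running_async


def resume_plan(session):
    """Mirror the stop rule to decide how a later resume finds its run."""
    if session.state == "needs_input":
        # The daemon was kept for a parked session, so its run is still live.
        return ("continue_live_run", session.run_id)
    # The daemon was reaped and the run_id is dead: rebuild from the persisted thread.
    return ("resurrect_from_thread", session.thread_id)


done = Session("completed", False, "run-1", "thread-1")
parked = Session("needs_input", False, "run-2", "thread-2")
```

Keeping the two functions side by side makes the invariant visible: a daemon is stopped only in cases where resume knows how to rebuild the run from `thread_id`.
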
Why: .odw/runs grew unbounded (dogfooding accumulated 4882 run dirs).

What: after creating a run dir, prune all but the newest ODW_RUNS_KEEP runs (default 50, 0=off). Protects this run's own dir and the resume source, so a concurrent exec in the same repo cannot delete an in-progress run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
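
The pruning rule from this commit message (keep the newest `ODW_RUNS_KEEP` runs, default 50, 0 turns pruning off, never delete the current run or the resume source) is sketched below in Python. The function name and signature are assumptions, and ordering by directory modification time is a guess for the example; the PR does not say how "newest" is determined.

```python
import os
import shutil
from pathlib import Path


def prune_runs(runs_dir, protected=(), keep=None):
    """Delete all but the newest `keep` run directories, skipping protected ones."""
    if keep is None:
        keep = int(os.environ.get("ODW_RUNS_KEEP", "50"))
    if keep <= 0:  # 0 disables pruning
        return []
    protected_names = {Path(p).name for p in protected}
    # Newest first; modification time is an assumption made for this sketch.
    runs = sorted(
        (p for p in Path(runs_dir).iterdir() if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    removed = []
    for stale in runs[keep:]:
        if stale.name in protected_names:
            continue  # the current run or the resume source must survive
        shutil.rmtree(stale)
        removed.append(stale.name)
    return removed
```

A call such as `prune_runs(".odw/runs", protected=[current_run_dir, resume_source_dir])` right after creating the run directory would match the order described above; both argument names are placeholders.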