feat: composer workflow restructure and factory hardening #126

Open
FrkAk wants to merge 43 commits into main from worktree-composer-workflow-restructure

Conversation

@FrkAk FrkAk commented Jun 12, 2026

Owner

Summary

Task Reference: [MYMR-237]

Three stacked rounds of composer work, making it a PR-terminal software factory: a Mymir task goes research → plan → implement → CI gate → review → bounded fix loop → opened PR, with crash-safe recovery and a GitHub-feedback rework round-trip. HOTL owns merging; composer terminates at the opened PR.

Round 1 — workflow restructure (63b7e91..4e3c51d):

  • skills/composer/SKILL.md rewritten as a lean workflow: shared STATUS vocabulary (DONE / DONE_WITH_CONCERNS / NEEDS_DECISION / BLOCKED), todo-anchored loop with digraph, continuous execution, structural stop conditions, red-flags table, CSO-fixed description. /goal harness removed entirely.
  • Bounded review→fix loop (2 rotations, then HOTL escalation); implementer fix mode.
  • Slim per-phase reference extracts under skills/composer/references/ replace full mymir-spec force-loads in the four phase agents (researcher spec context ~4,950 → ~2,650 words; reviewer ~6,500 → ~900); extract headings carry canonical source prefixes so citations resolve.
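
The bounded review→fix loop described above can be sketched roughly as follows. This is a minimal illustration, not the composer's actual orchestration: `runReviewFixLoop`, `LoopResult`, and the `"approve"` verdict name are hypothetical, and the real loop is driven by agent dispatches rather than plain callbacks. Only the STATUS vocabulary, the `request-changes` verdict, and the 2-rotation bound come from this description.

```typescript
// Hypothetical sketch of a bounded review -> fix loop.
// STATUS values are taken from the PR description; everything else is assumed.
type Status = "DONE" | "DONE_WITH_CONCERNS" | "NEEDS_DECISION" | "BLOCKED";
type Verdict = "approve" | "request-changes"; // "approve" is an assumed name

interface LoopResult {
  outcome: "approved" | "escalated-to-hotl";
  rotationsUsed: number;
}

const MAX_ROTATIONS = 2; // "2 rotations, then HOTL escalation"

function runReviewFixLoop(
  review: () => Verdict,
  fix: (rotation: number) => Status,
): LoopResult {
  let rotations = 0;
  while (true) {
    if (review() === "approve") {
      return { outcome: "approved", rotationsUsed: rotations };
    }
    // Budget exhausted: stop fixing and hand the task to a human.
    if (rotations >= MAX_ROTATIONS) {
      return { outcome: "escalated-to-hotl", rotationsUsed: rotations };
    }
    rotations += 1;
    const status = fix(rotations);
    // Assumption: a fix that cannot proceed on its own also escalates.
    if (status === "BLOCKED" || status === "NEEDS_DECISION") {
      return { outcome: "escalated-to-hotl", rotationsUsed: rotations };
    }
  }
}
```

With a reviewer that never approves, this sketch runs two fix rotations and then escalates; the loop cannot spin indefinitely because the rotation counter only increases.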

Round 2 — factory hardening (9b5e173..97c9c15):

  • 13 reliability rules from an adversarial audit: plannable-pick exit + dependency pre-flight, Mymir-transport stop condition, BLOCKED-because-terminal exception, claimed-task entry, PR-link recovery with [taskRef] bracket verification, environmental gh failure handling, backlog transient retry + stranded-task report, stale-claim sweep, headless gate skip, provisional propagation on escalated verdicts, branch-collision check, foreign-commit re-evaluation, terminal-write status re-read.
  • Implementer runs worktree-isolated (isolation: worktree, live-verified incl. signed-commit push from inside a subagent worktree); default-branch derivation replaces hardcoded main; merge-forward policy before PR and each fix rotation; claim-ownership semantics with branch-evidence fallback.
  • CI gate: orchestrator waits bounded (timeout 600 gh pr checks --watch, exit-code mapped) before dispatching review; pending/unresolved CI caps the verdict at request-changes.
  • Crash-safe append-only run log at .mymir/composer-<project>.md (event vocabulary, increment-before-dispatch rotation counter, grep-derived counters, recovery algorithm, archive rotation).
  • Rework mode: /mymir:composer rework <taskRef|pr-url> — reviewer-led intake fetches unresolved GitHub review threads (GraphQL, outdated-anchor re-location), re-verifies against HEAD, feeds the existing fix loop with a fresh 2-rotation budget; implementer may post one summary comment, never resolves threads.
  • Context/cost: researcher single depth='agent' fetch; estimate-based model selection with 7 opus-forcing guardrails (reviewer never downgraded); app fix: formatCriteria now renders acceptance-criterion ids (lib/context/format.ts), closing a bug where the researcher's documented by-id AC rewrite appended duplicates — TDD'd in tests/context/format.test.ts, 5 golden snapshots regenerated.
  • Flag-gated research-ahead pipelining (--pipelined): conservative variant, lookahead 1, 7-row brief-invalidation table, kill switch after two consecutive invalidations.
  • scripts/check-plugins.ts gains two CI gates: @-include target resolution across every plugin (a dangling include silently strips an agent's rules at runtime) and canonical hash pins for the composer extracts (references/sources.json) — any edit to a pinned mymir reference fails CI until the extracts are reviewed and the pin refreshed via bun run sync:plugins.
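
To illustrate why the run log increments the rotation counter before dispatching, here is a small sketch under stated assumptions. The log is modeled as an in-memory array of lines instead of the `.mymir/composer-<project>.md` file, the `ROTATION` event name is hypothetical (the real event vocabulary is not spelled out here), and `appendEvent`, `rotationCount`, and `dispatchFix` are illustrative names only.

```typescript
// Hypothetical sketch: append-only log with an increment-before-dispatch counter.
const MAX_ROTATIONS = 2;

// Append-only: events are only ever added at the end, never rewritten.
function appendEvent(log: string[], event: string, task: string): void {
  log.push(`${event} ${task}`);
}

// "Grep-derived" counter: recomputed from the log on every read,
// so there is no separate counter state that a crash could desynchronize.
function rotationCount(log: readonly string[], task: string): number {
  return log.filter((line) => line === `ROTATION ${task}`).length;
}

// Returns false when the budget is spent; otherwise records the rotation
// first and only then runs the (possibly crashing) dispatch.
function dispatchFix(log: string[], task: string, dispatch: () => void): boolean {
  if (rotationCount(log, task) >= MAX_ROTATIONS) return false;
  appendEvent(log, "ROTATION", task); // increment first
  dispatch(); // if this throws, the rotation is already counted
  return true;
}
```

Because the event is written before the dispatch, a crash mid-rotation still consumes budget on recovery. The trade-off in this sketch is that a crashed rotation is "lost", which errs on the side of escalating early rather than looping past the bound.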

Round 3 — fresh-eyes review fixes (74eb69b, cf3fcb9): 10 confirmed defects, each verified against server code or reproduced shell behavior before fixing:

  • Implementer branch setup ran git checkout "$DEFAULT_BRANCH", which git refuses (exit 128) inside the worktree isolation its own frontmatter mandates; now fetches and branches from origin/$DEFAULT_BRANCH without checking out.
  • Rework fix dispatches never said "rework", but the implementer's fix-mode contract requires the word to accept an in_progress entry (HOTL flip); dispatches now carry a "Rework." prefix.
  • Completion Protocol AC payload example was schema-invalid ({id, checked}; the MCP schema requires text); now {id, text, checked}.
  • Orchestrator's reviewer dispatch instructed an up-front depth='review' fetch that defeats the reviewer's own two-phase isolation; dropped to the contract's 3-line dispatch shape.
  • review.md pre-flight conditioned on files, which depth='working' mechanically excludes; reworded to the missing-PR-handle check.
  • --pipelined parsed as single-task under the invocation rule; the prefetch brief had no run-log event and risked breaking the last-PICK recovery invariant (new BRIEF event, prefetch logs no PICK).
  • Stale researcher/planner frontmatter descriptions updated (researcher claimed it "does not write to Mymir"); canonical artifacts.md §2 tag-vocab guidance corrected (overview → meta), with extract + mirror sync and pin refresh.
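
The acceptance-criterion payload fix above can be pictured with a small type guard. This is a sketch only: the field names `{id, text, checked}` come from this description, but the field types and the names `AcceptanceCriterion` and `isAcceptanceCriterion` are assumptions, and the real MCP schema may be stricter.

```typescript
// Assumed shape, inferred only from the field names {id, text, checked}.
interface AcceptanceCriterion {
  id: string;
  text: string;
  checked: boolean;
}

// Hypothetical runtime check mirroring the "schema requires text" rule.
function isAcceptanceCriterion(value: unknown): value is AcceptanceCriterion {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.text === "string" &&
    typeof v.checked === "boolean"
  );
}
```

Under these assumptions, the old example shape `{id, checked}` is rejected because `text` is missing, while `{id, text, checked}` passes.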

Committed regression suite (skills/composer/tests/scenarios.md, now 20 scenarios with RED→GREEN provenance) — all pass against this branch. Platform mirrors (codex / cursor / antigravity) synced for the review skill, reviewer-rules extract, and mymir references.

Type of change

  • New feature
  • Bug fix
  • Refactor / cleanup

Testing

  • Tested locally with bun test tests/context tests/api/task-context.test.ts (36 pass)
  • Linting passes (bun run lint)
  • Typecheck passes (bun run typecheck)
  • bun run check:plugins (mirror sync, include resolution, extract pins) and bun run format:check pass

Plugin markdown verified by the committed 20-scenario pressure suite dispatched as fresh subagents (loopholes found during runs were countered and re-tested). Live worktree-isolation smoke test confirmed branch/commit/push from inside a plugin-agent worktree.

Notes for reviewer

  • Live end-to-end acceptance (a real /mymir:composer <task> run + rework round-trip) requires this branch installed as the plugin source and a fresh session; checklist in the repo-external plan doc.
  • decompose-* agents still force-load full mymir specs; converting them to slim extracts is deliberate future work. The rework GraphQL intake reads the first 100 review threads (no hasNextPage pagination yet) — documented future work.
  • No caller-id MCP surface exists today, so task claims use branch-evidence ownership (assigneeIds wiring documented as the upgrade path).
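
For the pagination follow-up noted above, a cursor loop over `pageInfo.hasNextPage` / `endCursor` (the standard GitHub GraphQL connection fields) could look roughly like this. It is a sketch of the future work, not code in this PR: `Page`, `collectAllPages`, and the synchronous `fetchPage` callback are hypothetical, and a real intake would issue the query asynchronously.

```typescript
// Hypothetical page shape modeled on GraphQL connection fields.
interface Page<T> {
  nodes: T[];
  pageInfo: { hasNextPage: boolean; endCursor: string | null };
}

// Follows cursors until hasNextPage is false, collecting every node.
// fetchPage is an assumed callback standing in for the GraphQL request.
function collectAllPages<T>(fetchPage: (after: string | null) => Page<T>): T[] {
  const all: T[] = [];
  let cursor: string | null = null;
  while (true) {
    const page: Page<T> = fetchPage(cursor);
    all.push(...page.nodes);
    if (!page.pageInfo.hasNextPage) return all;
    cursor = page.pageInfo.endCursor;
  }
}
```

A production version would likely also cap the number of pages so that a misbehaving cursor cannot loop forever.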

@FrkAk FrkAk changed the title from "feat: restructure composer as workflow with slim agent extracts" to "feat: composer workflow restructure and factory hardening" Jun 12, 2026
Comment thread scripts/check-plugins.ts Fixed
@FrkAk FrkAk self-assigned this Jun 12, 2026
2 participants