fix(executor): consume the terminal-write CAS verdict (#671/H4) + #1095 research doc#1195
Conversation
Records the research deliverable for #1095 (research-gated): mechanism selection (transaction-owned git ref, serialized+git v1), the §0 evidence GO/NO-GO gate with a reachable NO-GO verdict on today's sample, the §9 hazard register, the §11 crash-injection matrix, and the §13 reachable implementation path. Implementation remains gated on a production-evidence GO + human review; §13 step 1 (the H4 CAS fix) is the one piece shipped independently of that gate. Refs #1095 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
task_execution_service discarded the bool returned by db.update_execution_status — an atomic compare-and-set. A SUCCESS write loses the CAS only to a CANCELLED row, but the caller ran the won-only side effects unconditionally: it completed the activity as COMPLETED, reset the dispatch breaker, and returned a SUCCESS result. A turn the operator already cancelled was therefore reported back as a billable success. Now the SUCCESS terminal consumes the CAS verdict: on a lost CAS it re-reads the row, completes the activity as FAILED (not COMPLETED), skips the breaker reset, and returns the persisted terminal status with a new RECONCILED error code instead of SUCCESS. The won path is unchanged. Lost CAS => reconcile, never act. Adds tests/unit/test_terminal_write_cas_gate.py (lost-CAS reconcile + won-path regression pin). The DB-layer CAS itself stays covered by tests/unit/test_cancelled_not_overwritten.py. Refs #1095 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Converges the five FAILED-terminal writers (CB fast-fail, timeout, backend call-budget, HTTPError salvage, generic Exception) onto one helper, _write_terminal_and_gate, which writes the terminal through update_execution_status's CAS and completes the activity ONLY when this writer won. Replaces the old check-then-act guard (get_execution() -> status != CANCELLED), which both raced (TOCTOU) and only blocked CANCELLED — the CAS additionally blocks an already SUCCESS/FAILED row. The HTTPError path's AUTH dispatch-breaker outcome is now gated on the same won verdict, so a turn superseded by a cancel can't count toward the breaker. The FAILED TaskExecutionResult return stays unconditional (only side effects gate). Sites with no side effects (CapacityFull, CircuitOpen) and the CancelledError 3-state guard are left as-is. Adds the FAILED-path lost-CAS coverage to test_terminal_write_cas_gate.py. Full tests/unit suite green (2135 passed). Refs #1095 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Resolve by running |
vybe
left a comment
There was a problem hiding this comment.
✅ Validated & approved (/validate-pr)
Research deliverable + the H4 CAS reliability fix it surfaced (Refs #1095 P1; the #671/H4 service-layer bug). 3 files, +851/-77 (488 of which are the planning doc).
The bug is real: db.update_execution_status is an atomic CAS returning the winner; every terminal writer in execute_task discarded it, so an operator-cancelled turn could be written back as a billable SUCCESS (activity COMPLETED, breaker reset, SUCCESS to caller).
Fix review:
_write_terminal_and_gateconsolidates the 5 FAILED paths; completes the activity only on a won CAS;won=Truewhenexecution_idfalsy preserves prior no-record behavior.- SUCCESS path consumes the verdict → re-reads row, completes activity as FAILED (not COMPLETED), skips breaker reset, returns new
RECONCILEDcode + the persisted terminal status. - AUTH dispatch-breaker outcome gated on
won(a superseded turn can't count toward the breaker). - Correctly replaces check-then-act (TOCTOU + CANCELLED-only) with the CAS (also blocks already-SUCCESS/FAILED). Untouched sites (slot-release
finally, CapacityFull/CircuitOpen no-effect, CancelledError 3-state guard) correctly left alone.
Tests: test_terminal_write_cas_gate.py pins won/lost on both SUCCESS and FAILED terminals + happy-path over-gating regression; TDD red→green; full unit 2135 passed; verify-local 5/5 (incl. import-smoke #1033 class).
- Base →
dev✓ · 3 conventional commits ✓ · CI all green - Security scan clean; no new backend modules / env vars / infra
- The broad transactional subsystem correctly stays NO-GO behind the §0 evidence gate.
Minor (non-blocking): the new RECONCILED reconciliation could earn a one-line mention in architecture.md's task_execution_service notes, but as an internal correctness fix a descriptive commit suffices.
Merge after #1195's sibling #1203 is rebased — they share task_execution_service.py, so land this first (smaller diff).
Summary
Issue #1095 ("transactional agent executions — discard workspace changes unless validated as success") is research-gated. This PR delivers two things:
docs/planning/TRANSACTIONAL_EXECUTIONS_2026-06.md. Its §0 evidence gate returns a NO-GO for the broad transactional subsystem on today's data; the full mechanism (container-local coordinator,~/.trinity/post-check,tx_statereconciliation, crash-injection suite) is an explicitly deferred, GO-gated follow-up.So this is "research + the reliability fix it surfaced," not research alone.
The bug (H4)
db.update_execution_status(...)(src/backend/db/schedules.py) is an atomic compare-and-set that returns the CAS winner (bool). A SUCCESS write loses the CAS only to aCANCELLEDrow; non-success terminal writes lose to any already-terminal row. Every terminal-write caller inTaskExecutionService.execute_taskdiscarded that verdict and ran "won-only side effects" unconditionally.Worst case (the SUCCESS path): a turn the operator already cancelled was reported back as a billable SUCCESS — the activity was completed as COMPLETED, the dispatch breaker was reset, and the caller received SUCCESS. The DB-layer CAS itself is already correct and tested (
tests/unit/test_cancelled_not_overwritten.py, #671); this was purely the service layer ignoring the verdict.What changed
Every terminal writer now consumes the CAS verdict — lost CAS ⇒ reconcile/skip, never act:
RECONCILEDerror code instead of SUCCESS. The won path is unchanged._write_terminal_and_gate(...), which writes the terminal through the CAS and completes the activity only when this writer won. This replaces the old check-then-act guard (get_execution() → status != CANCELLED), which both raced (TOCTOU) and only blocked CANCELLED — the CAS additionally blocks an already-SUCCESS/FAILED row.wonverdict, so a turn superseded by a cancel can't count toward the breaker.finallyslot-release (must always run, never gated), the no-side-effect FAILED sites (CapacityFull, CircuitOpen), and the already-correctCancelledError3-state guard.Testing
This was implemented test-first and verified at three levels.
1. TDD red → green (unit, backend-free)
tests/unit/test_terminal_write_cas_gate.py(note: intests/unit/, not the roottest_cb_probe_execution_close.py— the root file's autousecleanup_after_testfixture needs a live backend, so it can't yield a local RED;tests/unit/overrides those fixtures and runs backend-free, next to the DB-layer halftest_cancelled_not_overwritten.py).2. Full unit suite (no regression)
pytest tests/unit/→ 2135 passed, 24 skipped (deterministic order). ruff clean on both edited files; mypy adds zero new errors (the pre-existing errors are repo-wide loose typing onupdate_execution_status, unchanged by this PR).3.
verify-local— real prod images + isolated sibling stack (PASS, all 5 stages):8100/healthgreenstatus: pass; sibling stack auto-torn-down.Scope / not in this PR
The broad workspace-transaction subsystem stays NO-GO per the doc's §0 gate (re-runnable production-evidence query). Deferred, GO-gated: container-local coordinator,
~/.trinity/post-checkvalidator,tx_state+ restart reconciliation, the §11 crash-injection matrix, the worktree/concurrent track (#27), and external-effect rollback (#1084).Refs #1095
🤖 Generated with Claude Code