Skip to content

fix(executor): consume the terminal-write CAS verdict (#671/H4) + #1095 research doc#1195

Merged
vybe merged 3 commits into
devfrom
AndriiPasternak31/plan-issue-1095
Jun 16, 2026
Merged

fix(executor): consume the terminal-write CAS verdict (#671/H4) + #1095 research doc#1195
vybe merged 3 commits into
devfrom
AndriiPasternak31/plan-issue-1095

Conversation

@AndriiPasternak31

Copy link
Copy Markdown
Contributor

Summary

Issue #1095 ("transactional agent executions — discard workspace changes unless validated as success") is research-gated. This PR delivers two things:

  1. The research/design deliverabledocs/planning/TRANSACTIONAL_EXECUTIONS_2026-06.md. Its §0 evidence gate returns a NO-GO for the broad transactional subsystem on today's data; the full mechanism (container-local coordinator, ~/.trinity/post-check, tx_state reconciliation, crash-injection suite) is an explicitly deferred, GO-gated follow-up.
  2. A proven artifact the research surfaced — the H4 CAS fix (doc §6.3 / §13 step 1), which the doc says to "ship regardless of GO… its own small PR."

So this is "research + the reliability fix it surfaced," not research alone.

The bug (H4)

db.update_execution_status(...) (src/backend/db/schedules.py) is an atomic compare-and-set that returns the CAS winner (bool). A SUCCESS write loses the CAS only to a CANCELLED row; non-success terminal writes lose to any already-terminal row. Every terminal-write caller in TaskExecutionService.execute_task discarded that verdict and ran "won-only side effects" unconditionally.

Worst case (the SUCCESS path): a turn the operator already cancelled was reported back as a billable SUCCESS — the activity was completed as COMPLETED, the dispatch breaker was reset, and the caller received SUCCESS. The DB-layer CAS itself is already correct and tested (tests/unit/test_cancelled_not_overwritten.py, #671); this was purely the service layer ignoring the verdict.

What changed

Every terminal writer now consumes the CAS verdict — lost CAS ⇒ reconcile/skip, never act:

  • SUCCESS path (inline): on a lost CAS it re-reads the row, completes the activity as FAILED (not COMPLETED), skips the breaker reset, and returns the persisted terminal status with a new RECONCILED error code instead of SUCCESS. The won path is unchanged.
  • 5 FAILED paths (CB fast-fail, timeout, backend call-budget, HTTPError salvage, generic Exception) converge onto one helper _write_terminal_and_gate(...), which writes the terminal through the CAS and completes the activity only when this writer won. This replaces the old check-then-act guard (get_execution() → status != CANCELLED), which both raced (TOCTOU) and only blocked CANCELLED — the CAS additionally blocks an already-SUCCESS/FAILED row.
  • The HTTPError path's AUTH dispatch-breaker outcome is now gated on the same won verdict, so a turn superseded by a cancel can't count toward the breaker.
  • Untouched: the finally slot-release (must always run, never gated), the no-side-effect FAILED sites (CapacityFull, CircuitOpen), and the already-correct CancelledError 3-state guard.

Testing

This was implemented test-first and verified at three levels.

1. TDD red → green (unit, backend-free)

  • New tests/unit/test_terminal_write_cas_gate.py (note: in tests/unit/, not the root test_cb_probe_execution_close.py — the root file's autouse cleanup_after_test fixture needs a live backend, so it can't yield a local RED; tests/unit/ overrides those fixtures and runs backend-free, next to the DB-layer half test_cancelled_not_overwritten.py).
  • Wrote the lost-CAS test first → RED (today's code reports SUCCESS + fires side effects); the won-path test passed (harness reaches the real terminal). Applied the fix → GREEN. The file covers both the SUCCESS reconcile and the FAILED-path gating, plus won-path regression pins so the fix can't over-gate the happy path. 4 passed.

2. Full unit suite (no regression)

  • pytest tests/unit/2135 passed, 24 skipped (deterministic order). ruff clean on both edited files; mypy adds zero new errors (the pre-existing errors are repo-wide loose typing on update_execution_status, unchanged by this PR).

3. verify-local — real prod images + isolated sibling stack (PASS, all 5 stages)

Stage Result What it proves
preflight OK (1s) environment
unit OK (168s) full unit suite inside the built prod image
build + import-smoke OK (8s) the new code reaches the image and imports (#1033 class)
boot + health OK (12s) backend boots with the change; :8100/health green
integration OK runtime against the live sibling stack

status: pass; sibling stack auto-torn-down.

Scope / not in this PR

The broad workspace-transaction subsystem stays NO-GO per the doc's §0 gate (re-runnable production-evidence query). Deferred, GO-gated: container-local coordinator, ~/.trinity/post-check validator, tx_state + restart reconciliation, the §11 crash-injection matrix, the worktree/concurrent track (#27), and external-effect rollback (#1084).

Refs #1095

🤖 Generated with Claude Code

AndriiPasternak31 and others added 3 commits June 13, 2026 00:27
Records the research deliverable for #1095 (research-gated): mechanism
selection (transaction-owned git ref, serialized+git v1), the §0 evidence
GO/NO-GO gate with a reachable NO-GO verdict on today's sample, the §9
hazard register, the §11 crash-injection matrix, and the §13 reachable
implementation path. Implementation remains gated on a production-evidence
GO + human review; §13 step 1 (the H4 CAS fix) is the one piece shipped
independently of that gate.

Refs #1095

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
task_execution_service discarded the bool returned by
db.update_execution_status — an atomic compare-and-set. A SUCCESS write
loses the CAS only to a CANCELLED row, but the caller ran the won-only
side effects unconditionally: it completed the activity as COMPLETED,
reset the dispatch breaker, and returned a SUCCESS result. A turn the
operator already cancelled was therefore reported back as a billable
success.

Now the SUCCESS terminal consumes the CAS verdict: on a lost CAS it
re-reads the row, completes the activity as FAILED (not COMPLETED),
skips the breaker reset, and returns the persisted terminal status with
a new RECONCILED error code instead of SUCCESS. The won path is
unchanged. Lost CAS => reconcile, never act.

Adds tests/unit/test_terminal_write_cas_gate.py (lost-CAS reconcile +
won-path regression pin). The DB-layer CAS itself stays covered by
tests/unit/test_cancelled_not_overwritten.py.

Refs #1095

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Converges the five FAILED-terminal writers (CB fast-fail, timeout, backend
call-budget, HTTPError salvage, generic Exception) onto one helper,
_write_terminal_and_gate, which writes the terminal through
update_execution_status's CAS and completes the activity ONLY when this
writer won. Replaces the old check-then-act guard
(get_execution() -> status != CANCELLED), which both raced (TOCTOU) and
only blocked CANCELLED — the CAS additionally blocks an already
SUCCESS/FAILED row. The HTTPError path's AUTH dispatch-breaker outcome is
now gated on the same won verdict, so a turn superseded by a cancel can't
count toward the breaker. The FAILED TaskExecutionResult return stays
unconditional (only side effects gate). Sites with no side effects
(CapacityFull, CircuitOpen) and the CancelledError 3-state guard are left
as-is.

Adds the FAILED-path lost-CAS coverage to test_terminal_write_cas_gate.py.
Full tests/unit suite green (2135 passed).

Refs #1095

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

⚠️ Nightly unit-suite check skipped — merge conflict against dev.

Resolve by running git merge dev locally and pushing the result. The next nightly run will re-test once the conflict is gone.

@AndriiPasternak31 AndriiPasternak31 requested review from dolho and vybe and removed request for dolho June 13, 2026 23:54
@AndriiPasternak31 AndriiPasternak31 self-assigned this Jun 13, 2026
@AndriiPasternak31 AndriiPasternak31 requested a review from dolho June 13, 2026 23:54

@vybe vybe left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validated & approved (/validate-pr)

Research deliverable + the H4 CAS reliability fix it surfaced (Refs #1095 P1; the #671/H4 service-layer bug). 3 files, +851/-77 (488 of which are the planning doc).

The bug is real: db.update_execution_status is an atomic CAS returning the winner; every terminal writer in execute_task discarded it, so an operator-cancelled turn could be written back as a billable SUCCESS (activity COMPLETED, breaker reset, SUCCESS to caller).

Fix review:

  • _write_terminal_and_gate consolidates the 5 FAILED paths; completes the activity only on a won CAS; won=True when execution_id falsy preserves prior no-record behavior.
  • SUCCESS path consumes the verdict → re-reads row, completes activity as FAILED (not COMPLETED), skips breaker reset, returns new RECONCILED code + the persisted terminal status.
  • AUTH dispatch-breaker outcome gated on won (a superseded turn can't count toward the breaker).
  • Correctly replaces check-then-act (TOCTOU + CANCELLED-only) with the CAS (also blocks already-SUCCESS/FAILED). Untouched sites (slot-release finally, CapacityFull/CircuitOpen no-effect, CancelledError 3-state guard) correctly left alone.

Tests: test_terminal_write_cas_gate.py pins won/lost on both SUCCESS and FAILED terminals + happy-path over-gating regression; TDD red→green; full unit 2135 passed; verify-local 5/5 (incl. import-smoke #1033 class).

  • Base → dev ✓ · 3 conventional commits ✓ · CI all green
  • Security scan clean; no new backend modules / env vars / infra
  • The broad transactional subsystem correctly stays NO-GO behind the §0 evidence gate.

Minor (non-blocking): the new RECONCILED reconciliation could earn a one-line mention in architecture.md's task_execution_service notes, but as an internal correctness fix a descriptive commit suffices.

Merge after #1195's sibling #1203 is rebased — they share task_execution_service.py, so land this first (smaller diff).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants