fix(hostrunner): bound teardown waits so a wedged child can't hang stop (#77.2/#77.3) #298

Merged: physercoe merged 1 commit into main from fix/77-harden-hostrunner-teardown on Jun 27, 2026

Conversation

@physercoe

Owner

Closes the two LOW-severity residuals of #77 left after the #77.1 data-race fix (PR #276, agentsMu). Neither can hang a production path today, but both are unbounded waits that a nil/ineffective Closer or a ctx-ignoring driver.Input would turn into a permanently stuck host-runner stop.

#77.2 — input-router fire-and-forget dispatch

tick() spawned a per-event goroutine with no tracking; Detach/StopAll waited on loop.done (the run goroutine) but not the in-flight driver.Input goroutines, so a dispatch could outlive teardown.

Track them in a per-loop WaitGroup and drain it in run() before close(done). The drain is bounded (inputDispatchDrainTimeout): a dispatch parked in a ctx-ignoring Input (StdioDriver.Input keeps its stdin Write un-preempted on purpose) only unblocks when stopDriver calls the driver's Stop() after Detach — so an unbounded wait there would deadlock. The straggler is reaped by that Stop(); abandoning the bounded wait leaks nothing.
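The tracking and bounded drain described above can be sketched roughly as follows. This is an illustrative reduction, not the code in input_router.go: the `loop` struct, its field names, `detach`, and `wedgedDetach` are all hypothetical, `input` stands in for driver.Input, and `drain` stands in for inputDispatchDrainTimeout.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// loop is a hypothetical stand-in for the router's per-agent loop.
type loop struct {
	events   chan string
	stop     chan struct{}
	done     chan struct{}  // closed when run() returns
	dispatch sync.WaitGroup // tracks in-flight input goroutines
	input    func(string)   // stand-in for driver.Input
	drain    time.Duration  // stand-in for inputDispatchDrainTimeout
}

func (l *loop) run() {
	defer close(l.done) // deferred, so it runs after the drain below
	for {
		select {
		case ev := <-l.events:
			l.dispatch.Add(1)
			go func() {
				defer l.dispatch.Done()
				l.input(ev)
			}()
		case <-l.stop:
			// Bounded drain: wait for in-flight dispatches, but not forever.
			drained := make(chan struct{})
			go func() { l.dispatch.Wait(); close(drained) }()
			select {
			case <-drained:
			case <-time.After(l.drain):
				fmt.Println("warn: dispatch still in flight; leaving it to the driver's Stop()")
			}
			return
		}
	}
}

// detach mimics Detach: signal the loop, then wait for run() to finish.
func (l *loop) detach() {
	close(l.stop)
	<-l.done
}

// wedgedDetach sends one event into an input that blocks until released,
// detaches, and reports whether detach returned while the dispatch was
// still parked. It then releases the dispatch and waits for it to exit.
func wedgedDetach() bool {
	release := make(chan struct{})
	exited := make(chan struct{})
	l := &loop{
		events: make(chan string),
		stop:   make(chan struct{}),
		done:   make(chan struct{}),
		input:  func(string) { <-release; close(exited) },
		drain:  50 * time.Millisecond,
	}
	go l.run()
	l.events <- "keystroke"
	l.detach() // returns after roughly l.drain despite the wedged dispatch
	select {
	case <-exited:
		return false // the dispatch was not actually wedged
	default:
	}
	close(release) // stands in for the driver's Stop() unblocking the write
	<-exited       // the straggler is reaped
	return true
}

func main() {
	fmt.Println("detach bounded:", wedgedDetach())
}
```

In this sketch the only thing abandoned on timeout is the small waiter goroutine, which exits as soon as the parked dispatch returns.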

#77.3 — readLoop has no ctx.Done()

StdioDriver/ACPDriver readLoops unwind only on pipe EOF via Closer(); Stop() called wg.Wait() unconditionally, so a nil Closer would hang Stop forever. Bound both with driverStopDrainTimeout + a warn.
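A minimal sketch of the bounded Stop() follows, under stated assumptions: `stdioDriver`, its fields, `start`, and `stopWith` are hypothetical simplifications rather than the real StdioDriver, and `drain` stands in for driverStopDrainTimeout. The read loop exits only on EOF or a read error, so with a nil closer nothing ever ends it and Stop() has to give up after the bound.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"sync"
	"time"
)

// stdioDriver is a hypothetical, stripped-down stand-in for StdioDriver.
type stdioDriver struct {
	out    io.Reader
	closer func() error  // may be nil, the case the PR guards against
	drain  time.Duration // stand-in for driverStopDrainTimeout
	wg     sync.WaitGroup
}

func (d *stdioDriver) start() {
	d.wg.Add(1)
	go func() {
		defer d.wg.Done()
		sc := bufio.NewScanner(d.out)
		for sc.Scan() {
			// A real driver would parse sc.Text() here.
		}
		// Scan returns false only on EOF or a read error; there is no ctx.Done() path.
	}()
}

// Stop reports whether the read loop exited within the bound.
func (d *stdioDriver) Stop() bool {
	if d.closer != nil {
		_ = d.closer()
	}
	exited := make(chan struct{})
	go func() { d.wg.Wait(); close(exited) }()
	select {
	case <-exited:
		return true
	case <-time.After(d.drain):
		log.Printf("warn: readLoop still running after %s; abandoning wait", d.drain)
		return false
	}
}

// stopWith runs Stop against a pipe that never closes on its own.
func stopWith(useCloser bool) bool {
	pr, pw := io.Pipe()
	defer pw.Close() // release the reader at the end in either case
	d := &stdioDriver{out: pr, drain: 50 * time.Millisecond}
	if useCloser {
		d.closer = pw.Close
	}
	d.start()
	return d.Stop()
}

func main() {
	fmt.Println("nil Closer, read loop exited in time:", stopWith(false))    // false
	fmt.Println("working Closer, read loop exited in time:", stopWith(true)) // true
}
```

With a working closer the pipe reaches EOF and Stop() returns as soon as the loop exits; with a nil closer it returns after the bound and logs the warning instead of hanging.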

Changes

  • new shared waitTimeout helper (lifecycle.go)
  • input_router.go: per-loop dispatch WaitGroup, bounded drain in run()
  • driver_stdio.go / driver_acp.go: bounded Stop() wait + warn
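For readers unfamiliar with the pattern, a shared helper like the one in the first bullet could plausibly look like the sketch below. The signature is an assumption (the PR text only names waitTimeout), and `fastCase` and `stuckCase` are hypothetical demo functions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// waitTimeout waits for wg for at most d and reports whether it drained
// in time. Assumed signature; the helper in lifecycle.go may differ.
func waitTimeout(wg *sync.WaitGroup, d time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-done:
		return true
	case <-timer.C:
		// The waiter goroutine above stays parked until wg drains.
		return false
	}
}

// fastCase: the tracked work finishes well inside the bound.
func fastCase() bool {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		time.Sleep(10 * time.Millisecond)
	}()
	return waitTimeout(&wg, time.Second)
}

// stuckCase: the tracked work never finishes on its own.
func stuckCase() bool {
	var wg sync.WaitGroup
	wg.Add(1)
	ok := waitTimeout(&wg, 50*time.Millisecond)
	wg.Done() // lets the helper's waiter goroutine exit
	return ok
}

func main() {
	fmt.Println(fastCase())  // true
	fmt.Println(stuckCase()) // false
}
```

The trade-off in this shape is that a timed-out call leaves one waiter goroutine behind until the WaitGroup eventually drains, which is acceptable only when something else (here, the driver's Stop()) is guaranteed to unblock the straggler.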

Tests (all -race clean; full hostrunner package -race green)

  • DetachBoundedByDispatchDrain: a wedged ctx-ignoring dispatch doesn't hang Detach, and is reaped once unblocked.
  • DetachWaitsForFastDispatch: Detach waits for an in-flight dispatch to finish (verified to FAIL without the drain).
  • StdioDriverStopBoundedWithoutCloser: Stop returns bounded with a nil Closer and a never-closing pipe.

Resolves the remaining sub-items of #77 (the HIGH-severity #77.1 race already shipped in #276).

🤖 Generated with Claude Code

…op (#77.2/#77.3)

Closes the two LOW-severity residuals left after the #77.1 data-race fix
(PR #276). Neither could hang a production path today, but both are unbounded
waits that a nil/ineffective Closer or a ctx-ignoring driver.Input would turn
into a permanently stuck host-runner stop.

#77.2 — input-router fire-and-forget dispatch. tick() spawned a per-event
goroutine with no tracking; Detach/StopAll waited on loop.done (the run
goroutine) but not the in-flight driver.Input goroutines, so a dispatch could
outlive teardown. Track them in a per-loop WaitGroup and drain it in run()
before close(done). The drain is BOUNDED (inputDispatchDrainTimeout): a dispatch
parked in a ctx-ignoring Input (StdioDriver.Input keeps its stdin Write
un-preempted) only unblocks when stopDriver calls the driver's Stop() right
AFTER Detach — so an unbounded wait there would deadlock. The straggler is
reaped by that Stop(); abandoning the bounded wait leaks nothing.

#77.3 — readLoop has no ctx.Done(). StdioDriver/ACPDriver readLoops unwind only
on pipe EOF via Closer(); Stop() called wg.Wait() unconditionally, so a nil
Closer would hang Stop forever. Bound both with driverStopDrainTimeout + a warn.

New shared waitTimeout helper (lifecycle.go). Tests (all -race clean):
- DetachBoundedByDispatchDrain: a wedged ctx-ignoring dispatch doesn't hang
  Detach, and is reaped once unblocked.
- DetachWaitsForFastDispatch: Detach waits for an in-flight dispatch to finish
  (verified to FAIL without the drain).
- StdioDriverStopBoundedWithoutCloser: Stop returns bounded with a nil Closer
  and a never-closing pipe.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@physercoe physercoe merged commit f4a3fda into main Jun 27, 2026
4 checks passed
@physercoe physercoe deleted the fix/77-harden-hostrunner-teardown branch June 27, 2026 13:29