fix(hostrunner): bound teardown waits so a wedged child can't hang stop (#77.2/#77.3)#298
Merged
Merged
Conversation
…op (#77.2/#77.3) Closes the two LOW-severity residuals left after the #77.1 data-race fix (PR #276). Neither could hang a production path today, but both are unbounded waits that a nil/ineffective Closer or a ctx-ignoring driver.Input would turn into a permanent stuck host-runner stop. #77.2 — input-router fire-and-forget dispatch. tick() spawned a per-event goroutine with no tracking; Detach/StopAll waited on loop.done (the run goroutine) but not the in-flight driver.Input goroutines, so a dispatch could outlive teardown. Track them in a per-loop WaitGroup and drain it in run() before close(done). The drain is BOUNDED (inputDispatchDrainTimeout): a dispatch parked in a ctx-ignoring Input (StdioDriver.Input keeps its stdin Write un-preempted) only unblocks when stopDriver calls the driver's Stop() right AFTER Detach — so an unbounded wait there would deadlock. The straggler is reaped by that Stop(); abandoning the bounded wait leaks nothing. #77.3 — readLoop has no ctx.Done(). StdioDriver/ACPDriver readLoops unwind only on pipe EOF via Closer(); Stop() called wg.Wait() unconditionally, so a nil Closer would hang Stop forever. Bound both with driverStopDrainTimeout + a warn. New shared waitTimeout helper (lifecycle.go). Tests (all -race clean): - DetachBoundedByDispatchDrain: a wedged ctx-ignoring dispatch doesn't hang Detach, and is reaped once unblocked. - DetachWaitsForFastDispatch: Detach waits for an in-flight dispatch to finish (verified to FAIL without the drain). - StdioDriverStopBoundedWithoutCloser: Stop returns bounded with a nil Closer and a never-closing pipe. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes the two LOW-severity residuals of #77 left after the #77.1 data-race fix (PR #276,
agentsMu). Neither can hang a production path today, but both are unbounded waits that a nil/ineffectiveCloseror a ctx-ignoringdriver.Inputwould turn into a permanently stuck host-runner stop.#77.2 — input-router fire-and-forget dispatch
tick()spawned a per-event goroutine with no tracking;Detach/StopAllwaited onloop.done(therungoroutine) but not the in-flightdriver.Inputgoroutines, so a dispatch could outlive teardown.Track them in a per-loop
WaitGroupand drain it inrun()beforeclose(done). The drain is bounded (inputDispatchDrainTimeout): a dispatch parked in a ctx-ignoringInput(StdioDriver.Inputkeeps its stdinWriteun-preempted on purpose) only unblocks whenstopDrivercalls the driver'sStop()afterDetach— so an unbounded wait there would deadlock. The straggler is reaped by thatStop(); abandoning the bounded wait leaks nothing.#77.3 — readLoop has no
ctx.Done()StdioDriver/ACPDriverreadLoops unwind only on pipe EOF viaCloser();Stop()calledwg.Wait()unconditionally, so a nilCloserwould hangStopforever. Bound both withdriverStopDrainTimeout+ a warn.Changes
waitTimeouthelper (lifecycle.go)input_router.go: per-loop dispatchWaitGroup, bounded drain inrun()driver_stdio.go/driver_acp.go: boundedStop()wait + warnTests (all
-raceclean; full hostrunner package-racegreen)Detach, and is reaped once unblocked.Detachwaits for an in-flight dispatch to finish (verified to FAIL without the drain).Stopreturns bounded with a nilCloserand a never-closing pipe.Resolves the remaining sub-items of #77 (the HIGH-severity #77.1 race already shipped in #276).
🤖 Generated with Claude Code