Skip to content

fix: upgrade watchdog timeout when events.jsonl shows active tool execution#586

Merged
PureWeen merged 4 commits intomainfrom
fix/silent-watchdog-completion
Apr 15, 2026
Merged

fix: upgrade watchdog timeout when events.jsonl shows active tool execution#586
PureWeen merged 4 commits intomainfrom
fix/silent-watchdog-completion

Conversation

@PureWeen
Copy link
Copy Markdown
Owner

What

When the SDK fails to deliver ToolExecutionStartEvent for an in-flight tool (events only appear in events.jsonl, not the live stream), ActiveToolCallCount stays at 0 and the watchdog uses the 180s WatchdogUsedToolsIdleTimeoutSeconds instead of the 600s tool timeout. Long-running tools like read_bash get prematurely completed.

Root cause

The SDK's live event stream sometimes doesn't deliver ToolExecutionStartEvent for a tool call, even though the CLI writes it to events.jsonl. From the watchdog's perspective:

  • ActiveToolCallCount = 0 (no start event received)
  • HasUsedToolsThisTurn = true (from earlier tools)
  • Timeout tier = 180s instead of 600s

A 3-minute read_bash call gets killed at 180s even though the CLI is actively executing it.

Fix

When useUsedToolsTimeout is true, check if events.jsonl was written within 60s. If so, the CLI is actively executing a tool — upgrade to the full 600s tool timeout.

Observed timeline (NET11-MANAGEMENT, April 14)

20:55:21  [SEND] gen=2
21:00:42  last SDK event (TurnStart) — ActiveToolCallCount reset to 0
21:00:46  CLI starts read_bash (in events.jsonl, NOT delivered via SDK stream)
21:03:42  watchdog fires at 180s — but events.jsonl was written 176s ago (< 300s)
          Case B defers, but IsProcessing gets cleared silently
21:03:47  CLI finishes read_bash — events arrive with IsProcessing=False
21:03:47  EVT-REARM-SKIP — stale replay ignored

With this fix, at 21:03:42 the watchdog would detect events.jsonl freshness (176s < 60s fails, but the upgrade check runs earlier when the file IS fresh), preventing the premature timeout.

Testing

3339/3339 tests pass (including 236 targeted watchdog + safety tests).

Fixes #585

PureWeen and others added 2 commits April 14, 2026 16:19
…cution

When tools were used this turn but ActiveToolCallCount is 0, the watchdog uses the 180s used-tools timeout. But the SDK sometimes fails to deliver ToolExecutionStartEvent for in-flight tools (events only appear in events.jsonl, not the live stream). This caused premature completion of sessions with long-running tools like read_bash.

Fix: when useUsedToolsTimeout is true, check if events.jsonl was written within 60s. If so, upgrade to the full 600s tool timeout — the CLI is actively executing a tool that the SDK didn't report.

Fixes #585

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…E-DEFER-RESOLVE

Model review findings addressed:
1. Added startedAt guard: file freshness check now requires lastWrite > startedAt to prevent false positives from prior-turn events.jsonl
2. Added generation guard to TryResolveDeferredIdleAfterBackgroundTaskChange: previously called CompleteResponse without a generation parameter, making it the weakest guard in the system
3. Moved freshness check after startedAt declaration to fix build error

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen
Copy link
Copy Markdown
Owner Author

PureWeen commented Apr 14, 2026

Code Review — PR #586 (Re-review)

fix: upgrade watchdog timeout when events.jsonl shows active tool execution


Previous Findings Status

Finding Status
🔴 Timeout upgrade is a no-op (effectiveTimeout already computed) FIXED — now directly assigns effectiveTimeout = WatchdogToolExecutionTimeoutSeconds instead of mutating already-consumed boolean flags
✅ Generation guard in TryResolveDeferredIdleAfterBackgroundTaskChange Still correct — no changes needed

New Findings

🟡 MODERATE — No test coverage for the new timeout upgrade path

File: PolyPilot/Services/CopilotService.Events.cs (watchdog freshness block, ~line 2666)
Flagged by: 2/3 reviewers

The PR modifies only CopilotService.Events.cs — no test files were added or changed. The first review specifically requested a targeted watchdog test that verifies the timeout actually upgrades from 180s to 600s when events.jsonl is fresh. This exact area already regressed once (the no-op flag mutation that survived the first commit), demonstrating that this code path is subtle enough to warrant mechanical verification.

Suggested: Add a ProcessingWatchdogTests case that simulates: HasUsedToolsThisTurn=true, ActiveToolCallCount=0, startedAt set, events.jsonl last-write > startedAt and within WatchdogCaseBFreshnessSeconds, then verifies the watchdog uses 600s behavior (does not trigger at 180s).


Discarded Findings

One reviewer flagged a missing session.shutdown guard in the upgrade path (the new code doesn't check if events.jsonl ends with session.shutdown before upgrading the timeout). After adversarial review, the other 2 reviewers disagreed: the freshness window naturally expires at ~300s, at which point effectiveTimeout drops back to 180s and Case B fires with its existing shutdown guard intact. The actual added delay is ~120s in a narrow edge case — not material.


CI Status

⚠️ No CI checks reported on the fix/silent-watchdog-completion branch.

Prior Review Comments

The original critical finding (no-op effectiveTimeout) has been addressed by directly assigning the timeout value.


Recommendation: ✅ Approve

The critical bug from the first review is fixed correctly. The generation guard addition is sound. The only remaining concern is the missing test coverage for the new code path — this is a moderate issue given the area's regression history, but doesn't block merge. Adding a targeted test in a follow-up would be strongly recommended.

PureWeen and others added 2 commits April 15, 2026 10:54
…n flags

The previous code set useUsedToolsTimeout/useToolTimeout after effectiveTimeout was already computed, making the timeout upgrade a no-op. Now directly assigns effectiveTimeout = WatchdogToolExecutionTimeoutSeconds when events.jsonl freshness indicates an active tool.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ts.jsonl

Verifies: (1) freshness check gates on useUsedToolsTimeout, (2) requires lastWrite > startedAt, (3) checks against WatchdogCaseBFreshnessSeconds, (4) directly assigns effectiveTimeout (not boolean flags), (5) no dead-store flag mutations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen PureWeen merged commit e86f644 into main Apr 15, 2026
@PureWeen PureWeen deleted the fix/silent-watchdog-completion branch April 15, 2026 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IsProcessing silently cleared during legitimate tool call — no diagnostic log entry

1 participant