Fix external MCP bridge timeouts with keepalive heartbeats #399

Open
justrach wants to merge 1 commit into main from fix/396-397-mcp-bridge-timeout-keepalive
@justrach commented Apr 4, 2026

Summary

  • External MCP bridges (Claude Desktop, etc.) impose their own timeout (~60s) on tool calls. Long-running agent tools (run_explorer, run_task, run_reviewer, run_swarm) write nothing to stdout while they execute, so the bridge kills the connection with MCP error -32001 even though the agent is still working.
  • Adds periodic notifications/message heartbeats every 15s during agent execution. This change imposes no timeout of its own; agents run as long as they need.
  • Creates src/notify.zig (a thread-safe MCP notification sender) and modifies runChainStep to run dispatch on a worker thread while a heartbeat loop runs alongside it.
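
For reviewers who want the control flow at a glance, here is a minimal sketch of the worker-plus-heartbeat pattern described above. It is written in Python for brevity and is not the Zig code in this PR; `run_with_heartbeat`, `work`, and `send_heartbeat` are hypothetical names, and `threading.Event.wait` stands in for the `timedWait` call.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_heartbeat(
    work: Callable[[], T],
    send_heartbeat: Callable[[float], None],
    interval_s: float = 15.0,  # the PR uses 15s; a parameter here so it can be tested quickly
) -> T:
    """Run `work` on a worker thread and call `send_heartbeat(elapsed)` from the
    calling thread every `interval_s` seconds until the worker finishes.
    No deadline is imposed on `work` (hypothetical helper, not the PR's API)."""
    done = threading.Event()
    outcome: dict = {}

    def worker() -> None:
        try:
            outcome["value"] = work()
        except BaseException as exc:  # hand the failure back to the caller
            outcome["error"] = exc
        finally:
            done.set()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()

    elapsed = 0.0
    # Event.wait returns False on timeout and True once the worker has signalled.
    while not done.wait(timeout=interval_s):
        elapsed += interval_s
        send_heartbeat(elapsed)

    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

The calling thread only wakes up to emit a heartbeat or to collect the result, so the loop never cancels the agent; it just keeps the bridge's idle timer from expiring.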

Closes #396, closes #397

Changes

  • src/notify.zig (new): Thread-safe notifications/message sender with mutex-protected stdout writes. Supports both line-delimited and Content-Length framing.
  • src/tools.zig: runChainStep now spawns agent dispatch on a worker thread and loops on timedWait(15s) sending heartbeat notifications until the agent completes.
  • src/main.zig: Wire notify.init(g_use_headers) in the message loop so the notifier picks up the framing mode.
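
As a rough illustration of what a mutex-protected sender with both framing modes might look like, here is a small Python sketch. It is not the contents of src/notify.zig: the `Notifier` class, its method names, and the `level`/`data` params are assumptions based on the MCP logging notification shape, and `use_headers` mirrors the `g_use_headers` flag only by analogy.

```python
import json
import threading
from typing import BinaryIO


class Notifier:
    """Hypothetical thread-safe sender for JSON-RPC notifications."""

    def __init__(self, out: BinaryIO, use_headers: bool) -> None:
        self._out = out
        self._use_headers = use_headers  # True: Content-Length framing; False: one JSON object per line
        self._lock = threading.Lock()

    def frame(self, payload: dict) -> bytes:
        # json.dumps escapes newlines inside strings, so the body is always a single line.
        body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
        if self._use_headers:
            return b"Content-Length: %d\r\n\r\n" % len(body) + body
        return body + b"\n"

    def send_message(self, text: str, level: str = "info") -> None:
        # A notification carries no "id", so the client does not reply to it.
        payload = {
            "jsonrpc": "2.0",
            "method": "notifications/message",
            "params": {"level": level, "data": text},
        }
        data = self.frame(payload)
        # Hold the lock for the whole write so frames from different threads never interleave.
        with self._lock:
            self._out.write(data)
            self._out.flush()
```

In a real server the `out` stream would be the process's binary stdout (for example `sys.stdout.buffer`), and the same lock would have to guard every other writer to that stream, including normal tool responses.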

Test plan

  • zig build — clean
  • zig build test — all pass (including the new test "notify: init sets ready flag")
  • Manual: invoke run_explorer via external MCP bridge, verify heartbeat messages appear every 15s and no -32001 timeout
  • Manual: invoke run_task with a long-running preset, confirm it completes without bridge timeout

…397)

External MCP bridges impose their own timeout (~60s). Long-running agent
tools (run_explorer, run_task, run_reviewer, run_swarm) produce no output
during execution, causing the bridge to kill the connection with
MCP error -32001 even though the agent is still working.

Fix: run agent dispatch on a worker thread and send periodic
notifications/message heartbeats every 15s from the calling thread.
No timeouts are imposed — agents run as long as they need.

- Add src/notify.zig: thread-safe MCP notification sender
- Modify runChainStep to spawn heartbeat alongside agent dispatch
- Wire notify.init() in main loop to pick up framing mode

Closes #396, closes #397

Generated with AI

Co-Authored-By: AI <ai@example.com>