fix: 100% CPU usage on broken stdio #191

Open
latekvo wants to merge 4 commits into main from fix/mcp-uncaught-exception-loop

latekvo (Member) commented on May 6, 2026

One of our users reported a stale MCP server process that occasionally sits at 100% CPU usage.

I wasn't able to reproduce this bug directly in Codex. We only have non-Codex repros that consistently reproduce a bug that looks like the one reported to us.

The bug in question: our error handler itself causes errors, which throws it into an infinite loop.

More details:

When `argent mcp` runs as an orphaned process whose stderr pipe is broken (e.g. the parent died), every `stderr.write` from the `uncaughtException` handler emits an async `'error'` event. Without an `'error'` listener, that event becomes another `uncaughtException`, which runs the handler again, and so on forever.
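The core of this can be illustrated with a plain `EventEmitter` standing in for the broken stream. This is only a sketch of the Node.js rule involved, not code from the PR: `brokenStream` and `epipe` are made-up names. On a real stream the `'error'` event is emitted asynchronously, so the throw seen here would surface as an `uncaughtException` instead of being catchable at the call site.

```typescript
import { EventEmitter } from "node:events";

// Stand-in for a stream such as process.stderr whose pipe is broken
// (hypothetical; a real stream would emit this asynchronously).
const brokenStream = new EventEmitter();
const epipe = Object.assign(new Error("write EPIPE"), { code: "EPIPE" });

// With no 'error' listener attached, emit() throws the error itself.
let thrown: unknown;
try {
  brokenStream.emit("error", epipe);
} catch (err) {
  thrown = err;
}

// With a listener attached, the same event is delivered to the listener
// and nothing is thrown.
let handled: unknown;
brokenStream.on("error", (err) => {
  handled = err;
});
brokenStream.emit("error", epipe);
```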

Move the handlers into `fatal-handlers.ts` and add three guards:

- `'error'` listeners on stdout/stderr exit cleanly when stdio is broken, before the failure becomes another `uncaughtException`
- try/catch around `stderr.write`, so a sync write failure exits instead of escaping
- try/catch around the formatter, so a throwing `.stack` getter or `toString` (the production trace pointed at `defaultPrepareStackTrace`) can't take down the handler

Idempotent: a second `installFatalHandlers` call is a no-op.
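For readers without the diff open, the three guards could look roughly like the sketch below. It is not the actual contents of `fatal-handlers.ts`: only the name `installFatalHandlers` comes from this PR, while `safeFormat`, `fatal`, the boolean return value, the fallback message and the exit code `1` are all assumptions made for illustration.

```typescript
// Module-level flag so a second install is a no-op (assumed mechanism).
let installed = false;

// Guard 3: formatting must never throw, even if .stack or toString does.
// `safeFormat` is a hypothetical name.
function safeFormat(err: unknown): string {
  try {
    if (err instanceof Error) {
      return err.stack ?? `${err.name}: ${err.message}`;
    }
    return String(err);
  } catch {
    return "[unformattable error]";
  }
}

// Hypothetical helper: report the error if possible, then always exit.
function fatal(label: string, err: unknown): void {
  const message = `${label}: ${safeFormat(err)}\n`;
  try {
    // Guard 2: a synchronous write failure must not escape the handler.
    process.stderr.write(message);
  } catch {
    // stderr is unusable; fall through to exit.
  }
  process.exit(1); // exit code is an assumption
}

// Returns true if handlers were installed, false if already installed.
function installFatalHandlers(): boolean {
  if (installed) return false;
  installed = true;

  // Guard 1: broken stdio exits here instead of raising uncaughtException.
  const onStdioError = (): void => {
    process.exit(1); // exit code is an assumption
  };
  process.stdout.on("error", onStdioError);
  process.stderr.on("error", onStdioError);

  process.on("uncaughtException", (err) => fatal("uncaughtException", err));
  return true;
}
```

With this shape, calling `installFatalHandlers()` twice registers one set of listeners, and every path through the handler ends in `process.exit` instead of re-entering itself.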


latekvo changed the title from "fix: stop argent mcp from looping at 100% CPU on broken stdio" to "fix: 100% CPU usage on broken stdio" on May 6, 2026

`fatal-handlers.test.ts` imports the built dist/fatal-handlers.js so it
exercises the same artifact that ships, but `npm test` would fail with
ERR_MODULE_NOT_FOUND if the dispatcher hadn't been built first. CI is
fine — it runs `tsc --build` at the workspace level — but local devs
running `npm test` cold hit a confusing error.

Add `pretest`/`pretest:watch` hooks that run `tsc`, which is incremental
and non-destructive (unlike `build:dispatcher`, whose `rm -rf dist` would
nuke any bundle outputs the dev had built).
@latekvo latekvo marked this pull request as ready for review May 6, 2026 19:46
@latekvo latekvo marked this pull request as draft May 6, 2026 19:46
latekvo added 2 commits May 7, 2026 14:12
Drop the pretest/pretest:watch hooks added in the previous commit.
Instead of importing from dist/fatal-handlers.js (which required a
prior build), the test now uses esbuild — already a devDependency — to
transform src/fatal-handlers.ts into a tmp .mjs once in beforeAll, and
spawned children import from there.

`npm test` now works cold without touching dist/: the test no longer
depends on a prior build, and no script-hook artifacts are left behind.
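
The transform step described in this commit might be sketched as follows. This is an illustration, not the test file from the PR: `compileToTempModule` and the `TransformSync` type are hypothetical, and the transform function is passed in as a parameter so the sketch does not itself depend on esbuild being installed. In the real test it would presumably be esbuild's `transformSync` (or its async `transform`).

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Structural type covering the subset of a sync TS-to-ESM transform
// that this sketch relies on (assumed shape).
type TransformSync = (
  source: string,
  options: { loader: "ts"; format: "esm" },
) => { code: string };

// Hypothetical helper: compile one TS source file into a temp .mjs
// and return the path that spawned children can import from.
function compileToTempModule(srcPath: string, transformSync: TransformSync): string {
  const source = readFileSync(srcPath, "utf8");
  const { code } = transformSync(source, { loader: "ts", format: "esm" });
  const dir = mkdtempSync(join(tmpdir(), "fatal-handlers-"));
  const outPath = join(dir, "fatal-handlers.mjs");
  writeFileSync(outPath, code, "utf8");
  return outPath;
}
```

Inside `beforeAll`, this would be called once, e.g. `compileToTempModule("src/fatal-handlers.ts", transformSync)`, and each spawned child would import the returned path. Nothing is read from or written to dist/.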
@latekvo latekvo marked this pull request as ready for review May 8, 2026 14:21
@latekvo latekvo requested a review from filip131311 May 8, 2026 14:21