Skip to content

fix(codex): add panic recovery in readLoop to prevent process crash#332

Open
LigphiDonk wants to merge 1 commit intochenhg5:mainfrom
LigphiDonk:fix/codex-readloop-panic-recovery
Open

fix(codex): add panic recovery in readLoop to prevent process crash#332
LigphiDonk wants to merge 1 commit intochenhg5:mainfrom
LigphiDonk:fix/codex-readloop-panic-recovery

Conversation

@LigphiDonk
Copy link
Copy Markdown

When handleItemCompleted encounters an unexpected event structure from the Codex CLI (e.g., a new item type or malformed JSON fields), it can panic. Since readLoop runs in a goroutine with no recover(), this crashes the entire cc-connect process.

Add a deferred recover() at the top of readLoop that:

  • Catches any panic from handleEvent and its callees
  • Logs the panic value and full stack trace via slog.Error
  • Emits an EventError to the events channel so the caller is notified
  • Allows the process to continue serving other projects/platforms

Also add TestHandleEvent_RecoverFromPanic to verify that malformed item.completed events (nil item, unknown types, unexpected nested structures) do not cause panics.

When handleItemCompleted encounters an unexpected event structure from
the Codex CLI (e.g., a new item type or malformed JSON fields), it can
panic. Since readLoop runs in a goroutine with no recover(), this
crashes the entire cc-connect process.

Add a deferred recover() at the top of readLoop that:
- Catches any panic from handleEvent and its callees
- Logs the panic value and full stack trace via slog.Error
- Emits an EventError to the events channel so the caller is notified
- Allows the process to continue serving other projects/platforms

Also add TestHandleEvent_RecoverFromPanic to verify that malformed
item.completed events (nil item, unknown types, unexpected nested
structures) do not cause panics.
@chenhg5
Copy link
Copy Markdown
Owner

chenhg5 commented Mar 27, 2026

Thanks for this — panic recovery in readLoop is a solid hardening change (log + stack + EventError so the engine can fail gracefully instead of losing the goroutine).

Merge status: GitHub reports the PR as conflicting with main (mergeable: CONFLICTING). Could you rebase onto latest main or merge main into your branch and resolve the conflicts? Once it is clean, this should be straightforward to merge.

Quick notes (non-blocking):

  • Consider registering the recover defer after the cmd.Wait defer so a panic is handled before blocking on Wait (optional ordering tweak).
  • TestHandleEvent_RecoverFromPanic mainly asserts handleEvent does not panic on malformed payloads; the new recovery path lives in readLoop — renaming or adding a test that exercises the readLoop path would match the commit message a bit better.

Appreciate the contribution.

Copy link
Copy Markdown
Owner

@chenhg5 chenhg5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Good defensive programming with panic recovery.

Review notes:

  • ✅ CI passes
  • ✅ Correct approach: recover() in readLoop prevents process crash
  • ✅ Error properly propagated via Event channel
  • ✅ Test coverage for malformed events
  • ✅ Stack trace logged for debugging

This prevents cascading failures when codex CLI produces unexpected output. Approved for merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants