fix(agent-core, kimi-code): handle stale closed sessions during resume#1110
qyz7438 wants to merge 3 commits into
When a session is closed while initialization is still running (for example, because an MCP server fails to start), the session object can remain in CoreRPCImpl.sessions even though the underlying directory exists on disk. Resuming that session later then fails with [session.not_found] because requireSession rejects the closed object.

Changes:
- Track Session.closed state and guard close() against double close.
- In resumeSessionWithOverrides, drop stale closed session objects so a fresh session can be reconstructed from persisted state.
- In requireSession, return a clear SESSION_NOT_FOUND error when the requested session has been closed.
- In the TUI, catch SESSION_NOT_FOUND when applying startup modes to a resumed session and show an actionable message instead of crashing.

Fixes the symptom reported in MoonshotAI/kimi-code where resuming a session after an MCP initialization failure produced [session.not_found].
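The first change (tracking closed state and guarding against a double close) can be illustrated with a minimal sketch. The class below is a hypothetical, simplified stand-in rather than the real Session from agent-core: the releaseCount field exists only for illustration, and close() is synchronous here although the real method may well be asynchronous.

```typescript
// Hypothetical, simplified stand-in for Session; not the code from this PR.
class Session {
  readonly id: string;
  private _closed = false;
  // Counts how often resources were actually released (illustration only).
  releaseCount = 0;

  constructor(id: string) {
    this.id = id;
  }

  // Exposes the closed state so callers can detect stale objects.
  get closed(): boolean {
    return this._closed;
  }

  close(): void {
    // Guard: a second close() is a no-op instead of a second release.
    if (this._closed) return;
    this._closed = true;
    this.releaseCount += 1;
  }
}

const session = new Session("demo");
session.close();
session.close();
console.log(session.closed, session.releaseCount); // true 1
```

Setting the flag before releasing resources means a re-entrant close() during teardown also hits the guard.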
🦋 Changeset detected
Latest commit: 5afd3dc
The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fb517a24f6
    `(for example, an MCP server failed to start). ` +
    `Try running the command again, or start a fresh session.`,
    );
    return;
Abort after the resumed session disappears
When setPermission, getStatus, or setPlanMode reports SESSION_NOT_FOUND, this handler only displays a message and then returns to its callers. The startup path still proceeds to setSession(session) and syncRuntimeState(session), so the same dead SDK session is installed and the next RPC fails again instead of cleanly aborting or starting over; the session-picker path likewise continues to hide the picker after a failed apply. This affects the exact stale/closed-session race this catch is trying to handle.
Address review feedback: when applyStartupModesToResumedSession detects SESSION_NOT_FOUND, it now returns false instead of swallowing the error.
- Startup path throws so init() fails instead of installing a dead session.
- Session-picker path leaves the picker open so the user can choose again.
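The control flow described in this follow-up commit might look roughly like the sketch below. Everything in it is an assumption made for illustration: the RpcError shape, the "session.not_found" code string, the function names, the first half of the user-facing message, and the synchronous signatures (the real TUI code is presumably async and is not shown in full here).

```typescript
// Hypothetical error code and error shape; the real SDK types are not shown in this PR.
const SESSION_NOT_FOUND = "session.not_found";

class RpcError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

function isSessionNotFound(err: unknown): boolean {
  return typeof err === "object" && err !== null &&
    (err as { code?: unknown }).code === SESSION_NOT_FOUND;
}

// Returns false when the resumed session is gone; rethrows any other error.
function applyStartupModes(apply: () => void, notify: (msg: string) => void): boolean {
  try {
    apply();
    return true;
  } catch (err) {
    if (!isSessionNotFound(err)) throw err;
    notify(
      "The resumed session is no longer available " + // hypothetical wording
        "(for example, an MCP server failed to start). " +
        "Try running the command again, or start a fresh session.",
    );
    return false;
  }
}

// Startup path: throw so initialization fails before a dead session is installed.
function initWithResumedSession(apply: () => void, notify: (msg: string) => void): void {
  if (!applyStartupModes(apply, notify)) {
    throw new Error("resumed session disappeared during startup");
  }
  // Only past this point would the session be installed and synced.
}

// Picker path: the return value says whether the picker may be hidden.
function onSessionPicked(apply: () => void, notify: (msg: string) => void): boolean {
  return applyStartupModes(apply, notify);
}
```

Returning a boolean rather than swallowing the error lets each caller decide what "abort" means: the startup path escalates to a thrown error, while the picker path simply stays open.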
Problem
When a session is closed while initialization is still running (for example, because an MCP server fails to start), the Session object can remain in CoreRPCImpl.sessions even though the underlying session directory still exists on disk. Resuming that session later then fails with [session.not_found] because requireSession rejects the closed object.

This matches the symptom reported by users who see [session.not_found] after an earlier initialization error (such as an MCP failure).

Changes
- packages/agent-core/src/session/index.ts: track Session.closed state and guard close() / closeForReload() against double close.
- packages/agent-core/src/rpc/core-impl.ts:
  - In resumeSessionWithOverrides, drop stale closed session objects so a fresh session can be reconstructed from persisted state.
  - In requireSession, return a clear SESSION_NOT_FOUND error when the requested session has been closed.
- apps/kimi-code/src/tui/kimi-tui.ts: catch SESSION_NOT_FOUND when applying startup modes to a resumed session and show an actionable message instead of crashing.

Verification
- pnpm typecheck passes for @moonshot-ai/agent-core and @moonshot-ai/kimi-code.
- packages/agent-core/test run; failures observed are unrelated to this change (skill scanner / provider mocks / environment-specific path issues).

Related
Fixes the [session.not_found] resume failure path described in user reports.
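As a rough model of the core-impl.ts changes, the sketch below shows a registry that drops a stale closed entry on resume and rejects closed sessions on lookup. It is an assumption-laden simplification, not the actual CoreRPCImpl: SessionRegistry, SessionLike, SessionNotFoundError, the loadFromDisk callback, and the choice to reuse a still-open cached session are all hypothetical.

```typescript
// Hypothetical, simplified model; names and bodies are assumptions, not the PR's code.
interface SessionLike {
  id: string;
  closed: boolean;
}

class SessionNotFoundError extends Error {
  readonly code = "session.not_found";
  constructor(sessionId: string, reason: string) {
    super(`[session.not_found] ${sessionId}: ${reason}`);
  }
}

class SessionRegistry {
  private readonly sessions = new Map<string, SessionLike>();
  // Stands in for reconstructing a session from its persisted directory.
  private readonly loadFromDisk: (id: string) => SessionLike;

  constructor(loadFromDisk: (id: string) => SessionLike) {
    this.loadFromDisk = loadFromDisk;
  }

  resumeSession(id: string): SessionLike {
    const existing = this.sessions.get(id);
    if (existing && !existing.closed) return existing;
    // Missing or stale closed entry: drop it and rebuild from persisted state.
    this.sessions.delete(id);
    const fresh = this.loadFromDisk(id);
    this.sessions.set(id, fresh);
    return fresh;
  }

  requireSession(id: string): SessionLike {
    const session = this.sessions.get(id);
    if (!session) throw new SessionNotFoundError(id, "no such session");
    if (session.closed) throw new SessionNotFoundError(id, "session has been closed");
    return session;
  }
}

const registry = new SessionRegistry((id) => ({ id, closed: false }));
const first = registry.resumeSession("s1");
first.closed = true; // simulate a close during a failed initialization
const second = registry.resumeSession("s1");
console.log(second !== first, second.closed); // true false
```

Keeping the closed check in requireSession as well as in the resume path means any RPC that races with a close still gets a specific error instead of operating on a dead object.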