fix(bundles): transition-aware logging for broken connectors (#194) #321
Open
Ovaculos wants to merge 2 commits into
Replace per-enumeration warn in ToolRegistry.availableTools with
edge-triggered warn in BundleLifecycleManager.transition. Operator
now gets one warn at the moment a connector enters dead, crashed,
or reauth_required, plus one info on recovery to running.
Sticky lastBrokenState breadcrumb on BundleInstance carries the
broken signal across multi-step recoveries — URL bundles go
reauth_required -> pending_auth -> running, and neither leg is a
direct broken->running edge.
recordConnectionStateChange now funnels through transition() so
URL bundles get the same edge logging path as stdio.
Registry skip path demoted to log.debug("registry", ...) — gated
behind NB_DEBUG=registry for trace-level diagnosis. Refs NimbleBrainInc#194.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
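The commit above demotes the registry skip path to `log.debug("registry", ...)`, which only prints under `NB_DEBUG=registry`. Below is a minimal sketch of that kind of namespace gating. It assumes `NB_DEBUG` holds a comma-separated list of namespaces. `enabledNamespaces` and `makeDebug` are hypothetical helpers, not the project's `src/cli/log.ts` API, which may parse the variable differently.

```typescript
// Hypothetical sketch of namespace-gated debug logging.
// Assumption: NB_DEBUG is a comma-separated namespace list, e.g. "registry,lifecycle".
// The real src/cli/log.ts may behave differently.
function enabledNamespaces(raw: string | undefined): Set<string> {
  return new Set(
    (raw ?? "")
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0),
  );
}

type DebugFn = (namespace: string, message: string) => void;

function makeDebug(raw: string | undefined, sink: (line: string) => void): DebugFn {
  const enabled = enabledNamespaces(raw);
  return (namespace, message) => {
    // Drop the message unless its namespace was opted into via NB_DEBUG.
    if (!enabled.has(namespace)) return;
    sink(`[${namespace}] ${message}`);
  };
}

// In real code `raw` would come from process.env.NB_DEBUG; a literal keeps this self-contained.
const lines: string[] = [];
const debug = makeDebug("registry", (line) => lines.push(line));
debug("registry", "skipping broken source 'example'");
debug("lifecycle", "dropped: namespace not enabled");
// `lines` now holds only the registry message.
```

Because the check happens once per call against a prebuilt set, a disabled namespace costs one set lookup. That makes it cheap to leave trace-level calls on hot paths like tool enumeration.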
Wire HealthMonitor's crash/recovery/dead detection into BundleLifecycleManager so stdio subprocess deaths transition BundleInstance.state through the same funnel as URL bundles. Without this, the user-facing state stayed stuck at "running" when a subprocess died mid-session; the Configure UI showed green and operators only saw the plain stderr line from workspace-runtime's boot-start catch.

HealthMonitor exposes a single reportSourceTransition hook (crashed|running|dead). src/api/server.ts wires it to recordCrash/Recovery/Dead with a (serverName, wsId) lookup against runtime.getBundleInstances(). McpSource gains getWorkspaceId(), returning bundleContext.workspaceId (already populated at construction).

Runtime.mcpSources() drops the name-based dedupe: the same bundle in two workspaces produces two distinct McpSource processes, both of which need monitoring. Collapsing them would leave one workspace's state pill permanently stale on a crash.

HealthMonitor's local BundleState is renamed HealthRecordState to stop shadowing BundleState from src/bundles/types.ts.

Closes NimbleBrainInc#194.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
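The (serverName, wsId) lookup in this commit is what lets one crashed subprocess update exactly one workspace's state. Below is a minimal sketch of that wiring. It assumes `recordCrash/Recovery/Dead` means three methods named `recordCrash`, `recordRecovery` and `recordDead` that each take the resolved instance, and that a source's `name` is its serverName. Those signatures, and the `*Like` interfaces, are hypothetical stand-ins for the real types.

```typescript
// Hypothetical sketch: the interfaces and method signatures are assumptions.
type SourceTransition = "crashed" | "running" | "dead";

interface McpSourceLike {
  name: string; // assumed to be the serverName
  getWorkspaceId(): string;
}

interface BundleInstanceLike {
  serverName: string;
  wsId: string;
  state: string;
}

interface LifecycleLike {
  recordCrash(inst: BundleInstanceLike): void;
  recordRecovery(inst: BundleInstanceLike): void;
  recordDead(inst: BundleInstanceLike): void;
}

function makeReportSourceTransition(
  getBundleInstances: () => BundleInstanceLike[],
  lifecycle: LifecycleLike,
) {
  return (source: McpSourceLike, next: SourceTransition): void => {
    const wsId = source.getWorkspaceId();
    // Match on BOTH serverName and workspace: the same bundle in two workspaces
    // runs as two McpSource processes, and a name-only match could hit the wrong one.
    const inst = getBundleInstances().find(
      (b) => b.serverName === source.name && b.wsId === wsId,
    );
    if (!inst) return; // source not backed by a bundle instance
    if (next === "crashed") lifecycle.recordCrash(inst);
    else if (next === "running") lifecycle.recordRecovery(inst);
    else lifecycle.recordDead(inst);
  };
}

// Usage: same bundle installed in two workspaces; only ws-b's process dies.
const instances: BundleInstanceLike[] = [
  { serverName: "notion", wsId: "ws-a", state: "running" },
  { serverName: "notion", wsId: "ws-b", state: "running" },
];
// Stub lifecycle that writes state directly; the real one funnels through transition().
const lifecycle: LifecycleLike = {
  recordCrash: (i) => { i.state = "crashed"; },
  recordRecovery: (i) => { i.state = "running"; },
  recordDead: (i) => { i.state = "dead"; },
};
const report = makeReportSourceTransition(() => instances, lifecycle);
report({ name: "notion", getWorkspaceId: () => "ws-b" }, "dead");
// instances[0] (ws-a) stays "running"; instances[1] (ws-b) becomes "dead".
```

Keeping the lookup in the server wiring, rather than inside HealthMonitor, means the monitor only needs to know about sources and never about bundle instances.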
Summary
- Replace per-enumeration warns in `ToolRegistry.availableTools` with edge-triggered warns at `BundleLifecycleManager.transition`. One warn at the moment a connector enters `dead`/`crashed`/`reauth_required`; one info at recovery to `running`.
- Sticky `lastBrokenState` breadcrumb so multi-step recoveries (notably URL bundle `reauth_required → pending_auth → running`) still fire the recovery info log.
- Wire `HealthMonitor` → lifecycle so stdio subprocess deaths propagate to `BundleInstance.state`, closing the gap where URL bundles transitioned correctly but stdio bundles stayed stuck at `running`.
- `src/api/server.ts` wires the hook to `recordCrash`/`Recovery`/`Dead` with a `(serverName, wsId)` lookup.
- `McpSource.getWorkspaceId()` sources from the existing `bundleContext.workspaceId` (no constructor surface change).

Closes #194.
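The edge-triggered policy and sticky breadcrumb summarized above can be sketched as a single `transition()` function. This is a minimal sketch under stated assumptions. The state names and the broken set come from this PR. The `BundleInstance` shape (including `name`), the `Logger` interface and the exact message text are hypothetical approximations of the log lines quoted in the test plan. The real `BundleLifecycleManager.transition` presumably does more, such as emitting events.

```typescript
// Hypothetical sketch: shapes, Logger and message wording are assumptions.
type BundleState =
  | "starting"
  | "running"
  | "crashed"
  | "dead"
  | "pending_auth"
  | "not_authenticated"
  | "reauth_required"
  | "stopped";

type BrokenState = "dead" | "crashed" | "reauth_required";

const BROKEN: ReadonlySet<BundleState> = new Set<BundleState>(["dead", "crashed", "reauth_required"]);

function isBroken(s: BundleState): s is BrokenState {
  return BROKEN.has(s);
}

interface BundleInstance {
  name: string;
  state: BundleState;
  lastBrokenState?: BrokenState; // sticky breadcrumb across non-broken intermediates
}

interface Logger {
  warn(msg: string): void;
  info(msg: string): void;
}

function transition(inst: BundleInstance, next: BundleState, log: Logger): void {
  const prev = inst.state;
  if (prev === next) return; // no-op: no write, no log
  inst.state = next;

  if (isBroken(next)) {
    // Edge-triggered: warn only when crossing INTO the broken set, so broken→broken stays quiet.
    if (!isBroken(prev)) log.warn(`${inst.name} entered '${next}' from '${prev}'`);
    inst.lastBrokenState = next;
    return;
  }

  if (next === "running" && inst.lastBrokenState !== undefined) {
    // Fires even when the last edge was pending_auth → running, not broken → running.
    log.info(`${inst.name} recovered: '${inst.lastBrokenState}' → 'running'`);
    inst.lastBrokenState = undefined;
  } else if (next === "stopped") {
    // Operator stop ends the episode without a recovery log.
    inst.lastBrokenState = undefined;
  }
}

const warns: string[] = [];
const infos: string[] = [];
const log: Logger = {
  warn: (m) => { warns.push(m); },
  info: (m) => { infos.push(m); },
};

// URL-bundle style recovery: the broken label survives the pending_auth leg.
const notion: BundleInstance = { name: "notion", state: "running" };
transition(notion, "reauth_required", log); // warn: entered 'reauth_required' from 'running'
transition(notion, "reauth_required", log); // no-op, no second warn
transition(notion, "pending_auth", log);    // non-broken intermediate, breadcrumb kept
transition(notion, "running", log);         // info: recovered: 'reauth_required' → 'running'
```

Here the breadcrumb is overwritten on each broken→broken step (for example `crashed → dead`), so the recovery log names the most recent broken state. That is a choice made for this sketch, not something the PR specifies.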
Behavior change
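| Scenario | Before | After |
| --- | --- | --- |
| Broken connector, repeated tool enumeration | One warn per `ToolRegistry.availableTools` enumeration | One warn when the connector enters `dead`/`crashed`/`reauth_required` |
| stdio subprocess dies mid-session | `BundleInstance.state` stale at `running` | `crashed`/`dead`, UI state pill accurate |
| Multi-step recovery (`reauth_required → pending_auth → running`) | No recovery log | One info on reaching `running` |
| Registry skip path | warn | `log.debug("registry", ...)`, visible only with `NB_DEBUG=registry` |

Design notes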
- The broken set is `{ dead, crashed, reauth_required }`. It excludes `pending_auth` (in-flight OAuth, expected during a normal Connect) and `not_authenticated` (resting state on fresh install or after Disconnect).
- `transition()` is the single state-write site. `recordConnectionStateChange` now routes through it, so URL and stdio bundles share one logging policy.
- `BundleInstance.lastBrokenState` carries the broken label across non-broken intermediates. It is cleared on reaching `running` (after emitting the info log) and on `stopped` (explicit operator end of the episode).
- HealthMonitor exposes a single `reportSourceTransition(source, "crashed"|"running"|"dead")` callback. The (source → BundleInstance) lookup lives in `startServer`, not in HealthMonitor.

Test plan
- `test/unit/bundle-transition-logging.test.ts`: 25 tests covering edge semantics, sticky-bit multi-step recovery, broken→broken no-rewarn, operator-stop episode termination, and idempotency on no-op.
- `test/unit/registry-tool-enumeration-resilience.test.ts`: extended with 0 warns across 100 enumerations of a broken source (proves the registry no longer spams).
- `test/integration/health-monitor-lifecycle.test.ts`: 2 tests, covering the full chain `HealthMonitor.check → reportSourceTransition → recordCrash → transition → warn` and same-serverName cross-workspace disambiguation.
- `bun run verify` green: 3162 unit + 644 integration + 24 smoke.
- … `entered 'dead' from 'starting'` warn; subsequent chats produce no further warns.
- … `tokens.json` → trigger Notion tool call → expect a single `entered 'reauth_required'` warn; reconnect via UI → expect a single `recovered: 'reauth_required' → 'running'` info.

Files changed
```
AGENTS.md
src/api/server.ts
src/bundles/connection.ts
src/bundles/lifecycle.ts
src/bundles/types.ts
src/cli/log.ts
src/runtime/runtime.ts
src/tools/health-monitor.ts
src/tools/mcp-source.ts
src/tools/registry.ts
test/integration/health-monitor-lifecycle.test.ts (new)
test/unit/bundle-transition-logging.test.ts (new)
test/unit/registry-tool-enumeration-resilience.test.ts
```