
feat(ext): add /api/metrics fallback and single-flight startup guard#67

Merged
Killea merged 1 commit into Killea:main from bertheto:feat/UP-37-startup-probe-metrics-fallback on Mar 23, 2026

@bertheto
Contributor

Summary

  • /api/metrics fallback probe: When /health is unavailable during startup (slow backend, Uvicorn reloader delay, or legacy Python server), the extension now falls back to /api/metrics before declaring the server down. This eliminates the false-negative startup failures observed in practice (follow-up to #61, "fix(server): UP-33 - Default AGENTCHATBUS_RELOAD to 0 and pass it explicitly from extension").
  • Single-flight startup guard: ensureServerRunning() is now wrapped in a createSingleFlightRunner() to coalesce concurrent calls from VS Code activation and MCP provider resolution, preventing duplicate startup sequences.
  • Structured probe diagnostics: Probe failures now produce typed, descriptive messages (timeout duration, HTTP status, network error) logged during both initial probe and the wait-for-ready loop, improving startup troubleshooting.
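
The fallback rule in the first bullet can be sketched as a pure function that takes the outcome of both probes and decides whether the server counts as running. This is only an illustration: the type and function names below (ProbeOutcome, ProbeResolution, resolveProbeOutcomes) are hypothetical and are not the PR's actual StartupProbeOutcome, StartupProbeResolution, or resolveStartupProbeResult definitions, which this description does not show.

```typescript
// Hypothetical shapes for illustration only; the PR's real types are not shown here.
type ProbeEndpoint = "/health" | "/api/metrics";

type ProbeOutcome =
  | { kind: "ok"; endpoint: ProbeEndpoint }
  | { kind: "failed"; endpoint: ProbeEndpoint; reason: string };

type ProbeResolution =
  | { state: "running"; via: ProbeEndpoint }
  | { state: "down"; failures: string[] };

// Prefer /health; accept /api/metrics only when /health failed.
// The server is reported down only when both probes failed.
function resolveProbeOutcomes(
  health: ProbeOutcome,
  metrics: ProbeOutcome,
): ProbeResolution {
  if (health.kind === "ok") {
    return { state: "running", via: "/health" };
  }
  if (metrics.kind === "ok") {
    return { state: "running", via: "/api/metrics" };
  }
  return {
    state: "down",
    failures: [
      `${health.endpoint}: ${health.reason}`,
      `${metrics.endpoint}: ${metrics.reason}`,
    ],
  };
}
```

Keeping the decision separate from the network calls is one way to make the /health preference and the /api/metrics fallback testable without a running server.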

Changes

| File | What |
| --- | --- |
| src/logic/busServerManager.ts | New types (MetricsPayload, StartupProbeEndpoint, StartupProbeOutcome, StartupProbeResolution) + helpers (describeStartupProbeFailure, resolveStartupProbeResult, createSingleFlightRunner) |
| src/busServerManager.ts | Generic probeJsonEndpoint, dual-endpoint probeServer(), single-flight wrapper, probe timeout raised from 1s to 2s, failure logging |
| src/logic/testExports.ts | 3 new re-exports |
| test/logicBusServerManager.test.js | 4 new tests (probe failure description, probe resolution with /health preference, /api/metrics fallback, single-flight coalescing) |

What is NOT changed

  • All workspace-dev-service logic remains intact and unmodified
  • No changes to workspaceDev.ts or workspace-dev launch specs
  • Existing tests continue to pass unchanged

Test plan

  • npm run compile passes (TypeScript clean)
  • node --test test/logicBusServerManager.test.js: 12/12 pass (7 existing + 4 new + 1 async)
  • node --test test/logicWorkspaceDev.test.js: 2/2 pass (no regression)
  • Manual verification: install the built .vsix and confirm startup succeeds when /health is slow but /api/metrics responds

The extension's probeServer() only checked /health with a 1s timeout.
When /health was slow or unavailable during startup, the extension
reported the server as down even though /api/metrics was responding.

Additionally, ensureServerRunning() had no concurrency guard, so
VS Code activation and MCP provider resolution could race and
trigger duplicate startup sequences.
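
A minimal sketch of the single-flight idea, assuming the guarded task is an async function with no arguments: while one call is in progress, later callers receive the same promise instead of starting a second run. The name createSingleFlight is hypothetical, and this is not the PR's createSingleFlightRunner() implementation.

```typescript
// Hypothetical helper for illustration; not the PR's createSingleFlightRunner().
function createSingleFlight<T>(task: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | undefined;

  return () => {
    // A run is already in progress: share its promise with this caller.
    if (inFlight !== undefined) {
      return inFlight;
    }

    const current = task();
    inFlight = current;

    // Clear the slot once the run settles (success or failure),
    // so the next call after completion starts a fresh run.
    const clear = () => {
      if (inFlight === current) {
        inFlight = undefined;
      }
    };
    current.then(clear, clear);

    return current;
  };
}
```

With a wrapper like this, two callers that invoke the wrapped function at the same time (for example activation and provider resolution) would both await one startup sequence, and a later call after it settles would start a new one.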

Changes:
- Replace untyped probeServer() with generic probeJsonEndpoint<T>()
  supporting typed StartupProbeOutcome/StartupProbeResolution
- probeServer() now tries /health first, falls back to /api/metrics
- Add describeStartupProbeFailure() for structured probe diagnostics
- Add resolveStartupProbeResult() for multi-endpoint resolution
- Add createSingleFlightRunner() to coalesce concurrent calls
- Increase probe timeout from 1000ms to 2000ms
- Log probe failure details in startup and wait-for-ready loops
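
To illustrate the structured diagnostics item, here is a rough sketch of turning a typed probe failure into a log message covering the three cases the summary mentions (timeout duration, HTTP status, network error). The ProbeFailure type, the describeProbeFailure name, and the message wording are all assumptions; the PR's describeStartupProbeFailure() may differ.

```typescript
// Hypothetical failure shape; the PR's actual types are not shown in this description.
type ProbeFailure =
  | { kind: "timeout"; timeoutMs: number }
  | { kind: "http"; status: number }
  | { kind: "network"; message: string };

// Build a human-readable message for one failed probe of one endpoint.
function describeProbeFailure(endpoint: string, failure: ProbeFailure): string {
  switch (failure.kind) {
    case "timeout":
      return `${endpoint} timed out after ${failure.timeoutMs}ms`;
    case "http":
      return `${endpoint} returned HTTP ${failure.status}`;
    case "network":
      return `${endpoint} network error: ${failure.message}`;
  }
}
```

A caller in the wait-for-ready loop could log describeProbeFailure("/health", { kind: "timeout", timeoutMs: 2000 }), which with this sketch yields "/health timed out after 2000ms".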

All workspace-dev logic remains intact and unmodified.
@Killea merged commit df13c97 into Killea:main on Mar 23, 2026
1 check passed