Skip to content

fix: wait for daemon readiness after auto-start#263

Open
peterj wants to merge 2 commits intomainfrom
peterj/fixdaemonreadiness
Open

fix: wait for daemon readiness after auto-start#263
peterj wants to merge 2 commits intomainfrom
peterj/fixdaemonreadiness

Conversation

@peterj
Copy link
Contributor

@peterj peterj commented Mar 5, 2026

Description

Fixes #48arctl list -A (and other commands) failed with "failed to reach API after 3 attempts" when the daemon was auto-started because the API server wasn't fully ready when the client tried to connect.

Change Type

/kind fix

Changelog

wait for daemon readiness after auto-start

Root cause

After docker compose up -d --wait returns, the container is "healthy" per Docker's health check, but the API server inside may still be initializing. The previous retry logic (3 attempts, 1s/2s/3s linear backoff = 6 seconds total) was insufficient for slow networks or cold starts.

Changes

  • pkg/types/types.go: Added WaitForReady() to DaemonManager interface
  • pkg/daemon/daemon.go: Implemented WaitForReady() — polls /v0/ping with exponential backoff (500ms → 4s) for up to 30 seconds
  • pkg/cli/root.go: Calls dm.WaitForReady() after dm.Start() so the API is confirmed responsive before creating the client
  • internal/client/client.go: Increased pingWithRetry from 3 to 5 attempts with exponential backoff as a secondary safety net

Tests

  • 4 new daemon tests (pkg/daemon/daemon_test.go): interface compliance, already-ready, becomes-ready-after-delay, isServerResponding
  • 3 new client tests (internal/client/client_test.go): immediate success, success after failures, all fail
  • All existing tests pass: go test ./... clean

Test plan

🤖 Generated with Claude Code

Co-Authored-By: Joel Klabo <joel.klabo@microsoft.com>
Copilot AI review requested due to automatic review settings March 5, 2026 18:22
Signed-off-by: Peter Jausovec <peter.jausovec@solo.io>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Comment on lines +155 to +156
// WaitForReady polls the daemon API until it responds or the timeout expires
WaitForReady(baseURL string) error
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have IsRunning & Start methods defined for this interface. Wondering whether it makes sense to deflate this method into Start() since we already call --wait on docker compose. The one benefit is being able to call Start and then WaitForReady by consumers of this interface but the value seems low.

Overall, fine with extending this for now. I think there's some other cruft in this interface that's worth addressing first.

Comment on lines +264 to +272
if !isDockerComposeAvailable() {
fmt.Println("Docker compose is not available. Please install docker compose and try again.")
fmt.Println("See https://docs.docker.com/compose/install/ for installation instructions.")
fmt.Println("agent registry uses docker compose to start the server and the agent gateway.")
return fmt.Errorf("docker compose is not available")
}
if dm.IsRunning() {
return nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this logic be flipped? We check if dm.IsRunning() first before validating whether docker compose is available in the env?

Comment on lines +52 to +53
// Package-level var so tests can stub out the real docker-compose check.
isDockerComposeAvailable = utils.IsDockerComposeAvailable
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine for now - I think this indicates a clear smell with the daemon manager component though. The interface should abstract whether docker compose is available, or we define something like a no-op daemon manager and call the right constructor in the CLI run logic, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

When I ran arctl list -A without the daemon running the command didn't wait for the daemon to be fully ready before returning

3 participants