fix(hub): surface silent agent auth failures (sessions not populating)#69
Merged
Conversation
Two silent failure paths in handleAgentAuth made it impossible to diagnose "new sessions not populating in the UI" from Coolify logs: 1. Invalid api_key: verifyApiKey returns null, hub sends auth_error frame and closes 4001 — but logs NOTHING. The user's only signal is a bare "[agent] connection opened" with no follow-up, indistinguishable from a network blip. Now logs `[agent] auth fail reason=invalid_api_key hash=<8hex>... host=<hostname>` matching the supervisor disambiguation pattern from PR #50. 2. Missing project_dir AND rootless_sessions: Phase 05 made project_dir optional in the AgentAuth zod schema (rootless-only agents), but the subsequent `msg.project_dir.replace(...)` would throw at runtime, again with no diagnostic. Explicit reject with reason=no_project_or_rootless. Adds hub/test/agent-auth-logging.test.ts (REMO_E2E_DB_URL gated) covering both new log lines + verifying the schema-reject path (already logged) is intact. Root cause of the user-visible bug: agents reconnecting with stale/wrong API keys after recent dashboard changes were silently rejected. The fix itself is purely diagnostic — once deployed, the operator can see exactly why an "[agent] connection opened" didn't produce "[agent] authenticated" and act (rotate key, point agent at correct hub, etc.).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Root cause of "new sessions not populating in UI": agents reconnecting with stale/invalid API keys were silently rejected by the hub. The auth-fail path on the agent role didn't log anything — operator's only signal was a bare `[agent] connection opened` with no follow-up `[agent] authenticated`.
This PR is purely diagnostic + a defensive guard:
No behavioral change for legitimate agents — only adds visibility for the failure modes that were silent.
Why
User reported the UI not populating new `claude-remote` sessions. Coolify logs showed multiple `[agent] connection opened` lines but only one `[agent] authenticated`. Without a server-side log on the auth-fail path, it was impossible to tell whether the silent connections were schema-rejected, key-rejected, or network-flapping. Now distinguishable at a glance.
Test plan
🤖 Generated with Claude Code