Skip to content

fix(hub): surface silent agent auth failures (sessions not populating)#69

Merged
finedesignz merged 1 commit into
mainfrom
fix/sessions-not-populating
May 26, 2026
Merged

fix(hub): surface silent agent auth failures (sessions not populating)#69
finedesignz merged 1 commit into
mainfrom
fix/sessions-not-populating

Conversation

@finedesignz
Copy link
Copy Markdown
Owner

Summary

Root cause of "new sessions not populating in UI": agents reconnecting with stale/invalid API keys were silently rejected by the hub. The auth-fail path on the agent role didn't log anything — operator's only signal was a bare `[agent] connection opened` with no follow-up `[agent] authenticated`.

This PR is purely diagnostic + a defensive guard:

  • Log invalid api_key rejections with reason + key-hash prefix + hostname, mirroring the supervisor disambiguation in PR fix(hub): disambiguate supervisor auth failure reasons #50.
  • Guard the Phase 05 rootless-only path: `AgentAuth.project_dir` was made optional in the schema, but the runtime code still does `msg.project_dir.replace(...)`. Added an explicit reject when both `project_dir` and `rootless_sessions` are missing — would otherwise throw with no diagnostic.

No behavioral change for legitimate agents — only adds visibility for the failure modes that were silent.

Why

User reported the UI not populating new `claude-remote` sessions. Coolify logs showed multiple `[agent] connection opened` lines but only one `[agent] authenticated`. Without a server-side log on the auth-fail path, it was impossible to tell whether the silent connections were schema-rejected, key-rejected, or network-flapping. Now distinguishable at a glance.

Test plan

  • `bun test hub/test/agent-auth-logging.test.ts` (gated on `REMO_E2E_DB_URL` like the rest of the DAL suites)
  • Full `bun test` from repo root — 380 pass, 5 pre-existing fails on main unchanged
  • After deploy, reproduce: spawn `claude-remote` with a bogus key; verify `[agent] auth fail reason=invalid_api_key ...` appears in Coolify logs

🤖 Generated with Claude Code

Two silent failure paths in handleAgentAuth made it impossible to diagnose
"new sessions not populating in the UI" from Coolify logs:

1. Invalid api_key: verifyApiKey returns null, hub sends auth_error frame
   and closes 4001 — but logs NOTHING. The user's only signal is a bare
   "[agent] connection opened" with no follow-up, indistinguishable from a
   network blip. Now logs `[agent] auth fail reason=invalid_api_key
   hash=<8hex>... host=<hostname>` matching the supervisor disambiguation
   pattern from PR #50.

2. Missing project_dir AND rootless_sessions: Phase 05 made project_dir
   optional in the AgentAuth zod schema (rootless-only agents), but the
   subsequent `msg.project_dir.replace(...)` would throw at runtime, again
   with no diagnostic. Explicit reject with reason=no_project_or_rootless.

Adds hub/test/agent-auth-logging.test.ts (REMO_E2E_DB_URL gated) covering
both new log lines + verifying the schema-reject path (already logged) is
intact.

Root cause of the user-visible bug: agents reconnecting with stale/wrong
API keys after recent dashboard changes were silently rejected. The fix
itself is purely diagnostic — once deployed, the operator can see exactly
why an "[agent] connection opened" didn't produce "[agent] authenticated"
and act (rotate key, point agent at correct hub, etc.).
@finedesignz finedesignz merged commit 53e7960 into main May 26, 2026
1 check passed
@finedesignz finedesignz deleted the fix/sessions-not-populating branch May 26, 2026 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant