fix(hub): surface silent agent auth failures (sessions not populating) by finedesignz · Pull Request #69 · finedesignz/remo-code

finedesignz · 2026-05-26T22:55:16Z

Summary

Root cause of "new sessions not populating in UI": agents reconnecting with stale/invalid API keys were silently rejected by the hub. The auth-fail path on the agent role didn't log anything — operator's only signal was a bare `[agent] connection opened` with no follow-up `[agent] authenticated`.

This PR is purely diagnostic + a defensive guard:

Log invalid api_key rejections with reason + key-hash prefix + hostname, mirroring the supervisor disambiguation in PR fix(hub): disambiguate supervisor auth failure reasons #50.
Guard the Phase 05 rootless-only path: `AgentAuth.project_dir` was made optional in the schema, but the runtime code still does `msg.project_dir.replace(...)`. Added an explicit reject when both `project_dir` and `rootless_sessions` are missing — would otherwise throw with no diagnostic.

No behavioral change for legitimate agents — only adds visibility for the failure modes that were silent.

Why

User reported the UI not populating new `claude-remote` sessions. Coolify logs showed multiple `[agent] connection opened` lines but only one `[agent] authenticated`. Without a server-side log on the auth-fail path, it was impossible to tell whether the silent connections were schema-rejected, key-rejected, or network-flapping. Now distinguishable at a glance.

Test plan

`bun test hub/test/agent-auth-logging.test.ts` (gated on `REMO_E2E_DB_URL` like the rest of the DAL suites)
Full `bun test` from repo root — 380 pass, 5 pre-existing fails on main unchanged
After deploy, reproduce: spawn `claude-remote` with a bogus key; verify `[agent] auth fail reason=invalid_api_key ...` appears in Coolify logs

🤖 Generated with Claude Code

Two silent failure paths in handleAgentAuth made it impossible to diagnose "new sessions not populating in the UI" from Coolify logs: 1. Invalid api_key: verifyApiKey returns null, hub sends auth_error frame and closes 4001 — but logs NOTHING. The user's only signal is a bare "[agent] connection opened" with no follow-up, indistinguishable from a network blip. Now logs `[agent] auth fail reason=invalid_api_key hash=<8hex>... host=<hostname>` matching the supervisor disambiguation pattern from PR #50. 2. Missing project_dir AND rootless_sessions: Phase 05 made project_dir optional in the AgentAuth zod schema (rootless-only agents), but the subsequent `msg.project_dir.replace(...)` would throw at runtime, again with no diagnostic. Explicit reject with reason=no_project_or_rootless. Adds hub/test/agent-auth-logging.test.ts (REMO_E2E_DB_URL gated) covering both new log lines + verifying the schema-reject path (already logged) is intact. Root cause of the user-visible bug: agents reconnecting with stale/wrong API keys after recent dashboard changes were silently rejected. The fix itself is purely diagnostic — once deployed, the operator can see exactly why an "[agent] connection opened" didn't produce "[agent] authenticated" and act (rotate key, point agent at correct hub, etc.).

finedesignz merged commit 53e7960 into main May 26, 2026
1 check passed

finedesignz deleted the fix/sessions-not-populating branch May 26, 2026 22:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(hub): surface silent agent auth failures (sessions not populating)#69

fix(hub): surface silent agent auth failures (sessions not populating)#69
finedesignz merged 1 commit into
mainfrom
fix/sessions-not-populating

finedesignz commented May 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

finedesignz commented May 26, 2026

Summary

Why

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant