fix(hub): JWKS warm should not block port binding #61

Merged: finedesignz merged 1 commit into main from fix/jwks-boot-gate on May 26, 2026
@finedesignz (Owner)

Summary

  • Boot-time `warmJwksCache()` was calling `process.exit(1)` on failure, taking down every Coolify deploy because the upstream Keygen JWKS endpoint `/v1/accounts/{ACCOUNT_ID}/.well-known/jwks.json` returns 404 on this self-hosted instance.
  • The hub serves many surfaces that don't need Titanium JWT verification (health, public webhooks, SPA, scheduler, agent WS). A stalled Titanium endpoint should never block port bind.
  • Auth-gated routes already fail closed at request time via `verifyLicenseJwt`, which lazy-warms through `jose`'s `createRemoteJWKSet` on first use. So switching the boot gate to log-and-continue is safe.
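
A minimal sketch of the log-and-continue boot gate described above, not the hub's actual code. `bootHub`, `BootDeps`, and the injected `listen` and `warn` callbacks are hypothetical names introduced for illustration; only `warmJwksCache` comes from this PR, and here it is passed in as a dependency so the sketch stays self-contained.

```typescript
// Hypothetical shape of the boot dependencies (assumption, not the hub's real API).
type BootDeps = {
  warmJwksCache: () => Promise<void>; // stands in for the real JWKS warm step
  listen: () => void;                 // stands in for binding the HTTP port
  warn: (msg: string) => void;        // stands in for the hub's logger
};

// Attempt the JWKS warm, but never let a failure stop the port from binding.
async function bootHub(deps: BootDeps): Promise<{ jwksWarm: boolean }> {
  let jwksWarm = true;
  try {
    await deps.warmJwksCache();
  } catch (err) {
    // Previously this path called process.exit(1); now it only logs.
    jwksWarm = false;
    const reason = err instanceof Error ? err.message : String(err);
    deps.warn(`JWKS warm failed, continuing boot: ${reason}`);
  }
  // Always reached, so health checks, webhooks, and the SPA stay available.
  deps.listen();
  return { jwksWarm };
}
```

Returning the `jwksWarm` flag is a design choice of this sketch: it lets a caller surface the degraded state (for example in a health payload) without turning it into a fatal error.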

Why this is urgent

The failing boot gate blocks PR #60 (started_at scheduler fix) from reaching production before the midnight 4h cron fires.

Verification

Probed `https://keygen.titaniumlabs.us` directly:

  • `/v1/accounts/{ACCOUNT_ID}/.well-known/jwks.json` → 404
  • `/v1/accounts/{ACCOUNT_ID}` → 401 (account exists, auth required — confirms host/id correct)
  • `/.well-known/jwks.json`, `/jwks`, `/jwks.json`, `/public-key`, `/.well-known/openid-configuration` → all 404

This self-hosted Keygen instance does not expose a JWKS endpoint at any of the probed paths. A follow-up should investigate the correct license-key distribution mechanism (likely an RSA/Ed25519 public key via account metadata), but production cannot wait.
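
The probe above can be repeated with a small helper along these lines. This is a sketch under assumptions: `probePaths` and `StatusFetcher` are hypothetical names, and the HTTP call is injected rather than hard-coded so the helper has no runtime dependencies. The `{ACCOUNT_ID}` placeholder is left as written in this PR.

```typescript
// Hypothetical helper type: resolves to the HTTP status code for a URL.
type StatusFetcher = (url: string) => Promise<number>;

// Probe each candidate path against the base URL and record its status code.
async function probePaths(
  baseUrl: string,
  paths: string[],
  fetchStatus: StatusFetcher,
): Promise<Map<string, number>> {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const results = new Map<string, number>();
  for (const path of paths) {
    results.set(path, await fetchStatus(base + path));
  }
  return results;
}

// Candidate JWKS locations listed in the Verification section.
const candidatePaths = [
  "/v1/accounts/{ACCOUNT_ID}/.well-known/jwks.json",
  "/.well-known/jwks.json",
  "/jwks",
  "/jwks.json",
  "/public-key",
  "/.well-known/openid-configuration",
];
```

On a runtime with a global `fetch`, a fetcher such as `async (url) => (await fetch(url)).status` could be passed as `fetchStatus`.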

Test plan

  • Hub starts even when `TITANIUM_KEYGEN_API_URL` points at a 404 JWKS path
  • Warn line printed to logs on warm failure
  • License-gated routes still fail closed when JWKS unavailable at verify time (existing behavior via `verifyLicenseJwt` → `createRemoteJWKSet`)
  • Post-merge: confirm the Coolify deploy comes up and the scheduled-task started_at fix from #60 (fix(scheduler.registry): ensure started_at always populated on cron fires) is live
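
The fail-closed item in the test plan could be exercised with a guard shaped roughly like this. It is a sketch, not the hub's route code: `licenseGate`, `Verify`, `GateResult`, and the 401 status are assumptions for illustration, and the `verify` argument stands in for `verifyLicenseJwt`.

```typescript
// Hypothetical verifier signature; the real `verifyLicenseJwt` may differ.
type Verify = (token: string) => Promise<unknown>;

type GateResult =
  | { allowed: true; claims: unknown }
  | { allowed: false; status: number };

// Deny the request unless verification succeeds. Any error, including an
// unreachable or 404 JWKS endpoint at verify time, results in a denial.
async function licenseGate(
  token: string | undefined,
  verify: Verify,
): Promise<GateResult> {
  if (!token) {
    return { allowed: false, status: 401 };
  }
  try {
    const claims = await verify(token);
    return { allowed: true, claims };
  } catch {
    // Fail closed: never fall through to the protected handler.
    return { allowed: false, status: 401 };
  }
}
```

Because the denial happens per request, a JWKS outage affects only license-gated routes and leaves the rest of the hub reachable.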

🤖 Generated with Claude Code

The Titanium JWKS warm at boot called process.exit(1) on any failure,
which has been taking down every Coolify deploy of remo-code-hub since
the upstream Keygen JWKS endpoint at
`/v1/accounts/{ACCOUNT_ID}/.well-known/jwks.json` returns 404.

Architecturally, a stalled Titanium JWKS endpoint should never block the
hub from binding its port — the hub serves health checks, public
webhooks (Coolify, Sentry intake), the web SPA, the scheduler, and the
agent WebSocket, none of which require Titanium JWT verification. Only
license-gated routes do, and those fail closed at request time via
`verifyLicenseJwt`, which already lazy-warms on first use through
jose's `createRemoteJWKSet` resolver.

Change: log the warm failure loudly and continue binding the port.
Misconfiguration remains visible in deploy logs; production stays up.

Unblocks PR #60 (started_at scheduler fix) from going live before the
midnight cron fires.
@finedesignz finedesignz merged commit 753b02f into main May 26, 2026