fix(server): refresh api-keys cache on miss + TTL so multi-instance deploys see new accounts (#2351) #2681
…eploys see new accounts (volcengine#2351)

In a multi-instance deployment behind a load balancer, accounts and users created on Instance A via /api/v1/admin/accounts wrote to shared AGFS but never propagated to Instance B's in-memory api-keys cache, which loaded once at startup. Subsequent requests routed to Instance B returned 401 even though the new account was persisted.

The cache now refreshes on cache-miss before declaring an unknown key unauthorized, and entries expire after a 30-second TTL so changes written elsewhere are picked up on the next request after the TTL. Local writes (admin endpoints) invalidate immediately so the originating instance always sees its own writes without waiting.

Concurrent misses for the same key dedupe through an asyncio.Lock so a thundering herd of 401s after a fresh account creation only triggers one reload.

No new dependencies (no Redis / no pub-sub). Storage format unchanged. TTL is a module constant; no new config field.

Closes volcengine#2351

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Direction matches #2351: cache-miss reload + TTL + local invalidation is a reasonable no-new-infra bridge. But this PR is not mergeable as-is:
Please split/drop unrelated parser changes and rerun the targeted auth/cache tests. After that, this can be reviewed as the #2351 fix.
Summary
Closes #2351. In a multi-instance deployment behind a load balancer, an account/user created via `POST /api/v1/admin/accounts` on Instance A wrote to shared AGFS but never propagated to Instance B's in-memory api-keys cache (loaded once at startup). Subsequent authenticated requests routed to Instance B returned 401.

Maintainer @zhoujh01 confirmed in the issue thread: "We also welcome community members to share their code."

This PR keeps it simple, with no new infrastructure dependency:

- When a request carries an `X-API-Key` that isn't in the in-memory cache, the cache reloads from AGFS once before declaring 401. New accounts created on a peer instance become visible on the next request.
- Cache entries expire after a 30-second TTL, so changes written elsewhere are picked up on the next request after the TTL.
- Local writes (admin endpoints) invalidate the cache immediately, so the originating instance always sees its own writes without waiting.
- Concurrent misses for the same key dedupe through an `asyncio.Lock`, so a thundering herd doesn't multiply storage reads.

No new dependencies (no Redis / no pub-sub). Storage format unchanged. TTL is a module constant; no new config field.
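The behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `ApiKeyCache`, `load_all`, `resolve`, `invalidate`, and `CACHE_TTL_SECONDS` are hypothetical names, and a plain dict stands in for the shared AGFS store.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Optional

CACHE_TTL_SECONDS = 30.0  # module constant; the name is an assumption


class ApiKeyCache:
    """Hypothetical in-memory api-keys cache backed by a shared store."""

    def __init__(
        self,
        load_all: Callable[[], Awaitable[Dict[str, str]]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_all = load_all  # stand-in for reading all keys from AGFS
        self._clock = clock
        self._keys: Dict[str, str] = {}
        self._loaded_at: Optional[float] = None
        self._lock = asyncio.Lock()
        self.reloads = 0  # counts reads from the backing store

    def _expired(self) -> bool:
        if self._loaded_at is None:
            return True
        return self._clock() - self._loaded_at >= CACHE_TTL_SECONDS

    async def _reload(self, observed: int) -> None:
        async with self._lock:
            # If another task reloaded while we waited, reuse its result.
            if self.reloads != observed:
                return
            self._keys = dict(await self._load_all())
            self._loaded_at = self._clock()
            self.reloads += 1

    async def resolve(self, api_key: str) -> Optional[str]:
        """Return the account for api_key, or None (caller responds 401)."""
        seen = self.reloads
        if self._expired():
            await self._reload(seen)
        # On a miss, reload once, unless a reload already happened above.
        if api_key not in self._keys and self.reloads == seen:
            await self._reload(seen)
        return self._keys.get(api_key)

    def invalidate(self) -> None:
        """Called after a local write so the next lookup reloads."""
        self._loaded_at = None


async def demo():
    store = {"key-a": "acct-a"}  # stand-in for shared AGFS

    async def load_all() -> Dict[str, str]:
        await asyncio.sleep(0)  # yield control, as real I/O would
        return store

    cache_b = ApiKeyCache(load_all)  # plays the role of Instance B
    assert await cache_b.resolve("key-a") == "acct-a"  # initial load

    store["key-new"] = "acct-new"  # account created via a peer instance
    results = await asyncio.gather(
        *(cache_b.resolve("key-new") for _ in range(10))
    )
    return results, cache_b.reloads
```

Running `asyncio.run(demo())` shows ten concurrent lookups for the new key sharing a single reload: the reload counter ends at 2 (the initial load plus one reload on miss). One consequence of this design is that every request with a genuinely unknown key also costs one storage read before the 401 is returned.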
Test plan
- `pytest tests/server/test_api_keys_cache_invalidation.py -x -q` passes (5 cases).
- `tests/server/test_auth.py` and admin tests still pass.

Closes #2351