Description
Background
When the deployment has a large number of OAuth auth files (for example, 500-1000+), request routing currently performs repeated candidate filtering/selection from the full in-memory auth set on every request.
Even with existing hash/debounce/hot-reload optimizations, this still creates avoidable runtime overhead under high concurrency (CPU + lock contention), especially when many auth entries are in cooldown/disabled states.
Problem
Current matching is effectively:
- iterate auths
- filter by provider/model/disabled/cooldown/retry context
- choose by priority + selector
This happens per request and scales with total auth count rather than available auth count.
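For illustration, the per-request scan described above looks roughly like the following. This is a minimal sketch, not the project's actual code: the `Auth` fields, the `pick_by_full_scan` name, and the "higher number wins" priority rule are all assumptions, and the selector step is reduced to "first match" for brevity.

```python
from dataclasses import dataclass


@dataclass
class Auth:
    # Hypothetical shape of an auth entry; field names are assumptions.
    id: str
    provider: str
    models: frozenset
    priority: int = 0
    disabled: bool = False
    cooldown_until: float = 0.0


def pick_by_full_scan(auths, provider, model, now):
    """Filter the whole auth set on every call; cost grows with len(auths)."""
    candidates = [
        a for a in auths
        if a.provider == provider
        and model in a.models
        and not a.disabled
        and a.cooldown_until <= now
    ]
    if not candidates:
        return None
    top = max(a.priority for a in candidates)
    # Selector reduced to "first auth in the top-priority group".
    return next(a for a in candidates if a.priority == top)
```

With 1000 entries of which only a handful are usable, every request still touches all 1000.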
Proposal
Introduce explicit state pools (or indexed queues) managed globally and updated incrementally on state transitions:
- Available pool: immediately selectable auths
- Waiting/Cooldown pool: temporarily unavailable auths with next_recover_at
- Disabled pool: manually/system disabled auths
Routing path should select directly from the available pool (plus provider/model index), instead of rescanning all auths each time.
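A rough sketch of what the pools plus a provider/model index could look like. The class name `AuthPools`, its methods, and the string state names are hypothetical and only meant to show the shape of the idea: state changes update the index incrementally, so the routing path never rescans all auths.

```python
from collections import defaultdict


class AuthPools:
    """Tracks each auth's pool and indexes available auths by (provider, model)."""

    def __init__(self):
        self.state = {}            # auth_id -> "available" | "waiting" | "disabled"
        self.keys = {}             # auth_id -> set of (provider, model) pairs
        self.available = defaultdict(set)  # (provider, model) -> available auth ids
        self.next_recover_at = {}  # auth_id -> recover time, waiting pool only

    def add(self, auth_id, provider, models):
        self.keys[auth_id] = {(provider, m) for m in models}
        self.move(auth_id, "available")

    def move(self, auth_id, new_state, next_recover_at=None):
        # Leave the old pool: only the available pool is indexed.
        if self.state.get(auth_id) == "available":
            for key in self.keys[auth_id]:
                self.available[key].discard(auth_id)
        self.next_recover_at.pop(auth_id, None)
        # Enter the new pool.
        self.state[auth_id] = new_state
        if new_state == "available":
            for key in self.keys[auth_id]:
                self.available[key].add(auth_id)
        elif new_state == "waiting":
            self.next_recover_at[auth_id] = next_recover_at

    def candidates(self, provider, model):
        """Lookup cost depends on the index, not on the total auth count."""
        return self.available.get((provider, model), set())
```

Usage would be along the lines of `pools.move("auth-1", "waiting", next_recover_at=t)` on a 429, after which `pools.candidates(provider, model)` no longer returns that auth.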
State transitions
- success -> available
- 429/quota -> waiting (record recover time)
- 401/403/404 -> disabled or waiting, depending on policy
- manual disable -> disabled
- recovery timer fires: waiting -> available
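The transitions above could be expressed as one small pure function. This is only a sketch under assumptions: the event names, the `cooldown` default, and the `disable_on_auth_error` policy flag are made up for illustration and do not reflect existing configuration.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "available"
    WAITING = "waiting"
    DISABLED = "disabled"


def next_state(current, event, status=None, now=0.0, cooldown=60.0,
               disable_on_auth_error=True):
    """Return (new_state, next_recover_at) for one event on one auth."""
    if event == "manual_disable":
        return State.DISABLED, None
    if event == "success":
        return State.AVAILABLE, None
    if event == "recover":
        # The recovery timer only affects auths that are actually waiting.
        if current is State.WAITING:
            return State.AVAILABLE, None
        return current, None
    if event == "error":
        if status == 429:
            return State.WAITING, now + cooldown
        if status in (401, 403, 404):
            if disable_on_auth_error:
                return State.DISABLED, None
            return State.WAITING, now + cooldown
    raise ValueError(f"unhandled event {event!r} with status {status!r}")
```

Keeping this as a pure function would make the policy easy to unit-test separately from the pool bookkeeping.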
Suggested implementation direction
- Build provider+model indexes keyed to auth IDs in each pool.
- Keep priority buckets inside available pool for O(1)/O(log n) top-priority selection.
- Maintain transitions via existing MarkResult / refresh / watcher update hooks.
- Use a min-heap (by recover time) for waiting pool wake-up.
- Keep selector behavior (round-robin/fill-first), but operate on prefiltered available set.
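To make the priority-bucket and min-heap ideas concrete, here is a minimal single-threaded sketch. The `Router` class and its method names are hypothetical, higher priority numbers are assumed to win, and locking is left out. Stale heap entries are skipped lazily rather than removed in place.

```python
import heapq
from collections import defaultdict, deque


class Router:
    def __init__(self):
        self.buckets = defaultdict(deque)  # priority -> available auth ids
        self.priority = {}                 # auth_id -> priority
        self.waiting = []                  # min-heap of (recover_at, auth_id)
        self.recover_at = {}               # auth_id -> current recover time

    def add(self, auth_id, priority):
        self.priority[auth_id] = priority
        self.buckets[priority].append(auth_id)

    def cool_down(self, auth_id, recover_at):
        bucket = self.buckets[self.priority[auth_id]]
        if auth_id in bucket:
            bucket.remove(auth_id)  # linear in bucket size; fine for a sketch
        self.recover_at[auth_id] = recover_at
        heapq.heappush(self.waiting, (recover_at, auth_id))

    def wake(self, now):
        """Move every auth whose recover time has passed back to available."""
        while self.waiting and self.waiting[0][0] <= now:
            recover_at, auth_id = heapq.heappop(self.waiting)
            if self.recover_at.get(auth_id) != recover_at:
                continue  # stale entry left behind by a later cool_down
            del self.recover_at[auth_id]
            self.buckets[self.priority[auth_id]].append(auth_id)

    def pick(self, now):
        """Round-robin within the highest non-empty priority bucket."""
        self.wake(now)
        # Sorts the distinct priority levels only, not the auths themselves.
        for priority in sorted(self.buckets, reverse=True):
            bucket = self.buckets[priority]
            if bucket:
                bucket.rotate(-1)   # front auth moves to the back
                return bucket[-1]   # and is the one returned
        return None
```

A fill-first selector could presumably reuse the same buckets by returning `bucket[0]` without rotating.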
UI/Management improvements
Please also consider management-page improvements for large auth fleets:
- Status filtering (not sorting):
  - available
  - waiting/cooldown
  - disabled
  - default view: available only
- Batch operations:
  - batch enable/disable
  - batch set priority
  - batch set prefix/proxy
  - batch move between groups/pools (if introduced)
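On the management side, the filtering and batch operations could be backed by helpers along these lines. This is a sketch only: the dict-based auth records and the field names (`status`, `priority`, and so on) are assumptions, not the current management API.

```python
def filter_by_status(auths, statuses=("available",)):
    """Return auths in any of the given statuses; default view is available only."""
    wanted = set(statuses)
    return [a for a in auths if a["status"] in wanted]


def batch_update(auths, ids, **fields):
    """Apply the same field changes to every selected auth; return how many changed."""
    selected = set(ids)
    updated = 0
    for auth in auths:
        if auth["id"] in selected:
            auth.update(fields)
            updated += 1
    return updated
```

For example, `batch_update(auths, selected_ids, priority=5)` would cover "batch set priority", and `batch_update(auths, selected_ids, status="disabled")` would cover batch disable.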
Expected benefits
- Lower per-request routing overhead at scale
- Better p95/p99 latency under large auth counts
- Cleaner operational visibility and control for large credential pools
Thanks for considering this.