Performance: avoid per-request full auth scan by introducing availability pools #412

## Background

When a deployment has a large number of OAuth auth files (for example, 500 to 1,000 or more), request routing currently filters and selects candidates from the full in-memory auth set on every request.

Even with existing hash/debounce/hot-reload optimizations, this still creates avoidable runtime overhead under high concurrency (CPU + lock contention), especially when many auth entries are in cooldown/disabled states.

## Problem

Current matching is effectively:
- iterate auths
- filter by provider/model/disabled/cooldown/retry context
- choose by priority + selector

This happens per request and scales with total auth count rather than available auth count.
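
To make the scaling concern concrete, here is a minimal sketch of the scan pattern described above. It is not the project's actual code: the `Auth` fields and `pick_by_full_scan` are hypothetical names, and "higher number means higher priority" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Auth:
    # Hypothetical shape of an auth entry, for illustration only.
    id: str
    provider: str
    models: frozenset
    priority: int = 0
    disabled: bool = False
    cooldown_until: float = 0.0  # epoch seconds; 0 means not cooling down


def pick_by_full_scan(auths, provider, model, now):
    """Return (chosen auth or None, number of entries inspected)."""
    inspected = 0
    candidates = []
    for auth in auths:  # touches every auth, usable or not
        inspected += 1
        if auth.disabled or auth.cooldown_until > now:
            continue
        if auth.provider != provider or model not in auth.models:
            continue
        candidates.append(auth)
    if not candidates:
        return None, inspected
    # Assumption: a larger number means a higher priority.
    top = max(a.priority for a in candidates)
    best = [a for a in candidates if a.priority == top]
    return best[0], inspected
```

With 1,000 auths of which 990 are in cooldown, every request still inspects all 1,000 entries to find the 10 usable ones.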

## Proposal

Introduce explicit state pools (or indexed queues) managed globally and updated incrementally on state transitions:

- **Available pool**: immediately selectable auths
- **Waiting/Cooldown pool**: temporarily unavailable auths with `next_recover_at`
- **Disabled pool**: manually/system disabled auths

Routing path should select directly from the **available pool** (plus provider/model index), instead of rescanning all auths each time.
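
One possible shape for the pools is sketched below, assuming each auth serves one provider and a fixed set of models. `AuthPools`, `State`, `add`, `move`, and `candidates` are hypothetical names, not existing APIs. The point is that the routing lookup touches only the available IDs for one provider/model key, and the index changes only when an auth changes state.

```python
from collections import defaultdict
from enum import Enum


class State(Enum):
    AVAILABLE = "available"
    WAITING = "waiting"
    DISABLED = "disabled"


class AuthPools:
    def __init__(self):
        self.state = {}                    # auth_id -> State
        self.keys = {}                     # auth_id -> set of (provider, model)
        self.available = defaultdict(set)  # (provider, model) -> available auth IDs
        self.next_recover_at = {}          # auth_id -> epoch seconds (waiting only)

    def add(self, auth_id, provider, models):
        self.keys[auth_id] = {(provider, m) for m in models}
        self.state[auth_id] = State.DISABLED  # placeholder until move() indexes it
        self.move(auth_id, State.AVAILABLE)

    def move(self, auth_id, new_state, next_recover_at=None):
        """Incrementally update the index on a state transition."""
        if self.state[auth_id] is State.AVAILABLE:
            for key in self.keys[auth_id]:
                self.available[key].discard(auth_id)
        self.next_recover_at.pop(auth_id, None)
        if new_state is State.AVAILABLE:
            for key in self.keys[auth_id]:
                self.available[key].add(auth_id)
        elif new_state is State.WAITING:
            self.next_recover_at[auth_id] = next_recover_at
        self.state[auth_id] = new_state

    def candidates(self, provider, model):
        """Routing path: only available auths for this key, no full scan."""
        return self.available.get((provider, model), set())
```

The cost of `candidates` no longer depends on how many auths are waiting or disabled; the cost of `move` depends only on how many models the affected auth serves.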

### State transitions

- success -> available
- 429/quota -> waiting (record `next_recover_at`)
- 401/403/404 -> disabled or waiting, depending on policy
- manual disable -> disabled
- recover timer fires -> waiting auth moves back to available
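
The transitions above could be expressed as one pure function, which keeps the rules testable apart from the pool bookkeeping. This is a sketch under assumptions: the event names, `next_state`, and the default policy table for 401/403/404 are all hypothetical, and the rule that a late result never re-enables a disabled auth is my assumption rather than something stated in this proposal.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "available"
    WAITING = "waiting"
    DISABLED = "disabled"


# Hypothetical policy: which state each auth-related status code leads to.
DEFAULT_AUTH_ERROR_POLICY = {
    401: State.DISABLED,
    403: State.DISABLED,
    404: State.WAITING,
}


def next_state(current, event, status=None, policy=DEFAULT_AUTH_ERROR_POLICY):
    """Return the state an auth should move to after an event."""
    if event == "manual_disable":
        return State.DISABLED
    if event == "recover":
        # The recover timer only affects auths that are actually waiting.
        return State.AVAILABLE if current is State.WAITING else current
    if event == "result":
        if current is State.DISABLED:
            # Assumption: an in-flight result must not undo a disable.
            return current
        if status is not None and 200 <= status < 300:
            return State.AVAILABLE
        if status == 429:
            return State.WAITING
        if status in policy:
            return policy[status]
    return current  # unknown events and other status codes change nothing
```

The caller would record `next_recover_at` whenever the returned state is waiting, then apply the move to the pools.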

## Suggested implementation direction

1. Build provider+model indexes keyed to auth IDs in each pool.
2. Keep priority buckets inside the available pool for O(1)/O(log n) top-priority selection.
3. Maintain transitions via the existing `MarkResult` / refresh / watcher update hooks.
4. Use a min-heap (ordered by recover time) for waiting-pool wake-up.
5. Keep the selector behavior (round-robin/fill-first), but operate on the prefiltered available set.
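
Items 2, 4, and 5 could look roughly like the sketch below. `WaitingPool` and `AvailableBucket` are hypothetical names, the heap uses lazy deletion so that re-queued or removed auths do not need an O(n) heap search, and "higher number means higher priority" is an assumption.

```python
import heapq
from collections import deque


class WaitingPool:
    """Min-heap of (recover_at, auth_id) for waking up waiting auths."""

    def __init__(self):
        self._heap = []
        self._deadline = {}  # auth_id -> current recover_at; older heap entries are stale

    def put(self, auth_id, recover_at):
        self._deadline[auth_id] = recover_at
        heapq.heappush(self._heap, (recover_at, auth_id))

    def remove(self, auth_id):
        # Lazy deletion: the heap entry is skipped when it surfaces.
        self._deadline.pop(auth_id, None)

    def pop_due(self, now):
        """Return auth IDs whose recover time has passed, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            recover_at, auth_id = heapq.heappop(self._heap)
            if self._deadline.get(auth_id) == recover_at:
                del self._deadline[auth_id]
                due.append(auth_id)
        return due


class AvailableBucket:
    """Available auth IDs for one provider/model key, grouped by priority."""

    def __init__(self):
        self._by_priority = {}  # priority -> deque of auth IDs

    def add(self, auth_id, priority):
        self._by_priority.setdefault(priority, deque()).append(auth_id)

    def remove(self, auth_id, priority):
        bucket = self._by_priority.get(priority)
        if bucket and auth_id in bucket:
            bucket.remove(auth_id)  # linear in the bucket size only
            if not bucket:
                del self._by_priority[priority]

    def _top(self):
        # max() runs over distinct priority values, not over auths.
        return self._by_priority[max(self._by_priority)] if self._by_priority else None

    def pick_round_robin(self):
        top = self._top()
        if top is None:
            return None
        auth_id = top[0]
        top.rotate(-1)  # the next call returns the following auth
        return auth_id

    def pick_fill_first(self):
        top = self._top()
        return top[0] if top else None
```

A periodic tick (or a timer armed for the earliest heap entry) would call `pop_due(now)` and move each returned ID back into its `AvailableBucket`. Both selectors only ever look at the highest-priority bucket of auths that are already known to be available.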

## UI/Management improvements

Please also consider management-page improvements for large auth fleets:

1. **Status filtering** (not sorting):
   - available
   - waiting/cooldown
   - disabled
   - default view: **available only**

2. **Batch operations**:
   - batch enable/disable
   - batch set priority
   - batch set prefix/proxy
   - batch move between groups/pools (if introduced)

## Expected benefits

- Lower per-request routing overhead at scale
- Better p95/p99 latency under large auth counts
- Cleaner operational visibility and control for large credential pools

Thanks for considering this.

