Performance: avoid per-request full auth scan by introducing availability pools #412

@ibadoo

Background

When a deployment has a large number of OAuth auth files (for example, 500-1000+), request routing currently filters and selects candidates from the full in-memory auth set on every request.

Even with existing hash/debounce/hot-reload optimizations, this still creates avoidable runtime overhead under high concurrency (CPU + lock contention), especially when many auth entries are in cooldown/disabled states.

Problem

Current matching is effectively:

  • iterate auths
  • filter by provider/model/disabled/cooldown/retry context
  • choose by priority + selector

This happens per request and scales with total auth count rather than available auth count.
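
To make the cost concrete, here is a minimal Python sketch of the scan pattern described above. It is not the project's actual code: the `Auth` fields, the `pick_by_full_scan` name, the "higher number wins" priority rule, and the fill-first stand-in for the selector are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Auth:
    # Hypothetical shape of an auth entry; field names are illustrative.
    id: str
    provider: str
    models: frozenset
    priority: int = 0            # assumption: higher number = preferred
    disabled: bool = False
    cooldown_until: float = 0.0  # epoch seconds; 0 means not cooling down


def pick_by_full_scan(auths, provider, model, now, tried=frozenset()):
    """Linear scan: cost grows with len(auths), even if few are usable."""
    candidates = [
        a for a in auths
        if a.provider == provider
        and model in a.models
        and not a.disabled
        and a.cooldown_until <= now
        and a.id not in tried    # retry context: skip auths already attempted
    ]
    if not candidates:
        return None
    top = max(a.priority for a in candidates)
    # Stand-in for the selector: first auth in the top-priority group.
    return next(a for a in candidates if a.priority == top)
```

With 999 auths in cooldown and one usable auth, every call still touches all 1000 entries to find that one.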

Proposal

Introduce explicit state pools (or indexed queues) managed globally and updated incrementally on state transitions:

  • Available pool: immediately selectable auths
  • Waiting/Cooldown pool: temporarily unavailable auths with next_recover_at
  • Disabled pool: manually/system disabled auths

Routing path should select directly from the available pool (plus provider/model index), instead of rescanning all auths each time.
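
A rough Python sketch of what such pools could look like, assuming a hypothetical `AuthPools` class with a `(provider, model)` index over the available pool. All names here are invented for illustration, and locking is omitted; a real implementation would need to guard these structures under concurrency.

```python
from collections import defaultdict


class AuthPools:
    """Three pools; only the available pool is indexed for routing."""

    def __init__(self):
        # (provider, model) -> insertion-ordered {auth_id: None}
        self.available = defaultdict(dict)
        self.waiting = {}      # auth_id -> next_recover_at
        self.disabled = set()  # auth_ids
        self._keys = {}        # auth_id -> list of (provider, model)

    def add(self, auth_id, provider, models):
        self._keys[auth_id] = [(provider, m) for m in models]
        self.to_available(auth_id)

    def _detach(self, auth_id):
        # Remove the auth from whichever pool currently holds it.
        for key in self._keys[auth_id]:
            self.available[key].pop(auth_id, None)
        self.waiting.pop(auth_id, None)
        self.disabled.discard(auth_id)

    def to_available(self, auth_id):
        self._detach(auth_id)
        for key in self._keys[auth_id]:
            self.available[key][auth_id] = None

    def to_waiting(self, auth_id, next_recover_at):
        self._detach(auth_id)
        self.waiting[auth_id] = next_recover_at

    def to_disabled(self, auth_id):
        self._detach(auth_id)
        self.disabled.add(auth_id)

    def pick(self, provider, model):
        # Touches only auths that are selectable right now.
        bucket = self.available.get((provider, model))
        return next(iter(bucket), None) if bucket else None
```

In this sketch a state transition costs time proportional to the number of models the auth serves, while `pick` no longer depends on how many auths are waiting or disabled.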

State transitions

  • success -> available
  • 429/quota -> waiting (record recover time)
  • 401/403/404 -> disabled or waiting, depending on policy
  • manual disable -> disabled
  • timer/recover: waiting -> available
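
The transitions listed above could be expressed as one small pure function. This is a sketch only: the event names, the `Pool` enum, and the default status-to-pool policy are assumptions, since the issue leaves the 401/403/404 policy open.

```python
from enum import Enum


class Pool(Enum):
    AVAILABLE = "available"
    WAITING = "waiting"
    DISABLED = "disabled"


# Assumed default; the issue only says "disabled or waiting by policy".
DEFAULT_ERROR_POLICY = {
    401: Pool.DISABLED,
    403: Pool.DISABLED,
    404: Pool.WAITING,
}


def next_pool(current, event, status=None, policy=DEFAULT_ERROR_POLICY):
    """Return the pool an auth belongs in after `event`."""
    if event == "manual_disable":
        return Pool.DISABLED
    if event == "success":
        return Pool.AVAILABLE
    if event == "quota":  # 429 or quota exhaustion; caller records recover time
        return Pool.WAITING
    if event == "auth_error":
        return policy.get(status, current)
    if event == "recover":  # the timer only wakes auths that are waiting
        return Pool.AVAILABLE if current is Pool.WAITING else current
    return current
```

Keeping the decision in one function would let the existing result and refresh hooks share a single rule instead of each re-deriving the state.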

Suggested implementation direction

  1. Build provider+model indexes keyed to auth IDs in each pool.
  2. Keep priority buckets inside the available pool for O(1)/O(log n) top-priority selection.
  3. Maintain transitions via existing MarkResult / refresh / watcher update hooks.
  4. Use a min-heap (by recover time) for waiting pool wake-up.
  5. Keep selector behavior (round-robin/fill-first), but operate on the prefiltered available set.
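
Items 2 and 4 can be sketched in Python as follows. This is a hedged illustration, not a proposed final design: `WaitingPool` and `PriorityBuckets` are hypothetical names, "higher number wins" is an assumed priority rule, and only the round-robin selector is shown.

```python
import heapq
from collections import deque


class WaitingPool:
    """Min-heap keyed by recover time; stale entries are skipped lazily."""

    def __init__(self):
        self._heap = []  # (recover_at, auth_id)
        self._due = {}   # auth_id -> latest recover_at

    def add(self, auth_id, recover_at):
        self._due[auth_id] = recover_at
        heapq.heappush(self._heap, (recover_at, auth_id))

    def remove(self, auth_id):
        # The heap entry stays behind and is ignored when popped.
        self._due.pop(auth_id, None)

    def pop_recovered(self, now):
        """Return auth IDs whose recover time has passed."""
        recovered = []
        while self._heap and self._heap[0][0] <= now:
            recover_at, auth_id = heapq.heappop(self._heap)
            if self._due.get(auth_id) == recover_at:  # skip stale entries
                del self._due[auth_id]
                recovered.append(auth_id)
        return recovered


class PriorityBuckets:
    """Available auths grouped by priority, round-robin within a bucket."""

    def __init__(self):
        self._buckets = {}  # priority -> deque of auth_ids

    def add(self, auth_id, priority):
        self._buckets.setdefault(priority, deque()).append(auth_id)

    def remove(self, auth_id, priority):
        bucket = self._buckets.get(priority)
        if bucket and auth_id in bucket:
            bucket.remove(auth_id)  # linear in the bucket size
            if not bucket:
                del self._buckets[priority]

    def pick_round_robin(self):
        if not self._buckets:
            return None
        # Scans distinct priority levels only, usually a handful.
        bucket = self._buckets[max(self._buckets)]
        auth_id = bucket[0]
        bucket.rotate(-1)  # move the chosen auth to the back
        return auth_id
```

A periodic timer (or a sleep until the heap's earliest recover time) could call `pop_recovered` and hand the returned IDs back to the available pool. Removal from a bucket is linear here; an ordered map or an ID-to-node index would be needed for strict O(1) removal.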

UI/Management improvements

Please also consider management-page improvements for large auth fleets:

  1. Status filtering (not sorting):

    • available
    • waiting/cooldown
    • disabled
    • default view: available only
  2. Batch operations:

    • batch enable/disable
    • batch set priority
    • batch set prefix/proxy
    • batch move between groups/pools (if introduced)

Expected benefits

  • Lower per-request routing overhead at scale
  • Better p95/p99 latency under large auth counts
  • Cleaner operational visibility and control for large credential pools

Thanks for considering this.
