
feat(parser): continuous-queue scheduler + tail-variance bench (#845) #857

Merged
gfargo merged 1 commit into main from feat/continuous-queue-waves-845 on May 6, 2026

Conversation

gfargo (Owner) commented on May 6, 2026

PR 4 of the #845 sprint. Replaces both pipeline passes' wave-by-wave Promise.all with a continuous-queue semaphore so the slowest call in a batch no longer blocks the next batch from starting. Monorepo-shape fixtures see a 58% wall-clock cut (213s → 89s); other fixtures stay flat because they fit in a single wave at concurrency 24.

Three changes

1. Continuous queue

Both schedulers replaced:

  • summarizeInWaves in summarizeDiffs.ts (directory consolidation)
  • processInWaves in summarizeLargeFiles.ts (per-file pre-process)

Old loop:

for (each batch of maxConcurrent items)
  await Promise.all(batch)

New loop: a createLimit(maxConcurrent) semaphore. Items dispatch in order, and the next item starts as soon as any in-flight slot frees. The directory pass also re-checks the budget at dispatch time, so subsequent items short-circuit when earlier completions have already dropped the total under maxTokens.
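
A minimal sketch of the shape: a p-limit-style helper plus the map-over-items dispatch. This is an illustrative stand-in, not the repo's actual createLimit implementation or call sites.

```ts
// Illustrative p-limit-style semaphore; a stand-in for the repo's createLimit.
function createLimit(maxConcurrent: number) {
  let active = 0;
  const waiters: Array<() => void> = [];
  return async <T>(task: () => Promise<T>): Promise<T> => {
    if (active >= maxConcurrent) {
      // Park until a completion hands over a freed slot.
      await new Promise<void>((resolve) => waiters.push(resolve));
    }
    active++;
    try {
      return await task();
    } finally {
      active--;
      waiters.shift()?.(); // wake exactly one queued item, if any
    }
  };
}

// Every item is dispatched up front; each starts the moment a slot frees,
// so one slow call no longer holds back the rest of its former "wave".
async function summarizeAll<T, R>(
  items: T[],
  maxConcurrent: number,
  summarizeOne: (item: T) => Promise<R>,
): Promise<R[]> {
  const limit = createLimit(maxConcurrent);
  return Promise.all(items.map((item) => limit(() => summarizeOne(item))));
}
```

The key property is that each completion hands its slot directly to the next queued item, so a single slow call only occupies one slot instead of stalling an entire batch.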

2. Tail-variance latency model in the bench

The previous mock latency was baseLatencyMs + perTokenMs * tokens — same shape for every call. That hid wave-locking entirely because the slowest call in any wave was the same as every other call. Real LLMs don't behave that way; they have meaningful tail variance (one of every ten calls is noticeably slow, sometimes 5-10x).

Added a deterministic per-call multiplier keyed on the input hash:

  • ~10% of calls hit a 3x slow bucket
  • ~3% of calls hit a 6x slow bucket
  • Same input → same latency every run (reproducible)

This finally makes wave-locking measurable in the bench. Without it, the architectural change was correct but invisible.
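
One way such a bucketed multiplier can be implemented deterministically. This is a sketch; the constants and helper names are illustrative, not the bench's actual code.

```ts
import { createHash } from 'node:crypto';

// Deterministic tail-variance multiplier: the same input always lands in the
// same bucket, so runs stay reproducible while the workload gains a long tail.
function latencyMultiplier(input: string): number {
  const bucket =
    createHash('sha256').update(input).digest().readUInt32BE(0) % 100;
  if (bucket < 3) return 6;  // ~3% of calls: 6x slow
  if (bucket < 13) return 3; // next ~10% of calls: 3x slow
  return 1;                  // everyone else: baseline latency
}

// Applied on top of the existing base + per-token formula.
function mockLatencyMs(
  input: string,
  tokens: number,
  baseLatencyMs: number,
  perTokenMs: number,
): number {
  return (baseLatencyMs + perTokenMs * tokens) * latencyMultiplier(input);
}
```

Keying on a hash of the call input keeps the latency profile stable across runs while still spreading slow calls unpredictably across waves.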

3. New monorepo fixture

80 modification-shaped files across 35 directories, sized to produce more LLM calls than maxConcurrent (24), so the wave-based scheduler needs multiple waves and exhibits the wave-locking pathology under variance.

Bench: wave-based vs continuous-queue (both at variance latency)

| fixture        | wave-based | continuous-queue | Δ wall |
|----------------|-----------:|-----------------:|-------:|
| tiny           | 4 ms       | 2 ms             | -2 ms  |
| medium         | 8,511 ms   | 8,502 ms         | -9 ms  |
| large          | 14,420 ms  | 14,413 ms        | -7 ms  |
| feature-add    | 13,066 ms  | 13,062 ms        | -4 ms  |
| refactor       | 51,148 ms  | 51,116 ms        | -32 ms |
| initial-commit | 16,974 ms  | 16,965 ms        | -9 ms  |
| docs-update    | 24,128 ms  | 24,384 ms        | +256 ms |
| dep-bump       | 0 ms       | 0 ms             | 0 ms   |
| monorepo       | 213,233 ms | 88,957 ms        | -124,276 ms (-58.3%) |

Every other fixture has ≤20 LLM calls — they fit in a single wave at concurrency 24, so scheduler choice is moot. The monorepo fixture (80 calls = ~4 waves at wave-based) is where wave-locking surfaces.
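
Back-of-envelope, using the bucket model above rather than a measured number: with ~13% of calls slowed, the probability that a wave of 24 contains at least one 3x-6x outlier is 1 - 0.87^24 ≈ 96%. Under wave-based scheduling nearly every wave's wall time is therefore set by an outlier, while the continuous queue lets those outliers overlap with fresh work.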

LLM call counts are unchanged across the board. PR 4 is pure scheduler restructuring; no calls eliminated like in PR 2's skip-trivial. The win is purely from filling slots more efficiently under variance.

Test plan

  • npm run lint
  • npm run test:jest (1294 tests pass — 1 updated assertion in summarizeDiffs.test.ts to match the new "continuous queue" log line, plus 19 new fixture tests covering the monorepo addition)
  • npm run build
  • npm run test:cli
  • npm run bench — numbers above

Plan reference

PR 4 of the #845 sprint. PR 5 (per-repo content-hashed disk cache) is the final headline win — re-runs of coco commit after small changes will hit the cache for unchanged files and skip the LLM entirely.

Two changes that together let the bench show the wave-locking
pathology and PR 4's fix for it.

## Continuous queue replaces wave-by-wave Promise.all

Both `summarizeInWaves` (directory consolidation in
`summarizeDiffs.ts`) and `processInWaves` (per-file pre-process
in `summarizeLargeFiles.ts`) used a wave-by-wave loop:

    for (each batch of maxConcurrent items)
      await Promise.all(batch)

The slowest call in each wave forced the next wave to wait — even
if the other wave members had finished long before. Real-world
LLM latencies have meaningful tail variance (one slow call in
every wave is common); the wave-locking compounded that into
significant dead time.

Replaced both passes with a `createLimit(maxConcurrent)`
semaphore (the same primitive `collectDiffs` already uses).
Items dispatch in order; as soon as any in-flight slot frees,
the next item starts. The directory pass also re-checks the
budget at dispatch time — if earlier completions already dropped
the total under maxTokens, subsequent items return their input
unchanged instead of paying for an unnecessary LLM call.
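
Roughly, the dispatch-time check amounts to the following (helper
names like `overBudget` and `consolidateDir` are placeholders, not
the actual functions):

    items.map((item) =>
      limit(async () => {
        // Re-check at dispatch time: earlier completions may have
        // already brought the running total under maxTokens.
        if (!overBudget()) {
          return item; // short-circuit: input passes through, no LLM call
        }
        return consolidateDir(item);
      })
    )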

## Bench: tail-variance latency model + monorepo fixture

The previous mock latency formula (base + per-token) gave every
call the same shape, so wave-locking never showed in numbers.
Added a deterministic per-call multiplier keyed on the input
hash: ~10% of calls land in a 3x slow bucket, ~3% in a 6x slow
bucket. Same input still yields the same latency every run
(reproducible) but the workload now exhibits the long-tail
behavior real LLMs do.

Added a `monorepo` fixture (80 modification-shaped files across
35 directories) sized to produce more LLM calls than the default
maxConcurrent. With the variance model + monorepo fixture, the
wave-locking is finally measurable.

## Bench: wave-based vs continuous-queue (variance latency)

| fixture        | wave-based  | continuous-queue | Δ wall      |
|----------------|------------:|-----------------:|------------:|
| tiny           |       4 ms  |          2 ms    |       -2 ms |
| medium         |   8,511 ms  |      8,502 ms    |       -9 ms |
| large          |  14,420 ms  |     14,413 ms    |       -7 ms |
| feature-add    |  13,066 ms  |     13,062 ms    |       -4 ms |
| refactor       |  51,148 ms  |     51,116 ms    |      -32 ms |
| initial-commit |  16,974 ms  |     16,965 ms    |       -9 ms |
| docs-update    |  24,128 ms  |     24,384 ms    |     +256 ms |
| dep-bump       |       0 ms  |          0 ms    |        0 ms |
| **monorepo**   | **213,233 ms** | **88,957 ms**  | **-124,276 ms (-58%)** |

All fixtures except monorepo have ≤20 LLM calls — they fit in a
single wave at concurrency 24, so the scheduler choice is moot
for them. The monorepo fixture (80 calls = ~4 waves at
wave-based) is where wave-locking shows: continuous queue
cuts the wall by 58%.

The other fixtures stay flat as expected (within noise). The
LLM call counts are unchanged — this PR is pure scheduler
restructuring, no skip-trivial-style call-elimination.
gfargo merged commit 51a7041 into main on May 6, 2026
8 checks passed
gfargo deleted the feat/continuous-queue-waves-845 branch on May 6, 2026 at 04:54