feat(parser): continuous-queue scheduler + tail-variance bench (#845)#857
Merged
Two changes that together let the bench show wave-locking
pathology and PR 4's fix for it.
## Continuous queue replaces wave-by-wave Promise.all
Both `summarizeInWaves` (directory consolidation in
`summarizeDiffs.ts`) and `processInWaves` (per-file pre-process
in `summarizeLargeFiles.ts`) used a wave-by-wave loop:
```
for (each batch of maxConcurrent items)
  await Promise.all(batch)
```
The slowest call in each wave forced the next wave to wait — even
if the other wave members had finished long before. Real-world
LLM latencies have meaningful tail variance (one slow call in
every wave is common); the wave-locking compounded that into
significant dead time.
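To see why this compounds, here is a toy deterministic wall-clock model (the numbers are illustrative, not taken from the bench): under wave-based scheduling the wall clock is the sum of per-wave maxima, while a continuous queue hands each freed slot the next item immediately.

```typescript
// Toy wall-clock model (illustrative sketch only, not the bench's code).

// Wave-based: every wave waits for its slowest member.
function waveBasedWall(latencies: number[], maxConcurrent: number): number {
  let wall = 0;
  for (let i = 0; i < latencies.length; i += maxConcurrent) {
    wall += Math.max(...latencies.slice(i, i + maxConcurrent));
  }
  return wall;
}

// Continuous queue: a freed slot immediately takes the next item in order.
function continuousWall(latencies: number[], maxConcurrent: number): number {
  const slotFreeAt = new Array<number>(maxConcurrent).fill(0);
  for (const latency of latencies) {
    // dispatch to whichever slot frees first
    const slot = slotFreeAt.indexOf(Math.min(...slotFreeAt));
    slotFreeAt[slot] += latency;
  }
  return Math.max(...slotFreeAt);
}

// Eight calls at concurrency 4, one 6x straggler per wave-sized chunk:
const lat = [100, 100, 100, 600, 100, 100, 100, 600];
console.log(waveBasedWall(lat, 4)); // 1200: both stragglers paid in full
console.log(continuousWall(lat, 4)); // 800: fast slots keep working
```

With tail variance, the wave-based total pays every straggler serially; the continuous queue overlaps stragglers with the rest of the queue.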
Replaced both passes with a `createLimit(maxConcurrent)`
semaphore (the same primitive `collectDiffs` already uses).
Items dispatch in order; as soon as any in-flight slot frees,
the next item starts. The directory pass also re-checks the
budget at dispatch time — if earlier completions already dropped
the total under maxTokens, subsequent items return their input
unchanged instead of paying for an unnecessary LLM call.
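The shape of the change looks roughly like this sketch. `createLimit` here is a minimal p-limit-style semaphore standing in for the repo's primitive of the same name (its real signature may differ), and `tokenCount`/`summarize`/`consolidate` are hypothetical helpers, not the PR's actual API:

```typescript
// Minimal semaphore sketch, assumed similar to the repo's createLimit.
function createLimit(maxConcurrent: number) {
  let active = 0;
  const queue: Array<() => void> = [];
  const release = () => {
    active--;
    queue.shift()?.();
  };
  return <T>(task: () => Promise<T>): Promise<T> =>
    new Promise<T>((resolve, reject) => {
      const run = () => {
        active++;
        task().then(resolve, reject).finally(release);
      };
      if (active < maxConcurrent) run();
      else queue.push(run);
    });
}

// Directory-pass shape: all items are dispatched through the limiter up
// front; a finished slot immediately pulls the next item, and the budget
// is re-checked at dispatch time so late items can skip their LLM call.
async function consolidate(
  items: string[],
  maxConcurrent: number,
  maxTokens: number,
  tokenCount: (s: string) => number,
  summarize: (s: string) => Promise<string>,
): Promise<string[]> {
  const limit = createLimit(maxConcurrent);
  let total = items.reduce((n, s) => n + tokenCount(s), 0);
  return Promise.all(
    items.map((item) =>
      limit(async () => {
        if (total <= maxTokens) return item; // budget already met: no call
        const summary = await summarize(item);
        total -= tokenCount(item) - tokenCount(summary);
        return summary;
      }),
    ),
  );
}
```

Because the budget check runs when the slot frees rather than when the pass starts, a run whose early completions shrink the total enough lets every remaining item pass through untouched.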
## Bench: tail-variance latency model + monorepo fixture
The previous mock latency formula (base + per-token) gave every
call the same shape, so wave-locking never showed in numbers.
Added a deterministic per-call multiplier keyed on the input
hash: ~10% of calls land in a 3x slow bucket, ~3% in a 6x slow
bucket. Same input still yields the same latency every run
(reproducible) but the workload now exhibits the long-tail
behavior real LLMs do.
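The idea can be sketched as follows (the hash function and bucket boundaries are assumptions for illustration, not the bench's actual code): hash the input, map ~3% of hash values to a 6x bucket and the next ~10% to a 3x bucket, so the multiplier is deterministic per input.

```typescript
// Sketch of a deterministic tail-variance multiplier (illustrative only).

// FNV-1a 32-bit: cheap, deterministic string hash.
function hashInput(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

function latencyMultiplier(input: string): number {
  const bucket = hashInput(input) % 100;
  if (bucket < 3) return 6; // ~3% of calls: very slow
  if (bucket < 13) return 3; // next ~10%: slow
  return 1; // everyone else: baseline
}

// mockLatencyMs ≈ (baseLatencyMs + perTokenMs * tokens) * latencyMultiplier(input)
```

Same input, same hash, same multiplier on every run, which keeps the bench reproducible while giving the workload a long tail.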
Added a `monorepo` fixture (80 modification-shaped files across
35 directories) sized to produce more LLM calls than the default
maxConcurrent. With the variance model + monorepo fixture, the
wave-locking is finally measurable.
## Bench: wave-based vs continuous-queue (variance latency)
| fixture | wave-based | continuous-queue | Δ wall |
|----------------|------------:|-----------------:|------------:|
| tiny | 4 ms | 2 ms | -2 ms |
| medium | 8,511 ms | 8,502 ms | -9 ms |
| large | 14,420 ms | 14,413 ms | -7 ms |
| feature-add | 13,066 ms | 13,062 ms | -4 ms |
| refactor | 51,148 ms | 51,116 ms | -32 ms |
| initial-commit | 16,974 ms | 16,965 ms | -9 ms |
| docs-update | 24,128 ms | 24,384 ms | +256 ms |
| dep-bump | 0 ms | 0 ms | 0 ms |
| **monorepo** | **213,233 ms** | **88,957 ms** | **-124,276 ms (-58%)** |
All fixtures except monorepo have ≤20 LLM calls: they fit in a single wave at concurrency 24, so the scheduler choice is moot for them. The monorepo fixture (80 calls = ~4 waves wave-based) is where wave-locking shows: the continuous queue cuts the wall by 58%.
The other fixtures stay flat as expected (within noise). The
LLM call counts are unchanged — this PR is pure scheduler
restructuring, no skip-trivial-style call-elimination.
PR 4 of the #845 sprint. Replaces both pipeline passes' wave-by-wave Promise.all with a continuous-queue semaphore so the slowest call in a batch no longer blocks the next batch from starting. Monorepo-shape fixtures see a 58% wall-clock cut (213s → 89s); other fixtures stay flat because they fit in a single wave at concurrency 24.
Three changes

1. Continuous queue

Both schedulers replaced:

- `summarizeInWaves` in `summarizeDiffs.ts` (directory consolidation)
- `processInWaves` in `summarizeLargeFiles.ts` (per-file pre-process)

Old loop: chunk the items into batches of `maxConcurrent` and `await Promise.all(batch)` per batch.

New loop: a `createLimit(maxConcurrent)` semaphore. Items dispatch in order; the next item starts as soon as any in-flight slot frees. The directory pass also re-checks the budget at dispatch time, so subsequent items short-circuit when earlier completions have already dropped the total under `maxTokens`.

2. Tail-variance latency model in the bench
The previous mock latency was `baseLatencyMs + perTokenMs * tokens`: the same shape for every call. That hid wave-locking entirely, because the slowest call in any wave was no slower than every other call. Real LLMs don't behave that way; they have meaningful tail variance (one of every ten calls is noticeably slow, sometimes 5-10x).

Added a deterministic per-call multiplier keyed on the input hash: ~10% of calls land in a 3x slow bucket and ~3% in a 6x bucket, with the same input yielding the same latency every run.
This finally makes wave-locking measurable in the bench. Without it, the architectural change was correct but invisible.
3. New `monorepo` fixture

80 modification-shaped files across 35 directories, sized to produce more LLM calls than `maxConcurrent` (24), so wave-based scheduling requires multiple waves and shows the wave-locking pathology under variance.

Bench: wave-based vs continuous-queue (both at variance latency)
Every other fixture has ≤20 LLM calls — they fit in a single wave at concurrency 24, so scheduler choice is moot. The monorepo fixture (80 calls = ~4 waves at wave-based) is where wave-locking surfaces.
LLM call counts are unchanged across the board. PR 4 is pure scheduler restructuring; no calls eliminated like in PR 2's skip-trivial. The win is purely from filling slots more efficiently under variance.
Test plan
- `npm run lint`
- `npm run test:jest` (1294 tests pass; 1 updated assertion in `summarizeDiffs.test.ts` to match the new "continuous queue" log line, plus 19 new fixture tests covering the monorepo addition)
- `npm run build`
- `npm run test:cli`
- `npm run bench` (numbers above)

Plan reference
PR 4 of the #845 sprint. PR 5 (per-repo content-hashed disk cache) is the final headline win: re-runs of `coco commit` after small changes will hit the cache for unchanged files and skip the LLM entirely.