feat(parser): raise default maxConcurrent to 24 + adaptive backoff (#845) #856
Merged
Conversation
PR 3 of the #845 sprint. Headline: modification-heavy fixtures (refactor, docs-update) get a 35-45% wall-clock cut, and the pipeline now degrades gracefully under rate limits instead of failing the whole run on a transient 429.
Two changes
1. Default maxConcurrent: 12 → 24
In `src/lib/langchain/utils.ts`, both the openai and anthropic service defaults are bumped. The fast tier on each provider comfortably handles ~30 concurrent requests on the per-key default rate limit; 24 doubles throughput while leaving headroom for retries. Ollama stays at 1 (single local instance; more would just queue). The bench default in `bin/benchmark.ts` is also bumped so per-PR diffs reflect the new production setting.

2. Adaptive backoff in summarize()

New `invokeWithBackoff` wrapper around `chain.invoke()` that catches retryable errors and waits with exponential backoff before retrying. Retryable means any of:

- HTTP 429 / 502 / 503 / 504
- `code: 'rate_limit_exceeded'`
- `code: 'ECONNRESET'` / `'ETIMEDOUT'`
- a message matching `/rate.?limit|429|too many requests|timeout|temporarily unavailable/i`

Non-retryable errors (4xx other than 429) still propagate immediately so a malformed prompt fails fast. 3 retries, 1s / 2s / 4s waits (capped at 5s). Worst-case extra latency: ~7s before surfacing the failure. That's bounded enough that the user doesn't wait forever, and long enough to ride out brief rate-limit blips that would otherwise kill an entire pipeline mid-run. Both pieces are sketched below.
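First, a minimal sketch of the shape of the config change in `src/lib/langchain/utils.ts`. Only the values come from this PR; the object and field names here are illustrative, not the file's exact structure.

```ts
// Hypothetical shape of the per-service defaults in src/lib/langchain/utils.ts.
const serviceDefaults = {
  openai:    { maxConcurrent: 24 }, // was 12
  anthropic: { maxConcurrent: 24 }, // was 12
  ollama:    { maxConcurrent: 1 },  // unchanged: single local instance, more would just queue
};
```

And a sketch of the backoff wrapper under the criteria listed above. The option names (`maxRetries`, `baseDelayMs`, `maxDelayMs`), the error fields it inspects, and the call-site shape inside `summarize()` are assumptions for illustration, not the PR's actual code; the retry counts, waits, and retryable-error criteria are from the description.

```ts
// Sketch of an invokeWithBackoff-style wrapper: up to 3 retries, 1s / 2s / 4s waits
// capped at 5s, retrying only on the error shapes the PR calls retryable.
const RETRYABLE_STATUS = new Set([429, 502, 503, 504]);
const RETRYABLE_CODES = new Set(['rate_limit_exceeded', 'ECONNRESET', 'ETIMEDOUT']);
const RETRYABLE_MESSAGE =
  /rate.?limit|429|too many requests|timeout|temporarily unavailable/i;

interface MaybeRetryable {
  status?: number;
  code?: string;
  message?: string;
}

function isRetryable(err: unknown): boolean {
  const e = (err ?? {}) as MaybeRetryable;
  if (typeof e.status === 'number' && RETRYABLE_STATUS.has(e.status)) return true;
  if (typeof e.code === 'string' && RETRYABLE_CODES.has(e.code)) return true;
  return typeof e.message === 'string' && RETRYABLE_MESSAGE.test(e.message);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function invokeWithBackoff<T>(
  fn: () => Promise<T>,
  { maxRetries = 3, baseDelayMs = 1000, maxDelayMs = 5000 } = {},
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up after maxRetries attempts, or immediately on non-retryable errors
      // (e.g. a 400 from a malformed prompt), so bad input still fails fast.
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      // 1s, 2s, 4s, capped at maxDelayMs; worst case adds ~7s before the failure surfaces.
      await sleep(Math.min(baseDelayMs * 2 ** attempt, maxDelayMs));
    }
  }
}

// Inside summarize(), the chain call would be wrapped along these lines
// (the exact input shape passed to chain.invoke() depends on the chain):
// const summary = await invokeWithBackoff(() => chain.invoke({ input_documents: docs }));
```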
Bench (post-PR-2 baseline → PR 3)

| fixture        | wall before | wall after |            Δ wall |
|----------------|------------:|-----------:|------------------:|
| tiny           |        1 ms |       2 ms |             +1 ms |
| medium         |    6,906 ms |   6,905 ms |             -1 ms |
| large          |    9,749 ms |   9,754 ms |             +5 ms |
| feature-add    |    5,640 ms |   5,641 ms |             +1 ms |
| refactor       |   41,347 ms |  26,718 ms | -14,629 ms (-35%) |
| initial-commit |    9,818 ms |   9,816 ms |             -2 ms |
| docs-update    |   18,564 ms |  10,248 ms |  -8,316 ms (-45%) |
| dep-bump       |        0 ms |       0 ms |              0 ms |
PR 3 targets the modification-heavy fixtures that PR 2 couldn't help.
`refactor` (20 LLM calls) and `docs-update` (7 calls) were serializing across multiple waves at the old concurrency of 12; at 24 they fit in a single wave, so the slowest call in the wave dominates instead of the wave count. Pure-add fixtures already had ≤7 calls each, which fit in one wave at concurrency 12, so bumping to 24 doesn't help them and they stay flat. They needed the skip-trivial work from PR 2 instead, which is what they got.
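As a back-of-envelope check on `refactor`: 20 calls at concurrency 12 is ceil(20 / 12) = 2 waves, roughly two slowest-call latencies back to back; at concurrency 24 it's ceil(20 / 24) = 1 wave. The observed drop is less than a clean 50% because the slowest call in the remaining wave and the non-LLM work don't shrink.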
Cumulative state vs original baseline (#847)
Test plan
- `npm run lint`
- `npm run test:jest` (1275 tests pass; 10 new in `chains/summarize/index.test.ts` covering the full backoff matrix: success-no-retry, retry-then-succeed for each retryable error shape, no-retry on non-retryable, give-up after 3 retries, unchanged Document[] across retries; one case is sketched below)
- `npm run build`
- `npm run test:cli`
- `npm run bench` (numbers above)
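A sketch of what two of the backoff-matrix cases could look like. It assumes `invokeWithBackoff` is exported from the summarize module and accepts the hypothetical `baseDelayMs` option from the sketch above so the test doesn't sleep for real; the actual tests in `chains/summarize/index.test.ts` may structure this differently (e.g. exercising the wrapper through `summarize()` with a mocked chain).

```ts
// Hypothetical retry-then-succeed and no-retry cases; names and options are
// assumptions carried over from the sketch above.
import { invokeWithBackoff } from './index';

describe('invokeWithBackoff', () => {
  it('retries a 429 once and then succeeds', async () => {
    const invoke = jest
      .fn()
      .mockRejectedValueOnce({ status: 429, message: 'rate limit exceeded' })
      .mockResolvedValueOnce('summary');

    await expect(invokeWithBackoff(invoke, { baseDelayMs: 1 })).resolves.toBe('summary');
    expect(invoke).toHaveBeenCalledTimes(2);
  });

  it('does not retry a non-retryable 400', async () => {
    const invoke = jest.fn().mockRejectedValue({ status: 400, message: 'malformed prompt' });

    await expect(invokeWithBackoff(invoke, { baseDelayMs: 1 })).rejects.toMatchObject({ status: 400 });
    expect(invoke).toHaveBeenCalledTimes(1);
  });
});
```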
Plan reference

PR 3 of the #845 sprint. PR 4 (continuous-queue waves, which kill the wave-locking on refactor's slowest call) is next, then PR 5 (disk cache).