
feat(parser): raise default maxConcurrent to 24 + adaptive backoff (#845) #856

Merged

gfargo merged 1 commit into main from feat/raise-concurrency-backoff-845 on May 6, 2026

Conversation

gfargo (Owner) commented May 6, 2026

PR 3 of the #845 sprint. Headline: modification-heavy fixtures (`refactor`, `docs-update`) get a 35-45% wall-clock cut, and the pipeline now degrades gracefully under rate limits instead of failing the whole run on a transient 429.

Two changes

1. Default `maxConcurrent`: 12 → 24

`src/lib/langchain/utils.ts`: both the openai and anthropic service defaults are bumped. The fast tier on each provider comfortably handles ~30 concurrent requests on the per-key default rate limit; 24 doubles throughput while leaving headroom for retries. Ollama stays at 1 (single local instance; more would just queue). The bench default in `bin/benchmark.ts` is also bumped so per-PR diffs reflect the new production setting.
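
In shape, the change is roughly the following (a sketch only; `DEFAULT_MAX_CONCURRENT` is a hypothetical name and the real service configs in `src/lib/langchain/utils.ts` may be structured differently):

```ts
// Hypothetical sketch of the new defaults; not the literal config shape.
const DEFAULT_MAX_CONCURRENT: Record<string, number> = {
  openai: 24, // was 12; fast tier handles ~30 concurrent per key
  anthropic: 24, // was 12; same headroom reasoning
  ollama: 1, // single local instance; higher values would just queue
}
```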

2. Adaptive backoff in `summarize()`

A new `invokeWithBackoff` wrapper around `chain.invoke()` catches retryable errors and waits with exponential backoff before retrying:

| Error shape | Retried? |
|---|---|
| HTTP 429 / 502 / 503 / 504 | yes |
| `code: 'rate_limit_exceeded'` | yes |
| `code: 'ECONNRESET'` / `'ETIMEDOUT'` | yes |
| message matches `/rate.?limit\|429\|too many requests\|timeout\|temporarily unavailable/i` | yes |
| HTTP 4xx other than 429 | no (fail fast) |

3 retries, 1s / 2s / 4s waits (capped at 5s). Worst-case extra latency: ~7s before surfacing the failure. Bounded enough that the user doesn't wait forever; long enough to ride out brief rate-limit blips that would otherwise kill an entire pipeline mid-run.
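
A minimal sketch of the wrapper under the assumptions above (the helper names and exact error-classification code here are illustrative, not the literal implementation):

```ts
const RETRYABLE_STATUS = new Set([429, 502, 503, 504])
const RETRYABLE_CODES = new Set(['rate_limit_exceeded', 'ECONNRESET', 'ETIMEDOUT'])
const RETRYABLE_MESSAGE =
  /rate.?limit|429|too many requests|timeout|temporarily unavailable/i

// Classify per the table above: retry on 429/502/503/504, transient
// error codes, or a transient-looking message; any other HTTP status
// (e.g. a 400 from a malformed prompt) fails fast.
function isRetryable(err: unknown): boolean {
  const e = err as { status?: number; code?: string; message?: string }
  if (typeof e?.status === 'number') return RETRYABLE_STATUS.has(e.status)
  if (e?.code && RETRYABLE_CODES.has(e.code)) return true
  return RETRYABLE_MESSAGE.test(e?.message ?? '')
}

// Retry up to maxRetries times, waiting 1s / 2s / 4s (capped at
// maxDelayMs) between attempts, then surface the original error.
async function invokeWithBackoff<T>(
  invoke: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1_000,
  maxDelayMs = 5_000
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await invoke()
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) throw err
      const delayMs = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs)
      await new Promise((resolve) => setTimeout(resolve, delayMs))
    }
  }
}
```

Called as something like `invokeWithBackoff(() => chain.invoke(docs))`, so the same `Document[]` input is re-passed unchanged on every retry (one of the behaviors the new tests pin down).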

Bench (post-PR-2 baseline → PR 3)

| fixture | wall before | wall after | Δ wall |
|---|---:|---:|---:|
| tiny | 1 ms | 2 ms | +1 ms |
| medium | 6,906 ms | 6,905 ms | -1 ms |
| large | 9,749 ms | 9,754 ms | +5 ms |
| feature-add | 5,640 ms | 5,641 ms | +1 ms |
| refactor | 41,347 ms | 26,718 ms | -14,629 ms (-35%) |
| initial-commit | 9,818 ms | 9,816 ms | -2 ms |
| docs-update | 18,564 ms | 10,248 ms | -8,316 ms (-45%) |
| dep-bump | 0 ms | 0 ms | 0 ms |

PR 3 targets the modification-heavy fixtures that PR 2 couldn't help. `refactor` (20 LLM calls) and `docs-update` (7 calls) were serializing across multiple waves at the old concurrency of 12; at 24 they fit in a single wave, so the slowest call in the wave dominates rather than the wave count.

Pure-add fixtures (medium, large, initial-commit, feature-add) already had ≤7 calls each, which fit in one wave at concurrency 12, so bumping to 24 doesn't help them; that's why they're flat. They needed the skip-trivial work from PR 2, which is what they got.
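
The wave arithmetic, for illustration (the real scheduler batches work internally; this is just the counting argument):

```ts
// Sequential "waves" needed to run n calls at a given concurrency.
const waves = (calls: number, maxConcurrent: number) =>
  Math.ceil(calls / maxConcurrent)

waves(20, 12) // 2: refactor straddled two waves at the old default
waves(20, 24) // 1: a single wave; the slowest call now dominates
waves(7, 12)  // 1: pure-add fixtures already fit, hence flat
```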

Cumulative state vs original baseline (#847)

| fixture | original wall | post-PR-3 wall | Δ |
|---|---:|---:|---:|
| tiny | 2 ms | 2 ms | 0 ms |
| medium | 31,124 ms | 6,905 ms | -77.8% |
| large | 72,151 ms | 9,754 ms | -86.5% |
| feature-add | 15,967 ms | 5,641 ms | -64.7% |
| refactor | 33,994 ms | 26,718 ms | -21.4% |
| initial-commit | 72,291 ms | 9,816 ms | -86.4% |
| docs-update | 18,563 ms | 10,248 ms | -44.8% |
| dep-bump | 27,158 ms | 0 ms | -100.0% |

Test plan

- `npm run lint`
- `npm run test:jest` (1275 tests pass; 10 new in `chains/summarize/index.test.ts` cover the full backoff matrix: success with no retry, retry-then-succeed for each retryable error shape, no retry on non-retryable errors, give-up after 3 retries, and unchanged `Document[]` input across retries)
- `npm run build`
- `npm run test:cli`
- `npm run bench` (numbers above)

Plan reference

PR 3 of the #845 sprint. Next up: PR 4 (continuous-queue waves, which eliminate the wave-locking on `refactor`'s slowest call), then PR 5 (disk cache).

gfargo merged commit 14d770e into main on May 6, 2026
9 checks passed
gfargo deleted the feat/raise-concurrency-backoff-845 branch on May 6, 2026 at 03:06