feat(parser): raise default token budget from 2048 to 4096 (#845) #852
Merged
Conversation
Match the canonical service tokenLimit shipped in `langchain/utils.ts` for openai / anthropic / ollama (all 4096). The 2048 fallback was a holdover from when 4k was a stretch for fast models; today every shipped service overrides it to 4096 already, so the fallback only fires for users whose custom service definition omits tokenLimit. Without this raise, those users hit a needlessly tight budget that triggers extra pre-summarization on diffs the model could absorb whole.

Two call sites updated:

- `summarizeDiffs.ts:250` default param
- `parsers/default/index.ts:55` `||` fallback

Bench (bin/benchmark.ts default also bumped to 4096 so per-PR diffs reflect the most-common production budget):

| fixture        | calls before | calls after | Δ calls    |
|----------------|-------------:|------------:|-----------:|
| tiny           |            0 |           0 |          0 |
| medium         |           20 |          19 |         -1 |
| large          |           41 |          30 | -11 (-27%) |
| feature-add    |           11 |          11 |          0 |
| refactor       |           28 |          20 |  -8 (-29%) |
| initial-commit |           41 |          30 | -11 (-27%) |
| docs-update    |            8 |           7 |         -1 |
| dep-bump       |            0 |           0 |          0 |

Heavy fixtures (large, initial-commit, refactor) get a real 27-29% reduction in LLM call count: direct API cost reduction. Wall clock for `large` / `initial-commit` improved 12 s (17%); the `refactor` wall went up slightly because fewer-but-larger calls serialize a bit (each pays the latency model's per-call base cost), trading API spend for a small wall-clock cost. Net is a clear win on the cost dimension that scales with diff size.
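For orientation, a minimal TypeScript sketch of the two call sites. The surrounding signatures and `resolveBudget` are hypothetical; only the 2048 → 4096 fallback constants reflect this PR:

```typescript
// Sketch only: the real code lives at summarizeDiffs.ts:250 and
// parsers/default/index.ts:55; everything but the fallback values is assumed.

// Default-parameter fallback (summarizeDiffs.ts):
async function summarizeDiffs(
  diffs: string[],
  maxTokens: number = 4096, // was 2048
): Promise<string[]> {
  // pre-summarization only triggers for diffs that exceed maxTokens
  return diffs;
}

// `||` fallback (parsers/default/index.ts, inside fileChangeParser):
function resolveBudget(maxTokens?: number): number {
  return maxTokens || 4096; // was `maxTokens || 2048`
}
```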
First optimization PR in the #845 sprint. Two single-line fallback raises that match the budget the rest of the system already assumes.
Why
The diff-condensing pipeline had two 2048 fallbacks for `maxTokens`:

- `src/lib/parsers/default/utils/summarizeDiffs.ts:250` (default param)
- `src/lib/parsers/default/index.ts:55` (`maxTokens || 2048` in `fileChangeParser`)

Both came from when 4k context was a stretch for fast models. Every shipped service config in `src/lib/langchain/utils.ts` already sets `tokenLimit: 4096` (openai, anthropic, ollama defaults), so the fallback only fires when:

- a custom service definition omits `tokenLimit`, or
- a call site bypasses the `service.tokenLimit → maxTokens` plumbing entirely.

Both cases land in a needlessly tight budget that triggers extra pre-summarization on diffs the model could swallow whole. Raising the fallback to 4096 just synchronizes the parser's "no value" assumption with the rest of the system.
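To make the "only fires when" cases concrete, a hedged sketch of that plumbing; `ServiceConfig` and `resolveMaxTokens` are illustrative names, not the actual `langchain/utils.ts` shapes:

```typescript
// Illustrative only: shows which configs ever reach the parser fallback.
interface ServiceConfig {
  model: string;
  tokenLimit?: number; // all shipped configs set 4096; custom ones may omit it
}

const shipped: ServiceConfig = { model: "gpt-4", tokenLimit: 4096 };
const custom: ServiceConfig = { model: "my-local-model" }; // no tokenLimit

// Mirrors the parser-side `||` fallback at the end of the plumbing.
function resolveMaxTokens(service: ServiceConfig): number {
  return service.tokenLimit || 4096; // bottomed out at 2048 before this PR
}

resolveMaxTokens(shipped); // 4096, fallback never consulted
resolveMaxTokens(custom);  // 4096 now; 2048 before
```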
Bench (against the realistic post-#849 baseline)
`bin/benchmark.ts`'s default `maxTokens` was also bumped from 2048 to 4096 so per-PR diffs reflect the most-common production budget (full per-fixture numbers are in the merge commit message above).

Reading the numbers
The headline is the 27-29% drop in LLM call count on the heavy fixtures (large, initial-commit, refactor). That's direct API cost reduction — the user pays for fewer round-trips regardless of wall-clock effects.
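As a first-order model of where the drop comes from (illustrative, not the actual chunker): extra pre-summarization calls scale roughly with ceil(tokens / budget) per oversized file, so doubling the budget both shrinks chunk counts and lets mid-sized files skip summarization entirely. The observed 27-29% sits below the theoretical halving because many files already fit in 2048:

```typescript
// Toy model: count pre-summarization calls for a set of per-file token sizes.
function extraCalls(fileTokens: number[], budget: number): number {
  return fileTokens.reduce(
    (calls, tokens) =>
      calls + (tokens > budget ? Math.ceil(tokens / budget) : 0),
    0,
  );
}

const files = [500, 1200, 3000, 3000, 9000]; // hypothetical fixture shape
extraCalls(files, 2048); // 0 + 0 + 2 + 2 + 5 = 9
extraCalls(files, 4096); // 0 + 0 + 0 + 0 + 3 = 3 (mid-sized files drop out)
```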
Wall-clock follows the call-count drop on `large` / `initial-commit` (-12 s / -17%). On `refactor` it moves the other way (+7 s) because fewer-but-larger calls each pay the bench latency model's per-call base cost twice over; with realistic API latency the cross-over point may sit differently, so it's worth measuring on a real run before declaring this a regression rather than a wall-clock-neutral cost win.
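The serialization effect can be seen in a toy concurrency model (every constant below is made up; only the shape of the trade-off matters):

```typescript
// Toy model: up to `concurrency` calls run in parallel; each call pays a
// fixed base latency plus a per-token cost. Fewer-but-larger calls save
// billable round-trips but stretch each wave, which can nudge wall clock up.
function wallClockMs(
  calls: number,
  tokensPerCall: number,
  concurrency: number,
): number {
  const BASE_MS = 500; // hypothetical per-call base cost
  const MS_PER_TOKEN = 10; // hypothetical per-token cost
  const waves = Math.ceil(calls / concurrency);
  return waves * (BASE_MS + tokensPerCall * MS_PER_TOKEN);
}

// Same ~28k-token workload, shaped like the refactor fixture (28 -> 20 calls):
wallClockMs(28, 1000, 8); // 4 waves * 10.5 s = 42 s
wallClockMs(20, 1400, 8); // 3 waves * 14.5 s = 43.5 s: slightly worse wall,
                          // with 8 fewer billable calls
```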
Test plan

- `npm run lint`
- `npm run test:jest` (1250 tests pass; no behavior change to assert beyond the existing default-bound tests)
- `npm run build`
- `npm run test:cli`
- `npm run bench` → numbers above

Plan reference
PR 1 of the #845 sprint. PR 2 (skip-trivial-diffs) is the next chunk.