feat(bench): realistic per-language fixture generators + scenarios (#845) by gfargo · Pull Request #849 · gfargo/coco

gfargo · 2026-05-06T00:34:06Z

Builds on the bench harness from #847. Replaces the LCG-noise content with per-language code-shaped generators and adds named scenario fixtures that mirror real commit workflows. Re-captures the baseline so all subsequent #845 PRs measure against realistic input.

Why bother

The v0 LCG fixtures were good for deterministic latency measurement but useless for telling whether an optimization actually translates to real-shaped diffs. The skip-trivial work (PR 2 in the plan) needs realistic shape detection (pure additions vs. modifications vs. renames). The markdown fast-path (PR 6) needs realistic markdown content. The cache work (PR 5) needs realistic content hashing. Without that, "wins" claimed against synthetic noise might disappear on the user's actual diffs.

What's in the PR

Generators (src/lib/parsers/default/__fixtures__/generators.ts) — one per file type, each seeded for determinism:

generateTypeScript — imports, types, functions, classes, JSDoc
generatePython — imports, defs, classes, decorators, docstrings
generateMarkdown — headers, lists, paragraphs, code blocks, tables
generateJson — nested config object with realistic key names
generateYaml — CI workflow shape
generateLockfile — yarn lock-style entries
generateContentForFile — extension-based dispatcher
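
To make the "seeded for determinism" and "extension-based dispatcher" ideas concrete, here is a minimal sketch. The function names come from the list above, but every signature, word list, and the choice of PRNG are assumptions made for illustration. The actual generators.ts may look quite different, and only two of the file types are sketched.

```typescript
// Illustrative sketch only: signatures and bodies are assumed, not copied from generators.ts.
type Rng = () => number;

// Small linear congruential generator (Numerical Recipes constants).
// The same seed always yields the same sequence of floats in [0, 1).
function makeRng(seed: number): Rng {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

function pick(rng: Rng, items: readonly string[]): string {
  return items[Math.floor(rng() * items.length)] ?? "";
}

// Hypothetical vocabularies used to build code-shaped identifiers.
const VERBS = ["load", "parse", "build", "merge"] as const;
const NOUNS = ["Config", "Diff", "Summary", "Token"] as const;

function generateTypeScript(seed: number, functionCount: number): string {
  const rng = makeRng(seed);
  const out: string[] = ['import { readFile } from "node:fs/promises";', ""];
  for (let i = 0; i < functionCount; i++) {
    const name = `${pick(rng, VERBS)}${pick(rng, NOUNS)}${i}`;
    out.push(`/** Illustrative ${name} helper. */`);
    out.push(`export function ${name}(input: string): string {`);
    out.push("  return input.trim();");
    out.push("}", "");
  }
  return out.join("\n");
}

function generateMarkdown(seed: number, sectionCount: number): string {
  const rng = makeRng(seed);
  const out: string[] = [`# ${pick(rng, NOUNS)} notes`, ""];
  for (let i = 0; i < sectionCount; i++) {
    out.push(`## Section ${i + 1}`, "");
    out.push(`- ${pick(rng, VERBS)} the ${pick(rng, NOUNS).toLowerCase()}`, "");
  }
  return out.join("\n");
}

function extensionOf(path: string): string {
  const dot = path.lastIndexOf(".");
  return dot === -1 ? "" : path.slice(dot).toLowerCase();
}

// Extension-based dispatcher: picks a generator from the file name alone.
function generateContentForFile(path: string, seed: number, size: number): string {
  switch (extensionOf(path)) {
    case ".ts":
      return generateTypeScript(seed, size);
    case ".md":
      return generateMarkdown(seed, size);
    default:
      throw new Error(`no generator sketched for ${path}`);
  }
}
```

Because each generator builds its own RNG from the seed, calling `generateContentForFile("src/a.ts", 42, 3)` twice returns byte-identical content, which is the property the determinism tests in this PR rely on.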

Diff-shape wrappers (diffs.ts) — produce git diff output with the right header + body format for each shape:

asAdditionDiff / asDeletionDiff — pure +/- shapes
asModificationDiff — context + remove + add interleaving
asRenameDiff — git rename header (no body)
asBinaryDiff — binary file marker
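
The wrappers listed above can be sketched roughly as follows. This is an assumption-laden illustration of the git patch shapes involved, not the code in diffs.ts: the parameter lists are hypothetical, and the `index <old>..<new>` line that real git output carries is left out for brevity.

```typescript
// Illustrative sketch only: signatures are assumed, not taken from diffs.ts.
function diffHeader(oldPath: string, newPath: string): string {
  return `diff --git a/${oldPath} b/${newPath}`;
}

function prefixLines(prefix: string, lines: readonly string[]): string[] {
  return lines.map((line) => prefix + line);
}

// Pure addition: a new file where every body line starts with "+".
function asAdditionDiff(path: string, content: string): string {
  const lines = content.split("\n");
  return [
    diffHeader(path, path),
    "new file mode 100644",
    "--- /dev/null",
    `+++ b/${path}`,
    `@@ -0,0 +1,${lines.length} @@`,
    ...prefixLines("+", lines),
  ].join("\n");
}

// Pure deletion: a removed file where every body line starts with "-".
function asDeletionDiff(path: string, content: string): string {
  const lines = content.split("\n");
  return [
    diffHeader(path, path),
    "deleted file mode 100644",
    `--- a/${path}`,
    "+++ /dev/null",
    `@@ -1,${lines.length} +0,0 @@`,
    ...prefixLines("-", lines),
  ].join("\n");
}

// Modification: one hunk of context lines, then removals, then additions.
function asModificationDiff(
  path: string,
  context: readonly string[],
  removed: readonly string[],
  added: readonly string[],
): string {
  const oldCount = context.length + removed.length;
  const newCount = context.length + added.length;
  return [
    diffHeader(path, path),
    `--- a/${path}`,
    `+++ b/${path}`,
    `@@ -1,${oldCount} +1,${newCount} @@`,
    ...prefixLines(" ", context),
    ...prefixLines("-", removed),
    ...prefixLines("+", added),
  ].join("\n");
}

// Rename with identical content: extended header lines only, no hunk.
function asRenameDiff(fromPath: string, toPath: string): string {
  return [
    diffHeader(fromPath, toPath),
    "similarity index 100%",
    `rename from ${fromPath}`,
    `rename to ${toPath}`,
  ].join("\n");
}

// Binary change: git prints a marker line instead of a hunk.
function asBinaryDiff(path: string): string {
  return [diffHeader(path, path), `Binary files a/${path} and b/${path} differ`].join("\n");
}
```

Keeping the shape wrappers separate from the content generators means any generated file body can be presented as an addition, a deletion, or a modification without regenerating it.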

New scenario fixtures (in addition to the original sized ones):

feature-add (14 files) — new module + tests + docs touch
refactor (30 files) — rename + ~25 modifications
initial-commit (50 files) — mirrors the repro from #845 ("coco commit pipeline takes ~4 minutes on a 43-file / 77k-token initial commit")
docs-update (9 files) — markdown-heavy
dep-bump (3 files) — package.json + lockfile + CHANGELOG

Re-captured baseline

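| fixture | wall-clock | llm calls | llm total ms | prompt tokens |
|----------------|-----------:|----------:|-------------:|--------------:|
| tiny | 2 ms | 0 | 0 ms | 0 |
| medium | 31,124 ms | 20 | 106,333 ms | 34,237 |
| large | 72,151 ms | 41 | 244,112 ms | 74,197 |
| feature-add | 15,967 ms | 11 | 54,726 ms | 18,937 |
| refactor | 33,994 ms | 28 | 153,871 ms | 52,430 |
| initial-commit | 72,291 ms | 41 | 245,148 ms | 74,546 |
| docs-update | 18,563 ms | 8 | 56,293 ms | 13,908 |
| dep-bump | 27,158 ms | 1 | 27,141 ms | 19,597 |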
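
Two derived figures can help when reading these rows: the average latency per LLM call, and the ratio of summed LLM time to wall-clock time. The sketch below computes both from the numbers above. Reading the ratio as "how many calls overlap on average" is an interpretation, not something the bench harness reports, and the `BaselineRow` shape is hypothetical rather than the schema of the committed baseline file.

```typescript
// Hypothetical row shape; the values are copied from the baseline table above.
interface BaselineRow {
  fixture: string;
  wallClockMs: number;
  llmCalls: number;
  llmTotalMs: number;
  promptTokens: number;
}

const baselineRows: BaselineRow[] = [
  { fixture: "tiny", wallClockMs: 2, llmCalls: 0, llmTotalMs: 0, promptTokens: 0 },
  { fixture: "medium", wallClockMs: 31124, llmCalls: 20, llmTotalMs: 106333, promptTokens: 34237 },
  { fixture: "large", wallClockMs: 72151, llmCalls: 41, llmTotalMs: 244112, promptTokens: 74197 },
  { fixture: "feature-add", wallClockMs: 15967, llmCalls: 11, llmTotalMs: 54726, promptTokens: 18937 },
  { fixture: "refactor", wallClockMs: 33994, llmCalls: 28, llmTotalMs: 153871, promptTokens: 52430 },
  { fixture: "initial-commit", wallClockMs: 72291, llmCalls: 41, llmTotalMs: 245148, promptTokens: 74546 },
  { fixture: "docs-update", wallClockMs: 18563, llmCalls: 8, llmTotalMs: 56293, promptTokens: 13908 },
  { fixture: "dep-bump", wallClockMs: 27158, llmCalls: 1, llmTotalMs: 27141, promptTokens: 19597 },
];

interface DerivedRow {
  fixture: string;
  avgMsPerCall: number; // llm total ms divided by llm calls
  overlapRatio: number; // llm total ms divided by wall-clock ms
}

function deriveRow(row: BaselineRow): DerivedRow {
  return {
    fixture: row.fixture,
    avgMsPerCall: row.llmCalls === 0 ? 0 : row.llmTotalMs / row.llmCalls,
    overlapRatio: row.wallClockMs === 0 ? 0 : row.llmTotalMs / row.wallClockMs,
  };
}

const derivedRows: DerivedRow[] = baselineRows.map(deriveRow);

for (const d of derivedRows) {
  console.log(`${d.fixture}: ${d.avgMsPerCall.toFixed(0)} ms/call, overlap x${d.overlapRatio.toFixed(2)}`);
}
```

A ratio near 1 (as for dep-bump, where a single call accounts for almost all of the wall-clock time) suggests the run is dominated by sequential LLM latency, while a ratio well above 1 suggests calls are already running concurrently.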

Three things the realistic fixtures surface that the LCG hid

dep-bump pays 27 seconds for a single LLM call — the lockfile pre-summary. Skip-trivial / per-extension fast-path should basically zero this.
refactor fires 28 LLM calls on 30 files of mixed +/-. The continuous-queue wave consolidation work (PR 4 in the plan) targets exactly this shape.
docs-update is markdown-heavy with 8 calls in 19s. A markdown-specific shorter prompt could measurably trim this.

Test plan

npm run lint
npm run test:jest (1249 tests pass — 19 new across generators.test.ts and index.test.ts covering determinism, expected-marker presence, scaling, and shape properties of the rename / dep-bump scenarios)
npm run build
npm run test:cli
npm run bench --update produces stable numbers (re-runs match within ms)

Plan reference

/Users/gfargo/.claude/plans/polymorphic-wondering-sunbeam.md (the #845 master plan). PR 1 (raise default token budget) is the next chunk; with realistic fixtures committed it can post both wall-clock and per-scenario behavior diffs in its description.

The v0 fixtures from #847 used a seeded LCG to generate noise. Good for deterministic latency measurement, useless for telling whether an optimization translates to real-shaped diffs. This PR swaps that out for code-shaped content per file type and adds named scenarios that mirror real commit workflows.

Generators (src/lib/parsers/default/__fixtures__/generators.ts):

- generateTypeScript — imports, types, functions, classes, JSDoc
- generatePython — imports, defs, classes, decorators, docstrings
- generateMarkdown — headers, lists, paragraphs, code blocks, tables
- generateJson — nested config object with realistic key names
- generateYaml — CI workflow shape
- generateLockfile — yarn lock-style entries
- generateContentForFile — extension-based dispatcher

Diff-shape wrappers (diffs.ts):

- asAdditionDiff / asDeletionDiff — pure +/- shapes
- asModificationDiff — context + remove + add interleaving
- asRenameDiff — git rename header (no body)
- asBinaryDiff — binary file marker

Scenarios in addition to the original tiny/medium/large:

- feature-add (14 files) — new module + tests + docs touch
- refactor (30 files) — rename + ~25 modifications
- initial-commit (50) — same shape as user's #845 repro
- docs-update (9) — markdown-heavy
- dep-bump (3) — package.json + lockfile + CHANGELOG

Re-captured baseline (committed at .bench/baseline.json):

| fixture | wall-clock | calls | llm total ms | prompt tokens |
|----------------|-----------:|------:|-------------:|--------------:|
| tiny | 2 ms | 0 | 0 ms | 0 |
| medium | 31,124 ms | 20 | 106,333 ms | 34,237 |
| large | 72,151 ms | 41 | 244,112 ms | 74,197 |
| feature-add | 15,967 ms | 11 | 54,726 ms | 18,937 |
| refactor | 33,994 ms | 28 | 153,871 ms | 52,430 |
| initial-commit | 72,291 ms | 41 | 245,148 ms | 74,546 |
| docs-update | 18,563 ms | 8 | 56,293 ms | 13,908 |
| dep-bump | 27,158 ms | 1 | 27,141 ms | 19,597 |

Three observations the realistic fixtures surface that the LCG fixtures hid:

1. dep-bump pays 27s for one LLM call — the lockfile pre-summary. Skip-trivial / per-extension fast-path should basically zero this.
2. refactor (30 files of mixed +/-) fires 28 LLM calls. The continuous-queue wave consolidation work (PR 4) targets exactly this shape.
3. docs-update is markdown-heavy with 8 calls in 19s. A markdown-specific shorter prompt could measurably trim this.

Tests: 14 new generator tests + 5 new fixture-level tests covering determinism, expected-marker presence, scaling behavior, and shape properties of the rename / dep-bump scenarios.

#845) (#850)

The dep-bump fixture from #849 included `yarn.lock` and reported 27 seconds of LLM work for it. That's a bench-fixture artifact, not a real-world cost. Lockfiles live in `DEFAULT_IGNORED_FILES` and the `.lock` extension lives in `DEFAULT_IGNORED_EXTENSIONS` (see `src/lib/config/constants.ts`), so `getChanges` strips them before the diff-condensing pipeline ever sees them on a real `coco commit`.

- Drop `yarn.lock` from DEP_BUMP_FILES; the realistic shape is just `package.json` + `CHANGELOG.md`.
- Update the dep-bump-shape test to assert the post-filter invariant (no lockfiles in the fixture) instead of asserting a lockfile is present.
- Add a guard test that fails loudly if any future fixture accidentally drifts back into including a default-ignored file.
- Re-baseline. dep-bump now reports 0 ms / 0 LLM calls (early exit, total budget already under threshold), reflecting what a real dep bump costs the pipeline.

The "lockfile fast-path" optimization angle from the original plan is dropped — the existing ignore filter already handles that, and any pipeline-level skip would be redundant.
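
The guard described in #850 could take roughly the following shape. This is a sketch under assumptions: the constant names come from the commit message, but their contents here are guessed (the real lists live in `src/lib/config/constants.ts`), and `isDefaultIgnored` and `findIgnoredFixtureFiles` are hypothetical helper names, not functions from the repository.

```typescript
// Assumed contents: the real lists in src/lib/config/constants.ts may differ.
const DEFAULT_IGNORED_FILES: readonly string[] = ["package-lock.json", "yarn.lock", "pnpm-lock.yaml"];
const DEFAULT_IGNORED_EXTENSIONS: readonly string[] = [".lock"];

// Hypothetical helper: true when the ignore filter would drop this path.
function isDefaultIgnored(path: string): boolean {
  const base = path.slice(path.lastIndexOf("/") + 1);
  if (DEFAULT_IGNORED_FILES.indexOf(base) !== -1) return true;
  return DEFAULT_IGNORED_EXTENSIONS.some((ext) => ext.length > 0 && base.slice(-ext.length) === ext);
}

// Hypothetical helper: lists every "<fixture>: <path>" pair that violates the
// post-filter invariant, so a failing test names the offenders directly.
function findIgnoredFixtureFiles(fixtures: Record<string, readonly string[]>): string[] {
  const offenders: string[] = [];
  for (const name of Object.keys(fixtures)) {
    for (const file of fixtures[name] || []) {
      if (isDefaultIgnored(file)) offenders.push(`${name}: ${file}`);
    }
  }
  return offenders;
}

// Example: the corrected dep-bump shape passes, the original one would not.
const correctedFixtures = { "dep-bump": ["package.json", "CHANGELOG.md"] };
const originalFixtures = { "dep-bump": ["package.json", "yarn.lock", "CHANGELOG.md"] };
console.log(findIgnoredFixtureFiles(correctedFixtures));
console.log(findIgnoredFixtureFiles(originalFixtures));
```

In a Jest suite the same check would reduce to asserting that the offenders list is empty across all fixtures, which keeps the bench input aligned with what the pipeline sees after the ignore filter runs.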

gfargo merged commit ff491f9 into main May 6, 2026
9 checks passed

gfargo mentioned this pull request May 6, 2026

fix(bench): dep-bump fixture should reflect post-ignore-filter content (#845) #850

Merged

5 tasks

gfargo mentioned this pull request May 6, 2026

feat(parser): raise default token budget from 2048 to 4096 (#845) #852

Merged

5 tasks

gfargo merged 1 commit into main from feat/realistic-bench-fixtures-845
