
feat(service): bump default models to current fast tiers (#854)#855

Merged
gfargo merged 2 commits into main from feat/bump-default-models-854
May 6, 2026

Conversation

gfargo (Owner) commented May 6, 2026

Closes #854.

Why

The diff-condensing pipeline is bounded summarization (input: a diff up to a few thousand tokens; output: 1-3 sentences). That profile fits each provider's fast / cheap tier — not the older flagships the defaults were pinned to. The Anthropic default in particular was nearly two model generations stale (Sonnet 3.5 from June 2024); shipping Haiku 4.5 by default gives Anthropic users a likely 2-3x speedup at the same or better quality on summarization.

Defaults updated

| provider  | before                     | after                     | rationale |
|-----------|----------------------------|---------------------------|-----------|
| openai    | gpt-4o-mini                | gpt-4.1-nano              | Cheapest / fastest in the GPT-4.1 line; matches what the #845 user was already running |
| anthropic | claude-3-5-sonnet-20240620 | claude-haiku-4-5-20251001 | Current fast tier; Sonnet 3.5 is two generations stale |
| ollama    | llama3                     | (unchanged)               | A tag the user pulls themselves |
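The new defaults can be pictured as a per-provider map in `src/lib/langchain/utils.ts`. This is a hedged sketch only — the actual export name and shape in the repo may differ:

```typescript
// Sketch of the per-provider default models after this PR.
// `DEFAULT_MODEL` is an assumed name, not the repo's real export.
type Provider = 'openai' | 'anthropic' | 'ollama';

const DEFAULT_MODEL: Record<Provider, string> = {
  openai: 'gpt-4.1-nano',                 // was gpt-4o-mini
  anthropic: 'claude-haiku-4-5-20251001', // was claude-3-5-sonnet-20240620
  ollama: 'llama3',                       // unchanged: a tag the user pulls themselves
};
```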

Type surface

Extended AnthropicModel to include the 4.x line so users can pin to the bigger models when quality matters more than speed:

  • claude-sonnet-4-6
  • claude-haiku-4-5-20251001
  • claude-haiku-4-5
  • claude-opus-4-7

Pre-4.x entries kept for back-compat with users whose service config is already pinned to those.
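Assuming `AnthropicModel` is a string-literal union (a common pattern for model pins), the extension looks roughly like this — the pre-4.x members beyond the old default are elided here:

```typescript
// Sketch of the extended AnthropicModel union; the repo's actual
// union contains additional pre-4.x entries not shown here.
type AnthropicModel =
  // 4.x line (new in this PR)
  | 'claude-sonnet-4-6'
  | 'claude-haiku-4-5-20251001'
  | 'claude-haiku-4-5'
  | 'claude-opus-4-7'
  // pre-4.x, kept for back-compat with pinned configs
  | 'claude-3-5-sonnet-20240620';

// The new default still type-checks against the union.
const ANTHROPIC_DEFAULT: AnthropicModel = 'claude-haiku-4-5-20251001';
```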

Wiki

Updated Config-Overview.md:

  • Default tokenLimit corrected to 4096 (already raised in PR 1 of #845, "coco commit pipeline takes ~4 minutes on a 43-file / 77k-token initial commit").
  • Anthropic config example updated to recommend claude-haiku-4-5-20251001.
  • Supported-models list reorganized so users see current generation first; older models grouped as "kept for back-compat".
  • Added a one-paragraph note explaining the "default to your provider's fast tier" rationale.

(Wiki commit lands in a separate coco.wiki push.)

Risk

The synthetic bench (bin/benchmark.ts) uses a mock LLM and can't measure quality — only call count + wall-clock. Before merging, recommend a manual eyeball pass:

  1. coco commit against a small repo with the old default (override via service config), capture the generated message.
  2. coco commit again with the new default. Compare quality.
  3. Document in PR-merge notes if the new default underperforms; we can swap to a different fast-tier model without changing the rest of the PR.

Quality regressions on summarization are unlikely on the fast-tier swap (the task is well-bounded), but the eyeball check is the gate.

Test plan

  • npm run lint
  • npm run test:jest (1265 tests pass; schema regenerated)
  • npm run build
  • npm run test:cli
  • Manual: coco commit on a small real repo with each new default; compare summary quality vs. old defaults.

Follow-up

This is a recurring task. Provider model lineups refresh every 6-12 months; worth adding a ## Model defaults section to CONTRIBUTING.md reminding maintainers to revisit. Tracking as a standalone follow-up if it doesn't get covered here.

gfargo added 2 commits May 5, 2026 21:14
Pure additions / deletions / renames-no-edit / binary file changes
have no information content beyond the diff's shape — an LLM
summary just produces "Added X" / "Removed Y" / "Renamed A to B"
that we can template for free. A new `trivialDiff.ts` helper
detects these shapes from the hunk body and returns a
deterministic summary string; `summarizeFileDiff` short-circuits
on a non-undefined return and skips the LLM call entirely.

Detection rules (cheap on purpose — runs per file in pre-process):
  - "Binary files X and Y differ" header → 'binary'
  - rename from / rename to headers AND no `+`/`-` body → 'rename'
  - all body lines start with `+` (and at least one does) → 'addition'
  - all body lines start with `-` (and at least one does) → 'deletion'
  - otherwise → undefined (LLM path stays in charge)

Headers (diff --git, index, ---, +++, @@, new file mode, etc.) are
ignored when classifying so the metadata `--- /dev/null` doesn't
fool the deletion detector.
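The rules above can be sketched as a small classifier. This is a hedged reconstruction from the commit message, not the actual `trivialDiff.ts` code — the function name, return shape, and header list are assumptions:

```typescript
// Sketch of the trivial-diff classifier described above.
type TrivialKind = 'binary' | 'rename' | 'addition' | 'deletion';

// Diff metadata lines ignored when classifying the body, so that
// e.g. `--- /dev/null` doesn't fool the deletion detector.
const HEADER =
  /^(diff --git|index |--- |\+\+\+ |@@|new file mode|deleted file mode|old mode|new mode|similarity index|rename from|rename to|Binary files)/;

function classifyTrivialDiff(diff: string): TrivialKind | undefined {
  const lines = diff.split('\n').filter((l) => l.length > 0);

  if (lines.some((l) => /^Binary files .* differ/.test(l))) return 'binary';

  const hasRename = lines.some((l) => l.startsWith('rename from'));
  const body = lines.filter((l) => !HEADER.test(l)); // non-metadata lines only
  const adds = body.filter((l) => l.startsWith('+')).length;
  const dels = body.filter((l) => l.startsWith('-')).length;

  if (hasRename && adds === 0 && dels === 0) return 'rename';
  if (body.length > 0 && adds === body.length) return 'addition';
  if (body.length > 0 && dels === body.length) return 'deletion';
  return undefined; // mixed edits: the LLM path stays in charge
}
```

Note the ordering: the binary check runs first, and the rename check requires an empty body, so a rename-with-edit falls through to `undefined`.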

Bench (post-PR-1 baseline → PR 2):

| fixture        | calls before | calls after | wall before | wall after |
|----------------|-------------:|------------:|------------:|-----------:|
| tiny           |            0 |           0 |      1 ms   |     1 ms   |
| medium         |           19 |  6  (-68%)  |     29.3 s  |   6.9 s (-76%) |
| large          |           30 |  6  (-80%)  |     60.0 s  |   9.7 s (-84%) |
| feature-add    |           11 |  4  (-64%)  |     19.6 s  |   5.6 s (-71%) |
| refactor       |           20 |          20 |     41.3 s  |  41.3 s    |
| initial-commit |           30 |  6  (-80%)  |     60.0 s  |   9.8 s (-84%) |
| docs-update    |            7 |           7 |     18.6 s  |  18.6 s    |
| dep-bump       |            0 |           0 |      0 ms   |     0 ms   |

Initial-commit-shaped repos (the user's reported #845 pain point)
collapse from 60 s / 30 LLM calls to 9.8 s / 6 calls — an 84 %
wall-clock cut. Modification-heavy fixtures (refactor, docs-update)
still need real LLM work and stay flat as expected; their
optimization comes from PR 4 (continuous-queue waves) and PR 6
(per-type prompts).

15 new tests in `trivialDiff.test.ts` cover the four trivial
shapes, headers-vs-body classification, line-count templating,
singular wording for 1-line edits, and the rename-with-edit
fall-through (which must NOT be classified as trivial because
the body has actual changes).
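The line-count templating and singular wording those tests exercise might look like the following — the helper name and exact message wording are assumptions, not the repo's actual strings:

```typescript
// Sketch of the deterministic summary templating for trivial diffs.
function trivialSummary(
  kind: 'addition' | 'deletion',
  file: string,
  lineCount: number
): string {
  // Singular wording for 1-line edits, plural otherwise.
  const lines = lineCount === 1 ? '1 line' : `${lineCount} lines`;
  const verb = kind === 'addition' ? 'Added' : 'Removed';
  return `${verb} ${file} (${lines})`;
}
```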
The diff-condensing pipeline is bounded summarization (input: a
diff up to a few thousand tokens; output: 1-3 sentences). That
profile fits each provider's fast/cheap tier, not the older
flagships the defaults were pinned to.

Defaults updated in `src/lib/langchain/utils.ts`:

| provider  | before                      | after                        |
|-----------|-----------------------------|------------------------------|
| openai    | gpt-4o-mini                 | gpt-4.1-nano                 |
| anthropic | claude-3-5-sonnet-20240620  | claude-haiku-4-5-20251001    |
| ollama    | llama3                      | (unchanged — user-pulled tag)|

The Anthropic default was nearly two model generations stale;
Haiku 4.5 is the current fast tier and the right fit for
summarization. The OpenAI bump matches what the user reporting
#845 was already running locally and what the GPT-4.1 line ships
as its lightest member.

Type surface: extended `AnthropicModel` to include the 4.x line
(`claude-sonnet-4-6`, `claude-haiku-4-5`, `claude-haiku-4-5-20251001`,
`claude-opus-4-7`) so users can pin to the bigger models when
quality matters more than speed. Pre-4.x entries kept for
back-compat with existing pinned configs.

Schema regenerated. `npm run test:cli` still passes; the
synthetic bench can't measure model quality but a manual
eyeball pass on real diffs against each new default is part
of the merge protocol (documented in #854).

Users who want the old defaults can pin them explicitly via service
config; nothing here removes the old model options.
@gfargo gfargo merged commit daf4331 into main May 6, 2026
9 checks passed
@gfargo gfargo deleted the feat/bump-default-models-854 branch May 6, 2026 02:56