feat(service): bump default models to current fast tiers (#854)#855
Merged
Pure additions, pure deletions, renames without edits, and binary file changes have no information content beyond the diff's shape — an LLM summary just produces "Added X" / "Removed Y" / "Renamed A to B", which we can template for free. A new `trivialDiff.ts` helper detects these shapes from the hunk body and returns a deterministic summary string; `summarizeFileDiff` short-circuits on a non-undefined return and skips the LLM call entirely.

Detection rules (cheap on purpose — runs per file in pre-process):

- "Binary files X and Y differ" header → 'binary'
- `rename from` / `rename to` headers AND no `+`/`-` body → 'rename'
- all body lines start with `+` (and at least one does) → 'addition'
- all body lines start with `-` (and at least one does) → 'deletion'
- otherwise → undefined (LLM path stays in charge)

Headers (`diff --git`, `index`, `---`, `+++`, `@@`, `new file mode`, etc.) are ignored when classifying, so the metadata line `--- /dev/null` doesn't fool the deletion detector.

Bench (post-PR-1 baseline → PR 2):

| fixture        | calls before | calls after | wall before | wall after   |
|----------------|-------------:|------------:|------------:|-------------:|
| tiny           | 0            | 0           | 1 ms        | 1 ms         |
| medium         | 19           | 6 (-68%)    | 29.3 s      | 6.9 s (-76%) |
| large          | 30           | 6 (-80%)    | 60.0 s      | 9.7 s (-84%) |
| feature-add    | 11           | 4 (-64%)    | 19.6 s      | 5.6 s (-71%) |
| refactor       | 20           | 20          | 41.3 s      | 41.3 s       |
| initial-commit | 30           | 6 (-80%)    | 60.0 s      | 9.8 s (-84%) |
| docs-update    | 7            | 7           | 18.6 s      | 18.6 s       |
| dep-bump       | 0            | 0           | 0 ms        | 0 ms         |

Initial-commit-shaped repos (the user's reported #845 pain point) collapse from 60 s / 30 LLM calls to 9.8 s / 6 calls — an 84 % wall-clock cut. Modification-heavy fixtures (refactor, docs-update) still need real LLM work and stay flat as expected; their optimization comes from PR 4 (continuous-queue waves) and PR 6 (per-type prompts).
15 new tests in `trivialDiff.test.ts` cover the four trivial shapes, headers-vs-body classification, line-count templating, singular wording for 1-line edits, and the rename-with-edit fall-through (which must NOT be classified as trivial because the body has actual changes).
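As a sketch, the detection rules above could look like the following. The helper name (`classifyTrivialDiff`), the return shape, and the exact header-prefix list are assumptions for illustration, not the actual `trivialDiff.ts` export:

```typescript
// Sketch of the trivial-diff classifier described above (hypothetical API).
type TrivialShape = "binary" | "rename" | "addition" | "deletion";

// Diff metadata prefixes ignored during classification, so e.g. the
// `--- /dev/null` header can't fool the deletion detector.
const HEADER_PREFIXES = [
  "diff --git", "index ", "--- ", "+++ ", "@@", "new file mode",
  "deleted file mode", "similarity index", "rename from", "rename to",
  "Binary files",
];

const isHeader = (line: string): boolean =>
  HEADER_PREFIXES.some((p) => line.startsWith(p));

function classifyTrivialDiff(diff: string): TrivialShape | undefined {
  const lines = diff.split("\n").filter((l) => l.length > 0);

  if (lines.some((l) => l.startsWith("Binary files") && l.endsWith("differ"))) {
    return "binary";
  }

  // Body = everything that is not diff metadata.
  const body = lines.filter((l) => !isHeader(l));
  const hasRenameHeaders =
    lines.some((l) => l.startsWith("rename from")) &&
    lines.some((l) => l.startsWith("rename to"));
  const bodyHasChanges = body.some(
    (l) => l.startsWith("+") || l.startsWith("-")
  );

  if (hasRenameHeaders && !bodyHasChanges) return "rename";
  if (body.length > 0 && body.every((l) => l.startsWith("+"))) return "addition";
  if (body.length > 0 && body.every((l) => l.startsWith("-"))) return "deletion";

  return undefined; // non-trivial: LLM path stays in charge
}
```

A caller like `summarizeFileDiff` would then template a summary string on any non-undefined return and only fall through to the LLM on `undefined`.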
The diff-condensing pipeline is bounded summarization (input: a diff up to a few thousand tokens; output: 1-3 sentences). That profile fits each provider's fast/cheap tier, not the older flagships the defaults were pinned to. Defaults updated in `src/lib/langchain/utils.ts`:

| provider  | before                     | after                         |
|-----------|----------------------------|-------------------------------|
| openai    | gpt-4o-mini                | gpt-4.1-nano                  |
| anthropic | claude-3-5-sonnet-20240620 | claude-haiku-4-5-20251001     |
| ollama    | llama3                     | (unchanged — user-pulled tag) |

The Anthropic default was nearly two model generations stale; Haiku 4.5 is the current fast tier and the right fit for summarization. The OpenAI bump matches what the user reporting #845 was already running locally and what the GPT-4.1 line ships as its lightest member.

Type surface: extended `AnthropicModel` to include the 4.x line (`claude-sonnet-4-6`, `claude-haiku-4-5`, `claude-haiku-4-5-20251001`, `claude-opus-4-7`) so users can pin to the bigger models when quality matters more than speed. Pre-4.x entries kept for back-compat with existing pinned configs. Schema regenerated.

`npm run test:cli` still passes; the synthetic bench can't measure model quality, but a manual eyeball pass on real diffs against each new default is part of the merge protocol (documented in #854). Users who want the old defaults can set them explicitly via service config — nothing about this is exclusive.
Closes #854.
Why
The diff-condensing pipeline is bounded summarization (input: a diff up to a few thousand tokens; output: 1-3 sentences). That profile fits each provider's fast / cheap tier — not the older flagships the defaults were pinned to. The Anthropic default in particular was nearly two model generations stale (Sonnet 3.5 from June 2024); shipping Haiku 4.5 by default gives Anthropic users a likely 2-3x speedup at the same or better quality on summarization.
Defaults updated

| provider  | before                     | after                     |
|-----------|----------------------------|---------------------------|
| openai    | gpt-4o-mini                | gpt-4.1-nano              |
| anthropic | claude-3-5-sonnet-20240620 | claude-haiku-4-5-20251001 |
| ollama    | llama3                     | (unchanged)               |

Type surface

Extended `AnthropicModel` to include the 4.x line so users can pin to the bigger models when quality matters more than speed: `claude-sonnet-4-6`, `claude-haiku-4-5-20251001`, `claude-haiku-4-5`, `claude-opus-4-7`. Pre-4.x entries kept for back-compat with users whose service config is already pinned to those.
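As a hypothetical TypeScript rendering of the widened union and new defaults (not the literal contents of `src/lib/langchain/utils.ts`; the pre-4.x member list is abbreviated to the one name mentioned in this PR):

```typescript
// Abbreviated sketch: only members named in this PR are shown.
type AnthropicModel =
  | "claude-3-5-sonnet-20240620" // pre-4.x, kept for pinned configs
  | "claude-sonnet-4-6"
  | "claude-haiku-4-5"
  | "claude-haiku-4-5-20251001"
  | "claude-opus-4-7";

// New default: the dated fast-tier snapshot, for reproducible behavior
// until a user pins something heavier via service config.
const DEFAULT_ANTHROPIC_MODEL: AnthropicModel = "claude-haiku-4-5-20251001";

// Per-provider defaults from the table above (ollama stays whatever
// tag the user has pulled).
const DEFAULT_MODELS = {
  openai: "gpt-4.1-nano",
  anthropic: DEFAULT_ANTHROPIC_MODEL,
  ollama: "llama3",
} as const;
```

Keeping the pre-4.x members in the union means existing pinned configs keep type-checking after the bump.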
Wiki

Updated `Config-Overview.md`:

- `tokenLimit` corrected to `4096` (already raised in PR 1 of #845, "coco commit pipeline takes ~4 minutes on a 43-file / 77k-token initial commit").
- Default Anthropic model updated to `claude-haiku-4-5-20251001`.

(Wiki commit lands in a separate `coco.wiki` push.)

Risk
The synthetic bench (`bin/benchmark.ts`) uses a mock LLM and can't measure quality — only call count + wall-clock. Before merging, recommend a manual eyeball pass:

1. Run `coco commit` against a small repo with the old default (override via service config); capture the generated message.
2. Run `coco commit` again with the new default.
3. Compare quality.

Quality regressions on summarization are unlikely on the fast-tier swap (the task is well-bounded), but the eyeball check is the gate.
Test plan
- `npm run lint`
- `npm run test:jest` (1265 tests pass; schema regenerated)
- `npm run build`
- `npm run test:cli`
- `coco commit` on a small real repo with each new default; compare summary quality vs. old defaults.

Follow-up
This is a recurring task. Provider model lineups refresh every 6-12 months; worth adding a `## Model defaults` section to `CONTRIBUTING.md` reminding maintainers to revisit. Tracking as a standalone follow-up if it doesn't get covered here.