
feat(service): bump default models to current fast tiers (#854)#855

Merged
gfargo merged 2 commits into main from feat/bump-default-models-854
May 6, 2026

Conversation

gfargo (Owner) commented May 6, 2026

Closes #854.

Why

The diff-condensing pipeline is bounded summarization (input: a diff up to a few thousand tokens; output: 1-3 sentences). That profile fits each provider's fast / cheap tier — not the older flagships the defaults were pinned to. The Anthropic default in particular was nearly two model generations stale (Sonnet 3.5 from June 2024); shipping Haiku 4.5 by default gives Anthropic users a likely 2-3x speedup at the same or better quality on summarization.

Defaults updated

| provider  | before                     | after                     | rationale |
|-----------|----------------------------|---------------------------|-----------|
| openai    | gpt-4o-mini                | gpt-4.1-nano              | Cheapest / fastest in the GPT-4.1 line; matches what the #845 user was already running |
| anthropic | claude-3-5-sonnet-20240620 | claude-haiku-4-5-20251001 | Current fast tier; Sonnet 3.5 is two generations stale |
| ollama    | llama3                     | (unchanged)               | A tag the user pulls themselves |
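The new defaults can be pictured as a per-provider map in `src/lib/langchain/utils.ts`. This is a hedged sketch only — the actual export name and shape in the repo may differ:

```typescript
// Sketch of the per-provider default models after this PR.
// `DEFAULT_MODEL` is an assumed name, not the repo's real export.
type Provider = 'openai' | 'anthropic' | 'ollama';

const DEFAULT_MODEL: Record<Provider, string> = {
  openai: 'gpt-4.1-nano',                 // was gpt-4o-mini
  anthropic: 'claude-haiku-4-5-20251001', // was claude-3-5-sonnet-20240620
  ollama: 'llama3',                       // unchanged: a tag the user pulls themselves
};
```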

Type surface

Extended AnthropicModel to include the 4.x line so users can pin to the bigger models when quality matters more than speed:

  • claude-sonnet-4-6
  • claude-haiku-4-5-20251001
  • claude-haiku-4-5
  • claude-opus-4-7

Pre-4.x entries kept for back-compat with users whose service config is already pinned to those.
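Assuming `AnthropicModel` is a string-literal union (a common pattern for model pins), the extension looks roughly like this — the pre-4.x members beyond the old default are elided here:

```typescript
// Sketch of the extended AnthropicModel union; the repo's actual
// union contains additional pre-4.x entries not shown here.
type AnthropicModel =
  // 4.x line (new in this PR)
  | 'claude-sonnet-4-6'
  | 'claude-haiku-4-5-20251001'
  | 'claude-haiku-4-5'
  | 'claude-opus-4-7'
  // pre-4.x, kept for back-compat with pinned configs
  | 'claude-3-5-sonnet-20240620';

// The new default still type-checks against the union.
const ANTHROPIC_DEFAULT: AnthropicModel = 'claude-haiku-4-5-20251001';
```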

Wiki

Updated Config-Overview.md:

  • Default tokenLimit corrected to 4096 (already raised in PR 1 of #845, "coco commit pipeline takes ~4 minutes on a 43-file / 77k-token initial commit").
  • Anthropic config example updated to recommend claude-haiku-4-5-20251001.
  • Supported-models list reorganized so users see current generation first; older models grouped as "kept for back-compat".
  • Added a one-paragraph note explaining the "default to your provider's fast tier" rationale.

(Wiki commit lands in a separate coco.wiki push.)

Risk

The synthetic bench (bin/benchmark.ts) uses a mock LLM and can't measure quality — only call count + wall-clock. Before merging, recommend a manual eyeball pass:

  1. coco commit against a small repo with the old default (override via service config), capture the generated message.
  2. coco commit again with the new default. Compare quality.
  3. Document in PR-merge notes if the new default underperforms; we can swap to a different fast-tier model without changing the rest of the PR.

Quality regressions on summarization are unlikely on the fast-tier swap (the task is well-bounded), but the eyeball check is the gate.

Test plan

  • npm run lint
  • npm run test:jest (1265 tests pass; schema regenerated)
  • npm run build
  • npm run test:cli
  • Manual: coco commit on a small real repo with each new default; compare summary quality vs. old defaults.

Follow-up

This is a recurring task. Provider model lineups refresh every 6-12 months; worth adding a ## Model defaults section to CONTRIBUTING.md reminding maintainers to revisit. Tracking as a standalone follow-up if it doesn't get covered here.

gfargo added 2 commits May 5, 2026 21:14
Pure additions / deletions / renames-no-edit / binary file changes
have no information content beyond the diff's shape — an LLM
summary just produces "Added X" / "Removed Y" / "Renamed A to B"
that we can template for free. A new `trivialDiff.ts` helper
detects these shapes from the hunk body and returns a
deterministic summary string; `summarizeFileDiff` short-circuits
on a non-undefined return and skips the LLM call entirely.

Detection rules (cheap on purpose — runs per file in pre-process):
  - "Binary files X and Y differ" header → 'binary'
  - rename from / rename to headers AND no `+`/`-` body → 'rename'
  - all body lines start with `+` (and at least one does) → 'addition'
  - all body lines start with `-` (and at least one does) → 'deletion'
  - otherwise → undefined (LLM path stays in charge)

Headers (diff --git, index, ---, +++, @@, new file mode, etc.) are
ignored when classifying so the metadata `--- /dev/null` doesn't
fool the deletion detector.
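The rules above can be sketched as a small classifier. This is a hedged reconstruction from the commit message, not the actual `trivialDiff.ts` code — the function name, return shape, and header list are assumptions:

```typescript
// Sketch of the trivial-diff classifier described above.
type TrivialKind = 'binary' | 'rename' | 'addition' | 'deletion';

// Diff metadata lines ignored when classifying the body, so that
// e.g. `--- /dev/null` doesn't fool the deletion detector.
const HEADER =
  /^(diff --git|index |--- |\+\+\+ |@@|new file mode|deleted file mode|old mode|new mode|similarity index|rename from|rename to|Binary files)/;

function classifyTrivialDiff(diff: string): TrivialKind | undefined {
  const lines = diff.split('\n').filter((l) => l.length > 0);

  if (lines.some((l) => /^Binary files .* differ/.test(l))) return 'binary';

  const hasRename = lines.some((l) => l.startsWith('rename from'));
  const body = lines.filter((l) => !HEADER.test(l)); // non-metadata lines only
  const adds = body.filter((l) => l.startsWith('+')).length;
  const dels = body.filter((l) => l.startsWith('-')).length;

  if (hasRename && adds === 0 && dels === 0) return 'rename';
  if (body.length > 0 && adds === body.length) return 'addition';
  if (body.length > 0 && dels === body.length) return 'deletion';
  return undefined; // mixed edits: the LLM path stays in charge
}
```

Note the ordering: the binary check runs first, and the rename check requires an empty body, so a rename-with-edit falls through to `undefined`.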

Bench (post-PR-1 baseline → PR 2):

| fixture        | calls before | calls after | wall before | wall after |
|----------------|-------------:|------------:|------------:|-----------:|
| tiny           |            0 |           0 |      1 ms   |     1 ms   |
| medium         |           19 |  6  (-68%)  |     29.3 s  |   6.9 s (-76%) |
| large          |           30 |  6  (-80%)  |     60.0 s  |   9.7 s (-84%) |
| feature-add    |           11 |  4  (-64%)  |     19.6 s  |   5.6 s (-71%) |
| refactor       |           20 |          20 |     41.3 s  |  41.3 s    |
| initial-commit |           30 |  6  (-80%)  |     60.0 s  |   9.8 s (-84%) |
| docs-update    |            7 |           7 |     18.6 s  |  18.6 s    |
| dep-bump       |            0 |           0 |      0 ms   |     0 ms   |

Initial-commit-shaped repos (the user's reported #845 pain point)
collapse from 60 s / 30 LLM calls to 9.8 s / 6 calls — an 84 %
wall-clock cut. Modification-heavy fixtures (refactor, docs-update)
still need real LLM work and stay flat as expected; their
optimization comes from PR 4 (continuous-queue waves) and PR 6
(per-type prompts).

15 new tests in `trivialDiff.test.ts` cover the four trivial
shapes, headers-vs-body classification, line-count templating,
singular wording for 1-line edits, and the rename-with-edit
fall-through (which must NOT be classified as trivial because
the body has actual changes).
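The line-count templating and singular wording those tests exercise might look like the following — the helper name and exact message wording are assumptions, not the repo's actual strings:

```typescript
// Sketch of the deterministic summary templating for trivial diffs.
function trivialSummary(
  kind: 'addition' | 'deletion',
  file: string,
  lineCount: number
): string {
  // Singular wording for 1-line edits, plural otherwise.
  const lines = lineCount === 1 ? '1 line' : `${lineCount} lines`;
  const verb = kind === 'addition' ? 'Added' : 'Removed';
  return `${verb} ${file} (${lines})`;
}
```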
The diff-condensing pipeline is bounded summarization (input: a
diff up to a few thousand tokens; output: 1-3 sentences). That
profile fits each provider's fast/cheap tier, not the older
flagships the defaults were pinned to.

Defaults updated in `src/lib/langchain/utils.ts`:

| provider  | before                      | after                        |
|-----------|-----------------------------|------------------------------|
| openai    | gpt-4o-mini                 | gpt-4.1-nano                 |
| anthropic | claude-3-5-sonnet-20240620  | claude-haiku-4-5-20251001    |
| ollama    | llama3                      | (unchanged — user-pulled tag)|

The Anthropic default was nearly two model generations stale;
Haiku 4.5 is the current fast tier and the right fit for
summarization. The OpenAI bump matches what the user reporting
#845 was already running locally and what the GPT-4.1 line ships
as its lightest member.

Type surface: extended `AnthropicModel` to include the 4.x line
(`claude-sonnet-4-6`, `claude-haiku-4-5`, `claude-haiku-4-5-20251001`,
`claude-opus-4-7`) so users can pin to the bigger models when
quality matters more than speed. Pre-4.x entries kept for
back-compat with existing pinned configs.

Schema regenerated. `npm run test:cli` still passes; the
synthetic bench can't measure model quality but a manual
eyeball pass on real diffs against each new default is part
of the merge protocol (documented in #854).

Users who want the old defaults can pin them explicitly via service
config; nothing here removes the old model options.
@gfargo gfargo merged commit daf4331 into main May 6, 2026
9 checks passed
@gfargo gfargo deleted the feat/bump-default-models-854 branch May 6, 2026 02:56