Context
Claude Fable 5 and Mythos 5 launched June 9, 2026 — the first public Mythos-class model. This is a research/evaluation issue, not an adoption issue. Same discipline as the Opus 4.8 evaluation (#365): field signal first, recommendation second.
What we know so far (launch day)
Benchmarks (positive):
- SWE-bench Verified: 95.0% vs Opus 4.8's 88.6%
- SWE-bench Pro: 80.0% vs 69.2%
- FrontierCode Diamond: 29.3% vs 13.4% (about 2.2× Opus 4.8)
- "Even at medium effort, Fable 5 outperforms every other model at any effort level"
- Stripe: compressed months of engineering into days on a 50M-line Ruby codebase
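To put the three benchmark gaps on a common footing, here is a minimal sketch that derives the point gain, the score ratio, and the relative reduction in failed tasks from the numbers quoted above. The `compare` helper and its field names are illustrative, not part of any benchmark tooling.

```python
# Launch-day scores quoted above, as (Fable 5 %, Opus 4.8 %).
SCORES = {
    "SWE-bench Verified": (95.0, 88.6),
    "SWE-bench Pro": (80.0, 69.2),
    "FrontierCode Diamond": (29.3, 13.4),
}


def compare(fable: float, opus: float) -> dict:
    """Return point gain, score ratio, and relative cut in failure rate."""
    fail_fable = 100.0 - fable  # share of tasks Fable 5 did not solve
    fail_opus = 100.0 - opus    # share of tasks Opus 4.8 did not solve
    return {
        "points": round(fable - opus, 1),
        "ratio": round(fable / opus, 2),
        "failure_cut": round(1.0 - fail_fable / fail_opus, 2),
    }


for name, (fable, opus) in SCORES.items():
    print(name, compare(fable, opus))
```

Read this way, the SWE-bench Verified gap is small in points but removes more than half of the remaining failures, while the FrontierCode Diamond gap is large as a ratio but still leaves most tasks unsolved.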
Token efficiency (potentially positive):
- HN early testers: "better results with about half the tokens, making it cost ~same as Opus 4.8 price-wise"
- $10/$50 per Mtok (2× Opus), but fewer turns → similar effective cost per task
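The "2× price, half the tokens" claim can be checked with a small worked example. This is a sketch under stated assumptions: `$10/$50` is read as input/output price per million tokens, Opus 4.8 is taken to be `$5/$25` because the text says Fable 5 is 2× Opus, and the token counts are hypothetical, not measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


def task_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one task at the given token counts."""
    return (input_tokens * p.input_per_mtok
            + output_tokens * p.output_per_mtok) / 1_000_000


FABLE_5 = Pricing(10.0, 50.0)
OPUS_4_8 = Pricing(5.0, 25.0)  # assumption: half of Fable 5, per "2x Opus"

# Hypothetical task: Opus 4.8 spends 400k input / 40k output tokens.
opus_cost = task_cost(OPUS_4_8, 400_000, 40_000)

# Fable 5 at exactly half the tokens breaks even.
fable_half = task_cost(FABLE_5, 200_000, 20_000)

# Fable 5 at 60% of the tokens is already 20% more expensive.
fable_60 = task_cost(FABLE_5, 240_000, 24_000)

print(opus_cost, fable_half, fable_60)
```

Under these assumptions the break-even point is exactly 50% of Opus 4.8's token spend, so Gate 2 should record token counts per task rather than rely on the "about half" estimate.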
Concerns (need monitoring):
- Safety classifiers "super aggressive and sensitive" for benign coding tasks — fallback to Opus 4.8 on sensitive queries (not 4.6)
- "Didn't really notice a difference vs 4.8" for standard conversation/assistant tasks
- Free on Max through June 22 only; credits may be required from June 23
- No community field data yet (launched TODAY)
- No Andon Labs / Vending-Bench data
- No effort-level regression data (does max overthink like 4.8?)
What the wizard should evaluate before recommending
Same gates as #365, but this time actually run them:
- Gate 1 (proof of life): does `claude --model claude-fable-5` resolve? Does `/model` show it in the picker?
- Gate 2 (A/B coder quality): run the same task on Fable 5 and on 4.6 max against a real PR, then compare token spend, quality, and context exhaustion
- Gate 3 (dogfood for 24h): maintainer runs Fable 5 as daily driver for a full day
- Gate 4 (effort-level behavior): does `max` overthink on Fable 5 like it does on 4.7/4.8? Or does it behave more like 4.6?
- Gate 5 (community signal, 7-day wait): monitor HN, Reddit, and GitHub issues for field reports after launch-day hype settles
- Gate 6 (Andon Labs / independent benchmarks): wait for Vending-Bench arena results
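One way to keep the gates honest is to track them as data and only allow a recommendation when every gate has passed. The sketch below is illustrative: the `Gate`, `Status`, `can_recommend`, and `blocking` names are hypothetical and not part of the wizard.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    PASSED = "passed"
    FAILED = "failed"


@dataclass
class Gate:
    number: int
    name: str
    status: Status = Status.PENDING


def new_gates() -> list[Gate]:
    """The six gates listed above, all pending on launch day."""
    names = [
        "proof of life",
        "A/B coder quality",
        "dogfood for 24h",
        "effort-level behavior",
        "community signal (7-day wait)",
        "Andon Labs / independent benchmarks",
    ]
    return [Gate(i, name) for i, name in enumerate(names, start=1)]


def can_recommend(gates: list[Gate]) -> bool:
    """A recommendation is allowed only when every gate has passed."""
    return all(g.status is Status.PASSED for g in gates)


def blocking(gates: list[Gate]) -> list[str]:
    """Gates that are still pending or have failed."""
    return [f"Gate {g.number}: {g.name}"
            for g in gates if g.status is not Status.PASSED]


gates = new_gates()
gates[0].status = Status.PASSED  # e.g. Gate 1 checked by hand
print(can_recommend(gates))      # still False: five gates remain
print(blocking(gates))
```

Treating a pending gate the same as a failed one is deliberate here: it matches the "field signal first, recommendation second" rule, where missing data blocks a recommendation just as bad data does.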
Current wizard state
v1.80.0 recommends Opus 4.6 max as flagship default. That was the right call based on 12 days of 4.8 post-launch data. Fable 5 is a new ceiling — but it's also $10/$50 (2× the cost) and day-zero. No rush.
Pricing tier question
If Fable 5 validates, it would be a new tier above flagship (maybe "Frontier" or "Premium+"), not a replacement for 4.6 max. The $10/$50 pricing makes it a conscious choice, not a default.
Timeline
- June 9-22: free trial window on Max plans (13 days to test at no cost)
- June 23+: may require credits — pricing becomes load-bearing
- June 16+ (7 days post-launch): earliest to evaluate community signal
- July: earliest to consider a wizard recommendation if all gates pass
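The dates in this timeline can be derived from the launch date, which makes the overlap between the free window and the community-signal wait explicit. This is a minimal sketch using only the dates stated above; the variable names are illustrative.

```python
from datetime import date, timedelta

LAUNCH = date(2026, 6, 9)
FREE_TRIAL_END = date(2026, 6, 22)  # last free day on Max plans
COMMUNITY_WAIT = timedelta(days=7)  # Gate 5 waiting period

# Days remaining after launch day, and the window counted inclusively.
days_after_launch = (FREE_TRIAL_END - LAUNCH).days
days_inclusive = days_after_launch + 1

# First day on which credits may be required.
credits_from = FREE_TRIAL_END + timedelta(days=1)

# Earliest day to evaluate community signal.
community_signal_from = LAUNCH + COMMUNITY_WAIT

# Free days left once community signal can be evaluated (inclusive).
free_days_with_signal = (FREE_TRIAL_END - community_signal_from).days + 1

print(days_after_launch, days_inclusive)
print(credits_from.isoformat(), community_signal_from.isoformat())
print(free_days_with_signal)
```

The 13-day figure counts the days after launch day; including June 9 itself gives 14. Either way, only about a week of the free window remains once the 7-day community wait has elapsed, so Gates 2 through 4 are best run early in the window.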