docs: clarify why LFM 2.5 uses 10:6 not 5:3 in §4.5 #85
10:6 encodes absolute layer counts (10 CfC + 6 GQA = 16 layers total); reducing to 5:3 would describe a shallower 8-layer architecture. Add one-sentence note to PARAMETER_GOLF.md §4.5. https://claude.ai/code/session_01JpxhvpizFcE1iLL9aT5MUF
Pull request overview
Clarifies the interpretation of LFM 2.5’s “10:6” CfC:GQA layer layout in §4.5, explaining that the numbers represent absolute layer counts (depth), not just a reducible ratio.
Changes:
- Adds an explicit note that “10:6” encodes 16 total layers (10 CfC + 6 GQA).
- Explains why reducing to “5:3” would imply a different (shallower) 8-layer model.
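The distinction between a reducible ratio and absolute layer counts can be checked with a few lines of arithmetic. This is a minimal sketch, not code from the repository: the `LayerLayout` class and its field names are hypothetical, and only the counts (10 CfC + 6 GQA) come from the PR.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class LayerLayout:
    """Hypothetical container for a hybrid stack's absolute layer counts."""

    cfc: int  # number of CfC layers
    gqa: int  # number of GQA layers

    @property
    def depth(self) -> int:
        # Total depth is the sum of the absolute counts.
        return self.cfc + self.gqa

    @property
    def ratio(self) -> Fraction:
        # Fraction normalizes automatically, so 10/6 is stored as 5/3.
        return Fraction(self.cfc, self.gqa)


lfm_layout = LayerLayout(cfc=10, gqa=6)     # the layout described in §4.5
reduced_layout = LayerLayout(cfc=5, gqa=3)  # what "5:3" would describe

print(lfm_layout.ratio == reduced_layout.ratio)  # True: same proportion
print(lfm_layout.depth, reduced_layout.depth)    # 16 8: different depth
```

The two layouts share a proportion but not a depth, which is why the note treats "10:6" as a pair of counts and not as a fraction to simplify.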
| LFM 2.5's 10:6 CfC:GQA ratio was found empirically. Note that 10:6 cannot be | ||
| reduced to 5:3: the numbers are absolute layer counts (10 CfC + 6 GQA = 16 layers | ||
| total), not a bare ratio. Reducing to 5:3 would describe a different 8-layer | ||
| model, halving the depth. The Geode factorization (§D-4.1) provides a principled | ||
| derivation that eliminates the guesswork. | ||
Wording is a bit mathematically incorrect here: 10:6 can be reduced to 5:3 as a ratio; the point is that it should not be reduced when interpreting the architecture, because the absolute layer counts determine depth. Consider rephrasing to “should not be reduced” / “must not be interpreted as reducible here” to avoid confusion.
Suggested change:
- LFM 2.5's 10:6 CfC:GQA ratio was found empirically. Note that 10:6 cannot be
- reduced to 5:3: the numbers are absolute layer counts (10 CfC + 6 GQA = 16 layers
- total), not a bare ratio. Reducing to 5:3 would describe a different 8-layer
- model, halving the depth. The Geode factorization (§D-4.1) provides a principled
- derivation that eliminates the guesswork.
+ LFM 2.5's CfC:GQA layer counts (10 CfC + 6 GQA = 16 layers total) were found
+ empirically. Written as a ratio, "10:6" can be reduced to "5:3" mathematically,
+ but for architectural interpretation it should not be reduced: we care about the
+ absolute layer counts, not just the proportion. Interpreting it as "5:3" would
+ instead describe a different 8-layer model, halving the depth. The Geode
+ factorization (§D-4.1) provides a principled derivation that eliminates the
+ guesswork.
| LFM 2.5's 10:6 CfC:GQA ratio was found empirically. The Geode factorization | ||
| (§D-4.1) provides a principled derivation that eliminates the guesswork. | ||
| LFM 2.5's 10:6 CfC:GQA ratio was found empirically. Note that 10:6 cannot be |
PR metadata currently has Closes # with no issue number. Repo policy in the PR template indicates every PR must reference an open issue; please update the PR description to include a valid issue reference (e.g., Closes #123).
| LFM 2.5's 10:6 CfC:GQA ratio was found empirically. Note that 10:6 cannot be | ||
| reduced to 5:3: the numbers are absolute layer counts (10 CfC + 6 GQA = 16 layers | ||
| total), not a bare ratio. Reducing to 5:3 would describe a different 8-layer | ||
| model, halving the depth. The Geode factorization (§D-4.1) provides a principled | ||
| derivation that eliminates the guesswork. |
README currently points readers to docs/parameter-golf.md for the Parameter Golf approach (README.md:7). Since this change is in PARAMETER_GOLF.md, consider also propagating the clarification to the referenced doc (or updating README) so the explanation is discoverable.
Suggested change:
- LFM 2.5's 10:6 CfC:GQA ratio was found empirically. Note that 10:6 cannot be
- reduced to 5:3: the numbers are absolute layer counts (10 CfC + 6 GQA = 16 layers
- total), not a bare ratio. Reducing to 5:3 would describe a different 8-layer
- model, halving the depth. The Geode factorization (§D-4.1) provides a principled
- derivation that eliminates the guesswork.
+ LFM 2.5's 10:6 CfC:GQA configuration was found empirically. Note that 10:6
+ cannot be reduced to 5:3: the numbers are **absolute layer counts** (10 CfC +
+ 6 GQA = 16 layers total), not a dimensionless ratio between two quantities.
+ Interpreting 10:6 as 5:3 would instead describe a different 8-layer model
+ (5 CfC + 3 GQA), i.e., half the depth. This clarification is the canonical
+ definition and should also be reflected in the `docs/parameter-golf.md`
+ overview that `README.md` links to. The Geode factorization (§D-4.1) provides
+ a principled derivation that eliminates the guesswork.
…not reducible ratio
Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
Agent-Logs-Url: https://github.com/devlux76/q2/sessions/709766c3-895e-4e85-bdbc-67aec60c1798
Description
Related Issue
Closes #
Checklist
- `bun run check`
- `bun run test`