
docs: clarify why LFM 2.5 uses 10:6 not 5:3 in §4.5 #85

Open
devlux76 wants to merge 1 commit into main from claude/review-pr-84-parameter-golf-kzDll



@devlux76
Owner

10:6 encodes absolute layer counts (10 CfC + 6 GQA = 16 layers total); reducing it to 5:3 would describe a shallower 8-layer architecture. Adds a short note to PARAMETER_GOLF.md §4.5.

https://claude.ai/code/session_01JpxhvpizFcE1iLL9aT5MUF
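The arithmetic behind this note can be made concrete with a short TypeScript sketch. The `LayerLayout` type and the helper functions are hypothetical illustrations, not code from this repo or from LFM 2.5. It shows that dividing 10:6 by its greatest common divisor keeps the CfC:GQA proportion unchanged while halving the total depth from 16 layers to 8.

```typescript
// Hypothetical representation of a hybrid layer stack (illustrative only,
// not taken from the repo or from any LFM 2.5 code).
interface LayerLayout {
  cfc: number; // number of CfC layers
  gqa: number; // number of GQA layers
}

// Greatest common divisor via Euclid's algorithm.
function gcd(a: number, b: number): number {
  return b === 0 ? a : gcd(b, a % b);
}

// Reduce the layer pair to lowest terms: the proportion survives, the depth does not.
function reduceRatio(layout: LayerLayout): LayerLayout {
  const g = gcd(layout.cfc, layout.gqa);
  return { cfc: layout.cfc / g, gqa: layout.gqa / g };
}

// Depth is the sum of the absolute layer counts.
function totalLayers(layout: LayerLayout): number {
  return layout.cfc + layout.gqa;
}

const lfm25: LayerLayout = { cfc: 10, gqa: 6 };
const reduced = reduceRatio(lfm25);

console.log(totalLayers(lfm25)); // 16
console.log(`${reduced.cfc}:${reduced.gqa}`); // 5:3
console.log(totalLayers(reduced)); // 8
// Same proportion, different model:
console.log(lfm25.cfc / lfm25.gqa === reduced.cfc / reduced.gqa); // true
```

The equality check on the last line is the reviewer's point below: as a ratio, 10:6 and 5:3 are the same, but only the absolute counts fix the architecture's depth.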

Description

Related Issue

Closes #

Checklist

  • This PR references at least one open issue (see above)
  • Tests have been added or updated where applicable
  • Linting and type-checking pass (bun run check)
  • All tests pass (bun run test)

Copilot AI review requested due to automatic review settings March 21, 2026 23:26
Contributor

Copilot AI left a comment


Pull request overview

Clarifies the interpretation of LFM 2.5’s “10:6” CfC:GQA layer layout in §4.5, explaining that the numbers represent absolute layer counts (depth), not just a reducible ratio.

Changes:

  • Adds an explicit note that “10:6” encodes 16 total layers (10 CfC + 6 GQA).
  • Explains why reducing to “5:3” would imply a different (shallower) 8-layer model.


Comment on lines +268 to 273
LFM 2.5's 10:6 CfC:GQA ratio was found empirically. Note that 10:6 cannot be
reduced to 5:3: the numbers are absolute layer counts (10 CfC + 6 GQA = 16 layers
total), not a bare ratio. Reducing to 5:3 would describe a different 8-layer
model, halving the depth. The Geode factorization (§D-4.1) provides a principled
derivation that eliminates the guesswork.


Copilot AI Mar 21, 2026


The wording here is mathematically incorrect: as a ratio, 10:6 can be reduced to 5:3. The point is that it should not be reduced when interpreting the architecture, because the absolute layer counts determine depth. Consider rephrasing to “should not be reduced” or “must not be interpreted as reducible here” to avoid confusion.

Suggested change
LFM 2.5's 10:6 CfC:GQA ratio was found empirically. Note that 10:6 cannot be
reduced to 5:3: the numbers are absolute layer counts (10 CfC + 6 GQA = 16 layers
total), not a bare ratio. Reducing to 5:3 would describe a different 8-layer
model, halving the depth. The Geode factorization (§D-4.1) provides a principled
derivation that eliminates the guesswork.
LFM 2.5's CfC:GQA layer counts (10 CfC + 6 GQA = 16 layers total) were found
empirically. Written as a ratio, "10:6" can be reduced to "5:3" mathematically,
but for architectural interpretation it should not be reduced: we care about the
absolute layer counts, not just the proportion. Interpreting it as "5:3" would
instead describe a different 8-layer model, halving the depth. The Geode
factorization (§D-4.1) provides a principled derivation that eliminates the
guesswork.


LFM 2.5's 10:6 CfC:GQA ratio was found empirically. The Geode factorization
(§D-4.1) provides a principled derivation that eliminates the guesswork.
LFM 2.5's 10:6 CfC:GQA ratio was found empirically. Note that 10:6 cannot be

Copilot AI Mar 21, 2026


PR metadata currently has Closes # with no issue number. Repo policy in the PR template indicates every PR must reference an open issue; please update the PR description to include a valid issue reference (e.g., Closes #123).

Comment on lines +268 to +272
LFM 2.5's 10:6 CfC:GQA ratio was found empirically. Note that 10:6 cannot be
reduced to 5:3: the numbers are absolute layer counts (10 CfC + 6 GQA = 16 layers
total), not a bare ratio. Reducing to 5:3 would describe a different 8-layer
model, halving the depth. The Geode factorization (§D-4.1) provides a principled
derivation that eliminates the guesswork.

Copilot AI Mar 21, 2026


README currently points readers to docs/parameter-golf.md for the Parameter Golf approach (README.md:7). Since this change is in PARAMETER_GOLF.md, consider also propagating the clarification to the referenced doc (or updating README) so the explanation is discoverable.

Suggested change
LFM 2.5's 10:6 CfC:GQA ratio was found empirically. Note that 10:6 cannot be
reduced to 5:3: the numbers are absolute layer counts (10 CfC + 6 GQA = 16 layers
total), not a bare ratio. Reducing to 5:3 would describe a different 8-layer
model, halving the depth. The Geode factorization (§D-4.1) provides a principled
derivation that eliminates the guesswork.
LFM 2.5's 10:6 CfC:GQA configuration was found empirically. Note that 10:6
cannot be reduced to 5:3: the numbers are **absolute layer counts** (10 CfC +
6 GQA = 16 layers total), not a dimensionless ratio between two quantities.
Interpreting 10:6 as 5:3 would instead describe a different 8-layer model
(5 CfC + 3 GQA), i.e., half the depth. This clarification is the canonical
definition and should also be reflected in the `docs/parameter-golf.md`
overview that `README.md` links to. The Geode factorization (§D-4.1) provides
a principled derivation that eliminates the guesswork.

Copilot AI added a commit that referenced this pull request Mar 21, 2026
…not reducible ratio

Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
Agent-Logs-Url: https://github.com/devlux76/q2/sessions/709766c3-895e-4e85-bdbc-67aec60c1798
Base automatically changed from copilot/consolidate-parameter-golf-docs to main March 21, 2026 23:37

Labels

None yet

Projects

None yet


3 participants