Make ModelDims.from_hf_config robust to explicit head_dim #1731

Open
hamishivi wants to merge 2 commits into main from hamishivi/modeldims-head-dim-robustness
@hamishivi

Collaborator

Summary

  • Honor an explicit head_dim from HF configs (e.g. composite/VLM models) instead of always deriving it from hidden_size // num_attention_heads, which fails when hidden_size is not divisible by num_attention_heads.
  • Relax the __post_init__ assertion to require a positive head_dim.
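As a minimal sketch of the behavior described above (not the actual code in this PR), the lookup could prefer an explicit head_dim and fall back to the derived value only when none is set. The helper name resolve_head_dim and the SimpleNamespace stand-ins for HF configs are hypothetical, and the numbers are illustrative only.

```python
from types import SimpleNamespace


def resolve_head_dim(config) -> int:
    """Hypothetical helper: prefer an explicit head_dim, else derive it."""
    explicit = getattr(config, "head_dim", None)
    if explicit is not None:
        # Honor the value the config declares, even when
        # hidden_size is not divisible by num_attention_heads.
        return int(explicit)
    # Fallback: the derivation used when no explicit value is present.
    return config.hidden_size // config.num_attention_heads


# Stand-in configs (assumed shapes, not real model configs).
derived_cfg = SimpleNamespace(hidden_size=4096, num_attention_heads=32)
explicit_cfg = SimpleNamespace(hidden_size=1000, num_attention_heads=12, head_dim=64)

derived = resolve_head_dim(derived_cfg)
explicit = resolve_head_dim(explicit_cfg)
```

In the second stand-in config, hidden_size is not divisible by num_attention_heads, so the explicit value is the only one that makes sense to use.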

Test plan

  • Existing ModelDims tests.

GPU_TESTS=bypass

Made with Cursor

hamishivi and others added 2 commits June 23, 2026 13:37
Honor an explicit head_dim from HF configs (e.g. composite/VLM models)
instead of always deriving it from hidden_size // num_attention_heads,
and relax the __post_init__ assertion to require a positive head_dim.

Co-authored-by: Cursor <cursoragent@cursor.com>

gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates the model dimension configuration in open_instruct/utils.py to validate and compute head_dim from the Hugging Face configuration. The reviewer recommends replacing the assert statements with explicit ValueError exceptions to ensure robust runtime validation, as assertions can be optimized away when Python is run with optimization flags.
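One way the reviewer's suggestion might look, as a sketch only: the dataclass below is a hypothetical stand-in (DimsSketch), not the real ModelDims, and it carries just the fields needed to show the check. An explicit raise keeps the validation active even when Python runs with -O, which strips assert statements.

```python
from dataclasses import dataclass


@dataclass
class DimsSketch:
    """Hypothetical stand-in for illustration; not the real ModelDims."""

    hidden_size: int
    num_attention_heads: int
    head_dim: int

    def __post_init__(self) -> None:
        # Unlike `assert`, this check is not removed under `python -O`.
        if self.head_dim <= 0:
            raise ValueError(f"head_dim must be positive, got {self.head_dim}")


# A positive head_dim is accepted even though 1000 is not divisible by 12.
ok = DimsSketch(hidden_size=1000, num_attention_heads=12, head_dim=64)

# A non-positive head_dim is rejected with a ValueError.
try:
    DimsSketch(hidden_size=1000, num_attention_heads=12, head_dim=0)
    rejected = False
except ValueError:
    rejected = True
```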

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread open_instruct/utils.py
Comment thread open_instruct/utils.py
hamishivi added this pull request to the merge queue Jun 27, 2026
github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch Jun 27, 2026
2 participants