
Should "smollm2" alias point at HuggingFaceTB/SmolLM-135M (v1) or SmolLM2-135M? #18828

@cgkim-nota

Description

I was reading the smollm2 wiring in examples/models/llama/export_llama_lib.py and ran into something I can't tell is intentional or a wiring mistake, so I'm asking before building anything that depends on either interpretation.

What I observed in current main

# examples/models/llama/export_llama_lib.py
HUGGING_FACE_REPO_IDS = {
    ...
    "smollm2": "HuggingFaceTB/SmolLM-135M",
    ...
}

HuggingFaceTB/SmolLM-135M is the SmolLM v1 repo. The SmolLM2 model from the same org lives at HuggingFaceTB/SmolLM2-135M; they are listed as separate repos on HuggingFace.
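If (and only if) the intent was v2, the change would be a one-line edit to the mapping above. This is a sketch of the hypothetical fix, not a claim about what the maintainers intended:

```python
# Hypothetical one-line change, assuming the alias was meant to land on v2.
# Key name matches the snippet from examples/models/llama/export_llama_lib.py.
HUGGING_FACE_REPO_IDS = {
    "smollm2": "HuggingFaceTB/SmolLM2-135M",  # currently "HuggingFaceTB/SmolLM-135M" (v1)
}
```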

examples/models/smollm2/135M_config.json has "rope_theta": 10000.0, which matches v1's HF config.json. v2's HF config.json has rope_theta: 100000.
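The rope_theta difference is not cosmetic: it is the base of the RoPE inverse frequencies, so the two values produce different position encodings at every position. A minimal numpy sketch (head_dim here is a hypothetical value for illustration, not taken from either config):

```python
import numpy as np

def rope_inv_freq(theta: float, head_dim: int = 64) -> np.ndarray:
    # Standard RoPE inverse frequencies: theta^(-2i/d) for i in [0, d/2)
    return theta ** (-np.arange(0, head_dim, 2, dtype=np.float64) / head_dim)

v1_freqs = rope_inv_freq(10_000.0)   # value in examples/models/smollm2/135M_config.json (matches v1)
v2_freqs = rope_inv_freq(100_000.0)  # value in SmolLM2-135M's HF config.json
```

Only the first frequency (theta^0 = 1) agrees; every other frequency differs, so a model trained with one base will be run with the wrong rotations under the other.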

To be sure this wasn't just a metadata-naming thing, I downloaded both checkpoints and compared model.safetensors:

  • Different SHA-256, different on-disk sizes (v1 ships fp32 538 MB, v2 ships bf16 269 MB).
  • Identical key set — both are LlamaForCausalLM, 272 tensors, same shapes.
  • 0 of 272 tensors are bit-identical. Per-tensor max_abs_diff between the two ranges from 0.67 to 10.56, well above any dtype-precision noise floor.
  • First row of model.embed_tokens.weight:
    • v1: [-0.379, -0.219, 0.028, -0.262, -0.231, -0.164, 0.082, -0.246]
    • v2: [-0.118, 0.028, 0.048, -0.008, -0.056, -0.052, 0.016, -0.134]
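The comparison above can be sketched as follows. With the real files you would load each model.safetensors via safetensors.numpy.load_file(path); here the helper takes plain dicts of arrays so the sketch stays self-contained:

```python
import numpy as np

def per_tensor_max_abs_diff(a: dict, b: dict) -> dict:
    # Per-tensor max abs difference between two checkpoints with identical key sets
    # (both repos expose the same 272 LlamaForCausalLM tensors with the same shapes).
    assert a.keys() == b.keys(), "key sets must match"
    return {
        k: float(np.max(np.abs(a[k].astype(np.float64) - b[k].astype(np.float64))))
        for k in a
    }
```

Any per-tensor difference well above the bf16/fp32 rounding floor (as seen here, 0.67 to 10.56) means the weights are genuinely different, not a dtype re-export.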

So the two repos contain genuinely different weights, not the same model under two names.

What confused me when I looked at the original PR

In the seeding PR #9354 (Add SmolLM (smollm2)), the review trail suggests the intent was SmolLM2 (v2):

  • The PR was originally submitted with the directory named examples/models/smollm/ (v1 family name). The reviewer asked the author to rename it to smollm2:

    "rename this and directory to smolllm2" — Reviewer
    "Ah - it should be smollm2*" — Reviewer

  • While reviewing the params JSON, the reviewer cross-referenced the SmolLM2 HuggingFace config to validate hidden_dim:

    "Should be 1536 - https://huggingface.co/HuggingFaceTB/SmolLM2-135M/blob/main/config.json#L12" — Reviewer

So from the review history it looks like the reviewer believed they were merging SmolLM2 v2 support. But the resulting params JSON has v1's rope_theta, and the HUGGING_FACE_REPO_IDS entry points at v1's repo. The fields the reviewer actually checked (hidden_dim, use_hf_rope, tied embeddings, model_type) all happen to be identical between v1 and v2, so a v1-vs-v2 mismatch on the unchecked fields wouldn't have shown up in review.
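The "identical on checked fields, different on unchecked ones" situation can be made concrete with a config diff. The field values below are illustrative stand-ins (intermediate size from the review thread, rope_theta from each repo's config.json), not a full dump of either config:

```python
def diff_configs(a: dict, b: dict) -> dict:
    # Fields present in both configs whose values disagree
    return {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}

# Illustrative subsets only: fields the reviewer checked agree,
# the field they did not check (rope_theta) does not.
v1_cfg = {"model_type": "llama", "intermediate_size": 1536, "rope_theta": 10000.0}
v2_cfg = {"model_type": "llama", "intermediate_size": 1536, "rope_theta": 100000.0}
```

A reviewer spot-checking only the agreeing fields would see no discrepancy, which is consistent with the mix-up going unnoticed.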

My question

Is the smollm2 alias here intentionally meant to point at SmolLM v1 (in which case the naming is just historical and I should treat it that way), or is this an unnoticed wiring mistake where the alias was supposed to land on SmolLM2 but ended up on v1?
