
Should "smollm2" alias point at HuggingFaceTB/SmolLM-135M (v1) or SmolLM2-135M? #18828

@cgkim-nota

Description

I was reading the smollm2 wiring in examples/models/llama/export_llama_lib.py and ran into something I can't tell is intentional or a wiring mistake, so I'm asking before building anything that depends on either interpretation.

What I observed in current main

# examples/models/llama/export_llama_lib.py
HUGGING_FACE_REPO_IDS = {
    ...
    "smollm2": "HuggingFaceTB/SmolLM-135M",
    ...
}

HuggingFaceTB/SmolLM-135M is the SmolLM v1 repo. The SmolLM2 model from the same org lives at HuggingFaceTB/SmolLM2-135M; they are listed as separate repos on HuggingFace.
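If (and only if) the intent was v2, the change would be a one-line edit to the mapping above. This is a sketch of the hypothetical fix, not a claim about what the maintainers intended:

```python
# Hypothetical one-line change, assuming the alias was meant to land on v2.
# Key name matches the snippet from examples/models/llama/export_llama_lib.py.
HUGGING_FACE_REPO_IDS = {
    "smollm2": "HuggingFaceTB/SmolLM2-135M",  # currently "HuggingFaceTB/SmolLM-135M" (v1)
}
```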

examples/models/smollm2/135M_config.json has "rope_theta": 10000.0, which matches v1's HF config.json. v2's HF config.json has rope_theta: 100000.
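The rope_theta difference is not cosmetic: it is the base of the RoPE inverse frequencies, so the two values produce different position encodings at every position. A minimal numpy sketch (head_dim here is a hypothetical value for illustration, not taken from either config):

```python
import numpy as np

def rope_inv_freq(theta: float, head_dim: int = 64) -> np.ndarray:
    # Standard RoPE inverse frequencies: theta^(-2i/d) for i in [0, d/2)
    return theta ** (-np.arange(0, head_dim, 2, dtype=np.float64) / head_dim)

v1_freqs = rope_inv_freq(10_000.0)   # value in examples/models/smollm2/135M_config.json (matches v1)
v2_freqs = rope_inv_freq(100_000.0)  # value in SmolLM2-135M's HF config.json
```

Only the first frequency (theta^0 = 1) agrees; every other frequency differs, so a model trained with one base will be run with the wrong rotations under the other.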

To be sure this wasn't just a metadata-naming thing, I downloaded both checkpoints and compared model.safetensors:

  • Different SHA-256, different on-disk sizes (v1 ships fp32 538 MB, v2 ships bf16 269 MB).
  • Identical key set — both are LlamaForCausalLM, 272 tensors, same shapes.
  • 0 of 272 tensors are bit-identical. Per-tensor max_abs_diff between the two ranges from 0.67 to 10.56, well above any dtype-precision noise floor.
  • First row of model.embed_tokens.weight:
    • v1: [-0.379, -0.219, 0.028, -0.262, -0.231, -0.164, 0.082, -0.246]
    • v2: [-0.118, 0.028, 0.048, -0.008, -0.056, -0.052, 0.016, -0.134]
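The comparison above can be sketched as follows. With the real files you would load each model.safetensors via safetensors.numpy.load_file(path); here the helper takes plain dicts of arrays so the sketch stays self-contained:

```python
import numpy as np

def per_tensor_max_abs_diff(a: dict, b: dict) -> dict:
    # Per-tensor max abs difference between two checkpoints with identical key sets
    # (both repos expose the same 272 LlamaForCausalLM tensors with the same shapes).
    assert a.keys() == b.keys(), "key sets must match"
    return {
        k: float(np.max(np.abs(a[k].astype(np.float64) - b[k].astype(np.float64))))
        for k in a
    }
```

Any per-tensor difference well above the bf16/fp32 rounding floor (as seen here, 0.67 to 10.56) means the weights are genuinely different, not a dtype re-export.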

So the two repos contain genuinely different weights, not the same model under two names.

What confused me when I looked at the original PR

In the seeding PR #9354 (Add SmolLM (smollm2)), the review trail suggests the intent was SmolLM2 (v2):

  • The PR was originally submitted with the directory named examples/models/smollm/ (v1 family name). The reviewer asked the author to rename it to smollm2:

    "rename this and directory to smolllm2" — Reviewer
    "Ah - it should be smollm2*" — Reviewer

  • While reviewing the params JSON, the reviewer cross-referenced the SmolLM2 HuggingFace config to validate hidden_dim:

    "Should be 1536 - https://huggingface.co/HuggingFaceTB/SmolLM2-135M/blob/main/config.json#L12" — Reviewer

So from the review history it looks like the reviewer believed they were merging SmolLM2 v2 support. But the resulting params JSON has v1's rope_theta, and the HUGGING_FACE_REPO_IDS entry points at v1's repo. The fields the reviewer actually checked (hidden_dim, use_hf_rope, tied embeddings, model_type) all happen to be identical between v1 and v2, so a v1-vs-v2 mismatch on the unchecked fields wouldn't have shown up in review.
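The "identical on checked fields, different on unchecked ones" situation can be made concrete with a config diff. The field values below are illustrative stand-ins (intermediate size from the review thread, rope_theta from each repo's config.json), not a full dump of either config:

```python
def diff_configs(a: dict, b: dict) -> dict:
    # Fields present in both configs whose values disagree
    return {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}

# Illustrative subsets only: fields the reviewer checked agree,
# the field they did not check (rope_theta) does not.
v1_cfg = {"model_type": "llama", "intermediate_size": 1536, "rope_theta": 10000.0}
v2_cfg = {"model_type": "llama", "intermediate_size": 1536, "rope_theta": 100000.0}
```

A reviewer spot-checking only the agreeing fields would see no discrepancy, which is consistent with the mix-up going unnoticed.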

My question

Is the smollm2 alias here intentionally meant to point at SmolLM v1 (in which case the naming is just historical and I should treat it that way), or is this an unnoticed wiring mistake where the alias was supposed to land on SmolLM2 but ended up on v1?
