Hi maintainers,
I’m hitting a vocab-size mismatch error when loading the provided checkpoint and making the first inference call.
Steps to reproduce:
- Create env and install deps (per README).
- Download checkpoint to ckpt/checkpoint-5554.
- Ensure SigLIP is available (e.g., ./siglip-so400m-patch14-384).
- Start server:
conda activate reconvla
python reconvla/serve/flask_server.py \
--model-path ckpt/checkpoint-5554 \
--action_stat reconvla/calvin/dataset/calvin_debug_dataset/validation/statistics.yaml \
--port 9097
- Trigger a /predict call (e.g., via the evaluation script). The server errors on the first request.
Actual result:
ValueError: Trying to set a tensor of shape torch.Size([152064, 3584]) in "weight"
(which has shape torch.Size([151646, 3584])), this looks incorrect.
Additional info:
- config.json:
- AutoTokenizer.from_pretrained(ckpt/checkpoint-5554):
- len(tokenizer)=151646, vocab_size=151643
- vocab.json size: 151643
- added_tokens.json: 3 tokens
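For what it's worth, the numbers above are internally consistent on the tokenizer side; the gap is entirely between the tokenizer files and the checkpoint. A quick sanity check in plain Python (numbers copied from the output above):

```python
# Numbers reported above
vocab_json_size = 151643   # entries in vocab.json (== tokenizer.vocab_size)
added_tokens = 3           # entries in added_tokens.json
tokenizer_len = 151646     # len(tokenizer)
checkpoint_rows = 152064   # embedding rows stored in the checkpoint

# len(tokenizer) is the base vocab plus the added special tokens
assert vocab_json_size + added_tokens == tokenizer_len

# The checkpoint carries 418 more embedding rows than the tokenizer can address
print(checkpoint_rows - tokenizer_len)  # 418
```

So roughly 418 token ids used during training appear to be missing from the shipped tokenizer files.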
This suggests the checkpoint expects a larger vocab than the tokenizer files provide. If resize_token_embeddings(len(tokenizer)) is called, the embedding is shrunk to 151646 rows, which then conflicts with the 152064-row weight stored in the checkpoint.
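A tiny torch-only sketch of that failure mode (toy sizes and names of my own choosing; 12 stands in for 152064, 10 for 151646):

```python
import torch
import torch.nn as nn

CKPT_VOCAB, TOKENIZER_LEN, DIM = 12, 10, 4  # stand-ins for 152064 / 151646 / 3584

# The checkpoint was saved with the larger vocab
saved_state = {"weight": torch.zeros(CKPT_VOCAB, DIM)}

# Effect of resize_token_embeddings(len(tokenizer)): the embedding now has
# fewer rows than the stored weight
emb = nn.Embedding(TOKENIZER_LEN, DIM)

# ...so loading the stored weight fails with a shape/size mismatch
try:
    emb.load_state_dict(saved_state)
except RuntimeError as e:
    print("shape mismatch:", e)
```

This reproduces the same class of error as the traceback above, just via load_state_dict instead of the accelerate/safetensors loading path.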
Possible fix / question:
- Should the tokenizer files include the extra tokens used during training?
- Or should the loader avoid shrinking embeddings when len(tokenizer) < config.vocab_size and instead pad tokenizer to match the model vocab?
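In case it helps, here is a sketch of the second option from the user side (toy numbers and hypothetical placeholder names; I have not verified this against the repo's loader): pad the tokenizer up to the model's vocab size before loading, so the resize becomes a no-op.

```python
# Hypothetical workaround sketch: pad the tokenizer up to the model's vocab
# size with placeholder tokens so resize_token_embeddings does not shrink it.
model_vocab = 152064    # embedding rows expected by the checkpoint
tokenizer_len = 151646  # len(tokenizer) from the shipped files

pad_count = model_vocab - tokenizer_len           # 418 placeholders needed
placeholders = [f"<|pad_extra_{i}|>" for i in range(pad_count)]  # names are made up

# With a real tokenizer this would be something like:
#   tokenizer.add_tokens(placeholders)
#   assert len(tokenizer) == model.config.vocab_size
print(len(placeholders))  # 418
```

Of course shipping tokenizer files that already contain the training-time tokens (the first option) would be cleaner, since placeholder ids would not round-trip correctly through decode.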
Environment:
- OS: Ubuntu (container)
- Python: 3.10 (reconvla env)
- GPU: RTX 4090, Driver 570.195.03, CUDA 12.8