Hi maintainers,
I’m hitting a vocab-size mismatch error when loading the provided checkpoint and making the first inference call.
Steps to reproduce:
- Create env and install deps (per README).
- Download checkpoint to ckpt/checkpoint-5554.
- Ensure SigLIP is available (e.g., ./siglip-so400m-patch14-384).
- Start server:
conda activate reconvla
python reconvla/serve/flask_server.py \
--model-path ckpt/checkpoint-5554 \
--action_stat reconvla/calvin/dataset/calvin_debug_dataset/validation/statistics.yaml \
--port 9097
- Trigger a /predict call (e.g., via the evaluation script). The server errors on the first request.
Actual result:
ValueError: Trying to set a tensor of shape torch.Size([152064, 3584]) in "weight"
(which has shape torch.Size([151646, 3584])), this looks incorrect.
Additional info:
- config.json:
- AutoTokenizer.from_pretrained(ckpt/checkpoint-5554):
- len(tokenizer)=151646, vocab_size=151643
- vocab.json size: 151643
- added_tokens.json: 3 tokens
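For what it's worth, the numbers above are internally consistent on the tokenizer side; the gap is entirely between the tokenizer files and the checkpoint. A quick sanity check in plain Python (numbers copied from the output above):

```python
# Numbers reported above
vocab_json_size = 151643   # entries in vocab.json (== tokenizer.vocab_size)
added_tokens = 3           # entries in added_tokens.json
tokenizer_len = 151646     # len(tokenizer)
checkpoint_rows = 152064   # embedding rows stored in the checkpoint

# len(tokenizer) is the base vocab plus the added special tokens
assert vocab_json_size + added_tokens == tokenizer_len

# The checkpoint carries 418 more embedding rows than the tokenizer can address
print(checkpoint_rows - tokenizer_len)  # 418
```

So roughly 418 token ids used during training appear to be missing from the shipped tokenizer files.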
This suggests the checkpoint expects a larger vocab than the tokenizer files provide. If resize_token_embeddings(len(tokenizer)) is called, the embedding is shrunk to 151646 rows, which then conflicts with the 152064-row weight stored in the checkpoint.
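A tiny torch-only sketch of that failure mode (toy sizes and names of my own choosing; 12 stands in for 152064, 10 for 151646):

```python
import torch
import torch.nn as nn

CKPT_VOCAB, TOKENIZER_LEN, DIM = 12, 10, 4  # stand-ins for 152064 / 151646 / 3584

# The checkpoint was saved with the larger vocab
saved_state = {"weight": torch.zeros(CKPT_VOCAB, DIM)}

# Effect of resize_token_embeddings(len(tokenizer)): the embedding now has
# fewer rows than the stored weight
emb = nn.Embedding(TOKENIZER_LEN, DIM)

# ...so loading the stored weight fails with a shape/size mismatch
try:
    emb.load_state_dict(saved_state)
except RuntimeError as e:
    print("shape mismatch:", e)
```

This reproduces the same class of error as the traceback above, just via load_state_dict instead of the accelerate/safetensors loading path.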
Possible fix / question:
- Should the tokenizer files include the extra tokens used during training?
- Or should the loader avoid shrinking embeddings when len(tokenizer) < config.vocab_size and instead pad tokenizer to match the model vocab?
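In case it helps, here is a sketch of the second option from the user side (toy numbers and hypothetical placeholder names; I have not verified this against the repo's loader): pad the tokenizer up to the model's vocab size before loading, so the resize becomes a no-op.

```python
# Hypothetical workaround sketch: pad the tokenizer up to the model's vocab
# size with placeholder tokens so resize_token_embeddings does not shrink it.
model_vocab = 152064    # embedding rows expected by the checkpoint
tokenizer_len = 151646  # len(tokenizer) from the shipped files

pad_count = model_vocab - tokenizer_len           # 418 placeholders needed
placeholders = [f"<|pad_extra_{i}|>" for i in range(pad_count)]  # names are made up

# With a real tokenizer this would be something like:
#   tokenizer.add_tokens(placeholders)
#   assert len(tokenizer) == model.config.vocab_size
print(len(placeholders))  # 418
```

Of course shipping tokenizer files that already contain the training-time tokens (the first option) would be cleaner, since placeholder ids would not round-trip correctly through decode.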
Environment:
- OS: Ubuntu (container)
- Python: 3.10 (reconvla env)
- GPU: RTX 4090, Driver 570.195.03, CUDA 12.8