add option to skip adding pad tok #47

Merged
garrett361 merged 2 commits into main from no-pad-tok
Nov 7, 2025
@garrett361
Owner

The OI tokenization logic sometimes forces the addition of a pad token to the tokenizer, which resizes the vocab. This PR lets users avoid that by specifying:

--get_tokenizer_fn get_tokenizer_tulu_no_pad_tok_addition

The get_tokenizer_tulu_no_pad_tok_addition function is identical to get_tokenizer_tulu_v2_2 (the default option), except that the forced pad token addition is removed.
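
For reviewers who want to see the effect in isolation, here is a minimal sketch of why the forced addition changes the vocab size and why skipping it does not. Everything in it is hypothetical: ToyTokenizer, the two toy_* functions, and the "<pad>" placeholder string are stand-ins made up for illustration and are not the repo's get_tokenizer_tulu_v2_2 or get_tokenizer_tulu_no_pad_tok_addition implementations.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer (not the project's class)."""

    vocab: Dict[str, int] = field(
        default_factory=lambda: {"hello": 0, "world": 1, "</s>": 2}
    )
    pad_token: Optional[str] = None

    def add_pad_token(self, token: str) -> int:
        """Register `token` as the pad token; return the number of new vocab entries."""
        added = 0
        if token not in self.vocab:
            # A brand-new token gets the next free id, so the vocab grows by one.
            self.vocab[token] = len(self.vocab)
            added = 1
        self.pad_token = token
        return added


def toy_get_tokenizer_with_pad(tok: ToyTokenizer, pad: str = "<pad>") -> ToyTokenizer:
    """Mimics the default behaviour described above: always add a pad token."""
    # "<pad>" is an assumed placeholder string, chosen only for this sketch.
    tok.add_pad_token(pad)
    return tok


def toy_get_tokenizer_no_pad_addition(tok: ToyTokenizer) -> ToyTokenizer:
    """Mimics the new option: same setup, minus the forced pad token addition."""
    return tok


with_pad = toy_get_tokenizer_with_pad(ToyTokenizer())
no_pad = toy_get_tokenizer_no_pad_addition(ToyTokenizer())

print(len(with_pad.vocab), with_pad.pad_token)
print(len(no_pad.vocab), no_pad.pad_token)
```

In the toy version the first tokenizer ends up with one extra vocab entry and the second keeps its original size and has no pad token set. With a real model, the first case is the one that would require resizing the embedding matrix to match.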

@garrett361 garrett361 merged commit bebd9ec into main Nov 7, 2025
2 checks passed