Models that are constructed from pretrained models should bring their tokenizer + vocabulary along for the ride, since those are a necessary part of the model (you won't get the same result with a different tokenizer, for example).
If users want to do something weird (like using a subset of the vocabulary), they can construct the model more manually; if they use make_and_load_bert, they're specifying a BERT model.
Even within {torchtransformers} (before moving to the more constrained models in {tidybert}), we can then include tools that work with things more automatically (eg, the input to the model can be raw text).
Models that are constructed from pretrained models should bring their tokenizer + vocabulary along for the ride, since those are a necessary part of the model (you won't get the same result with a different tokenizer, for example).
If users want to do something weird (like using a subset of the vocabulary), they can construct the model more manually; if they use
make_and_load_bert, they're specifying a BERT model.Even within {torchtransformers} (before moving to the more constrained models in {tidybert}), we can then include tools that work with things more automatically (eg, the input to the model can be raw text).