More self-contained info in BERT models #30

@jonthegeek

Description

Models that are constructed from pretrained models should bring their tokenizer + vocabulary along for the ride, since those are a necessary part of the model (you won't get the same result with a different tokenizer, for example).

If users want to do something weird (like using a subset of the vocabulary), they can construct the model more manually; if they use make_and_load_bert, they're specifying a BERT model.

Even within {torchtransformers} (before moving to the more constrained models in {tidybert}), we can then include tools that handle more of the pipeline automatically (e.g., the model can accept raw text as input).
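To make the idea concrete, here is a rough, language-neutral sketch in Python of a model object that carries its tokenizer and vocabulary with it. This is not the {torchtransformers} API: `BundledModel`, `make_toy_model`, the three-word vocabulary, and the sum-of-ids "forward pass" are all hypothetical stand-ins, there only to show raw text going in without the caller supplying a tokenizer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class BundledModel:
    """Hypothetical stand-in for a pretrained model that carries its own tokenizer and vocabulary."""

    vocab: Dict[str, int]                   # token -> id, fixed at construction
    tokenize: Callable[[str], List[str]]    # the tokenizer the weights were trained with
    forward: Callable[[List[int]], float]   # toy replacement for the real forward pass
    unk_token: str = "[UNK]"

    def encode(self, text: str) -> List[int]:
        # Map raw text to ids with the bundled tokenizer and vocabulary;
        # tokens missing from the vocabulary fall back to the unknown-token id.
        unk_id = self.vocab[self.unk_token]
        return [self.vocab.get(tok, unk_id) for tok in self.tokenize(text)]

    def __call__(self, text: str) -> float:
        # Raw text in, model output out: the caller never picks a tokenizer.
        return self.forward(self.encode(text))


def make_toy_model() -> BundledModel:
    # Assumption: a tiny vocabulary and a lowercase whitespace tokenizer, for illustration only.
    vocab = {"[UNK]": 0, "hello": 1, "world": 2}
    return BundledModel(
        vocab=vocab,
        tokenize=lambda text: text.lower().split(),
        forward=lambda ids: float(sum(ids)),
    )


model = make_toy_model()
print(model.encode("Hello world again"))  # [1, 2, 0]
print(model("Hello world"))               # 3.0
```

Because the tokenizer and vocabulary are fields of the model itself, swapping either one means constructing a different object by hand, which matches the "do it manually if you want something weird" escape hatch described above.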
