
Tokenization for the large model #49

@ccharest93

  1. The config for the large model specifies a vocab size of 51200. Is there a separate tokenizer file for it? Oddly, the vocab size drops back to 32 for xlarge, which makes me think 51200 may be a typo.
  2. The tokenizer file specifies a vocab_size of 30, while the configs for base and small specify 32. Is this rounding up to a power of two for efficiency?
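If question 2 is about padding, the model would allocate more embedding rows than the tokenizer ever emits, and the extra rows would simply go unused. The sketch below shows two common padding rules: round up to the next power of two, or round up to a multiple of some constant. Whether this repository uses either rule is an assumption the configs do not confirm. The function names are hypothetical.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n (hypothetical helper, for illustration only)."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


def pad_to_multiple(n: int, multiple: int) -> int:
    """Round n up to the nearest multiple of `multiple` (hypothetical helper)."""
    if n < 0 or multiple < 1:
        raise ValueError("n must be >= 0 and multiple must be >= 1")
    return ((n + multiple - 1) // multiple) * multiple


# vocab_size reported by the tokenizer file in the question above
tokenizer_vocab = 30

print(next_power_of_two(tokenizer_vocab))   # 32
print(pad_to_multiple(tokenizer_vocab, 8))  # 32
```

Both rules map 30 to 32, so the base and small configs alone cannot tell you which rule (if any) was used.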
