The `claude` encoding uses Anthropic's official tokenizer vocabulary with NFKC normalization. It is accurate for pre-Claude 3 models and a rough approximation for Claude 3+.
All `claude-*` model names are supported (e.g. `claude-3-opus`, `claude-3.5-sonnet`, `claude-3.7-sonnet`, `claude-4-sonnet`).
+
### Model Prefix Matching
Apart from specifying direct model names, SharpToken also provides functionality to map model names based on specific prefixes. This allows users to retrieve an encoding based on a model's prefix.
@@ -98,6 +111,7 @@ Here are the current supported prefixes and their corresponding encodings:
| Model Prefix | Encoding |
| ---------------- | ------------- |
+|`claude-`|`claude`|
|`gpt-5`|`o200k_base`|
|`gpt-4o`|`o200k_base`|
|`gpt-4-`|`cl100k_base`|
@@ -106,7 +120,8 @@ Here are the current supported prefixes and their corresponding encodings:
Examples of model names that fall under these prefixes include:
-- For the prefix `gpt-5`: `gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `gpt-5-pro`, `gpt-5-thinking`, `gpt-5-2024-08-07`, `gpt-5-chat-latest`, etc.
+- For the prefix `claude-`: `claude-3-opus-20240229`, `claude-3.5-sonnet-20241022`, etc.
+- For the prefix `gpt-5`: `gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `gpt-5-pro`, `gpt-5-thinking`, `gpt-5-2024-08-07`, etc.
- For the prefix `gpt-4o`: `gpt-4o`, `gpt-4o-2024-05-13`, etc.
- For the prefix `gpt-4-`: `gpt-4-0314`, `gpt-4-32k`, etc.
- For the prefix `gpt-3.5-turbo-`: `gpt-3.5-turbo-0301`, `gpt-3.5-turbo-0401`, etc.
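
The prefix lookup described above can be illustrated with a minimal sketch, assuming a plain list of prefixes checked with `StartsWith`. The class and method names here are hypothetical and this is not SharpToken's implementation; only the prefix-to-encoding pairs are taken from the table.

```csharp
#nullable enable
using System;

// Hypothetical illustration only: not SharpToken's source code.
public static class PrefixLookupSketch
{
    // A subset of the prefix-to-encoding pairs listed in the table.
    private static readonly (string Prefix, string Encoding)[] Prefixes =
    {
        ("claude-", "claude"),
        ("gpt-5", "o200k_base"),
        ("gpt-4o", "o200k_base"),
        ("gpt-4-", "cl100k_base"),
    };

    // Returns the encoding name of the first prefix that the model
    // name starts with, or null when no prefix matches.
    public static string? FindEncodingName(string modelName)
    {
        foreach (var (prefix, encoding) in Prefixes)
        {
            if (modelName.StartsWith(prefix, StringComparison.Ordinal))
            {
                return encoding;
            }
        }
        return null;
    }

    public static void Main()
    {
        Console.WriteLine(FindEncodingName("claude-3-opus-20240229")); // claude
        Console.WriteLine(FindEncodingName("gpt-4o-2024-05-13"));      // o200k_base
        Console.WriteLine(FindEncodingName("gpt-4-0314"));             // cl100k_base
        Console.WriteLine(FindEncodingName("unknown-model") ?? "(no match)");
    }
}
```

An ordinal comparison is used in this sketch so that matching does not depend on the current culture; whether the library itself compares this way is not stated in the README.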
@@ -115,10 +130,11 @@ Examples of model names that fall under these prefixes include:
To retrieve the encoding name based on a model name or its prefix, you can use the `GetEncodingNameForModel` method:
```csharp
-string encodingName = Model.GetEncodingNameForModel("gpt-4-0314"); // This will return "cl100k_base"