Commit 36b8cb5

readme update
1 parent b4203b9 commit 36b8cb5

1 file changed

Lines changed: 20 additions & 4 deletions

README.md

@@ -78,8 +78,9 @@ SharpToken currently supports the following models:
 - `cl100k_base`
 - `o200k_base`
 - `o200k_harmony`
+- `claude`
 
-You can use any of these models when creating an instance of GptEncoding:
+You can use any of these encodings when creating an instance of GptEncoding:
 
 ```csharp
 var r50kBaseEncoding = GptEncoding.GetEncoding("r50k_base");
@@ -88,8 +89,20 @@ var p50kEditEncoding = GptEncoding.GetEncoding("p50k_edit");
 var cl100kBaseEncoding = GptEncoding.GetEncoding("cl100k_base");
 var o200kBaseEncoding = GptEncoding.GetEncoding("o200k_base");
 var o200kHarmonyEncoding = GptEncoding.GetEncoding("o200k_harmony");
+var claudeEncoding = GptEncoding.GetEncoding("claude");
 ```
 
+### Claude Model Support
+
+The `claude` encoding uses Anthropic's official tokenizer vocabulary with NFKC normalization. It is accurate for pre-Claude 3 models and a rough approximation for Claude 3+.
+
+```csharp
+var encoding = GptEncoding.GetEncodingForModel("claude-3.5-sonnet");
+var count = encoding.CountTokens("Hello, Claude!");
+```
+
+All `claude-*` model names are supported (e.g. `claude-3-opus`, `claude-3.5-sonnet`, `claude-3.7-sonnet`, `claude-4-sonnet`).
+
 ### Model Prefix Matching
 
 Apart from specifying direct model names, SharpToken also provides functionality to map model names based on specific prefixes. This allows users to retrieve an encoding based on a model's prefix.
@@ -98,6 +111,7 @@ Here are the current supported prefixes and their corresponding encodings:
 
 | Model Prefix | Encoding |
 | ---------------- | ------------- |
+| `claude-` | `claude` |
 | `gpt-5` | `o200k_base` |
 | `gpt-4o` | `o200k_base` |
 | `gpt-4-` | `cl100k_base` |
@@ -106,7 +120,8 @@ Here are the current supported prefixes and their corresponding encodings:
 
 Examples of model names that fall under these prefixes include:
 
-- For the prefix `gpt-5`: `gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `gpt-5-pro`, `gpt-5-thinking`, `gpt-5-2024-08-07`, `gpt-5-chat-latest`, etc.
+- For the prefix `claude-`: `claude-3-opus-20240229`, `claude-3.5-sonnet-20241022`, etc.
+- For the prefix `gpt-5`: `gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `gpt-5-pro`, `gpt-5-thinking`, `gpt-5-2024-08-07`, etc.
 - For the prefix `gpt-4o`: `gpt-4o`, `gpt-4o-2024-05-13`, etc.
 - For the prefix `gpt-4-`: `gpt-4-0314`, `gpt-4-32k`, etc.
 - For the prefix `gpt-3.5-turbo-`: `gpt-3.5-turbo-0301`, `gpt-3.5-turbo-0401`, etc.
@@ -115,10 +130,11 @@ Examples of model names that fall under these prefixes include:
 To retrieve the encoding name based on a model name or its prefix, you can use the `GetEncodingNameForModel` method:
 
 ```csharp
-string encodingName = Model.GetEncodingNameForModel("gpt-4-0314"); // This will return "cl100k_base"
+string encodingName = Model.GetEncodingNameForModel("claude-3.5-sonnet"); // Returns "claude"
+string encodingName = Model.GetEncodingNameForModel("gpt-4-0314"); // Returns "cl100k_base"
 ```
 
-If the provided model name doesn't match any direct model names or prefixes, the method will return `null`.
+If the provided model name doesn't match any direct model names or prefixes, an exception is thrown.
 
 ## Understanding Encoded Values
 
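Note on the prefix-matching behavior documented in this commit: the lookup the README describes (first matching prefix wins, and an unmatched name throws) can be sketched roughly as below. This is a standalone illustration, not SharpToken's implementation. The `prefixToEncoding` table and the `ResolveEncodingName` helper are hypothetical names, the table contains only the four rows visible in this diff, and `ArgumentException` is an assumption, since the README does not say which exception type is thrown.

```csharp
using System;

// Hypothetical prefix table containing only the rows visible in this diff.
// The real SharpToken table has more entries and may be stored differently.
var prefixToEncoding = new (string Prefix, string Encoding)[]
{
    ("claude-", "claude"),
    ("gpt-5", "o200k_base"),
    ("gpt-4o", "o200k_base"),
    ("gpt-4-", "cl100k_base"),
};

// Hypothetical helper: returns the encoding name of the first prefix that the
// model name starts with, and throws when no prefix matches.
string ResolveEncodingName(string modelName)
{
    foreach (var (prefix, encoding) in prefixToEncoding)
    {
        if (modelName.StartsWith(prefix, StringComparison.Ordinal))
        {
            return encoding;
        }
    }

    // Assumption: the README only says "an exception is thrown".
    throw new ArgumentException(
        $"No encoding is registered for model '{modelName}'.", nameof(modelName));
}

Console.WriteLine(ResolveEncodingName("claude-3.5-sonnet")); // claude
Console.WriteLine(ResolveEncodingName("gpt-4-0314"));        // cl100k_base
```

Because the comparison is a plain ordinal prefix test, any dated variant such as `claude-3-opus-20240229` resolves the same way as its base name, which matches the examples listed in the diff.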