Commit 244bf06
fix: clean and truncate doc content before Meilisearch upload
Strip frontmatter, HTML/MDX tags, and import statements from content
before uploading to Meilisearch. Truncate cleaned content to 6000 chars
to stay within DashScope text-embedding-v4's 8192-token input limit,
which was causing batch embedding failures.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 9443d97 commit 244bf06
1 file changed, +18 -2 lines changed
Diff (line contents not captured; line numbers only):
Hunk 1, original lines 28-33 (new lines 28-36): 3 lines added at new lines 31-33.
Hunk 2, original lines 145-164 (new lines 148-180): 12 lines added at new lines 151-162; original line 151 removed and new line 166 added; 1 line added at new line 172; original line 161 removed and new line 177 added.
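The commit message describes the cleaning step only in prose. Below is a minimal sketch of that kind of cleanup, assuming Python and regex-based stripping. The names `clean_content` and `MAX_CONTENT_CHARS` and all of the patterns are hypothetical; the code actually changed in this commit is not shown here and may differ.

```python
import re

# Character cap named in the commit message (6000 chars).
MAX_CONTENT_CHARS = 6000

# Hypothetical patterns; the real commit may use different rules.
# YAML-style frontmatter: a leading block delimited by '---' lines.
FRONTMATTER_RE = re.compile(r"\A---[ \t]*\n.*?\n---[ \t]*(?:\n|\Z)", re.DOTALL)
# MDX/ESM import lines. Caveat: this also drops prose lines that
# happen to start with the word "import".
IMPORT_RE = re.compile(r"^[ \t]*import[ \t].+$", re.MULTILINE)
# Opening and closing HTML/MDX tags; the text between them is kept.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def clean_content(raw: str, limit: int = MAX_CONTENT_CHARS) -> str:
    """Strip frontmatter, import lines, and tags, then truncate."""
    text = FRONTMATTER_RE.sub("", raw, count=1)
    text = IMPORT_RE.sub("", text)
    text = TAG_RE.sub("", text)
    # Collapse the blank lines left behind by the removals.
    text = re.sub(r"\n{3,}", "\n\n", text).strip()
    # Truncate by characters, as the commit message describes.
    return text[:limit]


sample = (
    "---\ntitle: Intro\n---\n"
    "import Tabs from '@theme/Tabs';\n\n"
    "<Tabs>\nHello <b>world</b>\n</Tabs>\n"
)
print(clean_content(sample))
```

Truncating after cleaning, rather than before, means the character budget is spent on searchable text instead of markup that would be discarded anyway.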