fix(translate): sweep common-word wrong-forms from 7 locales' _protected by heznpc · Pull Request #197 · heznpc/skillBridge

heznpc · 2026-06-10T05:36:31Z

Dictionary-accuracy audit follow-up. #172 swept the CJK dictionaries only — the same corruption class was still live in es/fr/it/de/pt-BR/ru/vi: everyday words registered as brand "wrong-forms", so restoreProtectedTerms's unanchored replaceAll rewrote correct prose into English on virtually every lesson:

locale	ordinary prose	rendered as
de	die Zusammenarbeit im Unternehmen	die Cowork im Enterprise
it	le competenze che svilupperai	le skills che svilupperai
fr	l'extension de navigateur	l'Plugin de navigateur
ru	эти навыки общения	эти Skills общения
vi	phần đầu của bài học	frontmatter của bài học

Education content uses these words constantly → 7 of 11 premium languages degraded on every page.
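
The failure mode in the table can be reproduced with a minimal sketch. The function below is a simplified, hypothetical stand-in for restoreProtectedTerms, and the dictionary shape (canonical term mapped to a list of wrong-forms) and the "Claude Code" canonical target are assumptions for illustration; the repository's real code and data layout may differ.

```javascript
// Hypothetical stand-in for restoreProtectedTerms: every registered wrong-form
// is replaced wherever it occurs, with no word-boundary or context anchoring.
function restoreUnanchored(text, protectedTerms) {
  let out = text;
  for (const [canonical, wrongForms] of Object.entries(protectedTerms)) {
    for (const wrong of wrongForms) {
      out = out.replaceAll(wrong, canonical);
    }
  }
  return out;
}

// Before the sweep (assumed shape): everyday German words registered as wrong-forms.
const beforeSweep = { Cowork: ["Zusammenarbeit"], Enterprise: ["Unternehmen"] };
// After the sweep (assumed shape): only a coined proper-noun form remains.
const afterSweep = { "Claude Code": ["Claude-Code"] };

const prose = "die Zusammenarbeit im Unternehmen";
const corrupted = restoreUnanchored(prose, beforeSweep);
const preserved = restoreUnanchored(prose, afterSweep);

console.log(corrupted); // die Cowork im Enterprise
console.log(preserved); // die Zusammenarbeit im Unternehmen
```

Because the replacement itself stays unanchored, the fix in this PR is on the data side: an everyday word must never appear as a wrong-form.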

Curation (same rule as #172)

Keep: proper-noun mistranslations / transliterations / coined anchored phrases (Claudio, Anthropique, Клод Код, Mã Claude, Claude-Code, Schrägstrich-Befehl, sous-agent, …). Russian keeps its loanwords (Плагин, хук, Коворк) matching the Korean precedent (플러그인/후크/코워크).
Drop: the languages' everyday words/phrases (company-words, skills-words, extension-words, hook-words, collaboration-phrases, sending-words, computer-use phrases, preamble-words, personal-words).
it.json bonus: removes the Spanish leftovers inherited from the es-derived v1 (Código Claude, habilidades, gancho, …) and the actively wrong "Plugin"→"Plugins" mapping.

Proof (TDD)

Extended the real-dictionary regression test with ordinary-prose sentences for all 7 locales — all 7 failed against the old dictionaries and pass after the sweep (42/42). CJK guard tests unchanged.

Gates: 527 unit · full E2E 19/19 · validate · glossary · check:* — green. Plugin terms data regenerated in sync. Store zips deliberately not rebuilt (owner builds on instruction).
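
The ordinary-prose regression check described under Proof could look roughly like the sketch below. It is not the repository's test: the restore helper, the dictionary shape, the canonical targets, and the assignment of kept wrong-forms to locales are all assumptions, and only the five locales with example sentences in this PR are covered.

```javascript
// Hypothetical stand-in for the restore step (unanchored replaceAll).
function restoreTerms(text, protectedTerms) {
  let out = text;
  for (const [canonical, wrongForms] of Object.entries(protectedTerms)) {
    for (const wrong of wrongForms) out = out.replaceAll(wrong, canonical);
  }
  return out;
}

// Assumed post-sweep dictionaries: only proper-noun wrong-forms remain.
// Which locale owns which form is a guess made for illustration.
const sweptDicts = {
  de: { "Claude Code": ["Claude-Code"] },
  it: { Claude: ["Claudio"] },
  fr: { Anthropic: ["Anthropique"] },
  ru: { "Claude Code": ["Клод Код"] },
  vi: { "Claude Code": ["Mã Claude"] },
};

// Ordinary-prose sentences taken from the PR description.
const ordinaryProse = {
  de: "die Zusammenarbeit im Unternehmen",
  it: "le competenze che svilupperai",
  fr: "l'extension de navigateur",
  ru: "эти навыки общения",
  vi: "phần đầu của bài học",
};

// A locale fails if restoring its ordinary sentence changes anything.
const failures = Object.entries(ordinaryProse)
  .filter(([locale, sentence]) => restoreTerms(sentence, sweptDicts[locale]) !== sentence)
  .map(([locale]) => locale);

// Kept proper-noun forms must still be restored.
const restoredBrand = restoreTerms("Клод Код", sweptDicts.ru);

console.log(failures.length === 0 ? "ordinary prose untouched" : `corrupted: ${failures}`);
console.log(restoredBrand);
```

Running the same sentences against a pre-sweep dictionary should populate `failures`, which is the red-then-green sequence the Proof section reports.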

🤖 Generated with Claude Code

The #172 sweep removed dangerous common-word wrong-forms from the CJK dictionaries only. The same bug class was still live in es/fr/it/de/pt-BR/ru/vi: everyday words were registered as brand "wrong-forms", so the unanchored replaceAll in restoreProtectedTerms rewrote correct prose into English on virtually every lesson, e.g.

  de "die Zusammenarbeit im Unternehmen" -> "die Cowork im Enterprise"
  it "le competenze che svilupperai" -> "le skills che svilupperai"
  fr "l'extension de navigateur" -> "l'Plugin de navigateur"
  ru "эти навыки общения" -> "эти Skills общения"
  vi "phần đầu của bài học" -> "frontmatter của bài học"

Education content uses these words (skills/compétences/competenze/навыки/kỹ năng …) constantly, so 7 of the 11 premium languages were degraded on every page.

Curation rule (same as #172): keep only proper-noun mistranslations, transliterations, and coined anchored phrases (Claudio, Anthropique, Клод Код, Mã Claude, Claude-Code, Schrägstrich-Befehl, sous-agent, …); drop the everyday words/phrases (Enterprise/Unternehmen/Empresa/Impresa, skills-words, Plugin/extension/Erweiterung/complemento, hook/crochet/Haken/gancho/móc, Cowork/Zusammenarbeit/travail collaboratif, Dispatch/envío/envio/Отправка, Computer-Use phrases, frontmatter/préambule/preámbulo/phần đầu, Personal-words). Russian keeps its loanword forms (Плагин, хук, Коворк, Диспетчеризация) matching the Korean precedent (플러그인/후크/코워크).

it.json also loses the Spanish leftovers its _protected inherited from the es-derived v1 (Código Claude, habilidades, gancho, …) and the actively wrong "Plugin"->"Plugins" mapping.

Proof: extended the real-dictionary regression test with ordinary-prose sentences for all 7 locales. All 7 failed against the old dictionaries and pass after the sweep (42/42). Plugin terms data regenerated in sync.

Gates: 527 unit tests, full E2E 19/19, validate, glossary, check:plugin/i18n/dict-coverage/locales/dicts, lint, prettier (all green).

Store zips NOT rebuilt (owner builds on instruction).

… + machine-readable) (#203)

Makes the moat visible and the QA state machine-readable instead of claimed:

- _meta gains two QA fields in all 11 dictionaries: lastAudited (stamped by the pre-release LLM audit; 2026-06-10 for the audit that shipped in #197/#199) and nativeReview ("recruiting" -> "reviewed" after a native pass).
- generate-docs.js gains a LOCALE_QA marker: the README per-locale QA table (entries / last curated / last audit / native-review status) is generated from _meta by `npm run docs`, so the public table cannot drift from reality.
- README "Terminology QA" section: the standing pipeline (drift watcher -> same-day dictionary wiring -> CI gates -> real-dictionary regression suite) with the verifiable same-day proof (2026-06-10: #196 detected morning, #201 wired all 11 locales same day) + the generated table + native-reviewer call.
- docs/TRANSLATION_QA.md: the three-layer assurance model, honest about what each layer does NOT catch and why no paid API can sit in CI (free-forever).
- RELEASE_CHECKLIST step 0: the pre-release LLM dictionary audit is now a release convention, with the _meta.lastAudited stamping step.
- CONTRIBUTING "Native language reviewers" section; recruitment umbrella issue #202 (help wanted / good first issue / i18n) with per-locale checklist.

Gates: validate, glossary, check:i18n/dict-coverage/locales/dicts/plugin, 527 unit tests, lint, prettier (all green).
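
As a rough illustration of generating the per-locale QA table from _meta, the sketch below renders one table row from a dictionary object. Only lastAudited and nativeReview are named in the commit message; the lastCurated field, the rule for counting entries, the row format, and the sample data are assumptions, not the behaviour of generate-docs.js.

```javascript
// Sample dictionary (made-up content). Keys starting with "_" are treated as
// metadata here, which is an assumption about the dictionary layout.
const sampleDict = {
  _meta: { lastAudited: "2026-06-10", nativeReview: "recruiting" },
  _protected: { "Claude Code": ["Claude-Code"] },
  "Getting started": "Erste Schritte",
};

// Render one markdown table row: locale, entries, last curated, last audit,
// native-review status. Missing fields fall back to "n/a".
function renderQaRow(locale, dict) {
  const meta = dict._meta ?? {};
  const entries = Object.keys(dict).filter((key) => !key.startsWith("_")).length;
  const cells = [
    locale,
    entries,
    meta.lastCurated ?? "n/a", // hypothetical field name
    meta.lastAudited ?? "n/a",
    meta.nativeReview ?? "n/a",
  ];
  return `| ${cells.join(" | ")} |`;
}

const row = renderQaRow("de", sampleDict);
console.log(row); // | de | 1 | n/a | 2026-06-10 | recruiting |
```

Deriving every cell from the dictionary file itself is what keeps a published table from drifting away from the data it describes.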

heznpc enabled auto-merge (squash) June 10, 2026 05:36

heznpc merged commit 7bf8b25 into main Jun 10, 2026
9 checks passed

heznpc deleted the fix/protected-terms-latin-sweep branch June 10, 2026 05:37
