fix(translate): sweep common-word wrong-forms from 7 locales' _protected #197

Merged: heznpc merged 1 commit into main from fix/protected-terms-latin-sweep on Jun 10, 2026

heznpc (Owner) commented Jun 10, 2026

Follow-up to the dictionary-accuracy audit. #172 swept only the CJK dictionaries; the same corruption class was still live in es/fr/it/de/pt-BR/ru/vi. Everyday words were registered as brand "wrong-forms", so the unanchored replaceAll in restoreProtectedTerms rewrote correct prose into English on virtually every lesson:

| locale | ordinary prose | rendered as |
| --- | --- | --- |
| de | die Zusammenarbeit im Unternehmen | die Cowork im Enterprise |
| it | le competenze che svilupperai | le skills che svilupperai |
| fr | l'extension de navigateur | l'Plugin de navigateur |
| ru | эти навыки общения | эти Skills общения |
| vi | phần đầu của bài học | frontmatter của bài học |

Education content uses these words constantly → 7 of 11 premium languages degraded on every page.
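
To make the failure mode concrete, here is a minimal sketch of how an unanchored replaceAll pass behaves when everyday words are registered as wrong-forms. The dictionary shape (canonical term mapped to a list of wrong-forms), the `restoreSketch` function, and the specific entries are assumptions for illustration; the real restoreProtectedTerms and `_protected` layout are not shown in this PR.

```javascript
// Hypothetical stand-in for a German _protected section before the sweep:
// canonical term -> wrong-forms. The real file layout is an assumption here.
const protectedBefore = {
  Cowork: ["Zusammenarbeit"],
  Enterprise: ["Unternehmen"],
  "Claude Code": ["Claude-Code"],
};

// Simplified sketch of an unanchored restore pass: every occurrence of a
// wrong-form is replaced, with no check that it really is a mistranslated brand.
function restoreSketch(text, protectedTerms) {
  let out = text;
  for (const [canonical, wrongForms] of Object.entries(protectedTerms)) {
    for (const wrong of wrongForms) {
      out = out.replaceAll(wrong, canonical);
    }
  }
  return out;
}

const prose = "die Zusammenarbeit im Unternehmen";
console.log(restoreSketch(prose, protectedBefore));
// -> "die Cowork im Enterprise"

// After the sweep only the proper-noun wrong-form remains, so prose is untouched.
const protectedAfter = { "Claude Code": ["Claude-Code"] };
console.log(restoreSketch(prose, protectedAfter));
// -> "die Zusammenarbeit im Unternehmen"
```

The replacement logic is identical in both calls; only the dictionary contents differ, which is why the fix in this PR is a data sweep.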

Curation (same rule as #172)

  • Keep: proper-noun mistranslations / transliterations / coined anchored phrases (Claudio, Anthropique, Клод Код, Mã Claude, Claude-Code, Schrägstrich-Befehl, sous-agent, …). Russian keeps its loanwords (Плагин, хук, Коворк) matching the Korean precedent (플러그인/후크/코워크).
  • Drop: the languages' everyday words/phrases (company-words, skills-words, extension-words, hook-words, collaboration-phrases, sending-words, computer-use phrases, preamble-words, personal-words).
  • it.json bonus: removes the Spanish leftovers inherited from the es-derived v1 (Código Claude, habilidades, gancho, …) and the actively wrong "Plugin"→"Plugins" mapping.
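
The drop rule above could be partly mechanised with a small audit helper that flags any wrong-form which is also an ordinary word in the locale. This is only a sketch: `findRiskyWrongForms`, the `everydayWords` list, and the sample entries are hypothetical, and the PR does not say the curation was automated this way.

```javascript
// Hypothetical audit helper: flags wrong-forms that are ordinary words in the
// locale. `everydayWords` is an assumed hand-maintained list, not project data.
function findRiskyWrongForms(protectedTerms, everydayWords) {
  const everyday = new Set(everydayWords.map((w) => w.toLowerCase()));
  const risky = [];
  for (const [canonical, wrongForms] of Object.entries(protectedTerms)) {
    for (const wrong of wrongForms) {
      if (everyday.has(wrong.toLowerCase())) {
        risky.push({ canonical, wrong });
      }
    }
  }
  return risky;
}

// Illustrative Italian entries: an everyday word next to a proper-noun mistranslation.
const itProtected = {
  Skills: ["competenze"],
  Claude: ["Claudio"],
};

const risky = findRiskyWrongForms(itProtected, ["competenze", "impresa"]);
console.log(risky); // flags only the "competenze" entry; "Claudio" is kept
```

A word list cannot decide the borderline cases (loanwords such as Плагин, coined phrases), so a helper like this would at best shortlist candidates for human curation.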

Proof (TDD)

Extended the real-dictionary regression test with ordinary-prose sentences for all 7 locales — all 7 failed against the old dictionaries and pass after the sweep (42/42). CJK guard tests unchanged.
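
A regression check of this kind might look roughly like the sketch below: each ordinary-prose sentence must pass through the restore step unchanged. The `restore` function and the inline `dictionaries` object are stand-ins (the real suite loads the actual JSON dictionaries and calls restoreProtectedTerms), and the canonical targets chosen for each kept wrong-form are assumptions.

```javascript
const { strictEqual } = require("node:assert");

// Stand-in for restoreProtectedTerms (assumed behaviour: plain replaceAll).
function restore(text, protectedTerms) {
  let out = text;
  for (const [canonical, wrongForms] of Object.entries(protectedTerms)) {
    for (const wrong of wrongForms) out = out.replaceAll(wrong, canonical);
  }
  return out;
}

// Assumed post-sweep dictionaries: only proper-noun wrong-forms remain.
const dictionaries = {
  de: { "Claude Code": ["Claude-Code"] },
  it: { Claude: ["Claudio"] },
  fr: { Anthropic: ["Anthropique"] },
  ru: { "Claude Code": ["Клод Код"] },
  vi: { "Claude Code": ["Mã Claude"] },
};

// Ordinary-prose sentences taken from the table in this PR.
const ordinaryProse = {
  de: "die Zusammenarbeit im Unternehmen",
  it: "le competenze che svilupperai",
  fr: "l'extension de navigateur",
  ru: "эти навыки общения",
  vi: "phần đầu của bài học",
};

for (const [locale, sentence] of Object.entries(ordinaryProse)) {
  // Correct prose must survive the restore pass byte-for-byte.
  strictEqual(restore(sentence, dictionaries[locale]), sentence, locale);
}

// The kept proper-noun wrong-forms must still be restored.
strictEqual(restore("Клод Код", dictionaries.ru), "Claude Code");
```

Run against a pre-sweep dictionary, the loop would throw on the first locale whose sentence contains a registered everyday word, which matches the fail-then-pass sequence described above.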

Gates: 527 unit · full E2E 19/19 · validate · glossary · check:* — green. Plugin terms data regenerated in sync. Store zips deliberately not rebuilt (owner builds on instruction).

🤖 Generated with Claude Code

The #172 sweep removed dangerous common-word wrong-forms from the CJK
dictionaries only. The same bug class was still live in es/fr/it/de/pt-BR/
ru/vi: everyday words were registered as brand "wrong-forms", so the
unanchored replaceAll in restoreProtectedTerms rewrote correct prose into
English on virtually every lesson — e.g.

  de  "die Zusammenarbeit im Unternehmen" -> "die Cowork im Enterprise"
  it  "le competenze che svilupperai"     -> "le skills che svilupperai"
  fr  "l'extension de navigateur"         -> "l'Plugin de navigateur"
  ru  "эти навыки общения"                -> "эти Skills общения"
  vi  "phần đầu của bài học"              -> "frontmatter của bài học"

Education content uses these words (skills/compétences/competenze/навыки/
kỹ năng …) constantly, so 7 of the 11 premium languages were degraded on
every page.

Curation rule (same as #172): keep only proper-noun mistranslations,
transliterations, and coined anchored phrases (Claudio, Anthropique,
Клод Код, Mã Claude, Claude-Code, Schrägstrich-Befehl, sous-agent, …);
drop the everyday words/phrases (Enterprise/Unternehmen/Empresa/Impresa,
skills-words, Plugin/extension/Erweiterung/complemento, hook/crochet/
Haken/gancho/móc, Cowork/Zusammenarbeit/travail collaboratif, Dispatch/
envío/envio/Отправка, Computer-Use phrases, frontmatter/préambule/
preámbulo/phần đầu, Personal-words). Russian keeps its loanword forms
(Плагин, хук, Коворк, Диспетчеризация) matching the Korean precedent
(플러그인/후크/코워크). it.json also loses the Spanish leftovers its
_protected inherited from the es-derived v1 (Código Claude, habilidades,
gancho, …) and the actively wrong "Plugin"->"Plugins" mapping.

Proof: extended the real-dictionary regression test with ordinary-prose
sentences for all 7 locales — all 7 failed against the old dictionaries
and pass after the sweep (42/42). Plugin terms data regenerated in sync.

Gates: 527 unit tests, full E2E 19/19, validate, glossary, check:plugin/
i18n/dict-coverage/locales/dicts, lint, prettier — green. Store zips NOT
rebuilt (owner builds on instruction).
heznpc enabled auto-merge (squash) June 10, 2026 05:36
heznpc merged commit 7bf8b25 into main Jun 10, 2026
9 checks passed
heznpc deleted the fix/protected-terms-latin-sweep branch June 10, 2026 05:37
heznpc added a commit that referenced this pull request Jun 10, 2026
… + machine-readable) (#203)

Makes the moat visible and the QA state machine-readable instead of claimed:

- _meta gains two QA fields in all 11 dictionaries: lastAudited (stamped by
  the pre-release LLM audit; 2026-06-10 for the audit that shipped in
  #197/#199) and nativeReview ("recruiting" -> "reviewed" after a native pass).
- generate-docs.js gains a LOCALE_QA marker: the README per-locale QA table
  (entries / last curated / last audit / native-review status) is generated
  from _meta by `npm run docs`, so the public table cannot drift from reality.
- README "Terminology QA" section: the standing pipeline (drift watcher ->
  same-day dictionary wiring -> CI gates -> real-dictionary regression suite)
  with the verifiable same-day proof (2026-06-10: #196 detected morning,
  #201 wired all 11 locales same day) + the generated table + native-reviewer
  call.
- docs/TRANSLATION_QA.md: the three-layer assurance model, honest about what
  each layer does NOT catch and why no paid API can sit in CI (free-forever).
- RELEASE_CHECKLIST step 0: the pre-release LLM dictionary audit is now a
  release convention, with the _meta.lastAudited stamping step.
- CONTRIBUTING "Native language reviewers" section; recruitment umbrella
  issue #202 (help wanted / good first issue / i18n) with per-locale
  checklist.

Gates: validate, glossary, check:i18n/dict-coverage/locales/dicts/plugin,
527 unit tests, lint, prettier — green.
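
As a rough illustration of generating the per-locale QA table from `_meta`, the sketch below renders one row per locale. Only `lastAudited` and `nativeReview` are named in the commit message; the `entries` and `lastCurated` field names, the `renderQaTable` function, and all sample values are placeholders, and the actual LOCALE_QA marker handling in generate-docs.js is not reproduced here.

```javascript
// Placeholder _meta snapshots. Field names other than lastAudited and
// nativeReview are assumptions, and the numbers are illustrative only.
const sampleMeta = {
  ja: { entries: 12, lastCurated: "2026-06-10", lastAudited: "2026-06-10", nativeReview: "recruiting" },
  de: { entries: 10, lastCurated: "2026-06-10", lastAudited: "2026-06-10", nativeReview: "recruiting" },
};

// Renders a Markdown table with one row per locale, sorted by locale code,
// so repeated runs over the same data produce identical output.
function renderQaTable(metaByLocale) {
  const header = [
    "| Locale | Entries | Last curated | Last audit | Native review |",
    "| --- | --- | --- | --- | --- |",
  ];
  const rows = Object.keys(metaByLocale)
    .sort()
    .map((locale) => {
      const m = metaByLocale[locale];
      return `| ${locale} | ${m.entries} | ${m.lastCurated} | ${m.lastAudited} | ${m.nativeReview} |`;
    });
  return [...header, ...rows].join("\n");
}

console.log(renderQaTable(sampleMeta));
```

Because the table is derived from the dictionaries on every run, a stale README row can only appear if the generator is not run, which is the drift guarantee the commit message describes.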