fix(i18n): re-translate Italian locale (was ~51% Spanish) + contamination guard #166

Merged
heznpc merged 3 commits into main from fix/it-locale-spanish-contamination
Jun 2, 2026
heznpc commented Jun 2, 2026

What

src/data/it.json — the Italian locale dictionary — had been built from es.json and only partially re-translated. 632 of its long strings were byte-identical Spanish, and the _protected brand map mistranslated brand names (Claude → Claudio, Anthropic → Antropico, Claude Code → Codice Claudio), which silently broke runtime brand-term restoration for Italian users.

This matters because Italy is our #1 install market (verified against the CWS CSV exports), with growth traced to an Italian developer blog that recommends SkillBridge specifically as the Academy translation solution — so the strongest market was getting Spanish course content.

The existing checks (check-i18n, check-dict-coverage, 488 unit tests) all passed on the broken file because they verify key/shape parity, not that values are in the right language.

Fix

  • Re-translated every contaminated string (exact es-copies + Spanish/hybrid forms) from the English source keys, via the same Google Translate endpoint the extension itself uses (translate.googleapis.com/translate_a/single, tl=it), then restored brand/technical terms to canonical English.
  • Rebuilt _protected with the correct Italian wrong-forms (Claude → ["Claudio"], Anthropic → ["Antropico"], …) so runtime restoration works for Italian.
  • it↔es overlap: 51% → 0.1% (now at parity with the other 10 locales).
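
As a rough illustration of what a correct `_protected` map makes possible, the sketch below undoes mistranslated brand names in machine-translated text. It assumes the map has the shape shown above (canonical term → list of wrong forms); `restoreBrandTerms`, `escapeRegExp`, and the sample entries are hypothetical names for illustration, not the extension's actual implementation.

```javascript
// Hypothetical sample in the shape described above:
// canonical brand term -> wrong forms a translator may produce.
const protectedTerms = {
  "Claude Code": ["Codice Claudio"],
  Claude: ["Claudio"],
  Anthropic: ["Antropico"],
};

// Escape regex metacharacters so wrong forms are matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every wrong form in `text` with its canonical term.
function restoreBrandTerms(text, protectedMap) {
  // Flatten to [wrong, canonical] pairs, longest wrong form first, so that
  // "Codice Claudio" is handled before the shorter "Claudio".
  const pairs = Object.entries(protectedMap)
    .flatMap(([canonical, wrongForms]) =>
      wrongForms.map((wrong) => [wrong, canonical])
    )
    .sort((a, b) => b[0].length - a[0].length);

  let restored = text;
  for (const [wrong, canonical] of pairs) {
    // Unicode-aware boundaries: never rewrite part of a longer word.
    const pattern = new RegExp(
      `(?<![\\p{L}\\p{N}])${escapeRegExp(wrong)}(?![\\p{L}\\p{N}])`,
      "gu"
    );
    restored = restored.replace(pattern, () => canonical);
  }
  return restored;
}
```

With the broken map from before this PR (wrong forms stored as if they were the canonical names), a pass like this has nothing correct to restore to, which is one plausible reading of why restoration failed silently for Italian.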

Re-prevention

New guard scripts/check-locale-contamination.js (npm run check:locales, wired into ci.yml) fails when any locale shares >8% of its long strings with another. Clean locales sit at ≤2.1%; the contaminated Italian file was 51%. This closes the blind spot that let the bug ship.
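
The idea behind the guard can be sketched as follows. This is not the contents of `scripts/check-locale-contamination.js`; it is a minimal illustration that assumes locale dictionaries are nested objects with string leaves, that keys starting with `_` are metadata, and that "long" means at least 20 characters (the real cutoff is not stated here). The function names are hypothetical; only the 8% threshold comes from the description above.

```javascript
// Collect the string leaves of a nested dictionary that are at least
// `minLength` characters long. Keys starting with "_" are skipped
// (assumption: they hold metadata such as _meta and _protected).
function collectLongStrings(node, minLength, out = new Set()) {
  if (typeof node === "string") {
    if (node.length >= minLength) out.add(node);
  } else if (node && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (!key.startsWith("_")) collectLongStrings(value, minLength, out);
    }
  }
  return out;
}

// Share of the long strings in `a` that also appear verbatim in `b`.
function overlapRatio(a, b) {
  if (a.size === 0) return 0;
  let shared = 0;
  for (const value of a) {
    if (b.has(value)) shared += 1;
  }
  return shared / a.size;
}

// Compare every locale against every other one and report the pairs whose
// overlap exceeds the threshold (8% by default, as in the PR description).
function findContamination(locales, { minLength = 20, threshold = 0.08 } = {}) {
  const sets = Object.fromEntries(
    Object.entries(locales).map(([code, dict]) => [
      code,
      collectLongStrings(dict, minLength),
    ])
  );
  const failures = [];
  for (const [locale, strings] of Object.entries(sets)) {
    for (const [other, otherStrings] of Object.entries(sets)) {
      if (locale === other) continue;
      const ratio = overlapRatio(strings, otherStrings);
      if (ratio > threshold) failures.push({ locale, other, ratio });
    }
  }
  return failures;
}
```

A CI wrapper would load each `src/data/*.json` file, call `findContamination`, and exit non-zero when the returned list is not empty. Restricting the comparison to long strings keeps short values that are legitimately identical across languages from inflating the ratio.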

Gates

488 tests · lint · check:i18n · check:dict-coverage · check:glossary · check:academy · check:locales — all green.

Found while auditing for pre-CWS-republication hardening. Locale content fix; no logic changes.

🤖 Generated with Claude Code

…mination guard

src/data/it.json had been populated from es.json and only partially
re-translated: 632 long strings were byte-identical Spanish, and the
_protected brand map mistranslated Claude→Claudio / Anthropic→Antropico /
Claude Code→Codice Claudio — silently breaking runtime brand-term restoration
for Italian, which is our #1 install market (verified via CWS CSV; growth
driven by an Italian dev-blog recommendation).

- Re-translated every contaminated string (exact es-copies + Spanish/hybrid
  forms) from the English source keys via the extension's own GT endpoint,
  restored brand/technical terms to canonical English, rebuilt _protected with
  correct Italian wrong-forms. it↔es overlap: 51% → 0.1% (parity with 10 others).
- New guard scripts/check-locale-contamination.js (npm run check:locales, wired
  into ci.yml) fails when a locale shares >8% of long strings with another —
  catches the wrong-language bug class that check-i18n/dict-coverage miss
  (they verify key/shape, not language).

Gates green: 488 tests, lint, i18n, dict-coverage, glossary, academy, guard.
heznpc enabled auto-merge (squash) June 2, 2026 04:22
heznpc merged commit b9dd427 into main Jun 2, 2026
9 checks passed
heznpc deleted the fix/it-locale-spanish-contamination branch June 2, 2026 04:26
heznpc added a commit that referenced this pull request Jun 3, 2026
…rids (#167)

#166 re-translated the exact es-copies and token-matched Spanish, but a
language-detection (franc) + Spanish-only-character ([áíóúñ¿¡]) re-audit found
~89 residual contaminated strings it missed — partially-Italianized hybrids
like "Conectando tus Strumenti", "Configurazione de tokens máximos",
"Qué información recopila Skilljar..." that are neither exact copies nor
caught by token regexes. Single heuristics keep leaking.

Since the file is a botched es-copy with no hand-curation worth preserving,
regenerated EVERY body string (1,101 leaves) from the English source keys via
the extension's own GT endpoint, with brand-term restoration. Verified clean
by three independent methods converging to ~0:
- Spanish-only characters [áíóúñ¿¡]: 26 → 0
- franc=spa (long strings): only the 8-string false-positive floor (all
  confirmed correct Italian, e.g. "Pianifica i tuoi prossimi passi con Claude")
- exact es-copy: 1 ("Claude con Amazon Bedrock", identical & valid in both)

Gates green: validate, i18n, dict-coverage, glossary, academy, check:locales,
488 tests, lint. _protected and brand terms (Claude/Anthropic/Claude Code/
Cowork/Skilljar) preserved.
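
The Spanish-only-character pass described in the commit above can be sketched like this. It is an illustration, not the audit script that was actually run: it assumes the dictionary is a nested object whose keys are the English source strings and whose values are translations, and `findSpanishMarkers` is a hypothetical name. The franc-based language detection is left out.

```javascript
// Characters the commit message treats as Spanish-only. Italian accents
// such as à, è, é, ì, ò, ù are deliberately not in this set.
const SPANISH_ONLY = /[áíóúñ¿¡]/;

// Walk a nested dictionary and report every translated value containing a
// Spanish-only character. Keys starting with "_" are skipped (assumption:
// they hold metadata rather than translated body strings).
function findSpanishMarkers(node, path = [], hits = []) {
  if (typeof node === "string") {
    // Normalize so decomposed accents (letter + combining mark) still match.
    if (SPANISH_ONLY.test(node.normalize("NFC"))) {
      hits.push({ path, value: node });
    }
  } else if (node && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (key.startsWith("_")) continue;
      findSpanishMarkers(value, [...path, key], hits);
    }
  }
  return hits;
}
```

On the hybrids quoted in the commit, this check flags "Configurazione de tokens máximos" but not "Conectando tus Strumenti", which has no accented characters. That gap is why the commit cross-checks with language detection and an exact-copy comparison instead of relying on one heuristic.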
heznpc added a commit that referenced this pull request Jun 9, 2026
…tency) (#181)

A verified readiness audit found the code is done (505 tests, 0 open issues)
but front-door docs had drifted. Fixes (all factual/compliance, not the
deferred strategy docs):

Factual errors (were misleading users/owner):
- README Installation said the CWS listing "was removed ... not currently
  available" (full delisting). It is actually live as v1.0.1 in all locales
  except the US (removed 2026-05-12 over the old icon). Corrected to match
  POSITIONING (the source of truth).
- RELEASE_CHECKLIST pointed at store-assets/promotion/ drafts that were purged
  and no longer exist. Removed the dead pointer (drafts are kept off-repo).

Stale (now closed):
- CHANGELOG [Unreleased] was missing #167/#170/#172/#174/#175/#176/#179/#180;
  added them.
- it.json _meta.translation_provenance + lastUpdated (and the matching
  constants.js comment, README locale-table cell) still said "v1, Spanish-
  derived regex" — it was re-translated from English in #166/#167 (overlap
  now 0.1%). Updated; regenerated plugin data accordingly.
- TESTING.md listed "E2E flows" under "What is NOT tested" — the Playwright
  E2E suite exists and runs in CI. Reframed to describe what E2E covers.
- PRIVACY_POLICY "Last updated" dateline was April 11 despite June changes.

Gates green: 505 tests, lint, prettier, validate, check:plugin/dicts/locales/
i18n/dict-coverage, full E2E (17). Deferred strategy docs (POSITIONING,
quarter-focus) untouched — owned by the separate doc-cleanup session.
heznpc added a commit that referenced this pull request Jun 9, 2026
Mechanical version cut so the dashboard upload ships everything merged since
the 3.5.39 tag (#166–#194: Italian re-translation + locale guard, protected-
terms CJK fix, tutor stream fixes, Gemini-verify guard, FAB icon + reset-button
host-CSS fixes, "Claude(Claude)" gloss collapse, chip alignment, shadow-root
isolation, doc consistency).

- manifest.json / package.json / package-lock.json → 3.5.40
- 11 src/data/*.json _meta.version → 3.5.40 (check:dict-coverage enforces the
  match); claude-plugin terms data regenerated in sync
- CHANGELOG: [Unreleased] → [3.5.40] - 2026-06-10; fresh [Unreleased] added
- RELEASE_CHECKLIST refreshed for v3.5.40 (status block, prepared list,
  gate counts 520/520 + 19 e2e; historical 3.5.39 mentions kept)
- STORE_LISTING → v3.5.40; "What's new" gains the 3.5.40 user-facing items
  (tutor-button icon, chip alignment, "Claude(Claude)" fix, style isolation)
- README installation note + version markers (npm run docs) → 3.5.40

Artifacts rebuilt and verified at 3.5.40: store-assets/skillbridge-bundled.zip
(CWS upload) + skillbridge.zip (raw fallback) — both gitignored, upload from
local disk. Gates: 520 unit, full E2E 19/19, lint, prettier, validate, all
check:* green (check:cws-drift intentionally fails until the dashboard upload).
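
The version match mentioned in the commit above (every `_meta.version` in `src/data/*.json` must equal the release version) could be checked along these lines. This is a sketch under assumptions, not the actual `check:dict-coverage` logic: `findVersionMismatches` is a hypothetical helper, and the caller is assumed to have already parsed the JSON files.

```javascript
// Return the dictionaries whose _meta.version differs from the expected
// release version. `dictionaries` maps a file name to its parsed JSON.
function findVersionMismatches(expectedVersion, dictionaries) {
  return Object.entries(dictionaries)
    .filter(([, dict]) => dict?._meta?.version !== expectedVersion)
    .map(([file, dict]) => ({ file, found: dict?._meta?.version ?? null }));
}
```

A release script could read the version from `manifest.json`, pass the parsed locale files to this helper, and fail the cut when the returned list is not empty.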