
fix(i18n): fully regenerate Italian body — #166 left ~89 residual hybrids#167

Merged
heznpc merged 1 commit into
mainfrom
fix/it-locale-residual-hybrids
Jun 3, 2026

Conversation

@heznpc heznpc commented Jun 3, 2026

Owner

Why

While scoping further hardening, I re-audited it.json using franc language detection plus a scan for Spanish-only characters ([áíóúñ¿¡], which Italian doesn't use). The re-audit found that #166 was incomplete: ~89 contaminated strings survived it. They are partially Italianized hybrids that defeat any single heuristic:

  • Conectando tus Strumenti
  • Configurazione de tokens máximos
  • Qué información recopila Skilljar sobre mi actividad de apprendimento?

These are neither exact es.json copies (so the contamination guard's exact-match check missed them) nor matches for the token regexes. Honest note: my "it.json is clean" claim in #166 was wrong; verifying with a single method wasn't enough.
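
As an illustration of the character scan described above, here is a minimal sketch. It is not the audit script used for this PR; the `leaves` and `findSpanishOnlyChars` helpers, the nested-object shape, and the sample keys are all assumptions made for the example.

```javascript
// Characters listed in this PR as Spanish-only. Italian accents such as
// è and é are deliberately not in the set.
const SPANISH_ONLY = /[áíóúñ¿¡]/;

// Hypothetical helper: walk a nested locale object and yield
// [dottedPath, text] for every string leaf.
function* leaves(node, path = []) {
  if (typeof node === "string") {
    yield [path.join("."), node];
  } else if (node && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      yield* leaves(value, [...path, key]);
    }
  }
}

// Collect every leaf that contains at least one Spanish-only character.
function findSpanishOnlyChars(locale) {
  const hits = [];
  for (const [path, text] of leaves(locale)) {
    if (SPANISH_ONLY.test(text)) hits.push({ path, text });
  }
  return hits;
}

// Sample data: the keys are made up, the values are strings quoted in this PR.
const sample = {
  nav: { tools: "Conectando tus Strumenti" },
  settings: { maxTokens: "Configurazione de tokens máximos" },
  faq: { why: "Cos'è Skilljar e perché accedo?" },
};

console.log(findSpanishOnlyChars(sample));
```

On this sample the scan flags only `settings.maxTokens`. "Conectando tus Strumenti" contains no accented characters and slips through, which is why the character scan has to be paired with language detection and an exact-copy check.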

Fix

The file is a botched es-copy with no hand-curation worth preserving, so I regenerated every body string (1,101 leaves) from the English source keys via the extension's own GT endpoint, with brand-term restoration. Regenerating everything means the fix no longer depends on detecting each contaminated string.
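
One common way to keep brand terms intact through machine translation is to mask them with placeholders before translating and restore them afterwards. The sketch below shows that idea only. The PR does not say how its brand-term restoration works, so `maskBrands`, `restoreBrands`, `translateLeaf`, the placeholder format, and the stand-in `fakeTranslate` function are all hypothetical.

```javascript
// Brand terms named in this PR, longest first so that "Claude Code"
// is masked before the shorter "Claude".
const BRAND_TERMS = ["Claude Code", "Anthropic", "Skilljar", "Cowork", "Claude"];

// Replace each brand term with an index-based placeholder (assumed format).
function maskBrands(text) {
  return BRAND_TERMS.reduce(
    (acc, term, i) => acc.split(term).join(`__B${i}__`),
    text
  );
}

// Put the original brand terms back in place of the placeholders.
function restoreBrands(text) {
  return BRAND_TERMS.reduce(
    (acc, term, i) => acc.split(`__B${i}__`).join(term),
    text
  );
}

// Translate one English source string. `translate` is injected so that a
// real endpoint client or a test double can be used.
async function translateLeaf(english, translate) {
  const translated = await translate(maskBrands(english));
  return restoreBrands(translated);
}

// Stand-in translator for the demo only; it does not call any real service.
const fakeTranslate = async (s) =>
  s.replace("Plan your next steps with", "Pianifica i tuoi prossimi passi con");

translateLeaf("Plan your next steps with Claude", fakeTranslate).then((out) =>
  console.log(out)
);
```

A real translation service may alter or drop placeholders, so a production version would need to verify that every placeholder survives the round trip before restoring.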

Verification — three independent methods converge to ~0

| Method | Before | After |
| --- | --- | --- |
| Spanish-only chars [áíóúñ¿¡] | 26 | 0 |
| franc = Spanish (long strings) | 63 | 8 (false-positive floor; all confirmed correct Italian, e.g. "Pianifica i tuoi prossimi passi con Claude") |
| exact es.json copy | | 1 ("Claude con Amazon Bedrock", identical & valid in both) |

Brand terms (Claude / Anthropic / Claude Code / Cowork / Skilljar) and _protected preserved. Quality sample: "Cos'è Skilljar e perché accedo?", "Procedura dettagliata sull'utilizzo di Claude Code...".
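
The exact-copy check from the verification above can be sketched as follows. This is an illustrative version, not the repository's contamination guard: `flatten`, `exactCopies`, the English keys, and the Spanish sample value are assumptions, while the Italian values are strings quoted in this PR.

```javascript
// Flatten a nested locale object into a Map of "dotted.path" -> string.
function flatten(node, prefix = "", out = new Map()) {
  for (const [key, value] of Object.entries(node)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (typeof value === "string") out.set(path, value);
    else if (value && typeof value === "object") flatten(value, path, out);
  }
  return out;
}

// List the paths whose text is byte-for-byte identical in both locales,
// along with the share of target leaves they represent.
function exactCopies(target, other) {
  const a = flatten(target);
  const b = flatten(other);
  const copies = [];
  for (const [path, text] of a) {
    if (b.get(path) === text) copies.push(path);
  }
  return { copies, total: a.size, ratio: a.size ? copies.length / a.size : 0 };
}

// Illustrative data: keys and the Spanish translation are hypothetical.
const itSample = {
  "Claude with Amazon Bedrock": "Claude con Amazon Bedrock",
  "Plan your next steps with Claude": "Pianifica i tuoi prossimi passi con Claude",
};
const esSample = {
  "Claude with Amazon Bedrock": "Claude con Amazon Bedrock",
  "Plan your next steps with Claude": "Planifica tus próximos pasos con Claude",
};

console.log(exactCopies(itSample, esSample));
```

Here the only reported copy is the Bedrock string, which is legitimately identical in both languages. Applied to the numbers in this PR, and assuming the denominator is the 1,101 regenerated leaves, one remaining copy works out to roughly 0.09% of the file.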

Gates: validate · i18n · dict-coverage · glossary · academy · check:locales · 488 tests · lint — all green.

🤖 Generated with Claude Code

…rids

#166 re-translated the exact es-copies and token-matched Spanish, but a
language-detection (franc) + Spanish-only-character ([áíóúñ¿¡]) re-audit found
~89 residual contaminated strings it missed — partially-Italianized hybrids
like "Conectando tus Strumenti", "Configurazione de tokens máximos",
"Qué información recopila Skilljar..." that are neither exact copies nor
caught by token regexes. Single heuristics keep leaking.

Since the file is a botched es-copy with no hand-curation worth preserving,
regenerated EVERY body string (1,101 leaves) from the English source keys via
the extension's own GT endpoint, with brand-term restoration. Verified clean
by three independent methods converging to ~0:
- Spanish-only characters [áíóúñ¿¡]: 26 → 0
- franc=spa (long strings): only the 8-string false-positive floor (all
  confirmed correct Italian, e.g. "Pianifica i tuoi prossimi passi con Claude")
- exact es-copy: 1 ("Claude con Amazon Bedrock", identical & valid in both)

Gates green: validate, i18n, dict-coverage, glossary, academy, check:locales,
488 tests, lint. _protected and brand terms (Claude/Anthropic/Claude Code/
Cowork/Skilljar) preserved.
@heznpc heznpc enabled auto-merge (squash) June 3, 2026 09:01
@heznpc heznpc merged commit 03af223 into main Jun 3, 2026
9 checks passed
@heznpc heznpc deleted the fix/it-locale-residual-hybrids branch June 3, 2026 09:02
heznpc added a commit that referenced this pull request Jun 9, 2026
…tency) (#181)

A verified readiness audit found the code is done (505 tests, 0 open issues)
but front-door docs had drifted. Fixes (all factual/compliance, not the
deferred strategy docs):

Factual errors (were misleading users/owner):
- README Installation said the CWS listing "was removed ... not currently
  available" (full delisting). It is actually live as v1.0.1 in all locales
  except the US (removed 2026-05-12 over the old icon). Corrected to match
  POSITIONING (the source of truth).
- RELEASE_CHECKLIST pointed at store-assets/promotion/ drafts that were purged
  and no longer exist. Removed the dead pointer (drafts are kept off-repo).

Stale (now closed):
- CHANGELOG [Unreleased] was missing #167/#170/#172/#174/#175/#176/#179/#180;
  added them.
- it.json _meta.translation_provenance + lastUpdated (and the matching
  constants.js comment, README locale-table cell) still said "v1, Spanish-
  derived regex" — it was re-translated from English in #166/#167 (overlap
  now 0.1%). Updated; regenerated plugin data accordingly.
- TESTING.md listed "E2E flows" under "What is NOT tested" — the Playwright
  E2E suite exists and runs in CI. Reframed to describe what E2E covers.
- PRIVACY_POLICY "Last updated" dateline was April 11 despite June changes.

Gates green: 505 tests, lint, prettier, validate, check:plugin/dicts/locales/
i18n/dict-coverage, full E2E (17). Deferred strategy docs (POSITIONING,
quarter-focus) untouched — owned by the separate doc-cleanup session.