Translate and maintain software localization files with one CLI and one GUI.
This repository is for software-localization work, not generic document translation. It gives you a shared backend for five jobs:
- translate unfinished localization files
- revise existing translations with a precise instruction
- check translated PO or TS files for QA issues
- extract glossary terms with a model
- discover glossary candidates locally without any API call
Supported formats:
- `.po`
- `.ts`
- `.resx`
- `.strings`
- `.txt`
- Android `<resources>` XML
Supported providers:
- Gemini
- OpenAI
- Anthropic
The toolkit is built around three ideas:
- One task-oriented CLI: `translate`, `revise`, `check`, `extract-terms`, `extract-terms-local`
- Shared language resources: vocabulary and rules are loaded by target language
- One format backend: supported file types are normalized into a shared entry model and then written back in their native format
That matters because the same vocabulary, rules, runtime controls, batching, and format handling are reused across CLI and GUI instead of being reimplemented per script.
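As a rough illustration of what a shared entry model can look like, here is a minimal sketch. The `Entry` class and all of its field names are hypothetical; the real model lives in `core/` and may be shaped differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical format-neutral message; field names are illustrative only."""
    key: str                       # e.g. a PO msgid, an Android resource name, a .strings key
    source: str                    # source-language text
    target: str = ""               # translated text, empty while unfinished
    context: Optional[str] = None  # optional disambiguating context
    note: Optional[str] = None     # optional translator note

    @property
    def is_unfinished(self) -> bool:
        return not self.target.strip()


def unfinished(entries):
    """Return the entries that still have no translation."""
    return [e for e in entries if e.is_unfinished]


entries = [
    Entry(key="menu.open", source="Open", target="Ашу"),
    Entry(key="menu.save", source="Save"),
]
pending = unfinished(entries)
```

Once every format is reduced to a list like this, batching, vocabulary lookup, and write-back only have to be written once.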
The preferred entry point is:
python translate_cli.py
translate_cli.py is the main CLI surface, and process_gui.py is the GUI entry point.
Detailed task guides live in docs/:
- `docs/translate.md`
- `docs/check.md`
- `docs/extract.md`
- `docs/extract-local.md`
- `docs/revise.md`
- `docs/extraction-refactor.md`
Install dependencies:
pip install -r requirements.txt
Set the API key for the provider you want to use:
$env:GOOGLE_API_KEY = "your_google_api_key"
$env:OPENAI_API_KEY = "your_openai_api_key"
$env:ANTHROPIC_API_KEY = "your_anthropic_api_key"
Notes:
- You only need the key for the provider you are actually using.
- Gemini can run against AI Studio or Vertex API-key mode.
- Vertex API-key mode currently supports the `global` endpoint only.
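If you want to confirm which providers are usable before starting a run, a small pre-flight check along these lines can help. It is a sketch, not part of the toolkit: the environment variable names come from the setup commands above, while the `configured_providers` helper and the pairing of Gemini with `GOOGLE_API_KEY` are assumptions.

```python
import os

# Assumed mapping from provider name to the environment variable it reads.
PROVIDER_ENV = {
    "gemini": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV.items() if env.get(var)]


# Pass a plain dict to test the logic without touching the real environment.
available = configured_providers({"OPENAI_API_KEY": "sk-example"})
```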
Translate one file:
python translate_cli.py translate source.po
python translate_cli.py translate source.ts
python translate_cli.py translate source.resx
python translate_cli.py translate source.strings
python translate_cli.py translate source.txt
Translate several files in one run when they are the same format:
python translate_cli.py translate first.po second.po third.po
Choose provider and model explicitly:
python translate_cli.py translate source.po --provider openai --model your-model
python translate_cli.py translate source.po --provider anthropic --model your-model
python translate_cli.py translate source.po --provider gemini --model your-model
Useful controls:
python translate_cli.py translate source.po --target-lang fr
python translate_cli.py translate source.po --thinking-level medium
python translate_cli.py translate source.po --batch-size 100 --parallel-requests 4
python translate_cli.py translate source.po --retranslate-all
python translate_cli.py translate source.po --flex
python translate_cli.py translate source.po --warnings-report
Behavior:
- default target language is `kk`
- by default, only unfinished messages are translated
- `--retranslate-all` forces already translated messages through translation again
- recursive directory translation skips generated toolkit artifacts such as `*.ai-translated.*`, `*.glossary.po`, `*.missing-terms.po`, and `*.prototype-*.po`
- when the scan root is this toolkit repository itself, recursive translation also skips toolkit-owned directories such as `data/`, `logs/`, `docs/`, `tests/`, `tasks/`, and `core/`
- translated output is written as `*.ai-translated.<ext>`
- `--warnings-report` also writes `*.translation-warnings.json` with only the messages where the model reported ambiguity, unclear meaning, risky glossary choice, or another review-worthy concern
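The artifact-skipping rule for recursive scans can be pictured as plain glob matching on file names. The sketch below uses the four patterns listed above; the `is_generated_artifact` helper is hypothetical, and the toolkit's own scanner may apply further rules.

```python
from fnmatch import fnmatchcase

# Patterns taken from the behavior list above.
GENERATED_PATTERNS = (
    "*.ai-translated.*",
    "*.glossary.po",
    "*.missing-terms.po",
    "*.prototype-*.po",
)


def is_generated_artifact(filename):
    """Return True when a file name looks like toolkit output rather than source."""
    return any(fnmatchcase(filename, pattern) for pattern in GENERATED_PATTERNS)


names = ["app.po", "app.ai-translated.po", "app.glossary.po", "app.prototype-missing-terms.po"]
to_translate = [n for n in names if not is_generated_artifact(n)]
```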
Warnings sidecar behavior:
- warnings are emitted per translated message, not as one batch-level summary
- each warning item includes structured issues with `code`, `message`, and `severity`, plus the source text, translated text, and any available `context`, `note`, or matched `relevant_vocabulary`
- translation warning codes use the `translate.*` namespace, for example `translate.ambiguous_term`
- `severity` is `warning` for real risk or ambiguity, and `info` for notable but non-risk notes such as preserved structure
- this is a lightweight translator self-report; the dedicated `check` task remains the real QA pass
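A sidecar like this is easy to triage with a few lines of Python. The sketch below assumes the file is a JSON list of items that each carry an `issues` list; that top-level shape, the `source` and `translation` key names, and the `translate.preserved_structure` code are illustrative assumptions. Only `code`, `message`, `severity`, and `translate.ambiguous_term` come from the description above.

```python
import json

# Inline sample standing in for a *.translation-warnings.json file (assumed shape).
sample = json.loads("""
[
  {"source": "Archive", "translation": "...",
   "issues": [{"code": "translate.ambiguous_term",
               "message": "Could be a noun or a verb.",
               "severity": "warning"}]},
  {"source": "<b>Save</b>", "translation": "...",
   "issues": [{"code": "translate.preserved_structure",
               "message": "Markup kept as in source.",
               "severity": "info"}]}
]
""")


def needs_review(items):
    """Keep only items with at least one issue of severity 'warning'."""
    return [
        item for item in items
        if any(issue["severity"] == "warning" for issue in item.get("issues", []))
    ]


risky = needs_review(sample)
```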
Android translated exports often contain only resource IDs on the target side, so translation uses a paired-source workflow:
python translate_cli.py translate translated.xml --source-file source.xml
The Android XML backend:
- supports `<string>` and `<plurals>`
- pairs `<string>` items by resource name
- pairs `<plurals>` by resource name plus quantity
- preserves inline XML such as `<xliff:g>`
- preserves literal escapes such as `\n` in source style
Current translation constraint:
- Android `.xml` translation currently supports one target file at a time and requires `--source-file`
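The pairing rule (strings by resource name, plurals by resource name plus quantity) can be sketched with the standard library. This is an illustration of the matching idea only: the function names are hypothetical, and unlike the real backend it reads plain element text and does not preserve inline XML such as `<xliff:g>`.

```python
import xml.etree.ElementTree as ET


def index_resources(xml_text):
    """Map (resource name, quantity or None) to the element text."""
    index = {}
    for element in ET.fromstring(xml_text):
        if element.tag == "string":
            index[(element.get("name"), None)] = element.text or ""
        elif element.tag == "plurals":
            for item in element.findall("item"):
                index[(element.get("name"), item.get("quantity"))] = item.text or ""
    return index


def pair_entries(source_xml, target_xml):
    """Pair every source entry with the target entry that shares its key."""
    source = index_resources(source_xml)
    target = index_resources(target_xml)
    return {key: (text, target.get(key, "")) for key, text in source.items()}


SOURCE = """<resources>
  <string name="ok">OK</string>
  <plurals name="files">
    <item quantity="one">%d file</item>
    <item quantity="other">%d files</item>
  </plurals>
</resources>"""

TARGET = """<resources>
  <string name="ok"></string>
  <plurals name="files">
    <item quantity="one"></item>
    <item quantity="other"></item>
  </plurals>
</resources>"""

pairs = pair_entries(SOURCE, TARGET)
```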
Use revise when a file is already translated and you want targeted changes rather than full retranslation.
For formats that still contain source and translation together:
python translate_cli.py revise translated.po --instruction "Use a shorter term for Preferences"
python translate_cli.py revise translated.ts --instruction "Replace archive with package where the source says package"
For formats where the translated file no longer carries the original source text, pass the matching source file:
python translate_cli.py revise translated.ai-translated.xml --source-file source.xml --instruction "Use natural confirmation questions and preserve literal \\n escapes"
python translate_cli.py revise translated.ai-translated.strings --source-file source.strings --instruction "Shorten viewer labels where possible"
python translate_cli.py revise translated.ai-translated.resx --source-file source.resx --instruction "Use command bar instead of toolbar"
python translate_cli.py revise translated.txt --source-file source.txt --instruction "Use formal tone for Exit"
Revision behavior:
- default output path is `<input>.revised.<ext>`
- `--in-place` overwrites the translated input file
- `--dry-run` reviews and reports changes without writing output
- changed AI-reviewed entries are marked as review-required where the format supports it
Use check for QA on an already translated PO or TS file:
python translate_cli.py check translated.po
python translate_cli.py check translated.ts
python translate_cli.py check translated.po --probe 50
python translate_cli.py check translated.po --out report.json --include-ok
The checker combines model findings with deterministic local checks for:
- placeholders
- tags
- accelerators
- plural slots
- approved vocabulary usage
Default output path:
translated.translation-check.json
Check-report issue shape:
- each issue uses a structured `code`, `message`, and `severity`
- check issue codes use the `check.*` namespace, for example `check.meaning`, `check.placeholder`, or `check.terminology`
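Because every issue carries a namespaced `code`, a report can be summarized by counting codes. The sketch below assumes a report shaped as a list of entries with an `issues` list each; that outer shape and the sample messages and severities are assumptions, so adjust the traversal to the actual JSON layout.

```python
from collections import Counter

# Hypothetical report shape: a list of entries, each with an "issues" list.
report = [
    {"issues": [
        {"code": "check.placeholder", "message": "Missing %s", "severity": "warning"},
    ]},
    {"issues": [
        {"code": "check.meaning", "message": "Meaning drift", "severity": "warning"},
        {"code": "check.placeholder", "message": "Extra %d", "severity": "warning"},
    ]},
    {"issues": []},
]


def issues_by_code(entries):
    """Count how often each issue code appears across all entries."""
    return Counter(issue["code"] for entry in entries for issue in entry["issues"])


counts = issues_by_code(report)
```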
Use extract-terms when you want the model to propose glossary entries:
python translate_cli.py extract-terms source.po
python translate_cli.py extract-terms source.xml
Useful variants:
python translate_cli.py extract-terms source.po --mode missing --vocab data/locales/kk/vocab --out-format po
python translate_cli.py extract-terms source.po --mode missing --out-format json --vocab data/locales/kk/vocab
python translate_cli.py extract-terms source.po --out glossary.po --batch-size 200 --parallel-requests 4
Modes:
- `all`: build a broader glossary from the source content
- `missing`: focus on terms that are not already in your existing vocabulary
Output defaults:
- `all` + `po` -> `<input>.glossary.po`
- `missing` + `po` -> `<input>.missing-terms.po`
- `missing` + `json` -> `<input>.missing-terms.json`
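The default naming rule is a lookup on mode and output format. The sketch below assumes `<input>` means the input file name without its extension, which matches examples such as `translated.translation-check.json`; the `default_output` helper is hypothetical, and combinations not listed above are rejected rather than guessed.

```python
from pathlib import Path

# Suffixes taken from the output defaults listed above.
DEFAULT_SUFFIX = {
    ("all", "po"): ".glossary.po",
    ("missing", "po"): ".missing-terms.po",
    ("missing", "json"): ".missing-terms.json",
}


def default_output(input_path, mode, out_format):
    """Build the default output path next to the input file."""
    suffix = DEFAULT_SUFFIX.get((mode, out_format))
    if suffix is None:
        raise ValueError(f"no documented default for mode={mode!r}, format={out_format!r}")
    path = Path(input_path)
    return path.with_name(path.stem + suffix)


glossary = default_output("source.po", "all", "po")
missing_json = default_output("src/app.po", "missing", "json")
```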
When you run missing-term extraction with --out-format po, the generated PO is designed to go straight back into review and then translation:
- known terms from the supplied vocabulary are imported
- new missing terms are added as reviewable entries
Use extract-terms-local when you want a fast local analysis pass with no API call.
Single-file usage:
python translate_cli.py extract-terms-local source.po
python translate_cli.py extract-terms-local source.xml --mode missing --max-length 1
Directory-tree usage:
python translate_cli.py extract-terms-local C:\path\to\source-tree --also-po
Convert a local JSON report into a PO handoff:
python translate_cli.py extract-terms-local source.prototype-missing-terms.json --to-po
Local extraction behavior:
- works on one supported source file or a whole directory tree
- deduplicates repeated source messages across files
- writes a JSON report with accepted, borderline, and translation-candidate terms
- can also write a PO handoff file with `--also-po`
The local extractor deliberately filters common localization noise before scoring terms, including:
- placeholders and variable-like tokens
- CLI flags and digit-led labels
- mnemonic fragments such as underscore accelerators
- URL, tag, and attribute noise such as `href`, `src`, domain fragments, and embedded markup payloads
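To give a feel for this kind of filtering, here is a minimal token-level sketch. The regular expressions are illustrative guesses at the noise categories listed above, not the extractor's actual rules, and the real tool also scores and ranks what survives.

```python
import re

# Illustrative patterns, one per noise category named above.
NOISE_PATTERNS = [
    re.compile(r"^%(\d+\$)?[sdif]$|^%\(\w+\)[sd]$|^\{\w*\}$"),  # placeholders
    re.compile(r"^--?[A-Za-z][\w-]*$"),                         # CLI flags
    re.compile(r"^\d"),                                         # digit-led labels
    re.compile(r"_"),                                           # accelerators, variable-like tokens
    re.compile(r"^(href|src)$", re.IGNORECASE),                 # attribute names
    re.compile(r"^(https?://|www\.)|\.(com|org|net)$", re.IGNORECASE),  # URLs, domains
    re.compile(r"[<>=]"),                                       # markup payloads
]


def is_noise(token):
    """Return True when a token matches any noise pattern."""
    return any(pattern.search(token) for pattern in NOISE_PATTERNS)


def candidate_tokens(message):
    """Split a message on whitespace and drop noise tokens."""
    return [token for token in message.split() if not is_noise(token)]


kept = candidate_tokens("Open %s with --force at https://example.com")
```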
For larger localization work, the recommended flow is:
- Run local extraction first.
- Translate the resulting glossary PO handoff.
- Review and approve that glossary.
- Use the approved glossary as the vocabulary base for the main translation.
- Review and approve the main translated source file.
In practice, that looks like this:
# 1. Local extraction from one file or a source tree
python translate_cli.py extract-terms-local source.po --mode missing --also-po
python translate_cli.py extract-terms-local C:\path\to\source-tree --mode missing --also-po
# 2. Translate the generated glossary PO handoff
python translate_cli.py translate source.prototype-missing-terms.po --vocab data/locales/kk/vocab
# 3. Review and approve the glossary PO
# Keep only good terms, fix bad translations, and save the approved glossary.
# 4. Use the approved glossary as the base vocabulary for the main translation
python translate_cli.py translate source.po --vocab approved-glossary.po
# 5. Review and approve the main translated source file
Why this workflow is recommended:
- `extract-terms-local` can deterministically avoid terms already present in your approved vocabulary and skip local noise such as stop words, excluded abbreviations, placeholders, tags, and weak phrase candidates
- the glossary is reviewed before bulk translation, so terminology is stabilized early
- the main `translate` task can load the approved glossary PO directly through `--vocab`
- the final source translation still needs review, because approved terminology does not replace full QA
Keep a distinction between:
- candidate glossary output from local extraction
- approved glossary used as translation input
That approved glossary can stay as a reviewed .po passed with --vocab, or it can be merged into your canonical locale vocabulary under data/locales/<target-lang>/.
By default, the toolkit looks up language resources from data/locales/<target-lang>/.
Auto-detected resources:
- `data/locales/<target-lang>/vocab.txt`
- `data/locales/<target-lang>/vocab/`
- `data/locales/<target-lang>/rules.md`
Locale fallback is supported. For example, fr_CA falls back to fr if the region-specific resource is not present.
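The fallback can be pictured as trying progressively less specific locale directories. The sketch below is an assumption about how such a lookup might work, with hypothetical helper names; it runs against a temporary directory rather than the real `data/locales/`.

```python
import tempfile
from pathlib import Path


def locale_chain(target_lang):
    """Return the locale followed by its less specific parents, e.g. fr_CA then fr."""
    parts = target_lang.split("_")
    return ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]


def find_resource(base, target_lang, name):
    """Return the first existing resource along the fallback chain, or None."""
    for locale in locale_chain(target_lang):
        candidate = Path(base) / locale / name
        if candidate.exists():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "fr").mkdir()
    (Path(tmp) / "fr" / "rules.md").write_text("Use formal tone.", encoding="utf-8")
    found = find_resource(tmp, "fr_CA", "rules.md")
    found_locale = found.parent.name
    missing = find_resource(tmp, "de_AT", "rules.md")
```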
You can override both resources per run:
python translate_cli.py translate source.po --vocab custom-vocab.txt --rules custom-rules.md
python translate_cli.py translate source.po --vocab custom-vocab --rules-str "Use concise imperative labels."
`--vocab` accepts:
- a glossary `.txt`
- a glossary `.po`
- a glossary `.tbx`
- a directory containing glossary `.txt`, `.po`, and `.tbx` files
When a vocabulary directory is used:
- files are loaded in filename order
- later duplicates override earlier ones
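The load-order rule means a later file can override a term from an earlier one. Here is a minimal sketch of that merge, assuming pipe-delimited `.txt` files only; the `load_vocab_dir` helper is hypothetical, and `.po` and `.tbx` loading is left out.

```python
import tempfile
from pathlib import Path


def load_vocab_dir(directory):
    """Merge glossary .txt files in filename order; later duplicates win."""
    merged = {}
    for path in sorted(Path(directory).glob("*.txt"), key=lambda p: p.name):
        for line in path.read_text(encoding="utf-8").splitlines():
            if "|" not in line:
                continue  # skip blank or malformed lines in this sketch
            source, target = (part.strip() for part in line.split("|")[:2])
            merged[source] = target  # overwrite any earlier definition
    return merged


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a_common.txt").write_text("archive|archive-A\nsave|save-A\n", encoding="utf-8")
    Path(tmp, "b_media.txt").write_text("archive|archive-B\n", encoding="utf-8")
    merged = load_vocab_dir(tmp)
```

In practice this lets you name files so that project-specific glossaries sort after shared ones.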
Recommended layout:
data/
  locales/
    kk/
      vocab/
        common.txt
        colors.txt
        media.txt
      rules.md
    fr/
      vocab.txt
      rules.md
  extract/
    common/
      abbreviations.txt
      excluded_terms.txt
    en/
      stopwords.txt
      low_value_words.txt
      fixed_multiword_allowlist.txt
Rich vocabulary entries use this schema:
source_term|target_term|part_of_speech|context_note
Example:
archive|package|noun|software package manager context
save|store|verb|short imperative UI action
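A line in this schema splits cleanly on the pipe character. The parser below is a sketch with hypothetical names; treating `part_of_speech` and `context_note` as optional is an assumption, since the schema above only shows the four-field form.

```python
from typing import NamedTuple, Optional


class VocabEntry(NamedTuple):
    source_term: str
    target_term: str
    part_of_speech: Optional[str] = None
    context_note: Optional[str] = None


def parse_entry(line):
    """Parse one 'source|target|pos|note' line into a VocabEntry."""
    fields = [field.strip() for field in line.split("|", 3)]
    if len(fields) < 2 or not fields[0] or not fields[1]:
        raise ValueError(f"not a vocabulary entry: {line!r}")
    # Pad missing trailing fields with None (assumed to be optional).
    padded = fields + [None] * (4 - len(fields))
    return VocabEntry(*padded)


entry = parse_entry("archive|package|noun|software package manager context")
short = parse_entry("save|store")
```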
During translation, the toolkit still sends the full vocabulary for compatibility, but it also computes relevant_vocabulary per message so each message sees the subset of glossary entries that actually match it.
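One simple way to compute such a subset is case-insensitive whole-word matching of each source term against the message. The sketch below shows that idea under the assumption of a plain `source -> target` dictionary; the toolkit's actual matching may handle inflection or multi-word terms differently.

```python
import re


def relevant_vocabulary(message, vocabulary):
    """Return the glossary entries whose source term occurs in the message."""
    lowered = message.lower()
    return {
        source: target
        for source, target in vocabulary.items()
        if re.search(r"\b" + re.escape(source.lower()) + r"\b", lowered)
    }


vocabulary = {"archive": "package", "save": "store", "file": "document"}
subset = relevant_vocabulary("Save the archive", vocabulary)
```

Note that whole-word matching will not match an inflected form such as "archives"; that trade-off keeps the sketch free of false positives like "file" inside "profile".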
When warnings reporting is enabled, the translation response can also include a per-message warnings field. Those warnings are written to a separate JSON sidecar so you can inspect ambiguous or risky messages without rereading the whole translated file.
| Format | Translate | Revise | Extract Terms | Notes |
|---|---|---|---|---|
| `.po` | Yes | Yes | Yes | Source and translation live together |
| `.ts` | Yes | Yes | Yes | Source and translation live together |
| `.resx` | Yes | Yes | Yes | Revision requires `--source-file` |
| `.strings` | Yes | Yes | Yes | Revision requires `--source-file` |
| `.txt` | Yes | Yes | Yes | One line equals one message; revision requires `--source-file` |
| Android `.xml` | Yes | Yes | Yes | Translation and revision use paired-source matching |
Additional notes:
- `check` currently supports translated `.po` and `.ts` files
- `.strings` translation treats commented key/value entries as untranslated source entries and uncommented entries as translated entries
- `.strings` output preserves file encoding and common literal escape sequences
- `.txt` output preserves original line order and blank lines
The Tk desktop UI is available here:
python process_gui.py
The GUI is a frontend over the same backend concepts as the CLI. It includes:
- shared provider, model, API-key, thinking, and runtime controls
- instruction preview for the resolved system prompt and language rules
- a `Translate` tab with Android `Source file` support
- a `Local Extract` tab for file, folder, and JSON-to-PO local extraction workflows
Translate-tab note:
- the GUI enables the translation warnings JSON sidecar by default
- a normal translate run writes the translated output file and a matching `*.translation-warnings.json` report
The repository is intentionally split between shared mechanics and task-specific logic:
translate_cli.py unified CLI entry point
process_gui.py Tk frontend
tasks/ task-specific contracts and runners
core/ formats, providers, runtime, resources, shared helpers
data/locales/ per-language vocabulary and rules
data/extract/ local-extraction stop words and filters
tests/ smoke and regression coverage
If you are changing behavior, the important design line is:
- `core/` owns shared mechanics
- `tasks/` owns task-specific prompts, schemas, and result handling
- `process_gui.py` should stay frontend-oriented
If Poedit complains after placeholder order changes, check the format flag on the entry:
- `#, c-format`: reordering is allowed with positional placeholders such as `%2$s`, `%1$s`
- `#, python-format`: `%2$s` is not valid, so plain `%s` placeholders cannot be safely reordered
For python-format, safe reordering requires named placeholders in the source, for example:
msgid "From %(src)s to %(dst)s"
msgstr "%(dst)s to %(src)s"
Always preserve the same placeholder set and types between source and translation.
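A quick way to verify that rule on a single entry is to compare the placeholder tokens on both sides as multisets. The sketch below is a simplified check, not the toolkit's checker: the regular expression covers common printf-style forms only, and it compares tokens literally, so a `c-format` translation that rewrites `%s %s` as `%2$s %1$s` would be reported as a mismatch.

```python
import re
from collections import Counter

# Matches %s, %d, %1$s, %(name)s, %.2f and similar; %% is filtered out below.
PLACEHOLDER_RE = re.compile(r"%(?:\(\w+\)|\d+\$)?[#0\-+]*\d*(?:\.\d+)?[a-zA-Z%]")


def placeholders(text):
    """Count each placeholder token in the text, ignoring literal %%."""
    return Counter(token for token in PLACEHOLDER_RE.findall(text) if token != "%%")


def same_placeholders(msgid, msgstr):
    """Return True when both strings use the same placeholders the same number of times."""
    return placeholders(msgid) == placeholders(msgstr)


named_ok = same_placeholders("From %(src)s to %(dst)s", "%(dst)s to %(src)s")
positional_ok = same_placeholders("%1$s of %2$s", "%2$s: %1$s")
type_changed = same_placeholders("Copied %d files", "Copied %s files")
```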
python -m unittest discover -s tests -p "test_*.py" -v