LLM Batch Translate

Batch file translation tool using LLM APIs with litellm.

Features

  • Translate single files or entire directories
  • Support for 25+ languages
  • 100+ LLM providers via litellm (Anthropic, OpenAI, DashScope, DeepSeek, etc.)
  • Configurable context window for large files
  • Chunk-based translation for long documents
  • Preserves formatting, code blocks, and structure
  • CLI and Python API

Installation

From GitHub

pip install git+https://github.com/xjsongphy/llm_batch_translate.git

Local Development Install

git clone https://github.com/xjsongphy/llm_batch_translate.git
cd llm_batch_translate
pip install -e .

Quick Start

1. Create Configuration File

# Run setup command (copies example config to default location)
llm-translate setup

# Or manually:
# mkdir -p ~/.config/llm-batch-translate
# cp .config/example.config.yml ~/.config/llm-batch-translate/config.yml

# Edit the file and add your API key

2. Edit Config File

Edit ~/.config/llm-batch-translate/config.yml:

llm:
  # Your API key
  api_key: "your-api-key-here"

  # API base URL (optional - auto-detected from model)
  api_url: ""

  # Model (use provider prefix for non-Anthropic)
  model: "claude-3-5-sonnet-20241022"

  max_tokens: 8192
  context_window: 200000

translation:
  source_lang: "en"
  target_lang: "zh-cn"

Supported Models (via litellm)

Provider         Model Format           Example
Anthropic        claude-* (no prefix)   claude-3-5-sonnet-20241022
OpenAI           openai/*               openai/gpt-4
DashScope/Qwen   openai/*               openai/qwen-turbo-latest
DeepSeek         deepseek/*             deepseek/deepseek-chat

See the litellm providers documentation for all 100+ options.
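The prefix tells litellm which provider to route the request to, so switching providers mostly means changing the `model` value in the config; for OpenAI-compatible endpoints such as DashScope you also point `api_url` at that provider's endpoint. An illustrative config fragment (the DashScope URL is an assumption — check your provider's documentation):

```yaml
llm:
  api_key: "your-api-key-here"

  # OpenAI:
  model: "openai/gpt-4"

  # DashScope/Qwen (OpenAI-compatible endpoint; URL is an assumption):
  # model: "openai/qwen-turbo-latest"
  # api_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"

  # DeepSeek:
  # model: "deepseek/deepseek-chat"
```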

3. Translate Files

# Single file
llm-translate translate input.txt -o output.txt -s en -t zh-cn

# Multiple files to directory
llm-translate translate *.md -d translated/

# Directory recursively
llm-translate translate ./docs/ -d translated_docs/ -r

Configuration

Config File Locations

The tool searches for config files in this order:

  1. ~/.config/llm-batch-translate/config.yml
  2. ~/.llm-batch-translate.yml
  3. ./config.yml (current directory)

Config File Structure

llm:
  # Required: Your API key
  api_key: "sk-ant-api03-xxxxxxxxxxxx"

  # Optional: API URL
  api_url: "https://api.anthropic.com"

  # Optional: Model name
  model: "claude-3-5-sonnet-20241022"

  # Optional: Max output tokens
  max_tokens: 8192

  # Optional: Request timeout
  timeout: 60

  # Optional: Context window size
  context_window: 200000

  # Optional: Reserved tokens for output
  reserved_output_tokens: 16384

translation:
  # Optional: Default source language
  source_lang: "en"

  # Optional: Default target language
  target_lang: "zh-cn"

  # Optional: Custom prompt template
  prompt_template: |
    Please translate from {source} to {target}:
    {text}

  # Optional: Chunk settings for large files
  chunk_size: 50000
  chunk_overlap: 500
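The chunk settings can be understood with a minimal sketch (assumed logic, not the tool's actual splitter): each chunk is at most `chunk_size` characters, and consecutive chunks share `chunk_overlap` characters so context is not lost at chunk boundaries.

```python
def split_into_chunks(text: str, chunk_size: int = 50_000,
                      chunk_overlap: int = 500) -> list[str]:
    """Split text into overlapping chunks (illustrative sketch only)."""
    if len(text) <= chunk_size:
        return [text]
    chunks = []
    # Each new chunk starts chunk_overlap characters before the
    # previous one ended, so adjacent chunks share that much text.
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```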

CLI Usage

Main Commands

# Setup: Copy example config to default location
llm-translate setup

# Translate files
llm-translate translate [OPTIONS] FILES...

# List supported languages
llm-translate languages

# Show configuration
llm-translate config

# Show config file locations
llm-translate config --show-path

Translate Options

llm-translate translate input.txt -o output.txt -s en -t zh-cn

Options:
  -s, --source TEXT       Source language code
  -t, --target TEXT       Target language code
  -o, --output PATH       Output file (single input)
  -d, --output-dir PATH   Output directory (multiple inputs)
  -r, --recursive         Process directories recursively
  -p, --pattern TEXT      File pattern (default: *)
  -e, --extensions TEXT   File extensions to include
  -c, --config PATH       Path to config file

Examples

# Setup: Initialize config file
llm-translate setup

# Translate English to Chinese
llm-translate translate README.md -o README_zh.md -s en -t zh-cn

# Translate all markdown files in a directory
llm-translate translate ./docs/ -d docs_ja/ -s en -t ja -e .md -r

# Use custom config file
llm-translate translate -c custom.yml file.txt -o out.txt

# Check supported languages
llm-translate languages

# View current config
llm-translate config

Supported Languages

Code    Language              Code    Language
en      English               zh      Chinese
zh-cn   Simplified Chinese    zh-tw   Traditional Chinese
ja      Japanese              ko      Korean
es      Spanish               fr      French
de      German                it      Italian
pt      Portuguese            ru      Russian
ar      Arabic                hi      Hindi
vi      Vietnamese            th      Thai
nl      Dutch                 pl      Polish
tr      Turkish               uk      Ukrainian
cs      Czech                 sv      Swedish
da      Danish                fi      Finnish
no      Norwegian

Run llm-translate languages for the full list.

Python API

from llm_batch_translate import Config, Translator

# Load configuration
config = Config.from_file("config.yml")

# Create translator
translator = Translator(config)

# Translate text
result = translator.translate("Hello, world!", source="en", target="zh-cn")
print(result)

# Translate file
result = translator.translate_file("input.txt", "output.txt", source="en", target="ja")
if result.success:
    print(f"Translated {result.chunks_count} chunks")

# Translate directory
results = translator.translate_directory(
    "./docs/",
    "./docs_translated/",
    source="en",
    target="zh-cn",
    recursive=True,
    extensions=[".md", ".txt"]
)

Context Window Configuration

The context window determines how much text can be processed at once.

The effective input size is calculated as:

max_input = context_window - max_tokens - reserved_output_tokens

For models with larger context windows, adjust these values in the config file:

llm:
  context_window: 200000
  max_tokens: 16384
  reserved_output_tokens: 16384
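With the values above, the effective input budget works out as a simple arithmetic check (assuming the formula given earlier):

```python
context_window = 200_000
max_tokens = 16_384
reserved_output_tokens = 16_384

# max_input = context_window - max_tokens - reserved_output_tokens
max_input = context_window - max_tokens - reserved_output_tokens
print(max_input)  # 167232
```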

File Format Support

  • Text files (.txt)
  • Markdown (.md)
  • Source code (.py, .js, .ts, .html, .css, etc.)
  • Config files (.json, .yaml, .toml, etc.)
  • Any plain text format

Troubleshooting

No Config File Found

FileNotFoundError: No config file found

Create a config file:

# Use setup command
llm-translate setup

# Or manually:
# mkdir -p ~/.config/llm-batch-translate
# cp .config/example.config.yml ~/.config/llm-batch-translate/config.yml

API Key Not Set

ValueError: api_key is required in config file

Edit your config file and add the API key.

Large Files

Large files are split into chunks automatically. Adjust the chunk settings in the config file:

translation:
  chunk_size: 100000
  chunk_overlap: 1000

License

MIT License - see LICENSE file for details.
