LLM Batch Translate

Batch file translation tool using LLM APIs with litellm.

Features

  • Translate single files or entire directories
  • Support for 25+ languages
  • 100+ LLM providers via litellm (Anthropic, OpenAI, DashScope, DeepSeek, etc.)
  • Configurable context window for large files
  • Chunk-based translation for long documents
  • Preserves formatting, code blocks, and structure
  • CLI and Python API

Installation

From GitHub

pip install git+https://github.com/xjsongphy/llm_batch_translate.git

Local Development Install

git clone https://github.com/xjsongphy/llm_batch_translate.git
cd llm_batch_translate
pip install -e .

Quick Start

1. Create Configuration File

# Run setup command (copies example config to default location)
llm-translate setup

# Or manually:
# mkdir -p ~/.config/llm-batch-translate
# cp .config/example.config.yml ~/.config/llm-batch-translate/config.yml

# Edit the file and add your API key

2. Edit Config File

Edit ~/.config/llm-batch-translate/config.yml:

llm:
  # Your API key
  api_key: "your-api-key-here"

  # API base URL (optional - auto-detected from model)
  api_url: ""

  # Model (use provider prefix for non-Anthropic)
  model: "claude-3-5-sonnet-20241022"

  max_tokens: 8192
  context_window: 200000

translation:
  source_lang: "en"
  target_lang: "zh-cn"

Supported Models (via litellm)

Provider         Model Format           Example
Anthropic        claude-* (no prefix)   claude-3-5-sonnet-20241022
OpenAI           openai/*               openai/gpt-4
DashScope/Qwen   openai/*               openai/qwen-turbo-latest
DeepSeek         deepseek/*             deepseek/deepseek-chat

See the litellm providers documentation for all 100+ options.
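The prefix tells litellm which provider to route the request to, so switching providers mostly means changing the `model` value in the config; for OpenAI-compatible endpoints such as DashScope you also point `api_url` at that provider's endpoint. An illustrative config fragment (the DashScope URL is an assumption — check your provider's documentation):

```yaml
llm:
  api_key: "your-api-key-here"

  # OpenAI:
  model: "openai/gpt-4"

  # DashScope/Qwen (OpenAI-compatible endpoint; URL is an assumption):
  # model: "openai/qwen-turbo-latest"
  # api_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"

  # DeepSeek:
  # model: "deepseek/deepseek-chat"
```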

3. Translate Files

# Single file
llm-translate translate input.txt -o output.txt -s en -t zh-cn

# Multiple files to directory
llm-translate translate *.md -d translated/

# Directory recursively
llm-translate translate ./docs/ -d translated_docs/ -r

Configuration

Config File Locations

The tool searches for config files in this order:

  1. ~/.config/llm-batch-translate/config.yml
  2. ~/.llm-batch-translate.yml
  3. ./config.yml (current directory)

Config File Structure

llm:
  # Required: Your API key
  api_key: "sk-ant-api03-xxxxxxxxxxxx"

  # Optional: API URL
  api_url: "https://api.anthropic.com"

  # Optional: Model name
  model: "claude-3-5-sonnet-20241022"

  # Optional: Max output tokens
  max_tokens: 8192

  # Optional: Request timeout
  timeout: 60

  # Optional: Context window size
  context_window: 200000

  # Optional: Reserved tokens for output
  reserved_output_tokens: 16384

translation:
  # Optional: Default source language
  source_lang: "en"

  # Optional: Default target language
  target_lang: "zh-cn"

  # Optional: Custom prompt template
  prompt_template: |
    Please translate from {source} to {target}:
    {text}

  # Optional: Chunk settings for large files
  chunk_size: 50000
  chunk_overlap: 500
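The chunk settings can be understood with a minimal sketch (assumed logic, not the tool's actual splitter): each chunk is at most `chunk_size` characters, and consecutive chunks share `chunk_overlap` characters so context is not lost at chunk boundaries.

```python
def split_into_chunks(text: str, chunk_size: int = 50_000,
                      chunk_overlap: int = 500) -> list[str]:
    """Split text into overlapping chunks (illustrative sketch only)."""
    if len(text) <= chunk_size:
        return [text]
    chunks = []
    # Each new chunk starts chunk_overlap characters before the
    # previous one ended, so adjacent chunks share that much text.
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```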

CLI Usage

Main Commands

# Setup: Copy example config to default location
llm-translate setup

# Translate files
llm-translate translate [OPTIONS] FILES...

# List supported languages
llm-translate languages

# Show configuration
llm-translate config

# Show config file locations
llm-translate config --show-path

Translate Options

llm-translate translate input.txt -o output.txt -s en -t zh-cn

Options:
  -s, --source TEXT       Source language code
  -t, --target TEXT       Target language code
  -o, --output PATH       Output file (single input)
  -d, --output-dir PATH   Output directory (multiple inputs)
  -r, --recursive         Process directories recursively
  -p, --pattern TEXT      File pattern (default: *)
  -e, --extensions TEXT   File extensions to include
  -c, --config PATH       Path to config file

Examples

# Setup: Initialize config file
llm-translate setup

# Translate English to Chinese
llm-translate translate README.md -o README_zh.md -s en -t zh-cn

# Translate all markdown files in a directory
llm-translate translate ./docs/ -d docs_ja/ -s en -t ja -e .md -r

# Use custom config file
llm-translate translate -c custom.yml file.txt -o out.txt

# Check supported languages
llm-translate languages

# View current config
llm-translate config

Supported Languages

Code    Language              Code    Language
en      English               zh      Chinese
zh-cn   Simplified Chinese    zh-tw   Traditional Chinese
ja      Japanese              ko      Korean
es      Spanish               fr      French
de      German                it      Italian
pt      Portuguese            ru      Russian
ar      Arabic                hi      Hindi
vi      Vietnamese            th      Thai
nl      Dutch                 pl      Polish
tr      Turkish               uk      Ukrainian
cs      Czech                 sv      Swedish
da      Danish                fi      Finnish
no      Norwegian

Run llm-translate languages for the full list.

Python API

from llm_batch_translate import Config, Translator

# Load configuration
config = Config.from_file("config.yml")

# Create translator
translator = Translator(config)

# Translate text
result = translator.translate("Hello, world!", source="en", target="zh-cn")
print(result)

# Translate file
result = translator.translate_file("input.txt", "output.txt", source="en", target="ja")
if result.success:
    print(f"Translated {result.chunks_count} chunks")

# Translate directory
results = translator.translate_directory(
    "./docs/",
    "./docs_translated/",
    source="en",
    target="zh-cn",
    recursive=True,
    extensions=[".md", ".txt"]
)

Context Window Configuration

The context window determines how much text can be processed at once.

The effective input size is calculated as:

max_input = context_window - max_tokens - reserved_output_tokens

For models with larger context windows, adjust these values in the config file:

llm:
  context_window: 200000
  max_tokens: 16384
  reserved_output_tokens: 16384
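With the values above, the effective input budget works out as a simple arithmetic check (assuming the formula given earlier):

```python
context_window = 200_000
max_tokens = 16_384
reserved_output_tokens = 16_384

# max_input = context_window - max_tokens - reserved_output_tokens
max_input = context_window - max_tokens - reserved_output_tokens
print(max_input)  # 167232
```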

File Format Support

  • Text files (.txt)
  • Markdown (.md)
  • Source code (.py, .js, .ts, .html, .css, etc.)
  • Config files (.json, .yaml, .toml, etc.)
  • Any plain text format

Troubleshooting

No Config File Found

FileNotFoundError: No config file found

Create a config file:

# Use setup command
llm-translate setup

# Or manually:
# mkdir -p ~/.config/llm-batch-translate
# cp .config/example.config.yml ~/.config/llm-batch-translate/config.yml

API Key Not Set

ValueError: api_key is required in config file

Edit your config file and add the API key.

Large Files

Large files are split into chunks automatically. Adjust the chunk settings in the config file:

translation:
  chunk_size: 100000
  chunk_overlap: 1000

License

MIT License - see LICENSE file for details.
