An AI-powered translation system that utilizes a multi-agent approach to produce high-quality translations between different languages. The system leverages LangGraph for orchestrating the translation workflow and Streamlit for the user interface.
- Multi-Agent Architecture: Separate agents for dictionary, grammar, and example processing, plus translation and validation
- Context-Aware Translation: Provide custom dictionaries, grammar rules, and translation examples; they are processed in parallel
- Quality Assurance: Optional LLM as a judge to evaluate and refine translations
- Large Text Support: LLM-based chunking for efficient, parallel processing of large corpus texts
- Consistency Checking: Ensures terminology and style consistency across document chunks
- Flexible Model Selection: Support for different backbone models for agents
- app.py: Main Streamlit application with user interface
- graph.py: LangGraph workflow definitions and agent implementations
- prompts.py: Prompt templates for different agents in the system
- requirements.txt: Required Python dependencies
- drawgraph.py (TODO): Utility to visualize the LangGraph workflow
- data/: Sample dictionary, grammar, and translation examples
- Python 3.12+
- Endpoint API key and URL
- Clone the repository:

```
git clone https://github.com/mmheydari97/translation-agent-system.git
cd translation-agent-system
```
- Create and activate a virtual environment (recommended):

```
uv venv .ling_venv --python=python3.12
# On Windows
.ling_venv\Scripts\activate
# On macOS/Linux
source .ling_venv/bin/activate
```
- Install the required dependencies:

```
uv pip install -r requirements.txt
```
Create a secrets.toml file in the .streamlit directory with the credentials for your endpoints (for example, OpenRouter and a local Ollama server):
```toml
[openrouter]
ENDPOINT_URL = "https://your-endpoint.com/"
API_KEY = "your-api-key"

[ollama]
ENDPOINT_URL = "http://localhost:11434/v1"
API_KEY = "your-api-key"
```
- Start the Streamlit application:

```
streamlit run app.py
```
- Open your browser and navigate to the URL shown in the terminal (usually http://localhost:8501)
- Configure the translation settings:
  - Select your translation model (Gpt-4o-mini or Phi-3-small-8k-instruct)
  - Enter the source and target languages
  - Input the text to translate
  - Optionally upload reference files (dictionary, grammar rules, examples)
  - Choose between "Single Sentence" or "Large Corpus" processing mode
  - Toggle the use of the LLM judge for quality evaluation
- Click the "Generate Translation" button and wait for the results
The main Streamlit application that provides the user interface for the translation system. Key components:
- API Configuration: Sidebar for selecting translation models and entering API credentials
- Translation Form: Input fields for source/target languages and text to translate
- Context Options: File uploaders for dictionary, grammar rules, and examples
- Processing Mode: Option to process as single sentence or large corpus
- Results Display: Shows translation prompts, initial translations, judge feedback, and final translations
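To make the flow concrete, here is a minimal sketch of how app.py's inputs could be wired up in Streamlit. The widget labels and variable names are illustrative, not the exact ones in the file:

```python
import streamlit as st

# Credentials come from .streamlit/secrets.toml (see Installation above)
endpoint = st.secrets["openrouter"]["ENDPOINT_URL"]
api_key = st.secrets["openrouter"]["API_KEY"]

# Sidebar: model selection
model = st.sidebar.selectbox("Translation model", ["Gpt-4o-mini", "Phi-3-small-8k-instruct"])

# Translation form
source_lang = st.text_input("Source language")
target_lang = st.text_input("Target language")
text = st.text_area("Text to translate")

# Context options and processing mode
dictionary_file = st.file_uploader("Dictionary (optional)", type="txt")
mode = st.radio("Processing mode", ["Single Sentence", "Large Corpus"])
use_judge = st.toggle("Use LLM judge")

if st.button("Generate Translation"):
    pass  # invoke the LangGraph workflow defined in graph.py
```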
Defines the LangGraph workflows and agent implementations:
- State Schema: TranslationState, a TypedDict that defines the graph state
- Agent Nodes:
  - process_dictionary: Extracts relevant word pairs from the dictionary
  - process_grammar: Processes grammar rules for translation
  - process_examples: Extracts relevant translation examples
  - perform_translation: Performs the main translation task
  - judge_translation: Evaluates the translation quality
  - retry_translation: Refines the translation based on judge feedback
  - split_corpus: Splits large texts into manageable chunks
  - process_chunks: Processes chunks in parallel
  - check_consistency: Ensures consistency across translated chunks
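The snippet below sketches what the state and one node might look like. The field names are assumptions (graph.py may use different ones), and the LLM call is replaced by a naive keyword filter so the example runs standalone:

```python
from typing import TypedDict

class TranslationState(TypedDict, total=False):
    source_language: str
    target_language: str
    text: str
    dictionary: str        # raw uploaded dictionary text, if provided
    relevant_pairs: str    # word pairs extracted by process_dictionary
    translation: str
    judge_feedback: str    # empty/absent when the judge is satisfied

def process_dictionary(state: TranslationState) -> dict:
    """LangGraph node: takes the current state, returns a partial update.
    graph.py would prompt an LLM with get_dictionary_prompt() here; this
    stub just keeps dictionary lines whose source word appears in the text."""
    lines = state.get("dictionary", "").splitlines()
    relevant = [ln for ln in lines if ln.split(":")[0].strip() in state["text"]]
    return {"relevant_pairs": "\n".join(relevant)}
```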
- Workflow Graphs:
  - build_translation_graph(): For single-sentence translation
  - build_supernode_graph(): For large-corpus translation
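Continuing the sketch above, build_translation_graph() could wire the nodes roughly as follows. The remaining nodes are stubbed so the example is self-contained; the real edges in graph.py may differ:

```python
from langgraph.graph import StateGraph, START, END

# Stub nodes so the sketch runs; graph.py implements these properly.
def _noop(state: TranslationState) -> dict:
    return {}
process_grammar = process_examples = perform_translation = _noop
judge_translation = retry_translation = _noop

def build_translation_graph():
    g = StateGraph(TranslationState)
    g.add_node("process_dictionary", process_dictionary)
    g.add_node("process_grammar", process_grammar)
    g.add_node("process_examples", process_examples)
    g.add_node("perform_translation", perform_translation)
    g.add_node("judge_translation", judge_translation)
    g.add_node("retry_translation", retry_translation)

    # Fan out: the three resource processors run in parallel from START,
    # then join on the translation node.
    for name in ("process_dictionary", "process_grammar", "process_examples"):
        g.add_edge(START, name)
    g.add_edge(["process_dictionary", "process_grammar", "process_examples"],
               "perform_translation")

    g.add_edge("perform_translation", "judge_translation")
    # Route on the judge's verdict: refine once or finish.
    g.add_conditional_edges(
        "judge_translation",
        lambda s: "retry" if s.get("judge_feedback") else "done",
        {"retry": "retry_translation", "done": END},
    )
    g.add_edge("retry_translation", END)
    return g.compile()
```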
Contains all prompt templates used by the various agents:
- get_system_prompt(): System prompt for the translation agents
- get_translation_prompt(): Main translation prompt template
- get_dictionary_prompt(): Prompt for dictionary processing
- get_grammar_prompt(): Prompt for grammar rule processing
- get_examples_prompt(): Prompt for example translation processing
- get_judge_prompt(): Prompt for translation quality evaluation
- get_retry_prompt(): Prompt for translation refinement
- get_chunking_prompt(): Prompt for text chunking
- get_consistency_prompt(): Prompt for consistency checking
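As an illustration, a template in prompts.py might look like the following; the real signatures and wording may differ:

```python
def get_translation_prompt(source_lang: str, target_lang: str, text: str,
                           pairs: str = "", rules: str = "", examples: str = "") -> str:
    """Assemble the main translation prompt from the processed resources."""
    return f"""Translate the following text from {source_lang} to {target_lang}.

Relevant word pairs:
{pairs}

Grammar rules to respect:
{rules}

Reference examples:
{examples}

Text to translate:
{text}"""
```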
A utility script to visualize the LangGraph workflow using ASCII art, helpful for understanding the agent interaction flow.
The data/ directory contains sample files that can be used as references:
- dictionary.txt: Sample dictionary with word pairs
- grammar.txt: Sample grammar rules
- samples.txt: Sample translation examples
- Resource Processing: Dictionary, grammar, and examples are processed in parallel
- Initial Translation: The main translation is performed using the processed resources
- Quality Evaluation: The judge agent evaluates the translation quality
- Refinement (if needed): If the judge finds issues, the translation is refined
- Final Result: The system returns the final translation
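Putting the steps together, invoking the compiled graph might look like this (field names follow the sketches above):

```python
app = build_translation_graph()
result = app.invoke({
    "source_language": "English",
    "target_language": "French",
    "text": "The contract must be signed by both parties.",
    "dictionary": "contract : contrat\nparties : parties",
})
print(result.get("translation"))
```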
- Text Chunking: The large text is split into manageable chunks
- Parallel Processing: Each chunk is processed through the single sentence workflow
- Consistency Checking: The system checks for consistency across translated chunks
- Final Assembly: The final translation is assembled from the chunks with consistency ensured
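One plausible shape for the parallel step is sketched below; graph.py may instead use LangGraph's native fan-out, so treat this as an illustration. The chunks and chunk_translations fields are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunks(state: TranslationState) -> dict:
    """Translate each chunk through the single-sentence graph in parallel."""
    single = build_translation_graph()
    chunks = state.get("chunks", [])  # assumed to be filled in by split_corpus
    with ThreadPoolExecutor(max_workers=4) as pool:
        translated = list(pool.map(
            lambda chunk: single.invoke({**state, "text": chunk}).get("translation", ""),
            chunks,
        ))
    return {"chunk_translations": translated}
```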
- Dictionary: Create a text file with word pairs in the format:

```
source_word : target_word
```

- Grammar Rules: Create a text file with grammar rules for the target language in Mermaid (.md) format
- Examples: Create a text file with example translations in the format:

```
source: sentence
target: translation
```

The system supports any model reachable through your configured endpoints. To use a different model:

- Update the secrets.toml file with credentials for your model
- Add your model to the model selection dropdown in app.py
- API Errors: Ensure your API keys and endpoints are correct (verify with Postman)
- Memory Issues: For very large texts, consider adjusting the chunk size in prompts.py and the max-token setting in graph.py
- Timeout Errors: Large texts or complex translations may take longer to process
The full list of dependencies from requirements.txt:
- streamlit: For the web interface
- langgraph: For orchestrating the agent workflow
- pydantic: For data validation and settings management
- litellm: For LLM API abstraction
- python-dotenv: For environment variable management
- instructor: For structured outputs from LLMs
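For context on how the last three fit together: litellm provides a uniform completion API, and instructor layers pydantic-validated structured outputs on top of it. A hedged sketch follows; the JudgeVerdict schema is illustrative, not the one used in graph.py:

```python
import instructor
from litellm import completion
from pydantic import BaseModel

class JudgeVerdict(BaseModel):
    acceptable: bool
    issues: list[str]

# instructor wraps the litellm completion call to return validated models
client = instructor.from_litellm(completion)
verdict = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Evaluate this translation: ..."}],
    response_model=JudgeVerdict,
)
print(verdict.acceptable, verdict.issues)
```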