Merged
4 changes: 1 addition & 3 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,7 +1,5 @@

.DS_Store
/aphra/__pycache__
aphra/config.toml
/tests/__pycache__
model_calls.log
config.toml
__pycache__
42 changes: 37 additions & 5 deletions CONTRIBUTING.md
@@ -110,13 +110,45 @@ Once in the cloned repository directory, create a new branch for your contributi
git checkout -B <feature-description>
````

### Understanding the Architecture

🌐💬 Aphra uses a workflow architecture with the following key components:

- **Core Components** (`aphra/core/`): Base classes, context management, and the registry system.
- **Workflows** (`aphra/workflows/`): High-level orchestrators that combine steps into complete translation processes.
- **Prompts** (`aphra/prompts/`): Template files organized by workflow type.

### Contributing Workflow

1. Make sure your code follows the style guide and passes linting with `pylint`.
2. Write tests for any new functionality you add.
3. Ensure all tests pass before submitting a pull request.
4. Document any changes to APIs or core functionality.
5. Submit your pull request, providing a clear and descriptive title and description of your changes.
1. **Understand the Component You're Modifying:**
- **Core Components**: Changes here affect the entire system.
- **Workflows**: Should implement the `AbstractWorkflow` interface and be registered properly.
- **Prompts**: Should follow the existing template format and variable naming conventions.

2. **Development Guidelines:**
- Make sure your code follows the style guide and passes linting with `pylint`.
- Write tests for any new functionality you add.
- For new workflows, inherit from `AbstractWorkflow` or `ArticleWorkflow` and implement required methods.
- Use method overriding rather than step composition for customization.
- Ensure all tests pass before submitting a pull request.
- Document any changes to APIs or core functionality.

3. **Testing Requirements:**
- Test individual workflow methods in isolation.
- Test complete workflow execution end-to-end.
- Ensure compatibility with existing API.
- Run the full test suite: `python -m pytest tests/ -v`

4. **Code Organization:**
- Place new workflows in `aphra/workflows/` and update the `__init__.py` file.
- Organize workflow-specific methods within the workflow class itself.
- Add prompts for new workflows in `aphra/prompts/<workflow_name>/`.
- Update core components only when adding fundamental functionality.

5. **Submission:**
- Submit your pull request with a clear and descriptive title and description.
- Explain how your changes fit into the workflow architecture.
- Include examples of how to use any new components you've created.
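
The isolated-method testing described above can be sketched as follows. `ArticleWorkflow` and `TranslationContext` are replaced here by minimal stand-in stubs so the pattern is visible without real LLM calls; the actual classes live in `aphra/workflows/` and `aphra/core/context.py`, and their exact signatures may differ:

```python
# Sketch: testing one workflow method in isolation with a stubbed model client.

class StubModelClient:
    """Stand-in client that returns a canned response instead of calling an LLM."""
    llms = {"writer": "stub-model"}

    def call_model(self, system_prompt, user_prompt, model_name, **kwargs):
        return "term1\nterm2"

class StubContext:
    """Minimal stand-in for TranslationContext."""
    def __init__(self, model_client):
        self.model_client = model_client
        self.intermediate_results = {}

    def store_result(self, step_name, result):
        self.intermediate_results[step_name] = result

def analyze(context, text):
    # Simplified analogue of an analysis step: ask the model, parse lines,
    # store the intermediate result on the context.
    response = context.model_client.call_model("system", text, "stub-model")
    result = response.splitlines()
    context.store_result("analyze", result)
    return result

context = StubContext(StubModelClient())
assert analyze(context, "some source text") == ["term1", "term2"]
assert context.intermediate_results["analyze"] == ["term1", "term2"]
```

Because each method takes the context explicitly, a test only needs to fake the pieces of state that one method touches.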

### Improving The Documentation

151 changes: 134 additions & 17 deletions README.md
@@ -62,17 +62,50 @@ As [Virginia Woolf](https://en.wikipedia.org/wiki/Virginia_Woolf) famously said,

## How 🌐💬 Aphra Works

🌐💬 Aphra employs a multi-stage, agentic approach to translation, designed to closely mimic the steps a human translator might take when working on a text. Here's how the process unfolds:
🌐💬 Aphra employs a multi-stage, agentic approach to translation using a **workflow architecture** designed to closely mimic the steps a human translator might take when working on a text. The system is built around workflows that orchestrate translation through specialized methods.

![aphra-diagram](aphra-diagram.png)
### Architecture

1. **Analysis**: The process begins with the "LLM Writer," a language model, analyzing the original text. During this analysis, the model identifies key expressions, terms, and entities that may pose challenges in translation, such as culturally specific references or industry jargon.
2. **Search**: Following the analysis, the "LLM Searcher," a model with internet access, takes the identified terms and searches for additional context. This context could include definitions, background information, or examples of usage in different contexts, ensuring that the translation is well-informed and accurate.
3. **Initial Translation**: Simultaneously with the search phase, another instance of the "LLM Writer" begins translating the original text without yet incorporating the contextual information gathered. This step focuses on producing a raw, direct translation that preserves the original style and structure of the text.
4. **Critique**: Once the initial translation is complete and the context has been gathered, the "LLM Critic" comes into play. This model reviews the initial translation in light of the context and original text, providing feedback on areas where the translation could be improved. The critique might highlight potential misinterpretations, suggest alternative phrasings, or recommend adding translator notes for clarity.
5. **Final Translation**: Finally, the "LLM Writer" revisits the translation, incorporating the critic's feedback and the contextual information gathered earlier. The result is a polished, contextually aware translation that is more nuanced and accurate than a simple literal translation, often including additional notes to guide the reader through complex or ambiguous parts of the text.
Aphra's architecture consists of several key components:

This structured approach enables 🌐💬 Aphra to produce translations that are not only linguistically accurate but also contextually rich, making it a valuable tool for small projects aiming to reach a global audience without the resources to hire a professional translator.
- **Workflows**: Self-contained classes that implement complete translation processes using simple methods.
- **Context**: Shared state management across the entire translation process.
- **Registry**: Central discovery and management system for available workflows.
- **Core Components**: LLM client, parsers, and utilities that workflows use internally.
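
The shared-state idea behind **Context** can be sketched in a few lines. This is a simplified stand-in mirroring the `store_result`/`get_result` methods of `TranslationContext` in `aphra/core/context.py`:

```python
# Sketch of how workflow steps share state through the context object.

class TranslationContext:
    """Simplified stand-in for aphra.core.context.TranslationContext."""
    def __init__(self):
        self.intermediate_results = {}

    def store_result(self, step_name, result):
        """Store the intermediate output of a workflow step."""
        self.intermediate_results[step_name] = result

    def get_result(self, step_name):
        """Retrieve a previously stored step output (None if absent)."""
        return self.intermediate_results.get(step_name)

ctx = TranslationContext()
ctx.store_result("analyze", ["jargon", "idiom"])
# A later step (e.g. search) can read what analysis produced:
assert ctx.get_result("analyze") == ["jargon", "idiom"]
assert ctx.get_result("missing") is None
```

Every step reads from and writes to the same context, so no step needs to know which step ran before it.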

### Article Workflow (Default)

![aphra-article-diagram](aphra-diagram.png)

The default workflow implements the proven 5-step translation process using simple methods:

1. **analyze()**: The "LLM Writer" analyzes the original text, identifying key expressions, terms, and entities that may pose challenges in translation, such as culturally specific references or industry jargon.

2. **search()**: Any LLM model, enhanced with OpenRouter's web search capabilities, takes the identified terms and searches for real-time, up-to-date context. This includes current definitions, background information, or examples of usage in different contexts, ensuring that the translation is well-informed and accurate.

3. **translate()**: The "LLM Writer" produces an initial translation that preserves the original style and structure of the text, focusing on linguistic accuracy while preparing for contextual refinement.

4. **critique()**: The "LLM Critic" reviews the initial translation in light of the gathered context and original text, providing feedback on areas where the translation could be improved. The critique highlights potential misinterpretations, suggests alternative phrasings, or recommends adding translator notes for clarity.

5. **refine()**: Finally, the "LLM Writer" creates the final translation, incorporating the critic's feedback and the contextual information gathered earlier. The result is a polished, contextually aware translation that is more nuanced and accurate than a simple literal translation.
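
The five steps above can be sketched as a single orchestration function. The chaining below is an illustrative guess at how `execute()` wires the methods together; the real orchestration lives in `aphra/workflows/article_workflow.py` and its argument passing may differ:

```python
# Hedged sketch of chaining the five workflow methods.

def execute(workflow, context, text):
    parsed_items = workflow.analyze(context, text)              # 1. find tricky terms
    search_notes = workflow.search(context, parsed_items)       # 2. gather web context
    draft = workflow.translate(context, text)                   # 3. initial translation
    feedback = workflow.critique(context, draft, search_notes)  # 4. review the draft
    return workflow.refine(context, draft, feedback)            # 5. final translation

class ToyWorkflow:
    """Toy stand-in whose methods mimic the five-step shape."""
    def analyze(self, ctx, text): return ["term"]
    def search(self, ctx, items): return {i: "note" for i in items}
    def translate(self, ctx, text): return text.upper()
    def critique(self, ctx, draft, notes): return "ok"
    def refine(self, ctx, draft, feedback): return draft + " [refined]"

assert execute(ToyWorkflow(), None, "hola") == "HOLA [refined]"
```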

### Web Search Integration

Aphra leverages OpenRouter's advanced web search capabilities:
- **Universal Web Access**: Any model can now access real-time web information via OpenRouter's web plugin.
- **High-Context Search**: Uses "high" search context for maximum information retrieval.
- **Automatic Citations**: Web search results include proper source citations.
- **Cost-Effective**: Powered by Exa search with transparent pricing ($4 per 1000 results).
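
The request shape this produces can be sketched from the client changes in this PR: when web search is enabled, `:online` is appended to the model name and `web_search_options` is added (see `call_model` in `aphra/core/llm_client.py`):

```python
# Sketch of the request parameters built when web search is enabled,
# mirroring the ':online' suffix logic from call_model.

def build_request(model_name, enable_web_search=False, web_search_context="high"):
    params = {"model": model_name}
    if enable_web_search:
        # Append :online to the model name for OpenRouter web search
        if not model_name.endswith(":online"):
            params["model"] = f"{model_name}:online"
        # Request the chosen search context size ('low', 'medium', 'high')
        params["web_search_options"] = {"search_context_size": web_search_context}
    return params

assert build_request("openai/gpt-4o") == {"model": "openai/gpt-4o"}
assert build_request("openai/gpt-4o", enable_web_search=True) == {
    "model": "openai/gpt-4o:online",
    "web_search_options": {"search_context_size": "high"},
}
```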

### Extensible Design

The workflow architecture enables:
- **Custom Workflows**: Create specialized translation workflows by inheriting from base classes and overriding methods.
- **Method Reusability**: Individual methods can be reused by inheritance or composition.
- **Easy Testing**: Each method can be tested independently.
- **Future Expansion**: New workflows can be added without modifying existing code.

This structured approach enables 🌐💬 Aphra to produce translations that are not only linguistically accurate but also contextually rich, while providing a solid foundation for extending the system to handle various types of content and use cases.

## Demo

@@ -312,19 +345,103 @@ In this example:

## Customizability and Ideas for Extensions

🌐💬 Aphra is designed with flexibility in mind, making it easy to tailor to your specific needs. The simplest way to customize Aphra is by maintaining the defined call flow and modifying the prompts within the `prompts` folder for each step. This approach allows you to adapt the output to your desired outcome for most use cases.
🌐💬 Aphra's workflow architecture is designed with extensibility and customization at its core. The system provides multiple levels of customization, from simple prompt modifications to creating entirely new workflows.

### Customization Levels

#### 1. Prompt Customization (Simplest)
Modify the prompts within the `prompts/articles/` folder to adapt the output for your specific use cases:
- `step1_system.txt` and `step1_user.txt` - Analysis step prompts
- `step2_system.txt` and `step2_user.txt` - Search step prompts
- `step3_system.txt` and `step3_user.txt` - Translation step prompts
- `step4_system.txt` and `step4_user.txt` - Critique step prompts
- `step5_system.txt` and `step5_user.txt` - Refinement step prompts

#### 2. Method Customization (Intermediate)
Customize translation behavior by inheriting from `ArticleWorkflow` and overriding specific methods:

```python
from aphra.workflows import ArticleWorkflow
from aphra.core.context import TranslationContext

class CustomWorkflow(ArticleWorkflow):
    def analyze(self, context: TranslationContext, text: str):
        # Your custom analysis logic here
        return super().analyze(context, text)

    def search(self, context: TranslationContext, parsed_items):
        # Your custom search logic here
        return super().search(context, parsed_items)
```

#### 3. Complete Workflow Creation (Advanced)
Build entirely new workflows by inheriting from `AbstractWorkflow`:

```python
from aphra.core.workflow import AbstractWorkflow
from aphra.core.context import TranslationContext

class CustomWorkflow(AbstractWorkflow):
    def get_workflow_name(self) -> str:
        return "custom_workflow"

    def is_suitable_for(self, text: str, **kwargs) -> bool:
        # Define suitability criteria
        return True

    def execute(self, context: TranslationContext, text: str) -> str:
        # Your complete workflow logic here
        return translated_text
```

Then register your workflow:

```python
from aphra.core import register_workflow
register_workflow(CustomWorkflow)
```

### Extension Ideas

The workflow architecture opens up exciting possibilities:

- **Specialized Content Workflows:**
- **Academic Papers**: Enhanced terminology handling and citation preservation.
- **Technical Documentation**: API reference translation with code preservation.
- **Marketing Content**: Tone and brand voice adaptation across languages.
- **Legal Documents**: Precision-focused translation with legal term verification.

- **Enhanced Search Capabilities:**
- **Agent-Based Web Search**: Replace LLM searcher with custom web search agents.
- **Domain-Specific Databases**: Integrate specialized terminology databases.
- **Visual Context**: Add image analysis for documents with visual elements.

- **Local and Hybrid Operation:**
- **Ollama Integration**: Run workflows entirely locally using open-source models.
- **Hybrid Cloud-Local**: Use local models for sensitive content, cloud for complex analysis.
- **Custom Model Integration**: Plug in specialized translation models.

- **Quality Assurance Extensions:**
- **Multiple Critic Workflow**: Use several specialized critics for different aspects.
- **Human-in-the-Loop**: Add human review steps at critical points.
- **Quality Metrics**: Automatic translation quality assessment.

- **Performance Optimizations:**
- **Parallel Step Execution**: Run independent steps concurrently.
- **Caching System**: Cache analysis and search results for similar content.
- **Streaming Translation**: Process large documents in chunks.
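
The "Parallel Step Execution" idea rests on the fact that the search and initial-translation steps do not depend on each other, so they can run concurrently. The function names below are illustrative stand-ins for the workflow methods, not Aphra's actual API:

```python
# Sketch: running two independent steps concurrently with a thread pool.
from concurrent.futures import ThreadPoolExecutor

def search(terms):
    # Placeholder for the web-search step
    return {t: f"context for {t}" for t in terms}

def translate(text):
    # Placeholder for the initial LLM translation step
    return text.upper()

with ThreadPoolExecutor(max_workers=2) as pool:
    search_future = pool.submit(search, ["idiom"])
    draft_future = pool.submit(translate, "hola mundo")

assert search_future.result() == {"idiom": "context for idiom"}
assert draft_future.result() == "HOLA MUNDO"
```

Threads suit this case because the steps spend their time waiting on network I/O, not CPU.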

If you need to modify the flow itself, you'll need to dive into the code in `translate.py`. While this is a bit more complex, it's entirely doable for those looking to extend the functionality of 🌐💬 Aphra.
### Getting Started with Extensions

Here are some ideas for further extensions:
1. **Fork the Repository**: Start with your own copy of Aphra.
2. **Study the Existing Code**: Examine `aphra/workflows/article_workflow.py` for examples of workflow implementation.
3. **Create Your Components**: Build your custom steps and workflows.
4. **Test Thoroughly**: Use the existing test framework as a guide.
5. **Share Your Work**: Consider contributing your extensions back to the community.

- **Agent-Based Web Search:** Transform the LLM Searcher step into an agent that can search the Internet, removing the reliance on an LLM with built-in Internet access.
- **Local Operation:** Once the above is implemented, consider adapting 🌐💬 Aphra to run locally using tools like [Ollama](https://ollama.com).
- **Alternative Flows:** Create alternative workflows for specific use cases, giving 🌐💬 Aphra even more versatility.
- **Modular Prompts:** Develop modular versions of 🌐💬 Aphra's prompts tailored to different use cases.
- **And Beyond:** The possibilities are as vast as your imagination. 🌐💬 Aphra is a starting point, and the directions you can take it are endless.
The workflow design ensures that your extensions are isolated, testable, and maintainable, while the registry system makes them discoverable and reusable.

Feel free to experiment and extend 🌐💬 Aphra in ways that suit your projects and ideas.
Feel free to experiment and extend 🌐💬 Aphra in ways that suit your projects and ideas. The architecture is built to grow with your needs!

## License

4 changes: 2 additions & 2 deletions aphra/__init__.py
@@ -5,8 +5,8 @@
"""

from .translate import translate
from . import llm_client
from .core import llm_client
from . import prompts
from . import parsers
from .core import parsers

__all__ = ['translate', 'llm_client', 'prompts', 'parsers']
31 changes: 31 additions & 0 deletions aphra/core/__init__.py
@@ -0,0 +1,31 @@
"""
Core components for the Aphra translation system.

This module contains the fundamental building blocks used across
all workflows.
"""

from .llm_client import LLMModelClient
from .parsers import parse_analysis, parse_translation
from .context import TranslationContext
from .workflow import AbstractWorkflow
from .registry import (
WorkflowRegistry,
get_registry,
register_workflow,
get_workflow,
get_suitable_workflow
)

__all__ = [
'LLMModelClient',
'parse_analysis',
'parse_translation',
'TranslationContext',
'AbstractWorkflow',
'WorkflowRegistry',
'get_registry',
'register_workflow',
'get_workflow',
'get_suitable_workflow'
]
47 changes: 47 additions & 0 deletions aphra/core/context.py
@@ -0,0 +1,47 @@
"""
Context management for translation workflows.

This module provides the TranslationContext class that encapsulates
all the state and configuration needed during translation execution.
"""

from dataclasses import dataclass
from typing import Dict, Any
from .llm_client import LLMModelClient


@dataclass
class TranslationContext:
"""
Context for translation containing parameters and settings.

This class encapsulates the parameters and settings needed for performing a translation,
including the model client, source and target languages, and logging preferences.
"""
model_client: LLMModelClient
source_language: str
target_language: str
log_calls: bool

# Additional fields for workflow state
metadata: Dict[str, Any] = None
intermediate_results: Dict[str, Any] = None

def __post_init__(self):
"""Initialize optional fields if not provided."""
if self.metadata is None:
self.metadata = {}
if self.intermediate_results is None:
self.intermediate_results = {}

def get_models(self) -> Dict[str, str]:
"""Get the configured LLM models."""
return self.model_client.llms

def store_result(self, step_name: str, result: Any) -> None:
"""Store intermediate result from a workflow step."""
self.intermediate_results[step_name] = result

def get_result(self, step_name: str) -> Any:
"""Retrieve intermediate result from a workflow step."""
return self.intermediate_results.get(step_name)
34 changes: 28 additions & 6 deletions aphra/llm_client.py → aphra/core/llm_client.py
@@ -46,24 +46,43 @@ def load_config(self, config_file_path):
logging.error('Missing key in config file: %s', e)
raise

def call_model(self, system_prompt, user_prompt, model_name, log_call=False):
def call_model(self, system_prompt, user_prompt, model_name, *,
log_call=False, enable_web_search=False,
web_search_context="high"):
"""
Calls the model with the provided prompts.

:param system_prompt: The system prompt to set the context for the model.
:param user_prompt: The user prompt to send to the model.
:param model_name: The name of the model to use.
:param log_call: Boolean indicating whether to log the call details.
:param enable_web_search: Boolean indicating whether to enable web search via OpenRouter.
:param web_search_context: Context size for web search ('low', 'medium', 'high').
:return: The model's response.
"""
response = None
try:
response = self.client.chat.completions.create(
model=model_name,
messages=[
# Prepare the request parameters
request_params = {
"model": model_name,
"messages": [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt}
]
)
}

# Add web search capabilities if enabled (OpenRouter format)
if enable_web_search:
# Append :online to model name for web search
if not model_name.endswith(":online"):
request_params["model"] = f"{model_name}:online"

# Add web search options
request_params["web_search_options"] = {
"search_context_size": web_search_context
}

response = self.client.chat.completions.create(**request_params)
response_content = response.choices[0].message.content

if log_call:
@@ -75,7 +94,10 @@ def call_model(self, system_prompt, user_prompt, model_name, log_call=False):
raise
except (ValueError, KeyError, TypeError) as e:
logging.error('Error parsing response: %s', e)
logging.error('Response content: %s', response.text if response else 'No response')
if response and hasattr(response, 'text'):
logging.error('Response content: %s', response.text)
else:
logging.error('No response available')
raise

def log_model_call(self, user_prompt, response):
File renamed without changes.