The `agent.py` module provides an intelligent interface for converting natural language questions into PostgreSQL `to_tsquery` format and searching biomedical literature databases. This module integrates Large Language Model (LLM) capabilities with database search functionality to create a seamless research experience.
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   User Input    │───▶│   QueryAgent    │───▶│    Database     │
│ Natural Language│    │                 │    │     Results     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │   Ollama LLM    │
                       │   (Local AI)    │
                       └─────────────────┘
```
Key features:

- QueryAgent Class: Main interface for query conversion and search
- LLM Integration: Uses Ollama for local AI processing
- Database Integration: Connects with BMLibrarian database module
- Callback System: Provides hooks for UI updates and monitoring
- Human-in-the-Loop: Allows manual query review and modification
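For example, the callback and human-in-the-loop hooks can be wired up roughly as follows (a sketch only; the callback and modifier signatures shown are assumptions, not the module's documented API):

```python
from bmlibrarian.agent import QueryAgent

def on_progress(stage, detail):
    # Hypothetical progress callback used for UI updates and monitoring
    print(f"[{stage}] {detail}")

def review_query(generated_query):
    # Human-in-the-loop hook: let a user inspect and edit the generated query
    edited = input(f"Edit query [{generated_query}]: ").strip()
    return edited or generated_query

agent = QueryAgent()
for doc in agent.find_abstracts(
    "Effects of aspirin on heart disease",
    max_rows=10,
    human_in_the_loop=True,
    callback=on_progress,
    human_query_modifier=review_query,
):
    print(doc["title"])
```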
Dependencies:

- `ollama>=0.2.1`: Python client for the Ollama LLM server
- `logging`: For comprehensive logging
- Standard library modules
```python
class QueryAgent:
    def __init__(self, model: str = "medgemma4B_it_q8:latest", host: str = "http://localhost:11434"):
        """
        Initialize the QueryAgent.

        Args:
            model: The name of the Ollama model to use
            host: The Ollama server host URL
        """
```

`convert_question(question)`: Converts natural language to PostgreSQL `to_tsquery` format.
Parameters:
- question (str): Natural language question

Returns:
- str: PostgreSQL to_tsquery compatible string

Raises:
- ValueError: If question is empty
- ConnectionError: If unable to connect to Ollama
Example:

```python
agent = QueryAgent()
query = agent.convert_question("Effects of aspirin on heart disease")
# Returns: "aspirin & (heart | cardiac | cardiovascular) & (disease | disorder)"
```

`find_abstracts(question, ...)`: Full-featured search combining query conversion with database search.
Parameters:
- question (str): Natural language question
- max_rows (int): Maximum results to return (default: 100)
- use_pubmed (bool): Include PubMed sources (default: True)
- use_medrxiv (bool): Include medRxiv sources (default: True)
- use_others (bool): Include other sources (default: True)
- from_date (Optional[date]): Earliest publication date
- to_date (Optional[date]): Latest publication date
- batch_size (int): Database fetch batch size (default: 50)
- use_ranking (bool): Enable relevance ranking (default: False)
- human_in_the_loop (bool): Enable human query review (default: False)
- callback (Optional[Callable]): Progress callback function
- human_query_modifier (Optional[Callable]): Query modification function
Returns:
- Generator[Dict, None, None]: Stream of document dictionaries

Example:

```python
for doc in agent.find_abstracts("COVID vaccine effectiveness", max_rows=10):
    print(f"{doc['title']} - {doc['publication_date']}")
```

`test_connection()`: Tests the connection to the Ollama server and verifies model availability.
Returns:
- bool: True if connection successful and model available

Example:

```python
if agent.test_connection():
    print("Ready to convert queries")
else:
    print("Ollama server or model not available")
```

The agent can also retrieve the list of available models from the Ollama server.
Returns:
- list[str]: List of available model names

Raises:
- ConnectionError: If unable to connect to Ollama server
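A minimal sketch of calling this helper through the underlying ollama client (the response layout is an assumption and varies between client versions):

```python
import ollama

client = ollama.Client(host="http://localhost:11434")
response = client.list()  # ask the server which models it has
# In ollama 0.2.x the response is a dict with a 'models' list of dicts;
# newer client versions return a typed object instead.
names = [m.get("name", m.get("model")) for m in response.get("models", [])]
print(names)
```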
The system uses a carefully crafted system prompt that:
- Establishes Context: Defines the agent as a biomedical literature search expert
- Provides Rules: Clear guidelines for `to_tsquery` format generation
- Includes Examples: Concrete examples of question-to-query conversion
- Focuses Domain: Emphasizes biomedical terminology and concepts
The generated queries follow these construction guidelines:

- Operator Usage: Proper use of `&` (AND) and `|` (OR) operators
- Grouping: Strategic use of parentheses for complex queries
- Term Selection: Focus on medical terminology, drug names, disease names
- Synonym Handling: Include alternative terms for comprehensive search
The `_validate_tsquery()` method performs basic validation:
- Balanced Parentheses: Ensures proper nesting
- Operator Validation: Checks for invalid operator combinations
- Empty Query Check: Prevents empty or whitespace-only queries
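The method itself isn't reproduced here; a minimal sketch of the three checks described above could look like this:

```python
import re

def validate_tsquery(query: str) -> bool:
    """Sketch of the basic checks performed by _validate_tsquery()."""
    # Empty query check: reject empty or whitespace-only strings
    if not query or not query.strip():
        return False
    stripped = query.strip()
    # Balanced parentheses: ensure proper nesting
    depth = 0
    for ch in stripped:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    if depth != 0:
        return False
    # Operator validation: no leading/trailing or doubled & / | operators
    if stripped[0] in "&|" or stripped[-1] in "&|":
        return False
    if re.search(r"[&|]\s*[&|]", stripped):
        return False
    return True
```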
The agent also handles several classes of errors:

- Network connectivity issues to Ollama server
- Model not available or not loaded
- Server timeouts
- Empty or invalid questions
- Malformed responses from LLM
- Invalid `to_tsquery` format detection
- Warning logs for suspicious queries
The QueryAgent integrates with the `bmlibrarian.database` module:

```python
from .database import find_abstracts

# In the find_abstracts method:
yield from find_abstracts(
    ts_query_str=ts_query_str,
    max_rows=max_rows,
    # ... other parameters
    plain=False  # Important: use to_tsquery format, not plain text
)
```

The agent uses Ollama for local LLM processing:
```python
import ollama

self.client = ollama.Client(host=host)
response = self.client.chat(
    model=self.model,
    messages=[
        {'role': 'system', 'content': self.system_prompt},
        {'role': 'user', 'content': question}
    ],
    options={
        'temperature': 0.1,  # Low temperature for consistent results
        'top_p': 0.9,
        'num_predict': 100   # Limit response length
    }
)
```

A typical error-handling pattern around query conversion and search:

```python
try:
    query = agent.convert_question(user_question)
    results = search_database(query)
except ValueError:
    # Handle invalid input
    return {"error": "Invalid question format"}
except ConnectionError:
    # Handle Ollama connection issues
    return {"error": "LLM service unavailable"}
```

Unit tests cover:

- Mock Ollama client for isolated testing
- Validate query format generation
- Error condition handling
- Input validation
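A unit test might patch the Ollama client so that no server is required (a sketch; the patch target and response shape are assumptions about the module's internals):

```python
from unittest.mock import patch
from bmlibrarian.agent import QueryAgent

def test_convert_question_produces_tsquery():
    with patch("bmlibrarian.agent.ollama.Client") as mock_client:
        # Mimic the chat response shape shown in the LLM integration example
        mock_client.return_value.chat.return_value = {
            "message": {"content": "aspirin & (heart | cardiac)"}
        }
        agent = QueryAgent()
        query = agent.convert_question("Effects of aspirin on heart disease")
        assert "&" in query
```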
Integration tests cover:

- Real Ollama server connection (optional)
- End-to-end query conversion
- Performance testing with various question types
```bash
# Run unit tests only
uv run pytest tests/test_agent.py -m "not integration"

# Run all tests (requires Ollama server)
uv run pytest tests/test_agent.py
```

The agent module doesn't directly use environment variables, but applications may configure:

- `OLLAMA_HOST`: Override default Ollama server URL
- `OLLAMA_MODEL`: Override default model name
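For instance, an application could map those variables onto the constructor arguments (a sketch using the defaults shown earlier):

```python
import os
from bmlibrarian.agent import QueryAgent

agent = QueryAgent(
    model=os.environ.get("OLLAMA_MODEL", "medgemma4B_it_q8:latest"),
    host=os.environ.get("OLLAMA_HOST", "http://localhost:11434"),
)
```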
Choose models based on:
- Performance: Response time requirements
- Accuracy: Quality of biomedical keyword extraction
- Availability: Local vs remote model hosting
- `llama3.2`: Good balance of performance and accuracy
- `mistral`: Fast responses, good for simple queries
- `codellama`: If incorporating code-like syntax parsing
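For example, an alternative model can be selected at construction time (assuming it has already been pulled into Ollama):

```python
agent = QueryAgent(model="llama3.2")
```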
Performance considerations:

- Model size affects response time
- Network latency to Ollama server
- Query complexity influences processing time
Consider implementing caching for:
- Common question patterns
- Frequently requested queries
- Model responses for identical inputs
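A simple in-process cache for identical inputs could wrap convert_question; this is a sketch, not part of the module:

```python
from functools import lru_cache

from bmlibrarian.agent import QueryAgent

agent = QueryAgent()

@lru_cache(maxsize=256)
def cached_convert(question: str) -> str:
    # Repeated identical questions hit the cache instead of the LLM
    return agent.convert_question(question)
```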
Additional tuning tips:

- Use low temperature (0.1) for consistent results
- Limit response length with `num_predict`
- Implement connection pooling for high-volume applications
Input security:

- Sanitize user input before sending to LLM
- Prevent injection attacks through malformed questions
- Validate generated queries before database execution
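A basic sanitization step might look like this (a sketch; the length limit is illustrative):

```python
MAX_QUESTION_LENGTH = 500  # illustrative limit, not from the module

def sanitize_question(question: str) -> str:
    """Basic input hygiene before a question is sent to the LLM (sketch)."""
    question = question.strip()
    if not question:
        raise ValueError("Question is empty")
    if len(question) > MAX_QUESTION_LENGTH:
        raise ValueError("Question is too long")
    return question
```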
Network security:

- Use HTTPS for remote Ollama connections
- Implement authentication if required
- Monitor for unusual query patterns
Common issues:

- "Import ollama could not be resolved"
  - Run `uv sync` to install dependencies
  - Verify the ollama package is in the virtual environment
- Connection timeouts
  - Check Ollama server status
  - Verify host URL and port
  - Ensure the model is downloaded and loaded
- Poor query quality
  - Try different models
  - Adjust temperature settings
  - Refine the system prompt for specific use cases
Enable debug logging:

```python
import logging

logging.getLogger('bmlibrarian.agent').setLevel(logging.DEBUG)
```

Possible future enhancements and extension points:

- Query result ranking based on relevance scores
- Multi-model ensemble for improved accuracy
- Custom prompt templates for different domains
- Query optimization based on database schema
- Automatic model fallback for reliability
- Custom validation rules
- Domain-specific prompt engineering
- Integration with other LLM providers
- Query performance analytics