Oracle Chat AI provides a comprehensive REST API for managing persistent chat sessions with Google's Gemini AI enhanced by LangChain integration. The API supports session management, message handling, intelligent memory strategies, and comprehensive monitoring capabilities with optimized conversation management.
Base URL: http://localhost:8000
API Version: v1
Current Version: 1.3.0 (LangChain Integration)
Oracle implements intelligent session management through LangChain integration with advanced memory strategies:
- LangChain Integration: ChatGoogleGenerativeAI with intelligent conversation management
- Smart Memory Strategies: Buffer, summary, entity, and hybrid memory types for optimal context handling
- Context Optimization: Automatic summarization and relevance-based context selection
- Session Caching: Active LangChain sessions cached in memory (default: 1 hour, max 50 sessions)
- Automatic Cleanup: Expired sessions are automatically removed to manage memory usage
- Intelligent Context Restoration: Sessions restore optimized conversation context using memory strategies
- Performance Benefits: 60-80% reduction in API token usage and 30-50% faster response times
- Database Persistence: All conversation history stored in SQLite with memory strategy coordination
The following environment variables control the application:
# Required
GEMINI_API_KEY=your_api_key_here
# Core Application Configuration
GEMINI_MODEL=gemini-2.5-flash
SYSTEM_INSTRUCTION_TYPE=default
LOG_LEVEL=info
ENVIRONMENT=development
DATABASE_URL=sqlite:///./oracle_sessions.db
# LangChain Integration Configuration
LANGCHAIN_ENABLED=true
LANGCHAIN_MEMORY_STRATEGY=hybrid
LANGCHAIN_MAX_BUFFER_SIZE=20
LANGCHAIN_MAX_TOKENS_BEFORE_SUMMARY=4000
LANGCHAIN_ENTITY_EXTRACTION_ENABLED=true
LANGCHAIN_MAX_TOKENS=4000
LANGCHAIN_MESSAGES_TO_KEEP_AFTER_SUMMARY=20
LANGCHAIN_RELEVANCE_THRESHOLD=0.7
LANGCHAIN_ENABLE_SEMANTIC_SEARCH=true
LANGCHAIN_SUMMARIZATION_TRIGGER_RATIO=0.8
LANGCHAIN_SUMMARY_MODEL=gemini-2.5-flash
LANGCHAIN_TEMPERATURE=0.7
LANGCHAIN_MAX_OUTPUT_TOKENS=2048
LANGCHAIN_LOG_LEVEL=info
LANGCHAIN_ENABLE_PERFORMANCE_MONITORING=true
LANGCHAIN_ENABLE_TOKEN_TRACKING=true- buffer: Keep recent messages in full detail (configurable buffer size)
- summary: Summarize older conversation parts while preserving recent context
- entity: Extract and maintain important entities (names, dates, preferences)
- hybrid: Combine strategies based on conversation length and importance
Currently, Oracle uses API key authentication configured via environment variables. No additional authentication headers are required for API requests.
POST /api/v1/sessions/
Creates a new chat session with optional configuration.
Request Body:
{
"title": "My New Session",
"model_used": "gemini-2.5-flash",
"session_metadata": {
"user_preferences": {},
"context_tags": ["technical", "development"]
}
}Response:
{
"id": 1,
"title": "My New Session",
"model_used": "gemini-2.5-flash",
"session_metadata": {
"user_preferences": {},
"context_tags": ["technical", "development"]
},
"created_at": "2025-01-27T12:00:00Z",
"updated_at": "2025-01-27T12:00:00Z",
"message_count": 0
}Status Codes:
201 Created- Session created successfully400 Bad Request- Invalid request data500 Internal Server Error- Server error
GET /api/v1/sessions/
Retrieves a paginated list of all sessions.
Query Parameters:
skip(optional): Number of sessions to skip (default: 0)limit(optional): Maximum sessions to return (default: 50, max: 100)
Response:
{
"sessions": [
{
"id": 1,
"title": "Technical Discussion",
"model_used": "gemini-2.5-flash",
"created_at": "2025-01-27T12:00:00Z",
"updated_at": "2025-01-27T12:30:00Z",
"message_count": 8
}
],
"total": 1,
"skip": 0,
"limit": 50
}GET /api/v1/sessions/{session_id}
Retrieves detailed information about a specific session.
Path Parameters:
session_id(integer): The session ID
Response:
{
"id": 1,
"title": "Technical Discussion",
"model_used": "gemini-2.5-flash",
"session_metadata": {},
"created_at": "2025-01-27T12:00:00Z",
"updated_at": "2025-01-27T12:30:00Z",
"message_count": 8
}Status Codes:
200 OK- Session found404 Not Found- Session does not exist
DELETE /api/v1/sessions/{session_id}
Deletes a session and all associated messages. Also removes the session from the persistent session cache.
Path Parameters:
session_id(integer): The session ID
Response:
{
"message": "Session deleted successfully",
"session_id": 1
}Status Codes:
200 OK- Session deleted successfully404 Not Found- Session does not exist
POST /api/v1/sessions/{session_id}/chat
Sends a message within a session context using persistent Gemini sessions for optimal performance.
Path Parameters:
session_id(integer): The session ID
Request Body:
{
"message": "What is FastAPI and how does it compare to other Python web frameworks?"
}Response:
{
"user_message": {
"id": 15,
"session_id": 1,
"role": "user",
"content": "What is FastAPI and how does it compare to other Python web frameworks?",
"timestamp": "2025-01-27T12:00:00Z"
},
"assistant_message": {
"id": 16,
"session_id": 1,
"role": "assistant",
"content": "FastAPI is a modern, fast web framework for building APIs with Python 3.7+...",
"timestamp": "2025-01-27T12:00:01Z"
},
"session": {
"id": 1,
"title": "Technical Discussion",
"message_count": 16,
"updated_at": "2025-01-27T12:00:01Z"
},
}LangChain Performance Features:
- Intelligent Memory Management: Uses configurable memory strategies for optimal context handling
- Context Optimization: Automatic summarization and relevance-based context selection
- Entity Extraction: Remembers important facts and preferences within sessions
- Token Efficiency: 60-80% reduction in API token usage through smart memory strategies
- Session Reuse: Reuses existing LangChain sessions for faster responses
- Database Persistence: All conversations stored reliably in SQLite with memory coordination
Status Codes:
200 OK- Message sent successfully400 Bad Request- Invalid message content404 Not Found- Session does not exist500 Internal Server Error- AI service error (with fallback handling)
GET /api/v1/sessions/{session_id}/messages
Retrieves message history for a session with pagination support.
Path Parameters:
session_id(integer): The session ID
Query Parameters:
skip(optional): Number of messages to skip (default: 0)limit(optional): Maximum messages to return (default: 50, max: 100)
Response:
{
"messages": [
{
"id": 15,
"session_id": 1,
"role": "user",
"content": "What is FastAPI?",
"timestamp": "2025-01-27T12:00:00Z"
},
{
"id": 16,
"session_id": 1,
"role": "assistant",
"content": "FastAPI is a modern, fast web framework...",
"timestamp": "2025-01-27T12:00:01Z"
}
],
"total": 16,
"skip": 0,
"limit": 50,
"session_id": 1
}GET /health
Basic system health check with session metrics.
Response:
{
"status": "healthy",
"timestamp": "2025-01-27T12:00:00Z",
"services": {
"gemini_api": "configured",
"database": "connected",
"logging": "active"
},
"session_metrics": {
"total_sessions": 25,
"active_sessions": 8,
"total_messages": 247
},
"version": "1.2.0"
}The application provides a simple health check endpoint for monitoring:
GET /health
Basic system health check.
Response:
{
"status": "healthy",
"timestamp": "2025-01-27T12:00:00Z",
"services": {
"gemini_api": "configured",
"database": "connected",
"logging": "active"
},
"session_metrics": {
"total_sessions": 25,
"active_sessions": 8,
"total_messages": 247
},
"version": "1.2.0"
}All API errors follow a consistent format:
{
"detail": "Error description",
"error_code": "SPECIFIC_ERROR_CODE",
"timestamp": "2025-01-27T12:00:00Z",
"path": "/api/v1/sessions/123/chat"
}SESSION_NOT_FOUND- The requested session does not existINVALID_MESSAGE_CONTENT- Message content is empty or invalidGEMINI_API_ERROR- Error communicating with Gemini APISESSION_CREATION_FAILED- Failed to create new sessionDATABASE_ERROR- Database operation failedPERSISTENT_SESSION_ERROR- Error with persistent session managementLANGCHAIN_INITIALIZATION_ERROR- LangChain model initialization failedMEMORY_STRATEGY_ERROR- Memory management operation failedCONTEXT_OPTIMIZATION_ERROR- Context optimization failedENTITY_EXTRACTION_ERROR- Entity extraction operation failedSUMMARIZATION_ERROR- Conversation summarization failed
When LangChain integration encounters errors:
- LangChain Initialization Failure: Falls back to direct Gemini API integration
- Memory Strategy Failure: Degrades to simple buffer memory for that session
- Context Optimization Failure: Uses basic message trimming instead of smart optimization
- Entity Extraction Failure: Continues without entity tracking for that conversation
- Summarization Failure: Maintains full conversation history within token limits
- Session Recovery Failure: Falls back to stateless mode for that request
- API Errors: Provides detailed error context while maintaining database persistence
- Memory Pressure: Automatically triggers session cleanup and continues operation
Currently, Oracle does not implement API rate limiting. Rate limiting is handled by the underlying Gemini API service.
Configure via GEMINI_MODEL environment variable:
gemini-2.5-flash(default) - Balanced performance and speedgemini-2.5-flash-lite- Faster responses, lighter processinggemini-2.5-pro- Enhanced reasoning capabilitiesgemini-1.5-pro- Advanced reasoning with longer contextgemini-1.5-flash- Fast responses with good quality
Configure via SYSTEM_INSTRUCTION_TYPE environment variable:
default- General purpose helpful assistantprofessional- Business and productivity focusedtechnical- Software development specialistcreative- Creative and engaging conversational styleeducational- Teaching and learning focused
- Intelligent Memory Management: Smart memory strategies reduce token usage while improving context quality
- Token Usage Reduction: 60-80% fewer tokens consumed through optimized context selection
- Response Time Improvement: 30-50% faster responses via LangChain session caching
- Context Optimization: Automatic summarization and relevance-based context selection
- Entity Extraction: Remembers important facts without storing full conversation history
- Database Persistence: All conversations reliably stored in SQLite with memory coordination
- Automatic Cleanup: Smart session management with configurable limits
- Fallback Mechanisms: Graceful degradation when advanced features fail
- Session Reuse: Keep sessions active for ongoing conversations
- Memory Strategy Selection: Choose appropriate memory strategy based on conversation type
- Context Management: System automatically handles conversation context with LangChain optimization
- Error Handling: Implement proper error handling for graceful degradation
- Database Monitoring: Monitor database health and performance
- Token Monitoring: Track token usage improvements with LangChain integration
- Memory Configuration: Tune memory strategy parameters for optimal performance
import requests
# Create a new session
session_response = requests.post(
"http://localhost:8000/api/v1/sessions/",
json={"title": "My Chat Session"}
)
session = session_response.json()
# Send a message
message_response = requests.post(
f"http://localhost:8000/api/v1/sessions/{session['id']}/chat",
json={"message": "Hello, how can you help me today?"}
)
chat_result = message_response.json()
print(f"AI Response: {chat_result['assistant_message']['content']}")
print(f"Session ID: {chat_result['session']['id']}")// Create a new session
const sessionResponse = await fetch('http://localhost:8000/api/v1/sessions/', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ title: 'My Chat Session' })
});
const session = await sessionResponse.json();
// Send a message
const messageResponse = await fetch(`http://localhost:8000/api/v1/sessions/${session.id}/chat`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ message: 'Hello, how can you help me today?' })
});
const chatResult = await messageResponse.json();
console.log('AI Response:', chatResult.assistant_message.content);
console.log('Session ID:', chatResult.session.id);Configure memory strategies via environment variables:
# Memory strategy selection
LANGCHAIN_MEMORY_STRATEGY=hybrid # buffer, summary, entity, hybrid
# Buffer memory settings
LANGCHAIN_MAX_BUFFER_SIZE=20
LANGCHAIN_MAX_TOKENS_BEFORE_SUMMARY=4000
# Entity extraction settings
LANGCHAIN_ENTITY_EXTRACTION_ENABLED=true
# Context optimization settings
LANGCHAIN_MAX_TOKENS=4000
LANGCHAIN_MESSAGES_TO_KEEP_AFTER_SUMMARY=20
LANGCHAIN_RELEVANCE_THRESHOLD=0.7
LANGCHAIN_ENABLE_SEMANTIC_SEARCH=true
LANGCHAIN_SUMMARIZATION_TRIGGER_RATIO=0.8
# Model configuration
LANGCHAIN_SUMMARY_MODEL=gemini-2.5-flash
LANGCHAIN_TEMPERATURE=0.7
LANGCHAIN_MAX_OUTPUT_TOKENS=2048
# Monitoring and logging
LANGCHAIN_LOG_LEVEL=info
LANGCHAIN_ENABLE_PERFORMANCE_MONITORING=true
LANGCHAIN_ENABLE_TOKEN_TRACKING=trueLangChain integration supports gradual rollout via feature flags:
{
"langchain_integration": {
"state": "percentage_rollout",
"percentage": 10,
"environment_override": "ENABLE_LANGCHAIN"
},
"langchain_memory_strategies": {
"state": "disabled",
"environment_override": "ENABLE_LANGCHAIN_MEMORY"
},
"context_optimization": {
"state": "percentage_rollout",
"percentage": 25,
"environment_override": "ENABLE_CONTEXT_OPTIMIZATION"
}
}-
LangChain Initialization Errors
- Verify GEMINI_API_KEY is correctly set
- Check LANGCHAIN_ENABLED=true in environment
- Ensure LangChain dependencies are installed
-
Memory Strategy Failures
- Check LANGCHAIN_MEMORY_STRATEGY is valid (buffer, summary, entity, hybrid)
- Verify LANGCHAIN_MAX_BUFFER_SIZE > 0
- Check memory configuration parameters are within valid ranges
-
Context Optimization Issues
- Verify LANGCHAIN_MAX_TOKENS > 0
- Check LANGCHAIN_RELEVANCE_THRESHOLD is between 0.0 and 1.0
- Ensure LANGCHAIN_SUMMARIZATION_TRIGGER_RATIO is between 0.0 and 1.0
-
Entity Extraction Problems
- Check LANGCHAIN_ENTITY_EXTRACTION_ENABLED=true
- Verify sufficient conversation history for entity extraction
- Monitor logs for entity extraction errors
Monitor LangChain integration through logs and health endpoints:
# Check LangChain configuration
curl http://localhost:8000/health
# Monitor LangChain logs
grep "langchain" backend/logs/backend.log
# Check memory strategy usage
grep "memory_strategy" backend/logs/backend.log
# Monitor token usage improvements
grep "token_usage" backend/logs/backend.logOptimize LangChain performance by adjusting configuration:
-
Memory Strategy Selection:
- Use
bufferfor short conversations - Use
summaryfor long conversations with less entity tracking - Use
entityfor conversations requiring fact retention - Use
hybridfor balanced performance (recommended)
- Use
-
Token Management:
- Adjust
LANGCHAIN_MAX_TOKENSbased on model limits - Tune
LANGCHAIN_SUMMARIZATION_TRIGGER_RATIOfor summarization frequency - Configure
LANGCHAIN_MESSAGES_TO_KEEP_AFTER_SUMMARYfor context preservation
- Adjust
-
Context Optimization:
- Set
LANGCHAIN_RELEVANCE_THRESHOLDhigher for more selective context - Enable
LANGCHAIN_ENABLE_SEMANTIC_SEARCHfor better context selection - Adjust
LANGCHAIN_MAX_BUFFER_SIZEbased on conversation patterns
- Set
- Added LangChain integration with ChatGoogleGenerativeAI
- Implemented intelligent memory strategies (buffer, summary, entity, hybrid)
- Added context optimization with automatic summarization
- Implemented entity extraction and fact retention
- Added comprehensive error handling and fallback mechanisms
- Enhanced monitoring with LangChain-specific metrics
- Added feature flag support for gradual rollout
- Improved token usage efficiency through smart memory management
- Added persistent Gemini session management
- Implemented session caching with automatic cleanup
- Added session recovery from database history
- Enhanced monitoring endpoints with session health checks
- Improved performance with 60-80% token usage reduction
- Added feature flag support for safe deployment
- Added session-based architecture
- Implemented SQLite database with SQLModel
- Added comprehensive session management endpoints
- Enhanced monitoring and analytics capabilities
- Initial release with basic chat functionality
- Stateless conversation handling
- Basic health monitoring