A FastAPI backend that handles multiple AI capabilities: document Q&A, SQL queries, voice interaction, and multi-agent coordination.
- Python 3.13 or higher
- uv package manager
- Ollama - Local LLM runtime
- LangSmith account (optional, for observability)
- Vision-capable model for multimodal features (optional):
  - `ollama pull llava` or `ollama pull llava-phi3`
  - Alternative: GPT-4V, Claude 3 (requires API keys)
- Docker and Docker Compose (optional)
- Make (optional, for convenient commands)
A production-ready AI platform with:
- 9 specialized AI agents
- 10+ tools and integrations
- Real-time chat with streaming
- Vision and multimodal support
- MCP for external services
- Full observability with LangSmith
- Human-in-the-loop workflows
- WebSocket real-time features
| Feature | Status | Phase | Files |
|---|---|---|---|
| RAG | ✅ | 1 | `rag_agent.py` |
| SQL Agent | ✅ | 2 | `sql_agent.py` |
| Custom RAG | ✅ | 3 | `custom_rag_agent.py` |
| Web Search | ✅ | 4 | `tools/web_search.py` |
| Calculator | ✅ | 4 | `tools/calculator.py` |
| Wikipedia | ✅ | 4 | `tools/wikipedia.py` |
| Memory | ✅ | 4 | `tool_agent.py` |
| Multi-Agent | ✅ | 5 | `multi_agent_router.py` |
| Streaming | ✅ | 5 | `streaming_agent.py` |
| Human-in-Loop | ✅ | 5 | `human_in_loop_agent.py` |
| Vision | ✅ | 6 | `multimodal_agent.py` |
| MCP | ✅ | 7 | `mcp/mcp_client.py`, `mcp_agent.py` |
| LangSmith | ✅ | 8 | `langsmith_service.py` |
| WebSocket Chat | ✅ | 9 | `websocket_manager.py`, `chat_service.py` |
- Document upload and indexing
- Vector search & storage with ChromaDB
- Basic RAG and semantic search with LangChain LCEL
- Document processing and chunking
- Natural language to SQL
- Database schema introspection
- Query execution and explanation
- State-based workflow
- Question analysis
- Conditional routing
- Clarification requests
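The analyze-then-route flow above can be sketched as a small state machine. This is an illustrative stand-in using only the standard library; the names (`QAState`, `analyze`, `route`) and the "short question is ambiguous" heuristic are assumptions, not the actual `custom_rag_agent.py` API.

```python
from dataclasses import dataclass

@dataclass
class QAState:
    question: str
    needs_clarification: bool = False
    route: str = ""

def analyze(state: QAState) -> QAState:
    # Trivial stand-in for LLM-based question analysis:
    # treat very short questions as ambiguous.
    state.needs_clarification = len(state.question.split()) < 3
    return state

def route(state: QAState) -> QAState:
    # Conditional routing: ambiguous questions go to a clarification
    # node; everything else proceeds to retrieval + generation.
    state.route = "clarify" if state.needs_clarification else "retrieve"
    return state

print(route(analyze(QAState("RAG?"))).route)                            # → clarify
print(route(analyze(QAState("How does chunking affect recall?"))).route)  # → retrieve
```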
- Web search (DuckDuckGo)
- Calculator
- Wikipedia
- Conversation memory with SQLite checkpoints
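The calculator tool can be sketched as a safe AST-based evaluator, which avoids `eval()` on model-generated input. This is an illustrative sketch, not the actual `tools/calculator.py` implementation.

```python
import ast
import operator

# Map AST operator nodes to their arithmetic functions.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculate(expression: str) -> float:
    """Evaluate an arithmetic expression without eval()."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp):
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp):
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expression, mode="eval").body)

print(calculate("2 * (3 + 4)"))  # → 14
```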
- Short-Term Memory - SQLite checkpoint persistence
- Conversation History - Thread-based tracking
- Tool Agent - Orchestrates all tools with memory
- Multi-Agent Router - Intelligent routing to specialists
- Streaming Agent - Real-time token/event streaming
- Human-in-the-Loop - Breakpoints for sensitive operations
- State Management - Advanced LangGraph workflows
- Customer Support Agent - Handoffs between specialists
- Long-term memory (user profiles, preferences, facts)
- Parallel execution
- Deep research with sub-graphs
- Synthesis from multiple sources
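The parallel-execution-plus-synthesis pattern above can be sketched with `asyncio.gather`: fan out several searches concurrently, then merge the results. The sources and the join-based `synthesize` step are illustrative; in the real agent an LLM performs the synthesis.

```python
import asyncio

async def search(source: str, query: str) -> str:
    # Stand-in for network I/O against a real search backend.
    await asyncio.sleep(0)
    return f"{source} result for {query!r}"

async def parallel_search(query: str) -> str:
    # Run all searches concurrently rather than one after another.
    results = await asyncio.gather(
        search("web", query),
        search("wikipedia", query),
        search("vector-store", query),
    )
    # Synthesis step (illustrative): merge the partial results.
    return " | ".join(results)

print(asyncio.run(parallel_search("LangGraph")))
```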
- Image Analysis - Describe and analyze images
- Multi-Image Comparison - Compare multiple images
- OCR - Extract text from images
- Document + Image - Combine text and visual analysis
- Vision Models - Support for GPT-4V, Claude 3, Llava
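Vision requests typically package the image as base64 inside the chat payload. The sketch below follows the common OpenAI-style message shape; it is illustrative, not the exact format used by `multimodal_agent.py`.

```python
import base64

def image_payload(image_bytes: bytes, prompt: str) -> dict:
    """Build a vision-model message embedding the image as a data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }

payload = image_payload(b"\x89PNG...", "Describe this image")
print(payload["content"][0]["text"])  # → Describe this image
```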
- MCP Client - Connect to external MCP servers
- Server Management - Add/remove MCP servers dynamically
- Tool Discovery - Auto-discover available tools
- MCP Agent - AI-powered tool selection and execution
- Pre-configured Presets - GitHub, Slack, Notion, Google Drive
- Auto-Tracing - Automatic trace capture
- Project Management - Organize runs by project
- Feedback System - Collect user feedback
- Dataset Creation - Build evaluation datasets
- Run Analytics - View performance statistics
- Error Tracking - Monitor failures
- WebSocket Connections - Real-time bidirectional communication
- Connection Management - Multi-device support
- Message Broadcasting - Broadcast to conversation participants
- Typing Indicators - Show when users are typing
- Read Receipts - Track message read status
- Multi-Device Sync
- Agent Streaming - Stream AI responses in real-time
- Message Persistence - SQLite database storage
- Conversation History - Load past messages
- Pagination - Efficient message loading
- Multi-User Support - Multiple users per conversation
- Reconnection Handling - Graceful disconnect/reconnect
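A client consumes these features by dispatching on the `type` field of each JSON message from the WebSocket. The handler below is a hypothetical client-side sketch; the return strings are illustrative, though the message fields match the envelope used by the chat endpoint.

```python
import json

def handle(raw: str) -> str:
    """Dispatch one incoming WebSocket frame by message type."""
    msg = json.loads(raw)
    if msg["type"] == "agent_stream":
        return f"chunk: {msg['content']}"
    if msg["type"] == "agent_complete":
        return f"done: {msg['content']}"
    if msg["type"] == "typing":
        return "peer is typing..." if msg["is_typing"] else "peer stopped typing"
    if msg["type"] == "read":
        return f"read: {msg['message_id']}"
    return f"unhandled: {msg['type']}"

print(handle('{"type": "agent_stream", "content": "Lang"}'))  # → chunk: Lang
```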
- Install Ollama from https://ollama.ai
- Pull required models:
  ```bash
  ollama pull llama3.2
  ollama pull nomic-embed-text

  # For vision features (optional):
  ollama pull llava
  # Or a smaller vision model:
  ollama pull llava-phi3
  ```
The FastAPI server exposes WebSocket endpoints, but you need a client to connect to them. The test clients let you:
- Test that WebSocket chat works
- See messages streaming in real-time
- Test multiple users (open multiple browser tabs)
- Debug connection issues
There are several ways to run a test client:
- Browser Client (easiest - no installation or dependencies needed)
  - Start the server:
    ```bash
    uv run fastapi dev app/main.py
    ```
  - Open the HTML file in your browser: double-click `tests/websocket/test_websocket_client.html`, or from a terminal:
    ```bash
    open tests/websocket/test_websocket_client.html
    ```
  - You'll see a chat interface:
    - Click "Connect"
    - Type messages
    - See AI responses stream in real-time
- Python Client (for testing from the terminal)
  - Install websockets:
    ```bash
    uv add websockets --dev
    ```
  - Start the server (in one terminal):
    ```bash
    uv run fastapi dev app/main.py
    ```
  - Run the test:
    ```bash
    python tests/websocket/test_websocket_client.py
    ```
  - Watch it connect and chat:
    ```text
    Connecting to ws://localhost:8000/ws/chat/test-conversation?token=user123...
    Connected as user123!
    Sending message...
    Agent is thinking...
    LangChain is a framework... (streaming in real-time)
    Agent response complete!
    ```
- Node.js Client (for JavaScript development)
  - Install the ws package:
    ```bash
    npm install ws
    ```
  - Run the test:
```bash
node tests/websocket/test_websocket_client.js
```

Test scenarios:
- Basic Chat:
  ```bash
  python tests/websocket/test_websocket_client.py basic
  ```
  Connects, sends one message, receives the AI response, and disconnects.
- Multi-Turn Memory:
  ```bash
  python tests/websocket/test_websocket_client.py memory
  ```
  Tests whether the AI remembers context:
  - "My name is Alice"
  - "What's my name?" → the AI should answer "Alice"
- Typing Indicators:
  ```bash
  python tests/websocket/test_websocket_client.py typing
  ```
  Tests typing status updates and shows "user is typing..." messages.
- Sign up at https://smith.langchain.com
- Create API key
- Add to `.env`:
  ```bash
  LANGSMITH_ENABLED=true
  LANGSMITH_API_KEY=your_key_here
  LANGSMITH_PROJECT=my-project
  ```
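The same settings can also be supplied programmatically before the app starts. This mirrors the `.env` entries; note that `LANGSMITH_ENABLED` is this project's own flag, read through its config module, not a variable the LangSmith SDK itself defines.

```python
import os

# Equivalent of the .env entries, set on the process environment.
os.environ["LANGSMITH_ENABLED"] = "true"
os.environ["LANGSMITH_API_KEY"] = "your_key_here"
os.environ["LANGSMITH_PROJECT"] = "my-project"

print(os.environ["LANGSMITH_PROJECT"])  # → my-project
```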
Example MCP servers:

```python
# GitHub
mcp_client.add_server("github", "http://localhost:3000/mcp", token)
# Slack
mcp_client.add_server("slack", "https://slack.com/api/mcp", token)
# Custom
mcp_client.add_server("my-service", "https://api.myservice.com/mcp")
```

Example MCP - GitHub Integration
```python
# First, configure the MCP server in services.py:
services.mcp_client.add_server(
    name="github",
    url="http://localhost:3000/mcp",
    api_key=os.getenv("GITHUB_TOKEN")
)
```

```bash
# Then use via the API:
curl -X POST "http://localhost:8000/api/v1/mcp/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "List my GitHub repositories"}'
```

MCP Server Setup
Example MCP server configurations are available in `mcp_client.py`:
- GitHub
- Slack
- Notion
- Google Drive
- Custom servers
```bash
# Install dependencies
make install
# OR
uv sync

# Setup environment
cp .env.example .env
# Edit .env with your keys

# Run server
make run
# OR
uv run fastapi dev app/main.py

# Run tests
make test-all
# OR
chmod +x tests/test_all_phases.sh && ./tests/test_all_phases.sh

# Test individual phases
make test-phase1
make test-phase4
make test-phase6

# Test streaming
make test-streaming
```

- `POST /api/v1/upload-documents` - Upload documents
- `POST /api/v1/semantic-search` - Semantic search
- `POST /api/v1/rag-query` - RAG query
- `POST /api/v1/sql-query` - Natural language to SQL
- `GET /api/v1/database-schema` - Get schema

- `POST /api/v1/custom-rag` - Advanced RAG with reasoning

- `POST /api/v1/tool-agent` - Query with tools
- `GET /api/v1/tool-agent/history/{thread_id}` - Get history
- `DELETE /api/v1/tool-agent/history/{thread_id}` - Clear history

- `POST /api/v1/multi-agent` - Intelligent routing
- `POST /api/v1/customer-support` - Support agent
- `POST /api/v1/stream` - Streaming responses
- `POST /api/v1/human-in-loop` - HIL queries
- `POST /api/v1/human-in-loop/approve` - Approve action

- `POST /api/v1/memory-agent` - Query with long-term memory
- `POST /api/v1/memory-agent/profile` - Create user profile
- `GET /api/v1/memory-agent/user/{user_id}` - Get user info
- `POST /api/v1/parallel-search` - Parallel search
- `POST /api/v1/research` - Deep research
- `POST /api/v1/vision/analyze` - Analyze single image
- `POST /api/v1/vision/compare` - Compare multiple images
- `POST /api/v1/vision/ocr` - Extract text from image

- `GET /api/v1/mcp/servers` - List MCP servers
- `GET /api/v1/mcp/tools` - List available tools
- `POST /api/v1/mcp/query` - Query using MCP tools
- `POST /api/v1/mcp/servers` - Add MCP server
- `DELETE /api/v1/mcp/servers/{name}` - Remove MCP server

- `GET /api/v1/langsmith/stats` - Get project statistics
- `GET /api/v1/langsmith/runs` - List recent runs
- `POST /api/v1/langsmith/feedback` - Add feedback to a run
- `GET /api/v1/langsmith/run/{run_id}` - Get run details
All agent calls are automatically traced when LangSmith is enabled. View traces at: https://smith.langchain.com
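Any of the REST endpoints can be exercised from Python with only the standard library. The sketch below builds (but does not send) a request to the RAG endpoint; the `question` field name mirrors the MCP example above but is an assumption for this endpoint's schema.

```python
import json
import urllib.request

def build_rag_query(question: str) -> urllib.request.Request:
    """Build a POST request to the RAG endpoint (not sent here,
    since it needs the server running on localhost:8000)."""
    return urllib.request.Request(
        "http://localhost:8000/api/v1/rag-query",
        data=json.dumps({"question": question}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_rag_query("What documents mention chunking?")
print(req.get_method(), req.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen` while the server is running.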
WebSocket Endpoint:
- `WS /ws/chat/{conversation_id}?token=user_id` - Real-time chat
REST Endpoints:
- `POST /api/v1/chat/conversations` - Create conversation
- `GET /api/v1/chat/conversations/{id}` - Get conversation
- `GET /api/v1/chat/conversations/{id}/messages` - Get messages
- `GET /api/v1/chat/users/{user_id}/conversations` - List user conversations
- `GET /api/v1/chat/stats` - WebSocket stats
Message Types:
```javascript
// User message
{type: "message", content: "Hello"}
// Typing indicator
{type: "typing", is_typing: true}
// Read receipt
{type: "read", message_id: "msg_123"}
// Agent streaming
{type: "agent_stream", content: "chunk..."}
// Agent complete
{type: "agent_complete", content: "full response"}
```

Project Structure:
```text
langchain_learn
├── app
│   ├── core
│   │   ├── __init__.py
│   │   ├── config.py
│   │   └── services.py
│   ├── models
│   │   ├── __init__.py
│   │   └── chat.py
│   ├── routes
│   │   ├── __init__.py
│   │   ├── agents.py
│   │   └── websocket.py
│   ├── services
│   │   ├── agents
│   │   │   ├── __init__.py
│   │   │   ├── custom_rag_agent.py
│   │   │   ├── human_in_loop_agent.py
│   │   │   ├── mcp_agent.py
│   │   │   ├── memory_agent.py
│   │   │   ├── multi_agent_router.py
│   │   │   ├── multimodal_agent.py
│   │   │   ├── parallel_agent.py
│   │   │   ├── rag_agent.py
│   │   │   ├── research_agent.py
│   │   │   ├── sql_agent.py
│   │   │   ├── streaming_agent.py
│   │   │   ├── support_agent.py
│   │   │   └── tool_agent.py
│   │   ├── mcp
│   │   │   ├── __init__.py
│   │   │   └── mcp_client.py
│   │   ├── memory
│   │   │   ├── __init__.py
│   │   │   └── long_term_memory.py
│   │   ├── tools
│   │   │   ├── __init__.py
│   │   │   ├── calculator.py
│   │   │   ├── database_tools.py
│   │   │   ├── web_search.py
│   │   │   └── wikipedia.py
│   │   ├── __init__.py
│   │   ├── chat_service.py
│   │   ├── database.py
│   │   ├── langsmith_service.py
│   │   ├── vector_store.py
│   │   └── websocket_manager.py
│   ├── utils
│   │   ├── __init__.py
│   │   └── utc_now.py
│   ├── __init__.py
│   ├── dependencies.py
│   ├── main.py
│   └── schemas.py
├── data
├── tests
│   ├── stress_test.sh
│   ├── test_all_phases.sh
│   ├── test_phase1.sh
│   ├── test_phase2.sh
│   ├── test_phase3.sh
│   ├── test_phase4.sh
│   └── test_streaming.sh
├── .env.example
├── .gitignore
├── .python-version
├── Makefile
├── prek.toml
├── pyproject.toml
├── README.md
└── uv.lock
```

```bash
# Format code
make format
# Lint
make lint
# Health check
make health
```

```bash
# Full test suite
make test-all
# Streaming tests
make test-streaming
# Stress test
make test-stress
```

MIT