This project is a production-style AI backend system designed to help developers debug distributed systems by combining:
- Document-based Retrieval (RAG)
- Real-time Log Ingestion (upcoming)
- LLM-based Root Cause Analysis
The system retrieves relevant context from documents (and later logs), and uses an LLM to generate accurate, grounded debugging explanations and fix suggestions.
Debugging distributed systems is difficult because:
- Logs are noisy and scattered
- Errors lack context
- Documentation is underutilized
This system solves that by combining logs, documentation, and semantic search to explain issues and suggest fixes.
Core capabilities:
- Retrieval-Augmented Generation (RAG)
- Semantic search using embeddings
- Context-aware response generation
- FastAPI-based modular backend
- Clean separation of layers (API, retrieval, service, cache)
- Qdrant vector store (persistent via Docker)
- Rich payload metadata:
  - user_id (multi-tenancy ready)
  - chunk tracking
  - embedding versioning
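
As a rough illustration, document chunks could be written to Qdrant like this. This is a minimal sketch assuming a collection named `docs` and 1536-dimensional embeddings; the collection name, IDs, and payload values are illustrative, not the project's actual code:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(host="localhost", port=6333)

# Collection sized for OpenAI-style embedding vectors (1536 dims is an assumption)
if not client.collection_exists("docs"):
    client.create_collection(
        collection_name="docs",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )

embedding = [0.0] * 1536  # placeholder; real code embeds the chunk text

client.upsert(
    collection_name="docs",
    points=[
        PointStruct(
            id=1,
            vector=embedding,
            payload={
                "user_id": "tenant-42",     # multi-tenancy filter key
                "doc_id": "runbook-7",      # which document the chunk came from
                "chunk_index": 3,           # chunk tracking within the document
                "embedding_version": "v1",  # supports re-embedding migrations
                "text": "original chunk text goes here",
            },
        )
    ],
)
```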
- Retry mechanism with exponential backoff
- Timeout protection
- Safe fallback responses
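
The pattern behind these three bullets, sketched in plain Python (the project's actual error types and fallback shape may differ):

```python
import time

def call_with_retries(fn, max_attempts: int = 3, base_delay: float = 1.0,
                      timeout: float = 30.0):
    """Call `fn(timeout=...)`, retrying with exponential backoff.

    Returns a safe fallback response if every attempt fails.
    """
    for attempt in range(max_attempts):
        try:
            return fn(timeout=timeout)
        except Exception:  # real code would catch specific API / timeout errors
            if attempt < max_attempts - 1:
                time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    # Safe fallback, shaped like a normal answer so callers need no special case
    return {
        "answer": "The service is temporarily unavailable. Please retry shortly.",
        "confidence": "low",
        "source": "fallback",
    }
```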
- Redis-based response caching
- Reduces latency and cost
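
A minimal sketch of the caching layer, assuming `redis-py` and JSON-serialized responses (the key naming and TTL are illustrative):

```python
import hashlib
import json

import redis

r = redis.Redis.from_url("redis://localhost:6379")

def _cache_key(query: str) -> str:
    # Hash the query so arbitrary text maps to a fixed-size Redis key
    return "answer:" + hashlib.sha256(query.encode()).hexdigest()

def cached_answer(query: str):
    """Return the cached response dict for `query`, or None on a miss."""
    hit = r.get(_cache_key(query))
    return json.loads(hit) if hit else None

def store_answer(query: str, response: dict, ttl_seconds: int = 3600) -> None:
    """Cache the response with a TTL so stale answers eventually expire."""
    r.setex(_cache_key(query), ttl_seconds, json.dumps(response))
```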
- API key-based authentication (middleware)
- Secure request validation
- Redis-based per-key rate limiting
- Prevents API abuse
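
One way to express both checks as FastAPI middleware; this is a fixed-window rate-limiter sketch, not necessarily the project's exact implementation:

```python
import os

import redis
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
r = redis.Redis.from_url(os.getenv("REDIS_URL", "redis://localhost:6379"))

RATE_LIMIT = 60  # requests per minute per key (illustrative)

@app.middleware("http")
async def auth_and_rate_limit(request: Request, call_next):
    # Reject requests that do not carry the expected API key header
    api_key = request.headers.get("x-api-key")
    if api_key != os.getenv("APP_API_KEY"):
        return JSONResponse({"detail": "Invalid API key"}, status_code=401)

    # Fixed-window counter: one Redis key per API key, reset every 60 seconds
    window_key = f"rl:{api_key}"
    count = r.incr(window_key)
    if count == 1:
        r.expire(window_key, 60)
    if count > RATE_LIMIT:
        return JSONResponse({"detail": "Rate limit exceeded"}, status_code=429)

    return await call_next(request)
```

A fixed window is the simplest counter to reason about; a sliding window or token bucket would smooth bursts at the cost of extra Redis state.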
Client → FastAPI API Layer → Middleware Layer (Auth + Rate Limiting) → Service Layer (LLM Orchestration) → Retrieval Layer (Qdrant Vector Search) → Cache Layer (Redis) → LLM (OpenAI API)
1. Client Request
   - User sends a query to `/api/ask`
2. Middleware Layer
   - API key authentication
   - Rate limiting (Redis-based)
3. Cache Layer (Redis)
   - Checks for a cached response
   - Returns it immediately if available
4. Retrieval Layer (RAG)
   - Converts the query into an embedding
   - Searches for similar chunks in Qdrant
   - Filters results by the similarity threshold
   - Builds the contextual input
5. LLM Service Layer
   - Constructs a structured prompt (context + query)
   - Calls the OpenAI API with retry and timeout handling
6. Response Processing
   - Formats the response (answer, confidence, source)
   - Stores the result in the Redis cache
7. Response returned to the client
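
Putting the steps together, the endpoint could be wired up roughly as below, reusing the caching and retry sketches above. `embed`, `generate_answer`, and `search_similar_chunks` are hypothetical helpers (the last is sketched in the next section), not the repository's actual function names:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AskRequest(BaseModel):
    query: str

def embed(text: str) -> list[float]:
    """Hypothetical: call the embedding model for the query text."""
    raise NotImplementedError

def generate_answer(prompt: str, timeout: float) -> dict:
    """Hypothetical: call the chat model and shape the structured response."""
    raise NotImplementedError

@app.post("/api/ask")
async def ask(req: AskRequest):
    # 1. Cache check (see the Redis caching sketch above)
    cached = cached_answer(req.query)
    if cached is not None:
        return cached

    # 2. Retrieval: embed the query, then fetch similar chunks from Qdrant
    query_vector = embed(req.query)
    context = search_similar_chunks(query_vector, user_id="tenant-42")

    # 3. LLM call, wrapped in retry / timeout / fallback (sketch above)
    prompt = f"Context:\n{context}\n\nQuestion: {req.query}"
    result = call_with_retries(lambda timeout: generate_answer(prompt, timeout))

    # 4. Cache the structured response and return it
    store_answer(req.query, result)
    return result
```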
- Qdrant (Vector Database)
  - Stores embeddings plus the metadata payload
  - Enables semantic search and filtering
- Redis
  - Caches responses
  - Holds rate-limiting counters
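
The payload metadata stored at ingest time is what makes filtered search possible. Below is a sketch of a tenant-scoped query, assuming the `docs` collection and `user_id` payload key from the earlier sketch:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(host="localhost", port=6333)

def search_similar_chunks(query_vector: list[float], user_id: str,
                          threshold: float = 0.75, top_k: int = 5) -> str:
    """Vector search over `docs`, scoped to one tenant via the payload filter."""
    hits = client.search(
        collection_name="docs",
        query_vector=query_vector,
        query_filter=Filter(
            must=[FieldCondition(key="user_id", match=MatchValue(value=user_id))]
        ),
        score_threshold=threshold,  # drop weak matches below the cutoff
        limit=top_k,
    )
    # Concatenate the stored chunk text into the LLM's context input
    return "\n\n".join(hit.payload["text"] for hit in hits)
```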
- Modular architecture with clear separation of concerns
- Config-driven system behavior (see the settings sketch after this list)
- Fault-tolerant LLM integration (retry + fallback)
- Scalable vector-based retrieval
- Secure and rate-limited API access
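
For the config-driven behavior mentioned above, one common approach is a pydantic-settings object loaded from the same `.env` file used in setup. The extra tuning knobs below are illustrative assumptions, not confirmed settings of this project:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """Central, environment-driven configuration (values come from .env)."""
    model_config = SettingsConfigDict(env_file=".env")

    app_api_key: str
    openai_api_key: str
    qdrant_host: str = "localhost"
    qdrant_port: int = 6333
    redis_url: str = "redis://localhost:6379"

    # Behavior knobs read by the retrieval and cache layers (assumed names)
    similarity_threshold: float = 0.75
    cache_ttl_seconds: int = 3600
    rate_limit_per_minute: int = 60

settings = Settings()  # import this singleton wherever config is needed
```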
- Python – core language
- FastAPI – backend API framework
- Qdrant – vector database for semantic search
- Redis – caching and rate limiting
- OpenAI API – LLM inference
- Docker – containerized services
Setup:

```bash
git clone https://github.com/your-username/your-repo.git
cd your-repo
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Create a `.env` file:

```env
APP_API_KEY=your_secret_key
OPENAI_API_KEY=your_openai_key
QDRANT_HOST=localhost
QDRANT_PORT=6333
REDIS_URL=redis://localhost:6379
```

Start Qdrant and Redis (in separate terminals), then run the API:

```bash
docker run -p 6333:6333 qdrant/qdrant
redis-server
uvicorn app.main:app --reload
```

Example request:

```http
POST /api/ask
x-api-key: your_secret_key
Content-Type: application/json

{
  "query": "What backend technologies are used?"
}
```

Example response:

```json
{
  "answer": "The backend technologies include Python, FastAPI, Redis...",
  "confidence": "medium",
  "source": "rag_docs"
}
```

Roadmap:

- Log ingestion system (`/ingest-log`)
- Incident detection (error patterns)
- Multi-source retrieval (logs + docs)
- Hybrid search (vector + keyword)
- Observability (Prometheus + Grafana)
- Multi-user API key system
- Cloud deployment (AWS)
- CI/CD pipeline
Project goal: to build a production-grade AI backend system that demonstrates:
- Backend engineering expertise
- Scalable system design
- Real-world AI/LLM integration
Akash Akuthota – Backend Developer (Python | FastAPI | AI Systems)