Last Updated: January 2025
Status: Active Development
This document tracks what needs to be implemented, in priority order. Use this as the source of truth for Claude Code / Cline development sessions.
The Express API and React dashboard are largely complete. The main work remaining is implementing the Python FastAPI backend that handles AI chat processing.
- User authentication (login/logout)
- Session management with PostgreSQL
- Client management CRUD
- Widget configuration CRUD
- API key generation and validation
- Domain restriction middleware
- Widget file serving with CDN headers
- Python backend service client (server/services/python-backend.ts)
- Widget routes v2 (server/routes/widget-routes-v2.ts)
- Login page
- Overview dashboard
- Widget configuration page with live preview
- Settings page with API key management
- Knowledge base page (UI only)
- Analytics page (mock data)
- Theme support (light/dark)
- Vanilla JS widget (v1 and v2)
- CSS with theming support
- Session management
- Streaming support (v2)
- Docker Compose setup
- PostgreSQL schema (Drizzle ORM)
- Multi-tenant isolation in schema
These services form the foundation of AI chat processing.
File: ai-backend/src/services/qdrant.py
Estimated: 2-3 hours
Review Required: Yes (security - multi-tenant filtering)
Implement vector store operations:
- Initialize Qdrant client connection
- Create collection if not exists
- Search with client_id filtering (CRITICAL for multi-tenant isolation)
- Upsert vectors with metadata
- Delete vectors by document_id
Key Pattern:

```python
# ALWAYS filter by client_id
results = await qdrant.search(
    collection_name="documents",
    query_vector=embedding,
    query_filter=Filter(
        must=[FieldCondition(key="client_id", match=MatchValue(value=client_id))]
    ),
)
```

File: ai-backend/src/services/llm.py
Estimated: 3-4 hours
Review Required: Yes (cost implications, prompt design)
Implement LLM providers:
- Anthropic Claude client with prompt caching
- OpenAI GPT-4o-mini client
- Model selection based on client tier
- Token tracking for usage billing
- Streaming response support
Tier Logic:
- Free tier → GPT-4o-mini
- Paid tier → Claude Sonnet 4.5
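The tier logic above can be expressed as a small lookup. This is a minimal sketch: the tier names come from this document, but the exact model identifier strings are assumptions and should be checked against the providers' current model lists before use.

```python
# Hypothetical tier -> (provider, model) mapping for llm.py.
# Model ID strings are placeholders; verify against provider docs.
TIER_MODELS = {
    "free": ("openai", "gpt-4o-mini"),
    "paid": ("anthropic", "claude-sonnet-4-5"),
}

def select_model(tier: str) -> tuple[str, str]:
    """Return (provider, model) for a client tier, defaulting to free."""
    return TIER_MODELS.get(tier, TIER_MODELS["free"])
```

Defaulting unknown tiers to the free model keeps a bad tier value from silently routing traffic to the more expensive provider.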
File: ai-backend/src/services/cache.py
Estimated: 1-2 hours
Review Required: No
Implement caching:
- Query result caching (5 min TTL)
- Embedding caching (1 hour TTL)
- Rate limit counter storage
- Session data (optional)
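The TTL behavior the cache needs can be sketched without Redis. The in-memory class below mirrors the SETEX/GET semantics the real cache.py would get from redis-py, with the TTL constants taken from the list above; everything else (class name, API shape) is illustrative.

```python
import time

QUERY_TTL = 300       # query results: 5 minutes
EMBEDDING_TTL = 3600  # embeddings: 1 hour

class TTLCache:
    """In-memory stand-in for the Redis cache (SETEX/GET semantics)."""

    def __init__(self) -> None:
        self._store: dict[str, tuple[float, object]] = {}

    def set(self, key: str, value: object, ttl_seconds: int) -> None:
        self._store[key] = (time.monotonic() + ttl_seconds, value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy expiry on read
            return None
        return value
```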
File: ai-backend/src/services/postgres.py
Estimated: 2-3 hours
Review Required: Yes (multi-tenant queries)
Implement database operations:
- Connection pool setup
- Client validation by API key
- Chat log insertion
- Usage stats updates
- Document metadata queries
File: ai-backend/src/graph/widget_state.py
Estimated: 1 hour
Review Required: No
Define the state schema for the chat workflow.
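A TypedDict is one natural shape for this state. The field names below are assumptions inferred from the retrieve/generate/log nodes described later in this document, not the final schema.

```python
from typing import TypedDict

class WidgetState(TypedDict, total=False):
    """Hypothetical widget chat state; field names are illustrative."""
    client_id: str
    query: str
    context: str          # retrieved chunks, formatted for the LLM prompt
    sources: list[dict]   # citation metadata extracted after generation
    response: str
    tokens_used: int
```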
File: ai-backend/src/graph/nodes/widget/retrieve.py
Estimated: 2-3 hours
Review Required: Yes (query construction, relevance)
Implement context retrieval:
- Embed user query
- Search Qdrant with client_id filter
- Score and rank results
- Format context for LLM
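The score/rank/format steps can be sketched independently of Qdrant. Note that Qdrant already returns similarity scores with its hits; the local cosine re-ranking below is purely for illustration, and the `[n]`-prefixed context format is an assumption.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_and_format(query_vec: list[float],
                    hits: list[tuple[str, list[float]]],
                    top_k: int = 3) -> str:
    """Rank candidate chunks against the query embedding, keep the best
    top_k, and join them into a numbered context block for the prompt."""
    ranked = sorted(hits, key=lambda h: cosine(query_vec, h[1]), reverse=True)
    top = [text for text, _ in ranked[:top_k]]
    return "\n\n".join(f"[{i + 1}] {t}" for i, t in enumerate(top))
```

The numbered prefixes give the generate node stable markers to cite when extracting sources.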
File: ai-backend/src/graph/nodes/widget/generate.py
Estimated: 3-4 hours
Review Required: Yes (prompt engineering, response quality)
Implement response generation:
- Build prompt with context
- Select LLM based on tier
- Generate response (streaming)
- Extract sources for citations
File: ai-backend/src/graph/nodes/widget/log.py
Estimated: 1-2 hours
Review Required: No
Implement async logging:
- Log to PostgreSQL (non-blocking)
- Update usage stats
- Track token consumption
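The non-blocking requirement can be met with a fire-and-forget task. This sketch swaps the real asyncpg insert for an in-memory list; the task-reference bookkeeping is the standard way to keep background tasks from being garbage-collected mid-flight.

```python
import asyncio

LOG_SINK: list[dict] = []          # stand-in for the chat_logs table
_pending: set[asyncio.Task] = set()  # keep refs so tasks aren't GC'd

async def write_chat_log(entry: dict) -> None:
    # The real version would await an asyncpg execute() here.
    LOG_SINK.append(entry)

async def log_node(state: dict) -> dict:
    """Schedule the log insert without awaiting it, so the chat
    response is never blocked on the database write."""
    task = asyncio.create_task(write_chat_log({
        "client_id": state["client_id"],
        "query": state["query"],
        "tokens_used": state.get("tokens_used", 0),
    }))
    _pending.add(task)
    task.add_done_callback(_pending.discard)
    return state
```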
File: ai-backend/src/graph/widget_workflow.py
Estimated: 1-2 hours
Review Required: Yes (flow correctness)
Wire up the workflow:
- Create StateGraph
- Add nodes
- Define edges
- Compile workflow
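The real widget_workflow.py would use langgraph's StateGraph (add_node / add_edge / compile). The dependency-free sketch below shows only the equivalent control flow, with stub node bodies; the wiring order is the point.

```python
# Stub nodes standing in for retrieve.py, generate.py, and log.py.
def retrieve(state: dict) -> dict:
    state["context"] = f"context for: {state['query']}"
    return state

def generate(state: dict) -> dict:
    state["response"] = f"answer using {state['context']}"
    return state

def log(state: dict) -> dict:
    state["logged"] = True
    return state

# Edges: retrieve -> generate -> log (what add_edge calls would declare).
NODES = [retrieve, generate, log]

def run_workflow(state: dict) -> dict:
    for node in NODES:
        state = node(state)
    return state
```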
File: ai-backend/src/processing/parser.py
Estimated: 3-4 hours
Review Required: No
Parse uploaded documents:
- PDF text extraction
- DOCX parsing
- TXT/CSV handling
- Error handling for corrupt files
File: ai-backend/src/processing/chunker.py
Estimated: 2-3 hours
Review Required: Yes (chunk size affects quality)
Implement chunking:
- Split into ~500 token chunks
- 50 token overlap
- Preserve paragraph boundaries
- Handle edge cases
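The size/overlap arithmetic is the subtle part of the chunker, so here it is in isolation. Words stand in for tokens; the real chunker.py would count tokens with a tokenizer and also respect paragraph boundaries, which this sketch ignores.

```python
def chunk_words(words: list[str], size: int = 500,
                overlap: int = 50) -> list[list[str]]:
    """Split a token list into overlapping chunks of `size`, where each
    chunk repeats the last `overlap` tokens of the previous one."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, step = [], size - overlap
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # final chunk reached; avoid a trailing overlap-only chunk
    return chunks
```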
File: ai-backend/src/processing/embedder.py
Estimated: 2-3 hours
Review Required: No
Generate embeddings:
- Batch embedding API calls
- Cache embeddings in Redis
- Handle rate limits
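Batching is mostly list arithmetic, sketched here with the API client injected as a callable. The batch size of 100 is an assumed per-request limit, not a documented one; check the embedding provider's documentation.

```python
def batched(items: list[str], batch_size: int = 100) -> list[list[str]]:
    """Group texts so each embeddings API call stays under the
    provider's per-request limit (100 is an assumed limit)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def embed_all(texts: list[str], embed_batch) -> list[list[float]]:
    """Embed texts batch by batch. `embed_batch` is whatever client
    call performs one API request; Redis caching and rate-limit backoff
    would wrap this loop in the real embedder.py."""
    vectors: list[list[float]] = []
    for batch in batched(texts):
        vectors.extend(embed_batch(batch))
    return vectors
```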
File: ai-backend/src/processing/pipeline.py
Estimated: 2-3 hours
Review Required: Yes (end-to-end flow)
Wire up the pipeline:
- Orchestrate parse → chunk → embed → store
- Progress tracking
- Error handling and retry
- Callback to Express for status updates
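The retry behavior can be isolated into a small helper. This sketch uses a fixed number of attempts with a flat delay; the real pipeline would likely use exponential backoff and report terminal failures back to Express via the status callback.

```python
import time

def with_retry(fn, attempts: int = 3, delay: float = 0.0):
    """Run one pipeline stage, retrying transient failures a few times
    before re-raising the last error to the caller."""
    last_err = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as err:
            last_err = err
            time.sleep(delay)
    raise last_err
```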
File: ai-backend/src/api/routes.py
Estimated: 3-4 hours
Review Required: Yes (API contract)
Complete the API:
- POST /api/widget/chat (non-streaming)
- POST /api/widget/chat/stream (SSE)
- API key validation middleware
- Rate limiting middleware
- CORS configuration
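One detail of the API key middleware worth pinning down early (it is also on the human-review list below) is the comparison itself. A sketch, assuming the expected key is already in hand; in practice the middleware would look it up (hashed) in PostgreSQL:

```python
import hmac

def api_key_valid(provided: str, expected: str) -> bool:
    """Constant-time API key comparison. A naive `==` short-circuits on
    the first differing byte and can leak timing information."""
    return hmac.compare_digest(provided.encode(), expected.encode())
```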
Estimated: 2-3 hours
Review Required: Yes (file security)
Add document processing:
- POST /api/widget/process-document
- GET /api/widget/documents/{id}/status
- Async processing queue
Estimated: 4-6 hours
Review Required: Yes (coverage)
- Service layer tests
- Node function tests
- API route tests
Estimated: 4-6 hours
Review Required: Yes
- End-to-end chat flow
- Document processing flow
- Multi-tenant isolation verification
Estimated: 2-3 hours
Review Required: Yes (limits appropriate)
- Implement Redis-based rate limiting
- Free tier: 100 req/hour
- Paid tier: 1000 req/hour
- Overage tracking for billing
| Priority | Task | Estimated | Requires Review |
|---|---|---|---|
| P0 | Qdrant Service | 2-3h | Yes |
| P0 | LLM Service | 3-4h | Yes |
| P0 | Chat Endpoints | 3-4h | Yes |
| P0 | LangGraph Workflow | 8-10h | Yes |
| P1 | Redis Cache | 1-2h | No |
| P1 | PostgreSQL Service | 2-3h | Yes |
| P1 | Document Processing | 10-12h | Yes |
| P2 | Testing | 8-12h | Yes |
| P2 | Rate Limiting | 2-3h | Yes |
Total Estimated Time: 40-55 hours
The following MUST have human review before merging:
- Any Qdrant query - Must verify client_id filter
- Any PostgreSQL query - Must verify client_id filter
- API key validation - Must verify secure comparison
- Rate limiting logic - Must verify correct limits
- LLM prompts - Must verify no prompt injection risks
- File upload handling - Must verify type/size validation
- Architecture: ../ARCHITECTURE.md
- Billing/Limits: ../BILLING_AND_LIMITS.md
- Detailed Tasks: PYTHON_BACKEND_INTEGRATION_TASKS.md
- Widget Backend Details: WIDGET_LIGHT_BACKEND_TASKS.md
- Read this task list to understand priorities
- Check HUMAN_REVIEW_GUIDE.md for review requirements
- Read .clinerules/clinerules for coding patterns
- Start with a P0 task that doesn't require review (or get review first)
- After completing, update this document with [x] checkmarks