Agile vertical slices - each increment delivers a working, visible feature.
1. Walking Skeleton
        ↓
2. LLM Integration
        ↓
3. Single Document RAG ←─── Core MVP (functional Q&A)
        ↓
4. Multi-Document
        ↓
5. RAG Quality Hardening ←─── Quality MVP (caching, evaluation, hybrid search)
        ↓
   ┌────┴────┐
   ↓         ↓
6. Safety   7. Auth
   ↓         ↓
   └────┬────┘
        ↓
8. Conversation History
        ↓
9. Audit Logging
        ↓
   ┌──────┴──────┐
   ↓             ↓
10. Observability   11. Feedback
   ↓             ↓
   └──────┬──────┘
        ↓
12. Mobile Polish
        ↓
13. Production

Core MVP = Increments 1-4 (functional Q&A from documents)
Quality MVP = Increments 1-5 (production-grade RAG with caching and evaluation)
Increment 1: Walking Skeleton
Goal: Prove the stack works end-to-end. Hardcoded everything.
Deliverable: A web page where you type a question and get a response.
[Browser] → [FastAPI] → [Hardcoded response] → [Browser]
What you'll see:
- Visit http://localhost:8000 - See a simple chat interface
- Type "Hello" → Get "Hello! I'm Retriever, the volunteer assistant."
Build:
- Project setup (pyproject.toml, dev container)
- FastAPI app with single route
- Jinja2 template with Tailwind (chat UI)
- HTMX for form submission
- Hardcoded response (no LLM yet)
- Health endpoint /health
- Input validation (1-2000 chars) (added during code review)
- XSS prevention tests (added during code review)
- Pre-commit hooks for automated quality checks (ruff, mypy)
- GitHub Actions CI pipeline (lint, type-check, test on PR)
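The input-validation and XSS-prevention items above can be sketched in a few lines; this is an illustrative stand-in (the function name and exact sanitization are assumptions, not the project's actual code):

```python
import html

MAX_QUESTION_CHARS = 2000  # upper bound from the validation rule above


def validate_question(raw: str) -> str:
    """Enforce the 1-2000 char rule and sanitize for safe HTML rendering."""
    text = raw.strip()
    if not 1 <= len(text) <= MAX_QUESTION_CHARS:
        raise ValueError("question must be 1-2000 characters")
    # Escape HTML so the HTMX-swapped response fragment cannot inject markup
    return html.escape(text)
```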
Validates: Dev environment, FastAPI, HTMX, Tailwind, deployment pipeline
Status: ✅ Complete (PR #1)
Increment 2: LLM Integration
Goal: Replace hardcoded response with actual Claude via OpenRouter.
Deliverable: Ask any question, get a real AI response (no RAG yet).
[Browser] → [FastAPI] → [OpenRouter/Claude] → [Browser]
What you'll see:
- Ask "What's the capital of France?" → Get actual Claude response
- See loading state while waiting
- See error message if API fails
Build:
- OpenRouter provider (Protocol-based)
- Environment config for API keys
- Loading spinner in UI (HTMX built-in from Increment 1)
- Error handling + display
- Request timeouts (30s for LLM calls)
- Circuit breaker for LLM calls (fail fast after 5 failures)
- Rate limiting (10 requests/minute per session)
- Input validation (max 2000 chars, basic sanitization) (done in Increment 1)
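The circuit breaker above ("fail fast after 5 failures") might look like the following minimal sketch; the class name, cooldown value, and half-open behavior are assumptions for illustration:

```python
import time


class CircuitBreaker:
    """Fail fast after repeated LLM failures instead of queueing slow timeouts."""

    def __init__(self, max_failures: int = 5, reset_after: float = 60.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        # While open, reject calls until the cooldown elapses
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return False
            # Half-open: permit a trial call after the cooldown
            self.opened_at = None
            self.failures = 0
        return True

    def record_success(self) -> None:
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()
```

The provider would check `allow()` before each OpenRouter call and return the error UI immediately when the circuit is open.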
Validates: LLM integration, provider abstraction, error handling, resilience
Status: ✅ Complete
Increment 3: Single Document RAG
Goal: Load ONE document, answer questions from it.
Deliverable: Upload/index a document, ask questions about it.
[Document] → [Chunks] → [Embeddings] → [Chroma]
↓
[Question] → [Retrieve] → [Claude + Context] → [Answer]
What you'll see:
- Admin page: "Index document" button
- Index one shelter document
- Ask "Where do volunteers sign in?" → Answer from document
- See which chunks were used (debug view)
Build:
- Document loader (markdown/text first)
- Text chunker (structure-aware)
- OpenAI embeddings
- Chroma vector store
- RAG pipeline (retrieve + generate)
- Admin page to trigger indexing
- Show retrieved chunks in response
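"Structure-aware" chunking as listed above could mean splitting on markdown headings first, then capping oversized sections by paragraph; this sketch assumes that interpretation (the function name and size limit are illustrative):

```python
def chunk_markdown(text: str, max_chars: int = 800) -> list[str]:
    """Split on headings so chunks follow document structure, then cap size."""
    sections: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.startswith("#") and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())

    # Break oversized sections on paragraph boundaries so each chunk
    # stays within the embedding model's comfortable input size
    chunks: list[str] = []
    for section in sections:
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        buf = ""
        for para in section.split("\n\n"):
            if buf and len(buf) + len(para) + 2 > max_chars:
                chunks.append(buf)
                buf = para
            else:
                buf = f"{buf}\n\n{para}" if buf else para
        if buf:
            chunks.append(buf)
    return [c for c in chunks if c]
```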
Validates: Full RAG pipeline, chunking strategy, retrieval quality
Status: ✅ Complete (PR #3)
Increment 4: Multi-Document
Goal: Index multiple documents, show sources in answers.
Deliverable: Index all shelter docs, answers cite their sources.
What you'll see:
- Index multiple documents
- Ask question → Answer with expandable source citations
- Admin view of all indexed documents with metadata
Build:
- Word document loader (.docx) - Not needed (using .md/.txt only)
- Document metadata (title, section, filename)
- Source citation in answers (expandable citation cards)
- Document list in admin (with title, type badges)
- Re-index capability (already existed from Increment 3)
Enhancements (discovered during implementation):
- Markdown rendering in chat answers (currently shows raw markup)
- Indexing progress indicator in admin panel (no feedback during indexing)
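The citation cards described above need chunk-level metadata carried through retrieval; a minimal sketch (the dataclass fields mirror the metadata bullet, but names and formatting are assumptions):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A retrieved chunk plus the metadata a citation card needs."""
    text: str
    title: str
    section: str
    filename: str


def format_citations(chunks: list[Chunk]) -> str:
    """Render deduplicated source citations in retrieval order."""
    seen: list[str] = []
    for c in chunks:
        label = f"{c.title}, {c.section} ({c.filename})"
        if label not in seen:
            seen.append(label)
    return "\n".join(f"[{i}] {s}" for i, s in enumerate(seen, start=1))
```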
Validates: Multi-document handling, citation accuracy
Status: ✅ Core complete (PR pending)
Increment 5: RAG Quality Hardening
Goal: Production-grade RAG quality: caching, evaluation, hybrid retrieval.
Deliverable: Faster responses, measurable quality, better retrieval.
What you'll see:
- Repeated questions return instantly (~50ms vs ~3s)
- RAG quality tests run in CI with pass/fail
- Answers cite sources more accurately
Build:
- Semantic caching (cache by question similarity)
- Golden Q&A dataset (30+ examples from real docs)
- RAG quality tests (retrieval accuracy, answer accuracy)
- Hybrid retrieval (semantic + BM25 keyword search)
- Reranking integration (Cohere or RRF)
- Cache invalidation on document reindex
- Quality metrics logging
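The RRF option mentioned above (Reciprocal Rank Fusion) merges the semantic and BM25 rankings without needing comparable scores; this is the standard formula with the conventional k=60:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each doc scores sum(1 / (k + rank)) across lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that appear high in both the semantic and keyword lists rise to the top, which is why hybrid retrieval tends to beat either method alone on mixed question styles.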
Validates: Cache effectiveness, retrieval quality improvement, regression detection
Status: ✅ Complete
Increment 6: Safety
Goal: Filter inappropriate content, detect attacks, prevent hallucinations.
Deliverable: Safe, accurate answers with attack prevention.
What you'll see:
- Ask inappropriate question → "I can only help with volunteer questions"
- Prompt injection attempt → Blocked and logged
- Answers verified against source documents
- Low-confidence answers flagged for review
Build:
- OpenAI Moderation API integration
- Input/output filtering
- Prompt injection detection (pattern-based)
- Hallucination detection (claim verification)
- Confidence scoring for answers
- Fallback responses for low-confidence
- Safety logging (without storing harmful content)
- Model fallback chain (Sonnet → Haiku)
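Pattern-based injection detection, as listed above, is a regex screen run before the LLM call; these patterns are illustrative only (a real list would be tuned against observed attacks and layered with the moderation API):

```python
import re

# Illustrative patterns; not an exhaustive or production-tuned list
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"you are now",
    r"system prompt",
    r"disregard .{0,40}(rules|instructions)",
]


def looks_like_injection(text: str) -> bool:
    """First-pass, pattern-based prompt-injection check on user input."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)
```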
Validates: Content moderation, attack prevention, answer accuracy
Status: ✅ Complete
Increment 7: Auth
Goal: Volunteers must log in to use the app.
Deliverable: Login page, protected chat, user sessions.
What you'll see:
- Visit app → Redirected to login
- Log in with email/password
- Access chat interface
- Log out
Build:
- User model + SQLite
- Registration endpoint (admin creates users)
- Login page
- JWT session handling
- Protected routes
- Logout
Validates: Auth flow, session management
Status: ✅ Complete
Increment 8: Conversation History
Goal: Remember conversation within a session.
Deliverable: Follow-up questions work, can see past Q&A.
What you'll see:
- Ask "Where do I sign in?"
- Follow up "What time does it open?" → Understands context
- Scroll up to see conversation history
Build:
- Conversation storage (session-based)
- Context window management
- Chat history UI
- Clear conversation button
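Context window management above means keeping only the most recent turns that fit a budget; this sketch uses character count as a cheap proxy for tokens (the budget and message shape are assumptions):

```python
def trim_history(messages: list[dict], max_chars: int = 6000) -> list[dict]:
    """Keep the newest messages that fit the context budget, oldest dropped first."""
    kept: list[dict] = []
    used = 0
    for msg in reversed(messages):
        cost = len(msg["content"])
        # Always keep at least the latest message, even if oversized
        if used + cost > max_chars and kept:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```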
Validates: Multi-turn conversations, context handling
Increment 9: Audit Logging
Goal: Track all questions and answers for improvement.
Deliverable: Admin can see what volunteers are asking.
What you'll see:
- Admin dashboard with recent Q&A
- Filter by date, user
- See unanswered/low-confidence questions
- Export for analysis
Build:
- Audit log table
- Log every Q&A with metadata
- Admin dashboard page
- Basic analytics (common questions)
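Since the app already uses SQLite, the audit table above is a small schema plus an insert per Q&A; the column set here is an illustrative guess at the "metadata" bullet:

```python
import sqlite3


def init_audit(conn: sqlite3.Connection) -> None:
    """Create the audit table (illustrative schema, not the project's actual one)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS audit_log (
               id INTEGER PRIMARY KEY,
               user_id TEXT NOT NULL,
               question TEXT NOT NULL,
               answer TEXT NOT NULL,
               confidence REAL,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )


def log_qa(conn: sqlite3.Connection, user_id: str, question: str,
           answer: str, confidence: float) -> None:
    """Record one Q&A exchange for the admin dashboard and analytics."""
    conn.execute(
        "INSERT INTO audit_log (user_id, question, answer, confidence) "
        "VALUES (?, ?, ?, ?)",
        (user_id, question, answer, confidence),
    )
    conn.commit()
```

The "common questions" analytics then becomes a `GROUP BY question` query over this table.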
Validates: Audit trail, data for improvement
Increment 10: Observability
Goal: Production-ready monitoring.
Deliverable: Error tracking, structured logs, cost visibility.
What you'll see:
- Sentry dashboard with errors and performance
- Structured logs queryable by request ID
- Cost tracking in admin dashboard
- Health check endpoints
Build:
- Sentry integration (errors + performance)
- structlog setup (JSON, request IDs)
- OpenTelemetry tracing for production debugging
- Prompt versioning for tracking prompt changes
- Cost tracking per request
- Health check endpoints (/health, /health/ready)
- External uptime monitoring (UptimeRobot)
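The structured-logging bullet (JSON lines keyed by request ID) can be shown with a stdlib stand-in; the plan uses structlog, so treat this only as a sketch of the output shape:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, carrying the request id if present."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "msg": record.getMessage(),
            # request_id is attached per-request by middleware (assumption)
            "request_id": getattr(record, "request_id", None),
        })
```

With every line a JSON object, "queryable by request ID" is a simple filter in whatever log backend receives stdout.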
Validates: Production readiness, debugging capability
Increment 11: Feedback
Goal: Volunteers can rate answers and quickly find key info.
Deliverable: Thumbs up/down on answers, feedback loop, prominent contact/source display.
What you'll see:
- Each answer has 👍/👎 buttons
- Feedback stored for review
- Admin sees low-rated answers
- Most likely contact person shown at top of answer (e.g., "Contact: Jane Smith, Adoption Coordinator")
- Primary source document highlighted (e.g., "Source: Adoption Procedures Guide")
Build:
- Feedback UI
- Feedback storage
- Admin feedback review
- Flag for document updates
- Extract and display primary contact from source documents
- Surface most relevant source document prominently at answer top
Validates: Continuous improvement loop, quick access to contacts/sources
Increment 12: Mobile Polish
Goal: Excellent mobile experience.
Deliverable: Fully responsive, touch-friendly on all devices.
What you'll see:
- Works great on phone, tablet, desktop
- Touch-friendly buttons
- Keyboard doesn't hide input
- Fast on slow connections
Build:
- Mobile testing & fixes
- Touch target optimization
- Offline-friendly error states
- Performance optimization
Validates: Real-world usability
Increment 13: Production
Goal: Live on the internet, ready for volunteers.
Deliverable: Deployed app with documentation.
What you'll see:
- App running at retriever.example.org
- SSL certificate
- Volunteers can actually use it
Build:
- Railway/Render deployment
- Environment configuration
- Domain + SSL
- Volunteer onboarding guide
- Admin documentation
Validates: Production deployment, real users
Docker Packaging
Goal: Build and test production Docker image locally.
Deliverable: Production-ready Docker image that can be tested locally before Cloud Run deployment.
What you'll see:
- Build production image: docker build -t retriever .
- Run containerized app: docker-compose up
- Test production build locally before cloud deployment
- Data persists in volumes across container restarts
Development workflow:
- Active development: Git worktree on host with uv run uvicorn --reload
- Docker testing: Test production build locally before pushing to cloud
Build:
- Production-ready multi-stage Dockerfile
- .dockerignore for build optimization
- docker-compose.yml for container management
- README.md Docker deployment section
- Environment variables documented
- Volume management documentation
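A multi-stage Dockerfile along the lines the Build list describes might look like the following; the module path `app.main:app`, the Python version, and the uv install pattern are assumptions, not the project's actual configuration:

```dockerfile
# Build stage: resolve dependencies with uv (matches the uv-based dev workflow)
FROM python:3.12-slim AS builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev

# Runtime stage: ship only the virtualenv and application code
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /app/.venv /app/.venv
COPY . .
ENV PATH="/app/.venv/bin:$PATH"
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

The two-stage split keeps build tooling out of the runtime image, which shrinks the image and the attack surface before it ever reaches Cloud Run.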
Validates: Production container build, local testing workflow, Cloud Run readiness
Status: 🚧 In Planning
Cloud Run Deployment
Goal: Deploy to Google Cloud Run with managed infrastructure.
Deliverable: Production deployment on Cloud Run with auto-scaling.
What you'll see:
- App running on Cloud Run with custom domain
- Automatic scaling based on traffic
- Managed SSL certificates
- Centralized logging and monitoring
Build:
- Cloud Run deployment configuration
- GCS-backed volume mounting for data persistence
- Secret Manager integration for API keys
- GitHub Actions CI/CD pipeline
- Cloud Monitoring dashboards
- Deployment documentation
Validates: Cloud-native deployment, auto-scaling, production resilience
Status: ⏸️ Future (after Increment 14)
See implementation-plan.md for operational excellence items deferred until product validation.