Enterprise RAG (Retrieval-Augmented Generation) knowledge base with a ChatGPT-like UI, document ingestion, pgvector search, and source-cited answers.
- Shell layout: header + full-height sidebar + feature content
- Theme: Light / Dark / System modes
- AI Chat (`/query`) (ChatGPT-like)
  - Multiple conversations with history stored in Postgres
  - Chat list in sidebar with per-chat options (Rename, Delete)
  - Modes: Fast / Thinking / Deep research / Web only (per chat)
  - Sticky composer at the bottom and scrollable message history
  - Copy assistant responses; resend a previous user prompt
  - Citations per answer (toggle Sources) with a persistent "auto-open sources" preference
- Documents (`/documents`)
  - Upload PDF/TXT/Markdown/CSV/XLSX
  - Paste text ingestion
  - View document metadata, processing stage, and chunk progress
  - Reprocess failed documents
  - Delete documents (and associated chunks)
- Search (`/search`)
  - Single-shot query (no chat history)
  - Modes + citations
- Document ingestion: extract text -> chunk -> embed -> store in Postgres + pgvector
- Vector retrieval: similarity search with tenant + visibility filtering
- RAG answering: retrieval -> prompt building -> LLM answer -> citations
- Fast/Thinking can answer general questions and clearly label grounding when no sources are available
- Deep research can include optional web snippets (if web search is configured)
- Web only uses only web snippets (no knowledge-base retrieval)
- Answer safety (basic): simple grounding/citation checks and guardrails
- Audit logging: tracks key actions like uploads and queries
- In-process background document queue with configurable worker count
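The ingestion pipeline above (extract text -> chunk -> embed -> store) hinges on the chunking step. Below is a minimal, standalone sketch of fixed-size chunking with overlap; the `chunk_size` and `overlap` parameters are illustrative, and the repo's actual chunker may differ:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split extracted text into overlapping fixed-size chunks.

    The overlap preserves context across chunk boundaries, so a sentence
    cut at one boundary still appears whole in an adjacent chunk.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk would then be embedded and stored as one pgvector row alongside its document ID and position.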
- Copy `.env.example` to `.env` at the repo root before running `docker-compose up`.
- Fill strong, unique values for `POSTGRES_DB`, `POSTGRES_USER`, `POSTGRES_PASSWORD`, `DATABASE_URL`, and `SECRET_KEY`.
- Docker Compose sources `.env` for both the database container and backend service; `.env` is ignored via `.gitignore` so secrets stay local.
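A sketch of what the filled-in `.env` might look like. The variable names come from the list above; the values (and the `db` hostname inside `DATABASE_URL`) are placeholders, not project defaults:

```bash
# .env — never commit this file; .gitignore already excludes it
POSTGRES_DB=rag_kb
POSTGRES_USER=rag_user
POSTGRES_PASSWORD=change-me-long-random-string
DATABASE_URL=postgresql://rag_user:change-me-long-random-string@db:5432/rag_kb
SECRET_KEY=change-me-generate-with-openssl-rand-hex-32
```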
- Retrieval evaluation dataset: `experiments/data/retrieval_eval.json`
- Precision@K script: `experiments/retrieval_metrics.py`
- Embedding comparison notebook: `experiments/embedding_comparison.ipynb`
- Design tradeoffs: `docs/design_decisions.md`
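For reference, the Precision@K metric that `experiments/retrieval_metrics.py` reports is simple to state. A standalone sketch (not the repo's implementation):

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved document IDs that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / k
```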
- `GET /query` - chat UI
- `GET /documents` - upload + manage documents
- `GET /search` - single-shot query UI
Health
- `GET /` - service info
- `GET /health` - health check
- `GET /docs` - Swagger UI
Auth
- `POST /auth/register`
- `POST /auth/login`
- `POST /auth/refresh`
Users
- `GET /users/`
- `GET /users/{user_id}`
- `POST /users/`
- `PUT /users/{user_id}`
Documents
- `POST /documents/` - upload file (multipart)
- `POST /documents/text` - ingest pasted text (multipart form)
- `GET /documents/` - list
- `GET /documents/{document_id}` - details (includes progress fields)
- `POST /documents/{document_id}/reprocess` - re-run ingestion
- `DELETE /documents/{document_id}` - delete (admin only)
Search / Query
- `POST /query/` - single-shot RAG query (supports `mode`)
Chats
- `GET /chats/` - list chats
- `POST /chats/` - create chat (supports `mode`)
- `GET /chats/{chat_id}` - chat + messages
- `PATCH /chats/{chat_id}` - rename
- `DELETE /chats/{chat_id}` - delete
- `POST /chats/{chat_id}/query` - send a message (supports `mode`, optional `agentic`)
Admin
- `GET /admin/audit-logs`
- `GET /admin/stats`
Agents (Agentic RAG)
- `POST /agents/query` - agent planner-controller loop (supports `mode`, `max_steps`, `include_steps`)
- `fast`: fastest responses, smaller retrieval
- `thinking`: best quality on your knowledge base (default)
- `deep_research`: larger retrieval + optional web snippets (requires `WEB_SEARCH_PROVIDER` + `WEB_SEARCH_API_KEY`)
- `web_only`: web snippets only (requires web search config)
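One way to read the mode list: each mode maps to a retrieval budget and a web-search flag. A hypothetical mapping to make that concrete — the actual `top_k` values and flags live in the backend configuration, not here:

```python
# Hypothetical per-mode retrieval settings; the real values are
# defined by the backend config, not by this README.
MODE_SETTINGS = {
    "fast":          {"top_k": 3,  "use_web": False},
    "thinking":      {"top_k": 8,  "use_web": False},   # default mode
    "deep_research": {"top_k": 15, "use_web": True},    # web is optional
    "web_only":      {"top_k": 0,  "use_web": True},    # no KB retrieval
}

def settings_for(mode: str) -> dict:
    """Resolve a mode string, falling back to the default 'thinking' mode."""
    return MODE_SETTINGS.get(mode, MODE_SETTINGS["thinking"])
```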
- Chat UI includes an `Agentic` toggle that runs a planner-controller loop with auditable steps (persisted in Postgres as `agent_runs` and `agent_steps`).
When `LLM_PROVIDER=openai`, you can split models by role:

- `LLM_MODEL` - default chat model (used if specific overrides are not set)
- `LLM_MODEL_FAST` / `LLM_MODEL_THINKING` / `LLM_MODEL_DEEP_RESEARCH` - per-mode answer model overrides
- `AGENT_PLANNER_MODEL` - agent planner model (JSON action selection)
- `AGENT_SUMMARY_MODEL` - agent summarization model (optional)