A self-hosted document intelligence platform that lets you upload documents, ask questions, and get answers backed by source citations.
The project currently has two tracks:
- v4: a stable retrieval-first research engine
- v5: an experimental workspace agent that creates and reuses persistent artifacts
The main branch contains the stable document intelligence platform:
- PDF and document ingestion
- Hybrid retrieval (vector search + BM25)
- Source citations
- FastAPI backend
- Next.js frontend
- PostgreSQL + pgvector
- Self-hosted deployment
Active development is happening on the agentic branch.
The current research direction explores whether persistent artifacts can make AI assistants more useful than chat alone. Instead of relying entirely on conversation history, the agent creates, stores, and reuses workspace artifacts across sessions.
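As a rough illustration of what "creates, stores, and reuses workspace artifacts across sessions" could look like, here is a minimal sketch that persists artifacts as JSON files. The `Artifact` schema, the `ArtifactStore` class, and the one-file-per-artifact layout are all hypothetical and are not taken from the agentic branch.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Artifact:
    """A reusable knowledge object (hypothetical schema)."""
    name: str
    kind: str      # e.g. "report" or "finding"
    content: str


class ArtifactStore:
    """Persists artifacts as JSON files so a later session can reload them."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, artifact: Artifact) -> Path:
        path = self.root / f"{artifact.name}.json"
        path.write_text(json.dumps(asdict(artifact)), encoding="utf-8")
        return path

    def load(self, name: str) -> Artifact | None:
        path = self.root / f"{name}.json"
        if not path.exists():
            return None
        return Artifact(**json.loads(path.read_text(encoding="utf-8")))


if __name__ == "__main__":
    workspace = Path(tempfile.mkdtemp())
    # "Session 1": the agent writes a finding to the workspace.
    ArtifactStore(workspace).save(Artifact("q3-summary", "report", "Revenue grew."))
    # "Session 2": a fresh store instance reloads it without any chat history.
    print(ArtifactStore(workspace).load("q3-summary"))
```

The point of the sketch is that the second store instance shares no in-memory state with the first; everything it knows comes from the workspace directory.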
Highlights:
- Custom agent loop (no LangGraph)
- Provider-agnostic LLM layer
- Workspace artifacts
- Persistent context experiments
- Tool-driven architecture
➡️ Experimental branch: https://github.com/anmolsharma152/CodexEngine/tree/agentic
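To make the "custom agent loop", "provider-agnostic LLM layer", and "tool-driven architecture" highlights concrete, the following is a minimal sketch of how the three could fit together. Every name here (`LLMProvider`, `ToolCall`, `FinalAnswer`, `ScriptedProvider`, `run_agent`) is a hypothetical stand-in; the actual design on the agentic branch may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class ToolCall:
    name: str
    argument: str


@dataclass
class FinalAnswer:
    text: str


class LLMProvider(Protocol):
    """Anything that can pick the next step; vendor SDKs would sit behind this."""

    def next_step(self, transcript: list[str]) -> ToolCall | FinalAnswer: ...


class ScriptedProvider:
    """Stand-in provider that replays fixed steps instead of calling an LLM API."""

    def __init__(self, steps: list[ToolCall | FinalAnswer]) -> None:
        self._steps = iter(steps)

    def next_step(self, transcript: list[str]) -> ToolCall | FinalAnswer:
        return next(self._steps)


def run_agent(
    provider: LLMProvider,
    tools: dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Loop: ask the provider for a step, run the requested tool, repeat."""
    transcript = [f"user: {question}"]
    for _ in range(max_steps):
        step = provider.next_step(transcript)
        if isinstance(step, FinalAnswer):
            return step.text
        tool = tools.get(step.name)
        result = tool(step.argument) if tool else f"unknown tool: {step.name}"
        transcript.append(f"tool {step.name}: {result}")
    return "step limit reached"
```

Because `run_agent` depends only on the `LLMProvider` protocol, swapping one model vendor for another would mean writing a new provider class rather than touching the loop.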
Most document assistants answer a question and immediately forget the work they just performed.
CodexEngine began as a retrieval-first, retrieval-augmented research engine and is evolving into an experiment around persistent AI workspaces, where analysis, reports, and findings can become reusable knowledge objects.
| Branch | Status | Purpose |
|---|---|---|
| main | Stable | Production-ready document intelligence platform (v4) |
| agentic | Experimental | Workspace-agent research and v5 development |
git clone https://github.com/anmolsharma152/CodexEngine.git
cd CodexEngine
# Backend
cd codex-backend
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env # fill in your keys
uvicorn server:app --reload --host 127.0.0.1 --port 8000
# Frontend (new terminal)
cd codex-frontend
npm install && npm run dev
Set NEXT_PUBLIC_SUPABASE_URL, NEXT_PUBLIC_SUPABASE_ANON_KEY, and NEXT_PUBLIC_API_URL in codex-frontend/.env.local. Open http://localhost:3000, register, upload a PDF, and start asking questions.
When you ask a question, CodexEngine:
- Decides whether it needs to search your documents or can answer directly
- Searches your indexed content using vector similarity and keyword search, with an optional web fallback
- Scores and reranks the results
- Generates an answer with source citations ([p. X], [r. X], [doc], [web])
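The list above does not say how the vector and keyword result lists are merged. One common technique is reciprocal rank fusion (RRF), sketched below under that assumption; the function name, the chunk IDs, and the constant `k = 60` are illustrative and not taken from the CodexEngine code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs into one fused ranking.

    Each list contributes 1 / (k + rank) per chunk, so chunks that rank
    well in more than one list rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


if __name__ == "__main__":
    vector_hits = ["c1", "c2", "c3"]   # hypothetical pgvector similarity order
    keyword_hits = ["c2", "c4", "c1"]  # hypothetical BM25 order
    print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
    # → ['c2', 'c1', 'c4', 'c3']
```

RRF only needs ranks, not comparable scores, which is convenient when one list comes from cosine similarity and the other from BM25.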
All of this runs through the v4 retrieval pipeline, which currently uses LangGraph-based orchestration and self-evaluation loops (up to 3 retries if the initial answer is weak).
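The bounded retry behaviour can be sketched in plain Python. This is not the LangGraph graph itself, and it assumes the retry re-runs retrieval with a rewritten query; the callables `retrieve`, `is_sufficient`, and `rewrite` are hypothetical placeholders for the retriever, evaluator, and rewriter steps.

```python
from typing import Callable

MAX_RETRIES = 3  # matches the "up to 3 retries" limit stated above


def retrieve_with_retries(
    question: str,
    retrieve: Callable[[str], list[str]],
    is_sufficient: Callable[[str, list[str]], bool],
    rewrite: Callable[[str], str],
    max_retries: int = MAX_RETRIES,
) -> tuple[list[str], int]:
    """Retrieve, evaluate, and rewrite the query until results pass or retries run out.

    Returns the last retrieved chunks and the number of retries used.
    """
    query = question
    chunks = retrieve(query)
    retries = 0
    while not is_sufficient(question, chunks) and retries < max_retries:
        query = rewrite(query)
        chunks = retrieve(query)
        retries += 1
    return chunks, retries
```

If the loop exits with results that are still insufficient, the caller would decide what to do next, for example handing off to a web search fallback.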
The experimental agentic branch replaces this architecture with a custom agent loop.
| Feature | Local / CI | Production (Render 512MB) |
|---|---|---|
| Embeddings | fastembed ONNX (bge-small-en-v1.5) | Google Gemini API |
| Reranker | CrossEncoder (ms-marco-MiniLM-L-6-v2) | Score-based sort |
| Detection | MemTotal > 1.5GB or no RENDER env | RENDER=true or < 1.5GB |
Both modes produce 384-dimensional vectors.
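The detection row can be read as a small predicate. The sketch below assumes the production rule takes precedence (`RENDER=true` or less than 1.5 GB of memory selects the lightweight mode) and that memory is read from `/proc/meminfo` on Linux; the function names and the fallback when memory cannot be read are assumptions, not the project's actual code.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Mapping

LOW_MEMORY_THRESHOLD_KB = int(1.5 * 1024 * 1024)  # 1.5 GB expressed in kB


def read_mem_total_kb(meminfo_path: str = "/proc/meminfo") -> int | None:
    """Return MemTotal in kB, or None if it cannot be read (e.g. not on Linux)."""
    try:
        for line in Path(meminfo_path).read_text().splitlines():
            if line.startswith("MemTotal:"):
                return int(line.split()[1])
    except (OSError, ValueError, IndexError):
        return None
    return None


def use_lightweight_mode(env: Mapping[str, str], mem_total_kb: int | None) -> bool:
    """True selects the production column (Gemini embeddings, score-based sort)."""
    on_render = env.get("RENDER", "").lower() == "true"
    low_memory = mem_total_kb is not None and mem_total_kb < LOW_MEMORY_THRESHOLD_KB
    return on_render or low_memory


if __name__ == "__main__":
    print(use_lightweight_mode(os.environ, read_mem_total_kb()))
```

Passing the environment and memory figure in as arguments keeps the decision testable without needing a constrained machine.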
flowchart TD
User([User / Browser])
subgraph Frontend [codex-frontend - Next.js 15]
AuthUI[Auth UI]
ChatUI[Chat UI / SSE]
DocMgr[Document Manager]
SupaSDK["@supabase/supabase-js<br>Auth JWT → Bearer"]
end
subgraph Supabase [Supabase]
SA[Auth<br>sign up / sign in]
SB[Storage<br>documents bucket]
end
subgraph Backend [codex-backend - FastAPI]
direction LR
%% Graph Flow
R[1. Router] -->|retrieval_required| C[2. Condenser]
R -->|direct/meta| A[6. Actor]
C --> Ret[3. Retriever]
Ret --> E[4. Evaluator]
E -->|retry_needed| RW[5. Rewriter]
RW --> Ret
E -->|sufficient: False| WS[Web Search Fallback]
E -->|sufficient: True| A
WS --> A
A --> Resp[SSE Response]
%% Ingestion Flow
subgraph Ingestion [Background Ingestion]
Q[(asyncio.Queue)]
Worker[Worker Task]
Q --> Worker
Worker -->|Chunk & Embed| DB
end
end
subgraph DB [PostgreSQL + pgvector]
Threads[threads]
Chunks[prose_chunks<br>384-dim vectors]
end
subgraph External [External APIs]
Groq[Groq<br>LLM - llama-3.1]
Gemini[Google Gemini<br>embeddings]
FastEmbed[fastembed ONNX<br>embeddings]
end
User --> Frontend
AuthUI --> SA
SA -.->|JWT session| SupaSDK
SupaSDK --> Backend
ChatUI <-->|SSE stream| Backend
DocMgr -->|upload to| SB
DocMgr -->|enqueue| Q
Backend --> DB
Ret ---> FastEmbed
Ret -.->|fallback| Gemini
Backend ---> Groq
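The background ingestion path in the diagram (an `asyncio.Queue` feeding a worker task that chunks and embeds) can be sketched as follows. This is a simplified illustration: `chunk_text`, `ingestion_worker`, the fixed-size character chunking, and the in-memory `store` list are hypothetical, and a real worker would compute embeddings and write rows to PostgreSQL instead.

```python
import asyncio


def chunk_text(text: str, size: int = 200) -> list[str]:
    """Split text into fixed-size character chunks (a deliberately naive strategy)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


async def ingestion_worker(queue: asyncio.Queue, store: list[str]) -> None:
    """Consume documents from the queue forever, chunking each one."""
    while True:
        document = await queue.get()
        try:
            # A real worker would embed each chunk and insert it into pgvector here.
            store.extend(chunk_text(document))
        finally:
            queue.task_done()


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    store: list[str] = []
    worker = asyncio.create_task(ingestion_worker(queue, store))
    await queue.put("a" * 450)   # enqueue one uploaded document
    await queue.join()           # wait until the worker has processed it
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return store


if __name__ == "__main__":
    print(len(asyncio.run(main())))
```

Enqueueing returns immediately, so an upload request would not have to wait for chunking and embedding to finish.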
cd codex-backend
source .venv/bin/activate
python tests/test_golden.py # Single golden query
python tests/test_rigorous.py # Full sweep
python eval/ragas_eval.py # RAGAS metrics
- Deployment guide: Render, Vercel, Supabase setup
- API reference: endpoint table with request/response examples
- FastAPI backend
- Next.js frontend
- PostgreSQL + pgvector
- Hybrid retrieval (vector + BM25)
- Server-sent events (SSE) streaming
- Supabase authentication and storage
- Provider-agnostic LLM architecture
- Workspace-agent experimentation (v5)