One-Stop RAG Solution - Technical Design Document

Version: 1.1 Date: 2025-12-29

Project Setup
Executive Summary
System Architecture
Technology Stack
Database Schema
Backend Services
API Endpoints
Frontend Components
LangGraph Agent Architecture
Configuration
Development Setup
Implementation Notes & Lessons Learned
Testing Strategy

1. Project Setup

Initial Project Creation

This project was bootstrapped from a reusable Python project template that provides a production-ready foundation with common services and infrastructure.

Template Repository

Source: Python Project Template (internal) Features Provided:

FastAPI backend with async PostgreSQL (SQLAlchemy 2.0)
React + TypeScript + Vite frontend with TailwindCSS
Docker Compose orchestration for all services
Optional services: MongoDB, Neo4j, Redis, RabbitMQ/Celery
Alembic for database migrations
Health check endpoints
Feature flag system
Poetry for Python dependency management
Pre-configured ESLint, Prettier, and type checking

Setup Steps

# 1. Clone the template repository
git clone https://github.com/nitinnat/python-project-template.git one-stop-rag
cd one-stop-rag

# 2. Remove template git history and reinitialize
rm -rf .git
git init
git add .
git commit -m "Initial commit from template"

# 3. Update project-specific files
# - Update README.md with project description
# - Update package names in frontend/package.json
# - Update project name in backend/pyproject.toml
# - Clear TECHNICAL_DESIGN.md and IMPLEMENTATION_LOG.md

Template Structure Inherited

one-stop-rag/
├── backend/
│   ├── app/
│   │   ├── api/v1/          # API endpoints (health, admin, documents, graph)
│   │   ├── models/          # SQLAlchemy models (postgres, mongodb, neo4j)
│   │   ├── repositories/    # Data access layer
│   │   ├── services/        # Business logic
│   │   ├── schemas/         # Pydantic models
│   │   ├── helpers/         # Utilities (postgres, redis, celery, llm clients)
│   │   ├── config/          # Settings and feature flags
│   │   └── main.py          # FastAPI app entry point
│   ├── alembic/             # Database migrations
│   ├── Dockerfile
│   └── pyproject.toml       # Poetry dependencies
├── frontend/
│   ├── src/
│   │   ├── api/             # API client functions
│   │   ├── components/      # React components
│   │   ├── pages/           # Page components
│   │   ├── layouts/         # Layout components
│   │   ├── context/         # React context providers
│   │   └── hooks/           # Custom React hooks
│   ├── Dockerfile
│   └── package.json
├── docker-compose.yml       # Service orchestration
├── Makefile                 # Common commands
└── .env                     # Environment variables

Services Configured in Template

Service	Purpose	Default Port
backend	FastAPI application	8000
frontend	React application	5173
postgres	Primary database	5432
redis	Caching & sessions	6379
mongodb	Document database (optional)	27017
neo4j	Graph database (optional)	7474, 7687
rabbitmq	Message queue (optional)	5672, 15672
ollama	LLM inference (optional)	11434

Template Modifications for RAG

The following changes were made to adapt the template for the RAG use case:

PostgreSQL Image: Changed from postgres:16-alpine to pgvector/pgvector:pg16 for vector support
Ollama Service: Added to docker-compose.yml with model auto-pull configuration
Dependencies: Added RAG-specific packages:
- Backend: markitdown, langgraph, tiktoken, pgvector, langchain-text-splitters
- Frontend: react-markdown, remark-gfm
New Feature Flags: Added ENABLE_PGVECTOR=true and ENABLE_LLM_OLLAMA=true
Environment Variables: Configured OLLAMA_HOST and OLLAMA_MODELS

Benefits of Using the Template

Rapid Development: Pre-configured infrastructure saved ~2 days of setup time
Best Practices: Built-in patterns for services, repositories, and API structure
Production-Ready: Health checks, migrations, error handling already implemented
Flexibility: Feature flags allow enabling only needed services
Consistency: Established code organization and styling conventions

2. Executive Summary

One-Stop RAG is a production-ready Retrieval-Augmented Generation solution that allows users to:

Select a document folder via UI with persistence
Ingest documents (PDF, DOCX, PPTX, XLSX, TXT, MD) into a vector database
Chat with documents using multi-turn conversations powered by LangGraph

Key Features

Feature	Implementation
Document Conversion	Microsoft's `markitdown` library
Chunking	LangChain RecursiveCharacterTextSplitter (1000 tokens, 200 overlap)
Embeddings	Ollama `nomic-embed-text` (768 dimensions)
LLM	Ollama `phi3` model
Vector Storage	PostgreSQL + PGVector with HNSW indexing
Conversation Memory	PostgreSQL (persisted)
Agent Framework	LangGraph for multi-turn conversations

2. System Architecture

High-Level Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                              Frontend (React)                            │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                    /rag Page with Tabs                           │   │
│  │  ┌─────────────────────┐    ┌─────────────────────────────────┐ │   │
│  │  │   Documents Tab      │    │         Chat Tab                 │ │   │
│  │  │  - Folder selector   │    │  - Message history               │ │   │
│  │  │  - File list         │    │  - Source citations              │ │   │
│  │  │  - Ingest button     │    │  - Input field                   │ │   │
│  │  └─────────────────────┘    └─────────────────────────────────┘ │   │
│  └─────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                          Backend (FastAPI)                               │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │                      RAG API Endpoints                            │  │
│  │  /rag/documents/folder  /rag/documents/ingest  /rag/chat         │  │
│  └──────────────────────────────────────────────────────────────────┘  │
│                                    │                                     │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │                       Service Layer                               │  │
│  │  ┌────────────────┐  ┌────────────────┐  ┌────────────────────┐ │  │
│  │  │ Ingestion      │  │ Embedding      │  │ LangGraph          │ │  │
│  │  │ Service        │  │ Service        │  │ Chat Agent         │ │  │
│  │  │ (markitdown)   │  │ (Ollama)       │  │ (Retrieve→Generate)│ │  │
│  │  └────────────────┘  └────────────────┘  └────────────────────┘ │  │
│  └──────────────────────────────────────────────────────────────────┘  │
│                                    │                                     │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │                      Repository Layer                             │  │
│  │               RAG Repository (Vector Search + CRUD)               │  │
│  └──────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                          Data Layer                                      │
│  ┌───────────────────────────────┐  ┌────────────────────────────────┐ │
│  │     PostgreSQL + PGVector     │  │          Ollama                 │ │
│  │  - rag_documents              │  │  - nomic-embed-text (embed)    │ │
│  │  - rag_chunks (embeddings)    │  │  - phi3 (chat)                 │ │
│  │  - rag_conversations          │  │                                 │ │
│  │  - rag_messages               │  │                                 │ │
│  └───────────────────────────────┘  └────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘

Request Flow

User Query → Embed Query → Vector Search → Retrieve Top-K Chunks →
Build Context → Generate Response (phi3) → Return with Sources

3. Technology Stack

Backend

Component	Technology	Version
Web Framework	FastAPI	0.109+
ORM	SQLAlchemy 2.0	async
Vector DB	PostgreSQL + PGVector	16 / 0.7+
Document Conversion	markitdown	0.0.1a3
Text Splitting	LangChain	0.1+
Agent Framework	LangGraph	0.2+
Token Counting	tiktoken	0.5+
LLM/Embeddings	Ollama	phi3 / nomic-embed-text

Frontend

Component	Technology	Version
UI Framework	React	18
Language	TypeScript	5.3+
Build Tool	Vite	5
Styling	TailwindCSS	3.4+
State Management	TanStack Query	v5
Markdown Rendering	react-markdown	9+
HTTP Client	Axios	1.6+

4. Database Schema

Entity Relationship Diagram

┌─────────────────────┐
│    rag_documents    │
├─────────────────────┤
│ id (UUID, PK)       │
│ file_name           │
│ file_path           │
│ file_type           │
│ file_size           │
│ file_hash (unique)  │───────┐
│ folder_path         │       │
│ markdown_content    │       │
│ status              │       │
│ error_message       │       │
│ metadata (JSON)     │       │
│ created_at          │       │
│ updated_at          │       │
└─────────────────────┘       │
         │                    │
         │ 1:N                │
         ▼                    │
┌─────────────────────┐       │
│     rag_chunks      │       │
├─────────────────────┤       │
│ id (UUID, PK)       │       │
│ document_id (FK)    │───────┘
│ chunk_index         │
│ content             │
│ token_count         │
│ embedding (Vector)  │ ◄── HNSW Index
│ metadata (JSON)     │
│ created_at          │
│ updated_at          │
└─────────────────────┘

┌─────────────────────┐
│  rag_conversations  │
├─────────────────────┤
│ id (UUID, PK)       │
│ title               │
│ folder_path         │
│ is_active           │
│ created_at          │
│ updated_at          │
└─────────────────────┘
         │
         │ 1:N
         ▼
┌─────────────────────┐
│    rag_messages     │
├─────────────────────┤
│ id (UUID, PK)       │
│ conversation_id(FK) │
│ role                │
│ content             │
│ sources (JSON)      │
│ token_count         │
│ created_at          │
│ updated_at          │
└─────────────────────┘

Table Definitions

rag_documents

CREATE TABLE rag_documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    file_name VARCHAR(500) NOT NULL,
    file_path VARCHAR(2000) NOT NULL,
    file_type VARCHAR(50) NOT NULL,
    file_size INTEGER NOT NULL,
    file_hash VARCHAR(64) NOT NULL UNIQUE,
    folder_path VARCHAR(2000) NOT NULL,
    markdown_content TEXT,
    status VARCHAR(50) DEFAULT 'pending',
    error_message TEXT,
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ
);

CREATE INDEX idx_rag_documents_folder ON rag_documents(folder_path);
CREATE INDEX idx_rag_documents_hash ON rag_documents(file_hash);

rag_chunks

CREATE TABLE rag_chunks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    document_id UUID NOT NULL REFERENCES rag_documents(id) ON DELETE CASCADE,
    chunk_index INTEGER NOT NULL,
    content TEXT NOT NULL,
    token_count INTEGER NOT NULL,
    embedding VECTOR(768),
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ
);

-- HNSW index for fast cosine similarity search
CREATE INDEX idx_rag_chunks_embedding ON rag_chunks
    USING hnsw (embedding vector_cosine_ops);

rag_conversations

CREATE TABLE rag_conversations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title VARCHAR(500),
    folder_path VARCHAR(2000),
    is_active BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ
);

rag_messages

CREATE TABLE rag_messages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    conversation_id UUID NOT NULL REFERENCES rag_conversations(id) ON DELETE CASCADE,
    role VARCHAR(20) NOT NULL,
    content TEXT NOT NULL,
    sources JSONB DEFAULT '[]',
    token_count INTEGER,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ
);

CREATE INDEX idx_rag_messages_conversation ON rag_messages(conversation_id);

5. Backend Services

Service Architecture

┌─────────────────────────────────────────────────────────────┐
│                      RagService (Facade)                     │
│  Orchestrates all RAG operations                            │
└─────────────────────────────────────────────────────────────┘
         │                    │                    │
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ IngestionService│  │EmbeddingService │  │  RagChatAgent   │
│                 │  │                 │  │   (LangGraph)   │
│ - list_folder() │  │ - embed_text()  │  │                 │
│ - ingest_folder│  │ - embed_batch() │  │ - Retrieve node │
│ - convert_to_md│  │                 │  │ - Augment node  │
│ - chunk_text() │  │                 │  │ - Generate node │
└─────────────────┘  └─────────────────┘  └─────────────────┘
         │                    │                    │
         └────────────────────┼────────────────────┘
                              ▼
                 ┌─────────────────────────┐
                 │     RagRepository       │
                 │  - create_document()    │
                 │  - create_chunks()      │
                 │  - vector_search()      │
                 │  - conversation CRUD    │
                 └─────────────────────────┘

5.1 Embedding Service

File: backend/app/services/rag/embedding_service.py

class EmbeddingService:
    """Generate embeddings using Ollama nomic-embed-text model."""

    model: str = "nomic-embed-text"

    async def embed_text(self, text: str) -> List[float]:
        """Generate embedding for single text (768 dimensions)."""

    async def embed_batch(self, texts: List[str], batch_size: int = 10) -> List[List[float]]:
        """Generate embeddings for multiple texts."""

5.2 Ingestion Service

File: backend/app/services/rag/ingestion_service.py

class IngestionService:
    """Process documents: convert → chunk → embed → store."""

    SUPPORTED_EXTENSIONS = {'.pdf', '.docx', '.pptx', '.xlsx', '.txt', '.md'}

    async def list_folder_files(self, folder_path: str) -> List[FileInfo]:
        """List all supported files in a folder."""

    async def ingest_folder(self, folder_path: str) -> AsyncGenerator[IngestProgress, None]:
        """Ingest all documents with progress updates (SSE)."""

    async def _convert_to_markdown(self, file_path: str) -> str:
        """Convert document to markdown using markitdown."""

    async def _chunk_markdown(self, content: str, document_id: UUID) -> List[ChunkData]:
        """Chunk markdown using LangChain splitter (1000 tokens, 200 overlap)."""

5.3 LangGraph Chat Agent

File: backend/app/services/rag/chat_agent.py

See Section 8 for detailed architecture.

6. API Endpoints

RAG Router: `/api/v1/rag`

Endpoint	Method	Description	Request	Response
`/documents/folder`	GET	List folder files	`?path=/path/to/folder`	`FolderContents`
`/documents/ingest`	POST	Ingest folder	`{folder_path: str}`	SSE stream
`/documents/status`	GET	Get ingestion status	`?folder_path=...`	`List[DocumentStatus]`
`/chat`	POST	Send chat message	`ChatRequest`	`ChatResponse`
`/chat/stream`	POST	Stream chat response	`ChatRequest`	SSE stream
`/conversations`	GET	List conversations	-	`List[ConversationSummary]`
`/conversations/{id}`	GET	Get conversation detail	-	`ConversationDetail`

Request/Response Schemas

class FolderContents(BaseModel):
    folder_path: str
    files: List[FileInfo]
    total_count: int

class FileInfo(BaseModel):
    name: str
    path: str
    type: str  # pdf, docx, pptx, xlsx, txt, md
    size: int
    modified_at: datetime

class ChatRequest(BaseModel):
    message: str
    conversation_id: Optional[UUID] = None
    folder_path: Optional[str] = None

class ChatResponse(BaseModel):
    conversation_id: UUID
    message: str
    sources: List[ChatSource]

class ChatSource(BaseModel):
    document_name: str
    chunk_content: str
    relevance_score: float

7. Frontend Components

Component Tree

App.tsx
└── MainLayout
    └── /rag route
        └── RagPage
            ├── Tab Navigation (Documents | Chat)
            ├── DocumentsTab
            │   ├── FolderInput (with localStorage persistence)
            │   ├── FileList
            │   │   └── FileCard (name, type, size, status)
            │   └── IngestButton (with progress bar)
            └── ChatTab
                ├── FolderContext (shows current folder)
                ├── MessageList
                │   ├── UserMessage
                │   └── AssistantMessage
                │       └── SourcesAccordion
                └── ChatInput

State Management

// RagPage state
const [activeTab, setActiveTab] = useState<'documents' | 'chat'>('documents');
const [selectedFolder, setSelectedFolder] = useState<string>(() =>
    localStorage.getItem('rag_selected_folder') || ''
);

// ChatTab state
const [messages, setMessages] = useState<ChatMessage[]>([]);
const [conversationId, setConversationId] = useState<string | undefined>();

API Client

// frontend/src/api/rag.ts
export const ragApi = {
    listFolder(path: string): Promise<FolderContents>,
    ingestFolder(path: string, onProgress: (event) => void): EventSource,
    chat(request: ChatRequest): Promise<ChatResponse>,
    chatStream(request: ChatRequest, onMessage: (event) => void): EventSource,
    listConversations(): Promise<ConversationSummary[]>,
};

8. LangGraph Agent Architecture

Graph Structure

┌─────────┐     ┌──────────┐     ┌─────────┐     ┌──────────┐
│  Entry  │ ──► │ Retrieve │ ──► │ Augment │ ──► │ Generate │ ──► End
└─────────┘     └──────────┘     └─────────┘     └──────────┘

Agent State

class AgentState(TypedDict):
    messages: List[Dict[str, str]]  # Conversation history
    query: str                       # Current user query
    context: str                     # Retrieved and formatted chunks
    sources: List[Dict[str, Any]]   # Source metadata
    folder_path: Optional[str]       # Folder scope for search
    final_response: Optional[str]    # Generated response

Node Implementations

Retrieve Node

async def retrieve_node(state: AgentState) -> Dict[str, Any]:
    """
    1. Generate embedding for user query
    2. Perform vector search in PGVector
    3. Return top-5 most similar chunks
    """
    query_embedding = await embedding_service.embed_text(state["query"])
    results = await repository.vector_search(
        embedding=query_embedding,
        folder_path=state.get("folder_path"),
        limit=5
    )
    return {"sources": format_sources(results)}

Augment Node

async def augment_node(state: AgentState) -> Dict[str, Any]:
    """
    Format retrieved chunks into context string.
    """
    context_parts = []
    for i, source in enumerate(state["sources"], 1):
        context_parts.append(
            f"[Document {i}: {source['document_name']}]\n{source['chunk_content']}"
        )
    return {"context": "\n\n---\n\n".join(context_parts)}

Generate Node

async def generate_node(state: AgentState) -> Dict[str, Any]:
    """
    Generate response using Ollama phi3 with:
    - System prompt with RAG context
    - Full conversation history
    - Current query
    """
    system_prompt = f"""You are a helpful AI assistant that answers questions
    based on the provided documents.

    CONTEXT:
    {state["context"]}

    Answer based on the context. If unsure, acknowledge uncertainty."""

    response = await ollama_client.chat(
        model="phi3",
        messages=[
            {"role": "system", "content": system_prompt},
            *state["messages"],
            {"role": "user", "content": state["query"]}
        ]
    )
    return {"final_response": response["content"]}

Vector Search Query

SELECT
    c.id,
    c.content,
    c.chunk_index,
    d.file_name,
    1 - (c.embedding <=> :query_embedding::vector) as similarity
FROM rag_chunks c
JOIN rag_documents d ON c.document_id = d.id
WHERE d.status = 'completed'
  AND (:folder_path IS NULL OR d.folder_path = :folder_path)
ORDER BY c.embedding <=> :query_embedding::vector
LIMIT 5;

9. Configuration

Environment Variables

# .env

# Feature Flags
ENABLE_PGVECTOR=true
ENABLE_LLM_OLLAMA=true

# Ollama Configuration
OLLAMA_HOST=http://ollama:11434
OLLAMA_MODELS=phi3,nomic-embed-text

# RAG Configuration (optional overrides)
RAG_CHUNK_SIZE=1000
RAG_CHUNK_OVERLAP=200
RAG_TOP_K=5
RAG_EMBEDDING_MODEL=nomic-embed-text
RAG_CHAT_MODEL=phi3

Feature Flag Integration

RAG routes are automatically enabled when ENABLE_PGVECTOR=true in the environment.

10. Development Setup

Prerequisites

Docker Desktop 24+
Docker Compose 2.0+

Quick Start

# 1. Clone repository
git clone <repo-url>
cd one-stop-rag

# 2. Start services
make dev

# 3. Access application
# Frontend: http://localhost:5173/rag
# Backend API: http://localhost:8000/docs

Running Migrations

# Apply RAG tables migration
make migrate

Testing the RAG Pipeline

# 1. List folder files
curl "http://localhost:8000/api/v1/rag/documents/folder?path=/path/to/docs"

# 2. Ingest documents (SSE)
curl -X POST "http://localhost:8000/api/v1/rag/documents/ingest" \
  -H "Content-Type: application/json" \
  -d '{"folder_path": "/path/to/docs"}'

# 3. Chat with documents
curl -X POST "http://localhost:8000/api/v1/rag/chat" \
  -H "Content-Type: application/json" \
  -d '{"message": "What is this document about?"}'

File Structure Summary

New Backend Files

backend/app/
├── models/postgres/
│   └── rag.py                    # Database models
├── schemas/
│   └── rag.py                    # Pydantic schemas
├── repositories/
│   └── rag_repository.py         # Data access layer
├── services/rag/
│   ├── __init__.py
│   ├── embedding_service.py      # Ollama embeddings
│   ├── ingestion_service.py      # Document processing
│   ├── chat_agent.py             # LangGraph agent
│   └── rag_service.py            # Facade service
└── api/v1/
    └── rag.py                    # REST endpoints

backend/alembic/versions/
└── YYYYMMDD_rag_tables.py        # Migration

New Frontend Files

frontend/src/
├── api/
│   ├── ragTypes.ts               # TypeScript types
│   └── rag.ts                    # API client
├── pages/
│   └── Rag.tsx                   # Main page
└── components/rag/
    ├── DocumentsTab.tsx          # Document management
    └── ChatTab.tsx               # Chat interface

11. Implementation Notes & Lessons Learned

Critical Fixes Applied

1. SQLAlchemy Async Greenlet Issue

Problem: MissingGreenlet error when accessing lazy-loaded relationships in async context.

Solution: Check if conversation_id exists before accessing conversation.messages:

# rag_service.py:173
if conversation_id and conversation.messages:  # Not just: if conversation.messages

2. Docker Network Hostname

Problem: Backend couldn't connect to Ollama using localhost.

Solution: Use Docker service name ollama instead:

OLLAMA_HOST=http://ollama:11434  # Not localhost:11434

3. PostgreSQL Parameter Binding

Problem: Named parameters (:param) don't work with asyncpg.

Solution: Use positional parameters ($1, $2, $3):

# Instead of: WHERE d.folder_path = :folder_path
# Use: WHERE d.folder_path = $1

4. PGVector Extension

Problem: Standard PostgreSQL image doesn't include pgvector.

Solution: Use pgvector/pgvector:pg16 image in docker-compose.yml.

Key Architecture Decisions

Manual Node Execution: LangGraph's ainvoke() caused issues with SQLAlchemy async context, so nodes are executed manually in sequence.
Positional SQL Parameters: All raw SQL in repositories uses positional parameters for asyncpg compatibility.
Eager Loading: Conversations with messages use selectinload() to avoid lazy loading issues.
SSE for Progress: Document ingestion uses Server-Sent Events for real-time progress updates.

12. Testing Strategy (Planned)

Backend Tests (pytest)

Unit Tests: Services with mocked dependencies
Integration Tests: Full API with test database
Vector Search Tests: Pre-computed embeddings for deterministic results

Frontend Tests (Playwright)

E2E Flows: Document ingestion → Chat → Conversation management
Component Tests: DocumentsTab, ChatTab with mocked API

Coverage Targets

Backend: 90%+ on RAG services and repositories
Frontend: Critical user flows with Playwright

Document Status: Final v1.1 (Updated after implementation) Implementation Status: Complete and Operational

FilesExpand file tree

TECHNICAL_DESIGN.md

Latest commit

History