An intelligent text-based CLI agent that provides conversational access to a knowledge base stored in PostgreSQL with PGVector. Uses RAG (Retrieval Augmented Generation) to search through embedded documents and provide contextual, accurate responses with source citations. Supports multiple document formats including audio files with Whisper transcription.
Start with the tutorials! Check out the docling_basics/ folder for progressive examples that teach Docling fundamentals:
- Simple PDF Conversion - Basic document processing
- Multiple Format Support - PDF, Word, PowerPoint handling
- Audio Transcription - Speech-to-text with Whisper
- Hybrid Chunking - Intelligent chunking for RAG systems
These tutorials provide the foundation for understanding how this full RAG agent works. → Go to Docling Basics
- Interactive text-based CLI with streaming responses
- Semantic search through vector-embedded documents
- Context-aware responses using RAG pipeline
- Source citation for all information provided
- Real-time streaming text output as tokens arrive
- PostgreSQL/PGVector for scalable knowledge storage
- Conversation history maintained across turns
- Audio transcription with Whisper ASR (MP3 files)
- Python 3.9 or later
- PostgreSQL with PGVector extension (Supabase, Neon, self-hosted Postgres, etc.)
- API keys:
  - OpenAI API key (for embeddings and LLM)
# Install dependencies using UV
uv sync
Copy .env.example to .env and fill in your credentials:
cp .env.example .env
Required variables:
- DATABASE_URL - PostgreSQL connection string with PGVector extension
  - Example: postgresql://user:password@localhost:5432/dbname
  - Supabase: postgresql://postgres.[project-ref]:[password]@aws-0-[region].pooler.supabase.com:5432/postgres
  - Neon: postgresql://[user]:[password]@[endpoint].neon.tech/[dbname]
- OPENAI_API_KEY - OpenAI API key for embeddings and LLM
  - Get from: https://platform.openai.com/api-keys
Optional variables:
- LLM_CHOICE - OpenAI model to use (default: gpt-4o-mini)
- EMBEDDING_MODEL - Embedding model (default: text-embedding-3-small)
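As a minimal sketch of how the variables above could be read, the following applies the documented defaults for the optional ones. The Settings class and load_settings function are hypothetical names used for illustration; the project's own configuration code may differ (for example, it may load the .env file with a helper library).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables described above."""
    database_url: str
    openai_api_key: str
    llm_choice: str
    embedding_model: str


def load_settings(env=None) -> Settings:
    """Read required variables and apply the documented defaults to optional ones."""
    env = os.environ if env is None else env
    required = ("DATABASE_URL", "OPENAI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return Settings(
        database_url=env["DATABASE_URL"],
        openai_api_key=env["OPENAI_API_KEY"],
        # Defaults taken from the "Optional variables" list.
        llm_choice=env.get("LLM_CHOICE", "gpt-4o-mini"),
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
    )
```

Passing a plain dict instead of os.environ makes the function easy to exercise without touching the real environment.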
You must set up your PostgreSQL database with the PGVector extension and create the required schema:
- Enable the PGVector extension in your database (most cloud providers have it pre-installed):
  CREATE EXTENSION IF NOT EXISTS vector;
- Run the schema file to create tables and functions:
  # In the SQL editor in Supabase/Neon, run: sql/schema.sql
  # Or using psql
  psql $DATABASE_URL < sql/schema.sql
The schema file (sql/schema.sql) creates:
- documents table for storing original documents with metadata
- chunks table for text chunks with 1536-dimensional embeddings
- match_chunks() function for vector similarity search
Add your documents to the documents/ folder. Multiple formats supported via Docling:
Supported Formats:
- PDF (.pdf)
- Word (.docx, .doc)
- PowerPoint (.pptx, .ppt)
- Excel (.xlsx, .xls)
- HTML (.html, .htm)
- Markdown (.md, .markdown)
- Text (.txt)
- Audio (.mp3) - transcribed with Whisper
# Ingest all supported documents in the documents/ folder
# NOTE: By default, this CLEARS existing data before ingestion
uv run python -m ingestion.ingest --documents documents/
# Adjust chunk size (default: 1000)
uv run python -m ingestion.ingest --documents documents/ --chunk-size 800
The ingestion pipeline will:
- Auto-detect file type and use Docling for PDFs, Office docs, HTML, and audio
- Transcribe audio files using Whisper Turbo ASR with timestamps
- Convert to Markdown for consistent processing
- Split into semantic chunks with configurable size
- Generate embeddings using OpenAI
- Store in PostgreSQL with PGVector for similarity search
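To make the chunking step concrete, here is a deliberately simplified sketch that packs Markdown paragraphs into chunks up to a size limit. It is not the project's chunker and not Docling's hybrid chunking: chunk_markdown is a hypothetical function, and it assumes the chunk size is measured in characters, which the text above does not specify.

```python
def chunk_markdown(text: str, chunk_size: int = 1000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most chunk_size characters.

    Assumption: size is counted in characters; a real pipeline may count tokens.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= chunk_size:
            # The paragraph still fits in the chunk being built.
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single oversized paragraph is hard-split so no chunk exceeds the limit.
        while len(paragraph) > chunk_size:
            chunks.append(paragraph[:chunk_size])
            paragraph = paragraph[chunk_size:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph boundaries first keeps related sentences together, which tends to matter more for retrieval quality than hitting the size limit exactly.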
# Run the Docling RAG Agent CLI
uv run python cli.py
Features:
- Colored output for better readability
- Session statistics (stats command)
- Clear history (clear command)
- Built-in help (help command)
- Database health check on startup
- Real-time streaming responses
Available commands:
- help - Show help information
- clear - Clear conversation history
- stats - Show session statistics
- exit or quit - Exit the CLI
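One way such commands could be separated from ordinary questions is a small dispatcher that runs before the input reaches the agent. This is only an illustrative sketch: handle_command, its return shape, and the messages are assumptions, not the behavior of cli.py.

```python
from typing import List, Optional, Tuple


def handle_command(line: str, history: List[str]) -> Optional[Tuple[bool, str]]:
    """Return (should_exit, message) for a built-in command, or None for a question.

    Hypothetical helper: the real CLI may structure this differently.
    """
    command = line.strip().lower()
    if command in ("exit", "quit"):
        return True, "Goodbye!"
    if command == "help":
        return False, "Commands: help, clear, stats, exit, quit"
    if command == "clear":
        history.clear()  # drop prior turns so the next question starts fresh
        return False, "Conversation history cleared."
    if command == "stats":
        return False, f"Messages in history: {len(history)}"
    return None  # not a command: pass the input on to the agent
```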
Example interaction:
============================================================
Docling RAG Knowledge Assistant
============================================================
AI-powered document search with streaming responses
Type 'exit', 'quit', or Ctrl+C to exit
Type 'help' for commands
============================================================
✓ Database connection successful
✓ Knowledge base ready: 20 documents, 156 chunks
Ready to chat! Ask me anything about the knowledge base.
You: What topics are covered in the knowledge base?
Assistant: Based on the knowledge base, the main topics include...
────────────────────────────────────────────────────────────
You: quit
Thank you for using the knowledge assistant. Goodbye!
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  CLI User   │────▶│  RAG Agent   │────▶│ PostgreSQL  │
│  (Input)    │     │ (PydanticAI) │     │  PGVector   │
└─────────────┘     └──────────────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    │             │
              ┌─────▼────┐   ┌────▼─────┐
              │  OpenAI  │   │  OpenAI  │
              │   LLM    │   │Embeddings│
              └──────────┘   └──────────┘
Audio files are automatically transcribed using the OpenAI Whisper Turbo model:
How it works:
- When ingesting audio files (MP3 supported currently), Docling uses Whisper ASR
- Whisper generates accurate transcriptions with timestamps
- Transcripts are formatted as markdown with time markers
- Audio content becomes fully searchable through the RAG system
Benefits:
- Speech-to-text: Convert podcasts, interviews, lectures into searchable text
- Timestamps: Track when specific content was mentioned
- Semantic search: Find audio content by topic or keywords
- Fully automatic: Drop audio files in the documents/ folder and run ingestion
Model details:
- Model: openai/whisper-large-v3-turbo
- Optimized for: Speed and accuracy balance
- Languages: Multilingual support (90+ languages)
- Output format: Markdown with timestamps like [time: 0.0-4.0] Transcribed text here
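Because each transcript line follows the [time: start-end] text pattern, the timestamps can be recovered with a small parser. The sketch below is an illustration only; Segment and parse_transcript are hypothetical names, and it assumes every segment occupies a single line.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    """One timestamped transcript line (hypothetical structure)."""
    start: float
    end: float
    text: str


# Matches lines such as "[time: 5.28-9.96] Some transcribed text".
_LINE = re.compile(r"^\[time:\s*(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)\]\s*(.*)$")


def parse_transcript(markdown: str) -> List[Segment]:
    """Extract (start, end, text) from each timestamped line, skipping other lines."""
    segments = []
    for line in markdown.splitlines():
        match = _LINE.match(line.strip())
        if match:
            start, end, text = match.groups()
            segments.append(Segment(float(start), float(end), text))
    return segments
```

Keeping the start and end times alongside the text makes it possible to point a user back to the moment in the recording where a retrieved passage was spoken.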
Example transcript format:
[time: 0.0-4.0] Welcome to our podcast on AI and machine learning.
[time: 5.28-9.96] Today we'll discuss retrieval augmented generation systems.
The main agent (rag_agent.py) that:
- Manages database connections with connection pooling
- Handles interactive CLI with streaming responses
- Performs knowledge base searches via RAG
- Tracks conversation history for context
search_knowledge_base is a function tool registered with the agent that:
- Generates query embeddings using OpenAI
- Searches using PGVector cosine similarity
- Returns top-k most relevant chunks
- Formats results with source citations
Example tool definition:
from pydantic_ai import RunContext

async def search_knowledge_base(
    ctx: RunContext[None],
    query: str,
    limit: int = 5
) -> str:
    """Search the knowledge base using semantic similarity."""
    # Generate embedding for query
    # Search PostgreSQL with PGVector
    # Format and return results
    ...
- documents: Stores original documents with metadata
  - id, title, source, content, metadata, created_at, updated_at
- chunks: Stores text chunks with vector embeddings
  - id, document_id, content, embedding (vector(1536)), chunk_index, metadata, token_count
- match_chunks(): PostgreSQL function for vector similarity search
  - Uses cosine similarity (1 - (embedding <=> query_embedding))
  - Returns chunks with similarity scores above threshold
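PGVector's <=> operator returns cosine distance, so 1 minus that value is the cosine similarity. The pure-Python sketch below mirrors the described behavior of match_chunks() in memory: score every chunk, keep those above the threshold, and return the top matches. It is an illustration, not the SQL function itself; match_chunks_in_memory and its tuple layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, i.e. 1 minus the cosine distance computed by <=>."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_chunks_in_memory(
    query_embedding: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],  # (content, embedding) pairs
    match_count: int = 5,
    similarity_threshold: float = 0.7,
) -> List[Tuple[float, str]]:
    """Return up to match_count (similarity, content) pairs above the threshold."""
    scored = [
        (cosine_similarity(query_embedding, embedding), content)
        for content, embedding in chunks
    ]
    kept = [pair for pair in scored if pair[0] > similarity_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return kept[:match_count]
```

In the database the same ranking is done by the vector index rather than a full scan, which is what makes the approach practical for large knowledge bases.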
db_pool = await asyncpg.create_pool(
    DATABASE_URL,
    min_size=2,
    max_size=10,
    command_timeout=60
)
The embedder includes built-in caching for frequently searched queries, reducing API calls and latency.
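The idea behind query caching can be sketched as a small least-recently-used cache wrapped around an embedding function. This is an assumption about the general technique, not a description of embedder.py: CachedEmbedder, embed_fn, and max_size are hypothetical names, and the real embedder may use a different eviction policy.

```python
from collections import OrderedDict
from typing import Callable, List


class CachedEmbedder:
    """Hypothetical LRU cache in front of any text-to-vector function."""

    def __init__(self, embed_fn: Callable[[str], List[float]], max_size: int = 256):
        self._embed_fn = embed_fn  # e.g. a call to an embeddings API
        self._max_size = max_size
        self._cache: "OrderedDict[str, List[float]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def embed(self, text: str) -> List[float]:
        if text in self._cache:
            self._cache.move_to_end(text)  # mark as most recently used
            self.hits += 1
            return self._cache[text]
        self.misses += 1
        vector = self._embed_fn(text)  # only cache misses reach the API
        self._cache[text] = vector
        if len(self._cache) > self._max_size:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return vector
```

Repeated questions then cost one dictionary lookup instead of a network round trip, which is where the latency saving comes from.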
Token-by-token streaming provides immediate feedback to users while the LLM generates responses:
async with agent.run_stream(user_input, message_history=history) as result:
    async for text in result.stream_text(delta=False):
        print(f"\rAssistant: {text}", end="", flush=True)
# Start all services
docker-compose up -d
# Ingest documents
docker-compose --profile ingestion up ingestion
# View logs
docker-compose logs -f rag-agent
async def search_knowledge_base(
    ctx: RunContext[None],
    query: str,
    limit: int = 5
) -> str:
    """
    Search the knowledge base using semantic similarity.

    Args:
        query: The search query to find relevant information
        limit: Maximum number of results to return (default: 5)

    Returns:
        Formatted search results with source citations
    """
-- Vector similarity search
SELECT * FROM match_chunks(
    query_embedding::vector(1536),
    match_count INT,
    similarity_threshold FLOAT DEFAULT 0.7
)
Returns chunks with:
- id: Chunk UUID
- content: Text content
- embedding: Vector embedding
- similarity: Cosine similarity score (0-1)
- document_title: Source document title
- document_source: Source document path
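As a sketch of how rows with these columns might be turned into the cited text returned by the tool, the function below numbers each result and includes its title and source path. format_results and the exact output layout are assumptions for illustration; rows are modeled as mappings, which matches how asyncpg records can be indexed by column name.

```python
from typing import Any, Mapping, Sequence


def format_results(rows: Sequence[Mapping[str, Any]]) -> str:
    """Render matched chunks as numbered passages with source citations.

    Hypothetical formatter: the agent's real output format may differ.
    """
    if not rows:
        return "No relevant information found in the knowledge base."
    parts = []
    for index, row in enumerate(rows, start=1):
        parts.append(
            f"[{index}] {row['document_title']} ({row['document_source']}), "
            f"similarity {row['similarity']:.2f}\n{row['content']}"
        )
    return "\n\n".join(parts)
```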
docling-rag-agent/
├── cli.py                   # Enhanced CLI with colors and features (recommended)
├── rag_agent.py             # Basic CLI agent with PydanticAI
├── ingestion/
│   ├── ingest.py            # Document ingestion pipeline
│   ├── embedder.py          # Embedding generation with caching
│   └── chunker.py           # Document chunking logic
├── utils/
│   ├── providers.py         # OpenAI model/client configuration
│   ├── db_utils.py          # Database connection pooling
│   └── models.py            # Pydantic models for config
├── sql/
│   └── schema.sql           # PostgreSQL schema with PGVector
├── documents/               # Sample documents for ingestion
├── pyproject.toml           # Project dependencies
├── .env.example             # Environment variables template
└── README.md                # This file