A production-grade multi-agent research pipeline using CrewAI, LangChain, Qdrant, and your choice of LLM (Ollama or OpenAI) with HuggingFace embeddings.
```text
src/
├── core/          # Config, logging, exceptions
├── ingestion/     # load → split → embed (HuggingFace) → upsert
├── vectorstore/   # Qdrant client factory + collection manager
├── rag/           # RetrievalQA chain (Ollama or OpenAI)
├── tools/         # CrewAI BaseTool wrapping the RAG chain
├── agents/        # Researcher & Writer agent factories
├── crew/          # Crew orchestrator
├── api/           # FastAPI server (schemas, router, app)
└── main.py        # CLI entrypoint
outputs/           # Agent-generated .md files land here
```
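The `split` step in `ingestion/` cuts each loaded document into chunks before embedding. Below is a minimal stdlib sketch of fixed-size splitting with overlap. `split_text` and its `chunk_overlap` parameter are hypothetical, and chunk size is assumed to be measured in characters. The project's real splitter (possibly a LangChain text splitter) may prefer to break on separators such as paragraphs.

```python
def split_text(text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> list[str]:
    """Split text into character windows of at most chunk_size, overlapping by chunk_overlap.

    Hypothetical helper for illustration; not the project's actual splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end so the tail is not emitted again.
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

Overlap repeats a little text at each boundary so a sentence cut in half still appears whole in at least one neighbouring chunk.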
| Provider | Env | Notes |
|---|---|---|
| Ollama | `LLM_PROVIDER=ollama` | Free, runs locally (default) |
| OpenAI | `LLM_PROVIDER=openai` | Requires `OPENAI_API_KEY` |
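Provider selection comes down to reading `LLM_PROVIDER` and checking the settings that provider needs. Here is a minimal sketch of that logic. The `resolve_llm` helper and the `LLMChoice` type are hypothetical, and the model defaults are taken from the configuration table. The project's actual config code in `src/core/` may behave differently.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

SUPPORTED_PROVIDERS = ("ollama", "openai")


@dataclass(frozen=True)
class LLMChoice:
    """Hypothetical container for the resolved provider settings."""
    provider: str
    model: str
    api_key: Optional[str] = None


def resolve_llm(env: Mapping[str, str]) -> LLMChoice:
    """Pick the LLM from LLM_PROVIDER, defaulting to Ollama (hypothetical helper)."""
    provider = env.get("LLM_PROVIDER", "ollama").strip().lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"LLM_PROVIDER must be one of {SUPPORTED_PROVIDERS}, got {provider!r}")
    if provider == "openai":
        api_key = env.get("OPENAI_API_KEY")
        if not api_key:
            # OpenAI is the only provider that needs a key; fail before any agent starts.
            raise ValueError("LLM_PROVIDER=openai requires OPENAI_API_KEY")
        return LLMChoice("openai", env.get("OPENAI_MODEL", "gpt-4o"), api_key)
    # Ollama runs locally and needs no key.
    return LLMChoice("ollama", env.get("OLLAMA_MODEL", "llama3.2"))
```

Passing `os.environ` (or a dict in tests) keeps the function easy to check without touching the real environment.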
Embeddings use `sentence-transformers/all-MiniLM-L6-v2` from HuggingFace. The model runs entirely locally with no API key required (it is downloaded from the HuggingFace Hub on first use) and outputs 384-dimensional vectors.
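Retrieval compares a query's embedding against the stored chunk embeddings and keeps the closest ones. The following brute-force, stdlib-only sketch does cosine-similarity top-k search over 384-dim vectors. It assumes cosine as the metric, which is a common choice for all-MiniLM-L6-v2, but the metric the project actually uses is whatever the collection in `src/vectorstore/` was created with. Qdrant performs the same ranking through an approximate nearest-neighbour index rather than a linear scan. The one-hot `unit` vectors are toy stand-ins for real embeddings.

```python
import math
from collections.abc import Sequence

EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], corpus: Sequence[Sequence[float]], k: int = 3) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the k corpus vectors most similar to query."""
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def unit(i: int, dim: int = EMBEDDING_DIM) -> list[float]:
    """Toy one-hot vector standing in for a real embedding."""
    v = [0.0] * dim
    v[i] = 1.0
    return v


print(top_k(unit(0), [unit(1), unit(0), unit(2)], k=1))  # [(1, 1.0)]
```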
```bash
uv sync
cp .env.example .env
# Fill in: QDRANT_URL, QDRANT_API_KEY
# If using Ollama: ensure Ollama is running and set OLLAMA_MODEL
# If using OpenAI: set OPENAI_API_KEY and LLM_PROVIDER=openai
```

If you use Ollama, pull a model first:

```bash
ollama pull llama3.2   # or mistral, gemma3, phi4, etc.
```

Then ingest a PDF and run the crew:

```bash
make ingest PDF=attention.pdf
make run                         # uses the provider from .env
make run-full PDF=attention.pdf  # ingest + run in one shot
```

Results are written to `outputs/researcher_analysis.md` and `outputs/writer_summary.md`.
```bash
make serve       # development, with auto-reload
make serve-prod  # production
```

API docs: http://localhost:8000/docs
| Method | Path | Description |
|---|---|---|
| GET | `/api/v1/health` | Health check |
| POST | `/api/v1/ingest` | Ingest a PDF into the vector store |
| POST | `/api/v1/run` | Kick off the research crew |
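A minimal stdlib client sketch for the health and run endpoints follows. It assumes the server listens on `localhost:8000`, as in the docs URL, and that responses are JSON. The helper names are hypothetical, and the payload fields are the ones shown in the `/api/v1/run` example. The ingest endpoint is left out because its request format is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: default dev server address


def health_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for the health-check endpoint (hypothetical helper)."""
    return urllib.request.Request(f"{base_url}/api/v1/health", method="GET")


def run_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request that starts the research crew with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/api/v1/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 600.0) -> dict:
    """Send the request and decode the body as JSON (response shape is an assumption)."""
    # Crew runs can take minutes, hence the generous default timeout.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With `make serve` running, `send(health_request())` checks the server, and `send(run_request({"auto_ingest": False, "llm_provider": "ollama", "model": "mistral"}))` starts a run with a per-request model override.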
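`POST /api/v1/run` accepts a JSON body that can override the LLM provider and model for a single request:

```json
{
  "auto_ingest": false,
  "llm_provider": "ollama",
  "model": "mistral"
}
```

See `.env.example` for the full list of environment variables; the table below covers the main ones.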
| Variable | Default | Description |
|---|---|---|
| `LLM_PROVIDER` | `ollama` | `openai` or `ollama` |
| `OLLAMA_MODEL` | `llama3.2` | Any Ollama model |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
| `OPENAI_API_KEY` | (none) | Required only for OpenAI |
| `OPENAI_MODEL` | `gpt-4o` | OpenAI model name |
| `EMBEDDING_MODEL` | `sentence-transformers/all-MiniLM-L6-v2` | HuggingFace embedding model |
| `QDRANT_URL` | (none) | Qdrant cluster endpoint |
| `QDRANT_API_KEY` | (none) | Qdrant API key |
| `VECTOR_SIZE` | `384` | Embedding dimension; must match `EMBEDDING_MODEL` (384 for all-MiniLM-L6-v2) |
| `CHUNK_SIZE` | `512` | Text chunk size used when splitting documents |
| `LOG_LEVEL` | `INFO` | Logging verbosity |
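The following minimal sketch loads these variables with the table's defaults and fails early on a `VECTOR_SIZE` that does not match the embedding model. The `Settings` dataclass, the `load_settings` helper, and the `KNOWN_DIMS` lookup are hypothetical. The project's real config module in `src/core/` may use a different mechanism, such as a settings library.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, fields
from typing import Optional

# Known embedding output sizes; all-MiniLM-L6-v2 produces 384-dim vectors.
KNOWN_DIMS = {"sentence-transformers/all-MiniLM-L6-v2": 384}


@dataclass(frozen=True)
class Settings:
    # Field names are the env var names lower-cased; defaults follow the table.
    llm_provider: str = "ollama"
    ollama_model: str = "llama3.2"
    ollama_base_url: str = "http://localhost:11434"
    openai_api_key: Optional[str] = None
    openai_model: str = "gpt-4o"
    embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2"
    qdrant_url: Optional[str] = None
    qdrant_api_key: Optional[str] = None
    vector_size: int = 384
    chunk_size: int = 512
    log_level: str = "INFO"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read each field from its upper-cased env var, keeping the default when unset."""
    env = os.environ if env is None else env
    values = {}
    for f in fields(Settings):
        raw = env.get(f.name.upper())
        if raw is not None:
            # Environment values are strings; convert the integer fields.
            values[f.name] = int(raw) if f.type is int else raw
    settings = Settings(**values)

    # Qdrant rejects vectors whose size differs from the collection's, so check up front.
    expected = KNOWN_DIMS.get(settings.embedding_model)
    if expected is not None and settings.vector_size != expected:
        raise ValueError(
            f"VECTOR_SIZE={settings.vector_size} but {settings.embedding_model} "
            f"produces {expected}-dim vectors"
        )
    if settings.chunk_size <= 0:
        raise ValueError("CHUNK_SIZE must be positive")
    return settings
```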
```bash
make lint        # ruff check
make format      # ruff format
make smoke-test  # quick import checks (no API keys needed)
```
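An import smoke test like `make smoke-test` can be approximated by importing each package and collecting failures. This is a minimal sketch, not the project's actual target. The `src.*` module paths are hypothetical and inferred from the directory layout, so adjust them to the real package name.

```python
import importlib

# Hypothetical module paths based on the src/ layout.
MODULES = [
    "src.core", "src.ingestion", "src.vectorstore", "src.rag",
    "src.tools", "src.agents", "src.crew", "src.api",
]


def smoke_test(modules: list[str]) -> dict[str, str]:
    """Import each module and return {module: error} for the ones that fail."""
    failures: dict[str, str] = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:  # catch any import-time error, not only ImportError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

Run from the repository root, `smoke_test(MODULES)` returns an empty dict when every package imports cleanly. No API keys are needed because nothing is called beyond module import.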