pAIjo RAG — Islamic Knowledge Retrieval System

A Retrieval-Augmented Generation (RAG) pipeline for an Islamic knowledge assistant serving the Indonesian Muslim community

Built in collaboration with Ainun Najib as part of the pAIjo WhatsApp Muslim Assistant project

Overview

pAIjo RAG is the retrieval-augmented generation component of pAIjo, a WhatsApp-based Islamic knowledge assistant designed for the Indonesian Muslim community.

The RAG system enables pAIjo to:

Retrieve verified Islamic knowledge from a curated vector database
Ground LLM responses in authentic Islamic sources to prevent hallucination
Serve real-time queries on Islamic jurisprudence (fiqih), worship practices, and religious guidance
Scale to concurrent users with sub-100ms retrieval latency

Why RAG for Islamic Knowledge?

Fabricating or misattributing Islamic quotes is a critical failure mode for any AI system. With RAG, each response is grounded in verified, curated content from trusted Islamic scholars and authenticated sources rather than generated from potentially unreliable training data.

Architecture

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────────┐
│   User Query    │────▶│  FastAPI Server  │────▶│  Embedding Backend  │
│  (WhatsApp/     │     │   (Port 8100)    │     │  (Local MiniLM or   │
│   Telegram)     │     └────────┬─────────┘     │   OpenAI)           │
└─────────────────┘              │               └────────┬────────────┘
                                 │    ┌───────────────────┘
                                 ▼    ▼
                        ┌──────────────────┐
                        │     Qdrant       │
                        │   Vector DB      │
                        └────────┬─────────┘
                                 │
                                 ▼
                        ┌──────────────────┐
                        │  Ranked Results  │
                        │  + Source Cites  │
                        └──────────────────┘

Data Flow

Ingestion Pipeline — Islamic knowledge documents (JSON/Markdown) are chunked, embedded, and stored in Qdrant
Query Pipeline — User questions are embedded and matched against the vector store using cosine similarity
Response Pipeline — Retrieved chunks with scores and source attribution are returned to the caller
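
The three stages above can be sketched end to end in plain Python. This is a toy illustration, not the project's implementation: `chunk_words`, `embed`, `cosine`, and `retrieve` are hypothetical names, the chunk sizes are arbitrary, and a bag-of-words counter stands in for the real MiniLM or OpenAI embeddings and for Qdrant's vector search.

```python
import math
from collections import Counter


def chunk_words(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Rank chunks by cosine similarity to the query, best match first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
```

Calling `retrieve("zakat fitrah", chunk_words(document))` returns `(score, chunk)` pairs. In the real pipeline the scores come from Qdrant, and each chunk carries its title, source, and category as metadata.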

Tech Stack

Component	Technology	Purpose
API Framework	FastAPI (Python)	High-performance async REST API
Vector Database	Qdrant	Similarity search & vector storage
Embeddings (default)	sentence-transformers MiniLM	Local multilingual embeddings (384 dims)
Embeddings (optional)	OpenAI text-embedding-3-small	Cloud embeddings (1536 dims)
Configuration	Pydantic Settings	Type-safe env var configuration

API Endpoints

`GET /healthz`

Health check endpoint for monitoring and load balancer integration.

curl http://localhost:8100/healthz

Response:

{
  "status": "ok",
  "collection": "paijo_knowledge",
  "points": 68
}
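
A monitoring script could interpret this response along the following lines. This is a minimal sketch: `fetch_health` and `is_ready` are hypothetical helpers, and the rule that the collection must hold at least one point is an assumption, not something the service enforces.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url: str = "http://localhost:8100", timeout: float = 5.0) -> dict:
    """GET /healthz and decode the JSON body."""
    with urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_ready(health: dict, min_points: int = 1) -> bool:
    """Assumed readiness rule: status is "ok" and the collection is not empty."""
    return health.get("status") == "ok" and health.get("points", 0) >= min_points
```

With the server running, `is_ready(fetch_health())` would be false until the knowledge files have been ingested.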

`POST /ingest`

Ingest knowledge files from the knowledge directory into the vector database.

# Ingest all files
curl -X POST http://localhost:8100/ingest \
  -H "Content-Type: application/json" \
  -d '{}'

# Ingest a specific file
curl -X POST http://localhost:8100/ingest \
  -H "Content-Type: application/json" \
  -d '{"path": "rag-knowledge/ramadan-01.md"}'
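
The same two requests can be issued from Python with only the standard library. This is a sketch under assumptions: `build_ingest_request` and `ingest` are hypothetical helpers, and the response is assumed to be JSON because this README does not document its schema.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8100"  # port taken from the curl examples


def build_ingest_request(path: Optional[str] = None) -> Request:
    """Build the POST /ingest request; an empty body means "ingest everything"."""
    payload = {} if path is None else {"path": path}
    return Request(
        f"{BASE_URL}/ingest",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ingest(path: Optional[str] = None, timeout: float = 60.0) -> dict:
    """Send the request and decode the body (assumed to be JSON)."""
    with urlopen(build_ingest_request(path), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`ingest()` mirrors the first curl call and `ingest("rag-knowledge/ramadan-01.md")` mirrors the second.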

`POST /retrieve`

Retrieve relevant knowledge chunks for a given query.

curl -X POST http://localhost:8100/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "Apa itu tahlilan?", "top_k": 3}'

Response:

{
  "query": "Apa itu tahlilan?",
  "count": 3,
  "results": [
    {
      "text": "Tahlilan adalah tradisi membaca doa...",
      "title": "Tahlilan dan Kirim Doa untuk Mayit",
      "source": "Bahtsul Masail NU",
      "category": "fiqih",
      "score": 0.3042
    }
  ]
}
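
A caller that passes these results to an LLM will usually want each chunk paired with its attribution. The sketch below shows one possible rendering: `format_results` is a hypothetical helper, and the `min_score` cutoff is an assumption, since the README does not state which score counts as a reliable match.

```python
def format_results(response: dict, min_score: float = 0.0) -> str:
    """Render retrieved chunks as numbered, source-attributed context lines."""
    lines = []
    for rank, hit in enumerate(response.get("results", []), start=1):
        if hit["score"] < min_score:
            continue  # skip weak matches; the threshold is an assumption
        lines.append(
            f"[{rank}] {hit['text']} "
            f"(source: {hit['source']}, title: {hit['title']}, score: {hit['score']:.2f})"
        )
    return "\n".join(lines)
```

Keeping the source and title next to each chunk lets the final answer cite where a ruling or practice comes from instead of presenting it unattributed.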

Knowledge Base

The RAG system contains curated knowledge chunks across multiple Islamic domains:

Category	Files	Topics
NU Islamic Traditions	24	Tahlilan, Qunut, Maulid Nabi, Tawassul, Istighatsah, Hizib, Sholawat, Yasin
Ramadan Guidance	12	Prayer times, fiqih puasa, tarawih, sahur/iftar, zakat fitrah
Fiqih & Ibadah	3 JSON	Wudhu, shalat, puasa, zakat, istilah dasar, fatwa
Other	2	Sample fatwa Muhammadiyah, verification test

Getting Started

Prerequisites

Python 3.10+
Qdrant (via Docker or standalone binary)

Installation

# Clone the repository
git clone https://github.com/adityonugrohoid/pAIjo-rag.git
cd pAIjo-rag

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac

# Install dependencies
pip install -r requirements.txt

# Copy and configure environment
cp .env.example .env
# Edit .env if needed (defaults work for local setup)

Start Qdrant

Option A: Docker (recommended for production)

docker compose up -d

Option B: Binary (lightweight, good for local dev)

Download and run the Qdrant binary directly — no Docker needed:

# Download (one-time, Linux x86_64); writing to /usr/local/bin may require sudo
curl -sL https://github.com/qdrant/qdrant/releases/latest/download/qdrant-x86_64-unknown-linux-musl.tar.gz | tar -xz -C /usr/local/bin/

# Run with persistent local storage
mkdir -p .qdrant_storage
cd .qdrant_storage && qdrant &

The storage directory is only ~1MB for the full knowledge base and is gitignored.

Run the Server (API mode)

Qdrant must already be running. If it runs in the foreground, use a second terminal for the server.

uvicorn app.main:app --host 0.0.0.0 --port 8100

Then test with curl:

# Health check
curl http://localhost:8100/healthz

# Ingest all knowledge files
curl -X POST http://localhost:8100/ingest \
  -H "Content-Type: application/json" -d '{}'

# Test retrieval
curl -X POST http://localhost:8100/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "apa itu tahlilan?", "top_k": 3}'

CLI Scripts (no server needed)

The CLI scripts connect to Qdrant directly — you only need Qdrant running, not the FastAPI server.

Ingest:

# Ingest all knowledge files
python scripts/ingest.py

# Ingest a specific file
python scripts/ingest.py --path rag-knowledge/ramadan-01.md

Retrieve:

# Search the knowledge base
python scripts/retrieve.py "apa itu tahlilan?"

# More results
python scripts/retrieve.py "kapan ramadhan?" --top-k 5

# Filter by category
python scripts/retrieve.py "shalat tarawih" --category ibadah

# Raw JSON output
python scripts/retrieve.py "apa itu tahlilan?" --json
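
For readers who want to build a similar tool, the flags shown above could be declared with `argparse` roughly as follows. This is a hypothetical sketch, not the contents of `scripts/retrieve.py`; in particular the default of 3 for `--top-k` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the positional query and the three optional flags."""
    parser = argparse.ArgumentParser(description="Search the knowledge base")
    parser.add_argument("query", help="question to search for")
    parser.add_argument("--top-k", type=int, default=3, help="number of results (assumed default)")
    parser.add_argument("--category", default=None, help="restrict results to one category")
    parser.add_argument("--json", action="store_true", help="print raw JSON instead of text")
    return parser
```

`build_parser().parse_args(["kapan ramadhan?", "--top-k", "5"])` yields a namespace with `top_k` set to 5, `category` set to `None`, and `json` set to `False`.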

Embedding Providers

By default, pAIjo RAG uses local sentence-transformers (no API key required). To switch to OpenAI:

# In .env
EMBEDDING_PROVIDER=openai
OPENAI_API_KEY=sk-your-key-here

Note: Switching providers changes the vector dimension (384 vs 1536), so vectors stored under the previous provider are incompatible. Recreate the Qdrant collection and re-ingest the knowledge files after switching.
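
A startup check can catch a dimension mismatch before any query fails. The sketch below rests on assumptions: the provider name `local` is hypothetical (the README only shows `openai`), and `expected_dim` and `needs_recreate` are illustrative helpers, not functions from the codebase.

```python
import os

# Dimensions quoted in this README; "local" is an assumed name for the default provider.
EMBEDDING_DIMS = {"local": 384, "openai": 1536}


def expected_dim(provider: str) -> int:
    """Return the vector size the given provider produces."""
    try:
        return EMBEDDING_DIMS[provider]
    except KeyError:
        raise ValueError(f"unknown embedding provider: {provider!r}") from None


def needs_recreate(collection_dim: int, provider: str) -> bool:
    """True when the existing collection cannot store this provider's vectors."""
    return collection_dim != expected_dim(provider)


if __name__ == "__main__":
    provider = os.environ.get("EMBEDDING_PROVIDER", "local")
    # 384 stands in for the size read from the existing collection.
    print(provider, needs_recreate(384, provider))
```

When `needs_recreate` returns true, the collection has to be dropped and created again with the new size before running ingestion.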

Project Structure

pAIjo-rag/
├── app/
│   ├── main.py           # FastAPI app, lifespan, singletons
│   ├── config.py          # Pydantic Settings configuration
│   ├── models.py          # Request/response Pydantic models
│   ├── state.py           # Module-level singletons
│   ├── api/
│   │   └── routes.py      # /healthz, /retrieve, /ingest handlers
│   └── core/
│       ├── parser.py      # JSON/Markdown file parsing
│       ├── chunker.py     # Word-based text chunking with overlap
│       ├── embeddings.py  # Dual backend: local MiniLM + OpenAI
│       └── vectorstore.py # Qdrant client wrapper
├── scripts/
│   ├── ingest.py          # CLI ingestion tool
│   └── retrieve.py        # CLI retrieval tool
├── rag-knowledge/         # Curated Islamic knowledge base
├── .qdrant_storage/       # Local Qdrant data (gitignored)
├── docker-compose.yml     # Qdrant service
├── Dockerfile
├── requirements.txt
└── .env.example

Project Context

pAIjo — WhatsApp Muslim Assistant

pAIjo is a larger initiative to build an accessible, trustworthy Islamic knowledge assistant for Indonesian Muslims via WhatsApp — the most widely used messaging platform in Indonesia (200M+ users).

The RAG system is the knowledge backbone that ensures pAIjo's responses are grounded in verified Islamic scholarship rather than LLM hallucination — a critical requirement for religious content.

Collaboration

This project was built in collaboration with Ainun Najib, an Indonesian data platform & civic tech leader based in Singapore, who leads the pAIjo initiative.

Roles:

Ainun Najib — Project lead, architecture design, AI/ML strategy, knowledge curation, infrastructure
Adityo Nugroho — RAG implementation, FastAPI development, Qdrant integration, API design, testing, end-to-end verification

License

This project is licensed under the MIT License — see the LICENSE file for details.

Built for the Muslim community
