PageIndex Fork with Azure OpenAI & MCTS RAG

Fork of VectifyAI/PageIndex with Azure OpenAI support and enhanced MCTS-based retrieval.

🔥 What's New in This Fork

Azure OpenAI Support

Full Azure OpenAI integration alongside standard OpenAI
Environment-based configuration (no code changes needed to switch)
GPT-5/o1/o3 model compatibility fixes (temperature, max_completion_tokens, tiktoken)
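
The parameter differences behind the compatibility fixes can be sketched as a small helper. This is a minimal illustration, not the fork's actual code: the function name `build_chat_kwargs` and the prefix-based model check are assumptions. It reflects the general behavior that newer reasoning-style models expect `max_completion_tokens` instead of `max_tokens` and do not accept a custom `temperature`.

```python
from typing import Any

# Assumption: models whose names start with these prefixes are treated as
# reasoning-style models. The fork's real detection logic may differ.
REASONING_PREFIXES = ("gpt-5", "o1", "o3")


def build_chat_kwargs(
    model: str,
    messages: list[dict[str, str]],
    max_tokens: int = 1024,
    temperature: float = 0.0,
) -> dict[str, Any]:
    """Build keyword arguments for a chat completion request (hypothetical helper)."""
    kwargs: dict[str, Any] = {"model": model, "messages": messages}
    if model.lower().startswith(REASONING_PREFIXES):
        # Reasoning-style models: use max_completion_tokens and leave
        # temperature at the service default by not sending it.
        kwargs["max_completion_tokens"] = max_tokens
    else:
        # Older chat models: classic max_tokens plus explicit temperature.
        kwargs["max_tokens"] = max_tokens
        kwargs["temperature"] = temperature
    return kwargs
```

The returned dictionary could then be unpacked into a chat completion call, so the rest of the code does not need to know which model family is configured.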

MCTS-Based RAG (`cookbook/mcts_rag.py`)

Monte Carlo Tree Search for intelligent document exploration
Multi-document support - search across multiple PDFs simultaneously
UCB1-based exploration/exploitation balancing
Iterative relevance scoring with backpropagation
Handles large documents without context window overflow
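
The ideas listed above (UCB1 selection, relevance scoring, backpropagation) can be sketched in a few lines. This is an illustrative toy, not the code in `cookbook/mcts_rag.py`: the `Node` class, the helper names, and `score_fn` (a stand-in for an LLM relevance call returning a value in 0-1) are all assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    """One section of a document tree (hypothetical structure)."""
    title: str
    children: list["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None
    visits: int = 0
    total_score: float = 0.0


def add_child(parent: Node, title: str) -> Node:
    child = Node(title=title, parent=parent)
    parent.children.append(child)
    return child


def ucb1(node: Node, exploration: float = 1.414) -> float:
    """UCB1 = mean score + c * sqrt(ln(parent visits) / node visits)."""
    if node.visits == 0:
        return math.inf  # unvisited nodes are always tried first
    mean = node.total_score / node.visits
    parent_visits = node.parent.visits if node.parent else node.visits
    return mean + exploration * math.sqrt(math.log(parent_visits) / node.visits)


def select(root: Node, exploration: float = 1.414) -> Node:
    """Walk down the tree, picking the child with the highest UCB1 value."""
    node = root
    while node.children:
        node = max(node.children, key=lambda c: ucb1(c, exploration))
    return node


def backpropagate(node: Optional[Node], score: float) -> None:
    """Add the score to the node and every ancestor up to the root."""
    while node is not None:
        node.visits += 1
        node.total_score += score
        node = node.parent


def search(root: Node, score_fn: Callable[[Node], float],
           iterations: int = 20, exploration: float = 1.414) -> None:
    for _ in range(iterations):
        leaf = select(root, exploration)
        backpropagate(leaf, score_fn(leaf))


def relevant_leaves(root: Node, threshold: float = 0.6) -> list[Node]:
    """Return visited leaves whose mean score meets the threshold."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children)
        elif node.visits and node.total_score / node.visits >= threshold:
            found.append(node)
    return found
```

Because only one section is scored per iteration, the prompt size stays bounded regardless of document length. A real implementation would likely cache scores so that revisiting a leaf does not trigger another LLM call.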

Local-Only Operation

No PageIndex cloud API required
All processing happens locally with your Azure/OpenAI credentials

📑 About PageIndex

PageIndex is a vectorless, reasoning-based RAG system that builds a hierarchical tree index from documents and uses LLMs to reason over that index for retrieval.

Key Features:

No Vector DB: Uses document structure and LLM reasoning, not vector similarity
No Chunking: Documents organized into natural sections
Human-like Retrieval: Simulates how experts navigate complex documents
Explainable: Traceable reasoning with page/section references
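
To make the tree index concrete, the sketch below walks a nested structure and lists each section with its path and page range. The field names (`title`, `start_index`, `end_index`, `nodes`) are assumptions about the structure JSON, so check them against a generated file before relying on them.

```python
from typing import Any, Iterator


def iter_sections(
    nodes: list[dict[str, Any]], path: tuple[str, ...] = ()
) -> Iterator[dict[str, Any]]:
    """Yield every section depth-first with a readable path and page range.

    Assumed node shape: {"title": str, "start_index": int,
    "end_index": int, "nodes": [child nodes]}.
    """
    for node in nodes:
        current = path + (node["title"],)
        yield {
            "path": " > ".join(current),
            "pages": (node.get("start_index"), node.get("end_index")),
        }
        # Recurse into child sections, if any.
        yield from iter_sections(node.get("nodes", []), current)
```

A listing like this is what makes retrieval traceable: every selected node maps back to a named section and a page range.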

⚙️ Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Configure Environment

Copy .env.example to .env and configure:

# For Azure OpenAI
AZURE_OPENAI_API_KEY=your-azure-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-12-01-preview
AZURE_OPENAI_DEPLOYMENT=gpt-4o  # or gpt-5, etc.

# Or for standard OpenAI
OPENAI_API_KEY=sk-your-openai-key
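
As a rough illustration of environment-based switching, the sketch below picks a backend from the variables above. The function name `select_backend`, the Azure-first precedence, and the fallback defaults are assumptions; the fork's own logic in `pageindex/utils.py` may differ.

```python
import os
from typing import Mapping


def select_backend(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Choose Azure OpenAI or standard OpenAI from environment variables.

    Assumption: Azure takes precedence when both sets of credentials exist.
    """
    if env.get("AZURE_OPENAI_API_KEY") and env.get("AZURE_OPENAI_ENDPOINT"):
        return {
            "provider": "azure",
            "api_key": env["AZURE_OPENAI_API_KEY"],
            "endpoint": env["AZURE_OPENAI_ENDPOINT"],
            # Defaults below mirror the example values and are assumptions.
            "api_version": env.get("AZURE_OPENAI_API_VERSION", "2024-12-01-preview"),
            "deployment": env.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4o"),
        }
    if env.get("OPENAI_API_KEY"):
        return {"provider": "openai", "api_key": env["OPENAI_API_KEY"]}
    raise RuntimeError(
        "Set AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT, or OPENAI_API_KEY."
    )
```

With the `openai` Python package (version 1.x), the Azure values would typically be passed to `AzureOpenAI(api_key=..., azure_endpoint=..., api_version=...)` and the standard key to `OpenAI(api_key=...)`.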

3. Generate Document Structure

python run_pageindex.py --pdf_path /path/to/document.pdf

The output is saved to results/<document>_structure.json.

4. Query with MCTS RAG

# Single document
python cookbook/mcts_rag.py \
  -s results/document_structure.json \
  -p path/to/document.pdf \
  -q "Your question here" \
  -v

# Multiple documents
python cookbook/mcts_rag.py \
  -s doc1_structure.json -p doc1.pdf \
  -s doc2_structure.json -p doc2.pdf \
  -q "Question across all docs" \
  -v

# Interactive mode
python cookbook/mcts_rag.py \
  -s results/document_structure.json \
  -p path/to/document.pdf \
  -i

📁 Project Structure

pageindex/
├── pageindex/              # Core library (Azure-enhanced)
│   ├── utils.py            # LLM utilities with Azure support
│   ├── page_index.py       # Structure generation
│   └── config.yaml         # Default settings
├── cookbook/
│   ├── mcts_rag.py         # 🔥 MCTS-based RAG (main tool)
│   └── local_RAG_azure.ipynb  # Jupyter notebook alternative
├── run_pageindex.py        # Structure generation CLI
├── results/                # Generated structures
├── tests/pdfs/             # Sample documents
└── tutorials/              # Documentation

🔧 MCTS RAG Options

Usage: python cookbook/mcts_rag.py [options]

Required:
  -s, --structure   Path to structure JSON (can specify multiple)
  -p, --pdf         Path to PDF file (must match structure order)

Query:
  -q, --query       Question to ask
  -i, --interactive Start interactive mode

Options:
  -v, --verbose     Show detailed search progress
  --iterations N    Max MCTS iterations (default: 20)
  --exploration F   UCB1 exploration weight (default: 1.414)
  --threshold F     Relevance threshold 0-1 (default: 0.6)
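
For readers curious how repeated `-s`/`-p` flags can be paired, here is a minimal `argparse` sketch that mirrors the options listed above. It is not the script's actual parser: treating `-q` and `-i` as mutually exclusive, and the validation messages, are assumptions.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mcts_rag.py")
    # action="append" collects every occurrence of a repeated flag in order.
    parser.add_argument("-s", "--structure", action="append", required=True,
                        help="Path to structure JSON (can specify multiple)")
    parser.add_argument("-p", "--pdf", action="append", required=True,
                        help="Path to PDF file (must match structure order)")
    # Assumption: exactly one of --query / --interactive is given.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-q", "--query", help="Question to ask")
    mode.add_argument("-i", "--interactive", action="store_true",
                      help="Start interactive mode")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("--iterations", type=int, default=20)
    parser.add_argument("--exploration", type=float, default=1.414)
    parser.add_argument("--threshold", type=float, default=0.6)
    return parser


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if len(args.structure) != len(args.pdf):
        parser.error("each -s/--structure needs a matching -p/--pdf")
    if not 0.0 <= args.threshold <= 1.0:
        parser.error("--threshold must be between 0 and 1")
    # Pair each structure file with the PDF given at the same position.
    args.documents = list(zip(args.structure, args.pdf))
    return args
```

Pairing by position is why the PDF order must match the structure order: the first `-s` goes with the first `-p`, and so on.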

🆚 MCTS vs Simple RAG

| Aspect | Simple RAG | MCTS RAG |
|---|---|---|
| Selection | Single LLM call | Iterative exploration |
| Strategy | Pick all relevant nodes | UCB1 explore/exploit |
| Multi-doc | Limited | ✅ Designed for it |
| LLM Calls | 2-3 | 10-30 (configurable) |
| Best for | Simple queries | Complex, multi-section queries |

📝 Changes from Original

| Component | Original | This Fork |
|---|---|---|
| OpenAI Client | Standard only | Azure + Standard |
| Model Config | Hardcoded | Environment variables |
| Retrieval | Cloud API or basic | MCTS-based local |
| GPT-5 Support | ❌ | ✅ Full compatibility |
| Multi-document | Via cloud API | Local MCTS |

📜 License

Apache 2.0 (same as original) - see LICENSE

Original project: VectifyAI/PageIndex
