This project provides:
- MCP Server: A Model Context Protocol server for Neo4j documentation
- Evaluation Pipeline: Tools to evaluate and compare LLM retrieval approaches
Project structure:

```
.
├── mcp-neo4j-docs/               # MCP Server for Neo4j Documentation
│   ├── main.py                   # MCP server implementation
│   ├── test_server.py            # Server testing utilities
│   ├── pyproject.toml            # Project dependencies
│   └── README.md                 # MCP server documentation
│
├── evals/                        # Evaluation Pipeline
│   ├── eval_pipeline.py          # Main evaluation script
│   ├── populate_neo4j_vectors.py # Vector store population
│   ├── test_questions.json       # Test question dataset
│   ├── .env.template             # Environment variables template
│   └── README.md                 # Evaluation documentation
│
└── README.md                     # This file
```
The MCP server provides tools for browsing and reading Neo4j documentation with intelligent caching.
```bash
# Navigate to the MCP directory
cd mcp-neo4j-docs

# Install dependencies
uv sync

# Run the server
python main.py

# Or test the server
python test_server.py
```

See mcp-neo4j-docs/README.md for detailed documentation.
The evaluation pipeline compares MCP server retrieval vs Neo4j vector search.
```bash
# Navigate to the evals directory
cd evals

# Configure environment
cp .env.template .env
# Edit .env with your credentials

# (Optional) Populate Neo4j vector store
python populate_neo4j_vectors.py

# Run evaluation
python eval_pipeline.py
```

See evals/README.md for detailed documentation.
An MCP server that provides:
- Resources: Lists of Neo4j manuals and GraphAcademy courses
- Tools: Browse manuals, read pages, access courses, manage cache
- Caching: Automatic caching of fetched content
Key Features:
- Browse all Neo4j documentation manuals
- Read specific documentation pages
- Access GraphAcademy courses
- Smart caching for performance
- Compatible with Claude Desktop, Cline, and other MCP clients
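As a rough illustration of how such a tool can be exposed (the real tools, caching, and HTML parsing live in main.py and may differ), a minimal FastMCP sketch might look like this; the tool name, URL handling, and in-memory cache are hypothetical:

```python
# Hypothetical sketch only; not the actual implementation in main.py.
import requests
from fastmcp import FastMCP

mcp = FastMCP("neo4j-docs-sketch")
_cache: dict[str, str] = {}  # naive in-memory page cache


@mcp.tool()
def read_doc_page(url: str) -> str:
    """Fetch a Neo4j documentation page, serving repeated requests from the cache."""
    if url not in _cache:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        _cache[url] = response.text
    return _cache[url]


if __name__ == "__main__":
    mcp.run()
```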
A comprehensive evaluation framework that:
- Compares MCP server vs vector search retrieval
- Measures accuracy and efficiency
- Generates detailed reports
Key Features:
- Dual Retrieval: Tests both MCP and vector search approaches
- LLM Evaluation: Uses GPT-4o-mini to evaluate answer quality
- Multiple Metrics: Accuracy, completeness, relevance, clarity, speed
- Comprehensive Reporting: JSON, CSV, and console reports
- Extensible: Easy to add custom test questions
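To make the LLM Evaluation step concrete, here is a hedged sketch of an LLM-as-judge call with gpt-4o-mini; the prompt wording and single-number scoring below are illustrative, not the exact multi-metric logic in eval_pipeline.py:

```python
# Illustrative only; the real pipeline scores several dimensions, not one number.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def judge_answer(question: str, answer: str, expected: str) -> float:
    """Ask gpt-4o-mini to rate an answer from 0 to 10 against the expected answer."""
    prompt = (
        f"Question: {question}\n"
        f"Expected answer: {expected}\n"
        f"Candidate answer: {answer}\n"
        "Rate the candidate answer's accuracy from 0 to 10. Reply with the number only."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return float(response.choices[0].message.content.strip())
```

Typical use cases include: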
- Test and optimize documentation retrieval systems
- Compare different RAG (Retrieval-Augmented Generation) approaches
- Benchmark MCP server performance
- Evaluate LLM response quality
- Study efficiency of different retrieval methods
- Analyze trade-offs between speed and accuracy
- Generate datasets for RAG evaluation
- Compare semantic search vs direct fetching
- Assess documentation searchability
- Identify gaps in documentation
- Optimize content for better retrieval
- Monitor documentation quality over time
MCP server requirements:
- Python 3.12+
- Dependencies: `mcp`, `fastmcp`, `requests`, `beautifulsoup4`
Evaluation pipeline requirements:
- Python 3.12+
- Neo4j instance (for vector search)
- OpenAI API key (for embeddings and LLM)
- Additional dependencies: `langchain`, `langchain-neo4j`, `langchain-openai`, `pandas`, `numpy`
For the evaluation pipeline, create evals/.env with:
```
# Neo4j Docs Chatbot Connection
DOCS_CHATBOT_URI=neo4j+s://your-instance.databases.neo4j.io
DOCS_CHATBOT_USERNAME=neo4j
DOCS_CHATBOT_PASSWORD=your-password
DOCS_CHATBOT_INDEX=documentation_embeddings
DOCS_CHATBOT_EMBEDDING_MODEL=text-embedding-3-small

# OpenAI API
OPENAI_API_KEY=your-openai-key

# Optional: Anthropic API Key
ANTHROPIC_API_KEY=your-anthropic-key
```
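These settings drive the vector-search side of the comparison. As a rough sketch of how they might be consumed (the actual wiring lives in eval_pipeline.py and populate_neo4j_vectors.py), assuming the variables are already loaded into the environment:

```python
# Illustrative sketch only; assumes the DOCS_CHATBOT_* variables are exported.
import os

from langchain_neo4j import Neo4jVector
from langchain_openai import OpenAIEmbeddings

store = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(model=os.environ["DOCS_CHATBOT_EMBEDDING_MODEL"]),
    url=os.environ["DOCS_CHATBOT_URI"],
    username=os.environ["DOCS_CHATBOT_USERNAME"],
    password=os.environ["DOCS_CHATBOT_PASSWORD"],
    index_name=os.environ["DOCS_CHATBOT_INDEX"],
)

# Retrieve the three most similar documentation chunks for a sample question.
docs = store.similarity_search("How do I create an index in Cypher?", k=3)
for doc in docs:
    print(doc.page_content[:200])
```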
To run the full pipeline end to end:

```bash
# 1. Start the MCP server
cd mcp-neo4j-docs
python main.py &
# 2. Populate vector store (first time only)
cd ../evals
python populate_neo4j_vectors.py
# 3. Run evaluation
python eval_pipeline.py
# 4. Review results
cat evaluation_results.json
# or
open evaluation_results.csv
```

Example output:

```
================================================================================
EVALUATION RESULTS SUMMARY
================================================================================
## Overall Performance
Total Questions Evaluated: 15
## Accuracy Metrics (Average Scores)
MCP - Overall: 8.10/10
Vector - Overall: 7.60/10
## Efficiency Metrics (Average Times)
MCP - Total: 5.65s
Vector - Total: 3.95s
## Winner Analysis
MCP Wins: 9 (60.0%)
Vector Wins: 6 (40.0%)
## Recommendations
✓ MCP Server provides better answer quality on average
✓ Vector Search is faster on average
```
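Because pandas is already a dependency, the CSV report can also be inspected programmatically; the snippet below is a generic sketch and does not assume specific column names:

```python
# Load the CSV report produced by eval_pipeline.py and print a quick overview.
import pandas as pd

df = pd.read_csv("evaluation_results.csv")
print(df.head())      # first few rows of the report
print(df.describe())  # summary statistics for the numeric columns
```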
Configure the server in an MCP client (e.g., Claude Desktop):
```json
{
  "mcpServers": {
    "neo4j-docs": {
      "command": "python",
      "args": ["/path/to/mcp-neo4j-docs/main.py"]
    }
  }
}
```

Customize test questions in `evals/test_questions.json`:
```json
{
  "question": "Your question here",
  "expected_answer": "Expected answer",
  "category": "category-name",
  "difficulty": "easy|medium|hard"
}
```
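To sanity-check a customized question file before a run, a quick script like the following can help; it assumes test_questions.json holds a JSON array of objects with the fields shown above:

```python
# Quick validation sketch; assumes a JSON array of question objects.
import json

with open("evals/test_questions.json") as f:
    questions = json.load(f)

required = {"question", "expected_answer", "category", "difficulty"}
for i, q in enumerate(questions):
    missing = required - q.keys()
    if missing:
        raise ValueError(f"Question {i} is missing fields: {missing}")

print(f"{len(questions)} test questions look well-formed")
```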
```bash
# Test MCP server
cd mcp-neo4j-docs
python test_server.py
# Test with specific manual
python test_server.py --manual cypher-manual
```

Extend the `LLMEvaluator` class in `evals/eval_pipeline.py` to add custom metrics:

```python
def evaluate_custom_metric(self, question: str, answer: str) -> float:
    # Your evaluation logic here (scores elsewhere in the pipeline use a 0-10 scale)
    score = 0.0
    return score
```
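For a concrete, purely illustrative example, a keyword-coverage metric could look like the following; the method name, extra parameter, and scoring scheme are hypothetical and not part of the existing LLMEvaluator API:

```python
# Hypothetical custom metric: fraction of expected terms mentioned in the answer,
# scaled to the 0-10 range used elsewhere in the pipeline.
def evaluate_keyword_coverage(self, question: str, answer: str, expected_terms: list[str]) -> float:
    if not expected_terms:
        return 0.0
    answer_lower = answer.lower()
    hits = sum(1 for term in expected_terms if term.lower() in answer_lower)
    return 10.0 * hits / len(expected_terms)
```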
Troubleshooting:
- Ensure all dependencies are installed: `uv sync`
- Check network connectivity to neo4j.com
- Clear cache if stale data: Use the `clear_cache()` tool
- Import errors: Run from the `evals/` directory
- Neo4j connection: Verify credentials in `.env`
- OpenAI rate limits: Reduce question count or add delays
- Vector search fails: Run `populate_neo4j_vectors.py` first
Performance tips:
- Cache is automatic: frequently accessed pages load instantly
- Limit page fetches per manual for faster browsing
- Use specific manuals instead of browsing all
- Start with a small question set (5-10) for testing
- Populate the vector store once, reuse it for multiple evaluations
- Use `gpt-4o-mini` for cost-effective evaluation
- Run evaluations during off-peak hours to avoid rate limits
Contributions welcome! Areas for improvement:
- Additional evaluation metrics
- Support for more LLM providers
- Enhanced caching strategies
- Better documentation coverage
- UI for evaluation results
See LICENSE file for details.
For issues or questions:
- Check the README files in each directory
- Review the troubleshooting sections
- Open an issue on GitHub (if applicable)
Built with ❤️ for better documentation retrieval and evaluation.