This project provides:
- MCP Server: A Model Context Protocol server for Neo4j documentation
- Evaluation Pipeline: Tools to evaluate and compare LLM retrieval approaches
Project structure:

```
.
├── mcp-neo4j-docs/               # MCP Server for Neo4j Documentation
│   ├── main.py                   # MCP server implementation
│   ├── test_server.py            # Server testing utilities
│   ├── pyproject.toml            # Project dependencies
│   └── README.md                 # MCP server documentation
│
├── evals/                        # Evaluation Pipeline
│   ├── eval_pipeline.py          # Main evaluation script
│   ├── populate_neo4j_vectors.py # Vector store population
│   ├── test_questions.json       # Test question dataset
│   ├── .env.template             # Environment variables template
│   └── README.md                 # Evaluation documentation
│
└── README.md                     # This file
```
The MCP server provides tools for browsing and reading Neo4j documentation with intelligent caching.
```bash
# Navigate to the MCP directory
cd mcp-neo4j-docs

# Install dependencies
uv sync

# Run the server
python main.py

# Or test the server
python test_server.py
```

See mcp-neo4j-docs/README.md for detailed documentation.
The evaluation pipeline compares MCP server retrieval vs Neo4j vector search.
```bash
# Navigate to the evals directory
cd evals

# Configure environment
cp .env.template .env
# Edit .env with your credentials

# (Optional) Populate Neo4j vector store
python populate_neo4j_vectors.py

# Run evaluation
python eval_pipeline.py
```

See evals/README.md for detailed documentation.
An MCP server that provides:
- Resources: Lists of Neo4j manuals and GraphAcademy courses
- Tools: Browse manuals, read pages, access courses, manage cache
- Caching: Automatic caching of fetched content
Key Features:
- Browse all Neo4j documentation manuals
- Read specific documentation pages
- Access GraphAcademy courses
- Smart caching for performance
- Compatible with Claude Desktop, Cline, and other MCP clients
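As a rough illustration of how such a tool can be exposed (the real tools, caching, and HTML parsing live in main.py and may differ), a minimal FastMCP sketch might look like this; the tool name, URL handling, and in-memory cache are hypothetical:

```python
# Hypothetical sketch only; not the actual implementation in main.py.
import requests
from fastmcp import FastMCP

mcp = FastMCP("neo4j-docs-sketch")
_cache: dict[str, str] = {}  # naive in-memory page cache


@mcp.tool()
def read_doc_page(url: str) -> str:
    """Fetch a Neo4j documentation page, serving repeated requests from the cache."""
    if url not in _cache:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        _cache[url] = response.text
    return _cache[url]


if __name__ == "__main__":
    mcp.run()
```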
A comprehensive evaluation framework that:
- Compares MCP server vs vector search retrieval
- Measures accuracy and efficiency
- Generates detailed reports
Key Features:
- Dual Retrieval: Tests both MCP and vector search approaches
- LLM Evaluation: Uses GPT-4o-mini to evaluate answer quality
- Multiple Metrics: Accuracy, completeness, relevance, clarity, speed
- Comprehensive Reporting: JSON, CSV, and console reports
- Extensible: Easy to add custom test questions
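To make the LLM Evaluation step concrete, here is a hedged sketch of an LLM-as-judge call with gpt-4o-mini; the prompt wording and single-number scoring below are illustrative, not the exact multi-metric logic in eval_pipeline.py:

```python
# Illustrative only; the real pipeline scores several dimensions, not one number.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def judge_answer(question: str, answer: str, expected: str) -> float:
    """Ask gpt-4o-mini to rate an answer from 0 to 10 against the expected answer."""
    prompt = (
        f"Question: {question}\n"
        f"Expected answer: {expected}\n"
        f"Candidate answer: {answer}\n"
        "Rate the candidate answer's accuracy from 0 to 10. Reply with the number only."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return float(response.choices[0].message.content.strip())
```

Typical use cases include: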
- Test and optimize documentation retrieval systems
- Compare different RAG (Retrieval-Augmented Generation) approaches
- Benchmark MCP server performance
- Evaluate LLM response quality
- Study efficiency of different retrieval methods
- Analyze trade-offs between speed and accuracy
- Generate datasets for RAG evaluation
- Compare semantic search vs direct fetching
- Assess documentation searchability
- Identify gaps in documentation
- Optimize content for better retrieval
- Monitor documentation quality over time
MCP server requirements:
- Python 3.12+
- Dependencies: `mcp`, `fastmcp`, `requests`, `beautifulsoup4`
Evaluation pipeline requirements:
- Python 3.12+
- Neo4j instance (for vector search)
- OpenAI API key (for embeddings and LLM)
- Additional dependencies: `langchain`, `langchain-neo4j`, `langchain-openai`, `pandas`, `numpy`
For the evaluation pipeline, create evals/.env with:
```
# Neo4j Docs Chatbot Connection
DOCS_CHATBOT_URI=neo4j+s://your-instance.databases.neo4j.io
DOCS_CHATBOT_USERNAME=neo4j
DOCS_CHATBOT_PASSWORD=your-password
DOCS_CHATBOT_INDEX=documentation_embeddings
DOCS_CHATBOT_EMBEDDING_MODEL=text-embedding-3-small

# OpenAI API
OPENAI_API_KEY=your-openai-key

# Optional: Anthropic API Key
ANTHROPIC_API_KEY=your-anthropic-key
```
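These settings drive the vector-search side of the comparison. As a rough sketch of how they might be consumed (the actual wiring lives in eval_pipeline.py and populate_neo4j_vectors.py), assuming the variables are already loaded into the environment:

```python
# Illustrative sketch only; assumes the DOCS_CHATBOT_* variables are exported.
import os

from langchain_neo4j import Neo4jVector
from langchain_openai import OpenAIEmbeddings

store = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(model=os.environ["DOCS_CHATBOT_EMBEDDING_MODEL"]),
    url=os.environ["DOCS_CHATBOT_URI"],
    username=os.environ["DOCS_CHATBOT_USERNAME"],
    password=os.environ["DOCS_CHATBOT_PASSWORD"],
    index_name=os.environ["DOCS_CHATBOT_INDEX"],
)

# Retrieve the three most similar documentation chunks for a sample question.
docs = store.similarity_search("How do I create an index in Cypher?", k=3)
for doc in docs:
    print(doc.page_content[:200])
```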
To run the full pipeline end to end:

```bash
# 1. Start the MCP server
cd mcp-neo4j-docs
python main.py &
# 2. Populate vector store (first time only)
cd ../evals
python populate_neo4j_vectors.py
# 3. Run evaluation
python eval_pipeline.py
# 4. Review results
cat evaluation_results.json
# or
open evaluation_results.csv
```

Example output:

```
================================================================================
EVALUATION RESULTS SUMMARY
================================================================================
## Overall Performance
Total Questions Evaluated: 15
## Accuracy Metrics (Average Scores)
MCP - Overall: 8.10/10
Vector - Overall: 7.60/10
## Efficiency Metrics (Average Times)
MCP - Total: 5.65s
Vector - Total: 3.95s
## Winner Analysis
MCP Wins: 9 (60.0%)
Vector Wins: 6 (40.0%)
## Recommendations
✓ MCP Server provides better answer quality on average
✓ Vector Search is faster on average
```
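Because pandas is already a dependency, the CSV report can also be inspected programmatically; the snippet below is a generic sketch and does not assume specific column names:

```python
# Load the CSV report produced by eval_pipeline.py and print a quick overview.
import pandas as pd

df = pd.read_csv("evaluation_results.csv")
print(df.head())      # first few rows of the report
print(df.describe())  # summary statistics for the numeric columns
```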
Configure the server in an MCP client (e.g., Claude Desktop):
```json
{
  "mcpServers": {
    "neo4j-docs": {
      "command": "python",
      "args": ["/path/to/mcp-neo4j-docs/main.py"]
    }
  }
}
```

Customize test questions in `evals/test_questions.json`:
```json
{
  "question": "Your question here",
  "expected_answer": "Expected answer",
  "category": "category-name",
  "difficulty": "easy|medium|hard"
}
```
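To sanity-check a customized question file before a run, a quick script like the following can help; it assumes test_questions.json holds a JSON array of objects with the fields shown above:

```python
# Quick validation sketch; assumes a JSON array of question objects.
import json

with open("evals/test_questions.json") as f:
    questions = json.load(f)

required = {"question", "expected_answer", "category", "difficulty"}
for i, q in enumerate(questions):
    missing = required - q.keys()
    if missing:
        raise ValueError(f"Question {i} is missing fields: {missing}")

print(f"{len(questions)} test questions look well-formed")
```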
```bash
# Test MCP server
cd mcp-neo4j-docs
python test_server.py
# Test with specific manual
python test_server.py --manual cypher-manual
```

Extend the `LLMEvaluator` class in `evals/eval_pipeline.py` to add custom metrics:

```python
def evaluate_custom_metric(self, question: str, answer: str) -> float:
    # Your evaluation logic here (scores elsewhere in the pipeline use a 0-10 scale)
    score = 0.0
    return score
```
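For a concrete, purely illustrative example, a keyword-coverage metric could look like the following; the method name, extra parameter, and scoring scheme are hypothetical and not part of the existing LLMEvaluator API:

```python
# Hypothetical custom metric: fraction of expected terms mentioned in the answer,
# scaled to the 0-10 range used elsewhere in the pipeline.
def evaluate_keyword_coverage(self, question: str, answer: str, expected_terms: list[str]) -> float:
    if not expected_terms:
        return 0.0
    answer_lower = answer.lower()
    hits = sum(1 for term in expected_terms if term.lower() in answer_lower)
    return 10.0 * hits / len(expected_terms)
```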
Troubleshooting:
- Ensure all dependencies are installed: `uv sync`
- Check network connectivity to neo4j.com
- Clear cache if stale data: Use the `clear_cache()` tool
- Import errors: Run from the `evals/` directory
- Neo4j connection: Verify credentials in `.env`
- OpenAI rate limits: Reduce question count or add delays
- Vector search fails: Run `populate_neo4j_vectors.py` first
Performance tips:
- Cache is automatic: frequently accessed pages load instantly
- Limit page fetches per manual for faster browsing
- Use specific manuals instead of browsing all
- Start with a small question set (5-10) for testing
- Populate the vector store once, reuse it for multiple evaluations
- Use `gpt-4o-mini` for cost-effective evaluation
- Run evaluations during off-peak hours to avoid rate limits
Contributions welcome! Areas for improvement:
- Additional evaluation metrics
- Support for more LLM providers
- Enhanced caching strategies
- Better documentation coverage
- UI for evaluation results
See LICENSE file for details.
For issues or questions:
- Check the README files in each directory
- Review the troubleshooting sections
- Open an issue on GitHub (if applicable)
Built with ❤️ for better documentation retrieval and evaluation.