Complete guide for setting up and using the Model Context Protocol (MCP) server with the DataOps Toolkit.
The DataOps Toolkit includes a full Model Context Protocol (MCP) server implementation that enables AI assistants like Claude to interact naturally with all data transformation tools while maintaining complete audit trails.
Key Features:
- Automatic tool discovery - all tools instantly available
- Full audit trail for every operation
- Session management with persistence
- Support for all 22+ DataOps tools
- Claude Desktop integration ready
```bash
cd dataops-toolkit

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install with MCP support
pip install -e ".[mcp]"
```

Start the server:

```bash
# Auto-discovery mode (recommended)
python start_mcp_auto.py
```

The server will:
- Auto-discover all tools in `dataops/tools/`
- Start on stdio transport for Claude Desktop
- Create MCP methods for each tool
1. Find your Claude Desktop config file:
   - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
   - Windows: `%APPDATA%\Claude\claude_desktop_config.json`
   - Linux: `~/.config/Claude/claude_desktop_config.json`
2. Edit the configuration:
```json
{
  "mcpServers": {
    "dataops-toolkit": {
      "command": "python",
      "args": ["/full/path/to/dataops-toolkit/start_mcp_auto.py"],
      "env": {
        "PYTHONPATH": "/full/path/to/dataops-toolkit"
      }
    }
  }
}
```
3. Restart Claude Desktop

4. Test the integration by asking Claude:
   - "What DataOps tools do you have available?"
   - "Can you profile the CSV file at data/customers.csv?"
   - "Start a DataOps session for data cleaning"
All 22+ tools are automatically exposed via MCP:
- `csv_profile` - Comprehensive statistical profiling
- `dq_validate` - Rule-based data validation
- `quality_score` - Multi-dimensional quality scoring
- `schema_infer` - Automatic schema detection
- `map_suggest` - AI-powered field mapping
- `field_correspond` - Cross-file field correspondence
- `column_cut` - Column selection and reordering
- `row_filter` - Conditional row filtering
- `csv_join` - Join operations on CSV files
- `csv_aggregate` - Group by and aggregation
- `csv_transform` - Schema transformation
- `csv_fill` - Smart missing value filling
- `csv_sql` - SQL queries on CSV files
- `csv_sql_multi` - Multi-query SQL pipelines
- `csv_diff` - File comparison and diff
- `csv_pivot` - Pivot/unpivot operations
- `func_dep_check` - Functional dependency analysis
- `csv_split` - Split large files into chunks
- `csv_consolidate` - Consolidate duplicates
- `csv_merge` - Merge multiple CSVs
- `csv_clean` - Data standardization
- `dedupe_er` - Fuzzy deduplication
- `session_start` - Start audit session
- `session_end` - End session
- `session_status` - Get current session
```
Human: I have customer data at data/customers.csv that needs cleaning.
Can you assess its quality and clean it up?

Claude: I'll help you assess and clean your customer data. Let me start by
analyzing its current quality.
```

Claude automatically:
1. Starts a session
2. Profiles the data
3. Calculates a quality score
4. Identifies issues
5. Applies cleaning operations
6. Validates results
7. Ends the session with a full audit trail
Example of how tools are called via MCP:

```json
{
  "method": "csv_profile",
  "params": {
    "input_file": "data/customers.csv",
    "output_file": "reports/profile.json",
    "sample_size": 10000
  }
}
```

The response includes:

```json
{
  "success": true,
  "result": {
    "rows_analyzed": 10000,
    "columns_profiled": 15,
    "data_completeness": 92.5,
    "quality_issues_found": 3,
    "output_file": "reports/profile.json",
    "operation_id": "01HXYZ..."
  }
}
```

The MCP server automatically discovers tools by:
- Scanning the `dataops/tools/` directory
- Finding functions with the `@audit_operation` decorator
- Extracting parameters and documentation
- Creating MCP method definitions
- No restart needed when adding tools!
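The discovery flow above can be sketched as a scan for decorated functions. This is an illustrative stand-in, not the actual `dataops.mcp.auto_discovery` code: the `__mcp_tool__` marker attribute and the namespace-scan approach are assumptions.

```python
import inspect

def audit_operation(tool_name: str, tool_version: str = "0.1.0"):
    """Hypothetical marker decorator (stand-in for the real one):
    tags a function so discovery can find it later."""
    def wrap(fn):
        fn.__mcp_tool__ = {
            "name": tool_name,
            "version": tool_version,
            "doc": inspect.getdoc(fn),
            "params": list(inspect.signature(fn).parameters),
        }
        return fn
    return wrap

def discover_tools(namespace: dict) -> dict:
    """Collect every callable in a namespace that carries the marker."""
    return {
        obj.__mcp_tool__["name"]: obj.__mcp_tool__
        for obj in namespace.values()
        if callable(obj) and hasattr(obj, "__mcp_tool__")
    }

# Register one tool in this module, then discover it
@audit_operation(tool_name="csv_profile")
def csv_profile(input_file: str, output_file: str, sample_size: int = 10000):
    """Comprehensive statistical profiling."""

tools = discover_tools(globals())
print(sorted(tools))  # ['csv_profile']
```

Because discovery reads signatures and docstrings at runtime, newly added tool files are picked up without changing the server itself.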
Sessions persist across MCP calls:
```python
# First call starts a session
session_start(who="claude", purpose="analysis")

# Subsequent calls use the active session
csv_profile("input.csv")             # Automatically uses current session
csv_clean("input.csv", "clean.csv")  # Same session

# End the session to finalize
session_end(status="success")
```

Every MCP operation is fully audited:
- Operation timestamp and duration
- Input/output file SHA-256 hashes
- Parameters used
- Actor information (AI agent)
- Session grouping
- Lineage tracking
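A chunked SHA-256 helper along these lines could produce the file hashes recorded in each audit entry. This is a sketch: the toolkit's actual hashing code is not shown here, and `file_sha256` is a hypothetical name.

```python
import hashlib
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large CSVs never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demo against a throwaway file
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as fh:
    fh.write("id,name\n1,Ada\n")
    tmp_path = fh.name

digest = file_sha256(tmp_path)
os.unlink(tmp_path)
print(digest)
```

Hashing both inputs and outputs is what lets the audit trail prove that a given output file really came from a given input.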
```bash
# Set working directory
export DATAOPS_WORK_DIR=/path/to/your/data

# Set audit directory
export DATAOPS_AUDIT_DIR=/path/to/audit

# Enable debug logging
export MCP_LOG_LEVEL=DEBUG
```

Configure multiple MCP servers for different projects:
```json
{
  "mcpServers": {
    "dataops-sales": {
      "command": "python",
      "args": ["/path/to/dataops/start_mcp_auto.py"],
      "env": {
        "PYTHONPATH": "/path/to/dataops",
        "DATAOPS_WORK_DIR": "/data/sales"
      }
    },
    "dataops-marketing": {
      "command": "python",
      "args": ["/path/to/dataops/start_mcp_auto.py"],
      "env": {
        "PYTHONPATH": "/path/to/dataops",
        "DATAOPS_WORK_DIR": "/data/marketing"
      }
    }
  }
}
```

Server won't start:
- Check that file paths are absolute, not relative
- Ensure Python is in PATH
- Verify PYTHONPATH is set correctly
- Restart Claude Desktop after config changes
Tool not found:
- Verify the tool exists in `dataops/tools/`
- Check that the tool has the `@audit_operation` decorator
- Ensure proper Python imports
Session errors:
- Always start a session before operations
- Check that `.audit/sessions/current` exists
- Verify the session hasn't expired
File path errors:
- Use absolute paths or paths relative to the working directory
- Check `DATAOPS_WORK_DIR` if set
- Ensure data files exist and are readable
Enable detailed logging:
```
# In your config.json
"env": {
  "PYTHONPATH": "/path/to/dataops",
  "MCP_LOG_LEVEL": "DEBUG"
}
```

Test tool discovery directly:

```python
# test_mcp.py
from dataops.mcp.auto_discovery import discover_tools

tools = discover_tools()
print(f"Found {len(tools)} tools:")
for name in sorted(tools.keys()):
    print(f"  - {name}")
```

Security Features:
- Read-Only Data Directory: Source data in `data/` is never modified
- Audit Everything: All operations logged with cryptographic hashes
- Session Isolation: Each session has its own audit trail
- Path Validation: Prevents directory traversal attacks
- Parameter Sanitization: All inputs validated
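Traversal protection of the kind listed above is typically done by resolving every candidate path and requiring it to stay under an allowed root. `is_within` is a hypothetical helper sketched for illustration, not the toolkit's API.

```python
from pathlib import Path

def is_within(root: str, candidate: str) -> bool:
    """Return True only if `candidate` resolves to a location inside `root`,
    which defeats `..`-based directory traversal."""
    root_path = Path(root).resolve()
    target = (root_path / candidate).resolve()
    return target == root_path or root_path in target.parents

print(is_within("/data/sales", "q3/orders.csv"))   # True
print(is_within("/data/sales", "../secrets.txt"))  # False
```

Resolving before comparing is the key step: a naive string-prefix check would accept `"/data/sales/../secrets.txt"`.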
```python
# Use sampling for profiling
csv_profile("large.csv", sample_size=10000)

# Split before processing
csv_split("huge.csv", "chunks/", chunk_size=50000)
```

```python
# Process multiple files efficiently
session_start(who="claude", purpose="batch_processing")
for file in files:
    csv_clean(file, f"clean_{file}")
session_end()  # Single audit trail for batch
```

Create a tool in `dataops/tools/`:
```python
# dataops/tools/my_tool.py
from dataops.audit.wrapper import audit_operation

@audit_operation(
    tool_name="my_tool",
    tool_version="0.1.0"
)
def my_tool(input_file: str, output_file: str, **kwargs):
    """Tool description for MCP."""
    # Implementation
    return result, metadata
```

The tool is automatically available via MCP!
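A minimal sketch of what a wrapper like `@audit_operation` might record per call. The real wrapper's internals (file hashing, session linkage) are not shown in this guide, so the structure below is an assumption; `AUDIT_LOG` is an in-memory stand-in for the `.audit/` store.

```python
import functools
import time
import uuid

AUDIT_LOG: list[dict] = []  # stand-in for the real .audit/ store

def audit_operation(tool_name: str, tool_version: str = "0.1.0"):
    """Hypothetical wrapper: appends one audit entry per call."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            AUDIT_LOG.append({
                "operation_id": uuid.uuid4().hex,
                "tool": tool_name,
                "version": tool_version,
                "params": {"args": args, "kwargs": kwargs},
                "duration_s": time.perf_counter() - start,
            })
            return result
        return inner
    return wrap

@audit_operation(tool_name="my_tool")
def my_tool(input_file: str, output_file: str):
    """Tool description for MCP."""
    return {"rows": 0}, {"note": "demo"}

my_tool("in.csv", "out.csv")
print(AUDIT_LOG[0]["tool"])  # my_tool
```

Because the decorator also carries the tool name, version, and signature, the same wrapper can double as the marker that auto-discovery looks for.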
Create composite operations:
```python
# dataops/tools/quality_pipeline.py
@audit_operation
def quality_pipeline(input_file: str, output_dir: str):
    """Run the complete quality assessment pipeline."""
    # Profile
    profile = csv_profile(input_file, f"{output_dir}/profile.json")
    # Score
    score = quality_score(input_file, f"{output_dir}/score.json")
    # Validate
    validate = dq_validate(input_file, "rules.json",
                           f"{output_dir}/violations.csv")
    return {
        "profile": profile,
        "score": score,
        "validation": validate
    }
```

- Always use sessions for grouping related operations
- Specify output files to maintain lineage
- Use descriptive purposes in session starts
- Clean scratch directory periodically
- Monitor audit size - archive old sessions
- Test with small data before processing large files
- Document custom rules for validation tools
- Check KNOWN_ISSUES.md for known problems
- Review LLMs.md for AI-specific guidance
- See USAGE_EXAMPLES.md for detailed examples
- Open issues on GitHub for bugs or features
- Python: 3.11+
- MCP Protocol: Latest
- Claude Desktop: 1.0+
- Pandas: 2.0+
- DuckDB: 0.9+