Extract process mining insights from SAP ECC 6.0 without S/4HANA migration
- Zero risk - Read-only access, no data modification
- Zero cost - Open source, MIT license
- Zero migration - Works with your existing ECC system
- Fast time to value - Hours to first insights, not months
| Feature | Description |
|---|---|
| 🗣️ Natural Language Interface | Ask questions in plain English - "Why are orders delayed?" |
| 📊 OCEL 2.0 Export | Object-centric event logs for PM4Py, Celonis |
| ✅ Conformance Checking | Compare processes against O2C/P2P reference models |
| 🗺️ Visual Process Maps | Mermaid/GraphViz diagrams with bottleneck highlighting |
| 🔮 Predictive Monitoring | ML-based late delivery, credit hold, completion time predictions |
Choose your path:
# Generate synthetic SAP data, run analysis, view results
docker-compose up --build
# Open browser to http://localhost:8080

# Export from SE16: VBAK, VBAP, LIKP, LIPS, VBRK, VBRP, STXH/STXL
# Place files in ./input-data/
docker-compose run pattern-engine --input-dir /app/input-data --output-dir /app/output

# Copy and edit configuration
cp .env.rfc.example .env.rfc
# Edit .env.rfc with your SAP connection details
# Run with RFC adapter
docker-compose --profile rfc up mcp-server-rfc

See Installation Guide for detailed setup instructions.
+-----------------------------------------------------------------------------------+
| Pattern Discovery Report |
+-----------------------------------------------------------------------------------+
| Pattern: "Credit Hold Escalation" |
| ----------------------------------------------------------------------------------|
| Finding: Orders with 'CREDIT HOLD' in notes have 3.2x longer fulfillment cycles |
| |
| Occurrence: 234 orders (4.7% of dataset) |
| Sales Orgs: 1000 (64%), 2000 (36%) |
| Confidence: HIGH (p < 0.001) |
| |
| Caveat: Correlation only - does not imply causation |
+-----------------------------------------------------------------------------------+
Key Features:
- Text Pattern Discovery - Find hidden patterns in order notes, rejection reasons, and delivery instructions
- Document Flow Analysis - Trace complete order-to-cash chains with timing at each step
- Outcome Correlation - Identify text patterns that correlate with delays, partial shipments, or returns
- Evidence-Based Reporting - Every pattern links to specific documents for verification
- Privacy-First Design - PII redaction enabled by default, shareable output mode for external review
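The outcome-correlation idea in the list above can be illustrated with a small sketch. This is not the pattern engine's actual code: the `OrderRecord` shape, the `correlatePattern` helper, and the sample orders are all assumptions made for illustration. It compares the mean cycle time of orders whose notes match a text pattern against the mean of all other orders, and keeps a few document numbers as evidence.

```typescript
// Hypothetical record shape: field names are assumptions for illustration only.
interface OrderRecord {
  orderId: string;
  noteText: string;
  cycleDays: number; // days from order creation to invoice
}

interface PatternFinding {
  matchedCount: number;
  share: number; // fraction of all orders whose notes match the pattern
  cycleRatio: number; // mean cycle time of matches / mean cycle time of the rest
  sampleOrderIds: string[]; // evidence for manual verification
}

function mean(values: number[]): number {
  if (values.length === 0) return NaN;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// The pattern must not use the "g" flag, because RegExp.test is stateful with it.
function correlatePattern(orders: OrderRecord[], pattern: RegExp): PatternFinding {
  const matched = orders.filter((o) => pattern.test(o.noteText));
  const rest = orders.filter((o) => !pattern.test(o.noteText));
  return {
    matchedCount: matched.length,
    share: orders.length === 0 ? 0 : matched.length / orders.length,
    cycleRatio: mean(matched.map((o) => o.cycleDays)) / mean(rest.map((o) => o.cycleDays)),
    sampleOrderIds: matched.slice(0, 3).map((o) => o.orderId),
  };
}

// Made-up sample data, not taken from any real system.
const sampleOrders: OrderRecord[] = [
  { orderId: "0000000001", noteText: "CREDIT HOLD pending review", cycleDays: 12 },
  { orderId: "0000000002", noteText: "Deliver to dock 4", cycleDays: 4 },
  { orderId: "0000000003", noteText: "credit hold released", cycleDays: 8 },
  { orderId: "0000000004", noteText: "", cycleDays: 6 },
];

const finding = correlatePattern(sampleOrders, /credit hold/i);
console.log(finding);
```

A ratio like this only describes correlation. A real report would also need a significance test and a larger sample before assigning a confidence level.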
Ask questions about your SAP processes in plain English:
User: "Why are orders from sales org 1000 taking longer to ship?"
System: Based on analysis of 5,234 orders:
- Average delay: 4.2 days vs 1.8 days for other orgs
- Root cause: 73% have "CREDIT HOLD" in notes
- Recommendation: Review credit check thresholds for org 1000
Confidence: HIGH | Evidence: 847 documents analyzed
Supports multiple LLM providers:
- Ollama (local, private) - Default for air-gapped environments
- OpenAI (GPT-4) - For cloud deployments
- Anthropic (Claude) - Alternative cloud option
Export to the Object-Centric Event Log standard for advanced process mining:
{
"ocel:version": "2.0",
"ocel:objectTypes": ["order", "item", "delivery", "invoice"],
"ocel:events": [...],
"ocel:objects": [...]
}
- Captures multi-object relationships (order → items → deliveries → invoices)
- Compatible with PM4Py, Celonis, and other OCEL tools
- Export formats: JSON, XML, SQLite
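As a rough sketch of how a document flow could be turned into the structure shown above, the following builds an event log in which one event can reference several objects. The key names mirror the JSON excerpt in this README rather than the complete OCEL 2.0 schema, and `FlowStep`, `toOcel`, and the document numbers are hypothetical.

```typescript
interface ObjectRef {
  id: string;
  type: string;
}

// Hypothetical input: one step of a document flow and the objects it touches.
interface FlowStep {
  activity: string;
  timestamp: string;
  objects: ObjectRef[];
}

interface OcelEvent {
  id: string;
  activity: string;
  timestamp: string;
  objectIds: string[];
}

// Key names follow the excerpt above; the full OCEL 2.0 schema has more fields.
interface OcelLog {
  "ocel:version": string;
  "ocel:objectTypes": string[];
  "ocel:events": OcelEvent[];
  "ocel:objects": ObjectRef[];
}

function toOcel(steps: FlowStep[]): OcelLog {
  const objectsById = new Map<string, ObjectRef>();
  const events: OcelEvent[] = steps.map((step, index) => {
    // Register every object once, keyed by id, in order of first appearance.
    for (const obj of step.objects) objectsById.set(obj.id, obj);
    return {
      id: `e${index + 1}`,
      activity: step.activity,
      timestamp: step.timestamp,
      objectIds: step.objects.map((o) => o.id),
    };
  });
  const objects = Array.from(objectsById.values());
  const objectTypes = Array.from(new Set(objects.map((o) => o.type)));
  return {
    "ocel:version": "2.0",
    "ocel:objectTypes": objectTypes,
    "ocel:events": events,
    "ocel:objects": objects,
  };
}

// Made-up document numbers for illustration.
const ocelLog = toOcel([
  {
    activity: "Order Created",
    timestamp: "2024-03-01T08:00:00Z",
    objects: [{ id: "order:4711", type: "order" }, { id: "item:4711-10", type: "item" }],
  },
  {
    activity: "Delivery Created",
    timestamp: "2024-03-03T10:00:00Z",
    objects: [{ id: "delivery:8001", type: "delivery" }, { id: "item:4711-10", type: "item" }],
  },
  {
    activity: "Invoice Created",
    timestamp: "2024-03-06T09:00:00Z",
    objects: [{ id: "invoice:9001", type: "invoice" }, { id: "delivery:8001", type: "delivery" }],
  },
]);
console.log(JSON.stringify(ocelLog, null, 2));
```

Because each event lists several object ids, the item is linked to both the order and the delivery without duplicating events per case, which is the main difference from a flat, single-case event log.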
Compare actual SAP processes against expected Order-to-Cash models:
Conformance Report: 94.2% (4,712 / 5,000 cases)
Deviations Detected:
├── CRITICAL: Invoice before Goods Issue (23 cases)
├── MAJOR: Skipped Delivery step (187 cases)
└── MINOR: Duplicate Order Created (78 cases)
- Pre-built O2C reference models (simple and detailed)
- Severity scoring: Critical / Major / Minor
- Deviation types: skipped steps, wrong order, missing activities
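A minimal sketch of how such a check might work, assuming a strictly sequential four-step reference model. The activity names, the `checkTrace` helper, and the severity assigned to each deviation type are assumptions chosen to resemble the sample report above, not the tool's actual rules.

```typescript
type Severity = "critical" | "major" | "minor";

interface Deviation {
  type: "skipped_step" | "wrong_order" | "duplicate_activity";
  activity: string;
  severity: Severity;
}

// Assumed simple O2C reference model: each activity is expected once, in this order.
const O2C_MODEL: string[] = ["Order Created", "Delivery Created", "Goods Issued", "Invoice Created"];

function checkTrace(trace: string[], model: string[] = O2C_MODEL): Deviation[] {
  const deviations: Deviation[] = [];

  // Skipped step: a model activity that never occurs in the trace.
  for (const activity of model) {
    if (trace.indexOf(activity) === -1) {
      deviations.push({ type: "skipped_step", activity, severity: "major" });
    }
  }

  // Duplicate: an activity that occurs more than once.
  const seen = new Set<string>();
  for (const activity of trace) {
    if (seen.has(activity)) {
      deviations.push({ type: "duplicate_activity", activity, severity: "minor" });
    }
    seen.add(activity);
  }

  // Wrong order: an activity observed after a later model step has already happened.
  let highestPosition = -1;
  for (const activity of trace) {
    const position = model.indexOf(activity);
    if (position === -1) continue; // activities outside the model are ignored here
    if (position < highestPosition) {
      deviations.push({ type: "wrong_order", activity, severity: "critical" });
    } else {
      highestPosition = position;
    }
  }
  return deviations;
}

// Share of traces with no deviations at all.
function conformanceRate(traces: string[][]): number {
  if (traces.length === 0) return 1;
  const conforming = traces.filter((t) => checkTrace(t).length === 0).length;
  return conforming / traces.length;
}

const traces: string[][] = [
  ["Order Created", "Delivery Created", "Goods Issued", "Invoice Created"],
  ["Order Created", "Delivery Created", "Invoice Created", "Goods Issued"],
  ["Order Created", "Goods Issued", "Invoice Created"],
  ["Order Created", "Order Created", "Delivery Created", "Goods Issued", "Invoice Created"],
];
console.log(conformanceRate(traces));
console.log(traces.map((t) => checkTrace(t)));
```

Real conformance checking usually relies on token replay or alignments against a process model. The position-based comparison here is a deliberately simplified stand-in.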
Generate process flow diagrams with bottleneck highlighting:
graph LR
A[Order Created] -->|2.1 days| B[Delivery Created]
B -->|0.5 days| C[Goods Issued]
C -->|3.2 days| D[Invoice Created]
style C fill:#f8d7da
- Output formats: Mermaid (Markdown), GraphViz (DOT), SVG
- Color-coded bottleneck severity (green/yellow/red)
- Timing annotations between process steps
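To show how a diagram like the one above could be produced from timing data, here is a small sketch. The `Transition` type and `toMermaid` function are hypothetical. The sketch highlights only the source node of the single slowest transition, which is an assumption about what "bottleneck" means, and it omits the green/yellow tiers.

```typescript
interface Transition {
  from: string;
  to: string;
  avgDays: number;
}

const BOTTLENECK_FILL = "#f8d7da"; // the red fill used in the diagram above

function toMermaid(transitions: Transition[]): string {
  const nodeIds = new Map<string, string>();

  // Returns "A[Label]" on the first mention of a node and just "A" afterwards.
  const node = (label: string): string => {
    const known = nodeIds.get(label);
    if (known !== undefined) return known;
    const index = nodeIds.size;
    const id = index < 26 ? String.fromCharCode(65 + index) : `N${index}`;
    nodeIds.set(label, id);
    return `${id}[${label}]`;
  };

  const lines: string[] = ["graph LR"];
  let slowest: Transition | undefined;
  for (const t of transitions) {
    const from = node(t.from);
    const to = node(t.to);
    lines.push(`${from} -->|${t.avgDays} days| ${to}`);
    if (slowest === undefined || t.avgDays > slowest.avgDays) slowest = t;
  }

  // Highlight the step that the slowest transition starts from.
  if (slowest !== undefined) {
    lines.push(`style ${nodeIds.get(slowest.from)} fill:${BOTTLENECK_FILL}`);
  }
  return lines.join("\n");
}

const mermaid = toMermaid([
  { from: "Order Created", to: "Delivery Created", avgDays: 2.1 },
  { from: "Delivery Created", to: "Goods Issued", avgDays: 0.5 },
  { from: "Goods Issued", to: "Invoice Created", avgDays: 3.2 },
]);
console.log(mermaid);
```

The returned string can be pasted into any Markdown renderer that supports Mermaid.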
ML-based prediction for process outcomes:
Order 0000012345 - Risk Assessment:
├── Late Delivery: 78% probability (CRITICAL RISK)
│ └── Factors: credit_block, order_value > $50k
├── Credit Hold: 45% probability (MEDIUM RISK)
└── Est. Completion: 8.2 days
Prediction Types:
- Late Delivery - Probability based on case age, progress, stalls, rework
- Credit Hold - Likelihood based on credit check status, complexity
- Completion Time - Estimated hours remaining based on progress/pace
29 Extracted Features:
- Temporal: case age, time since last event, avg time between events
- Activity: milestones reached, rework detection, loop count, backtracks
- Resource: unique resources, handoff count
- Risk indicators: stalled cases, credit holds, rejections, blocks
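A few of the temporal, activity, and resource features listed above can be computed from a case's events as sketched below. The `ProcessEvent` and `CaseFeatures` shapes are assumptions, only six of the 29 features are shown, and "rework" is approximated here as any repeated activity.

```typescript
// Hypothetical event shape for a single case.
interface ProcessEvent {
  activity: string;
  timestamp: string; // ISO 8601
  resource: string;
}

interface CaseFeatures {
  caseAgeHours: number;
  hoursSinceLastEvent: number;
  avgHoursBetweenEvents: number;
  reworkCount: number; // events whose activity already occurred earlier in the case
  uniqueResources: number;
  handoffCount: number; // consecutive events handled by different resources
}

const HOUR_MS = 60 * 60 * 1000;

function extractFeatures(events: ProcessEvent[], now: Date): CaseFeatures {
  if (events.length === 0) throw new Error("case has no events");
  const sorted = [...events].sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
  const times = sorted.map((e) => Date.parse(e.timestamp));
  const first = Math.min(...times);
  const last = Math.max(...times);

  const seenActivities = new Set<string>();
  const resources = new Set<string>();
  let reworkCount = 0;
  let handoffCount = 0;
  let previousResource: string | undefined;
  for (const event of sorted) {
    if (seenActivities.has(event.activity)) reworkCount += 1;
    seenActivities.add(event.activity);
    resources.add(event.resource);
    if (previousResource !== undefined && previousResource !== event.resource) handoffCount += 1;
    previousResource = event.resource;
  }

  return {
    caseAgeHours: (now.getTime() - first) / HOUR_MS,
    hoursSinceLastEvent: (now.getTime() - last) / HOUR_MS,
    avgHoursBetweenEvents: sorted.length > 1 ? (last - first) / HOUR_MS / (sorted.length - 1) : 0,
    reworkCount,
    uniqueResources: resources.size,
    handoffCount,
  };
}

// Made-up case for illustration.
const features = extractFeatures(
  [
    { activity: "Order Created", timestamp: "2024-03-01T08:00:00Z", resource: "alice" },
    { activity: "Delivery Created", timestamp: "2024-03-02T08:00:00Z", resource: "bob" },
    { activity: "Delivery Created", timestamp: "2024-03-03T08:00:00Z", resource: "bob" },
    { activity: "Goods Issued", timestamp: "2024-03-04T08:00:00Z", resource: "carol" },
  ],
  new Date("2024-03-05T08:00:00Z"),
);
console.log(features);
```

Passing `now` explicitly keeps the function deterministic, which makes the features reproducible in tests and when replaying historical cases.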
Risk Scoring:
- 🟢 Low (0-25%) | 🟡 Medium (25-50%) | 🟠 High (50-75%) | 🔴 Critical (75-100%)
- Configurable alert thresholds
- Actionable recommendations based on detected risk factors
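The risk bands above can be expressed as a simple mapping from probability to level. In this sketch the function names and the default alert threshold are assumptions, and a probability that falls exactly on a shared boundary (25%, 50%, 75%) is assigned to the higher band, which the bands as written leave open.

```typescript
type RiskLevel = "low" | "medium" | "high" | "critical";

// Maps a probability in [0, 1] to a risk band.
// Assumption: boundary values belong to the higher band.
function riskLevel(probability: number): RiskLevel {
  if (!(probability >= 0 && probability <= 1)) {
    throw new RangeError(`probability must be between 0 and 1, got ${probability}`);
  }
  if (probability >= 0.75) return "critical";
  if (probability >= 0.5) return "high";
  if (probability >= 0.25) return "medium";
  return "low";
}

// Hypothetical alert rule: alert when the probability reaches a configurable threshold.
function shouldAlert(probability: number, threshold: number = 0.5): boolean {
  return probability >= threshold;
}

console.log(riskLevel(0.45), shouldAlert(0.45));
console.log(riskLevel(0.78), shouldAlert(0.78));
```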
| Consideration | S/4HANA Migration | SAP Workflow Mining |
|---|---|---|
| Timeline | 18-36 months | Hours to first insights |
| Cost | $10M-$100M+ | Free (MIT license) |
| Risk | Business disruption | Zero - read-only access |
| Data Location | Cloud/hosted | On-premise only |
| Prerequisites | Greenfield/brownfield project | Works with existing ECC 6.0 |
| Process Visibility | After migration | Before any changes |
| Use Case | Full transformation | Process discovery & optimization |
This tool does not replace S/4HANA. It helps you understand your current processes before making migration decisions - or find optimization opportunities in your existing ECC system.
- Docker & Docker Compose (recommended)
- OR Node.js 18+ and Python 3.10+ for local development
git clone https://github.com/your-org/sap-workflow-mining.git
cd sap-workflow-mining
docker-compose up --build

See docs/adapter_guide.md for:
- RFC adapter configuration for ECC 6.0
- OData adapter configuration for S/4HANA
- CSV import from SE16 exports
- Air-gapped installation options
Configure the natural language interface in .env:
# Option 1: Local Ollama (default, private)
LLM_PROVIDER=ollama
OLLAMA_HOST=http://localhost:11434
LLM_MODEL=llama3
# Option 2: OpenAI
LLM_PROVIDER=openai
LLM_API_KEY=sk-...
LLM_MODEL=gpt-4
# Option 3: Anthropic
LLM_PROVIDER=anthropic
LLM_API_KEY=sk-ant-...
LLM_MODEL=claude-3-sonnet-20240229

For air-gapped environments, use Ollama with locally downloaded models.
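For readers wiring these variables into their own code, the following sketch shows one way to validate them. It is not the project's actual loader: `loadLlmConfig` and the `LlmConfig` shape are hypothetical, and the rule that cloud providers require both `LLM_API_KEY` and `LLM_MODEL` is an assumption. The Ollama defaults are taken from Option 1 above.

```typescript
type LlmProvider = "ollama" | "openai" | "anthropic";

interface LlmConfig {
  provider: LlmProvider;
  model: string;
  host?: string; // only used for Ollama
  apiKey?: string; // only used for cloud providers
}

type Env = Record<string, string | undefined>;

function loadLlmConfig(env: Env): LlmConfig {
  const provider = env["LLM_PROVIDER"] ?? "ollama";
  const model = env["LLM_MODEL"];

  if (provider === "ollama") {
    // Defaults mirror Option 1 in the configuration above.
    return {
      provider,
      model: model ?? "llama3",
      host: env["OLLAMA_HOST"] ?? "http://localhost:11434",
    };
  }

  if (provider === "openai" || provider === "anthropic") {
    const apiKey = env["LLM_API_KEY"];
    if (!apiKey) throw new Error(`LLM_API_KEY is required for provider "${provider}"`);
    if (!model) throw new Error(`LLM_MODEL is required for provider "${provider}"`);
    return { provider, model, apiKey };
  }

  throw new Error(`Unsupported LLM_PROVIDER: ${provider}`);
}

// In Node.js this would typically be called as loadLlmConfig(process.env).
const llmConfig = loadLlmConfig({ LLM_PROVIDER: "ollama", LLM_MODEL: "llama3" });
console.log(llmConfig);
```

Failing fast on a missing key keeps a misconfigured cloud provider from silently falling back to a different model.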
Interactive demos for all v2.0 process mining tools. No SAP connection required - all demos use synthetic data.
cd mcp-server
# Natural Language Interface - ask questions in plain English
npx tsx ../demos/ask_process_demo.ts
npx tsx ../demos/ask_process_demo.ts --interactive # Interactive mode
# OCEL 2.0 Export - export to process mining standard format
npx tsx ../demos/export_ocel_demo.ts
# Conformance Checking - compare against O2C reference model
npx tsx ../demos/check_conformance_demo.ts
# Visual Process Maps - generate Mermaid flowcharts
npx tsx ../demos/visualize_process_demo.ts
# Predictive Monitoring - ML-based risk predictions
npx tsx ../demos/predict_outcome_demo.ts

| Demo | Description |
|---|---|
| `ask_process_demo.ts` | Natural language queries with LLM integration |
| `export_ocel_demo.ts` | OCEL 2.0 export with object/event breakdown |
| `check_conformance_demo.ts` | Deviation detection and severity scoring |
| `visualize_process_demo.ts` | Mermaid diagrams with bottleneck highlighting |
| `predict_outcome_demo.ts` | Risk predictions and alerts |
| `salt_adapter_demo.ts` | Real SAP O2C data from SALT dataset |
| `visualize_process_bpi_demo.ts` | Process maps with real P2P data (BPI 2019) |
| `predict_outcome_bpi_demo.ts` | Risk predictions with real P2P data (BPI 2019) |
| `ask_process_bpi_demo.ts` | Natural language queries on P2P data |
Use real SAP Purchase-to-Pay data from the BPI Challenge 2019 for testing with authentic business patterns.
# Download and convert BPI 2019 data
python scripts/download_bpi_2019.py
# Run demos with real P2P data
npx tsx demos/visualize_process_bpi_demo.ts 50
npx tsx demos/predict_outcome_bpi_demo.ts 30
npx tsx demos/ask_process_bpi_demo.ts

Dataset Statistics:
| Metric | Value |
|---|---|
| Total cases | 251,734 |
| Total events | ~1.6M |
| Unique activities | 42 |
| Process type | Purchase-to-Pay (P2P) |
| Source | Multinational coatings company |
Activities include: SRM workflows, Purchase Orders, Goods Receipts, Service Entries, Invoice Processing, Vendor interactions
Use real SAP ERP data from SAP's SALT dataset on HuggingFace for testing with authentic business patterns.
# 1. Install Python dependencies
pip install datasets pyarrow
# 2. Download SALT dataset
python scripts/download-salt.py
# 3. Run demo with real data
cd mcp-server
npx tsx ../demos/salt_adapter_demo.ts

SALT (Sales Autocompletion Linked Business Tables) contains:
| Table | Description | Records |
|---|---|---|
| I_SalesDocument | Sales order headers | ~1M+ |
| I_SalesDocumentItem | Order line items | ~5M+ |
| I_Customer | Customer master data | ~100K |
| I_AddrOrgNamePostalAddress | Address data | ~100K |
import { SaltAdapter } from './adapters/salt/index.js';
const adapter = new SaltAdapter({
maxDocuments: 10000, // Limit for memory management
});
await adapter.initialize();
// Get real sales order data
const header = await adapter.getSalesDocHeader({ vbeln: '0000012345' });
const items = await adapter.getSalesDocItems({ vbeln: '0000012345' });
// Get dataset statistics
const stats = adapter.getStats();
console.log(`Loaded ${stats.salesDocuments} sales documents`);

SALT contains sales orders only (no deliveries or invoices). For full Order-to-Cash testing:
- Use SALT for sales order analysis and ML training
- Use synthetic adapter for complete O2C flow testing
- Combine both for comprehensive validation
| Aspect | Synthetic Data | SALT Real Data |
|---|---|---|
| Patterns | Random/artificial | Authentic business patterns |
| ML Training | Limited accuracy | Real-world feature distributions |
| Demos | Good for UI testing | Compelling for stakeholders |
| Validation | Functional testing | Business logic validation |
We've validated the MCP tools against real SAP datasets. View the detailed analysis:
| Dataset | Cases | Events | Key Findings | Report |
|---|---|---|---|---|
| BPI Challenge 2019 | 251,734 | 1.6M | 42 activities, 64-day median throughput | View → |
| SAP IDES O2C | 646 | 5,708 | 158 variants, bottlenecks identified | View → |
| SAP IDES P2P | 2,486 | 7,420 | 7 compliance violations detected | View → |
Process Diagrams: Mermaid flowcharts for O2C and P2P
Test Suite: 427 tests passing, 4 skipped (SALT data required)
This system is designed for enterprise security requirements.
| Concern | How We Address It |
|---|---|
| Data Access | Read-only BAPIs only - no write operations, no arbitrary SQL |
| Data Location | All processing is on-premise - no cloud, no external APIs |
| Network | No outbound connections, no telemetry, no phone-home |
| PII Protection | Automatic redaction of emails, phones, names, addresses |
| Audit Trail | Every query logged with parameters, timestamps, row counts |
| Row Limits | Default 200 rows per query, max 1000 - prevents bulk extraction |
See SECURITY.md for complete security documentation.
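As an illustration of the redaction idea only, the sketch below masks email addresses and phone numbers with regular expressions. The patterns and the `redactPii` name are assumptions, not the project's implementation, and names and addresses are out of scope here because they generally need dictionary- or model-based detection. The phone pattern deliberately requires a leading `+` or separators so that plain SAP document numbers are left intact.

```typescript
// Simplified patterns for illustration; production redaction needs broader coverage.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// International numbers starting with "+", or 3-3-4 digit groups with separators.
const PHONE_PATTERN = /\+\d[\d\s().-]{6,}\d|\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}/g;

function redactPii(text: string): string {
  return text.replace(EMAIL_PATTERN, "[EMAIL]").replace(PHONE_PATTERN, "[PHONE]");
}

const redacted = redactPii(
  "Call John at 555-123-4567 or mail john.doe@example.com about order 0000012345",
);
console.log(redacted);
```

Note that the name "John" survives in this example, which is why regex-only redaction should be treated as a baseline rather than a guarantee.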
The RFC user requires display-only access to SD documents:
Authorization Object: S_RFC
RFC_TYPE = FUGR
RFC_NAME = STXR, 2001, 2051, 2056, 2074, 2077
ACTVT = 16 (Execute)
Authorization Object: V_VBAK_VKO
VKORG = [Your Sales Organizations]
ACTVT = 03 (Display)
Authorization Object: V_VBAK_AAT
AUART = * (or specific document types)
ACTVT = 03 (Display)
Copy-paste ready role template: See docs/SAP_AUTHORIZATION.md
| BAPI | Purpose | Tables Accessed |
|---|---|---|
| `BAPI_SALESORDER_GETLIST` | List sales orders | VBAK |
| `SD_SALESDOCUMENT_READ` | Read order header/items | VBAK, VBAP |
| `BAPI_SALESDOCU_GETRELATIONS` | Document flow (VBFA) | VBFA |
| `BAPI_OUTB_DELIVERY_GET_DETAIL` | Delivery details | LIKP, LIPS |
| `BAPI_BILLINGDOC_GETDETAIL` | Invoice details | VBRK, VBRP |
| `READ_TEXT` | Long text fields | STXH, STXL |
| `BAPI_CUSTOMER_GETDETAIL2` | Customer master (stub) | KNA1 |
| `BAPI_MATERIAL_GET_DETAIL` | Material master (stub) | MARA |
No direct table access. No RFC_READ_TABLE unless explicitly enabled.
+------------------------------------------------------------------+
| Your Network |
| +------------------------------------------------------------+ |
| | | |
| | +----------------+ +-------------------+ | |
| | | SAP ECC 6.0 | | SAP Workflow | | |
| | | | | Mining Server | | |
| | | +----------+ | | | | |
| | | | SD/MM | | RFC | +-------------+ | | |
| | | | Tables |<--------->| MCP Server | | | |
| | | +----------+ | (R/O)| +-------------+ | | |
| | | | | | | | |
| | +----------------+ | v | | |
| | | +-------------+ | | |
| | | | Pattern | | | |
| | | | Engine | | | |
| | | +-------------+ | | |
| | | | | | |
| | | v | | |
| | +----------------+ | +-------------+ | | |
| | | Browser |<------>| Web Viewer | | | |
| | | (localhost) | | +-------------+ | | |
| | +----------------+ +-------------------+ | |
| | | |
| +------------------------------------------------------------+ |
| |
| NO EXTERNAL CONNECTIONS |
+------------------------------------------------------------------+
Data Flow:
- MCP Server connects to SAP via RFC (read-only BAPIs)
- Pattern Engine analyzes text fields and document flows
- Results stored locally with PII redacted
- Web Viewer displays findings on localhost
Nothing leaves your network.
No. This is an independent open-source project. It uses standard SAP BAPIs that are publicly documented.
Minimal impact. All queries are:
- Read-only (no locks)
- Row-limited (200 default, 1000 max)
- Rate-limited (configurable)
- Issued through standard BAPIs (not direct table access)
We recommend running initial analysis during off-peak hours.
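The row limits mentioned above amount to clamping whatever the caller requests. A minimal sketch, assuming a hypothetical `effectiveRowLimit` helper and assuming that missing or invalid requests fall back to the default:

```typescript
const DEFAULT_ROW_LIMIT = 200;
const MAX_ROW_LIMIT = 1000;

// Returns the row limit that would actually be applied to a query.
// Assumption: missing, non-finite, or non-positive requests fall back to the default.
function effectiveRowLimit(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested) || requested < 1) {
    return DEFAULT_ROW_LIMIT;
  }
  return Math.min(Math.floor(requested), MAX_ROW_LIMIT);
}

console.log(effectiveRowLimit());
console.log(effectiveRowLimit(50));
console.log(effectiveRowLimit(5000));
```

Clamping rather than rejecting oversized requests keeps clients working while still bounding the load on SAP. Whether the real server clamps or rejects is not stated here.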
Currently SD (Sales & Distribution) and MM (Materials Management) document flows. FI/CO integration is planned for future releases.
Yes. The tool uses BAPIs, which are database-agnostic, so it works with HANA, Oracle, DB2, SQL Server, and MaxDB.
Yes. The Docker images can be built offline and transferred. No external dependencies at runtime.
Every pattern card includes:
- Sample document numbers for verification in SAP (VA03, VL03N, VF03)
- Statistical confidence intervals
- Explicit caveats about correlation vs. causation
- PII redaction is enabled by default
- No data leaves your network
- Shareable mode applies additional redaction
- See SECURITY.md for compliance considerations
Yes. See CONTRIBUTING.md for guidelines. Feature requests via GitHub Issues.
The MCP server includes a governance layer based on PromptSpeak symbolic frames for pre-execution blocking and human-in-the-loop approval workflows.
When AI agents access SAP data, you need controls to:
- Prevent bulk extraction - Hold requests for large date ranges or row counts
- Protect sensitive data - Require approval for searches containing PII patterns
- Halt rogue agents - Circuit breaker to immediately stop misbehaving agents
- Audit everything - Complete trail of all operations for compliance
Every operation has a symbolic frame indicating mode, domain, action, and entity:
Frame: ⊕◐◀α
│ │ │ └── Entity: α (primary agent)
│ │ └──── Action: ◀ (retrieve)
│ └────── Domain: ◐ (operational)
└──────── Mode: ⊕ (strict)
| Symbol | Category | Meaning |
|---|---|---|
| `⊕` | Mode | Strict - exact compliance required |
| `⊘` | Mode | Neutral - standard operation |
| `⊖` | Mode | Flexible - allow interpretation |
| `⊗` | Mode | Forbidden - blocks all actions |
| `◊` | Domain | Financial (invoices, values) |
| `◐` | Domain | Operational (orders, deliveries) |
| `◀` | Action | Retrieve data |
| `▲` | Action | Analyze/search |
| `●` | Action | Validate |
| `α β γ` | Entity | Primary/secondary/tertiary agent |
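The four-symbol frame layout can be decoded with a simple lookup per position. This sketch assumes a frame is always exactly four symbols in the order mode, domain, action, entity, and that the table above is the complete vocabulary. `parseFrame` and `ParsedFrame` are hypothetical names.

```typescript
const MODES: Record<string, string> = {
  "⊕": "strict",
  "⊘": "neutral",
  "⊖": "flexible",
  "⊗": "forbidden",
};
const DOMAINS: Record<string, string> = { "◊": "financial", "◐": "operational" };
const ACTIONS: Record<string, string> = { "◀": "retrieve", "▲": "analyze", "●": "validate" };
const ENTITIES: Record<string, string> = { "α": "primary", "β": "secondary", "γ": "tertiary" };

interface ParsedFrame {
  mode: string;
  domain: string;
  action: string;
  entity: string;
  blocked: boolean; // true when the mode is "forbidden"
}

function lookup(table: Record<string, string>, symbol: string | undefined, category: string): string {
  const meaning = symbol === undefined ? undefined : table[symbol];
  if (meaning === undefined) throw new Error(`Unknown ${category} symbol: ${symbol}`);
  return meaning;
}

function parseFrame(frame: string): ParsedFrame {
  // Array.from splits by code point, so each symbol is one element.
  const symbols = Array.from(frame);
  if (symbols.length !== 4) throw new Error(`Expected 4 symbols, got ${symbols.length}`);
  const mode = lookup(MODES, symbols[0], "mode");
  return {
    mode,
    domain: lookup(DOMAINS, symbols[1], "domain"),
    action: lookup(ACTIONS, symbols[2], "action"),
    entity: lookup(ENTITIES, symbols[3], "entity"),
    blocked: mode === "forbidden",
  };
}

const parsedFrame = parseFrame("⊕◐◀α");
console.log(parsedFrame);
```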
Operations are automatically held for human approval when:
| Trigger | Threshold | Example |
|---|---|---|
| Broad date range | >90 days | date_from: 2024-01-01, date_to: 2024-12-31 |
| High row limit | >500 rows | limit: 1000 |
| Sensitive patterns | SSN, credit card, password | pattern: "social security" |
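The triggers in the table above could be evaluated along these lines. The thresholds come from the table and `broad_date_range` appears in the example response later in this README. The other reason names, the `ToolRequest` shape, and the exact sensitive-term list are assumptions.

```typescript
// Hypothetical request shape, using the parameter names from the examples.
interface ToolRequest {
  date_from?: string; // YYYY-MM-DD
  date_to?: string; // YYYY-MM-DD
  limit?: number;
  pattern?: string;
}

type HoldReason = "broad_date_range" | "high_row_limit" | "sensitive_pattern";

const MAX_RANGE_DAYS = 90;
const MAX_UNHELD_LIMIT = 500;
const DAY_MS = 24 * 60 * 60 * 1000;
// Illustrative term list; a real deployment would likely maintain a longer one.
const SENSITIVE_TERMS = /\b(ssn|social security|credit card|password)\b/i;

function holdReasons(request: ToolRequest): HoldReason[] {
  const reasons: HoldReason[] = [];

  if (request.date_from !== undefined && request.date_to !== undefined) {
    // Unparseable dates yield NaN, which never exceeds the threshold.
    const days = (Date.parse(request.date_to) - Date.parse(request.date_from)) / DAY_MS;
    if (days > MAX_RANGE_DAYS) reasons.push("broad_date_range");
  }
  if (request.limit !== undefined && request.limit > MAX_UNHELD_LIMIT) {
    reasons.push("high_row_limit");
  }
  if (request.pattern !== undefined && SENSITIVE_TERMS.test(request.pattern)) {
    reasons.push("sensitive_pattern");
  }
  return reasons;
}

console.log(holdReasons({ pattern: "delivery", date_from: "2024-01-01", date_to: "2024-12-31" }));
console.log(holdReasons({ pattern: "delivery", limit: 200 }));
```

An empty result means the request can proceed without human approval.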
Agent Request
│
▼
┌─────────────┐ ┌─────────────┐
│ Circuit │────▶│ BLOCKED │ (if agent halted)
│ Breaker │ └─────────────┘
└─────────────┘
│ OK
▼
┌─────────────┐ ┌─────────────┐
│ Frame │────▶│ BLOCKED │ (if ⊗ forbidden)
│ Validation │ └─────────────┘
└─────────────┘
│ OK
▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Hold │────▶│ HELD │────▶│ Human │
│ Check │ │ (pending) │ │ Approval │
└─────────────┘ └─────────────┘ └─────────────┘
│ OK │
▼ ▼
┌─────────────┐ ┌─────────────┐
│ EXECUTE │◀────────────────────────│ APPROVED │
└─────────────┘ └─────────────┘
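The decision flow in the diagram can be sketched as three checks applied in order, with approval as a separate step. The `Governance` class, its method names, and the in-memory maps are all hypothetical. The real server exposes this behavior through the `ps_*` tools rather than a class like this.

```typescript
type Decision =
  | { outcome: "blocked"; reason: string }
  | { outcome: "held"; holdId: string }
  | { outcome: "execute" };

interface Operation {
  agentId: string;
  frame: string; // e.g. "⊕◐◀α"
  tool: string;
  needsHold: boolean; // result of the hold-trigger check
}

class Governance {
  private haltedAgents = new Map<string, string>(); // agentId -> halt reason
  private pendingHolds = new Map<string, Operation>();
  private nextHoldNumber = 1;

  halt(agentId: string, reason: string): void {
    this.haltedAgents.set(agentId, reason);
  }

  resume(agentId: string): void {
    this.haltedAgents.delete(agentId);
  }

  evaluate(op: Operation): Decision {
    // 1. Circuit breaker: halted agents are blocked outright.
    const haltReason = this.haltedAgents.get(op.agentId);
    if (haltReason !== undefined) {
      return { outcome: "blocked", reason: `Agent halted: ${haltReason}` };
    }
    // 2. Frame validation: the forbidden mode blocks all actions.
    if (op.frame.startsWith("⊗")) {
      return { outcome: "blocked", reason: "Forbidden frame" };
    }
    // 3. Hold check: park the operation until a human decides.
    if (op.needsHold) {
      const holdId = `hold_${this.nextHoldNumber++}`;
      this.pendingHolds.set(holdId, op);
      return { outcome: "held", holdId };
    }
    return { outcome: "execute" };
  }

  approve(holdId: string): Decision {
    if (!this.pendingHolds.delete(holdId)) {
      return { outcome: "blocked", reason: `Unknown hold: ${holdId}` };
    }
    return { outcome: "execute" };
  }
}

const governance = new Governance();
const operation: Operation = {
  agentId: "agent-123",
  frame: "⊕◐◀α",
  tool: "search_doc_text",
  needsHold: true,
};
const firstDecision = governance.evaluate(operation);
const afterApproval =
  firstDecision.outcome === "held" ? governance.approve(firstDecision.holdId) : firstDecision;
governance.halt("agent-123", "Excessive query rate detected");
const afterHalt = governance.evaluate({ ...operation, needsHold: false });
console.log(firstDecision.outcome, afterApproval.outcome, afterHalt.outcome);
// prints "held execute blocked"
```

Checking the circuit breaker first means a halted agent cannot even create new holds, which keeps the approval queue free of noise from a misbehaving agent.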
| Tool | Purpose |
|---|---|
| `ps_precheck` | Dry-run: check if operation would be allowed |
| `ps_list_holds` | List pending holds awaiting approval |
| `ps_approve_hold` | Approve a held operation |
| `ps_reject_hold` | Reject a held operation with reason |
| `ps_agent_status` | Check circuit breaker state for an agent |
| `ps_halt_agent` | Immediately halt an agent (blocks all ops) |
| `ps_resume_agent` | Resume a halted agent |
| `ps_stats` | Get governance statistics |
| `ps_frame_docs` | Get PromptSpeak frame reference |
// 1. Agent makes a request that triggers hold
const result = await mcp.callTool('search_doc_text', {
pattern: 'delivery',
date_from: '2024-01-01',
date_to: '2024-12-31', // >90 days triggers hold
});
// Returns: { held: true, hold_id: 'hold_abc123', reason: 'broad_date_range' }
// 2. Supervisor reviews pending holds
const holds = await mcp.callTool('ps_list_holds', {});
// Returns: [{ holdId: 'hold_abc123', tool: 'search_doc_text', severity: 'medium' }]
// 3. Supervisor approves
const approved = await mcp.callTool('ps_approve_hold', {
hold_id: 'hold_abc123',
approved_by: 'supervisor@example.com'
});
// Returns: { allowed: true, auditId: 'audit_xyz789' }

// Immediately block a misbehaving agent
await mcp.callTool('ps_halt_agent', {
agent_id: 'agent-123',
reason: 'Excessive query rate detected'
});
// All subsequent requests from this agent are blocked
const result = await mcp.callTool('get_doc_text', {
doc_type: 'order',
doc_key: '0000000001',
_agent_id: 'agent-123' // Identifies the agent
});
// Returns: { error: 'Governance Blocked', message: 'Agent halted: Excessive query rate' }
// Resume when issue is resolved
await mcp.callTool('ps_resume_agent', { agent_id: 'agent-123' });

| Tool | Purpose | Returns |
|---|---|---|
| `search_doc_text` | Find documents by text pattern | doc_type, doc_key, snippet, match_score |
| `get_doc_text` | Get all text fields for a document | header_texts[], item_texts[] |
| `get_doc_flow` | Get order-delivery-invoice chain | chain with keys, statuses, dates |
| `get_sales_doc_header` | Order header details | sales_org, customer, dates, values |
| `get_sales_doc_items` | Order line items | materials, quantities, values |
| `get_delivery_timing` | Requested vs actual delivery | timestamps, variance analysis |
| `get_invoice_timing` | Invoice creation/posting | invoice dates, accounting refs |
| `get_master_stub` | Safe master data attributes | hashed IDs, categories (no PII) |
| Tool | Purpose | Returns |
|---|---|---|
| `ask_process` | Natural language queries | answer, confidence, evidence, recommendations |
| `export_ocel` | Export to OCEL 2.0 format | OCEL JSON/XML with objects and events |
| `check_conformance` | Compare against O2C model | conformance_rate, deviations, severity_summary |
| `visualize_process` | Generate process diagrams | Mermaid/DOT/SVG with bottleneck highlighting |
| `predict_outcome` | ML-based outcome prediction | predictions, alerts, risk_levels, factors |
| Tool | Purpose | Returns |
|---|---|---|
| `ps_precheck` | Check if operation would be allowed | wouldAllow, wouldHold, reason |
| `ps_list_holds` | List pending holds | Array of hold requests |
| `ps_approve_hold` | Approve a held operation | Execution result with auditId |
| `ps_reject_hold` | Reject a held operation | Success boolean |
| `ps_agent_status` | Get agent circuit breaker state | isAllowed, state, haltReason |
| `ps_halt_agent` | Halt an agent immediately | halted, agent_id |
| `ps_resume_agent` | Resume a halted agent | resumed, agent_id |
| `ps_stats` | Get governance statistics | holds, haltedAgents, auditEntries |
| `ps_frame_docs` | Get PromptSpeak documentation | Frame format reference |
MIT License - See LICENSE
This is enterprise-friendly open source:
- Use commercially without restriction
- Modify and distribute freely
- No copyleft obligations
- No warranty (provided as-is)
- Documentation: docs/
- Issues: GitHub Issues
- Security: See SECURITY.md for vulnerability reporting
This tool is provided as-is for process analysis purposes. It does not modify SAP data. Users are responsible for:
- Ensuring compliance with organizational data access policies
- Validating findings before making business decisions
- Proper configuration of SAP authorizations
Correlation does not imply causation. All pattern findings should be verified against actual business processes.