Nexus is a high-performance property graph database designed for read-heavy workloads with native vector search integration. It implements a Neo4j-inspired storage architecture with a Cypher-subset query language, optimized for KNN-seeded graph traversal.
- Property Graph Model: Nodes with labels, relationships with types, both carrying properties
- Read Optimized: Record stores with linked lists for O(1) per-hop traversal without index lookups
- Vector Native: First-class KNN support via HNSW indexes per label
- Simple Transactions: MVCC via epochs, single-writer for predictability
- Append-Only: Immutable record architecture with periodic compaction
┌─────────────────────────────────────────────────────────────────┐
│ REST/HTTP API │
│ (Cypher, KNN Traverse, Bulk Ingest) │
└────────────────────────┬────────────────────────────────────────┘
│
┌────────────────────────┴────────────────────────────────────────┐
│ Cypher Executor │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Parser │→ │ Planner │→ │Operators │→ │ Results │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ Pattern Match • Expand • Filter • Project │
└────────────────────────┬────────────────────────────────────────┘
│
┌────────────────────────┴────────────────────────────────────────┐
│ Transaction Layer │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ MVCC (Epochs) │ │ Locking (Queues) │ │
│ └──────────────────┘ └──────────────────┘ │
└────────────────────────┬────────────────────────────────────────┘
│
┌────────────────────────┴────────────────────────────────────────┐
│ Index Layer │
│ ┌─────────┐ ┌─────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Label │ │ B-tree │ │Tantivy │ │ HNSW │ │
│ │ Bitmap │ │(Props) │ │(FullText)│ │ (KNN) │ │
│ └─────────┘ └─────────┘ └──────────┘ └──────────┘ │
└────────────────────────┬────────────────────────────────────────┘
│
┌────────────────────────┴────────────────────────────────────────┐
│ Storage Layer │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Catalog │ │ WAL │ │ Record │ │ Page │ │
│ │ (LMDB) │ │ (AppendJ)│ │ Stores │ │ Cache │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ Labels/Keys Durability Nodes/Rels Memory Mgmt │
└─────────────────────────────────────────────────────────────────┘
Fixed-size record architecture for predictable performance:
NodeRecord (32 bytes):
┌──────────────┬──────────────┬──────────────┬──────────────┐
│ label_bits │ first_rel_ptr│ prop_ptr │ flags │
│ (8 bytes) │ (8 bytes) │ (8 bytes) │ (8 bytes) │
└──────────────┴──────────────┴──────────────┴──────────────┘
- label_bits: Bitmap of label IDs (supports 64 labels per node)
- first_rel_ptr: Head of doubly-linked relationship list
- prop_ptr: Pointer to property chain
- flags: Deleted, locked, version bits
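As a sketch, the fixed layout above can be encoded and decoded directly; field names follow the diagram, while the little-endian byte order is an assumption:

```rust
use std::convert::TryInto;

/// Illustrative in-memory form of the 32-byte NodeRecord.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NodeRecord {
    label_bits: u64,    // bitmap of up to 64 label IDs
    first_rel_ptr: u64, // head of the relationship linked list (0 = none)
    prop_ptr: u64,      // head of the property chain (0 = none)
    flags: u64,         // deleted / locked / version bits
}

impl NodeRecord {
    const SIZE: usize = 32;

    fn encode(&self) -> [u8; Self::SIZE] {
        let mut buf = [0u8; Self::SIZE];
        buf[0..8].copy_from_slice(&self.label_bits.to_le_bytes());
        buf[8..16].copy_from_slice(&self.first_rel_ptr.to_le_bytes());
        buf[16..24].copy_from_slice(&self.prop_ptr.to_le_bytes());
        buf[24..32].copy_from_slice(&self.flags.to_le_bytes());
        buf
    }

    fn decode(buf: &[u8; Self::SIZE]) -> Self {
        let u64_at = |i: usize| u64::from_le_bytes(buf[i..i + 8].try_into().unwrap());
        NodeRecord {
            label_bits: u64_at(0),
            first_rel_ptr: u64_at(8),
            prop_ptr: u64_at(16),
            flags: u64_at(24),
        }
    }

    /// Label membership is a single bit test on the bitmap.
    fn has_label(&self, label_id: u8) -> bool {
        self.label_bits & (1u64 << label_id) != 0
    }
}
```

Fixed-size records mean a node's byte offset is a pure function of its internal id, which is what makes the O(1) lookup claim below hold.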
RelationshipRecord (48 bytes):
┌─────────┬─────────┬─────────┬────────────┬────────────┬──────────┬────────┐
│ src_id │ dst_id │ type_id │next_src_ptr│next_dst_ptr│ prop_ptr │ flags │
│(8 bytes)│(8 bytes)│(4 bytes)│ (8 bytes) │ (8 bytes) │(8 bytes) │(4 bytes)│
└─────────┴─────────┴─────────┴────────────┴────────────┴──────────┴────────┘
- Doubly-linked lists: next_src_ptr (outgoing from src), next_dst_ptr (incoming to dst)
- Enables O(1) per-hop traversal without index lookups
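A minimal sketch of that traversal, using an in-memory Vec as a stand-in for relationships.store; 0 serves as the NIL sentinel, so record slot 0 is assumed reserved:

```rust
/// Illustrative subset of the RelationshipRecord fields used for
/// outgoing-edge traversal.
#[derive(Clone, Copy)]
struct RelRecord {
    src_id: u64,
    dst_id: u64,
    type_id: u32,
    next_src_ptr: u64, // next outgoing rel of src (record index; 0 = end)
}

const NIL: u64 = 0;

/// Follow the next_src_ptr chain from a node's first_rel_ptr:
/// O(degree) total, O(1) per hop, no index lookups.
fn outgoing(store: &[RelRecord], first_rel_ptr: u64) -> Vec<u64> {
    let mut out = Vec::new();
    let mut cur = first_rel_ptr;
    while cur != NIL {
        let rel = store[cur as usize];
        out.push(rel.dst_id);
        cur = rel.next_src_ptr;
    }
    out
}
```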
PropertyRecord (Variable):
┌──────────┬──────────┬──────────┬──────────┐
│ key_id │ type │ value │ next_ptr │
│(4 bytes) │(1 byte) │ (varies) │(8 bytes) │
└──────────┴──────────┴──────────┴──────────┘
- Chain of properties per entity
- Small values inline, large values in strings.store
- Types: null, bool, int64, float64, string_ref, bytes_ref
String/Blob Dictionary:
┌──────────────┬──────────────┬──────────────┬──────────┐
│ varint_len │ data │ crc32 │ padding │
└──────────────┴──────────────┴──────────────┴──────────┘
- Deduplicated string/blob storage
- CRC32 for corruption detection
- Reference counted for garbage collection
Bidirectional mappings stored in embedded key-value store:
Tables:
- label_name → label_id
- label_id → label_name
- type_name → type_id
- type_id → type_name
- key_name → key_id
- key_id → key_name
Metadata:
- Statistics (node count per label, relationship count per type)
- Schema constraints (UNIQUE, NOT NULL)
- Index definitions
Nexus maintains two parallel id namespaces for each node:
Internal ID (u64): Physical record offset in the node store, immutable and used for all graph traversal. Unchanged from the original architecture.
External ID (optional ExternalId enum): Caller-supplied, unique per database. Supports four variants:
- Hash: Blake3, SHA-256, or SHA-512 digest (32 or 64 bytes)
- Uuid: 16-byte RFC 4122 form
- String: UTF-8 up to 256 bytes
- Bytes: Opaque binary up to 64 bytes
Storage: Two LMDB sub-databases in the catalog:
- external_ids: maps encoded ExternalId → u64 internal id (forward index)
- internal_ids: maps u64 → encoded ExternalId (reverse index)
Semantics: External IDs are optional per node; nodes without an external id behave identically to pre-phase-9 behavior. Both indexes are updated atomically during node create/delete and rebuilt on WAL replay. Conflict resolution (ERROR, MATCH, REPLACE) is enforced at the storage layer and surfaces through Cypher semantics.
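A hedged sketch of the ExternalId variants and a tag-prefixed byte encoding suitable as an LMDB key; the exact wire format used by the catalog is an assumption:

```rust
/// Illustrative ExternalId enum mirroring the four variants above.
#[derive(Debug, PartialEq)]
enum ExternalId {
    Hash(Vec<u8>),   // Blake3 / SHA-256 / SHA-512 digest (32 or 64 bytes)
    Uuid([u8; 16]),  // RFC 4122 form
    String(String),  // UTF-8, up to 256 bytes
    Bytes(Vec<u8>),  // opaque binary, up to 64 bytes
}

impl ExternalId {
    /// One leading tag byte per variant so keys from different variants
    /// can never collide in the external_ids sub-database.
    fn encode(&self) -> Vec<u8> {
        match self {
            ExternalId::Hash(h) => [&[0u8][..], h.as_slice()].concat(),
            ExternalId::Uuid(u) => [&[1u8][..], &u[..]].concat(),
            ExternalId::String(s) => [&[2u8][..], s.as_bytes()].concat(),
            ExternalId::Bytes(b) => [&[3u8][..], b.as_slice()].concat(),
        }
    }
}
```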
4-8KB pages with eviction policies:
Page Structure:
┌──────────┬──────────┬──────────┬──────────┐
│ page_id │ checksum │ data │ flags │
│(8 bytes) │(4 bytes) │(varies) │(4 bytes) │
└──────────┴──────────┴──────────┴──────────┘
Eviction Policies:
- Clock: Simple second-chance algorithm
- 2Q: Hot/cold queue split
- TinyLFU: Frequency + recency estimation
Pin/Unpin Semantics:
- Pinned pages cannot be evicted (active transactions)
- Dirty pages flushed on checkpoint
- xxHash3 checksums for validation
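The Clock policy can be sketched as a second-chance scan over per-frame reference bits, with the pinned check mirroring the pin/unpin semantics above; field names here are illustrative:

```rust
/// Minimal Clock (second-chance) eviction sketch: the hand clears set
/// reference bits and evicts the first unpinned frame whose bit is
/// already clear.
struct ClockCache {
    ref_bits: Vec<bool>,
    pinned: Vec<bool>,
    hand: usize,
}

impl ClockCache {
    fn new(frames: usize) -> Self {
        ClockCache { ref_bits: vec![false; frames], pinned: vec![false; frames], hand: 0 }
    }

    /// Called on page access: grants the frame a second chance.
    fn touch(&mut self, frame: usize) {
        self.ref_bits[frame] = true;
    }

    /// Pick a victim frame; None if every frame is pinned.
    fn evict(&mut self) -> Option<usize> {
        // Two full sweeps are enough: the first clears bits, the second
        // must find a clear, unpinned frame if one exists.
        for _ in 0..2 * self.ref_bits.len() {
            let f = self.hand;
            self.hand = (self.hand + 1) % self.ref_bits.len();
            if self.pinned[f] {
                continue;
            }
            if self.ref_bits[f] {
                self.ref_bits[f] = false; // second chance
            } else {
                return Some(f);
            }
        }
        None
    }
}
```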
Append-only transaction log:
WAL Entry Format:
┌──────────┬──────────┬──────────┬──────────┬──────────┐
│ epoch │ tx_id │ type │ payload │ crc32 │
│(8 bytes) │(8 bytes) │(1 byte) │(varies) │(4 bytes) │
└──────────┴──────────┴──────────┴──────────┴──────────┘
Entry Types:
- BEGIN_TX, COMMIT_TX, ABORT_TX
- CREATE_NODE, DELETE_NODE
- CREATE_REL, DELETE_REL
- SET_PROPERTY, DELETE_PROPERTY
- CHECKPOINT
Checkpointing:
1. Freeze current epoch
2. Flush all dirty pages
3. Write CHECKPOINT entry
4. Truncate old WAL segments
5. Run compactor on record stores
Epoch-Based Snapshots:
- Global epoch counter (atomic u64)
- Readers pin an epoch (snapshot isolation)
- Writers increment epoch on commit
- Garbage collection removes old versions
Visibility Rules:
- Record visible if: created_epoch ≤ tx_epoch < deleted_epoch
- Append-only: new versions append, old kept until GC
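The visibility rule translates directly into a predicate, plus a snapshot read over a version chain (newest first); the Version struct is illustrative, with u64::MAX standing for "not deleted":

```rust
/// Record visible iff created_epoch <= tx_epoch < deleted_epoch.
fn visible(created_epoch: u64, deleted_epoch: u64, tx_epoch: u64) -> bool {
    created_epoch <= tx_epoch && tx_epoch < deleted_epoch
}

/// Illustrative version-chain entry (append-only: old versions are kept
/// until GC).
struct Version {
    created: u64,
    deleted: u64, // u64::MAX = still live
    value: i64,
}

/// Snapshot read: the first (newest-first) version visible at tx_epoch.
fn read_at(chain: &[Version], tx_epoch: u64) -> Option<i64> {
    chain
        .iter()
        .find(|v| visible(v.created, v.deleted, tx_epoch))
        .map(|v| v.value)
}
```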
Single-Writer per Partition (MVP):
- Hash(node_id) → partition_id
- Queue per partition (FIFO)
- Group commit: batch small writes
Future (V2):
- Intent locks (read/write)
- Deadlock detection via wait-for graph
RoaringBitmap per Label:
label_id → RoaringBitmap<node_id>
- Compressed bitmap (runs, arrays, bitmaps)
- Fast AND/OR/NOT for label filtering
- Cardinality estimates for planner
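A sketch of the label AND/OR operations, with BTreeSet standing in for RoaringBitmap (which provides the same set operations and cardinality over compressed containers):

```rust
use std::collections::BTreeSet;

/// MATCH (n:A:B) — intersect the two label sets.
fn label_and(a: &BTreeSet<u64>, b: &BTreeSet<u64>) -> BTreeSet<u64> {
    a.intersection(b).copied().collect()
}

/// MATCH (n:A|B)-style union.
fn label_or(a: &BTreeSet<u64>, b: &BTreeSet<u64>) -> BTreeSet<u64> {
    a.union(b).copied().collect()
}
```

The planner's cardinality estimate for a label conjunction is just the size of the intersection, which roaring bitmaps can compute without materializing the result.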
Composite Key: (label_id, key_id, value) → [node_ids]
- Range queries: WHERE n.age > 18
- Equality: WHERE n.email = "user@example.com"
- Statistics: NDV (number distinct values), histograms
Tantivy Index per (label, key):
- Inverted index with positions
- BM25 scoring
- Fuzzy search, phrase queries
- Procedure: CALL text.search('Person', 'bio', 'engineer')
HNSW Index per Label:
File: hnsw_<label_id>.bin
Mapping: node_id → embedding_idx
- HNSW (Hierarchical Navigable Small World)
- Cosine similarity or Euclidean distance
- Configurable ef_construction, M parameters
- Procedure: CALL vector.knn('Person', $embedding, 10)
Supported syntax (a ~20% subset of Cypher covering ~80% of use cases):
-- Pattern matching
MATCH (n:Person)-[r:KNOWS]->(m:Person)
WHERE n.age > 25 AND m.city = 'NYC'
RETURN n.name, m.name, r.since
ORDER BY r.since DESC
LIMIT 100
-- KNN-seeded traversal
CALL vector.knn('Person', $embedding, 10) YIELD node AS n
MATCH (n)-[:WORKS_AT]->(c:Company)
RETURN n.name, c.name
-- Aggregations
MATCH (p:Person)-[:LIKES]->(product:Product)
RETURN product.category, COUNT(*) AS likes
ORDER BY likes DESC
Operator Pipeline:
1. NodeByLabel(label_id)
→ Scan label bitmap
→ Output: stream of node_ids
2. Filter(predicate)
→ Apply property filters
→ Vectorized evaluation (SIMD where possible)
3. Expand(type_id, direction)
→ Follow linked lists (next_src_ptr/next_dst_ptr)
→ Direction: OUT, IN, BOTH
4. Project(expressions)
→ Evaluate return expressions
→ Property access, functions
5. OrderBy(keys) + Limit(n)
→ Top-K heap for efficiency
→ Partial sort when LIMIT present
6. Aggregate(group_keys, agg_funcs)
→ Hash aggregation
→ COUNT, SUM, AVG, MIN, MAX, COLLECT
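The OrderBy+Limit top-K strategy can be sketched with a bounded min-heap that keeps only the k largest keys, giving O(n log k) instead of a full O(n log n) sort; descending order (ORDER BY ... DESC LIMIT k) is assumed:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Top-K for ORDER BY <key> DESC LIMIT k: a min-heap of size k holds the
/// k largest keys seen so far; anything smaller than the heap minimum
/// is discarded immediately.
fn top_k_desc(rows: impl Iterator<Item = i64>, k: usize) -> Vec<i64> {
    let mut heap: BinaryHeap<Reverse<i64>> = BinaryHeap::with_capacity(k + 1);
    for row in rows {
        heap.push(Reverse(row));
        if heap.len() > k {
            heap.pop(); // drop the current minimum
        }
    }
    // Only the final k survivors need sorting.
    let mut out: Vec<i64> = heap.into_iter().map(|Reverse(v)| v).collect();
    out.sort_unstable_by(|a, b| b.cmp(a));
    out
}
```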
Heuristic Cost-Based:
Statistics Used:
- |V_label|: Node count per label
- |E_type|: Relationship count per type
- avg_degree(label, type): Average out-degree
- NDV(label, key): Number distinct values
Optimization Rules:
1. Push filters down (early elimination)
2. Reorder patterns by selectivity (smallest first)
3. Index selection (label vs property vs KNN)
4. Limit pushdown for top-K queries
Example:
MATCH (a:Rare)-[:TYPE1]->(b:Common)-[:TYPE2]->(c)
→ Start with Rare (smaller cardinality)
→ Expand to Common, filter early
Primary protocol, following the Vectorizer implementation:
Default Transport: StreamableHTTP
- Chunked transfer encoding for large result sets
- Server-Sent Events (SSE) for streaming
- HTTP/2 multiplexing and flow control
- Efficient for both small and large responses
Endpoints:
POST /cypher
{
"query": "MATCH (n:Person) RETURN n LIMIT 10",
"params": {"name": "Alice"}
}
POST /knn_traverse
{
"label": "Person",
"vector": [0.1, 0.2, ...],
"k": 10,
"expand": ["(n)-[:KNOWS]->(m)"],
"where": "m.age > 25",
"limit": 100
}
POST /ingest
{
"nodes": [
{"labels": ["Person"], "properties": {"name": "Alice", "age": 30}}
],
"relationships": [
{"src": 1, "dst": 2, "type": "KNOWS", "properties": {"since": 2020}}
]
}
Streaming Response (SSE):
event: row
data: {"node": {...}, "score": 0.95}
event: row
data: {"node": {...}, "score": 0.92}
event: complete
data: {"total": 100, "execution_time_ms": 15}
Model Context Protocol for AI integrations (Vectorizer-style):
MCP Tools (19+ focused tools):
- nexus/query - Execute Cypher queries
- nexus/knn_search - KNN vector search
- nexus/pattern_match - Graph pattern matching
- nexus/ingest_node - Create single node
- nexus/ingest_relationship - Create relationship
- nexus/get_schema - Get graph schema
- nexus/get_stats - Database statistics
- ... (expandable)
Benefits:
- Reduced entropy (no enum parameters)
- Tool-specific parameters only
- Better model tool calling accuracy
Universal Model Interoperability Protocol following Vectorizer v0.2.1:
UMICP Features:
- Native JSON types support
- Tool Discovery endpoint: GET /umicp/discover
- Exposes all MCP tools with full schemas
- Cross-service graph queries
- Event-driven graph updates
Discovery Response:
{
"service": "nexus-graph",
"version": "0.1.0",
"protocol": "UMICP/0.2.1",
"tools": [
{
"name": "graph.query",
"description": "Execute Cypher query",
"inputSchema": {...}
}
]
}
Direct integration with Vectorizer for hybrid search:
Integration Modes:
1. Embedding Generation:
POST /vectorizer/embed → vector
Store vector in Nexus KNN index
2. Hybrid Search:
Nexus KNN search → node_ids
Vectorizer semantic search → enhanced relevance
Combined ranking via Reciprocal Rank Fusion (RRF)
3. Bidirectional Sync:
Vectorizer change → Nexus graph update (relationship creation)
Nexus mutation → Vectorizer re-index (embedding update)
Example Flow:
┌────────────┐ embed() ┌────────────┐
│ Vectorizer │◄─────────────────│ Nexus │
│ │────────────────► │ │
└────────────┘ vector result └────────────┘
│ │
│ semantic search │ graph traversal
▼ ▼
Relevance scores Relationship context
│ │
└────────────┬───────────────────┘
│
Combined Results
(RRF ranking algorithm)
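The RRF combination step can be sketched as follows; k = 60 is the conventional damping constant, and the input rankings stand in for the KNN and semantic result lists:

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(id) = Σ over rankings of 1 / (k + rank),
/// with rank starting at 1. Ids appearing high in several rankings win.
fn rrf(rankings: &[Vec<u64>], k: f64) -> Vec<u64> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, id) in ranking.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut ids: Vec<(u64, f64)> = scores.into_iter().collect();
    ids.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ids.into_iter().map(|(id, _)| id).collect()
}
```

Because RRF only consumes ranks, it needs no score normalization between the graph and vector sides, which is why it suits this hybrid pipeline.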
- Node lookup by ID: O(1) - direct offset
- Expand neighbors: O(degree) - linked list traversal
- Pattern match: O(|V_start| × selectivity × expansions)
- KNN search: O(log N) - HNSW logarithmic
- Full scan: O(|V|) - bitmap scan
Target Throughput (Single Node):
- Point reads: 100K+ ops/sec
- KNN queries: 10K+ ops/sec
- Pattern traversal: 1K-10K ops/sec (depth dependent)
- Insert node: O(1) - append to nodes.store
- Insert relationship: O(1) - append + update pointers
- Update property: O(props_per_entity) - traverse chain
- Bulk ingest: Batch via WAL, bypass cache
Target Throughput (Single Writer):
- Inserts: 10K-50K ops/sec
- Updates: 5K-20K ops/sec
- Bulk load: 100K+ nodes/sec (direct store generation)
Per Node: ~32 bytes (record) + properties + index entries
Per Relationship: ~48 bytes + properties
Property: ~16 bytes + value size
Index Overhead:
- Label bitmap: ~0.1-1 byte per node (compressed)
- HNSW vector: ~100-200 bytes per vector (M=16)
Example (1M nodes, 2M relationships, avg 5 props):
- Nodes: 32MB
- Relationships: 96MB
- Properties: ~160MB (assuming 20 bytes avg per prop)
- Indexes: ~50MB (bitmaps + HNSW)
Total: ~340MB (reasonable)
Vectorizer-style authentication with flexible configuration:
Configuration:
- Default: Authentication DISABLED (localhost development)
- Production: REQUIRED when binding to 0.0.0.0 (public interface)
- API Keys: 32-character random strings
- Storage: Hashed with Argon2 in catalog (LMDB)
AuthConfig:
{
"enabled": false, // Default: disabled for localhost
"require_for_public_bind": true, // Force enable for 0.0.0.0
"api_key_length": 32,
"rate_limit_per_minute": 1000,
"rate_limit_per_hour": 10000,
"jwt_secret": "change-in-production",
"jwt_expiration": 3600
}
API Key Format:
{
"id": "key_abc123",
"name": "Production App",
"key_hash": "argon2id$...",
"user_id": "admin",
"permissions": ["read", "write", "admin"],
"created_at": 1704067200,
"expires_at": null, // Never expires
"active": true
}
Usage:
Authorization: Bearer nexus_sk_abc123...xyz
Rate Limiting:
- 1000 requests/minute per API key
- 10000 requests/hour per API key
- 429 Too Many Requests on exceed
- X-RateLimit-* headers in response
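A minimal per-key fixed-window sketch of the limiter behind those 429 responses; the actual data structures are not specified, and now_secs would come from the system clock in the server:

```rust
use std::collections::HashMap;

/// Per-API-key fixed-window counter: the count resets whenever the
/// minute window rolls over.
struct RateLimiter {
    limit_per_minute: u32,
    windows: HashMap<String, (u64, u32)>, // key -> (window minute, count)
}

impl RateLimiter {
    fn new(limit_per_minute: u32) -> Self {
        RateLimiter { limit_per_minute, windows: HashMap::new() }
    }

    /// true = allow the request; false = respond 429 Too Many Requests.
    fn check(&mut self, api_key: &str, now_secs: u64) -> bool {
        let minute = now_secs / 60;
        let entry = self.windows.entry(api_key.to_string()).or_insert((minute, 0));
        if entry.0 != minute {
            *entry = (minute, 0); // window rolled over: reset the counter
        }
        if entry.1 < self.limit_per_minute {
            entry.1 += 1;
            true
        } else {
            false
        }
    }
}
```

The hourly limit would be a second window of the same shape keyed on now_secs / 3600.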
Permissions:
- READ: Query execution (MATCH, CALL vector.knn)
- WRITE: Data mutations (CREATE, SET, DELETE)
- ADMIN: Index management, constraints, schema changes
- SUPER: Replication, cluster management
Role-Based Access Control (RBAC):
- User → Roles → Permissions
- API Key → Permissions (direct)
- JWT tokens for session management
- Audit logging for all write operations
Transport Security:
- TLS 1.3 for production (via Axum/Tower)
- mTLS for service-to-service (V2)
- Certificate-based authentication (optional)
Inspired by Redis/Vectorizer replication with graph-specific optimizations:
Topology:
┌────────────────┐ ┌────────────────┐
│ Master Node │────────>│ Replica Node 1 │
│ (Read+Write) │ │ (Read-Only) │
└────────┬───────┘ └────────────────┘
│
└───────────────> ┌────────────────┐
│ Replica Node 2 │
│ (Read-Only) │
└────────────────┘
Replication Flow:
1. Master receives write
2. Append to WAL
3. Apply to local storage
4. Stream WAL entry to replicas
5. Replicas apply + ACK
6. Master commits transaction
1. Full Sync (Initial):
- Master creates snapshot
- Transfer snapshot.tar.zst to replica
- CRC32 checksum verification
- Replica loads snapshot
- Switch to incremental sync
2. Incremental Sync:
- Stream WAL entries to replicas
- Circular replication log (1M operations)
- Auto-reconnect with exponential backoff
- Lag monitoring and alerts
3. Async Replication:
- Master doesn't wait for replica ACK (default)
- Higher throughput, eventual consistency
- Configurable ACK timeout
4. Sync Replication (optional):
- Wait for N replicas to ACK
- Lower throughput, stronger durability
- Configurable quorum
WAL Streaming:
┌──────────────────────────────────────┐
│ Master WAL Entry │
│ {epoch, tx_id, type, payload, crc} │
└──────────────┬───────────────────────┘
│ TCP stream
┌──────────────▼───────────────────────┐
│ Replica Receiver │
│ - Validate CRC │
│ - Apply to local WAL │
│ - Update storage │
│ - Send ACK │
└──────────────────────────────────────┘
REST API Endpoints:
- GET /replication/status
- POST /replication/promote (replica → master)
- POST /replication/pause
- POST /replication/resume
- GET /replication/lag (replication lag in seconds)
Automatic Failover (V1):
1. Monitor master health (heartbeat every 5s)
2. Detect master failure (3 missed heartbeats)
3. Elect new master (manual or via consensus)
4. Promote replica: POST /replication/promote
5. Redirect clients to new master
6. Old master rejoins as replica (when recovered)
Manual Failover:
curl -X POST http://replica:15474/replication/promote \
-H "Authorization: Bearer admin_key"
Hash-based partitioning for horizontal scalability:
Shard Assignment:
shard_id = hash(node_id) % num_shards
Example (4 shards):
- Shard 0: node_ids where hash % 4 == 0
- Shard 1: node_ids where hash % 4 == 1
- Shard 2: node_ids where hash % 4 == 2
- Shard 3: node_ids where hash % 4 == 3
Relationship Placement:
- Relationships reside with source node
- Cross-shard edges stored as remote pointers
- Minimize cross-shard hops in queries
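The shard assignment can be sketched as below; DefaultHasher stands in for whatever stable hash function the cluster pins (DefaultHasher is only deterministic within one std version, so a real deployment would fix an explicit hash):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// shard_id = hash(node_id) % num_shards
fn shard_for(node_id: u64, num_shards: u64) -> u64 {
    let mut h = DefaultHasher::new();
    node_id.hash(&mut h);
    h.finish() % num_shards
}

/// Relationships live with their source node, so both endpoints' shards
/// follow from the ids alone; a cross-shard edge (remote pointer) is one
/// where the two shards differ.
fn is_cross_shard(src_id: u64, dst_id: u64, num_shards: u64) -> bool {
    shard_for(src_id, num_shards) != shard_for(dst_id, num_shards)
}
```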
Topology:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Shard 0 │ │ Shard 1 │ │ Shard 2 │
│ (Master) │ │ (Master) │ │ (Master) │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
┌───┴────┐ ┌───┴────┐ ┌───┴────┐
│Replica1│ │Replica1│ │Replica1│
└────────┘ └────────┘ └────────┘
Query Coordinator:
1. Parse Cypher query
2. Identify required shards (WHERE clause analysis)
3. Decompose plan into shard-local subplans
4. Execute in parallel (scatter)
5. Merge results (gather)
6. Apply global ORDER BY + LIMIT
Optimizations:
- Pushdown filters to shards
- Pushdown LIMIT (top-K per shard)
- Minimize data transfer
- Cache remote node metadata
Cross-Shard Traversal:
MATCH (n:Person)-[:KNOWS]->(m:Person)
→ If n and m on different shards:
1. Execute on n's shard
2. Fetch m's metadata remotely
3. Continue traversal
4. Cache cross-shard edges
Raft Consensus (via openraft):
- One Raft group per shard
- Leader handles all writes
- Followers replicate via Raft log
- Strong consistency per shard
- Eventual consistency cross-shard
Shard Metadata (in catalog):
{
"shard_id": 0,
"leader": "node1:15474",
"followers": ["node2:15474", "node3:15474"],
"status": "healthy",
"node_count": 250000,
"rel_count": 500000
}
Modern desktop application for visual graph management and exploration:
Technology Stack:
- Electron (cross-platform desktop)
- Vue 3 + Composition API
- TailwindCSS (styling)
- D3.js / Cytoscape.js (graph visualization)
- Chart.js (metrics)
Features:
🎨 Beautiful interface with dark/light themes
📊 Real-time graph visualization (force-directed layout)
🔍 Visual Cypher query builder
⚡ Live query execution with syntax highlighting
📈 Database metrics and monitoring
💾 Backup/restore operations
🔧 Configuration editor
📁 Schema browser (labels, types, properties)
🎯 KNN vector search interface
1. Graph Visualization:
- Force-directed graph layout
- Node filtering by label
- Relationship filtering by type
- Property inspector
- Zoom, pan, node selection
2. Query Interface:
- Cypher editor with syntax highlighting
- Query history
- Result table/graph view toggle
- Export results (JSON, CSV)
- Saved queries
3. Schema Management:
- View all labels and types
- Property statistics
- Index management
- Constraint creation/deletion
4. KNN Vector Search:
- Text input → generate embedding
- Visual similarity search results
- Hybrid query builder (KNN + patterns)
- Vector index management
5. Database Operations:
- Backup/restore
- Import/export data
- Replication monitoring
- Performance metrics
- Log viewer
6. Monitoring Dashboard:
- Query throughput
- Page cache hit rate
- WAL size
- Replication lag
- Index sizes
# Development
cd gui
npm install
npm run dev
# Build installers
npm run build:win # Windows MSI
npm run build:mac # macOS DMG
npm run build:linux # Linux AppImage/DEB
# Run packaged app
./dist/Nexus-Setup-0.1.0.exe # Windows
./dist/Nexus-0.1.0.dmg # macOS
./dist/nexus_0.1.0_amd64.AppImage # Linux
┌─────────────────────────────────────────┐
│ Electron Main Process │
│ - Window management │
│ - Auto-updater │
│ - File system access │
└─────────────┬───────────────────────────┘
│ IPC
┌─────────────▼───────────────────────────┐
│ Electron Renderer Process │
│ ┌────────────────────────────────────┐ │
│ │ Vue 3 Application │ │
│ │ - Graph Visualization (Cytoscape) │ │
│ │ - Query Editor (CodeMirror) │ │
│ │ - Dashboard (Chart.js) │ │
│ └──────────┬─────────────────────────┘ │
└─────────────┼───────────────────────────┘
│ HTTP/WebSocket
┌─────────────▼───────────────────────────┐
│ Nexus Server (Axum) │
│ http://localhost:15474 │
└──────────────────────────────────────────┘
- Batch optimizations (vectorization, SIMD)
- Advanced indexes (B-tree, full-text)
- Query cache (prepared statements)
- Read replicas (WAL streaming)
Sharding Strategy:
- Hash(node_id) → shard_id
- Relationships reside with source node
- Cross-shard edges via remote pointers
Replication:
- Raft consensus per shard (openraft)
- Leader handles writes, followers serve reads
- Causal consistency via vector clocks
Distributed Queries:
- Coordinator decomposes plan
- Pushdown filters/limits to shards
- Scatter/gather with streaming results
- Minimize cross-shard hops
| Feature | Neo4j | Nexus |
|---|---|---|
| Storage | Record stores + page cache | Same approach |
| Query Language | Full Cypher | Cypher subset (20%) |
| Transactions | ACID, full MVCC | Simplified MVCC (epochs) |
| Indexes | B-tree, full-text, native | Same + native KNN (HNSW) |
| Clustering | Causal cluster | Future (openraft) |
| Vector Search | Plugin (GDS) | Native first-class |
| Target Workload | General graph | Read-heavy + RAG |
- Temporal Graph: Valid-time versioning for time-travel queries
- Geospatial: PostGIS-like spatial indexes and functions
- Graph Algorithms: Native BFS, DFS, PageRank, community detection
- Streaming Ingestion: Kafka/Pulsar integration
- Advanced Analytics: Integration with Apache Arrow for OLAP
The Graph Correlation Analysis module provides automatic generation of code relationship graphs from source code, enabling visualization of code flow, dependencies, and processing patterns. It integrates with Vectorizer for semantic code analysis and provides multiple protocol interfaces (REST, MCP, UMICP).
┌─────────────────────────────────────────────────────────────────────┐
│ API Layer (REST/MCP/UMICP) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ REST API │ │ MCP Tools │ │ UMICP Handler│ │
│ │ /api/v1/ │ │ graph_* │ │ graph.* │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
└─────────┼──────────────────┼──────────────────┼────────────────────┘
│ │ │
┌─────────┴──────────────────┴──────────────────┴────────────────────┐
│ Graph Correlation Manager │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ GraphCorrelationManager │ │
│ │ - Builder Registry (Call, Dependency, DataFlow, Component) │ │
│ │ - Graph Storage & Retrieval │ │
│ │ - Request Routing │ │
│ └──────────────────────────────────────────────────────────────┘ │
└─────────┬──────────────────────────────────────────────────────────┘
│
┌─────────┴──────────────────────────────────────────────────────────┐
│ Graph Builder Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Call Graph │ │ Dependency │ │ Data Flow │ │
│ │ Builder │ │ Graph Builder│ │ Graph Builder│ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ ┌──────┴──────────────────┴──────────────────┴──────┐ │
│ │ Component Graph Builder │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────┬──────────────────────────────────────────────────────────┘
│
┌─────────┴──────────────────────────────────────────────────────────┐
│ Analysis Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Pattern │ │ Statistics │ │ Impact │ │
│ │ Recognition │ │ Calculator │ │ Analysis │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ ┌──────┴──────────────────┴──────────────────┴──────┐ │
│ │ Pattern Detectors: │ │
│ │ - PipelinePatternDetector │ │
│ │ - EventDrivenPatternDetector │ │
│ │ - ArchitecturalPatternDetector │ │
│ │ - DesignPatternDetector (Observer, Factory, etc.) │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────┬──────────────────────────────────────────────────────────┘
│
┌─────────┴──────────────────────────────────────────────────────────┐
│ Visualization Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ SVG Renderer │ │ Layout │ │ Export │ │
│ │ │ │ Algorithms │ │ Formats │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ ┌──────┴──────────────────┴──────────────────┴──────┐ │
│ │ Layouts: Force-directed, Hierarchical, Flow-based │ │
│ │ Formats: JSON, GraphML, GEXF, DOT, SVG, PNG, PDF │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────┬──────────────────────────────────────────────────────────┘
│
┌─────────┴──────────────────────────────────────────────────────────┐
│ Data Source Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Vectorizer │ │ Source Code │ │ AST Parser │ │
│ │ Integration │ │ Files │ │ │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ ┌──────┴──────────────────┴──────────────────┴──────┐ │
│ │ GraphSourceData: files, functions, imports │ │
│ └──────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ GraphBuilder Trait │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ build(source_data) -> CorrelationGraph │ │
│ │ graph_type() -> GraphType │ │
│ │ name() -> &str │ │
│ │ capabilities() -> GraphBuilderCapabilities │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
▲
│ implements
┌───────────────┼───────────────┬───────────────┐
│ │ │ │
┌───┴────┐ ┌──────┴──────┐ ┌────┴──────┐ ┌────┴──────┐
│ Call │ │ Dependency │ │ DataFlow │ │ Component │
│ Graph │ │ Graph │ │ Graph │ │ Graph │
│ Builder│ │ Builder │ │ Builder │ │ Builder │
└────────┘ └─────────────┘ └───────────┘ └───────────┘
┌─────────────────────────────────────────────────────────────┐
│ PatternDetector Trait │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ detect(graph) -> PatternDetectionResult │ │
│ │ name() -> &str │ │
│ │ supported_patterns() -> Vec<PatternType> │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
▲
│ implements
┌───────────────┼───────────────┬───────────────┐
│ │ │ │
┌───┴────┐ ┌──────┴──────┐ ┌────┴──────┐ ┌────┴──────┐
│Pipeline│ │Event-Driven │ │Architect. │ │Design │
│Pattern │ │Pattern │ │Pattern │ │Pattern │
│Detector│ │Detector │ │Detector │ │Detector │
└────────┘ └─────────────┘ └───────────┘ └───────────┘
Source Code Files
│
▼
┌─────────────────┐
│ GraphSourceData │
│ - files │
│ - functions │
│ - imports │
└────────┬────────┘
│
▼
┌─────────────────────────────────────┐
│ Graph Builder (by type) │
│ ┌───────────────────────────────┐ │
│ │ 1. Extract relationships │ │
│ │ 2. Create nodes & edges │ │
│ │ 3. Add metadata │ │
│ │ 4. Calculate layout │ │
│ └───────────────────────────────┘ │
└────────┬────────────────────────────┘
│
▼
┌─────────────────┐
│ CorrelationGraph│
│ - nodes │
│ - edges │
│ - metadata │
└────────┬────────┘
│
├─────────────────┬─────────────────┬─────────────────┐
▼ ▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Pattern │ │ Statistics │ │ Visualization│ │ Export │
│ Detection │ │ Calculation │ │ Rendering │ │ Formats │
└──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
nexus-core/src/graph/correlation/
├── mod.rs # Core types, GraphBuilder trait, Manager
├── call_graph.rs # Call graph generation
├── dependency.rs # Dependency graph generation
├── data_flow.rs # Data flow analysis & graph generation
├── component.rs # Component (OOP) analysis & graph generation
├── pattern_recognition.rs # Pattern detection (Pipeline, Event, Arch, Design)
├── visualization.rs # SVG rendering, layouts, styling
├── graph_export.rs # Export to JSON, GraphML, GEXF, DOT
├── graph_statistics.rs # Statistics calculation
├── graph_diff.rs # Graph comparison & diff
├── performance.rs # Performance optimization
├── dependency_filter.rs # Dependency filtering utilities
├── impact_analysis.rs # Impact analysis for changes
├── vectorizer_cache.rs # Vectorizer query caching
└── version_constraints.rs # Version constraint analysis
┌─────────────────────────────────────────────────────────────┐
│ External Systems │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Vectorizer │ │ LLM Agents │ │ Visualization│ │
│ │ MCP Server │ │ (via MCP) │ │ Tools │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
└─────────┼──────────────────┼──────────────────┼────────────┘
│ │ │
│ MCP Tools │ REST API │ Export Files
│ │ │
┌─────────┴──────────────────┴──────────────────┴────────────┐
│ Graph Correlation Analysis Module │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Protocol Handlers: │ │
│ │ - MCP: graph_correlation_* tools │ │
│ │ - REST: /api/v1/graphs/* endpoints │ │
│ │ - UMICP: graph.* methods │ │
│ └──────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────┘
CorrelationGraph
│
▼
┌─────────────────────────────────────┐
│ Pattern Detection Pipeline │
│ ┌───────────────────────────────┐ │
│ │ 1. Pipeline Pattern Detector │ │
│ │ 2. Event-Driven Detector │ │
│ │ 3. Architectural Detector │ │
│ │ 4. Design Pattern Detector │ │
│ └───────────────────────────────┘ │
└────────┬─────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Pattern Detection Result │
│ - DetectedPattern[] │
│ - PatternStatistics │
│ - Quality Score │
└────────┬─────────────────────────────┘
│
├─────────────────┬─────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Quality │ │ Visualization│ │ Recommendation│
│ Metrics │ │ Overlays │ │ Engine │
└──────────────┘ └──────────────┘ └──────────────┘
- Trait-Based Architecture: Uses Rust traits (GraphBuilder, PatternDetector) for extensibility
- Type-Safe Graph Types: Enum-based graph types prevent invalid operations
- Modular Analysis: Separate modules for each analysis type (call, dependency, data flow, component)
- Protocol Agnostic: Core logic independent of API protocol (REST/MCP/UMICP)
- Caching Support: Vectorizer query caching for performance
- Export Flexibility: Multiple export formats for different use cases
- Pattern Extensibility: Easy to add new pattern detectors via trait implementation
Every piece of shared server state — the engine, the shared executor,
the multi-database manager, RBAC, the auth/JWT/audit chain, and the
root-user config — lives on the NexusServer struct in
nexus-server/src/lib.rs.
NexusServer::new builds the struct once inside main.rs::async_main,
wraps it in an Arc, and hands the same Arc<NexusServer> to Axum via
.with_state(...).
HTTP handlers reach the state through Axum's State extractor:
pub async fn create_node(
State(server): State<Arc<NexusServer>>,
Json(request): Json<CreateNodeRequest>,
) -> Json<CreateNodeResponse> {
let mut engine = server.engine.write().await;
// ...
}

This replaces an older pattern that kept every subsystem in its own
OnceLock<Arc<_>> per API file (api::cypher::EXECUTOR,
api::data::ENGINE, …), which broke test isolation: two tests that
each called init_engine silently collided on the same process-wide
singleton. The migration was sliced into phase2a–phase2e and is
now complete:
- phase2a — cypher + data
- phase2b — schema + stats + knn
- phase2c — performance monitoring: query statistics, plan cache, DBMS procedures, MCP tool statistics, MCP tool cache
- phase2d — graph correlation + comparison + UMICP: the GraphCorrelationManager, the two comparison Graph instances, and the GraphUmicpHandler (the JSON-RPC dispatcher for graph.* methods). Default comparison graphs are built inside NexusServer::new via the private build_default_comparison_graphs helper.
- phase2e — observability: the per-server start_time: Instant and the Arc<PrometheusMetrics> counter pack. api::health::init + api::prometheus::init are gone; handlers read uptime off server.start_time and counters off server.metrics. The Cypher execute path threads the same &NexusServer into record_prometheus_metrics.
Rule for future subsystems: add an Arc<_> field to NexusServer,
build it inside NexusServer::new with whatever defaults apply, and
read it from handlers via server.<field>. Do not reach for a
OnceLock. The anti-regression guard at
nexus-server/tests/no_oncelock_globals.rs
greps nexus-server/src/api/ on every cargo test run and fails if a
static <NAME>: OnceLock<…> declaration is reintroduced. Genuine
process-wide singletons (e.g. RESP3 / RPC listener counters shared
across accept threads) can opt into the allow-list at the top of that
file with an explanatory comment.
- Neo4j Internals: https://neo4j.com/docs/operations-manual/current/
- HNSW Algorithm: https://arxiv.org/abs/1603.09320
- MVCC in Databases: Postgres, CockroachDB documentation
- Roaring Bitmaps: https://roaringbitmap.org/
- Tantivy: https://github.com/quickwit-oss/tantivy