This demonstration showcases the power of ArangoDB in harmonizing structured integrated circuit (IC) hardware design (RTL/Verilog), temporal version history (Git), and unstructured technical specifications (GraphRAG) into a single, queryable knowledge graph. The current implementation uses the OR1200 RISC processor as sample data.
[Schema diagram — generated by running the ETL pipeline and opening the graph in ArangoDB Visualizer]
This project is a modern implementation of the principles established in the Design Knowledge Management System (DKMS) research program co-authored for the Air Force Materiel Command (1989-1992). It realizes the vision of a "Semantic Bridge" between design intent and implementation that was pioneered in these foundational reports.
For details on the theoretical foundations, see docs/research/DKMS_Foundations.md.
The knowledge graph harmonizes three disparate data silos: RTL code structure, Git history, and technical specifications.
[Schema diagram — generated from the Mermaid source in docs/project/SCHEMA.md or by viewing the graph in the ArangoDB Visualizer]
The core value of this project is the Semantic Bridge, which connects unstructured documentation (GraphRAG) to structured hardware implementation (RTL). Below is a visualization from the ArangoDB Graph Visualizer showing a Documentation Entity (center) resolved to multiple RTL Modules (the "Flip-Flop" logic block hierarchy).
[Semantic Bridge visualization — generated by opening IC_Temporal_Knowledge_Graph in ArangoDB Visualizer and running the "Show Entity Resolutions" canvas action]
- Semantic Bridge: Automatically links Verilog modules, ports, and signals to entities referenced in corresponding documentation sections using lexical analysis.
- High-Performance Consolidation: Uses set-based AQL operations for near-instant (sub-second) entity resolution across thousands of documentation nodes.
- Temporal Insight: Ingests full Git history to allow "Time-Travel" queries across the evolution of the hardware design.
- Author Expertise Mapping: First-class contributor vertices enable knowledge transfer, collaboration analysis, and bus factor assessment.
- Granular RTL Graph: Decomposes monolithic Verilog files into a rich graph of `Module`, `Port`, `Signal`, and `LogicChunk` nodes.
- GraphRAG Augmented: Integrated with high-quality entity and community extraction from the Arango AI team's GraphRAG pipeline.
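The lexical analysis behind the Semantic Bridge can be sketched in a few lines. This is an illustrative approximation only, not the actual `src/bridger.py` logic — the `normalize`/`lexical_match` names and the Jaccard threshold are invented here:

```python
import re

def normalize(identifier: str) -> set[str]:
    """Split a Verilog identifier or entity name into lowercase tokens.

    Handles snake_case ("or1200_alu") and CamelCase ("LogicChunk")."""
    tokens = []
    for part in re.split(r"[_\s]+", identifier):
        tokens += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part)
    return {t.lower() for t in tokens if t}

def lexical_match(rtl_name: str, entity_name: str, threshold: float = 0.5) -> bool:
    """Link an RTL element to a doc entity when token sets overlap enough (Jaccard)."""
    a, b = normalize(rtl_name), normalize(entity_name)
    if not a or not b:
        return False
    return len(a & b) / len(a | b) >= threshold
```

With this scheme, the module `or1200_alu` matches a documentation entity named "OR1200 ALU" because both normalize to the same token set.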
Note: The GraphRAG document import pipeline requires ArangoDB AMP (cloud) with the GenAI services feature enabled. It is an optional, advanced component — the core RTL + Git + semantic bridging pipeline works independently without it.
The GraphRAG collections (OR1200_Entities, OR1200_Golden_Entities, OR1200_Relations, etc.) were populated during development and are present in the demo database. The Python-based re-import pipeline (src/etl_graphrag.py) was implemented but encountered API integration issues during deadline-constrained development and has not been fully validated end-to-end.
What works without GraphRAG:
- Full RTL parsing and graph construction
- Git history ingestion and author expertise mapping
- Semantic bridging between RTL elements and documentation entities (reads existing `OR1200_Golden_Entities`)
- All AQL queries and visualizations in the demo
What requires ArangoDB AMP + GraphRAG:
- Re-importing or refreshing document entities from PDFs (`src/etl_graphrag.py`)
- Running the Importer/Retriever services via the GenAI API
See GRAPHRAG_STATUS.md for a detailed description of the integration, known issues, and instructions for attempting a fresh import.
- `src/`: Core ETL and bridging scripts.
- `docs/`: Comprehensive documentation (see docs/README.md)
  - `project/`: Core project docs (Walkthrough, Schema, PRD)
  - `reference/`: Technical references
- `tests/`: Unit tests for parsing and normalization logic.
- `or1200/`: The source RTL repository (submodule).
- `validation/`: Ground truth datasets and validation scripts.
- Python 3.10+
- ArangoDB instance (local Docker or remote)
- Cluster users: if you see collection shards spread across many DB-Servers (one shard per collection, different leaders), graph-heavy queries pay extra network cost. See docs/arangodb-cluster-sharding.md for OneShard vs. SmartGraph, `scripts/setup/create_oneshard_database.py` (create a new DB), and `scripts/setup/migrate_to_oneshard.sh` (dump → drop → OneShard → restore).
Copy env.template to .env in the root directory and configure your settings:
```shell
cp env.template .env
```

Then edit `.env` with your specific values:
```
# Choose LOCAL or REMOTE mode
ARANGO_MODE=LOCAL

# For REMOTE mode, configure these:
ARANGO_ENDPOINT=https://your-instance.arango.ai
ARANGO_USERNAME=root
ARANGO_PASSWORD=your_password
ARANGO_DATABASE=ic-knowledge-graph-temporal

# For LOCAL mode (Docker), configure these:
LOCAL_ARANGO_ENDPOINT=http://localhost:8530
LOCAL_ARANGO_USERNAME=root
LOCAL_ARANGO_PASSWORD=
LOCAL_ARANGO_DATABASE=ic-knowledge-graph-temporal
```
```
# GraphRAG prefix for collection names
GRAPHRAG_PREFIX=OR1200_
```

Install the core dependencies:

```shell
pip install -r requirements-core.txt
```

Key Dependencies:
- `arango-entity-resolution==3.1.0` — Official PyPI package for entity resolution
  - Provides `WeightedFieldSimilarity` for multi-field scoring (name + description)
  - Lazy loading ensures fast startup times
  - No manual configuration required
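For intuition, a weighted multi-field score can be approximated with the standard library. This `difflib`-based sketch only illustrates the idea and is not the `arango-entity-resolution` implementation; the weights are example values:

```python
from difflib import SequenceMatcher

def weighted_field_similarity(a: dict, b: dict, weights: dict[str, float]) -> float:
    """Combine per-field string similarities into one weighted score in [0, 1]."""
    score = 0.0
    for field, weight in weights.items():
        left = a.get(field, "").lower()
        right = b.get(field, "").lower()
        score += weight * SequenceMatcher(None, left, right).ratio()
    return score / sum(weights.values())

# Example weighting: the name matters more than the description.
WEIGHTS = {"name": 0.7, "description": 0.3}
```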
Optional (GraphRAG/document processing):
```shell
pip install -r requirements.txt
```

This repo runs analytics via the agentic-graph-analytics project. Install it from source (editable):
```shell
cd ~/code/agentic-graph-analytics
git pull origin main
pip install -e .
```

Ensure `.env` has valid ArangoDB credentials. The workflow uses JWT for GRAL; tokens expire during long runs and are auto-refreshed using ARANGO_ENDPOINT, ARANGO_USER (or ARANGO_USERNAME), and ARANGO_PASSWORD.
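The expiry check behind such auto-refresh might look like the sketch below. The `jwt_expired` helper is hypothetical; it only inspects the `exp` claim to decide when to re-authenticate (e.g. POST /_open/auth on ArangoDB) and does not verify the token's signature:

```python
import base64
import json
import time

def jwt_expired(token: str, skew_seconds: int = 60) -> bool:
    """Return True when the JWT's `exp` claim falls within `skew_seconds` of now.

    A long-running job can call this before each request and fetch a fresh
    token when it returns True. Claim inspection only — no signature check."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims.get("exp", 0) <= time.time() + skew_seconds
```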
The entire ingestion and bridging process is orchestrated via:
```shell
./src/import_all.sh
python3 src/create_graph.py
python3 src/bridger.py
python3 src/etl_authors.py   # Optional: Author expertise mapping
```

Author Expertise Mapping (optional but recommended):
- Extracts 8 unique authors from Git commits
- Creates AUTHORED edges (author → commit)
- Creates MAINTAINS edges (author → module) based on commit frequency
- Enables expertise queries, bus factor analysis, and collaboration networks
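Bus factor over this kind of data can be computed directly from per-module commit counts. A minimal sketch — the `bus_factor` helper and its 50% coverage threshold are assumptions for illustration, not the project's actual metric:

```python
from collections import Counter

def bus_factor(commits: list[tuple[str, str]], module: str, threshold: float = 0.5) -> int:
    """Smallest number of authors whose commits cover `threshold` of a module's history.

    `commits` holds (author, module) pairs, one per commit. A result of 1 means
    a single author dominates the module — a knowledge-transfer risk."""
    counts = Counter(author for author, m in commits if m == module)
    if not counts:
        return 0
    total = sum(counts.values())
    covered = 0
    for factor, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= threshold:
            return factor
    return len(counts)
```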
Run the test suite to ensure the environment is correctly configured:
```shell
pytest tests/
```

Customers can explore the preloaded demo database `ic-knowledge-graph-temporal` in read-only mode, then create their own numbered sandbox databases (`ic-knowledge-graph-1`, `ic-knowledge-graph-2`, …) for hands-on exercises.
See docs/CUSTOMER_EXERCISE_WORKFLOW.md for the step-by-step process (UI-primary DB creation, GraphRAG UI import, and one-command setup).
Once your ArangoDB database is populated (pipeline above), run:
```shell
python run_ic_analysis.py
```

Reports are written to `ic_analysis_output/` as both Markdown and interactive HTML.
The "Semantic Bridge" can be explored visually via the ArangoDB Dashboard:
- Go to Graphs -> IC_Temporal_Knowledge_Graph.
- Identify cross-model links: `(RTL_Module) -[RESOLVED_TO]-> (OR1200_Golden_Entities)`.
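The same cross-model link can be queried programmatically. This is a hedged sketch: the collection name `OR1200_Golden_Entities` and edge collection `RESOLVED_TO` follow the conventions shown above, the module-to-entity edge direction is assumed, and `db` is expected to be a python-arango database handle:

```python
# AQL traversal from a documentation entity back to the RTL modules
# resolved to it (modules are INBOUND because edges point module -> entity).
BRIDGE_QUERY = """
FOR entity IN OR1200_Golden_Entities
  FILTER entity.name == @entity_name
  FOR module IN 1..1 INBOUND entity RESOLVED_TO
    RETURN { entity: entity.name, module: module.name }
"""

def rtl_modules_for_entity(db, entity_name: str) -> list:
    """Run the traversal; `db` is e.g. ArangoClient(...).db(...) from python-arango."""
    return list(db.aql.execute(BRIDGE_QUERY, bind_vars={"entity_name": entity_name}))
```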
Complete demonstration materials are available:
- Quick Start: Read `docs/DEMO_EXECUTIVE_SUMMARY.md` (5-minute overview)
- Setup Theme: Run `python scripts/setup/install_theme.py` to install the 'hardware-design' visualization theme
- Setup Queries: Run `python scripts/setup/install_demo_setup.py` to install queries and actions
- Demo Guide: Follow `docs/DEMO_SCRIPT.md` for a comprehensive demonstration
- Preparation: Use `docs/DEMO_README.md` for setup checklist and troubleshooting
The demo showcases:
- Hierarchical semantic bridges (spec → code)
- Temporal design audit (time-travel queries)
- Type-safe entity resolution
- Sub-200ms graph traversals
- Agent integration for 10x token savings
For technical details, see the Project Walkthrough.