This demonstration showcases the power of ArangoDB in harmonizing structured integrated circuit (IC) hardware design (RTL/Verilog), temporal version history (Git), and unstructured technical specifications (GraphRAG) into a single, queryable knowledge graph. The current implementation uses the OR1200 RISC processor as sample data.
[Schema diagram — generated by running the ETL pipeline and opening the graph in ArangoDB Visualizer]
This project is a modern implementation of the principles established in the Design Knowledge Management System (DKMS) research program co-authored for the Air Force Materiel Command (1989-1992). It realizes the vision of a "Semantic Bridge" between design intent and implementation that was pioneered in these foundational reports.
For details on the theoretical foundations, see docs/research/DKMS_Foundations.md.
The knowledge graph harmonizes three disparate data silos: RTL code structure, Git history, and technical specifications.
[Schema diagram — generated from the Mermaid source in docs/project/SCHEMA.md or by viewing the graph in the ArangoDB Visualizer]
The core value of this project is the Semantic Bridge, which connects unstructured documentation (GraphRAG) to structured hardware implementation (RTL). Below is a visualization from the ArangoDB Graph Visualizer showing a Documentation Entity (center) resolved to multiple RTL Modules (the "Flip-Flop" logic block hierarchy).
[Semantic Bridge visualization — generated by opening IC_Temporal_Knowledge_Graph in ArangoDB Visualizer and running the "Show Entity Resolutions" canvas action]
- Semantic Bridge: Automatically links Verilog modules, ports, and signals to entities referenced in corresponding documentation sections using lexical analysis.
- High-Performance Consolidation: Uses set-based AQL operations for near-instant (sub-second) entity resolution across thousands of documentation nodes.
- Temporal Insight: Ingests full Git history to allow "Time-Travel" queries across the evolution of the hardware design.
- Author Expertise Mapping: First-class contributor vertices enable knowledge transfer, collaboration analysis, and bus factor assessment.
- Granular RTL Graph: Decomposes monolithic Verilog files into a rich graph of `Module`, `Port`, `Signal`, and `LogicChunk` nodes.
- GraphRAG Augmented: Integrated with high-quality entity and community extraction from the Arango AI team's GraphRAG pipeline.
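The lexical analysis behind the Semantic Bridge can be sketched in a few lines. This is an illustrative approximation only, not the actual `src/bridger.py` logic — the `normalize`/`lexical_match` names and the Jaccard threshold are invented here:

```python
import re

def normalize(identifier: str) -> set[str]:
    """Split a Verilog identifier or entity name into lowercase tokens.

    Handles snake_case ("or1200_alu") and CamelCase ("LogicChunk")."""
    tokens = []
    for part in re.split(r"[_\s]+", identifier):
        tokens += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part)
    return {t.lower() for t in tokens if t}

def lexical_match(rtl_name: str, entity_name: str, threshold: float = 0.5) -> bool:
    """Link an RTL element to a doc entity when token sets overlap enough (Jaccard)."""
    a, b = normalize(rtl_name), normalize(entity_name)
    if not a or not b:
        return False
    return len(a & b) / len(a | b) >= threshold
```

With this scheme, the module `or1200_alu` matches a documentation entity named "OR1200 ALU" because both normalize to the same token set.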
Note: The GraphRAG document import pipeline requires ArangoDB AMP (cloud) with the GenAI services feature enabled. It is an optional, advanced component — the core RTL + Git + semantic bridging pipeline works independently without it.
The GraphRAG collections (OR1200_Entities, OR1200_Golden_Entities, OR1200_Relations, etc.) were populated during development and are present in the demo database. The Python-based re-import pipeline (src/etl_graphrag.py) was implemented but encountered API integration issues during deadline-constrained development and has not been fully validated end-to-end.
What works without GraphRAG:
- Full RTL parsing and graph construction
- Git history ingestion and author expertise mapping
- Semantic bridging between RTL elements and documentation entities (reads existing `OR1200_Golden_Entities`)
- All AQL queries and visualizations in the demo
What requires ArangoDB AMP + GraphRAG:
- Re-importing or refreshing document entities from PDFs (`src/etl_graphrag.py`)
- Running the Importer/Retriever services via the GenAI API
See GRAPHRAG_STATUS.md for a detailed description of the integration, known issues, and instructions for attempting a fresh import.
- `src/`: Core ETL and bridging scripts.
- `docs/`: Comprehensive documentation (see docs/README.md)
  - `project/`: Core project docs (Walkthrough, Schema, PRD)
  - `reference/`: Technical references
- `tests/`: Unit tests for parsing and normalization logic.
- `or1200/`: The source RTL repository (submodule).
- `validation/`: Ground truth datasets and validation scripts.
- Python 3.10+
- ArangoDB instance (local Docker or remote)
- Cluster users: if you see collection shards spread across many DB-Servers (one shard per collection, different leaders), graph-heavy queries pay extra network cost. See docs/arangodb-cluster-sharding.md for OneShard vs. SmartGraph, `scripts/setup/create_oneshard_database.py` (create a new DB), and `scripts/setup/migrate_to_oneshard.sh` (dump → drop → OneShard → restore).
Copy env.template to .env in the root directory and configure your settings:
```shell
cp env.template .env
```

Then edit `.env` with your specific values:
```
# Choose LOCAL or REMOTE mode
ARANGO_MODE=LOCAL

# For REMOTE mode, configure these:
ARANGO_ENDPOINT=https://your-instance.arango.ai
ARANGO_USERNAME=root
ARANGO_PASSWORD=your_password
ARANGO_DATABASE=ic-knowledge-graph-temporal

# For LOCAL mode (Docker), configure these:
LOCAL_ARANGO_ENDPOINT=http://localhost:8530
LOCAL_ARANGO_USERNAME=root
LOCAL_ARANGO_PASSWORD=
LOCAL_ARANGO_DATABASE=ic-knowledge-graph-temporal
```
```
# GraphRAG prefix for collection names
GRAPHRAG_PREFIX=OR1200_
```

Install the core dependencies:

```shell
pip install -r requirements-core.txt
```

Key Dependencies:
- `arango-entity-resolution==3.1.0` — Official PyPI package for entity resolution
  - Provides `WeightedFieldSimilarity` for multi-field scoring (name + description)
  - Lazy loading ensures fast startup times
  - No manual configuration required
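For intuition, a weighted multi-field score can be approximated with the standard library. This `difflib`-based sketch only illustrates the idea and is not the `arango-entity-resolution` implementation; the weights are example values:

```python
from difflib import SequenceMatcher

def weighted_field_similarity(a: dict, b: dict, weights: dict[str, float]) -> float:
    """Combine per-field string similarities into one weighted score in [0, 1]."""
    score = 0.0
    for field, weight in weights.items():
        left = a.get(field, "").lower()
        right = b.get(field, "").lower()
        score += weight * SequenceMatcher(None, left, right).ratio()
    return score / sum(weights.values())

# Example weighting: the name matters more than the description.
WEIGHTS = {"name": 0.7, "description": 0.3}
```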
Optional (GraphRAG/document processing):
```shell
pip install -r requirements.txt
```

This repo runs analytics via the agentic-graph-analytics project. Install it from source (editable):
```shell
cd ~/code/agentic-graph-analytics
git pull origin main
pip install -e .
```

Ensure `.env` has valid ArangoDB credentials. The workflow uses JWT for GRAL; tokens expire during long runs and are auto-refreshed using ARANGO_ENDPOINT, ARANGO_USER (or ARANGO_USERNAME), and ARANGO_PASSWORD.
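The expiry check behind such auto-refresh might look like the sketch below. The `jwt_expired` helper is hypothetical; it only inspects the `exp` claim to decide when to re-authenticate (e.g. POST /_open/auth on ArangoDB) and does not verify the token's signature:

```python
import base64
import json
import time

def jwt_expired(token: str, skew_seconds: int = 60) -> bool:
    """Return True when the JWT's `exp` claim falls within `skew_seconds` of now.

    A long-running job can call this before each request and fetch a fresh
    token when it returns True. Claim inspection only — no signature check."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims.get("exp", 0) <= time.time() + skew_seconds
```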
The entire ingestion and bridging process is orchestrated via:
```shell
./src/import_all.sh
python3 src/create_graph.py
python3 src/bridger.py
python3 src/etl_authors.py   # Optional: Author expertise mapping
```

Author Expertise Mapping (optional but recommended):
- Extracts 8 unique authors from Git commits
- Creates AUTHORED edges (author → commit)
- Creates MAINTAINS edges (author → module) based on commit frequency
- Enables expertise queries, bus factor analysis, and collaboration networks
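Bus factor over this kind of data can be computed directly from per-module commit counts. A minimal sketch — the `bus_factor` helper and its 50% coverage threshold are assumptions for illustration, not the project's actual metric:

```python
from collections import Counter

def bus_factor(commits: list[tuple[str, str]], module: str, threshold: float = 0.5) -> int:
    """Smallest number of authors whose commits cover `threshold` of a module's history.

    `commits` holds (author, module) pairs, one per commit. A result of 1 means
    a single author dominates the module — a knowledge-transfer risk."""
    counts = Counter(author for author, m in commits if m == module)
    if not counts:
        return 0
    total = sum(counts.values())
    covered = 0
    for factor, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= threshold:
            return factor
    return len(counts)
```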
Run the test suite to ensure the environment is correctly configured:
```shell
pytest tests/
```

Customers can explore the preloaded demo database `ic-knowledge-graph-temporal` in read-only mode, then create their own numbered sandbox databases (`ic-knowledge-graph-1`, `ic-knowledge-graph-2`, …) for hands-on exercises.
See docs/CUSTOMER_EXERCISE_WORKFLOW.md for the step-by-step process (UI-primary DB creation, GraphRAG UI import, and one-command setup).
Once your ArangoDB database is populated (pipeline above), run:
```shell
python run_ic_analysis.py
```

Reports are written to `ic_analysis_output/` as both Markdown and interactive HTML.
The "Semantic Bridge" can be explored visually via the ArangoDB Dashboard:
- Go to Graphs -> IC_Temporal_Knowledge_Graph.
- Identify cross-model links: `(RTL_Module) -[RESOLVED_TO]-> (OR1200_Golden_Entities)`.
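The same cross-model link can be queried programmatically. This is a hedged sketch: the collection name `OR1200_Golden_Entities` and edge collection `RESOLVED_TO` follow the conventions shown above, the module-to-entity edge direction is assumed, and `db` is expected to be a python-arango database handle:

```python
# AQL traversal from a documentation entity back to the RTL modules
# resolved to it (modules are INBOUND because edges point module -> entity).
BRIDGE_QUERY = """
FOR entity IN OR1200_Golden_Entities
  FILTER entity.name == @entity_name
  FOR module IN 1..1 INBOUND entity RESOLVED_TO
    RETURN { entity: entity.name, module: module.name }
"""

def rtl_modules_for_entity(db, entity_name: str) -> list:
    """Run the traversal; `db` is e.g. ArangoClient(...).db(...) from python-arango."""
    return list(db.aql.execute(BRIDGE_QUERY, bind_vars={"entity_name": entity_name}))
```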
Complete demonstration materials are available:
- Quick Start: Read `docs/DEMO_EXECUTIVE_SUMMARY.md` (5-minute overview)
- Setup Theme: Run `python scripts/setup/install_theme.py` to install the 'hardware-design' visualization theme
- Setup Queries: Run `python scripts/setup/install_demo_setup.py` to install queries and actions
- Demo Guide: Follow `docs/DEMO_SCRIPT.md` for a comprehensive demonstration
- Preparation: Use `docs/DEMO_README.md` for setup checklist and troubleshooting
The demo showcases:
- Hierarchical semantic bridges (spec → code)
- Temporal design audit (time-travel queries)
- Type-safe entity resolution
- Sub-200ms graph traversals
- Agent integration for 10x token savings
For technical details, see the Project Walkthrough.