Transform Chaos into Intelligence. Build AI systems that are explainable, traceable, and trustworthy — not black boxes.
Frontal's Ontology Engine is an open-source semantic intelligence framework that transforms raw, unstructured data into validated, explainable, and auditable knowledge for modern AI systems.
It provides the semantic foundation for:
- GraphRAG systems
- AI Agents & Multi-Agent Systems
- Reasoning and decision-support models
- High-stakes enterprise AI platforms
Frontal's Ontology Engine is built for environments where every answer must be explainable, traceable, and governed.
Most AI systems fail in high-stakes domains because they operate on text similarity, not meaning.

What goes in:
- PDFs, DOCX, emails, logs
- APIs, databases, streams
- Conflicting facts and duplicates
- Siloed systems with no lineage

What comes out:
- Formal domain rules (ontologies)
- Structured and validated entities
- Explicit semantic relationships
- Explainable reasoning paths
- End-to-end traceability
- Audit-ready provenance
Without semantics:
- Decisions cannot be explained
- Errors cannot be traced
- Conflicts go undetected
- Compliance becomes impossible
Trustworthy AI requires semantic accountability.
| Traditional RAG | Frontal's Ontology Engine |
|---|---|
| Black-box answers | Explainable reasoning |
| No provenance | Source-level traceability |
| Vector similarity only | Semantic + graph reasoning |
| No conflict handling | Explicit contradiction detection |
| Unsafe for high-stakes use | Designed for governed environments |
- PDFs, DOCX, HTML
- JSON, CSV, databases
- APIs, streams, archives
- Multi-modal content
All data enters through a single ingestion pipeline with metadata and source tracking.
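As an illustration of source tracking (this is a hypothetical record shape, not the library's actual schema), every ingested item can carry its origin alongside its content:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class SourceRecord:
    """Hypothetical unified ingestion record: content plus provenance metadata."""
    content: str
    source: str   # file path, URL, or connection string the data came from
    format: str   # e.g. "pdf", "html", "db-row"
    ingested_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

record = SourceRecord(
    content="Q3 revenue grew 12%.",
    source="reports/q3.pdf",
    format="pdf",
)
# Every downstream artifact (entity, triplet, answer) can point back to `source`
print(record.source)
```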
This layer enforces governance by design:
- Entity extraction & normalization
- Relationship discovery & triplet generation
- Automated ontology induction
- Entity deduplication (Jaro-Winkler, disjoint properties)
- Conflict detection and resolution
- Provenance tracking (source, time, confidence)
- Reasoning trace generation
- Context engineering for grounded LLM outputs
- Knowledge Graphs (queryable, temporal, explainable)
- OWL Ontologies (HermiT / Pellet validated)
- Vector Embeddings (FastEmbed by default)
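For intuition, the Jaro–Winkler similarity named above for deduplication can be sketched in plain Python. A production deduplicator adds blocking, normalization, and the disjoint-property checks mentioned in the list; this is only the core metric:

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: count matches within a sliding window, penalize transpositions."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(len(s1), len(s2)) // 2 - 1
    m1, m2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count transpositions: matched characters appearing in a different order
    t, k = 0, 0
    for i, flagged in enumerate(m1):
        if flagged:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Boost the Jaro score for strings sharing a prefix (up to 4 characters)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

Near-duplicate entity names ("Jon Smith" vs "John Smith") score high, so a threshold like the 0.85 used later in the QA section separates likely duplicates from distinct entities.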
Every AI response can be traced back to:
- Source documents
- Extracted entities & relations
- Ontology rules applied
- Reasoning steps used
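The traceability chain above can be pictured as a single record attached to each answer (an illustrative shape, not the library's actual types):

```python
from dataclasses import dataclass, field

@dataclass
class ProvenanceTrace:
    """Illustrative audit trail for one AI answer."""
    answer: str
    source_documents: list = field(default_factory=list)
    entities: list = field(default_factory=list)
    ontology_rules: list = field(default_factory=list)
    reasoning_steps: list = field(default_factory=list)

trace = ProvenanceTrace(
    answer="Supplier A is high-risk.",
    source_documents=["audits/2024-q1.pdf"],
    entities=["Supplier A"],
    ontology_rules=["LateShipments > 3 => HighRisk"],
    reasoning_steps=["Supplier A had 5 late shipments", "rule HighRisk fired"],
)
# An answer with no sources or rules behind it is exactly the black box to avoid
print(len(trace.source_documents), len(trace.reasoning_steps))
```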
- Explainable GraphRAG — Graph-based reasoning with inspectable paths
- Automated Ontology Generation — Domain rules encoded explicitly
- Traceable Knowledge Graphs — Full lineage and versioning
- Agent Memory with Guardrails — Rule-validated agent actions
- Production-Grade QA — Deduplication, conflict detection, validation
- LLM-Agnostic Design — Works across providers with structured outputs
- Scalable Pipelines — Parallel, modular, production-friendly
Frontal's Ontology Engine is designed for domains where mistakes have real consequences:
- Healthcare & Life Sciences — Clinical reasoning, audit trails
- Finance & Risk — Explainable decisions, regulatory compliance
- Legal & Compliance — Evidence-backed reasoning
- Cybersecurity & Intelligence — Attribution and provenance
- Government & Defense — Governed, auditable AI systems
- AI / ML Engineers — Explainable GraphRAG & agents
- Data Engineers — Governed semantic pipelines
- Knowledge Engineers — Ontologies & KGs at scale
- Enterprise Teams — Trustworthy AI infrastructure
- Risk & Compliance Teams — Audit-ready systems
pip install ontology-engine
# or
pip install ontology-engine[all]

# Clone and install in editable mode
git clone https://github.com/Hawksight-AI/ontology_engine.git
cd ontology_engine
pip install -e .
# Or with all optional dependencies
pip install -e ".[all]"
# Development setup
pip install -e ".[dev]"

New to Frontal's Ontology Engine? Check out the Cookbook for hands-on examples!
- Cookbook - Interactive notebooks
- Introduction - Getting started tutorials
- Advanced - Advanced techniques
- Use Cases - Real-world applications
| Data Ingestion | Semantic Extract | Knowledge Graphs | Ontology |
|---|---|---|---|
| Multiple Formats | Entity & Relations | Graph Analytics | Auto Generation |
| Context | GraphRAG | LLM Providers | Pipeline |
|---|---|---|---|
| Agent Memory, Context Graph, Context Retriever | Hybrid RAG | 100+ LLMs | Parallel Workers |

| QA | Reasoning | | |
|---|---|---|---|
| Conflict Resolution | Rule-based Inference | | |
Multiple file formats • PDF, DOCX, HTML, JSON, CSV, databases, feeds, archives
from ontology_engine.ingest import FileIngestor, WebIngestor, DBIngestor
file_ingestor = FileIngestor(recursive=True)
web_ingestor = WebIngestor(max_depth=3)
db_ingestor = DBIngestor(connection_string="postgresql://...")
sources = []
sources.extend(file_ingestor.ingest("documents/"))
sources.extend(web_ingestor.ingest("https://example.com"))
sources.extend(db_ingestor.ingest(query="SELECT * FROM articles"))
print(f"Ingested {len(sources)} sources")

Multi-format parsing • Text normalization • Intelligent chunking
from ontology_engine.parse import DocumentParser, DoclingParser
from ontology_engine.normalize import TextNormalizer
from ontology_engine.split import TextSplitter
# Standard parsing
parser = DocumentParser()
parsed = parser.parse("document.pdf", format="auto")
# Enhanced parsing with Docling (recommended for complex layouts/tables)
# Requires: pip install docling
docling_parser = DoclingParser(enable_ocr=True)
result = docling_parser.parse("complex_table.pdf")
print(f"Text (Markdown): {result['full_text'][:100]}...")
print(f"Extracted {len(result['tables'])} tables")
for i, table in enumerate(result['tables']):
print(f"Table {i+1} headers: {table.get('headers', [])}")
# Normalize text
normalizer = TextNormalizer()
normalized = normalizer.normalize(parsed, clean_html=True, normalize_entities=True)
# Split into chunks
splitter = TextSplitter(method="token", chunk_size=1000, chunk_overlap=200)
chunks = splitter.split(normalized)

Cookbook: Document Parsing • Data Normalization • Chunking & Splitting
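The chunk_size/chunk_overlap behaviour above can be sketched as a sliding window (here over whitespace tokens for simplicity; the real splitter uses proper tokenization):

```python
def chunk_tokens(tokens, chunk_size=1000, overlap=200):
    """Slide a window of chunk_size tokens, stepping chunk_size - overlap each time."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    stride = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last window already covers the tail
    return chunks

tokens = "the quick brown fox jumps over the lazy dog again and again".split()
print([len(c) for c in chunk_tokens(tokens, chunk_size=5, overlap=2)])  # [5, 5, 5, 3]
```

The overlap means each boundary sentence appears in two chunks, so context is not lost at the cut points.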
Entity & Relation Extraction • NER, Relationships, Events, Triplets with LLM Enhancement
from ontology_engine.semantic_extract import NERExtractor, RelationExtractor
text = "Apple Inc., founded by Steve Jobs in 1976, acquired Beats Electronics for $3 billion."
# Extract entities
ner_extractor = NERExtractor(method="ml", model="en_core_web_sm")
entities = ner_extractor.extract(text)
# Extract relationships
relation_extractor = RelationExtractor(method="dependency", model="en_core_web_sm")
relationships = relation_extractor.extract(text, entities=entities)
print(f"Entities: {len(entities)}, Relationships: {len(relationships)}")

Cookbook: Entity Extraction • Relation Extraction • Advanced Extraction
Production-Ready KGs • Entity Resolution • Temporal Support • Graph Analytics
from ontology_engine.semantic_extract import NERExtractor, RelationExtractor
from ontology_engine.kg import GraphBuilder
# Extract entities and relationships
ner_extractor = NERExtractor(method="ml", model="en_core_web_sm")
relation_extractor = RelationExtractor(method="dependency", model="en_core_web_sm")
entities = ner_extractor.extract(text)
relationships = relation_extractor.extract(text, entities=entities)
# Build knowledge graph
builder = GraphBuilder()
kg = builder.build({"entities": entities, "relationships": relationships})
print(f"Nodes: {len(kg.get('entities', []))}, Edges: {len(kg.get('relationships', []))}")

Cookbook: Building Knowledge Graphs • Graph Analytics
FastEmbed by default • Multiple backends • Semantic search
from ontology_engine.embeddings import EmbeddingGenerator
from ontology_engine.vector_store import VectorStore
# Generate embeddings
embedding_gen = EmbeddingGenerator(model_name="sentence-transformers/all-MiniLM-L6-v2", dimension=384)
embeddings = embedding_gen.generate_embeddings(chunks, data_type="text")
# Store in vector database
vector_store = VectorStore(backend="faiss", dimension=384)
vector_store.store_vectors(vectors=embeddings, metadata=[{"text": chunk} for chunk in chunks])
# Search
results = vector_store.search(query="supply chain", top_k=5)

Cookbook: Embedding Generation • Vector Store
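Under the hood, a search like this ranks stored embeddings by cosine similarity to the query vector; a dependency-free sketch of the idea:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, vectors, k=2):
    """Return (index, score) pairs of the k stored vectors most similar to the query."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

store = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
print(top_k([1.0, 0.1], store, k=2))  # index 0 ranks first (nearly parallel vectors)
```

Backends like FAISS replace this linear scan with approximate nearest-neighbor indexes, but the ranking criterion is the same.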
Neo4j, FalkorDB, Amazon Neptune support • SPARQL queries • RDF triplets
from ontology_engine.graph_store import GraphStore
from ontology_engine.triplet_store import TripletStore
# Graph Store (Neo4j, FalkorDB)
graph_store = GraphStore(backend="neo4j", uri="bolt://localhost:7687", user="neo4j", password="password")
graph_store.add_nodes([{"id": "n1", "labels": ["Person"], "properties": {"name": "Alice"}}])
# Amazon Neptune Graph Store (OpenCypher via HTTP with IAM Auth)
neptune_store = GraphStore(
backend="neptune",
endpoint="your-cluster.us-east-1.neptune.amazonaws.com",
port=8182,
region="us-east-1",
iam_auth=True, # Uses AWS credential chain (boto3, env vars, or IAM role)
)
# Node Operations
neptune_store.add_nodes([
{"labels": ["Person"], "properties": {"id": "alice", "name": "Alice", "age": 30}},
{"labels": ["Person"], "properties": {"id": "bob", "name": "Bob", "age": 25}},
])
# Query Operations
result = neptune_store.execute_query("MATCH (p:Person) RETURN p.name, p.age")
# Triplet Store (Blazegraph, Jena, RDF4J)
triplet_store = TripletStore(backend="blazegraph", endpoint="http://localhost:9999/blazegraph")
triplet_store.add_triplet({"subject": "Alice", "predicate": "knows", "object": "Bob"})
results = triplet_store.execute_query("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")

Cookbook: Graph Store • Triplet Store
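Conceptually, the SPARQL query above matches a triple pattern against the stored triples; a minimal in-memory sketch where `None` plays the role of a SPARQL variable:

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None matches anything (like ?s ?p ?o)."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

triples = [
    ("Alice", "knows", "Bob"),
    ("Bob", "knows", "Carol"),
    ("Alice", "worksAt", "Acme"),
]
print(match(triples, s="Alice"))            # everything known about Alice
print(match(triples, p="knows", o="Carol")) # who knows Carol
```

Real triplet stores index subjects, predicates, and objects so these lookups avoid a full scan, but the pattern-matching semantics are the same.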
6-Stage LLM Pipeline • Automatic OWL Generation • HermiT/Pellet Validation
from ontology_engine.ontology import OntologyGenerator
generator = OntologyGenerator(llm_provider="openai", model="gpt-4")
ontology = generator.generate_from_documents(sources=["domain_docs/"])
print(f"Classes: {len(ontology.classes)}")

Persistent Memory • Context Graph • Context Retriever • Hybrid Retrieval (Vector + Graph) • Production Graph Store (Neo4j) • Entity Linking • Multi-Hop Reasoning
from ontology_engine.context import AgentContext, ContextGraph, ContextRetriever
from ontology_engine.vector_store import VectorStore
from ontology_engine.graph_store import GraphStore
from ontology_engine.llms import Groq
import os
# Initialize Context with Hybrid Retrieval (Graph + Vector)
context = AgentContext(
vector_store=VectorStore(backend="faiss"),
knowledge_graph=GraphStore(backend="neo4j"), # Optional: Use persistent graph
hybrid_alpha=0.75 # 75% weight to Knowledge Graph, 25% to Vector
)
# Build Context Graph from entities and relationships
graph_stats = context.build_graph(
entities=kg.get('entities', []),
relationships=kg.get('relationships', []),
link_entities=True
)
# Store memory with automatic entity linking
context.store(
"User is building a RAG system with Frontal's Ontology Engine",
metadata={"priority": "high", "topic": "rag"}
)
# Use Context Retriever for hybrid retrieval
retriever = context.retriever # Access underlying ContextRetriever
results = retriever.retrieve(
query="What is the user building?",
max_results=10,
use_graph_expansion=True
)
# Retrieve with context expansion
results = context.retrieve("What is the user building?", use_graph_expansion=True)
# Query with reasoning and LLM-generated responses
llm_provider = Groq(model="llama-3.1-8b-instant", api_key=os.getenv("GROQ_API_KEY"))
reasoned_result = context.query_with_reasoning(
query="What is the user building?",
llm_provider=llm_provider,
max_hops=2
)

Core Components:
- ContextGraph: Builds and manages context graphs from entities and relationships for enhanced retrieval
- ContextRetriever: Performs hybrid retrieval combining vector search, graph traversal, and memory for optimal context relevance
- AgentContext: High-level interface integrating Context Graph and Context Retriever for GraphRAG applications
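The hybrid_alpha parameter used above blends the two retrieval signals. Assuming a simple linear interpolation (our reading of the parameter, not a guaranteed internal formula):

```python
def hybrid_score(graph_score: float, vector_score: float, alpha: float = 0.75) -> float:
    """Linear blend: alpha weights the graph signal, (1 - alpha) the vector signal."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * graph_score + (1 - alpha) * vector_score

# A result strongly connected in the graph but weak in vector space still ranks well
print(f"{hybrid_score(0.9, 0.2):.3f}")  # 0.725
```

At alpha=0.75, graph structure dominates; lowering alpha toward 0 recovers plain vector search.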
Core Notebooks:
- Context Module Introduction - Basic memory and storage.
- Advanced Context Engineering - Hybrid retrieval, graph builders, and custom memory policies.
- Fraud Detection - Demonstrates Context Graph and Context Retriever for fraud detection with GraphRAG.
Related Components: Vector Store • Embedding Generation • Advanced Vector Store
30% Accuracy Improvement • Vector + Graph Hybrid Search • 91% Accuracy • Multi-Hop Reasoning • LLM-Generated Responses
from ontology_engine.context import AgentContext
from ontology_engine.llms import Groq, OpenAI, LiteLLM
from ontology_engine.vector_store import VectorStore
import os
# Initialize GraphRAG with hybrid retrieval
context = AgentContext(
vector_store=VectorStore(backend="faiss"),
knowledge_graph=kg
)
# Configure LLM provider (supports Groq, OpenAI, HuggingFace, LiteLLM)
llm_provider = Groq(
model="llama-3.1-8b-instant",
api_key=os.getenv("GROQ_API_KEY")
)
# Query with multi-hop reasoning and LLM-generated responses
result = context.query_with_reasoning(
query="What IPs are associated with security alerts?",
llm_provider=llm_provider,
max_results=10,
max_hops=2
)
print(f"Response: {result['response']}")
print(f"Reasoning Path: {result['reasoning_path']}")
print(f"Confidence: {result['confidence']:.3f}")

Key Features:
- Multi-Hop Reasoning: Traverses knowledge graph up to N hops to find related entities
- LLM-Generated Responses: Natural language answers grounded in graph context
- Reasoning Trace: Shows entity relationship paths used in reasoning
- Multiple LLM Providers: Supports Groq, OpenAI, HuggingFace, and LiteLLM (100+ LLMs)
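Multi-hop expansion of this kind can be sketched as a bounded breadth-first traversal over an adjacency map (a simplification of the engine's graph traversal):

```python
from collections import deque

def expand(graph, start, max_hops=2):
    """Collect every entity reachable from start within max_hops edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted along this path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

# Toy security graph: alert -> IP -> host -> user
g = {
    "alert-1": ["ip-10.0.0.5"],
    "ip-10.0.0.5": ["host-db01"],
    "host-db01": ["user-admin"],
}
print(expand(g, "alert-1", max_hops=2))  # reaches the host, but not the user (3 hops)
```

The visited nodes become the grounding context handed to the LLM, and the traversal order doubles as the reasoning path shown to the user.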
Cookbook: GraphRAG • Real-Time Anomaly Detection
Unified LLM Interface • 100+ LLM Support via LiteLLM • Clean Imports • Multiple Providers
from ontology_engine.llms import Groq, OpenAI, HuggingFaceLLM, LiteLLM
import os
# Groq - Fast inference
groq = Groq(
model="llama-3.1-8b-instant",
api_key=os.getenv("GROQ_API_KEY")
)
response = groq.generate("What is AI?")
# OpenAI
openai = OpenAI(
model="gpt-4",
api_key=os.getenv("OPENAI_API_KEY")
)
response = openai.generate("What is AI?")
# HuggingFace - Local models
hf = HuggingFaceLLM(model_name="gpt2")
response = hf.generate("What is AI?")
# LiteLLM - Unified interface to 100+ LLMs
litellm = LiteLLM(
model="openai/gpt-4o", # or "anthropic/claude-sonnet-4-20250514", "groq/llama-3.1-8b-instant", etc.
api_key=os.getenv("OPENAI_API_KEY")
)
response = litellm.generate("What is AI?")
# Structured output
structured = groq.generate_structured("Extract entities from: Apple Inc. was founded by Steve Jobs.")

Supported Providers:
- Groq: Fast inference with Llama models
- OpenAI: GPT-3.5, GPT-4, and other OpenAI models
- HuggingFace: Local LLM inference with Transformers
- LiteLLM: Unified interface to 100+ LLM providers (OpenAI, Anthropic, Azure, Bedrock, Vertex AI, and more)
Rule-based Inference • Forward/Backward Chaining • Rete Algorithm • Explanation Generation
from ontology_engine.reasoning import Reasoner
# Initialize Reasoner
reasoner = Reasoner()
# Define rules and facts
rules = ["IF Parent(?a, ?b) AND Parent(?b, ?c) THEN Grandparent(?a, ?c)"]
facts = ["Parent(Alice, Bob)", "Parent(Bob, Charlie)"]
# Infer new facts (Forward Chaining)
inferred = reasoner.infer_facts(facts, rules)
print(f"Inferred: {inferred}") # ['Grandparent(Alice, Charlie)']
# Explain reasoning
from ontology_engine.reasoning import ExplanationGenerator
explainer = ExplanationGenerator()
# ... generate explanation for inferred facts

Cookbook: Reasoning • Rete Engine
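The Grandparent example above can be reproduced with a tiny hand-rolled forward chainer. This hard-codes the single rule for clarity; the library's Rete engine generalizes to arbitrary rule sets:

```python
def forward_chain(facts):
    """Apply Parent(a,b) AND Parent(b,c) -> Grandparent(a,c) until fixpoint."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        parents = [f for f in known if f[0] == "Parent"]
        for _, a, b in parents:
            for _, b2, c in parents:
                if b == b2:  # the two Parent facts chain through b
                    new_fact = ("Grandparent", a, c)
                    if new_fact not in known:
                        known.add(new_fact)
                        changed = True
    return known

facts = {("Parent", "Alice", "Bob"), ("Parent", "Bob", "Charlie")}
print(forward_chain(facts) - facts)  # {('Grandparent', 'Alice', 'Charlie')}
```

Because every derived fact names the two premises that produced it (here, via the loop that fired), an explanation generator can replay the chain as a human-readable justification.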
Orchestrator-Worker Pattern • Parallel Execution • Scalable Processing
from ontology_engine.pipeline import PipelineBuilder, ExecutionEngine
pipeline = PipelineBuilder() \
.add_step("ingest", "custom", func=ingest_data) \
.add_step("extract", "custom", func=extract_entities) \
.add_step("build", "custom", func=build_graph) \
.build()
result = ExecutionEngine().execute_pipeline(pipeline, parallel=True)

Enterprise-Grade QA • Conflict Detection • Deduplication
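The orchestrator–worker pattern can be sketched with the standard library: an orchestrator fans work units out to a pool of workers and aggregates the results (a simplification of the engine's pipeline model, with a stand-in work function):

```python
from concurrent.futures import ThreadPoolExecutor

def worker(doc: str) -> int:
    """Stand-in work unit: pretend extraction returns an entity count."""
    return len(doc.split())

def orchestrate(docs, max_workers=4):
    """Fan documents out to workers in parallel, then aggregate their outputs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        counts = list(pool.map(worker, docs))
    return sum(counts)

docs = ["Apple acquired Beats", "Alice knows Bob", "Neptune stores graphs"]
print(orchestrate(docs))  # 9
```

For CPU-bound extraction steps a process pool is the usual substitution; the orchestration logic stays the same.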
from ontology_engine.deduplication import DuplicateDetector
from ontology_engine.conflicts import ConflictDetector
entities = kg.get("entities", [])
conflicts = ConflictDetector().detect_conflicts(entities)
duplicates = DuplicateDetector(similarity_threshold=0.85).detect_duplicates(entities)
print(f"Conflicts: {len(conflicts)} | Duplicates: {len(duplicates)}")

Cookbook: Conflict Detection & Resolution • Deduplication
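In its simplest reading, a conflict is two records for the same entity that disagree on an attribute value; an illustrative detector (not the library's implementation):

```python
from collections import defaultdict

def detect_conflicts(entities, attribute):
    """Group records by id and flag ids whose values for `attribute` disagree."""
    values = defaultdict(set)
    for e in entities:
        if attribute in e:
            values[e["id"]].add(e[attribute])
    return {eid: vals for eid, vals in values.items() if len(vals) > 1}

records = [
    {"id": "sup-1", "country": "Germany"},
    {"id": "sup-1", "country": "Austria"},  # disagrees with the first record
    {"id": "sup-2", "country": "France"},
]
print(detect_conflicts(records, "country"))  # sup-1 is flagged
```

Resolution strategies then pick a winner by source confidence, recency, or provenance, which is why the engine records all three.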
Interactive graphs • Multi-format export • Graph analytics
from ontology_engine.visualization import KGVisualizer
from ontology_engine.export import GraphExporter
# Visualize knowledge graph
viz = KGVisualizer(layout="force")
fig = viz.visualize_network(kg, output="interactive")
fig.show()
# Export to multiple formats
exporter = GraphExporter()
exporter.export(kg, format="json", output_path="graph.json")
exporter.export(kg, format="graphml", output_path="graph.graphml")

Cookbook: Visualization • Export
Foundation data • Entity resolution • Domain knowledge
from ontology_engine.seed import SeedDataManager
seed_manager = SeedDataManager()
seed_manager.seed_data.entities = [
{"id": "s1", "text": "Supplier A", "type": "Supplier", "source": "foundation", "verified": True}
]
# Use seed data for entity resolution
resolved = seed_manager.resolve_entities(extracted_entities)

For comprehensive examples, see the Cookbook with interactive notebooks!
from ontology_engine.semantic_extract import NERExtractor, RelationExtractor
from ontology_engine.kg import GraphBuilder
from ontology_engine.context import AgentContext, ContextGraph
from ontology_engine.vector_store import VectorStore
# Extract entities and relationships
ner_extractor = NERExtractor(method="ml", model="en_core_web_sm")
relation_extractor = RelationExtractor(method="dependency", model="en_core_web_sm")
text = "Apple Inc. was founded by Steve Jobs in 1976."
entities = ner_extractor.extract(text)
relationships = relation_extractor.extract(text, entities=entities)
# Build knowledge graph
builder = GraphBuilder()
kg = builder.build({"entities": entities, "relationships": relationships})
# Query using GraphRAG
vector_store = VectorStore(backend="faiss", dimension=384)
context_graph = ContextGraph()
context_graph.build_from_entities_and_relationships(
entities=kg.get('entities', []),
relationships=kg.get('relationships', [])
)
context = AgentContext(vector_store=vector_store, knowledge_graph=context_graph)
results = context.retrieve("Who founded Apple?", max_results=5)
print(f"Found {len(results)} results")

Cookbook: Your First Knowledge Graph
Enterprise Knowledge Engineering — Unify data sources into knowledge graphs, breaking down silos.
AI Agents & Autonomous Systems — Build agents with persistent memory and semantic understanding.
Multi-Format Document Processing — Process multiple formats through a unified pipeline.
Data Pipeline Processing — Build scalable pipelines with parallel execution.
Intelligence & Security — Analyze networks, threat intelligence, forensic analysis.
Finance & Trading — Fraud detection, market intelligence, risk assessment.
Biomedical — Drug discovery, medical literature analysis.
Interactive Jupyter Notebooks designed to take you from beginner to expert.
| Recipe | Description | Link |
|---|---|---|
| GraphRAG Complete | Build a production-ready Graph Retrieval Augmented Generation system. Features Graph Validation, Hybrid Retrieval, and Logical Inference. | Open Notebook |
| RAG vs. GraphRAG | Side-by-side comparison. Demonstrates the Reasoning Gap and how GraphRAG solves it with Inference Engines. | Open Notebook |
| First Knowledge Graph | Go from raw text to a queryable knowledge graph in 20 minutes. | Open Notebook |
| Real-Time Anomalies | Detect anomalies in streaming data using temporal knowledge graphs and pattern detection. | Open Notebook |
- [Welcome to Frontal's Ontology Engine](cookbook/introduction/01_Welcome_to_Frontal's Ontology Engine.ipynb) - Framework Overview
- Data Ingestion - Universal Ingestion
- Entity Extraction - NER & Relationships
- Building Knowledge Graphs - Graph Construction
Domain-Specific Cookbooks showcasing real-world applications with real data sources, advanced chunking strategies, temporal KGs, GraphRAG, and comprehensive integration of Frontal's Ontology Engine modules:
- Drug Discovery Pipeline - PubMed RSS, entity-aware chunking, GraphRAG, vector similarity search
- Genomic Variant Analysis - bioRxiv RSS, temporal KGs, deduplication, pathway analysis
- Financial Data Integration MCP - Alpha Vantage API, MCP servers, seed data, real-time ingestion
- Fraud Detection - Transaction streams, temporal KGs, pattern detection, conflict resolution, Context Graph, Context Retriever, GraphRAG with Groq LLM
- DeFi Protocol Intelligence - CoinDesk RSS, ontology-aware chunking, conflict detection, ontology generation
- Transaction Network Analysis - Blockchain APIs, deduplication, network analytics
- Real-Time Anomaly Detection - CVE RSS, Kafka streams, temporal KGs, sentence chunking
- Threat Intelligence Hybrid RAG - Security RSS, entity-aware chunking, enhanced GraphRAG, deduplication
- Criminal Network Analysis - OSINT RSS, deduplication, network centrality, graph analytics
- Intelligence Analysis Orchestrator Worker - Pipeline orchestrator, multi-source integration, conflict detection
- Energy Market Analysis - Energy RSS, EIA API, temporal KGs, TemporalPatternDetector, trend prediction
- Supply Chain Data Integration - Logistics RSS, deduplication, relationship mapping
Explore Use Case Examples — See real-world implementations in finance, biomedical, cybersecurity, and more. 14 comprehensive domain-specific cookbooks with real data sources, advanced chunking strategies, temporal KGs, GraphRAG, and full integration of Frontal's Ontology Engine modules.
Incremental Updates — Real-time stream processing with Kafka, RabbitMQ, Kinesis for live updates.
Multi-Language Support — Process multiple languages with automatic detection.
Custom Ontology Import — Import and extend Schema.org and custom ontologies.
Advanced Reasoning — Forward/backward chaining, Rete-based pattern matching, and automated explanation generation.
Graph Analytics — Centrality, community detection, path finding, temporal analysis.
Custom Pipelines — Build custom pipelines with parallel execution.
API Integration — Integrate external APIs for entity enrichment.
See Advanced Examples — Advanced extraction, graph analytics, reasoning, and more.
- Core framework (v1.0)
- GraphRAG engine
- 6-stage ontology pipeline
- Advanced reasoning v2 (Rete, Forward/Backward Chaining)
- Quality Assurance module and features
- Enhanced multi-language support
- Evals
- Real-time streaming improvements
- Multi-modal processing
| Channel | Purpose |
|---|---|
| Discord | Real-time help, showcases |
| GitHub Discussions | Q&A, feature requests |
Enterprise support, professional services, and commercial licensing will be available in the future. For now, we offer community support through Discord and GitHub Discussions.
Current Support:
- Community Support - Free support via Discord and GitHub Discussions
- Bug Reports - GitHub Issues
Future Enterprise Offerings:
- Professional support with SLA
- Enterprise licensing
- Custom development services
- Priority feature requests
- Dedicated support channels
Stay tuned for updates!
# Fork and clone
git clone https://github.com/your-username/ontology_engine.git
cd ontology_engine
# Create branch
git checkout -b feature/your-feature
# Install dev dependencies
pip install -e ".[dev,test]"
# Make changes and test
pytest tests/
black ontology_engine/
flake8 ontology_engine/
# Commit and push
git commit -m "Add feature"
git push origin feature/your-feature

- Code - New features, bug fixes
- Documentation - Improvements, tutorials
- Bug Reports - Create issue
- Feature Requests - Request feature
Frontal's Ontology Engine is licensed under the MIT License - see the LICENSE file for details.