frontal-labs/semantic-engine

Frontal's Ontology Engine

Open-Source Semantic Layer & Knowledge Engineering Framework

Python 3.8+ • MIT License • Available on PyPI

Give us a Star • Fork us • Join our Discord

Transform Chaos into Intelligence. Build AI systems that are explainable, traceable, and trustworthy — not black boxes.


What is Frontal's Ontology Engine?

Frontal's Ontology Engine is an open-source semantic intelligence framework that transforms raw, unstructured data into validated, explainable, and auditable knowledge for modern AI systems.

It provides the semantic foundation for:

  • GraphRAG systems
  • AI Agents & Multi-Agent Systems
  • Reasoning and decision-support models
  • High-stakes enterprise AI platforms

Frontal's Ontology Engine is built for environments where every answer must be explainable, traceable, and governed.


The Core Problem: The Semantic & Trust Gap

Most AI systems fail in high-stakes domains because they operate on text similarity, not meaning.

What Organizations Have

  • PDFs, DOCX, emails, logs
  • APIs, databases, streams
  • Conflicting facts and duplicates
  • Siloed systems with no lineage

What High-Stakes AI Requires

  • Formal domain rules (ontologies)
  • Structured and validated entities
  • Explicit semantic relationships
  • Explainable reasoning paths
  • End-to-end traceability
  • Audit-ready provenance

Without semantics:

  • Decisions cannot be explained
  • Errors cannot be traced
  • Conflicts go undetected
  • Compliance becomes impossible

Trustworthy AI requires semantic accountability.


Frontal's Ontology Engine vs Traditional RAG

  • Black-box answers → Explainable reasoning
  • No provenance → Source-level traceability
  • Vector similarity only → Semantic + graph reasoning
  • No conflict handling → Explicit contradiction detection
  • Unsafe for high-stakes use → Designed for governed environments

Frontal's Ontology Engine Architecture

1. Input Layer — Governed Ingestion

  • PDFs, DOCX, HTML
  • JSON, CSV, databases
  • APIs, streams, archives
  • Multi-modal content

All data enters through a single ingestion pipeline with metadata and source tracking.


2. Semantic Layer — Trust & Reasoning Engine

This layer enforces governance by design:

  • Entity extraction & normalization
  • Relationship discovery & triplet generation
  • Automated ontology induction
  • Entity deduplication (Jaro-Winkler, disjoint properties)
  • Conflict detection and resolution
  • Provenance tracking (source, time, confidence)
  • Reasoning trace generation
  • Context engineering for grounded LLM outputs
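
The deduplication bullet above names Jaro-Winkler similarity. For readers unfamiliar with the metric, here is a minimal pure-Python sketch of it (an illustration only; the engine ships its own implementation):

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1] based on matching characters and transpositions."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(len(s1), len(s2)) // 2 - 1
    match1, match2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not match2[j] and s2[j] == ch:
                match1[i] = match2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order
    k = transpositions = 0
    for i in range(len(s1)):
        if match1[i]:
            while not match2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    t = transpositions // 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3

def jaro_winkler(s1, s2, p=0.1):
    """Boost the Jaro score for strings sharing a common prefix (up to 4 chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

The prefix boost is why Jaro-Winkler works well for near-duplicate entity names such as "Acme Corp" vs "Acme Corporation", which typically diverge only at the tail.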

3. Output Layer — Auditable Knowledge Assets

  • Knowledge Graphs (queryable, temporal, explainable)
  • OWL Ontologies (HermiT / Pellet validated)
  • Vector Embeddings (FastEmbed by default)

Every AI response can be traced back to:

  • Source documents
  • Extracted entities & relations
  • Ontology rules applied
  • Reasoning steps used
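
As an illustration of what such a trace can look like in practice, here is a hypothetical provenance record (the class name and fields are invented for this sketch; they are not the engine's API):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class ProvenanceRecord:
    """Hypothetical audit envelope for one answer; field names are illustrative."""
    answer: str
    source_documents: List[str]
    entities: List[str]
    ontology_rules: List[str]
    reasoning_steps: List[str]
    confidence: float = 1.0
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def audit_trail(self):
        """Render the record as a human-readable audit log."""
        lines = ["Answer: {} (confidence {:.2f})".format(self.answer, self.confidence)]
        lines += ["  source: {}".format(s) for s in self.source_documents]
        lines += ["  entity: {}".format(e) for e in self.entities]
        lines += ["  rule:   {}".format(r) for r in self.ontology_rules]
        lines += ["  step:   {}".format(s) for s in self.reasoning_steps]
        return "\n".join(lines)
```

The point is structural: every answer carries its sources, entities, rules, and steps, so an auditor can replay the chain without re-running the system.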

Core Capabilities (High-Stakes Ready)

  • Explainable GraphRAG — Graph-based reasoning with inspectable paths
  • Automated Ontology Generation — Domain rules encoded explicitly
  • Traceable Knowledge Graphs — Full lineage and versioning
  • Agent Memory with Guardrails — Rule-validated agent actions
  • Production-Grade QA — Deduplication, conflict detection, validation
  • LLM-Agnostic Design — Works across providers with structured outputs
  • Scalable Pipelines — Parallel, modular, production-friendly

Built for High-Stakes Domains

Frontal's Ontology Engine is designed for domains where mistakes have real consequences:

  • Healthcare & Life Sciences — Clinical reasoning, audit trails
  • Finance & Risk — Explainable decisions, regulatory compliance
  • Legal & Compliance — Evidence-backed reasoning
  • Cybersecurity & Intelligence — Attribution and provenance
  • Government & Defense — Governed, auditable AI systems

Who Uses Frontal's Ontology Engine?

  • AI / ML Engineers — Explainable GraphRAG & agents
  • Data Engineers — Governed semantic pipelines
  • Knowledge Engineers — Ontologies & KGs at scale
  • Enterprise Teams — Trustworthy AI infrastructure
  • Risk & Compliance Teams — Audit-ready systems

Installation

Install from PyPI (Recommended)

pip install ontology-engine
# or
pip install ontology-engine[all]

Install from Source (Development)

# Clone and install in editable mode
git clone https://github.com/Hawksight-AI/ontology_engine.git
cd ontology_engine
pip install -e .

# Or with all optional dependencies
pip install -e ".[all]"

# Development setup
pip install -e ".[dev]"

Resources

New to Frontal's Ontology Engine? Check out the Cookbook for hands-on examples!

Core Capabilities

  • Data Ingestion — Multiple Formats
  • Semantic Extract — Entity & Relations
  • Knowledge Graphs — Graph Analytics
  • Ontology — Auto Generation
  • Context — Agent Memory, Context Graph, Context Retriever
  • GraphRAG — Hybrid RAG
  • LLM Providers — 100+ LLMs
  • Pipeline — Parallel Workers
  • QA — Conflict Resolution
  • Reasoning — Rule-based Inference

Universal Data Ingestion

Multiple file formats • PDF, DOCX, HTML, JSON, CSV, databases, feeds, archives

from ontology_engine.ingest import FileIngestor, WebIngestor, DBIngestor

file_ingestor = FileIngestor(recursive=True)
web_ingestor = WebIngestor(max_depth=3)
db_ingestor = DBIngestor(connection_string="postgresql://...")

sources = []
sources.extend(file_ingestor.ingest("documents/"))
sources.extend(web_ingestor.ingest("https://example.com"))
sources.extend(db_ingestor.ingest(query="SELECT * FROM articles"))

print(f"Ingested {len(sources)} sources")

Cookbook: Data Ingestion

Document Parsing & Processing

Multi-format parsing • Text normalization • Intelligent chunking

from ontology_engine.parse import DocumentParser, DoclingParser
from ontology_engine.normalize import TextNormalizer
from ontology_engine.split import TextSplitter

# Standard parsing
parser = DocumentParser()
parsed = parser.parse("document.pdf", format="auto")

# Enhanced parsing with Docling (recommended for complex layouts/tables)
# Requires: pip install docling
docling_parser = DoclingParser(enable_ocr=True)
result = docling_parser.parse("complex_table.pdf")

print(f"Text (Markdown): {result['full_text'][:100]}...")
print(f"Extracted {len(result['tables'])} tables")
for i, table in enumerate(result['tables']):
    print(f"Table {i+1} headers: {table.get('headers', [])}")

# Normalize text
normalizer = TextNormalizer()
normalized = normalizer.normalize(parsed, clean_html=True, normalize_entities=True)

# Split into chunks
splitter = TextSplitter(method="token", chunk_size=1000, chunk_overlap=200)
chunks = splitter.split(normalized)

Cookbook: Document Parsing • Data Normalization • Chunking & Splitting
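
To make chunk_size and chunk_overlap concrete: a token splitter with overlap can be sketched as a sliding window (a conceptual illustration, not TextSplitter's actual internals):

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Slide a window of chunk_size over tokens, stepping by size minus overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the current window already covers the tail
    return chunks

# With chunk_size=4 and chunk_overlap=2, consecutive chunks share two tokens,
# so a sentence straddling a boundary stays intact in at least one chunk.
chunks = split_with_overlap([str(i) for i in range(10)], 4, 2)
```

Overlap trades a little storage for recall: boundary-straddling facts appear whole in at least one chunk.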

Semantic Intelligence Engine

Entity & Relation Extraction • NER, Relationships, Events, Triplets with LLM Enhancement

from ontology_engine.semantic_extract import NERExtractor, RelationExtractor

text = "Apple Inc., founded by Steve Jobs in 1976, acquired Beats Electronics for $3 billion."

# Extract entities
ner_extractor = NERExtractor(method="ml", model="en_core_web_sm")
entities = ner_extractor.extract(text)

# Extract relationships
relation_extractor = RelationExtractor(method="dependency", model="en_core_web_sm")
relationships = relation_extractor.extract(text, entities=entities)

print(f"Entities: {len(entities)}, Relationships: {len(relationships)}")

Cookbook: Entity Extraction • Relation Extraction • Advanced Extraction

Knowledge Graph Construction

Production-Ready KGs • Entity Resolution • Temporal Support • Graph Analytics

from ontology_engine.semantic_extract import NERExtractor, RelationExtractor
from ontology_engine.kg import GraphBuilder

# Extract entities and relationships
ner_extractor = NERExtractor(method="ml", model="en_core_web_sm")
relation_extractor = RelationExtractor(method="dependency", model="en_core_web_sm")

entities = ner_extractor.extract(text)
relationships = relation_extractor.extract(text, entities=entities)

# Build knowledge graph
builder = GraphBuilder()
kg = builder.build({"entities": entities, "relationships": relationships})

print(f"Nodes: {len(kg.get('entities', []))}, Edges: {len(kg.get('relationships', []))}")

Cookbook: Building Knowledge Graphs • Graph Analytics

Embeddings & Vector Store

FastEmbed by default • Multiple backends • Semantic search

from ontology_engine.embeddings import EmbeddingGenerator
from ontology_engine.vector_store import VectorStore

# Generate embeddings
embedding_gen = EmbeddingGenerator(model_name="sentence-transformers/all-MiniLM-L6-v2", dimension=384)
embeddings = embedding_gen.generate_embeddings(chunks, data_type="text")

# Store in vector database
vector_store = VectorStore(backend="faiss", dimension=384)
vector_store.store_vectors(vectors=embeddings, metadata=[{"text": chunk} for chunk in chunks])

# Search
results = vector_store.search(query="supply chain", top_k=5)

Cookbook: Embedding Generation • Vector Store

Graph Store & Triplet Store

Neo4j, FalkorDB, Amazon Neptune support • SPARQL queries • RDF triplets

from ontology_engine.graph_store import GraphStore
from ontology_engine.triplet_store import TripletStore

# Graph Store (Neo4j, FalkorDB)
graph_store = GraphStore(backend="neo4j", uri="bolt://localhost:7687", user="neo4j", password="password")
graph_store.add_nodes([{"id": "n1", "labels": ["Person"], "properties": {"name": "Alice"}}])

# Amazon Neptune Graph Store (OpenCypher via HTTP with IAM Auth)
neptune_store = GraphStore(
    backend="neptune",
    endpoint="your-cluster.us-east-1.neptune.amazonaws.com",
    port=8182,
    region="us-east-1",
    iam_auth=True,  # Uses AWS credential chain (boto3, env vars, or IAM role)
)

# Node Operations
neptune_store.add_nodes([
    {"labels": ["Person"], "properties": {"id": "alice", "name": "Alice", "age": 30}},
    {"labels": ["Person"], "properties": {"id": "bob", "name": "Bob", "age": 25}},
])

# Query Operations
result = neptune_store.execute_query("MATCH (p:Person) RETURN p.name, p.age")

# Triplet Store (Blazegraph, Jena, RDF4J)
triplet_store = TripletStore(backend="blazegraph", endpoint="http://localhost:9999/blazegraph")
triplet_store.add_triplet({"subject": "Alice", "predicate": "knows", "object": "Bob"})
results = triplet_store.execute_query("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")

Cookbook: Graph Store • Triplet Store

Ontology Generation & Management

6-Stage LLM Pipeline • Automatic OWL Generation • HermiT/Pellet Validation

from ontology_engine.ontology import OntologyGenerator

generator = OntologyGenerator(llm_provider="openai", model="gpt-4")
ontology = generator.generate_from_documents(sources=["domain_docs/"])

print(f"Classes: {len(ontology.classes)}")

Cookbook: Ontology

Context Engineering & Memory Systems

Persistent Memory • Context Graph • Context Retriever • Hybrid Retrieval (Vector + Graph) • Production Graph Store (Neo4j) • Entity Linking • Multi-Hop Reasoning

from ontology_engine.context import AgentContext, ContextGraph, ContextRetriever
from ontology_engine.vector_store import VectorStore
from ontology_engine.graph_store import GraphStore
from ontology_engine.llms import Groq
import os

# Initialize Context with Hybrid Retrieval (Graph + Vector)
context = AgentContext(
    vector_store=VectorStore(backend="faiss"),
    knowledge_graph=GraphStore(backend="neo4j"),  # Optional: use a persistent graph
    hybrid_alpha=0.75,  # 75% weight to knowledge graph, 25% to vector
)

# Build Context Graph from entities and relationships
# 'kg' is the knowledge graph built in the Knowledge Graph Construction example
graph_stats = context.build_graph(
    entities=kg.get('entities', []),
    relationships=kg.get('relationships', []),
    link_entities=True
)

# Store memory with automatic entity linking
context.store(
    "User is building a RAG system with Frontal's Ontology Engine",
    metadata={"priority": "high", "topic": "rag"}
)

# Use Context Retriever for hybrid retrieval
retriever = context.retriever  # Access the underlying ContextRetriever
results = retriever.retrieve(
    query="What is the user building?",
    max_results=10,
    use_graph_expansion=True
)

# Retrieve with context expansion
results = context.retrieve("What is the user building?", use_graph_expansion=True)

# Query with reasoning and LLM-generated responses
llm_provider = Groq(model="llama-3.1-8b-instant", api_key=os.getenv("GROQ_API_KEY"))
reasoned_result = context.query_with_reasoning(
    query="What is the user building?",
    llm_provider=llm_provider,
    max_hops=2
)

Core Components:

  • ContextGraph: Builds and manages context graphs from entities and relationships for enhanced retrieval
  • ContextRetriever: Performs hybrid retrieval combining vector search, graph traversal, and memory for optimal context relevance
  • AgentContext: High-level interface integrating Context Graph and Context Retriever for GraphRAG applications
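
One plausible reading of hybrid_alpha in the example above is a convex combination of graph and vector relevance, with alpha weighting the graph side. The sketch below makes that interpretation explicit (it is an assumption for illustration, not the library's scoring code):

```python
def hybrid_score(graph_score, vector_score, alpha=0.75):
    """Convex blend of the two signals; alpha weights the graph side.
    Scores are assumed to be normalized to [0, 1]."""
    return alpha * graph_score + (1 - alpha) * vector_score

def rank_hybrid(candidates, alpha=0.75, top_k=3):
    """candidates: iterable of (id, graph_score, vector_score) tuples."""
    scored = [(cid, hybrid_score(g, v, alpha)) for cid, g, v in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]
```

With alpha=0.75 a candidate that is well-connected in the graph can outrank one that is merely textually similar; alpha=0.0 degenerates to plain vector ranking.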

Core Notebooks:

Related Components: Vector Store • Embedding Generation • Advanced Vector Store

Knowledge Graph-Powered RAG (GraphRAG)

30% Accuracy Improvement • Vector + Graph Hybrid Search • 91% Accuracy • Multi-Hop Reasoning • LLM-Generated Responses

from ontology_engine.context import AgentContext
from ontology_engine.llms import Groq, OpenAI, LiteLLM
from ontology_engine.vector_store import VectorStore
import os

# Initialize GraphRAG with hybrid retrieval
context = AgentContext(
    vector_store=VectorStore(backend="faiss"),
    knowledge_graph=kg
)

# Configure LLM provider (supports Groq, OpenAI, HuggingFace, LiteLLM)
llm_provider = Groq(
    model="llama-3.1-8b-instant",
    api_key=os.getenv("GROQ_API_KEY")
)

# Query with multi-hop reasoning and LLM-generated responses
result = context.query_with_reasoning(
    query="What IPs are associated with security alerts?",
    llm_provider=llm_provider,
    max_results=10,
    max_hops=2
)

print(f"Response: {result['response']}")
print(f"Reasoning Path: {result['reasoning_path']}")
print(f"Confidence: {result['confidence']:.3f}")

Key Features:

  • Multi-Hop Reasoning: Traverses knowledge graph up to N hops to find related entities
  • LLM-Generated Responses: Natural language answers grounded in graph context
  • Reasoning Trace: Shows entity relationship paths used in reasoning
  • Multiple LLM Providers: Supports Groq, OpenAI, HuggingFace, and LiteLLM (100+ LLMs)
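
Multi-hop traversal of this kind boils down to a bounded breadth-first search. As background, here is a minimal sketch over a plain edge list (illustrative only, not the engine's traversal code):

```python
from collections import deque

def k_hop_neighbors(edges, start, max_hops):
    """Map each node reachable from start to its hop distance, up to max_hops."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)  # treat edges as undirected
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop budget
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist
```

With max_hops=2, a query anchored at an alert can reach the IP it mentions and the host behind that IP, but not entities further out, which keeps the retrieved context focused.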

Cookbook: GraphRAG • Real-Time Anomaly Detection

LLM Providers Module

Unified LLM Interface • 100+ LLM Support via LiteLLM • Clean Imports • Multiple Providers

from ontology_engine.llms import Groq, OpenAI, HuggingFaceLLM, LiteLLM
import os

# Groq - Fast inference
groq = Groq(
    model="llama-3.1-8b-instant",
    api_key=os.getenv("GROQ_API_KEY")
)
response = groq.generate("What is AI?")

# OpenAI
openai = OpenAI(
    model="gpt-4",
    api_key=os.getenv("OPENAI_API_KEY")
)
response = openai.generate("What is AI?")

# HuggingFace - Local models
hf = HuggingFaceLLM(model_name="gpt2")
response = hf.generate("What is AI?")

# LiteLLM - Unified interface to 100+ LLMs
litellm = LiteLLM(
    model="openai/gpt-4o",  # or "anthropic/claude-sonnet-4-20250514", "groq/llama-3.1-8b-instant", etc.
    api_key=os.getenv("OPENAI_API_KEY")
)
response = litellm.generate("What is AI?")

# Structured output
structured = groq.generate_structured("Extract entities from: Apple Inc. was founded by Steve Jobs.")

Supported Providers:

  • Groq: Fast inference with Llama models
  • OpenAI: GPT-3.5, GPT-4, and other OpenAI models
  • HuggingFace: Local LLM inference with Transformers
  • LiteLLM: Unified interface to 100+ LLM providers (OpenAI, Anthropic, Azure, Bedrock, Vertex AI, and more)

Reasoning & Inference Engine

Rule-based Inference • Forward/Backward Chaining • Rete Algorithm • Explanation Generation

from ontology_engine.reasoning import Reasoner

# Initialize Reasoner
reasoner = Reasoner()

# Define rules and facts
rules = ["IF Parent(?a, ?b) AND Parent(?b, ?c) THEN Grandparent(?a, ?c)"]
facts = ["Parent(Alice, Bob)", "Parent(Bob, Charlie)"]

# Infer new facts (Forward Chaining)
inferred = reasoner.infer_facts(facts, rules)
print(f"Inferred: {inferred}") # ['Grandparent(Alice, Charlie)']

# Explain reasoning
from ontology_engine.reasoning import ExplanationGenerator
explainer = ExplanationGenerator()
# ... generate explanation for inferred facts

Cookbook: Reasoning • Rete Engine
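
To show what forward chaining over the rule syntax above actually does, here is a tiny self-contained inference loop with ?x-style variables (an illustration of the technique, not the engine's Rete implementation):

```python
import itertools
import re

def parse(fact):
    """'Parent(Alice, Bob)' -> ('Parent', ('Alice', 'Bob'))"""
    name, args = re.match(r"(\w+)\((.*)\)", fact).groups()
    return name, tuple(a.strip() for a in args.split(","))

def forward_chain(facts, rules):
    """rules: list of (antecedent_patterns, consequent_pattern) pairs.
    Repeatedly applies every rule until no new fact is derived."""
    known = {parse(f) for f in facts}
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            ants = [parse(a) for a in antecedents]
            cons = parse(consequent)
            # Try every assignment of known facts to the antecedent slots
            for combo in itertools.product(tuple(known), repeat=len(ants)):
                bindings, ok = {}, True
                for (pname, pargs), (fname, fargs) in zip(ants, combo):
                    if pname != fname or len(pargs) != len(fargs):
                        ok = False
                        break
                    for p, f in zip(pargs, fargs):
                        if p.startswith("?"):
                            if bindings.setdefault(p, f) != f:
                                ok = False  # variable already bound differently
                                break
                        elif p != f:
                            ok = False
                            break
                    if not ok:
                        break
                if ok:
                    new = (cons[0], tuple(bindings.get(a, a) for a in cons[1]))
                    if new not in known:
                        known.add(new)
                        changed = True
    return {"{}({})".format(n, ", ".join(a)) for n, a in known}
```

Run on the README's example, the Grandparent rule fires once and the loop then reaches a fixed point; a Rete network gets the same result without re-matching every fact on each pass.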

Pipeline Orchestration & Parallel Processing

Orchestrator-Worker Pattern • Parallel Execution • Scalable Processing

from ontology_engine.pipeline import PipelineBuilder, ExecutionEngine

# ingest_data, extract_entities, build_graph are user-defined step functions
pipeline = PipelineBuilder() \
    .add_step("ingest", "custom", func=ingest_data) \
    .add_step("extract", "custom", func=extract_entities) \
    .add_step("build", "custom", func=build_graph) \
    .build()

result = ExecutionEngine().execute_pipeline(pipeline, parallel=True)
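
The orchestrator-worker pattern behind parallel=True can be approximated with the standard library. The sketch below fans one step out over independent inputs; word_count is a stand-in step invented for this example, not part of the engine:

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(step, items, max_workers=4):
    """Fan a single pipeline step out over independent inputs, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(step, items))

def word_count(doc):
    """Stand-in for a real extraction step."""
    return len(doc.split())

counts = run_parallel(word_count, ["one two", "three four five", "six"])
```

pool.map keeps result order aligned with input order, which matters when downstream steps need to re-associate outputs with their source documents.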

Production-Ready Quality Assurance

Enterprise-Grade QA • Conflict Detection • Deduplication

from ontology_engine.deduplication import DuplicateDetector
from ontology_engine.conflicts import ConflictDetector

entities = kg.get("entities", [])
conflicts = ConflictDetector().detect_conflicts(entities)
duplicates = DuplicateDetector(similarity_threshold=0.85).detect_duplicates(entities)

print(f"Conflicts: {len(conflicts)} | Duplicates: {len(duplicates)}")

Cookbook: Conflict Detection & Resolution • Deduplication
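
Conflict detection of the kind shown above reduces to a simple invariant: entities that share an identity must not disagree on an attribute value. A minimal sketch (the function and field names are invented for illustration, not the ConflictDetector API):

```python
from collections import defaultdict

def detect_attribute_conflicts(entities):
    """Flag entities that share a name but disagree on an attribute value."""
    by_name = defaultdict(list)
    for entity in entities:
        by_name[entity["name"]].append(entity)
    conflicts = []
    for name, group in by_name.items():
        values_seen = defaultdict(set)
        for entity in group:
            for attr, value in entity.get("attributes", {}).items():
                values_seen[attr].add(value)
        for attr, values in values_seen.items():
            if len(values) > 1:  # same attribute, more than one asserted value
                conflicts.append({"entity": name, "attribute": attr,
                                  "values": sorted(values)})
    return conflicts
```

Resolution strategies then pick a winner per conflict, for example by source confidence or recency, while keeping the losing assertions in the provenance trail.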

Visualization & Export

Interactive graphs • Multi-format export • Graph analytics

from ontology_engine.visualization import KGVisualizer
from ontology_engine.export import GraphExporter

# Visualize knowledge graph
viz = KGVisualizer(layout="force")
fig = viz.visualize_network(kg, output="interactive")
fig.show()

# Export to multiple formats
exporter = GraphExporter()
exporter.export(kg, format="json", output_path="graph.json")
exporter.export(kg, format="graphml", output_path="graph.graphml")

Cookbook: Visualization • Export

Seed Data Integration

Foundation data • Entity resolution • Domain knowledge

from ontology_engine.seed import SeedDataManager

seed_manager = SeedDataManager()
seed_manager.seed_data.entities = [
    {"id": "s1", "text": "Supplier A", "type": "Supplier", "source": "foundation", "verified": True}
]

# Use seed data for entity resolution
resolved = seed_manager.resolve_entities(extracted_entities)

Cookbook: Seed Data

Quick Start

For comprehensive examples, see the Cookbook with interactive notebooks!

from ontology_engine.semantic_extract import NERExtractor, RelationExtractor
from ontology_engine.kg import GraphBuilder
from ontology_engine.context import AgentContext, ContextGraph
from ontology_engine.vector_store import VectorStore

# Extract entities and relationships
ner_extractor = NERExtractor(method="ml", model="en_core_web_sm")
relation_extractor = RelationExtractor(method="dependency", model="en_core_web_sm")

text = "Apple Inc. was founded by Steve Jobs in 1976."
entities = ner_extractor.extract(text)
relationships = relation_extractor.extract(text, entities=entities)

# Build knowledge graph
builder = GraphBuilder()
kg = builder.build({"entities": entities, "relationships": relationships})

# Query using GraphRAG
vector_store = VectorStore(backend="faiss", dimension=384)
context_graph = ContextGraph()
context_graph.build_from_entities_and_relationships(
    entities=kg.get('entities', []),
    relationships=kg.get('relationships', [])
)
context = AgentContext(vector_store=vector_store, knowledge_graph=context_graph)

results = context.retrieve("Who founded Apple?", max_results=5)
print(f"Found {len(results)} results")

Cookbook: Your First Knowledge Graph

Use Cases

Enterprise Knowledge Engineering — Unify data sources into knowledge graphs, breaking down silos.

AI Agents & Autonomous Systems — Build agents with persistent memory and semantic understanding.

Multi-Format Document Processing — Process multiple formats through a unified pipeline.

Data Pipeline Processing — Build scalable pipelines with parallel execution.

Intelligence & Security — Analyze networks, threat intelligence, forensic analysis.

Finance & Trading — Fraud detection, market intelligence, risk assessment.

Biomedical — Drug discovery, medical literature analysis.

Frontal's Ontology Engine Cookbook

Interactive Jupyter Notebooks designed to take you from beginner to expert.

View Full Cookbook

Featured Recipes

  • GraphRAG Complete — Build a production-ready Graph Retrieval-Augmented Generation system. Features graph validation, hybrid retrieval, and logical inference.
  • RAG vs. GraphRAG — Side-by-side comparison that demonstrates the reasoning gap and how GraphRAG closes it with inference engines.
  • First Knowledge Graph — Go from raw text to a queryable knowledge graph in 20 minutes.
  • Real-Time Anomalies — Detect anomalies in streaming data using temporal knowledge graphs and pattern detection.

Core Tutorials

Industry Use Cases (14 Cookbooks)

Domain-specific cookbooks showcasing real-world applications with real data sources, advanced chunking strategies, temporal KGs, GraphRAG, and comprehensive integration of Frontal's Ontology Engine modules:

Biomedical

Finance

  • Financial Data Integration MCP - Alpha Vantage API, MCP servers, seed data, real-time ingestion
  • Fraud Detection - Transaction streams, temporal KGs, pattern detection, conflict resolution, Context Graph, Context Retriever, GraphRAG with Groq LLM

Blockchain

Cybersecurity

Intelligence & Law Enforcement

Renewable Energy

Supply Chain

Explore Use Case Examples — See real-world implementations in finance, biomedical, cybersecurity, and more.

Advanced Features

Incremental Updates — Real-time stream processing with Kafka, RabbitMQ, Kinesis for live updates.

Multi-Language Support — Process multiple languages with automatic detection.

Custom Ontology Import — Import and extend Schema.org and custom ontologies.

Advanced Reasoning — Forward/backward chaining, Rete-based pattern matching, and automated explanation generation.

Graph Analytics — Centrality, community detection, path finding, temporal analysis.
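
As a concrete reference point for the centrality analytics mentioned above, normalized degree centrality over an edge list looks like this (a pure-Python sketch; production use would rely on the engine's analytics or a graph library):

```python
def degree_centrality(edges):
    """Normalized degree centrality: node degree divided by (n - 1)."""
    nodes = {n for edge in edges for n in edge}
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(nodes)
    if n < 2:
        return {node: 0.0 for node in nodes}
    return {node: d / (n - 1) for node, d in degree.items()}
```

A score of 1.0 means the node touches every other node, which is a quick way to spot hub entities in a freshly built knowledge graph.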

Custom Pipelines — Build custom pipelines with parallel execution.

API Integration — Integrate external APIs for entity enrichment.

See Advanced Examples — Advanced extraction, graph analytics, reasoning, and more.

Roadmap

Q1 2026

  • Core framework (v1.0)
  • GraphRAG engine
  • 6-stage ontology pipeline
  • Advanced reasoning v2 (Rete, Forward/Backward Chaining)
  • Expanded Quality Assurance module
  • Enhanced multi-language support
  • Evals
  • Real-time streaming improvements

Q2 2026

  • Multi-modal processing

Community & Support

Join Our Community

Channel Purpose
Discord Real-time help, showcases
GitHub Discussions Q&A, feature requests

Learning Resources

Enterprise Support

Enterprise support, professional services, and commercial licensing will be available in the future. For now, we offer community support through Discord and GitHub Discussions.

Future Enterprise Offerings:

  • Professional support with SLA
  • Enterprise licensing
  • Custom development services
  • Priority feature requests
  • Dedicated support channels

Stay tuned for updates!

Contributing

How to Contribute

# Fork and clone
git clone https://github.com/your-username/ontology_engine.git
cd ontology-engine

# Create branch
git checkout -b feature/your-feature

# Install dev dependencies
pip install -e ".[dev,test]"

# Make changes and test
pytest tests/
black ontology_engine/
flake8 ontology_engine/

# Commit and push
git commit -m "Add feature"
git push origin feature/your-feature

Contribution Types

  1. Code - New features, bug fixes
  2. Documentation - Improvements, tutorials
  3. Bug Reports - Create issue
  4. Feature Requests - Request feature

Contributors

License

Frontal's Ontology Engine is licensed under the MIT License - see the LICENSE file for details.

Built by the Frontal's Ontology Engine Community

GitHubDiscord

About

Semantic Engine is a framework for defining, composing, and reasoning over domain semantics, enabling consistent semantics across services, workflows, and AI systems.
