ReasonGraph

A graph-based reasoning library that discovers connections across independent documents through entity and causal extraction, embedding search, and multi-hop graph traversal.

PyPI version Python 3.11+ License: MIT

Why ReasonGraph?

Standard RAG retrieves documents similar to your query. ReasonGraph discovers connections between documents that were written independently.

When you feed text into add_texts(), GLiNER2 automatically extracts entities and cause-effect relations that become nodes and edges in a graph. Documents that share entities or causal chains get connected -- even if they never reference each other. Multi-hop traversal then walks these connections to build reasoning chains that span multiple sources.

Installation

pip install reasongraph[all]        # everything included

Or install only what you need:

pip install reasongraph             # core: in-memory backend, NER extraction, embeddings
pip install reasongraph[sqlite]     # + SQLite backend with sqlite-vec
pip install reasongraph[gliner2]    # + GLiNER2 entity + causal extraction (recommended)
pip install reasongraph[postgres]   # + PostgreSQL + pgvector backend

Cross-Source Discovery

Two reports about different topics. Source A covers TSMC's semiconductor plant. Source B covers Arizona's water crisis. Neither mentions the other's subject.

import asyncio
from reasongraph import ReasonGraph

source_a = [  # Tech industry report
    "TSMC announced plans to build a $40 billion semiconductor fabrication plant in Phoenix, Arizona.",
    "The Phoenix fab requires 10 million gallons of purified water daily to cool wafers during the chip etching process.",
    "TSMC signed a long-term supply agreement with Apple to manufacture next-generation M-series processors at the Arizona facility.",
    "Construction delays at the Phoenix site pushed first production to late 2025, raising concerns among TSMC's major customers.",
]

source_b = [  # Environmental report -- never mentions TSMC, semiconductors, or chips
    "Arizona declared a water emergency after Lake Mead dropped to its lowest level since the 1930s, threatening water supply for millions.",
    "The Arizona Department of Water Resources ordered mandatory water cuts for all industrial users in Maricopa County, where Phoenix is located.",
    "Intel paused expansion of its Chandler, Arizona chip plant citing water availability concerns and rising operational costs.",
    "Apple warned investors that component shortages from its Asian and North American suppliers could impact iPhone production timelines through 2026.",
]

async def main():
    async with ReasonGraph() as graph:
        await graph.add_texts(source_a)
        await graph.add_texts(source_b)
        results = await graph.query("How does the Arizona water crisis affect semiconductor manufacturing?")
        for i, text in enumerate(results, 1):
            source = "A" if text in source_a else "B"
            print(f"{i}. [Source {source}] {text}")

asyncio.run(main())

Output:

1. [Source B] Intel paused expansion of its Chandler, Arizona chip plant citing water availability concerns and rising operational costs.
2. [Source B] The Arizona Department of Water Resources ordered mandatory water cuts for all industrial users in Maricopa County, where Phoenix is located.
3. [Source A] The Phoenix fab requires 10 million gallons of purified water daily to cool wafers during the chip etching process.
4. [Source B] Arizona declared a water emergency after Lake Mead dropped to its lowest level since the 1930s.
5. [Source A] TSMC announced plans to build a $40 billion semiconductor fabrication plant in Phoenix, Arizona.
6. [Source A] TSMC signed a long-term supply agreement with Apple to manufacture M-series processors at the Arizona facility.

Results come from both sources. No single document contains this chain. Here is what happens under the hood:

GLiNER2 extracts entities and causal relations from each text:

| Text (abbreviated) | Entities | Causal relations |
|---|---|---|
| TSMC to build fab in Phoenix, Arizona... | TSMC, Phoenix, Arizona | -- |
| Phoenix fab requires 10M gallons water... | Phoenix | -- |
| TSMC supply agreement with Apple... | TSMC, Apple, Arizona | -- |
| Construction delays at Phoenix site... | TSMC, Phoenix | Construction delays -> first production |
| Arizona water emergency, Lake Mead... | Arizona, Lake Mead | Lake Mead dropped -> water emergency |
| Mandatory water cuts in Maricopa County... | Arizona Dept. of Water Resources, Phoenix, Maricopa County | -- |
| Intel paused Arizona chip plant... | Intel, Chandler, Arizona | -- |
| Apple warned of component shortages... | Apple | component shortages -> iPhone production timelines |

Three entities appear in both sources, creating bridge nodes:

| Bridge entity | Source A connections | Source B connections |
|---|---|---|
| Arizona | TSMC fab, TSMC-Apple deal | water emergency, Intel pause, water cuts |
| Phoenix | TSMC fab, water usage, delays | water cuts for industrial users |
| Apple | TSMC supply agreement | component shortage warning |
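The bridge-entity step above reduces to a set intersection: any entity mentioned in both corpora can connect them. A minimal sketch in plain Python — the entity sets are hand-written here as stand-ins for GLiNER2 output, and this is an illustration of the idea, not the library's implementation:

```python
# Entities per text, standing in for GLiNER2 extraction output.
source_a_entities = [
    {"TSMC", "Phoenix", "Arizona"},
    {"Phoenix"},
    {"TSMC", "Apple", "Arizona"},
    {"TSMC", "Phoenix"},
]
source_b_entities = [
    {"Arizona", "Lake Mead"},
    {"Arizona Department of Water Resources", "Phoenix", "Maricopa County"},
    {"Intel", "Chandler", "Arizona"},
    {"Apple"},
]

def bridge_entities(a, b):
    """Entities appearing in both sources become bridge nodes."""
    return set().union(*a) & set().union(*b)

print(sorted(bridge_entities(source_a_entities, source_b_entities)))
# ['Apple', 'Arizona', 'Phoenix']
```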

The query traversal path:

Water crisis query -> finds water-related texts from both sources via embeddings -> follows Arizona and Phoenix entity edges to discover TSMC's water-intensive fab -> follows Apple entity edge from TSMC supply agreement to Apple's component shortage warning. The causal relation Lake Mead dropped -> water emergency connects the environmental trigger to the industrial impact.
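The traversal itself is conceptually a bounded breadth-first walk over a bipartite text/entity graph. A toy sketch of that idea, using abbreviated node names from the tables above — the node labels and hop semantics here are illustrative, not ReasonGraph internals:

```python
from collections import deque

# Toy graph: each text node links to the entity nodes it mentions.
mentions = {
    "water emergency":  ["Arizona"],
    "water cuts":       ["Arizona", "Phoenix"],
    "TSMC fab":         ["TSMC", "Phoenix", "Arizona"],
    "TSMC-Apple deal":  ["TSMC", "Apple", "Arizona"],
    "Apple shortage":   ["Apple"],
}

# Build a bidirectional adjacency map over text and entity nodes.
graph = {}
for text, ents in mentions.items():
    for e in ents:
        graph.setdefault(text, set()).add(e)
        graph.setdefault(e, set()).add(text)

def multi_hop(start, hops):
    """Breadth-first walk, following at most `hops` edges from the seed."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen

# Four hops from the water emergency reach Apple's shortage warning
# via Arizona -> TSMC-Apple deal -> Apple:
print("Apple shortage" in multi_hop("water emergency", hops=4))  # True
```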

Full demo: uv run python examples/cross_source_discovery.py

Quick Start

Using a built-in dataset

from reasongraph import ReasonGraph

graph = ReasonGraph()
graph.initialize_sync()
graph.load_dataset_sync("financial")

results = graph.query_sync("What caused the 2008 financial crisis?")
for i, text in enumerate(results, 1):
    print(f"{i}. {text}")

graph.close_sync()

Output -- a connected reasoning chain, not just keyword matches:

1. Lehman Brothers filed for bankruptcy in September 2008 after massive MBS losses.
2. Loose lending standards fueled a housing price bubble across the United States.
3. Lehman's collapse triggered a global credit freeze as interbank lending stopped.
4. Mortgage-backed securities built on subprime loans collapsed when defaults surged.
5. The U.S. government enacted TARP, a $700 billion bailout to stabilize the financial system.
6. Banks issued subprime mortgages to borrowers with poor credit histories.

Async API

import asyncio
from reasongraph import ReasonGraph

async def main():
    async with ReasonGraph() as graph:
        await graph.load_dataset("financial")
        results = await graph.query("What caused the 2008 crisis?")
        for text in results:
            print(text)

asyncio.run(main())

Features

  • Cross-source discovery -- connect facts across independent documents through shared entities and causal relations
  • Automatic extraction -- GLiNER2 extracts entities and causal relations in one pass (falls back to BERT NER when gliner2 is not installed)
  • Hybrid search -- combine embedding similarity, keyword (trigram) matching, or both
  • Multi-hop traversal -- follow graph edges to discover connected reasoning chains
  • Cross-encoder reranking -- rerank results at each hop with ms-marco-MiniLM-L-6-v2
  • Built-in datasets -- load curated reasoning graphs for immediate use
  • Async-first -- native async API with sync convenience wrappers
  • Pluggable backends -- in-memory (zero-config default), SQLite, or PostgreSQL with pgvector

Built-in Datasets

| Dataset | Description |
|---|---|
| syllogisms | Classical syllogistic reasoning chains |
| causal | Cause-effect reasoning with entity annotations |
| taxonomy | Hierarchical concept taxonomy |
| financial | Financial crisis causal chains (2008 crisis, dot-com, inflation, eurozone) |
| medical | Medical causal chains (heart disease, diabetes, infectious disease, cancer) |
| analysis_patterns | Data analysis reasoning: scenario detection, technique selection, implementation patterns |

graph.load_dataset_sync("financial")

Search Modes

# Pure embedding similarity (default)
results = graph.query_sync("credit freeze", search_mode="embedding")

# Pure keyword/trigram matching
results = graph.query_sync("credit freeze", search_mode="keyword")

# Hybrid: Reciprocal Rank Fusion of embedding + trigram rankings
results = graph.query_sync("credit freeze", search_mode="hybrid")

# Tune the RRF smoothing constant (default 60, lower = more weight to top ranks)
results = graph.query_sync("credit freeze", search_mode="hybrid", rrf_k=30)
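Reciprocal Rank Fusion has a standard formula: each document's fused score is the sum of 1 / (k + rank) over every ranking it appears in. A self-contained sketch of that formula (the doc names here are made up; this is the general technique, not ReasonGraph's code):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1/(k + rank(d)).

    Lower k makes the 1/(k + rank) curve steeper, so top-ranked items
    dominate; higher k flattens the weighting across ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

embedding_rank = ["credit freeze", "bank runs", "TARP bailout"]
trigram_rank   = ["TARP bailout", "credit freeze", "housing bubble"]

print(rrf_fuse([embedding_rank, trigram_rank], k=60))
# ['credit freeze', 'TARP bailout', 'bank runs', 'housing bubble']
```

"credit freeze" wins because it ranks high in both lists, even though "TARP bailout" tops one of them — the characteristic behavior RRF is chosen for.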

Entity and Causal Extraction

When gliner2 is installed, add_text() / add_texts() automatically use GLiNER2 for both entity extraction and causal relation detection. Without gliner2, it falls back to BERT NER (entities only).

from reasongraph import ReasonGraph, NERExtractor, GLiNER2Extractor

graph = ReasonGraph()
graph.initialize_sync()

# Default: GLiNER2 (entities + causal relations) if installed, else BERT NER
entities = graph.add_text_sync("Apple released the iPhone in 2007.")
print(entities)  # ['Apple', 'iPhone']

# Explicit: force BERT NER even if GLiNER2 is installed
entities = graph.add_text_sync("Apple released the iPhone in 2007.", extractor=NERExtractor())

# Explicit: GLiNER2 with custom entity types
gliner = GLiNER2Extractor(entity_types=["company", "product", "date"])
entities = graph.add_text_sync("Apple released the iPhone in 2007.", extractor=gliner)

# Any callable works
entities = graph.add_text_sync("some text", extractor=lambda t: ["custom"])
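Since any callable taking a string and returning a list works, a custom extractor can be as crude as a regex. A hypothetical heuristic that treats runs of capitalized words as entities — a stand-in for a real NER model, not production-quality extraction:

```python
import re

def capitalized_phrases(text):
    """Toy extractor: runs of capitalized words become entities."""
    return re.findall(r"\b[A-Z][a-z0-9]+(?: [A-Z][a-z0-9]+)*", text)

print(capitalized_phrases("Apple released the iPhone in Cupertino, California."))
# ['Apple', 'Cupertino', 'California']
```

Passed as `extractor=capitalized_phrases`, it plugs into `add_text_sync` exactly like the lambda above (note it misses lowercase-initial names like "iPhone" — a real model handles those).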

Backends

By default, ReasonGraph() uses a pure Python in-memory backend (MemoryBackend). This works everywhere with zero dependencies beyond numpy. For persistence, pass a file path to save/load as JSON:

from reasongraph import ReasonGraph, MemoryBackend

# In-memory only (default)
graph = ReasonGraph()

# In-memory with JSON file persistence (loads on init, saves on close)
graph = ReasonGraph(backend=MemoryBackend(file_path="graph.json"))

SQLite Backend

For larger graphs or concurrent access, use the SQLite backend with sqlite-vec for vector search. Requires pip install reasongraph[sqlite].

from reasongraph import ReasonGraph
from reasongraph.backends import SqliteBackend

graph = ReasonGraph(backend=SqliteBackend(db_path="graph.db"))

PostgreSQL Backend

from reasongraph import ReasonGraph
from reasongraph.backends import PostgresBackend

graph = ReasonGraph(backend=PostgresBackend(database_url="postgresql://user:pass@localhost/db"))

Requires pip install reasongraph[postgres] and the pgvector + pg_trgm extensions enabled on your database.

Evaluation: Mixed-Domain Reasoning

We evaluate reasoning quality by loading all 6 built-in datasets into a single graph (~130 text nodes, ~104 entity nodes, ~280 edges) and testing whether the library can trace the correct causal chains, syllogistic proofs, taxonomic hierarchies, and data analysis patterns -- without being distracted by unrelated facts from other domains.

32 test cases simulate agent-style queries like "I need to understand what caused the 2008 financial crisis", "How does insulin resistance lead to kidney failure?", or "I have two numeric columns, check if related" and check whether the returned reasoning chain matches the expected ground truth.

Per-domain results (hybrid search, top_k=5, hops=4, rerank_top_k=4):

| Domain | Cases | Chain Completeness | Recall@5 | Precision@5 | Domain Accuracy |
|---|---|---|---|---|---|
| Causal | 5 | 100% | 100% | 92% | 100% |
| Financial | 6 | 100% | 82% | 60% | 100% |
| Medical | 5 | 100% | 92% | 76% | 92% |
| Syllogisms | 5 | 100% | 100% | 92% | 85% |
| Taxonomy | 3 | 100% | 83% | 53% | 92% |
| Analysis Patterns | 8 | 96% | 75% | 45% | 96% |
| Overall | 32 | 99% | 88% | 68% | 95% |

32/32 cases pass (>= 50% chain completeness). Split reranking gives chain continuations (text-to-text edges) priority over bridge discoveries (entity-to-text edges), keeping traversal focused.
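Split reranking can be pictured as ranking within two buckets and concatenating them. A toy sketch of that ordering policy — the `edge_type` and `score` fields are illustrative names, and the scores stand in for cross-encoder output; this is not the library's actual data model:

```python
def split_rerank(candidates, top_k):
    """Rank chain continuations (text-to-text edges) ahead of bridge
    discoveries (entity-to-text edges); within each bucket, order by
    reranker score descending."""
    chains  = [c for c in candidates if c["edge_type"] == "text"]
    bridges = [c for c in candidates if c["edge_type"] == "entity"]
    ordered = (sorted(chains, key=lambda c: -c["score"])
               + sorted(bridges, key=lambda c: -c["score"]))
    return ordered[:top_k]

candidates = [
    {"text": "bridge hit",  "edge_type": "entity", "score": 0.9},
    {"text": "chain hit A", "edge_type": "text",   "score": 0.6},
    {"text": "chain hit B", "edge_type": "text",   "score": 0.8},
]
print([c["text"] for c in split_rerank(candidates, top_k=3)])
# ['chain hit B', 'chain hit A', 'bridge hit']
```

The bridge hit scores highest but still ranks last: continuations beat discoveries regardless of score, which is what keeps traversal from wandering off-domain.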

Search mode comparison:

| Mode | Chain Completeness | Recall@5 | Precision@5 | Domain Accuracy |
|---|---|---|---|---|
| Embedding | 99% | 88% | 68% | 95% |
| Keyword | 0% | 0% | 0% | 0% |
| Hybrid | 99% | 88% | 68% | 95% |

Keyword-only mode scores 0% because the eval queries are natural language questions that don't substring-match the dataset's declarative statements. This is expected -- keyword search is designed for known-term lookups, not question answering.
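To see why, compare character-trigram overlap between a question and a declarative statement. A sketch using Jaccard similarity over character trigrams — an assumed scheme for illustration; the backends' trigram matching (e.g. pg_trgm) differs in detail but shares the failure mode:

```python
def trigrams(s):
    """All character 3-grams of a lowercased string."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def trigram_similarity(a, b):
    """Jaccard similarity over character trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

question  = "What caused the 2008 financial crisis?"
statement = "Loose lending standards fueled a housing price bubble."
near_copy = "The 2008 financial crisis was caused by loose lending."

# The relevant statement shares almost no trigrams with the question,
# while a near-copy of the question's wording shares many:
print(trigram_similarity(question, near_copy)
      > trigram_similarity(question, statement))  # True
```

Embedding search has no such problem because it matches meaning, not surface strings — hence the identical embedding and hybrid scores above.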

Reproduce: uv run python tests/eval_financial_reasoning.py

API Reference

ReasonGraph(backend=None, embed_model=None, rerank_model=None, forget_after=30)

| Method | Description |
|---|---|
| add_nodes(nodes) | Add (content, type) tuples to the graph |
| add_edges(edges) | Add (from, to) content edges |
| add_text(text, extractor=None) | Add text with automatic entity extraction |
| add_texts(texts, extractor=None, causal_extractor=None) | Batch add with entity + causal extraction (auto-enabled with GLiNER2) |
| query(query, top_k=5, hops=4, rerank_top_k=4, search_mode="embedding", rrf_k=60) | Search and traverse the graph |
| load_dataset(name) | Load a built-in dataset |
| delete_stale() | Remove nodes not accessed within forget_after days |
| get_all_nodes() / get_all_edges() | Inspect graph contents |

All methods are async. Sync variants are available with a _sync suffix (e.g. query_sync).
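A common way such _sync variants are built is by driving the async method on a fresh event loop — a general sketch of the pattern (the toy `query` coroutine here is made up), not ReasonGraph's actual implementation:

```python
import asyncio
import functools

def sync_wrapper(async_method):
    """Derive a blocking variant of an async method by running it to
    completion on a fresh event loop via asyncio.run()."""
    @functools.wraps(async_method)
    def wrapper(*args, **kwargs):
        return asyncio.run(async_method(*args, **kwargs))
    return wrapper

async def query(q, top_k=5):
    # Hypothetical async method standing in for ReasonGraph.query.
    return [f"result for {q!r}"][:top_k]

query_sync = sync_wrapper(query)
print(query_sync("credit freeze"))  # ["result for 'credit freeze'"]
```

One consequence of this pattern: a sync variant cannot be called from inside a running event loop, so use the native async API in async code.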

License

MIT
