LlamaIndex PropertyGraphStore backed by GrafeoDB, an embedded graph database with native vector search.
Build knowledge graphs from documents, query them with GQL, and run vector similarity search, all in a single .db file. No servers, no infrastructure.
```bash
uv add grafeo-llamaindex
```

```python
from llama_index.core import PropertyGraphIndex, SimpleDirectoryReader
from grafeo_llamaindex import GrafeoPropertyGraphStore

# Load documents and open (or create) a persistent graph store
documents = SimpleDirectoryReader("./data").load_data()
graph_store = GrafeoPropertyGraphStore(db_path="./knowledge_graph.db")

# Extract a knowledge graph from the documents and embed its nodes
index = PropertyGraphIndex.from_documents(
    documents,
    property_graph_store=graph_store,
    embed_kg_nodes=True,
)

retriever = index.as_retriever(include_text=True)
nodes = retriever.retrieve("What are the key relationships?")
```

- Full PropertyGraphStore: all 8 abstract methods implemented (`get`, `get_triplets`, `get_rel_map`, `upsert_nodes`, `upsert_relations`, `delete`, `structured_query`, `vector_query`)
- Structured + vector queries: `supports_structured_queries = True` and `supports_vector_queries = True` in a single store
- Embedded database: no Docker, no cloud, no external services. Just `uv add grafeo`
- Single-file persistence: the entire knowledge graph lives in one `.db` file
- Native HNSW vector search: embeddings stored alongside graph nodes, no separate vector DB needed
- Multi-language queries: GQL, Cypher, Gremlin, GraphQL, SPARQL and SQL/PGQ all supported
- Built-in graph algorithms: PageRank, Louvain, shortest paths, centrality and 30+ more via `graph_store.client.algorithms`
```python
from grafeo_llamaindex import GrafeoPropertyGraphStore

store = GrafeoPropertyGraphStore(
    db_path=None,                # str | None - path for persistent storage, None for in-memory
    embedding_dimensions=1536,   # int - vector dimensions for HNSW index
    embedding_metric="cosine",   # str - "cosine", "euclidean", "dot_product", or "manhattan"
    dedup_threshold=None,        # float | None - cosine similarity threshold for entity dedup
)
```

Properties:

- `store.client`: access the underlying `grafeo.GrafeoDB` instance for direct queries and algorithms
- `store.supports_structured_queries`: `True`
- `store.supports_vector_queries`: `True`
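The four `embedding_metric` names correspond to standard vector measures. For intuition, the sketch below computes each one in plain Python. The function names are illustrative only, and this is not a description of how Grafeo's HNSW index evaluates them internally.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the normalized vectors; 1.0 means same direction."""
    norm = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    return dot_product(a, b) / norm if norm else 0.0


def euclidean_distance(a, b):
    """Straight-line (L2) distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute differences (L1); smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))


a, b = [1.0, 2.0], [4.0, 6.0]
print(dot_product(a, b))         # 16.0
print(euclidean_distance(a, b))  # 5.0
print(manhattan_distance(a, b))  # 7.0
print(round(cosine_similarity(a, b), 4))
```

Cosine ignores vector magnitude, which is why it is a common default for text embeddings; the distance-style metrics are sensitive to magnitude as well as direction.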
Methods (PropertyGraphStore interface):
| Method | Description |
|---|---|
| `upsert_nodes(nodes)` | Insert or update `EntityNode` / `ChunkNode` objects |
| `upsert_relations(relations)` | Insert edges between existing nodes |
| `get(properties, ids)` | Retrieve nodes by ID or property filter |
| `get_triplets(entity_names, relation_names, ids)` | Get (source, relation, target) triplets |
| `get_rel_map(graph_nodes, depth, ignore_rels)` | BFS traversal from seed nodes |
| `delete(entity_names, relation_names, ids)` | Remove nodes and/or edges |
| `structured_query(query)` | Execute raw GQL/Cypher (or Gremlin with `g.` prefix) |
| `vector_query(query)` | HNSW similarity search over node embeddings |
| `get_schema()` / `get_schema_str()` | Inspect graph labels, edge types, and properties |
| `persist(path)` | Save in-memory database to disk |
| `close()` | Close the database connection |
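To illustrate what a depth-limited BFS traversal such as `get_rel_map` returns, here is a minimal pure-Python sketch over plain `(source, relation, target)` tuples. The `rel_map` function is hypothetical and assumes edges are followed in the outgoing direction only; the store's actual traversal runs inside GrafeoDB and may differ in direction handling and ordering.

```python
from collections import deque


def rel_map(triplets, seeds, depth=2, ignore_rels=()):
    """Collect triplets reachable from `seeds` within `depth` hops (BFS)."""
    # Build an adjacency list, dropping ignored relation types up front.
    adjacency = {}
    for source, relation, target in triplets:
        if relation in ignore_rels:
            continue
        adjacency.setdefault(source, []).append((relation, target))

    visited = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == depth:
            continue  # depth limit reached; do not expand further
        for relation, target in adjacency.get(node, []):
            found.append((node, relation, target))
            if target not in visited:
                visited.add(target)
                queue.append((target, hops + 1))
    return found


triplets = [
    ("Alice", "WORKS_AT", "Acme"),
    ("Acme", "LOCATED_IN", "Berlin"),
    ("Berlin", "PART_OF", "Germany"),
    ("Alice", "KNOWS", "Bob"),
]
result = rel_map(triplets, ["Alice"], depth=2, ignore_rels={"KNOWS"})
print(result)
# [('Alice', 'WORKS_AT', 'Acme'), ('Acme', 'LOCATED_IN', 'Berlin')]
```

With `depth=2`, the `Berlin -> Germany` edge is three hops from the seed and is left out, and the `KNOWS` edge is filtered by `ignore_rels`.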
The entire knowledge graph lives in a single `.db` file. Pass `db_path` to store data on disk, or omit it for in-memory use.

```python
from grafeo_llamaindex import GrafeoPropertyGraphStore

# Create and populate
store = GrafeoPropertyGraphStore(db_path="./my_graph.db")
# ... upsert nodes and relations ...
store.close()

# Reopen later with the same path
store = GrafeoPropertyGraphStore(db_path="./my_graph.db")
print(store.node_count, store.edge_count)  # data is still there
```

You can also save an in-memory store to disk:

```python
store = GrafeoPropertyGraphStore()  # in-memory
# ... populate ...
store.persist("./snapshot.db")
```

When `dedup_threshold` is set, `upsert_nodes` checks whether an incoming `EntityNode`'s embedding is similar enough to an existing node (same label) to merge them instead of creating a duplicate.
```python
store = GrafeoPropertyGraphStore(
    dedup_threshold=0.95,  # cosine similarity threshold
    embedding_dimensions=1536,
)
```

Key behavior:

- Threshold semantics: if `cosine_similarity(new, existing) >= dedup_threshold`, the new node merges into the existing one (properties are overwritten, the original `created_at` timestamp is preserved).
- Label-scoped: dedup only compares nodes with the same label. A "Person" and a "Company" with identical embeddings are never merged.
- ChunkNode excluded: `ChunkNode` objects are never deduplicated, only `EntityNode`.
- Requires embedding: nodes without an embedding are never deduplicated.
- Runtime toggle: you can set `store.dedup_threshold = 0.9` at any time and it takes effect on the next `upsert_nodes` call.
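The dedup rules above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: nodes are modeled as plain dicts, `upsert_with_dedup` is a hypothetical helper, and it merges into the first matching node (the real store may pick the best match via its HNSW index).

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def upsert_with_dedup(existing, new_node, threshold):
    """Merge new_node into a similar same-label node, else append it."""
    # No threshold or no embedding means dedup is skipped entirely.
    if threshold is not None and new_node.get("embedding") is not None:
        for node in existing:
            if node["label"] != new_node["label"] or node.get("embedding") is None:
                continue  # label-scoped; nodes without embeddings never match
            if cosine_similarity(new_node["embedding"], node["embedding"]) >= threshold:
                # Overwrite properties; created_at on the existing node is untouched.
                node["properties"].update(new_node["properties"])
                return node
    existing.append(new_node)
    return new_node


nodes = []
upsert_with_dedup(nodes, {"name": "Ada Lovelace", "label": "Person", "embedding": [1.0, 0.0],
                          "properties": {"role": "mathematician"}, "created_at": 1}, 0.95)
upsert_with_dedup(nodes, {"name": "A. Lovelace", "label": "Person", "embedding": [0.99, 0.05],
                          "properties": {"role": "programmer"}, "created_at": 2}, 0.95)
upsert_with_dedup(nodes, {"name": "Lovelace Ltd", "label": "Company", "embedding": [1.0, 0.0],
                          "properties": {}, "created_at": 3}, 0.95)

print(len(nodes))                      # 2
print(nodes[0]["properties"]["role"])  # programmer
print(nodes[0]["created_at"])          # 1
```

The second "Person" merges into the first because their cosine similarity is about 0.999, while the "Company" node is kept separate despite an identical embedding.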
`upsert_relations` skips relations whose `source_id` or `target_id` does not match any existing node (by name or LlamaIndex ID) instead of raising an error. A `UserWarning` is emitted for each skipped relation, so you can catch these with Python's `warnings` module if needed.
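A minimal sketch of catching such warnings with the standard `warnings` module follows. Since it needs to run without a database, `upsert_relations_stub` is a hypothetical stand-in that mimics the skip-and-warn behavior described above; with the real store you would wrap the `store.upsert_relations(...)` call in the same `catch_warnings` block.

```python
import warnings


def upsert_relations_stub(relations, known_ids):
    """Hypothetical stand-in: skip relations with unknown endpoints and warn."""
    kept = []
    for source_id, label, target_id in relations:
        if source_id not in known_ids or target_id not in known_ids:
            warnings.warn(
                f"Skipping relation {label}: {source_id} -> {target_id}", UserWarning
            )
            continue
        kept.append((source_id, label, target_id))
    return kept


relations = [("alice", "KNOWS", "bob"), ("alice", "KNOWS", "carol")]

# record=True collects warnings in a list instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure no warning is suppressed
    kept = upsert_relations_stub(relations, known_ids={"alice", "bob"})

skipped = [w for w in caught if issubclass(w.category, UserWarning)]
print(len(kept), len(skipped))  # 1 1
for w in skipped:
    print(w.message)
```

To fail fast instead of collecting, `warnings.simplefilter("error", UserWarning)` inside the same block turns each skipped relation into an exception.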
| | Neo4j | FalkorDB | Grafeo |
|---|---|---|---|
| Requires server | Yes | Yes | No (embedded) |
| Vector search | Plugin (5.x+) | Limited | Native HNSW |
| Graph algorithms | GDS plugin ($) | Built-in | Built-in (30+) |
| Query languages | Cypher | Cypher | GQL, Cypher, Gremlin, GraphQL, SPARQL, SQL/PGQ |
| Deployment | Docker/Cloud | Docker/Cloud | uv add grafeo |
| Persistence | Server-managed | Server-managed | Single .db file |
See the `examples/` directory:

- `mock_embedding_demo.py`: full demo with hand-crafted embeddings, no API key required
- `basic_graph_rag.py`: build a Property Graph Index from documents and query it (requires OpenAI API key)
- `hybrid_retrieval.py`: structured queries + vector search + PageRank, all in one script
```bash
uv sync              # install deps
uv run pytest -v     # run tests
uv run ruff check .  # lint
uv run ruff format . # format
uv run ty check      # type check
```

License: Apache-2.0