LanceDB-backed memory provider plugin for Hermes Agent.
Embeds a workspace-scoped LanceDB table at ~/.hermes/lancedb/memories.lance and exposes four tools to the agent: lancedb_recall, lancedb_remember, lancedb_read, lancedb_forget. Recall defaults to pure vector ANN over OpenAI embeddings, with an optional hybrid mode (vector + BM25, fused via RRF / linear / cross-encoder) per call or via config. Durable facts are extracted from sessions at pre-compress and session end. The memory store runs entirely in Hermes's Python process — no external memory service, no server (embeddings call your configured embeddings API).
Just want to install it? Jump straight to Installation (users) — about five minutes, and you can try it in an isolated profile that won't touch your existing Hermes setup.
- Vector recall by default: ANN over OpenAI embeddings — lightest, no reranker. Switch to hybrid (vector + BM25) per call or via config.
- Hybrid fusion (configurable): default is RRF; `reranker.type: linear` does a weighted vector/FTS combination (`weight` biases toward vector); `reranker.type: cross-encoder` adds a reranking pass (default model `cross-encoder/ettin-reranker-17m-v1`, configurable). Only the cross-encoder needs `sentence-transformers`.
- Workspace isolation: every row carries an `agent_workspace` tag and recall pre-filters by it.
- Fact-first retrieval: recall surfaces extracted facts; raw conversation turns are stored as provenance and used only as fallback.
- Mid-session extraction: facts are pulled out via an auxiliary LLM on `on_pre_compress` and `on_session_end`, so insights survive context compression.
- Transparent forget: preview candidates, then delete by exact ID.
- Auto-compaction: periodic `table.optimize(cleanup_older_than=...)` runs in the background to bound fragment and version-file growth from single-row writes.
This repo's primary purpose is the LanceDB memory plugin. The benchmark is auxiliary — it exists only to show the plugin is fast, cheap, and accurate. Hermes loads a plugin from its directory root (the repo-root __init__.py + plugin.yaml); the implementation lives in the src/ subpackage, which the entry point re-exports. If you only want the plugin, everything you need is under src/ — you never have to touch the benchmark.
| Path | What it is |
|---|---|
| `__init__.py` | Thin entry point: Hermes loads this; it re-exports the provider from src/ and defines register(). |
| `plugin.yaml` | Hermes plugin manifest (name, hooks). |
| `src/` | The plugin: provider.py, store.py, retrieval.py, config.py, embeddings.py, extraction.py, tools.py, and default_config.yaml (the single source of defaults, copied into ~/.hermes/config.yaml). |
| `benchmarks/` | Benchmark only (LongMemEval harness). Never imported by the plugin. |
| `tests/` | Test suite. |
The plugin and the benchmark are cleanly separated: the benchmark borrows the plugin via its loader but the plugin never imports anything under benchmarks/.
- Python 3.11+
- uv
- Hermes Agent installed locally
- An LLM API key (OpenAI, OpenRouter, Anthropic, …)
Runtime dependencies installed into Hermes's venv: lancedb >= 0.33, openai, pyyaml. Embeddings go through an OpenAI-compatible client — by default OpenAI (text-embedding-3-small, so an OPENAI_API_KEY), but you can point it at any OpenAI-compatible endpoint via config (see Configuration reference). The default install needs no local ML stack. Only if you opt into the cross-encoder reranker (reranker.type: cross-encoder) do you also need sentence-transformers — which pulls in torch (~2 GB).
Use this section if you want LanceDB memory in your own Hermes setup. If you plan to edit the plugin's source, jump to Installation: developers.
Tip
Trying this without disturbing an existing Hermes setup? Run everything in an isolated profile. A profile gets its own config, sessions, and memory store, so nothing here touches your default Hermes. Create one first (it must exist before -p works):
hermes profile create lancedb-demo
Then add -p lancedb-demo to every hermes command below, e.g. hermes -p lancedb-demo plugins install …, hermes -p lancedb-demo memory setup. When you're done, rm -rf ~/.hermes/profiles/lancedb-demo removes all trace. If you're new to Hermes and have nothing to protect, skip this and use the default profile (the commands as written).
# macOS / Linux / WSL2
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
# Windows (PowerShell)
iex (irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1)
The installer handles uv, Python 3.11, Node.js, ripgrep, ffmpeg, and (on Windows) MinGit. It clones Hermes into ~/.hermes/hermes-agent/ and symlinks the binary to ~/.local/bin/hermes. After it finishes:
hermes doctor --fix # repairs symlinks, dirs, etc.
hermes setup # interactive: .env, API key, model picker
hermes doctor # final sanity check
Note
If you have AWS credentials in your shell environment, hermes doctor may log a Bedrock AccessDeniedException. This is Hermes's provider auto-detection and is ignorable if you're using OpenAI / Anthropic / OpenRouter.
hermes plugins install lancedb/hermes-agent-memory
This shallow-clones https://github.com/lancedb/hermes-agent-memory.git into ~/.hermes/plugins/lancedb/ and renders after-install.md in a Rich panel telling you what's next. To pull updates later, re-run the same command.
Hermes loads plugins inside its own Python interpreter. Install lancedb, openai, and pyyaml there, not into a separate venv.
# If Hermes is at a source checkout in /path/to/your/hermes-agent
uv pip install --python /path/to/your/hermes-agent/venv/bin/python3 lancedb openai pyyaml
# If you used the one-line installer
uv pip install --python ~/.hermes/hermes-agent/venv/bin/python3 lancedb openai pyyaml
Embeddings call the OpenAI API, so set OPENAI_API_KEY in your environment (or ~/.hermes/.env, or the profile's ~/.hermes/profiles/<name>/.env). Only if you enable the cross-encoder reranker (reranker.type: cross-encoder) do you also need sentence-transformers; install it the same way (uv pip install --python … sentence-transformers). Note that it pulls in torch (~2 GB) and can exceed the setup-time install budget of 120s; the default plugin needs neither.
Note
These packages install into Hermes's interpreter, which is shared across all profiles — so there's no -p here, and you only install them once even if you use an isolated profile.
hermes memory setup
# pick "lancedb"
This writes memory.provider: lancedb into ~/.hermes/config.yaml and writes the plugin defaults under plugins.lancedb. Embeddings use OpenAI text-embedding-3-small (1536-dim) via the API; there's no local model to download, but OPENAI_API_KEY must be set.
# ✓ LanceDB memory configured (embedding dim: 1536)
# Start a new session to activate.
The most common "memory isn't working" report is simply the provider not being active: Hermes silently falls back to its built-in notes if memory.provider isn't set, and you'd never call the lancedb_* tools. Confirm it's on before you start chatting:
hermes memory status # look for: Provider: lancedb, installed ✓, available ✓
hermes plugins list # should list "lancedb"
hermes chat -q "Hello" # agent.log should contain `lancedb provider initialized`
If memory status shows no provider (or the wrong one), re-run hermes memory setup and pick lancedb. (Add -p <name> to all three if you used an isolated profile.)
Use this section if you're working on the plugin's source.
git clone https://github.com/lancedb/hermes-agent-memory /path/to/your/hermes-agent-memory
cd /path/to/your/hermes-agent-memory
uv sync --extra dev
pyproject.toml sets [tool.uv] package = false, so uv sync only manages a venv for tests, lint, and ad-hoc imports. The plugin itself is loaded by Hermes from its directory, not pip-installed.
ln -sf /path/to/your/hermes-agent-memory ~/.hermes/plugins/lancedb
Edits to source files are picked up on the next Hermes session: no reinstall.
Warning
Once this symlink exists, don't also run hermes plugins install lancedb/... — the installer will refuse with Invalid plugin name 'lancedb': resolves outside the plugins directory because the path points outside ~/.hermes/plugins. The symlink is your install; just edit and restart Hermes. (Profiles are isolated, so you can still hermes -p <name> plugins install … into a separate profile.)
The dev venv only runs pytest / ruff. For end-to-end testing inside Hermes itself you still need the runtime deps installed against Hermes's Python:
uv pip install --python /path/to/your/hermes-agent/venv/bin/python3 lancedb openai pyyaml
uv run pytest -v
uv run ruff check .
Add dev-only dependencies via:
uv add --dev pytest-mock
| Tool | Purpose |
|---|---|
| `lancedb_recall` | Vector (default) / hybrid recall over workspace memory. Returns IDs, snippets, scores, provenance turn IDs. |
| `lancedb_remember` | Store a durable fact (preference, entity, event, case, pattern, general). Deduplicated by content hash. |
| `lancedb_read` | Fetch one memory by ID, optionally with the full provenance turns it was extracted from. |
| `lancedb_forget` | Two-step: action: preview to list candidates by description, then action: delete with the exact ID. |
The provider's system-prompt block instructs the model when to use each tool: lancedb_remember only when the user explicitly asks to remember, lancedb_forget preview before any delete, etc.
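As a rough illustration of what "deduplicated by content hash" can mean for lancedb_remember, here is a minimal in-memory sketch. The class name FactStore, the whitespace/case normalization, and the choice of SHA-256 are all assumptions made for this example; the plugin's real store and hashing scheme may differ.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash a fact after collapsing whitespace and lowercasing (assumed normalization)."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class FactStore:
    """Hypothetical stand-in for the memory table, keyed by content hash."""

    def __init__(self):
        self._rows = {}

    def remember(self, content: str, category: str = "general"):
        """Store a fact once; return (hash, True) if inserted, (hash, False) if duplicate."""
        key = content_hash(content)
        if key in self._rows:
            return key, False
        self._rows[key] = {"content": content, "category": category}
        return key, True

    def __len__(self):
        return len(self._rows)
```

Storing "User prefers dark mode" twice, even with different spacing or casing, leaves a single row under this normalization.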
lancedb_recall searches workspace memory and returns the top matches. You control two things:
| You choose | Options | Set in | Scope |
|---|---|---|---|
| Search mode | vector (default) · hybrid | mode argument of lancedb_recall; default from key plugins.lancedb.retrieval.mode in ~/.hermes/config.yaml | per call |
| Hybrid fusion | rrf · linear · cross-encoder | key plugins.lancedb.retrieval.reranker.type in ~/.hermes/config.yaml | global |
Fusion only applies to hybrid mode and is config-only — the agent picks the mode per call, but the fusion is a global setting. To switch RRF → vector-biased linear, set reranker.type: linear (and reranker.weight) in ~/.hermes/config.yaml.
A pure-lexical `fts` mode (BM25 only, no embeddings) also exists as a valid `mode` value, but it's a niche escape hatch and not recommended: keyword-only matching tends to surface coincidental, irrelevant rows that pollute the agent's context rather than help it. Semantic recall lives in `vector` / `hybrid`, which is what these docs and the benchmark cover.
- Build a `WHERE` prefilter on workspace + user + kind + category.
- Run the retriever for the chosen mode:
  - `vector`: ANN over `text-embedding-3-small` embeddings (score: `_distance`).
  - `hybrid`: run a vector leg and a BM25 full-text leg, then fuse (score: `_relevance_score`).
- For `hybrid`, fuse by `reranker.type`:
  - `rrf`: Reciprocal Rank Fusion (rank-based, equal-weight legs).
  - `linear`: weighted vector + FTS scores; `reranker.weight` is the vector weight (0–1).
  - `cross-encoder`: rerank an oversampled pool (`rerank_top_n`) with a sentence-transformers model, then slice to `top_k` (cached, warmed at `initialize()`).
- Return the top `top_k` rows.
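The prefilter and the two non-model fusion strategies above can be sketched in plain Python. This is an illustration under stated assumptions, not the plugin's code: the function names are invented here, user_id is a hypothetical column name (only agent_workspace, kind, and category appear in this README), k=60 is the conventional RRF constant, and the linear variant assumes both legs already produce scores normalized to 0–1 where higher is better.

```python
from collections import defaultdict


def sql_quote(value: str) -> str:
    """Quote a string literal for a SQL-style filter by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_prefilter(workspace, user=None, kinds=("fact",), category=None):
    """Step 1: AND together the scoping predicates."""
    clauses = [f"agent_workspace = {sql_quote(workspace)}"]
    if user is not None:
        clauses.append(f"user_id = {sql_quote(user)}")  # hypothetical column name
    if kinds:
        clauses.append("kind IN (" + ", ".join(sql_quote(k) for k in kinds) + ")")
    if category is not None:
        clauses.append(f"category = {sql_quote(category)}")
    return " AND ".join(clauses)


def rrf_fuse(vector_ids, fts_ids, k=60):
    """rrf: each ID scores the sum of 1 / (k + rank) over the legs it appears in."""
    scores = defaultdict(float)
    for leg in (vector_ids, fts_ids):
        for rank, row_id in enumerate(leg, start=1):
            scores[row_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda r: (-scores[r], r))


def linear_fuse(vector_scores, fts_scores, weight=0.7):
    """linear: weight * vector + (1 - weight) * FTS; a missing leg counts as 0."""
    ids = set(vector_scores) | set(fts_scores)
    fused = {
        i: weight * vector_scores.get(i, 0.0)
        + (1.0 - weight) * fts_scores.get(i, 0.0)
        for i in ids
    }
    return sorted(fused, key=lambda r: (-fused[r], r))


def top_rows(ranked_ids, top_k=10):
    """Step 4: keep only the first top_k IDs."""
    return ranked_ids[:top_k]
```

With vector_ids = ["a", "b", "c"] and fts_ids = ["c", "a", "d"], rrf_fuse ranks "a" and "c" ahead of "b" and "d" because they appear in both legs. RRF uses only ranks, so it needs no score normalization, which is one reason it makes a sensible default.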
Two details. First, vector mode projects its score column, but hybrid mode fetches unprojected rows and drops the vector column in Python, because naming _relevance_score in select() raises an error (the projection is pushed down to the FTS leg). Second, if hybrid fails (e.g. the full-text leg's index isn't ready), recall logs a warning and falls back to pure vector.
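The fallback behaviour can be pictured with a small wrapper. This is a sketch only: recall_with_fallback, the logger name, and the two search callables are placeholders invented for the example, and the real plugin may catch a narrower set of exceptions.

```python
import logging

logger = logging.getLogger("lancedb_recall_sketch")  # hypothetical logger name


def recall_with_fallback(query, mode, vector_search, hybrid_search, top_k=10):
    """Run the requested mode; if hybrid raises, warn and retry as pure vector.

    vector_search and hybrid_search are caller-supplied callables taking
    (query, top_k). Returns (rows, mode_actually_used).
    """
    if mode == "hybrid":
        try:
            return hybrid_search(query, top_k), "hybrid"
        except Exception as exc:  # e.g. the full-text index is not ready yet
            logger.warning("hybrid recall failed (%s); falling back to vector", exc)
    return vector_search(query, top_k), "vector"
```

Returning the mode that actually ran makes the degradation visible to the caller instead of hiding it behind identical-looking results.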
You don't have to configure anything — once the provider is activated (hermes memory setup, which sets memory.provider: lancedb), the plugin runs on its shipped defaults from default_config.yaml. ~/.hermes/config.yaml is purely for overrides: keys you set there win, keys you omit fall back to the defaults. To customize, copy the blocks from default_config.yaml into your ~/.hermes/config.yaml and edit only what you want to change.
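The "overrides win, omitted keys fall back" rule behaves like a recursive dictionary merge. The sketch below shows that idea; merge_config is a name made up for this example, and whether the plugin merges key by key at every nesting level is an assumption.

```python
from copy import deepcopy


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return defaults with overrides layered on top, merging nested dicts."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # descend into sections
        else:
            merged[key] = deepcopy(value)  # scalar or new key: the override wins
    return merged


# Defaults mirroring a few shipped values; the user overrides one nested key.
defaults = {
    "retrieval": {
        "mode": "vector",
        "top_k": 10,
        "reranker": {"type": "rrf", "weight": 0.7},
    }
}
user = {"retrieval": {"reranker": {"type": "linear"}}}
effective = merge_config(defaults, user)
```

Here effective keeps mode: vector, top_k: 10, and weight: 0.7 from the defaults, while reranker.type becomes linear.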
By default embeddings call the OpenAI API (OPENAI_API_KEY required); everything else is local. Don't edit default_config.yaml to change your own setup — a plugin update overwrites it; edit ~/.hermes/config.yaml.
To use a different embeddings backend, point the OpenAI-compatible client at any endpoint that speaks the same shape (no code change needed). For example, a hosted non-OpenAI model via OpenRouter:
# ~/.hermes/config.yaml
plugins:
  lancedb:
    embedding:
      model: google/gemini-embedding-001
      base_url: https://openrouter.ai/api/v1
      api_key_env: OPENROUTER_API_KEY
…or fully local embeddings via Ollama:
# ~/.hermes/config.yaml
plugins:
  lancedb:
    embedding:
      model: nomic-embed-text
      base_url: http://localhost:11434/v1
      api_key_env: OLLAMA_API_KEY # any value works for local Ollama
Changing the embedding model (or its dimension) against an existing store requires recreating the table: the plugin fails loudly on a dim mismatch rather than silently returning nothing.
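A "fail loudly" dimension check might look like the following. The exception class and function name are hypothetical, and the message text is illustrative rather than what the plugin actually logs.

```python
class EmbeddingDimMismatch(ValueError):
    """Raised when a new embedding does not fit the table's vector column."""


def check_embedding_dim(table_dim: int, vector):
    """Return the vector unchanged if its length matches, else raise."""
    if len(vector) != table_dim:
        raise EmbeddingDimMismatch(
            f"embedding has {len(vector)} dims but the table expects {table_dim}; "
            "recreate the table after changing embedding.model or dimensions"
        )
    return vector
```

For instance, a table created with the 1536-dim default would reject the 768-dim vectors that nomic-embed-text produces.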
| Section | Key | Default | Notes |
|---|---|---|---|
| `retrieval` | `mode` | `vector` | vector (default) or hybrid. Per-call override via the mode parameter on lancedb_recall. (A lexical-only fts value also works but is a niche, unrecommended escape hatch.) |
| | `top_k` | `10` | Hard cap inside the retrieval layer is 50. |
| | `search_kinds` | `[fact]` | Recall surfaces facts; turn rows are stored as provenance and used as fallback when no facts match. |
| `retrieval.reranker` | `type` | `rrf` | Hybrid fusion: rrf, linear, or cross-encoder. No-op for mode: vector / mode: fts (one ranked list). |
| | `weight` | `0.7` | linear only: vector weight (0–1) for the weighted vector/FTS combination; higher leans on vector. |
| | `model` | `cross-encoder/ettin-reranker-17m-v1` | cross-encoder only. Any HuggingFace cross-encoder ID; lazy-loaded on first use. |
| | `rerank_top_n` | `50` | cross-encoder only. Enforced as max(rerank_top_n, top_k) so you never fetch fewer than you return. |
| `extraction` | `enabled` | `true` | Set false to skip the auxiliary LLM call. |
| | `min_turns` | `3` | Skip extraction when the user has spoken fewer than N turns. |
| `embedding` | `provider` | `openai` | Label; currently always selects the OpenAI-compatible client. The actual endpoint is controlled by base_url / api_key_env below. |
| | `model` | `text-embedding-3-small` | 1536-dim for the default. Embedding dim must match the existing table: recreate the table if you change models (or dim) against an existing store. The plugin now fails loudly on a mismatch instead of silently returning nothing. |
| | `base_url` | `null` | null = OpenAI's default endpoint. Set to any OpenAI-compatible embeddings endpoint: OpenRouter (https://openrouter.ai/api/v1), Nous, Together, vLLM, Ollama / LM Studio in OpenAI-compatible mode (e.g. http://localhost:11434/v1), or a self-hosted server. |
| | `api_key_env` | `OPENAI_API_KEY` | Name of the environment variable holding the API key. Point it at a different var to keep your embedding key separate from OPENAI_API_KEY. |
| | `dimensions` | `null` | Optional output dimensions for matryoshka models (text-embedding-3-*). null = the model's native dimension. |
| | `max_batch` | `100` | Max inputs per embeddings request. Providers cap this differently (Gemini 100, Cohere 96, OpenAI up to 2048); the default is the safe common denominator. Lower it for a stricter provider, raise it to cut request count on OpenAI. |
| `maintenance` | `enabled` | `true` | Set false to disable auto-compaction. |
| | `optimize_every_commits` | `50` | Each add / delete advances table.version; auto-compaction fires when delta ≥ this value. |
| | `cleanup_older_than_days` | `7` | Passed as timedelta(days=...) to table.optimize(). Set 0 or negative to skip cleanup (compaction only). |
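Three of the numeric rules in the table (the top_k cap, the rerank pool floor, and max_batch) reduce to a few lines of arithmetic. The helper names below are invented for this sketch, and the lower bound of 1 on top_k is an assumption not stated in the table.

```python
MAX_TOP_K = 50  # hard cap on top_k quoted in the table


def clamp_top_k(top_k: int) -> int:
    """Keep top_k within 1..MAX_TOP_K (the lower bound of 1 is assumed)."""
    return max(1, min(int(top_k), MAX_TOP_K))


def rerank_pool_size(rerank_top_n: int, top_k: int) -> int:
    """Cross-encoder pool: never fetch fewer candidates than will be returned."""
    return max(rerank_top_n, top_k)


def batched(items, max_batch: int = 100):
    """Yield consecutive slices of at most max_batch inputs per embeddings request."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    for start in range(0, len(items), max_batch):
        yield items[start:start + max_batch]
```

With the default max_batch of 100, embedding 250 texts takes three requests of 100, 100, and 50 inputs; raising it to 2048 against OpenAI would send them in one.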
extraction uses Hermes's auxiliary client. Point it at a cheaper model independent of your main chat model:
auxiliary:
  lancedb_extraction:
    provider: openrouter
    model: google/gemini-3-flash
Hermes handles provider routing, fallback, and credit exhaustion.
| Path | Contents |
|---|---|
| `~/.hermes/lancedb/memories.lance/` | LanceDB dataset directory (fragments, manifest, indexes). |
| `~/.hermes/lancedb/.last_optimize_version` | Sentinel file: table.version at the most recent successful optimize(). Used to decide when the next auto-compaction fires. |
| `~/.cache/huggingface/` | Cross-encoder reranker model cache (managed by HuggingFace). Only present if reranker.type: cross-encoder is enabled; embeddings use the OpenAI API and cache nothing locally. |
The dataset is a single table named memories containing both fact and turn rows; the kind column distinguishes them. To poke at it directly:
uv run --project ~/.hermes/hermes-agent python -c "
import lancedb
db = lancedb.connect('~/.hermes/lancedb')
df = db.open_table('memories').to_pandas()
print(df[['kind', 'category', 'content']].head())
"
Every add / delete on the table is a Lance commit. Without intervention, single-row writes (which dominate agent workloads) accumulate tiny fragments and version files indefinitely.
The plugin tracks table.version against the sentinel file at ~/.hermes/lancedb/.last_optimize_version and runs table.optimize(cleanup_older_than=timedelta(days=N)) in a daemon thread when the delta crosses optimize_every_commits. A non-blocking lock guarantees only one optimize runs at a time: re-triggers while one is in flight are skipped, and writers are never blocked.
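The trigger logic described above can be sketched as follows. AutoCompactor and its method names are invented for illustration; the table object is assumed to expose a version attribute and an optimize(cleanup_older_than=...) method as the text describes, and the skip-cleanup case (cleanup_older_than_days of 0 or less) is deliberately left out.

```python
import threading
from datetime import timedelta
from pathlib import Path


class AutoCompactor:
    """Hypothetical sketch of the version-delta trigger with a non-blocking lock."""

    def __init__(self, table, sentinel: Path, every_commits=50, cleanup_days=7):
        self.table = table
        self.sentinel = sentinel
        self.every_commits = every_commits
        self.cleanup_days = cleanup_days
        self._lock = threading.Lock()

    def _last_version(self) -> int:
        """Read the sentinel; treat a missing or unreadable file as version 0."""
        try:
            return int(self.sentinel.read_text().strip())
        except (FileNotFoundError, ValueError):
            return 0

    def maybe_optimize(self):
        """Call after a write. Returns the started thread, or None if skipped."""
        if self.table.version - self._last_version() < self.every_commits:
            return None  # not enough commits since the last optimize
        if not self._lock.acquire(blocking=False):
            return None  # an optimize is already in flight; never block the writer
        thread = threading.Thread(target=self._run, daemon=True)
        try:
            thread.start()
        except RuntimeError:
            self._lock.release()
            raise
        return thread

    def _run(self):
        try:
            self.table.optimize(cleanup_older_than=timedelta(days=self.cleanup_days))
            # Record the version only after a successful optimize.
            self.sentinel.write_text(str(self.table.version))
        finally:
            self._lock.release()
```

Because the lock is taken with blocking=False, a write that arrives while a compaction is running returns immediately and simply leaves the next trigger to a later commit.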
If maintenance.enabled: false, none of this runs and the dataset will grow without bound.
hermes plugins list doesn't show lancedb. Check the symlink: ls -l ~/.hermes/plugins/lancedb should resolve to this repo (or wherever you installed it).
lancedb_* tools missing, or the agent only writes built-in memory. The provider isn't active. Run hermes memory status — you want Provider: lancedb with available ✓. If it's blank, the provider was never switched on: run hermes memory setup and pick lancedb (this sets memory.provider: lancedb in ~/.hermes/config.yaml). Confirm agent.log contains lancedb provider initialized on session start. Using a profile? Add -p <name> to these commands.
Recall fails with an auth error. Embeddings call the OpenAI API — make sure OPENAI_API_KEY is set in the environment (or ~/.hermes/.env). With reranker.type: cross-encoder, the sentence-transformers reranker model is downloaded to ~/.cache/huggingface/ on first use and preloaded during initialize() so the first user query doesn't pay the model-load cost.
Table fragments / .lance directory growing. Check maintenance.enabled: true and that ~/.hermes/lancedb/.last_optimize_version is advancing across sessions. agent.log will show lancedb optimize starting when a compaction fires.
Changed embedding.model and recall fails or returns nothing. The new model's dim doesn't match the existing vector column (the plugin now fails loudly on this mismatch). Delete ~/.hermes/lancedb/memories.lance/ to recreate the table on the next session.
Apache 2.0