Notion integration for omadia — semantic search over a Notion workspace, packaged as a standalone, signable plugin ZIP.
Notion's REST API POST /v1/search matches page/database titles only —
there is no full-text or semantic search endpoint. To find content by
meaning, this plugin builds and maintains its own embedding index:

    cron job ──> /v1/search (enumerate pages, paginated)
             ──> GET /v1/pages/{id}/markdown (one call per page)
             ──> chunk markdown ──> embed each chunk (kernel embeddingClient)
             ──> JSON vector store in plugin memory

    notion_semantic_search ──> embed query ──> cosine-rank chunks locally

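
The local ranking step at the end of the pipeline can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the `IndexedChunk` shape and the `cosineSimilarity` and `rankChunks` names are assumptions made for the example.

```typescript
// Hypothetical chunk shape; the plugin's real index format is not shown in this README.
interface IndexedChunk {
  pageId: string;
  text: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; returns 0 if either has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Score every chunk against the query embedding and keep the topK best matches.
function rankChunks(query: number[], chunks: IndexedChunk[], topK: number) {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Scanning every chunk is linear in index size, which is acceptable for a small workspace because the whole index is already in memory.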
| Tool | What it does |
|---|---|
| `notion_semantic_search` | Embeds the query, cosine-ranks the indexed chunks, returns the top page sections. Primary content-discovery path. |
| `read_notion_page` | Fetches a page as Markdown (live), following Notion's truncated/unknown_block_ids for large pages. |
| `reindex_notion` | Incremental crawl: re-embeds only pages whose last_edited_time changed; drops pages removed from the workspace. |
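
The incremental behaviour of `reindex_notion` amounts to diffing the indexed pages against the live page list. A minimal sketch, assuming the index keeps one `last_edited_time` string per page id; `PageStamp`, `ReindexPlan` and `planReindex` are hypothetical names, not the plugin's API:

```typescript
// Hypothetical view of one enumerated page: its id and Notion's last_edited_time.
interface PageStamp {
  id: string;
  lastEditedTime: string;
}

interface ReindexPlan {
  toEmbed: string[]; // new pages, or pages whose last_edited_time changed
  toDrop: string[];  // indexed pages no longer present in the workspace
}

// indexed maps page id -> last_edited_time recorded at the previous crawl.
function planReindex(indexed: Map<string, string>, live: PageStamp[]): ReindexPlan {
  const liveIds = new Set(live.map((p) => p.id));
  const toEmbed = live
    .filter((p) => indexed.get(p.id) !== p.lastEditedTime)
    .map((p) => p.id);
  const toDrop = Array.from(indexed.keys()).filter((id) => !liveIds.has(id));
  return { toEmbed, toDrop };
}
```

Comparing timestamps as opaque strings avoids date parsing: any change in the stored value triggers a re-embed.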
- Create an internal integration at https://www.notion.so/my-integrations and copy its token.
- Share the pages/databases you want searchable with that integration in Notion — the API only sees shared content.
- Install this plugin and paste the token into the Notion API Token field.
- Trigger `reindex_notion` once for the initial crawl (or wait for the cron).
Requires @omadia/embeddings (provides the embeddingClient service) and
@omadia/memory. The capability resolver activates both before this plugin;
without embeddingClient the plugin refuses to start.
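
The refuse-to-start behaviour can be pictured as a guard at activation time. The sketch below is an assumption-heavy illustration: the interfaces are hypothetical stand-ins for the real `@omadia/plugin-api` types, and it assumes `ctx.services.get('embeddingClient')` returns `undefined` when the service is absent, which this README does not confirm.

```typescript
// Hypothetical minimal shapes; the real types come from @omadia/plugin-api.
interface EmbeddingClient {
  embed(texts: string[]): Promise<number[][]>;
}

interface ServiceRegistry {
  get(name: string): unknown;
}

interface PluginContextLike {
  services: ServiceRegistry;
}

// Fail fast during activation instead of failing later on the first search.
function requireEmbeddingClient(ctx: PluginContextLike): EmbeddingClient {
  const client = ctx.services.get('embeddingClient');
  if (client === undefined || client === null) {
    throw new Error('embeddingClient service is missing; install @omadia/embeddings');
  }
  return client as EmbeddingClient;
}
```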

    nvm use            # Node 22
    npm install
    npm run typecheck  # tsc gate (no emit)
    npm run build      # esbuild bundle → out/omadia-integration-notion-0.1.0.zip

src/plugin.ts is the entry; esbuild bundles all local modules into
dist/plugin.js. Host-provided peers (@omadia/plugin-api, zod, express)
are kept external — zod in particular must NOT be bundled, or the host's
zod→tool-schema bridge breaks on a second zod instance.
@omadia/plugin-api is not on npm; types/omadia-plugin-api.d.ts provides
ambient stubs so the plugin compiles offline. The host injects the real
implementations at runtime.
Upload the resulting ZIP through the omadia admin UI's plugin store.
- `requires: ["embeddingClient@^1"]` → `ctx.services.get('embeddingClient')`
- `requires: ["memoryStore@^1"]` → `ctx.memory` backs the index. Resolves to either `@omadia/memory` (filesystem) or `@omadia/memory-postgres` (Postgres); it depends on the capability, not a specific provider id.
- `permissions.network.outbound: ["api.notion.com"]` → `ctx.http.fetch`
- Pins `Notion-Version: 2026-03-11` (page-markdown endpoint + data sources).
The index uses a VectorStore interface with two implementations, chosen at
activation by what the deployment provides:
- pgvector (preferred). When the shared `graphPool` service is available (Postgres deployments, i.e. `@omadia/knowledge-graph-neon` + `@omadia/memory-postgres`), embeddings live in a native `vector` column and ranking is `ORDER BY embedding <=> $query` in the database (exact KNN). Tables: `notion_pages` + `notion_chunks` + `notion_index_meta`, scoped by `agent_id`. No JSON-float bloat, no loading the index into JS, granular per-page upserts. For very large corpora, pin the vector dimension and add an `hnsw` index; the column is dimension-free today for model flexibility.
- Sharded memory (fallback). Filesystem/in-memory deployments (no `graphPool`) use `ctx.memory`: a small `pages.json` manifest + one `pages/<pageId>.json` shard per page; reindex writes only changed pages, search reads shards once into an in-memory cache. Here the embedding vector dominates file size (~11 KB/chunk as JSON), which is fine for a few hundred pages.
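
For the pgvector path, the database-side ranking could be expressed as a parameterised query like the one built below. This is only a sketch: `notion_chunks`, `agent_id` and the `<=>` ordering come from the description above, while the `page_id` and `content` column names and the `buildKnnQuery` helper are assumptions.

```typescript
// pgvector accepts vectors as text literals of the form '[0.1,0.2,...]'.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.some((x) => !Number.isFinite(x))) {
    throw new Error('embedding contains a non-finite value');
  }
  return `[${embedding.join(',')}]`;
}

// Query text plus positional values, in the { text, values } style used by common Postgres clients.
interface KnnQuery {
  text: string;
  values: [string, string, number];
}

// Exact KNN: order one agent's chunks by cosine distance (<=>) to the query embedding.
function buildKnnQuery(agentId: string, queryEmbedding: number[], limit: number): KnnQuery {
  return {
    text:
      'SELECT page_id, content, embedding <=> $2::vector AS distance ' +
      'FROM notion_chunks WHERE agent_id = $1 ' +
      'ORDER BY embedding <=> $2::vector LIMIT $3',
    values: [agentId, toVectorLiteral(queryEmbedding), limit],
  };
}
```

Passing the embedding as a bound parameter rather than interpolating it keeps the statement cacheable and avoids quoting problems.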
- Notion rate-limits ~3 req/s; the plugin HTTP accessor caps 60 req/min (the tighter bound). The client paces itself and honours `Retry-After`; `crawl_max_pages` bounds a single run, so very large workspaces fill in over several cron ticks.
- Embeddings from different models are incomparable, so the store tags itself with a model label and self-clears if it changes.
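
The pacing rule can be illustrated with two pure functions: one spaces requests to stay under 60 req/min, the other converts a `Retry-After` header into a wait. A hedged sketch; `MIN_INTERVAL_MS`, `retryAfterMs` and `nextDelayMs` are names invented here, and the plugin's real client may pace differently.

```typescript
// 60 requests per minute works out to at most one request per 1000 ms.
const MIN_INTERVAL_MS = 60000 / 60;

// Retry-After is either a number of seconds or an HTTP date; unparseable values yield 0.
function retryAfterMs(header: string | null, nowMs: number): number {
  if (header === null) return 0;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  return Number.isNaN(dateMs) ? 0 : Math.max(0, dateMs - nowMs);
}

// How long to wait before the next request: the longer of the pacing gap and Retry-After.
function nextDelayMs(
  lastRequestMs: number,
  nowMs: number,
  retryAfterHeader: string | null,
): number {
  const pacing = Math.max(0, lastRequestMs + MIN_INTERVAL_MS - nowMs);
  return Math.max(pacing, retryAfterMs(retryAfterHeader, nowMs));
}
```

For example, `nextDelayMs(0, 400, null)` is 600 ms (the remainder of the one-second slot), while a `Retry-After: 2` header raises the wait to 2000 ms.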
omadia ships a Privacy Shield (v4) that, by default (_privacy_mode: guarded),
interns every tool result server-side and hands the LLM only an identity-free
digest — raw tool output never reaches the model. That protection is
automatic; this plugin does not (and should not) call it itself.
The catch for a document-RAG plugin: the shield is built for row/dataset-shaped
results. The content this plugin returns — prose snippets from
notion_semantic_search and full pages from read_notion_page — is exactly the
"document-shaped" case the shield cannot usefully summarise. Under guarded,
the model would receive a digest instead of the page text, which defeats the
plugin's purpose.
To make the plugin answer from real content, the operator must set
_privacy_mode on this plugin (in the post-install config editor) to either:
- `bypass`: all of this plugin's tool output reaches the LLM (a transparency entry is recorded in the run receipt for every call), or
- `per_tool` with `_privacy_bypass_scopes` = `notion_semantic_search, read_notion_page`: bypass only the content-returning tools; everything else stays guarded.
This is a deliberate data-governance decision: Notion page content (including any PII it contains) will then be sent to the configured LLM. Choose the LLM provider accordingly (a local/EU-hosted model for sensitive workspaces).
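
To make the difference between the modes concrete, the sketch below models which tool results would reach the LLM unredacted under each setting. It is an illustration of the semantics described above, not the host's implementation: the `PrivacyConfig` shape, the comma-separated scope parsing and the `reachesModelRaw` name are all assumptions.

```typescript
type PrivacyMode = 'guarded' | 'bypass' | 'per_tool';

// Hypothetical config shape using the two setting names from this README.
interface PrivacyConfig {
  _privacy_mode: PrivacyMode;
  _privacy_bypass_scopes?: string; // assumed comma-separated list of tool names
}

// True if this tool's raw output would be passed to the LLM rather than a digest.
function reachesModelRaw(config: PrivacyConfig, toolName: string): boolean {
  switch (config._privacy_mode) {
    case 'bypass':
      return true;
    case 'per_tool': {
      const scopes = (config._privacy_bypass_scopes ?? '')
        .split(',')
        .map((s) => s.trim())
        .filter((s) => s.length > 0);
      return scopes.indexOf(toolName) !== -1;
    }
    default:
      return false; // guarded: the model only sees the identity-free digest
  }
}
```

With `per_tool` and the scopes above, `read_notion_page` returns raw content while `reindex_notion` stays guarded.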
Two paths the shield does not cover:
- Embeddings. Indexing sends raw page text to the configured `embeddingClient`. With the default local `@omadia/embeddings` (Ollama) this stays in-tenant; a cloud embeddings provider means content leaves your box.
- The local index. Chunks + embeddings are stored unencrypted in this plugin's memory scope (at rest, inside your deployment).
MIT — Copyright (c) 2026 byte5 GmbH