From c035bea6c18e9cc56cb097f46678c46a4a910684 Mon Sep 17 00:00:00 2001
From: chetanunadkat <manvendra.tomar@seamless.se>
Date: Thu, 11 Jun 2026 12:14:05 +0530
Subject: [PATCH] docs(self-hosting): document local embedding memory behavior

Each local embedding worker's WASM linear memory only grows (up to ~4 GB)
and is reclaimed only when the pool goes fully idle. Add a 'Memory
considerations' note so self-hosters can budget peak memory
(~POOL_SIZE x 4 GB) and avoid OOM on small/continuously-ingesting hosts.

Refs #1093

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 apps/docs/self-hosting/configuration.mdx | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/apps/docs/self-hosting/configuration.mdx b/apps/docs/self-hosting/configuration.mdx
index bde7882f4..aa89af171 100644
--- a/apps/docs/self-hosting/configuration.mdx
+++ b/apps/docs/self-hosting/configuration.mdx
@@ -73,6 +73,15 @@ Local embeddings are prewarmed at startup with conservative defaults — one wor
 | `SUPERMEMORY_LOCAL_EMBEDDING_IDLE_TIMEOUT_MS` | Idle time before workers shut down | `120000` |
 | `SUPERMEMORY_SKIP_EMBEDDING_PREWARM` | Skip startup prewarm, load on first use | unset |
 
+### Memory considerations
+
+Each local embedding worker runs the model in a WebAssembly runtime whose linear memory **only grows**: it expands to fit the largest batch that worker has processed and is not returned to the OS until the worker shuts down. A single worker can grow to roughly 4 GB. Practical implications:
+
+- **Peak memory scales with the pool.** Budget up to roughly `SUPERMEMORY_LOCAL_EMBEDDING_POOL_SIZE × 4 GB` for embeddings under sustained load, on top of what the rest of the server uses. For example, a pool of 2 workers can hold up to ~8 GB.
+- **Reclamation only happens when the pool goes fully idle** for `SUPERMEMORY_LOCAL_EMBEDDING_IDLE_TIMEOUT_MS`. On a host that ingests continuously, the pool may never go fully idle, so worker memory stays at its high-water mark.
+
+On memory-constrained hosts (16 GB of RAM or less), keep `SUPERMEMORY_LOCAL_EMBEDDING_POOL_SIZE=1`, consider a shorter `SUPERMEMORY_LOCAL_EMBEDDING_IDLE_TIMEOUT_MS` so memory is released sooner between ingestion bursts, or point embeddings at a hosted provider instead of the local model.
+
 ## Telemetry
 
 The self-hosted binary sends no analytics — there is nothing to opt out of. The only related switch: