1 change: 1 addition & 0 deletions apps/docs/docs.json
@@ -77,6 +77,7 @@
"self-hosting/overview",
"self-hosting/quickstart",
"self-hosting/configuration",
"self-hosting/embedding-models",
"self-hosting/local-vs-enterprise"
]
},
6 changes: 6 additions & 0 deletions apps/docs/self-hosting/configuration.mdx
@@ -61,6 +61,12 @@ OPENAI_MODEL=gpt-oss:20b

Nothing to configure. Uploaded files (PDFs, images) are stored on local disk inside `$SUPERMEMORY_DATA_DIR` and served by the server at `/files/:key`.

## Local embedding model

The self-hosted server computes dense embeddings locally. The current release does not expose a supported embedding-model selector; changing your LLM provider settings changes summaries, extraction, and chunking, but not the dense embedding model used for semantic search.

For multilingual or non-English deployments, read [Local Embeddings](/self-hosting/embedding-models) before large backfills. The variables below tune embedding throughput and memory behavior only.

## Embedding performance

Local embeddings are prewarmed at startup with conservative defaults — one worker, minimal CPU footprint. Turn these up if you're ingesting heavily and prefer throughput over headroom:
52 changes: 52 additions & 0 deletions apps/docs/self-hosting/embedding-models.mdx
@@ -0,0 +1,52 @@
---
title: "Local Embeddings"
sidebarTitle: "Local Embeddings"
description: "How self-hosted local embeddings affect multilingual search."
icon: "languages"
---

Self-hosted Supermemory computes dense embeddings locally. These embeddings are separate from the LLM provider you configure for summaries, extraction, and chunking.

Changing `OPENAI_MODEL`, `OPENAI_BASE_URL`, `ANTHROPIC_API_KEY`, or another LLM provider setting does **not** change the local embedding model used for semantic search.

<Warning>
Current self-hosted release binaries do not expose a supported embedding-model selector. The `SUPERMEMORY_LOCAL_EMBEDDING_*` variables documented on the configuration page tune worker performance only; they do not change the embedding model or vector dimensions.
</Warning>

## Why this matters

Semantic search compares the query embedding with stored document and memory embeddings. If the embedding model is not trained for the language in your content, ingestion can still finish successfully while dense recall returns weak, wrong, or empty results.

This is most visible for multilingual or non-English deployments: exact keyword matches may still work through the lexical side of hybrid search, but paraphrased natural-language queries can fail because the dense vector space is not reliable for that language.
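
As a minimal sketch of the comparison described above, the following assumes cosine similarity as the scoring function. The source does not say which metric the server uses, and `cosineSimilarity` and `rankBySimilarity` are hypothetical names, not Supermemory APIs. It shows that the ranking depends entirely on where the model places each vector.

```typescript
// Cosine similarity between two dense vectors of the same length.
// Assumption: cosine is the metric; the actual server may use another one.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    // Vectors from models with different dimensions cannot be compared at all.
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`)
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0
    const y = b[i] ?? 0
    dot += x * y
    normA += x * x
    normB += y * y
  }
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Rank stored items by similarity to the query embedding, highest first.
function rankBySimilarity(
  query: number[],
  stored: { id: string; vector: number[] }[],
): { id: string; score: number }[] {
  return stored
    .map((item) => ({
      id: item.id,
      score: cosineSimilarity(query, item.vector),
    }))
    .sort((x, y) => y.score - x.score)
}
```

If the model maps text in an unsupported language to near-arbitrary positions, these scores stay well-formed numbers but stop reflecting meaning. That is why ingestion can succeed while recall is poor.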

## Production fix shape

A production-grade multilingual fix needs more than a model-name environment variable:

- A multilingual default embedding profile for fresh self-hosted installs.
- Durable profile metadata for model id, dimensions, pooling, normalization, token limits, and model-family text formatting.
- Model-scoped vector storage and search so embeddings from different models are never compared.
- A reindex command for switching profiles, so existing content is re-embedded with the new model before it is searched.
- Upgrade behavior that keeps existing stores searchable until an operator explicitly reindexes.
- Release validation across multiple language families and scripts, not only one or two reported languages.

The current recommendation for the default multilingual dense model is **BGE-M3** because it is designed for multilingual retrieval, supports more than 100 languages, supports long inputs, and uses 1024-dimensional dense vectors. That 1024-dimensional output is also why the database/index layer must change together with the model loader.
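
The profile metadata and model-scoped search requirements above could look roughly like the sketch below. Every name in it (`EmbeddingProfile`, `StoredVector`, `profileKey`, `searchableVectors`, `vectorsNeedingReindex`) is hypothetical and only illustrates the idea. It is not the schema of any Supermemory release, and the example profile values are illustrative.

```typescript
// Hypothetical profile shape; field names are illustrative only.
interface EmbeddingProfile {
  modelId: string
  dimensions: number
  pooling: "cls" | "mean"
  normalize: boolean
  maxTokens: number
}

// A stored vector remembers which profile produced it.
interface StoredVector {
  id: string
  profileKey: string
  values: number[]
}

// Stable key so vectors from different profiles are never mixed.
function profileKey(p: EmbeddingProfile): string {
  const norm = p.normalize ? "norm" : "raw"
  return `${p.modelId}:${p.dimensions}:${p.pooling}:${norm}`
}

// Only vectors written under the active profile are eligible for dense search.
function searchableVectors(
  active: EmbeddingProfile,
  stored: StoredVector[],
): StoredVector[] {
  const key = profileKey(active)
  return stored.filter(
    (v) => v.profileKey === key && v.values.length === active.dimensions,
  )
}

// Everything else has to be re-embedded after a profile switch.
function vectorsNeedingReindex(
  active: EmbeddingProfile,
  stored: StoredVector[],
): StoredVector[] {
  const key = profileKey(active)
  return stored.filter((v) => v.profileKey !== key)
}

// Illustrative values for a 1024-dimensional multilingual profile.
const multilingual: EmbeddingProfile = {
  modelId: "BAAI/bge-m3",
  dimensions: 1024,
  pooling: "cls",
  normalize: true,
  maxTokens: 8192,
}
```

Keying storage by profile rather than by model name alone means a change in pooling or normalization also forces a reindex, which matches the point that a model-name variable is not enough.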

## What to do today

For English-only self-hosted deployments, the current local embedding path can still be suitable.

For production multilingual self-hosting, track [GitHub issue #1104](https://github.com/supermemoryai/supermemory/issues/1104) and avoid large backfills that you expect to re-embed later. If you already have a multilingual corpus, keep the canonical content unchanged and plan for a reindex once a release includes model-scoped multilingual embedding profiles.

Avoid these workarounds in production:

- Translating memories to English at write time, because it changes the canonical user data.
- Storing bilingual duplicates, because it increases storage and pollutes extraction/search.
- Replacing model cache files under a different model name, because it hides model identity and can be overwritten by release updates.
- Truncating model output dimensions to fit the old vector schema, because it lets a storage constraint decide retrieval quality.
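
To illustrate the truncation point, here is a deliberately contrived toy example, not a measurement of any real model. With hand-picked 4-dimensional vectors, cutting every vector to 2 dimensions reverses which document ranks first. The vectors, the `cos` helper, and `truncate` are all made up for illustration.

```typescript
// Compact cosine similarity for equal-length vectors.
function cos(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0)
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0))
  const denom = norm(a) * norm(b)
  return denom === 0 ? 0 : dot / denom
}

// Keep only the first `dims` components, as a schema-fitting hack would.
const truncate = (v: number[], dims: number): number[] => v.slice(0, dims)

// Toy vectors chosen by hand; real embeddings spread information differently.
const query = [1, 0, 2, 0]
const docA = [0, 1, 2, 0]
const docB = [1, 0, 0, 2]

// At full dimension docA is the closer match.
const fullA = cos(query, docA)
const fullB = cos(query, docB)

// After truncation to 2 dimensions docB overtakes docA.
const cutA = cos(truncate(query, 2), truncate(docA, 2))
const cutB = cos(truncate(query, 2), truncate(docB, 2))

console.log({ fullA, fullB, cutA, cutB })
```

How much real retrieval degrades depends on the model, but the ranking is no longer the one the model was trained to produce.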

## Related configuration

Use [Embedding performance](/self-hosting/configuration#embedding-performance) to tune worker count, batch size, WASM threads, and prewarm behavior. These settings affect throughput and memory use only.

Use [Memory limits and ingestion queue](/self-hosting/configuration#memory-limits--ingestion-queue) to control how much additional memory background ingestion can consume.
4 changes: 4 additions & 0 deletions apps/docs/self-hosting/quickstart.mdx
@@ -130,6 +130,10 @@ curl http://localhost:6767/v3/search \

That's it. Everything in the [Memory API](/quickstart) — documents, memories, user profiles, spaces, filtering — works identically against your local server.

<Note>
Self-hosted embeddings are local and separate from your configured LLM provider. For multilingual or non-English production deployments, review [Local Embeddings](/self-hosting/embedding-models) before backfilling a large corpus.
</Note>

## Where things live

By default, all state lives in a single directory you can back up or move:
3 changes: 2 additions & 1 deletion packages/docs-test/package.json
@@ -10,7 +10,8 @@
"test:integrations": "bun run run.ts integrations",
"test:quickstart": "bun run run.ts quickstart",
"test:sdk": "bun run run.ts sdk",
"test:search": "bun run run.ts search"
"test:search": "bun run run.ts search",
"test:self-hosting": "bun run run.ts self-hosting"
},
"dependencies": {
"@ai-sdk/anthropic": "^3.0.15",
17 changes: 14 additions & 3 deletions packages/docs-test/run.ts
@@ -1,6 +1,6 @@
#!/usr/bin/env bun
import { spawn } from "child_process"
import path from "path"
import { spawn } from "node:child_process"
import path from "node:path"

const args = process.argv.slice(2)
const filter = args[0] // e.g., "typescript", "python", "integrations", or specific file
@@ -55,6 +55,15 @@ function getTests(): TestFile[] {
})
}

const selfHostingTests = ["embedding-models-docs"]
for (const t of selfHostingTests) {
tests.push({
name: `self-hosting/${t}`,
path: path.join(TESTS_DIR, "self-hosting", `${t}.ts`),
type: "ts",
})
}

return tests
}

@@ -95,7 +104,9 @@ async function main() {
if (tests.length === 0) {
console.log("No tests matched the filter:", filter)
console.log("\nAvailable tests:")
getTests().forEach((t) => console.log(` - ${t.name} (${t.type})`))
for (const t of getTests()) {
console.log(` - ${t.name} (${t.type})`)
}
process.exit(1)
}

65 changes: 65 additions & 0 deletions packages/docs-test/tests/self-hosting/embedding-models-docs.ts
@@ -0,0 +1,65 @@
import { readFileSync } from "node:fs"
import path from "node:path"

const repoRoot = path.resolve(import.meta.dir, "../../../..")

function readRepoFile(relativePath: string) {
return readFileSync(path.join(repoRoot, relativePath), "utf8")
}

function assert(condition: unknown, message: string) {
if (!condition) {
throw new Error(message)
}
}

const docsJson = JSON.parse(readRepoFile("apps/docs/docs.json"))
const embeddingDocs = readRepoFile(
"apps/docs/self-hosting/embedding-models.mdx",
)
const configDocs = readRepoFile("apps/docs/self-hosting/configuration.mdx")
const quickstartDocs = readRepoFile("apps/docs/self-hosting/quickstart.mdx")

const selfHostingPages = docsJson.navigation.tabs[0].anchors[1].pages[1].pages

assert(
selfHostingPages.includes("self-hosting/embedding-models"),
"Self-hosting navigation should include the local embeddings page",
)

assert(
embeddingDocs.includes("GitHub issue #1104") &&
embeddingDocs.includes(
"https://github.com/supermemoryai/supermemory/issues/1104",
),
"Local embeddings docs should link to the upstream multilingual issue",
)

assert(
embeddingDocs.includes("BGE-M3") &&
embeddingDocs.includes("more than 100 languages") &&
embeddingDocs.includes("1024-dimensional"),
"Local embeddings docs should explain the recommended multilingual model and dimension constraint",
)

assert(
embeddingDocs.includes("do not expose a supported embedding-model selector"),
"Local embeddings docs should state that current releases do not support model selection",
)

assert(
configDocs.includes("[Local Embeddings](/self-hosting/embedding-models)"),
"Configuration docs should link to local embedding guidance",
)

assert(
quickstartDocs.includes("[Local Embeddings](/self-hosting/embedding-models)"),
"Quickstart should direct multilingual users to local embedding guidance",
)

assert(
!configDocs.includes("SUPERMEMORY_LOCAL_EMBEDDING_MODEL"),
"Configuration docs must not document an unsupported embedding model env var as live config",
)

console.log("Self-hosting local embedding docs checks passed")
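
The navigation check in `embedding-models-docs.ts` reaches the self-hosting group by fixed position (`tabs[0].anchors[1].pages[1].pages`), so it would break if the navigation in `docs.json` were reordered. One possible alternative, sketched here under the assumption that groups are nested objects with a `pages` array, is to search the tree for the group that already contains a known sibling such as `self-hosting/overview`. `findPageGroup` is a hypothetical helper, not something in the repository.

```typescript
// Find the first `pages` array that directly contains `siblingSlug`
// and return its string entries. Returns undefined when no group matches.
function findPageGroup(
  node: unknown,
  siblingSlug: string,
): string[] | undefined {
  if (Array.isArray(node)) {
    for (const child of node) {
      const found = findPageGroup(child, siblingSlug)
      if (found) return found
    }
    return undefined
  }
  if (node === null || typeof node !== "object") return undefined

  const record = node as Record<string, unknown>
  const pages = record["pages"]
  if (Array.isArray(pages) && pages.includes(siblingSlug)) {
    // Nested groups are objects; keep only the page slugs of this group.
    return pages.filter((p): p is string => typeof p === "string")
  }
  // Not this level: descend into every property value.
  for (const value of Object.values(record)) {
    const found = findPageGroup(value, siblingSlug)
    if (found) return found
  }
  return undefined
}
```

With this helper the assertion could read `findPageGroup(docsJson, "self-hosting/overview")?.includes("self-hosting/embedding-models")`, which no longer depends on the position of any tab or anchor.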