
Feature Request: Qdrant vector backend for MemOS Local Plugin #1617

@redashes1984

Description

Summary

MemOS Local Plugin currently uses SQLite brute-force search for vector retrieval. For production workloads with 100K+ vectors, this becomes a bottleneck.

We have built a Qdrant HNSW backend as a drop-in replacement, plus Reranker-based re-ranking for higher-quality retrieval.

About this project: This is a human-AI collaborative fork. A human lead and an AI agent (Nova/星野) worked together to architect, implement, test, and maintain this integration. The AI agent handled code implementation, debugging, deployment, and documentation, while the human provided architectural direction, design decisions, and quality review.

What We Built

A complete Qdrant integration fork: redashes1984/memos-qdrant

Key Features

  1. Qdrant HNSW vector store — replaces brute-force cosine search with HNSW index
  2. Fire-and-forget upsert with async flush — vectors are upserted to Qdrant asynchronously, never blocking the MemOS pipeline; flush() at shutdown ensures no data loss (see the sketch after this list)
  3. Reranker integration — Qwen3-Reranker-0.6B post-processing for top-K re-ranking
  4. Embedding model — Qwen3-Embedding-0.6B (GPU-accelerated)
  5. Graceful fallback — if Qdrant is unreachable, falls back to SQLite brute-force automatically
  6. TCP bridge transport — line-delimited JSON-RPC over TCP for remote clients (e.g. Python providers)
  7. Three-tier hardware config — Level 0 (zero GPU), Level 1 (GPU embedding only), Level 2 (full Qdrant + Reranker)
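
To make the fire-and-forget pattern concrete, here is a minimal sketch of the write path using the official @qdrant/js-client-rest client. The class and method names (QdrantStore, _track(), flush()) mirror the ones mentioned in this issue; everything else (error handling, ID scheme, logging) is illustrative rather than the fork's actual implementation:

```typescript
import { QdrantClient } from "@qdrant/js-client-rest";

export class QdrantStore {
  private pending: Promise<unknown>[] = [];

  constructor(private client: QdrantClient, private collection: string) {}

  // Fire-and-forget: enqueue the upsert and return immediately, so the MemOS
  // write path never waits on a Qdrant round trip. Note that Qdrant point IDs
  // must be UUIDs or unsigned integers.
  _track(id: string, vector: number[], payload: Record<string, unknown>): void {
    const op = this.client
      .upsert(this.collection, { wait: false, points: [{ id, vector, payload }] })
      .catch((err) => {
        // Errors are logged, not thrown: search falls back to SQLite if Qdrant is down.
        console.warn("qdrant upsert failed:", err);
      });
    this.pending.push(op);
  }

  // Called once on shutdown/drain so queued upserts are not silently dropped.
  async flush(): Promise<void> {
    await Promise.allSettled(this.pending);
    this.pending = [];
  }
}
```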

Architecture

MemOS Pipeline → Traces/Policies/Skills/WorldModel Repos
  → SQLite (write path, unchanged)
  → Qdrant (fire-and-forget upsert via _track(), flush on shutdown)
  → Search: Qdrant HNSW top-K → Reranker re-rank → return
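
The read path can be sketched the same way. The rerank() helper below is assumed to wrap Qwen3-Reranker-0.6B and return scored texts; the function and parameter names are illustrative, not the fork's actual API:

```typescript
import { QdrantClient } from "@qdrant/js-client-rest";

type Reranked = { text: string; score: number };

// Hypothetical search helper: HNSW top-K from Qdrant, then cross-encoder re-ranking.
async function searchMemories(
  client: QdrantClient,
  collection: string,
  queryVector: number[],
  queryText: string,
  rerank: (query: string, docs: string[]) => Promise<Reranked[]>,
  topK = 50,
  finalK = 10,
): Promise<Reranked[]> {
  // 1. Approximate nearest neighbours from the HNSW index (fast, recall-oriented).
  const hits = await client.search(collection, {
    vector: queryVector,
    limit: topK,
    with_payload: true,
  });

  // 2. Re-rank the candidate texts with the cross-encoder (slower, precision-oriented).
  const docs = hits.map((h) => String(h.payload?.text ?? ""));
  const scored = await rerank(queryText, docs);

  // 3. Keep only the best finalK results by reranker score.
  return scored.sort((a, b) => b.score - a.score).slice(0, finalK);
}
```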

Changes Required in Upstream

To integrate this into MemOS core, the following would need to be added:

  • core/storage/qdrant.ts — QdrantStore class with HNSW upsert/search
  • core/pipeline/types.ts — add qdrant?: QdrantStore | null to PipelineDeps (see the wiring sketch after this list)
  • core/pipeline/memory-core.ts — wire Qdrant store into bootstrap
  • core/pipeline/orchestrator.ts — call qdrant.flush() on drain
  • core/storage/repos/traces.ts — use qdrant._track() for async upsert
  • Same pattern for policies, skills, world_model repos
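
For reference, this is roughly what the wiring could look like. PipelineDeps, the memory-core bootstrap, and the orchestrator drain are named in the list above; the config keys and helper names here are only illustrative:

```typescript
import { QdrantClient } from "@qdrant/js-client-rest";
import { QdrantStore } from "../storage/qdrant"; // the class sketched earlier

// core/pipeline/types.ts
export interface PipelineDeps {
  // ...existing dependencies (SQLite handle, repos, logger, ...)
  qdrant?: QdrantStore | null; // absent or null => SQLite brute-force only
}

// core/pipeline/memory-core.ts — construct the store only when configured,
// so Level 0 (no GPU, no Qdrant) deployments are completely untouched.
export function buildQdrant(cfg: { url?: string; collection?: string }): QdrantStore | null {
  if (!cfg.url) return null;
  return new QdrantStore(new QdrantClient({ url: cfg.url }), cfg.collection ?? "memos_vectors");
}

// core/pipeline/orchestrator.ts — flush queued upserts when the pipeline drains.
export async function drain(deps: PipelineDeps): Promise<void> {
  await deps.qdrant?.flush();
}
```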

Hardware Tiers (from our config docs)

| Level | Embedding | Reranker | Vector Search | GPU Required |
|-------|-----------|----------|---------------|--------------|
| 0 | CPU | N/A | SQLite brute-force | No |
| 1 | GPU (local) | N/A | Qdrant HNSW | Yes (4GB+) |
| 2 | GPU | GPU | Qdrant + Reranker | Yes (8GB+) |
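
As a rough illustration of how the tiers map onto configuration, here is a hypothetical config shape; the actual keys live in the fork's config docs and may differ:

```typescript
type HardwareTier = 0 | 1 | 2;

interface VectorConfig {
  tier: HardwareTier;
  embedding: { model: string; device: "cpu" | "gpu" };
  qdrant?: { url: string; collection: string }; // tiers 1 and 2
  reranker?: { model: string; device: "gpu" };  // tier 2 only
}

// Level 2 example: GPU embedding, Qdrant HNSW search, GPU re-ranking.
const level2: VectorConfig = {
  tier: 2,
  embedding: { model: "Qwen3-Embedding-0.6B", device: "gpu" },
  qdrant: { url: "http://localhost:6333", collection: "memos_vectors" },
  reranker: { model: "Qwen3-Reranker-0.6B", device: "gpu" },
};
```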

Why This Matters

  • 100K+ vectors: SQLite brute-force degrades from ~50ms to several seconds
  • Qdrant HNSW: sub-10ms queries at any scale
  • Reranker: 20-40% improvement in retrieval quality on our benchmarks
  • Zero breaking changes: existing SQLite path is preserved as fallback

Interest

We are running this stack in production (Qdrant vector DB, with Qwen3-Embedding-0.6B for embedding and Qwen3-Reranker-0.6B for re-ranking). Happy to contribute the code upstream as a PR if the maintainers are interested.

Would the MemTensor team be open to a Qdrant backend integration? We can split it into focused PRs:

  1. QdrantStore + pipeline wiring
  2. Reranker integration
  3. TCP bridge transport (optional)
