
research(nightly): streaming-semantic-drift — online distribution shift detection for agent vector memory#503

Draft
ruvnet wants to merge 4 commits into main from research/nightly/2026-05-23-streaming-semantic-drift

@ruvnet ruvnet commented May 23, 2026

Nightly RuVector Research — 2026-05-23

Topic: Streaming Semantic Drift Detection for Agent Vector Memory
Slug: streaming-semantic-drift
ADR: ADR-194
Crate: crates/ruvector-drift


Summary

Adds ruvector-drift, a streaming semantic drift detector. In the included benchmark it fires within 1–2 vector insertions of the injected distribution shift. The crate provides three trait-compatible variants:

  • MeanShiftDetector — EMA distance from reference mean; 3 KB; 124 ns/insert
  • CusumDetector — CUSUM on z-scored L2 norms; 48 bytes; 129 ns/insert
  • MmdRffDetector — RFF-approximate MMD; 133 KB; 42 µs/insert; detects arbitrary shifts

All three implement DriftDetector, are Box<dyn DriftDetector>-compatible, and use zero unsafe code.
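
As a rough illustration of the CUSUM variant, here is a minimal sketch of a two-sided CUSUM over z-scored L2 norms behind a trait object. The crate's source is not shown in this PR description, so the trait method `observe`, the struct `CusumSketch`, its field names, and the `slack`/`threshold` parameters are all assumptions made for this example, not the crate's actual API. The six `f64` fields happen to total 48 bytes, which is consistent with the reported footprint but does not confirm the real layout.

```rust
/// Hypothetical trait shape; the real `DriftDetector` signature may differ.
trait DriftDetector {
    /// Feed one vector; returns true when drift is signalled.
    fn observe(&mut self, v: &[f32]) -> bool;
}

/// Two-sided CUSUM over z-scored L2 norms (illustrative, not the crate's type).
struct CusumSketch {
    ref_mean: f64,  // mean of the reference L2 norms
    ref_std: f64,   // standard deviation of the reference L2 norms
    slack: f64,     // allowance k, in z-score units
    threshold: f64, // alarm level h, in z-score units
    s_hi: f64,      // cumulative evidence for an upward shift
    s_lo: f64,      // cumulative evidence for a downward shift
}

fn l2_norm(v: &[f32]) -> f64 {
    v.iter().map(|&x| (x as f64) * (x as f64)).sum::<f64>().sqrt()
}

impl CusumSketch {
    /// Estimate norm statistics from a non-empty reference set.
    fn from_reference(reference: &[Vec<f32>], slack: f64, threshold: f64) -> Self {
        let norms: Vec<f64> = reference.iter().map(|v| l2_norm(v)).collect();
        let n = norms.len() as f64;
        let mean = norms.iter().sum::<f64>() / n;
        let var = norms.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
        Self {
            ref_mean: mean,
            ref_std: var.sqrt().max(1e-12), // guard against a zero divisor
            slack,
            threshold,
            s_hi: 0.0,
            s_lo: 0.0,
        }
    }
}

impl DriftDetector for CusumSketch {
    fn observe(&mut self, v: &[f32]) -> bool {
        let z = (l2_norm(v) - self.ref_mean) / self.ref_std;
        // Each sum accumulates deviations beyond the slack and resets at zero.
        self.s_hi = (self.s_hi + z - self.slack).max(0.0);
        self.s_lo = (self.s_lo - z - self.slack).max(0.0);
        self.s_hi > self.threshold || self.s_lo > self.threshold
    }
}

/// Returns the 1-based insertion index at which the detector first fires.
fn demo_detection_lag() -> Option<usize> {
    // Synthetic reference set with slightly varying norms (not the PR's dataset).
    let reference: Vec<Vec<f32>> = (0..100)
        .map(|i| vec![1.0 + 0.01 * (i % 5) as f32; 8])
        .collect();
    let mut det: Box<dyn DriftDetector> =
        Box::new(CusumSketch::from_reference(&reference, 0.5, 5.0));
    let shifted = vec![3.0f32; 8];
    (1..=10).find(|_| det.observe(&shifted))
}
```

Because the state is a handful of scalars, the per-insert cost is dominated by the O(D) norm computation. Note that a norm-based statistic only sees shifts that change vector length; a rotation of the distribution that preserves norms would go unnoticed by this sketch.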


What's included

  1. Working Rust PoC (crates/ruvector-drift/) — 5 source files, ~500 LOC
  2. ADR-194 (docs/adr/ADR-194-streaming-semantic-drift.md)
  3. Research document (docs/research/nightly/2026-05-23-streaming-semantic-drift/README.md)
  4. SEO gist (docs/research/nightly/2026-05-23-streaming-semantic-drift/gist.md)
  5. 6 passing unit tests, deterministic benchmark binary

Real benchmark results (cargo run --release -p ruvector-drift, rustc 1.94.1, x86-64)

| Variant | Detection lag | Insert latency | Memory | Acceptance |
|---|---|---|---|---|
| MeanShift | 1 vector | 124 ns | 3 KB | PASS |
| CUSUM | 1 vector | 129 ns | 48 B | PASS |
| MMD-RFF | 2 vectors | 42 µs | 133 KB | PASS |

Dataset: D=128, 1000 reference vectors N(0,1), 1000 drift vectors N(2.0, 1), seed=42.
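
A back-of-envelope check suggests why a 1-vector detection lag is plausible on this dataset. It assumes that N(2.0, 1) means every coordinate is drawn independently with mean 2.0 and unit variance, which the description does not spell out, and it approximates the expected norm by the square root of the expected squared norm:

```latex
\begin{aligned}
\mathbb{E}\,\|x\|^2 &= D(\mu^2 + \sigma^2) \\
\text{reference: } \|x\| &\approx \sqrt{128 \cdot (0 + 1)} \approx 11.31 \\
\text{drift: } \|x\| &\approx \sqrt{128 \cdot (4 + 1)} = \sqrt{640} \approx 25.30 \\
\operatorname{sd}\big(\|x\|_{\text{ref}}\big) &\approx \sigma / \sqrt{2} \approx 0.71 \quad (D \gg 1) \\
z &\approx \frac{25.30 - 11.31}{0.71} \approx 20 \\
\|\mu_{\text{drift}} - \mu_{\text{ref}}\|_2 &= 2\sqrt{128} \approx 22.63
\end{aligned}
```

Under these assumptions a single drifted vector lands roughly 20 reference standard deviations away in norm, so this benchmark is an easy case. Smaller or norm-preserving shifts would likely show longer detection lags.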


Why this matters

To our knowledge, no production vector database detects semantic drift online; they rely on lagging offline metrics. As a cognition substrate for agents, RuVector needs to know when its memory is going stale, before HNSW graph recall silently decays.

The 48-byte CUSUM is a strong candidate for always-on production deployment with no meaningful overhead.


Research doc

docs/research/nightly/2026-05-23-streaming-semantic-drift/README.md

ADR

docs/adr/ADR-194-streaming-semantic-drift.md


This branch should either become a production RuVector capability or a falsified research path with useful evidence.


Generated by Claude Code

claude added 4 commits May 23, 2026 07:23
Adds ADR-194 design rationale and workspace membership for the
streaming semantic drift detection crate (ruvector-drift).

https://claude.ai/code/session_017kmy7aU2vDkc21CB8g2xB5
Introduces crates/ruvector-drift with three drift detector variants:
- MeanShiftDetector: EMA distance, O(D) space, 124 ns/insert
- CusumDetector: CUSUM on z-scored norms, 48 B space, 129 ns/insert
- MmdRffDetector: RFF-MMD, O(D×R) space, 42 µs/insert

All implement DriftDetector trait; benchmark binary in src/main.rs.

https://claude.ai/code/session_017kmy7aU2vDkc21CB8g2xB5
Architecture decision record covering: design rationale, three variants,
failure modes, security considerations, migration path, and benchmark
evidence (48 B CUSUM, 124 ns MeanShift, 42 µs MMD-RFF, all PASS).

https://claude.ai/code/session_017kmy7aU2vDkc21CB8g2xB5
Research README: SOTA survey, 10-20 year thesis, design, benchmark
results (D=128 detection lag 1-2 vecs), 8 practical + 8 exotic apps,
deep research notes, production layout proposal.

Gist: SEO-optimized public technical article.

https://claude.ai/code/session_017kmy7aU2vDkc21CB8g2xB5