Description
Problem
The current memory write pipeline is append-only across all three vector collections. This leads to two issues:
- Duplicate records in episodic_memory and event_log
When the same MemCell is processed more than once, both collections accumulate duplicate entries that share the same source, because no layer performs deduplication. The result is redundant storage, and retrieval returns semantically duplicate results, which degrades ranking quality.
- Stale foresight records are never removed
ForesightRecord has a validity window (start_time / end_time). Once a record expires, it is filtered out at query time but never actually deleted from storage. Over time, dead data accumulates across MongoDB, Elasticsearch, and Milvus.
Proposed Fix
For episodic_memory and event_log: add a delete-before-insert step so that re-processing the same source always replaces the old records rather than appending.
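A minimal sketch of the delete-before-insert idea, using an in-memory stand-in rather than the real storage clients. The names `InMemoryCollection`, `delete_by_source`, `insert_many`, `replace_by_source`, and the `source_id` field are all hypothetical and only illustrate the intended behavior; the actual repositories may expose different methods and key the source differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InMemoryCollection:
    """Hypothetical stand-in for one collection (e.g. episodic_memory or event_log)."""

    records: List[Dict] = field(default_factory=list)

    def delete_by_source(self, source_id: str) -> int:
        """Remove every record derived from the given source; return how many were removed."""
        before = len(self.records)
        self.records = [r for r in self.records if r["source_id"] != source_id]
        return before - len(self.records)

    def insert_many(self, new_records: List[Dict]) -> None:
        """Append the given records."""
        self.records.extend(new_records)


def replace_by_source(
    collection: InMemoryCollection, source_id: str, new_records: List[Dict]
) -> int:
    """Delete old records for source_id, then insert the new ones.

    Returns the number of old records that were replaced.
    """
    removed = collection.delete_by_source(source_id)
    # Tag each new record with its source so a later re-run can find and replace it.
    collection.insert_many([{**r, "source_id": source_id} for r in new_records])
    return removed


# Processing the same (hypothetical) MemCell twice leaves a single copy of its records.
episodic = InMemoryCollection()
replace_by_source(episodic, "memcell-1", [{"text": "a"}, {"text": "b"}])
replaced = replace_by_source(episodic, "memcell-1", [{"text": "a"}, {"text": "b"}])
```

In a real deployment the delete and insert would need to run against each backing store, and a failure between the two steps would have to be handled (for example by retrying the whole replace), since the sketch above assumes both steps succeed.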
For foresight: add a scheduled cleanup task that periodically removes expired records from all three stores (MongoDB, Elasticsearch, and Milvus).
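The cleanup task could look roughly like the sketch below. It assumes each store can be wrapped behind a small `delete_expired(now)` method; that interface, the `InMemoryForesightStore` class, and `cleanup_expired_foresight` are hypothetical names for illustration, not existing project code. The only field taken from the issue is `end_time`, and a record is treated as expired once `end_time` is at or before the current time.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


class InMemoryForesightStore:
    """Hypothetical stand-in for one backing store holding foresight records."""

    def __init__(self, records: List[Dict]) -> None:
        self.records = list(records)

    def delete_expired(self, now: datetime) -> int:
        """Drop records whose end_time is at or before `now`; return the count removed."""
        before = len(self.records)
        self.records = [r for r in self.records if r["end_time"] > now]
        return before - len(self.records)


def cleanup_expired_foresight(
    stores: Dict[str, InMemoryForesightStore], now: Optional[datetime] = None
) -> Dict[str, int]:
    """Run the expiry delete on every store and report deletions per store."""
    if now is None:
        now = datetime.now(timezone.utc)
    return {name: store.delete_expired(now) for name, store in stores.items()}


# Example run with a fixed clock so the outcome does not depend on the current time.
now = datetime(2025, 1, 10, tzinfo=timezone.utc)
expired = {"id": 1, "end_time": datetime(2025, 1, 1, tzinfo=timezone.utc)}
valid = {"id": 2, "end_time": datetime(2025, 2, 1, tzinfo=timezone.utc)}
stores = {
    "mongodb": InMemoryForesightStore([expired, valid]),
    "elasticsearch": InMemoryForesightStore([expired, valid]),
    "milvus": InMemoryForesightStore([valid]),
}
deleted = cleanup_expired_foresight(stores, now=now)
```

How the function is triggered (cron, an in-process scheduler, or a worker queue) is left open here; the sketch only covers the per-run logic and returns per-store counts so that a run can be logged or monitored.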
Expected Outcome
Retrieval results no longer contain semantic duplicates from repeated memorization.
Foresight storage stays lean and only contains currently valid predictions.
Happy to submit a PR for this if the approach looks good to the maintainers.