
Memory write pipeline: add deduplication for episodic/event_log and expiry cleanup for foresight #95

@dugubuyan

Description

Problem

The current memory write pipeline is append-only across all three vector collections. This leads to two issues:

  1. Duplicate records in episodic_memory and event_log

When the same MemCell is processed more than once, both collections accumulate duplicate entries with the same source. There is no deduplication at any layer. This results in redundant storage and retrieval returning semantically duplicate results, which degrades ranking quality.

  2. Stale foresight records are never removed

ForesightRecord has a validity window (start_time / end_time). Once expired, records are filtered out at query time but never actually deleted from storage. Over time this accumulates dead data across MongoDB, Elasticsearch, and Milvus.

Proposed Fix

For episodic_memory and event_log: add a delete-before-insert step so that re-processing the same source always replaces the old records rather than appending.
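The delete-before-insert step could be sketched as follows. This is a minimal illustration against an in-memory stand-in for a collection; `Collection`, `delete_by_source`, and `write_memcell` are hypothetical names, and the real stores (Milvus, etc.) would expose equivalent delete-by-filter and insert operations.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Collection:
    """In-memory stand-in for a vector collection (episodic_memory, event_log)."""
    records: List[dict] = field(default_factory=list)

    def delete_by_source(self, source: str) -> int:
        # Delete-by-filter: drop every record previously written for this source.
        before = len(self.records)
        self.records = [r for r in self.records if r.get("source") != source]
        return before - len(self.records)

    def insert(self, rows: List[dict]) -> None:
        self.records.extend(rows)

def write_memcell(collections: Dict[str, Collection],
                  source: str, rows: List[dict]) -> None:
    """Delete-before-insert: re-processing the same source replaces the
    old records rather than appending duplicates."""
    for coll in collections.values():
        coll.delete_by_source(source)
        coll.insert([dict(r, source=source) for r in rows])
```

With this in place, processing the same MemCell twice leaves exactly one copy of its rows in each collection instead of two.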
For foresight: add a scheduled cleanup task that periodically removes expired records from all three stores.
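The scheduled cleanup could look roughly like this. Plain lists stand in for the three backing stores, and `cleanup_expired` is a hypothetical helper; in practice each backend (MongoDB, Elasticsearch, Milvus) supports its own delete-by-filter on the `end_time` field, and the function would be invoked periodically (e.g. from a cron job or a scheduler thread).

```python
import time
from typing import List, Optional

def cleanup_expired(stores: List[list], now: Optional[float] = None) -> int:
    """Remove ForesightRecord-style dicts whose validity window has ended.

    `stores` is a list of record containers (one per backend); each record
    is assumed to carry an `end_time` epoch timestamp. Returns the total
    number of records removed across all stores.
    """
    now = time.time() if now is None else now
    removed = 0
    for store in stores:
        expired = [r for r in store if r["end_time"] < now]
        for r in expired:
            store.remove(r)
        removed += len(expired)
    return removed
```

Running this on a schedule turns the current query-time filtering into actual deletion, so expired predictions stop accumulating as dead data.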

Expected Outcome

Retrieval results no longer contain semantic duplicates from repeated memorization.
Foresight storage stays lean and only contains currently valid predictions.

Happy to submit a PR for this if the approach looks good to maintainers.
