
Memory write pipeline: add deduplication for episodic/event_log and expiry cleanup for foresight #95

@dugubuyan

Description

Problem

The current memory write pipeline is append-only across all three vector collections. This leads to two issues:

  1. Duplicate records in episodic_memory and event_log

When the same MemCell is processed more than once, both collections accumulate duplicate entries with the same source. There is no deduplication at any layer. This results in redundant storage and retrieval returning semantically duplicate results, which degrades ranking quality.

  2. Stale foresight records are never removed

ForesightRecord has a validity window (start_time / end_time). Once expired, records are filtered out at query time but never actually deleted from storage. Over time this accumulates dead data across MongoDB, Elasticsearch, and Milvus.

Proposed Fix

For episodic_memory and event_log: add a delete-before-insert step so that re-processing the same source always replaces the old records rather than appending.
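The delete-before-insert step could be sketched as follows. This is a minimal illustration against an in-memory stand-in for a collection; `Collection`, `delete_by_source`, and `write_memcell` are hypothetical names, and the real stores (Milvus, etc.) would expose equivalent delete-by-filter and insert operations.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Collection:
    """In-memory stand-in for a vector collection (episodic_memory, event_log)."""
    records: List[dict] = field(default_factory=list)

    def delete_by_source(self, source: str) -> int:
        # Delete-by-filter: drop every record previously written for this source.
        before = len(self.records)
        self.records = [r for r in self.records if r.get("source") != source]
        return before - len(self.records)

    def insert(self, rows: List[dict]) -> None:
        self.records.extend(rows)

def write_memcell(collections: Dict[str, Collection],
                  source: str, rows: List[dict]) -> None:
    """Delete-before-insert: re-processing the same source replaces the
    old records rather than appending duplicates."""
    for coll in collections.values():
        coll.delete_by_source(source)
        coll.insert([dict(r, source=source) for r in rows])
```

With this in place, processing the same MemCell twice leaves exactly one copy of its rows in each collection instead of two.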
For foresight: add a scheduled cleanup task that periodically removes expired records from all three stores.
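The scheduled cleanup could look roughly like this. Plain lists stand in for the three backing stores, and `cleanup_expired` is a hypothetical helper; in practice each backend (MongoDB, Elasticsearch, Milvus) supports its own delete-by-filter on the `end_time` field, and the function would be invoked periodically (e.g. from a cron job or a scheduler thread).

```python
import time
from typing import List, Optional

def cleanup_expired(stores: List[list], now: Optional[float] = None) -> int:
    """Remove ForesightRecord-style dicts whose validity window has ended.

    `stores` is a list of record containers (one per backend); each record
    is assumed to carry an `end_time` epoch timestamp. Returns the total
    number of records removed across all stores.
    """
    now = time.time() if now is None else now
    removed = 0
    for store in stores:
        expired = [r for r in store if r["end_time"] < now]
        for r in expired:
            store.remove(r)
        removed += len(expired)
    return removed
```

Running this on a schedule turns the current query-time filtering into actual deletion, so expired predictions stop accumulating as dead data.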

Expected Outcome

Retrieval results no longer contain semantic duplicates from repeated memorization.
Foresight storage stays lean and only contains currently valid predictions.

Happy to submit a PR for this if the approach looks good to maintainers.
