Noise filter misses structural memory contamination at write time (System traces / raw blobs / fragments)

## Summary
Noise filtering in `memory-lancedb-pro` appears to be too narrow at write time: it can block greetings / denials / meta-questions, but still allows structurally noisy memories into LanceDB, such as:
- `System:` / compaction / model-switch traces
- long raw quoted conversation blobs
- concatenated fragment strings / malformed snippets
- low-value, near-duplicate user utterances that were never distilled into atomic memory entries

This makes the retrieval engine look “healthy” while the memory corpus itself slowly accumulates contaminants.

## Observed Symptoms
In real-world usage, recent memories included examples like:
- quoted raw user text stored nearly verbatim
- system traces such as compaction/model-switch remnants ending up in memory rows
- concatenated fragment-like entries such as mixed filename/text shards

These were not simple greeting/boilerplate cases, so the current `noise-filter.ts` patterns did not catch them.

## Why this matters
The plugin’s retrieval stack is strong (hybrid retrieval, rerank, decay, normalization, diversity), but corpus quality still depends on ingress quality.
If ingress filtering is too weak, the plugin can remain operational while recall quality degrades over time because the stored memories are not clean / atomic / semantically stable.

## Current likely gap
From reading the code/docs, the likely ingress points are:
1. `src/tools.ts` → `memory_store`
2. `index.ts` auto-capture path (`agent_end` hook) before final persistence

`src/noise-filter.ts` currently focuses on:
- greetings / boilerplate
- denials
- meta-questions

But it does **not** appear to explicitly reject:
- `System:`-prefixed traces or internal runtime artifacts
- compaction / session-management / model-switch messages
- overly long raw conversation blobs
- malformed concatenated fragments / accidental shards
- “not yet distilled” entries that should have been compressed into a short fact/decision/preference memory instead of being stored verbatim

## Suggestion
Consider adding a stricter **ingress hygiene layer** before persistence, applied both to manual tool-store and auto-capture:
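
One way to keep the two write paths consistent is to route both through a single gate function. The sketch below is only an illustration: `IngressCheck`, `runIngressGate`, and the two sample checks are hypothetical names, not existing plugin APIs, and the 500-character limit is a placeholder.

```typescript
// A check returns a rejection reason, or null when the text passes.
// All names in this block are hypothetical and exist only for illustration.
type IngressCheck = (text: string) => string | null;

interface IngressDecision {
  accepted: boolean;
  reasons: string[];
}

// Runs every check and collects all rejection reasons, so a rejected
// write can be logged with the full list instead of only the first hit.
function runIngressGate(text: string, checks: readonly IngressCheck[]): IngressDecision {
  const reasons: string[] = [];
  for (const check of checks) {
    const reason = check(text);
    if (reason !== null) {
      reasons.push(reason);
    }
  }
  return { accepted: reasons.length === 0, reasons };
}

// Two sample checks; real ones would be driven by config.
const rejectSystemPrefix: IngressCheck = (text) =>
  /^\s*System:/i.test(text) ? "system_artifact" : null;

const rejectOverlong: IngressCheck = (text) =>
  text.length > 500 ? "too_long" : null;

const defaultChecks: readonly IngressCheck[] = [rejectSystemPrefix, rejectOverlong];
```

Under this assumption, the `memory_store` handler and the `agent_end` auto-capture path would each call `runIngressGate(candidateText, defaultChecks)` immediately before persisting, so a rule added once applies to both.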

### 1) Source/artifact rejection
Reject entries matching patterns like:
- `^System:`
- compaction / model switched / session reset / tool transcript artifacts
- known internal control markers / tags
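
A minimal sketch of such a rejection check follows. The pattern list is an assumption: only `^System:` comes from the list above, while the bracketed markers and the `tool_call` / `tool_result` tags are hypothetical stand-ins that should be replaced with strings taken from actual contaminated rows.

```typescript
// Patterns are anchored to the start of the text so that a legitimate
// memory which merely mentions "compaction" mid-sentence is not rejected.
// Everything except the `System:` prefix is a hypothetical example.
const ARTIFACT_PATTERNS: readonly RegExp[] = [
  /^\s*System:/i,
  /^\s*\[?\s*(?:compaction|model switched|session reset)\b/i,
  /^\s*<\/?(?:tool_call|tool_result)\b/i,
];

// Returns true when the text looks like a runtime artifact rather than
// something a user or agent meant to remember.
function isSystemArtifact(text: string): boolean {
  return ARTIFACT_PATTERNS.some((pattern) => pattern.test(text));
}
```

None of the patterns use the `g` flag, so `RegExp.prototype.test` stays stateless across calls.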

### 2) Atomicity / length gate
Reject the entry, or require it to be transformed first, when the text:
- exceeds a configurable character threshold
- contains multi-sentence raw dialogue or quote blocks
- shows suspicious concatenation signatures, repeated quote wrapping, or filename-shard blobs
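
The heuristics above could be sketched roughly as follows. Every threshold and pattern here is an assumption chosen for illustration (the 500-character default, the speaker labels, the file-extension list, the 50% shard ratio), and `checkAtomicity` is a hypothetical name.

```typescript
interface AtomicityResult {
  ok: boolean;
  reason?: "too_long" | "conversation_blob" | "repeated_quote_wrapping" | "filename_shards";
}

// Heuristic gate: returns the first structural problem found, if any.
function checkAtomicity(text: string, maxLength: number = 500): AtomicityResult {
  const trimmed = text.trim();

  if (trimmed.length > maxLength) {
    return { ok: false, reason: "too_long" };
  }

  // Two or more lines that start with a speaker label or a markdown
  // quote marker suggest a raw conversation excerpt.
  const dialogueLines = trimmed
    .split(/\r?\n/)
    .filter((line) => /^\s*(?:>|(?:user|assistant|human|ai)\s*:)/i.test(line)).length;
  if (dialogueLines >= 2) {
    return { ok: false, reason: "conversation_blob" };
  }

  // Text that opens with doubled quote characters was probably
  // re-quoted by an earlier capture step.
  if (/^["'“”]{2,}/.test(trimmed)) {
    return { ok: false, reason: "repeated_quote_wrapping" };
  }

  // Mostly filename-like tokens with little prose around them.
  const tokens = trimmed.split(/\s+/);
  const fileLike = tokens.filter((token) =>
    /^[\w./-]+\.(?:ts|js|json|md|py|txt)[,;:]?$/i.test(token),
  ).length;
  if (fileLike >= 3 && fileLike / tokens.length >= 0.5) {
    return { ok: false, reason: "filename_shards" };
  }

  return { ok: true };
}
```

Returning a reason instead of a bare boolean leaves room for the "require transformation" option: a caller could send `too_long` or `conversation_blob` entries to a distillation step rather than dropping them.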

### 3) Distillation gate
For auto-capture especially, require candidate memory items to resemble atomic memory forms, e.g.:
- preference
- fact (pitfall/cause/fix/prevention)
- decision principle
- entity

Anything that does not fit one of these forms should be distilled first rather than carried over near-verbatim from the conversation.
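
As a rough illustration, a shape check could require each candidate to declare one of the forms listed above and to stay within a small sentence budget. The `kind` field, the `MemoryCandidate` type, and the two-sentence limit are all assumptions made for this sketch; the plugin's real entry schema may differ.

```typescript
// Hypothetical candidate shape; `kind` may be missing on raw captures.
interface MemoryCandidate {
  kind?: string;
  text: string;
}

const ATOMIC_KINDS: ReadonlySet<string> = new Set([
  "preference",
  "fact",
  "decision",
  "entity",
]);

// Accepts a candidate only if it declares a known kind and its text is
// between one and `maxSentences` sentences long. Sentence splitting is a
// naive punctuation heuristic and will miscount abbreviations.
function hasAtomicShape(candidate: MemoryCandidate, maxSentences: number = 2): boolean {
  if (candidate.kind === undefined || !ATOMIC_KINDS.has(candidate.kind)) {
    return false;
  }
  const sentences = candidate.text
    .split(/[.!?]+(?:\s+|$)/)
    .filter((part) => part.trim().length > 0);
  return sentences.length >= 1 && sentences.length <= maxSentences;
}
```

A stricter variant could ask the extraction step to emit structured fields per kind (for example cause and fix for a pitfall), but that goes beyond a write-time filter.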

### 4) Optional config flags
Example ideas:
- `store.rejectSystemArtifacts: true`
- `store.maxRawLength: 500`
- `store.requireAtomicMemoryShape: true`
- `store.rejectConversationBlob: true`
- `store.rejectMalformedFragments: true`
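
Expressed as a typed config block, the flags might look like the sketch below. The field names and values mirror the examples above; the `StoreHygieneConfig` type and the `resolveStoreHygiene` helper are hypothetical, and whether these checks should be on by default or opt-in is left open.

```typescript
interface StoreHygieneConfig {
  rejectSystemArtifacts: boolean;
  maxRawLength: number;
  requireAtomicMemoryShape: boolean;
  rejectConversationBlob: boolean;
  rejectMalformedFragments: boolean;
}

// Values copied from the example list; not existing plugin defaults.
const DEFAULT_STORE_HYGIENE: StoreHygieneConfig = {
  rejectSystemArtifacts: true,
  maxRawLength: 500,
  requireAtomicMemoryShape: true,
  rejectConversationBlob: true,
  rejectMalformedFragments: true,
};

// Merges user overrides onto the defaults and validates the length limit.
function resolveStoreHygiene(
  overrides: Partial<StoreHygieneConfig> = {},
): StoreHygieneConfig {
  const merged: StoreHygieneConfig = { ...DEFAULT_STORE_HYGIENE, ...overrides };
  if (!Number.isInteger(merged.maxRawLength) || merged.maxRawLength <= 0) {
    throw new RangeError("store.maxRawLength must be a positive integer");
  }
  return merged;
}
```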

### 5) Post-write verification hook (optional)
Add an optional callback or validation stage that checks whether a stored item is likely retrievable and non-noisy before it is accepted permanently.
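
One possible shape for such a hook is write, verify, then roll back on failure. The `MemoryStore` interface below is a deliberately small stand-in invented for this sketch, not the plugin's or LanceDB's actual API, and `retrievableBySelfQuery` is just one example of a verifier.

```typescript
interface StoredMemory {
  id: string;
  text: string;
}

// Minimal store surface assumed for this sketch only.
interface MemoryStore {
  add(text: string): Promise<StoredMemory>;
  search(query: string, limit: number): Promise<StoredMemory[]>;
  remove(id: string): Promise<void>;
}

type PostWriteVerifier = (entry: StoredMemory, store: MemoryStore) => Promise<boolean>;

// Writes the entry, runs the verifier, and removes the entry again if the
// verifier rejects it or throws. Returns null when the write was rolled back.
async function storeWithVerification(
  store: MemoryStore,
  text: string,
  verify: PostWriteVerifier,
): Promise<StoredMemory | null> {
  const entry = await store.add(text);
  let accepted = false;
  try {
    accepted = await verify(entry, store);
  } catch {
    accepted = false; // a failing verifier counts as a rejection
  }
  if (!accepted) {
    await store.remove(entry.id);
    return null;
  }
  return entry;
}

// Example verifier: the entry should appear in the top results when
// queried with its own text.
const retrievableBySelfQuery: PostWriteVerifier = async (entry, store) => {
  const hits = await store.search(entry.text, 3);
  return hits.some((hit) => hit.id === entry.id);
};
```

Because the entry is briefly visible between the write and the rollback, a staging table or a pending flag may be a safer design if concurrent readers matter.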

## Key point
This seems less like a retrieval problem and more like an **ingress validation / corpus hygiene** problem.

The current noise filter is useful, but from field behavior it feels optimized for **surface-level low-quality chatter**, not **structural memory contamination**.

If helpful, I can also prepare a concrete patch proposal for:
- `src/tools.ts` write-time validation
- `index.ts` auto-capture final gate
- new `noise-filter.ts` heuristics for system artifacts / raw blob rejection

