feat: retrieval correctness fixes + contextual SupportInfo, adapted to the master architecture by 398618101 · Pull Request #161 · CortexReach/memory-lancedb-pro

398618101 · 2026-03-11T04:32:25Z

Overview

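This PR combines two improvements originally targeted at the main branch (#80 retrieval correctness, #160 contextual SupportInfo). It rewrites them for the master branch's OpenViking three-layer architecture (L0/L1/L2 + tier + decay) without modifying OpenViking's core structure.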

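Note: #80 has already been merged into main, and #160 has been closed. Because the master and main architectures differ substantially, this PR is a from-scratch rewrite on top of master, not a cherry-pick.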

I. Retrieval correctness fixes

This part fixes vector search correctness and adds FTS diagnostics without changing the memory schema or the Smart Memory lifecycle.

1. Fix vector search correctness (`src/store.ts`)

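`vectorSearch()` now explicitly sets `.distanceType('cosine')`
Fixes a critical issue: LanceDB uses L2 distance by default. With high-dimensional embeddings this produced scores of ≈ 0.0005, so the `minScore` threshold wrongly filtered out valid memories.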
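The effect of the metric can be shown without LanceDB. The following minimal sketch is not code from `src/store.ts`. It compares L2 and cosine distance for two vectors that point in the same direction but differ in length. It then converts distance into a score with `1 / (1 + d)`, which is an assumed mapping chosen only for illustration. In the store itself, the actual change is the `.distanceType('cosine')` call on the vector query.

```typescript
// Euclidean (L2) distance: sensitive to vector magnitude.
function l2Distance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Cosine distance = 1 - cosine similarity: depends only on direction.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical distance-to-score mapping, used only for this illustration.
function toScore(distance: number): number {
  return 1 / (1 + distance);
}

const query = [3, 4];
const memory = [30, 40]; // same direction, 10x the magnitude

const l2Score = toScore(l2Distance(query, memory)); // 1 / 46
const cosineScore = toScore(cosineDistance(query, memory)); // 1
const minScore = 0.3; // hypothetical threshold
console.log(l2Score >= minScore, cosineScore >= minScore); // prints "false true"
```

With this mapping, the filter rejects the same memory under L2 and accepts it under cosine. This mirrors the `minScore` filtering described above.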

2. FTS diagnostics (`src/store.ts`)

Adds `lastFtsError`, `getFtsStatus()`, and `rebuildFtsIndex()`
Makes the health of the BM25/FTS full-text index observable and recoverable
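The diagnostics pattern can be sketched as below, assuming a synchronous index-building callback. Only the names `lastFtsError`, `getFtsStatus()`, and `rebuildFtsIndex()` come from this PR. The `FtsDiagnostics` class, the `runFts` wrapper, and the `FtsStatus` shape are hypothetical.

```typescript
// Hypothetical status shape; the real getFtsStatus() may return other fields.
interface FtsStatus {
  available: boolean;
  lastError: string | null;
}

function errorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

class FtsDiagnostics {
  private lastFtsError: string | null = null;
  private readonly createIndex: () => void;

  constructor(createIndex: () => void) {
    this.createIndex = createIndex;
  }

  // Runs an FTS query; on failure, records the error and returns the fallback
  // instead of throwing or silently discarding the cause.
  runFts<T>(query: () => T, fallback: T): T {
    try {
      const result = query();
      this.lastFtsError = null;
      return result;
    } catch (err) {
      this.lastFtsError = errorMessage(err);
      return fallback;
    }
  }

  getFtsStatus(): FtsStatus {
    return { available: this.lastFtsError === null, lastError: this.lastFtsError };
  }

  // Rebuilds the index and reports the resulting status.
  rebuildFtsIndex(): FtsStatus {
    try {
      this.createIndex();
      this.lastFtsError = null;
    } catch (err) {
      this.lastFtsError = errorMessage(err);
    }
    return this.getFtsStatus();
  }
}

const fts = new FtsDiagnostics(() => {
  /* stand-in for the real index build */
});
const hits = fts.runFts<string[]>(() => {
  throw new Error("FTS index not found");
}, []);
console.log(hits.length, fts.getFtsStatus());
console.log(fts.rebuildFtsIndex().available);
```

Keeping the last error instead of swallowing it is what would let a command such as `reindex-fts` report whether a rebuild actually succeeded.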

3. CLI diagnostics (`cli.ts`)

CLI retrievals are tagged with `source: "cli"`
Adds a `reindex-fts` command

4. Telemetry source (`src/retriever.ts`)

The `source` type is extended with `"cli"`
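To show what the `source` tag buys, here is a hypothetical sketch. Only `"cli"` comes from this PR. The other union members, the `RetrievalTrace` shape, and `countBySource` are placeholders. Tagging each trace lets CLI debugging runs be separated from agent-driven recalls.

```typescript
// Only "cli" is confirmed by this PR; the other members are placeholders.
type RetrievalSource = "auto-recall" | "manual" | "cli";

// Hypothetical trace shape.
interface RetrievalTrace {
  source: RetrievalSource;
  query: string;
  resultCount: number;
}

// Counts traces per source so CLI sessions can be kept apart from
// agent-driven recall when analyzing retrieval behavior.
function countBySource(traces: RetrievalTrace[]): Record<RetrievalSource, number> {
  const counts: Record<RetrievalSource, number> = { "auto-recall": 0, manual: 0, cli: 0 };
  for (const trace of traces) counts[trace.source] += 1;
  return counts;
}

const recent: RetrievalTrace[] = [
  { source: "auto-recall", query: "drink preference", resultCount: 3 },
  { source: "cli", query: "drink preference", resultCount: 3 },
  { source: "cli", query: "running", resultCount: 0 },
];
console.log(countBySource(recent));
```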

Regression tests

Adds test/vector-search-cosine.test.mjs (all 4 cases pass)

II. Contextual SupportInfo

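SupportInfo is upgraded from a single global scalar to preference evidence recorded per context slice (`ContextualSupport[]`), which enables contextual memory. During dedup, the LLM outputs a `context_label`. The write path carries this label to `updateSupportStats`, so the system can tell a contextual preference such as "prefers tea in the evening" apart from "generally likes lattes".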
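The following is a minimal sketch of slice-based support tracking. It is loosely modeled on `updateSupportStats` but is not its actual code. The field names follow this PR. The `recordObservation` helper and the Laplace-smoothed strength formula are assumptions; the formula yields 0.5 with no evidence, which matches the documented default.

```typescript
interface ContextualSupport {
  context: string;
  confirmations: number;
  contradictions: number;
  strength: number;
  last_observed_at: number;
}

interface SupportInfoV2 {
  global_strength: number;
  total_observations: number;
  slices: ContextualSupport[];
}

// Hypothetical Laplace smoothing: 0.5 with no evidence, moving toward 1 or 0.
function smoothedStrength(confirmations: number, contradictions: number): number {
  return (confirmations + 1) / (confirmations + contradictions + 2);
}

// Records one observation in the slice for `context` (created on first use),
// then recomputes the global aggregate over all slices. Returns a new object.
function recordObservation(
  info: SupportInfoV2,
  context: string,
  kind: "support" | "contradict",
  now: number,
): SupportInfoV2 {
  const slices = info.slices.map((s) => ({ ...s }));
  let slice = slices.find((s) => s.context === context);
  if (!slice) {
    slice = { context, confirmations: 0, contradictions: 0, strength: 0.5, last_observed_at: now };
    slices.push(slice);
  }
  if (kind === "support") slice.confirmations += 1;
  else slice.contradictions += 1;
  slice.strength = smoothedStrength(slice.confirmations, slice.contradictions);
  slice.last_observed_at = now;

  const conf = slices.reduce((n, s) => n + s.confirmations, 0);
  const contra = slices.reduce((n, s) => n + s.contradictions, 0);
  return { global_strength: smoothedStrength(conf, contra), total_observations: conf + contra, slices };
}

// A "likes lattes" memory: confirmed twice in general, contradicted once in the evening.
let latte: SupportInfoV2 = { global_strength: 0.5, total_observations: 0, slices: [] };
latte = recordObservation(latte, "general", "support", 1);
latte = recordObservation(latte, "general", "support", 2);
latte = recordObservation(latte, "evening", "contradict", 3);
console.log(latte.global_strength, latte.slices.map((s) => `${s.context}:${s.strength}`));
```

The evening slice lowers its own strength (to 1/3) without erasing the general preference (0.75). The global figure (0.6) blends both.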

Schema upgrade (`src/smart-metadata.ts`)

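SupportInfo → {global_strength, total_observations, slices: ContextualSupport[]}
Predefined SUPPORT_CONTEXT_VOCABULARY (11 labels)
normalizeContext() normalizes labels, mapping Chinese and English forms to one label (e.g. "晚上" → "evening")
MAX_SUPPORT_SLICES = 8 to prevent bloat
parseSupportInfo stays backward compatible from V1 to V2 (V1 data is migrated automatically to a general slice)
Fixes a type error: LegacyStoreCategory was missing reflection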
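The normalization step can be sketched as follows. The vocabulary here is an illustrative subset, not the PR's 11 labels. Apart from `general`, `evening`, and `afternoon` (the last added after review), the labels and aliases are assumptions. The policy of mapping unknown labels to `general` is also an assumption.

```typescript
// Illustrative subset; the real SUPPORT_CONTEXT_VOCABULARY has 11 labels.
const SUPPORT_CONTEXT_VOCABULARY = ["general", "morning", "afternoon", "evening", "weekend"] as const;
type SupportContext = (typeof SUPPORT_CONTEXT_VOCABULARY)[number];

// Chinese aliases; only 晚上 → evening and 下午 → afternoon come from the PR.
const CONTEXT_ALIASES = new Map<string, SupportContext>([
  ["晚上", "evening"],
  ["下午", "afternoon"],
  ["早上", "morning"],
  ["周末", "weekend"],
]);

// Maps a free-form LLM label onto the vocabulary. Unknown labels fall back to
// "general" (hypothetical policy) so the number of distinct slices stays bounded.
function normalizeContext(label: string | null | undefined): SupportContext {
  if (!label) return "general";
  const trimmed = label.trim();
  const alias = CONTEXT_ALIASES.get(trimmed);
  if (alias) return alias;
  const lower = trimmed.toLowerCase();
  return (SUPPORT_CONTEXT_VOCABULARY as readonly string[]).includes(lower)
    ? (lower as SupportContext)
    : "general";
}

console.log(normalizeContext("晚上"), normalizeContext(" Evening "), normalizeContext("on holiday"));
```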

Write-side runtime propagation

src/extraction-prompts.ts: the dedup prompt now outputs context_label and includes the vocabulary rules
src/memory-categories.ts: DedupResult gains a contextLabel field; DedupDecision is extended with support/contextualize/contradict
src/smart-extractor.ts: processCandidate gains 3 new case branches; adds the handleSupport/handleContextualize/handleContradict handlers
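On the write side, the LLM reply has to be turned into a `DedupResult` that carries `contextLabel`. Here is a hypothetical parsing sketch. The decision names, `VALID_DECISIONS`, and `context_label` come from this PR. The `decision` and `match_index` JSON keys and `parseDedupReply` itself are assumptions. The sketch also shows the fallback-to-`create` safeguard listed in the compatibility section.

```typescript
type DedupDecision = "create" | "merge" | "skip" | "support" | "contextualize" | "contradict";

const VALID_DECISIONS = new Set<string>([
  "create", "merge", "skip", "support", "contextualize", "contradict",
]);

interface DedupResult {
  decision: DedupDecision;
  matchIndex?: number; // hypothetical: which existing memory the decision refers to
  contextLabel?: string;
}

// Parses the raw LLM reply. Anything malformed or unrecognized falls back to
// "create", so a confused model can add a memory but never drop one.
function parseDedupReply(raw: string): DedupResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { decision: "create" }; // unparseable reply
  }
  if (typeof parsed !== "object" || parsed === null) return { decision: "create" };
  const obj = parsed as Record<string, unknown>;
  const rawDecision = obj.decision;
  const decision = typeof rawDecision === "string" ? rawDecision.trim().toLowerCase() : "";
  if (!VALID_DECISIONS.has(decision)) return { decision: "create" };
  const matchIndex = obj.match_index;
  const contextLabel = obj.context_label;
  return {
    decision: decision as DedupDecision,
    matchIndex: typeof matchIndex === "number" ? matchIndex : undefined,
    contextLabel: typeof contextLabel === "string" ? contextLabel : undefined,
  };
}

console.log(parseDedupReply('{"decision":"support","match_index":0,"context_label":"晚上"}'));
console.log(parseDedupReply('{"decision":"delete"}').decision);
```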

6 dedup decisions

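Decision	Trigger	System behavior
`create`	Entirely new information	Stored directly
`merge`	Adds to an existing memory	Contents are merged
`skip`	Exact duplicate	Skipped
`support`	Reconfirms an existing preference (e.g. "still likes tea")	Updates `support_info` statistics (sliced by context); no new entry is created
`contextualize`	Situational difference (e.g. "switched to tea in the evening")	Creates a linked entry tagged with `context_label`
`contradict`	Contradicting update (e.g. "doesn't run anymore")	Records contradicting evidence + creates a new entry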
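The store writes that each decision implies can be summarized in a small dispatch sketch. These follow the table above and the complementarity table in section III. `planWrites` and the `StoreOp` type are hypothetical. The real handlers in `src/smart-extractor.ts` call `store.update()` and `storeCandidate` directly.

```typescript
type DedupDecision = "create" | "merge" | "skip" | "support" | "contextualize" | "contradict";

type StoreOp =
  | { kind: "store"; text: string; contextLabel?: string } // new entry, as via storeCandidate
  | { kind: "update"; targetId: string; change: "merge" | "support" | "contradict"; contextLabel?: string };

// Hypothetical: maps one dedup decision to the store writes it implies.
function planWrites(decision: DedupDecision, text: string, targetId: string, contextLabel?: string): StoreOp[] {
  switch (decision) {
    case "create":
      return [{ kind: "store", text }];
    case "merge":
      // Per the later commit, handleMerge also updates support stats for contextLabel.
      return [{ kind: "update", targetId, change: "merge", contextLabel }];
    case "skip":
      return [];
    case "support":
      // Only support_info on the existing entry changes; nothing new is stored.
      return [{ kind: "update", targetId, change: "support", contextLabel }];
    case "contextualize":
      // A linked entry tagged with context_label, stored on the create path.
      return [{ kind: "store", text, contextLabel }];
    case "contradict":
      // Contradicting evidence on the old entry, then the new fact as its own entry.
      return [
        { kind: "update", targetId, change: "contradict", contextLabel },
        { kind: "store", text },
      ];
    default: {
      const unreachable: never = decision;
      throw new Error(`unknown decision: ${String(unreachable)}`);
    }
  }
}

console.log(planWrites("contradict", "no longer runs", "mem-1").map((op) => op.kind));
```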

Regression tests

Adds test/smart-metadata-v2.mjs (all 6 cases pass)

III. How this PR works with the OpenViking architecture

Current master architecture overview

The master branch uses the OpenViking three-layer architecture. Its core modules are:

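Module	Responsibility
L0/L1/L2 three-layer metadata	L0 = abstract, L1 = overview, L2 = full content
TierManager	Manages memory tiers (core/working/peripheral), adjusting them dynamically by importance and access frequency
DecayEngine	Time-decay engine that lowers the priority of memories not accessed for a long time
AccessTracker	Records memory access patterns and frequency
SmartExtractor	LLM-driven extraction of 6 memory categories + 3-decision dedup
NoisePrototypeBank	Embedding-based noise detection that filters out meaningless input
ReflectionStore	Storage and generation of reflection memories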

How this PR complements each module

┌─────────────────────────────────────────────┐
│   auto-capture (agent_end hook)             │
│   ↓                                         │
│   SmartExtractor.extractAndPersist          │
│   ↓                                         │
│   processCandidate                          │
│   ├── create  → storeCandidate (unchanged)  │
│   ├── merge   → handleMerge    (unchanged)  │
│   ├── skip    → (unchanged)                 │
│   ├── support       → [new in this PR]      │
│   ├── contextualize → [new in this PR]      │
│   └── contradict    → [new in this PR]      │
│                                             │
│   write → MemoryStore.store / .update       │
│   ├── L0/L1/L2 metadata (unchanged)         │
│   ├── tier assignment   (unchanged)         │
│   ├── decay score       (unchanged)         │
│   └── support_info      [new optional field]│
│                                             │
│   retrieve → vectorSearch (cosine fix) + FTS│
│   ↓                                         │
│   AccessTracker.track (unchanged)           │
│   TierManager.maybePromote (unchanged)      │
│   DecayEngine.applyDecay (unchanged)        │
└─────────────────────────────────────────────┘

Specific points of complementarity

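Added in this PR	How it works with OpenViking
cosine distance fix	Fixes `MemoryStore.vectorSearch` directly. DecayEngine and TierManager depend on `vectorSearch`, so they benefit too, and retrieval accuracy improves
support decision	Reuses SmartExtractor's dedup flow and calls the existing `store.update()` to update metadata; creates no new entry and does not trigger TierManager promotion
contextualize decision	Creates a new entry via `storeCandidate` (the same path as create). The new entry automatically gets L0/L1/L2 metadata and a tier assignment, and DecayEngine manages it normally
contradict decision	Calls both `store.update()` (to record contradicting evidence on the original memory) and `storeCandidate` (to create a new entry); both steps use existing paths
SupportInfoV2	An optional field of `SmartMemoryMetadata`, stored under the `[key: string]: unknown` index signature; it sits alongside the L0/L1/L2 fields without interfering with them
FTS diagnostics	Independent of the OpenViking lifecycle; does not affect the tier/decay/reflection flow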

Modules not touched or modified

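❌ decay-engine.ts: not modified; decay logic unchanged
❌ tier-manager.ts: not modified; tier promotion/demotion unchanged
❌ access-tracker.ts: not modified; access tracking unchanged
❌ reflection-store.ts: not modified; reflection memory generation unchanged
❌ noise-prototypes.ts: not modified; noise detection unchanged
❌ index.ts: not modified; auto-capture/auto-recall flow unchanged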

IV. Compatibility assessment

✅ Fully compatible

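Item	Notes
Backward compatibility	Old memories without a `support_info` field → `parseSupportInfo` returns the default (strength=0.5); reads and writes work normally
V1 auto-migration	Old format `{confirmations, contradictions}` → automatically converted to V2 `{slices: [{context:"general",...}]}`
Type safety	Fixes the `LegacyStoreCategory` TS error caused by the missing `reflection` (a pre-existing type inconsistency on master)
store interface	`store()` / `update()` signatures unchanged; new fields are passed through in the metadata JSON
Retrieval interface	`vectorSearch()` / `hybridSearch()` return types unchanged; the cosine fix only affects the internal distance computation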
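The sketch below shows `parseSupportInfo`'s read path. It covers the default for a missing field, the V1→V2 migration, and the slice-field validation added after review. Several parts are assumptions, not the PR's code: the smoothing formula, the clamp/drop policy, the `0` timestamp placeholder for migrated V1 data, and recomputing the aggregates from the retained slices.

```typescript
interface ContextualSupport {
  context: string;
  confirmations: number;
  contradictions: number;
  strength: number;
  last_observed_at: number;
}

interface SupportInfoV2 {
  global_strength: number;
  total_observations: number;
  slices: ContextualSupport[];
}

// Non-negative finite number, else the fallback.
function nonNegative(v: unknown, fallback: number): number {
  return typeof v === "number" && Number.isFinite(v) && v >= 0 ? v : fallback;
}

// Hypothetical Laplace smoothing: 0.5 when there is no evidence.
function smoothed(confirmations: number, contradictions: number): number {
  return (confirmations + 1) / (confirmations + contradictions + 2);
}

// Validates one stored slice; returns null when it cannot be salvaged.
function sanitizeSlice(raw: unknown): ContextualSupport | null {
  if (typeof raw !== "object" || raw === null) return null;
  const s = raw as Record<string, unknown>;
  const context = s.context;
  if (typeof context !== "string" || context.length === 0) return null;
  const confirmations = Math.floor(nonNegative(s.confirmations, 0));
  const contradictions = Math.floor(nonNegative(s.contradictions, 0));
  return {
    context,
    confirmations,
    contradictions,
    strength: Math.min(1, nonNegative(s.strength, smoothed(confirmations, contradictions))),
    last_observed_at: nonNegative(s.last_observed_at, 0), // 0 = unknown (placeholder)
  };
}

function parseSupportInfo(raw: unknown): SupportInfoV2 {
  const empty: SupportInfoV2 = { global_strength: 0.5, total_observations: 0, slices: [] };
  if (typeof raw !== "object" || raw === null) return empty; // no support_info yet
  const obj = raw as Record<string, unknown>;
  const rawSlices = obj.slices;

  let slices: ContextualSupport[];
  if (Array.isArray(rawSlices)) {
    // V2: keep only slices that pass validation.
    slices = rawSlices.map(sanitizeSlice).filter((s): s is ContextualSupport => s !== null);
  } else if ("confirmations" in obj || "contradictions" in obj) {
    // V1 { confirmations, contradictions } becomes a single "general" slice.
    const migrated = sanitizeSlice({ ...obj, context: "general", strength: undefined });
    slices = migrated ? [migrated] : [];
  } else {
    return empty;
  }

  // Aggregates recomputed from retained slices (the real code may read stored totals).
  const conf = slices.reduce((n, s) => n + s.confirmations, 0);
  const contra = slices.reduce((n, s) => n + s.contradictions, 0);
  return { global_strength: smoothed(conf, contra), total_observations: conf + contra, slices };
}

console.log(parseSupportInfo({ confirmations: 3, contradictions: 1 }));
```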

⚠️ Behavior changes to note

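Item	Impact	Risk
Score change after the cosine fix	Scores rise from ≈0.0005 to the normal range (0.7 to 0.95); memories that were previously filtered out can now be retrieved	Positive change; better user experience
Expanded dedup decisions	The LLM may return `support`/`contextualize`/`contradict`, which are additions to VALID_DECISIONS	Compatible: unrecognized decisions fall back to `create` (existing safeguard)
Extraction prompt change	The dedup prompt is longer (it describes the 3 new decisions)	May increase LLM token usage, but the increment is small (about 200 tokens)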

❌ Incompatibilities: none

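Does not modify the database schema (LanceDB table structure unchanged)
Does not modify API interfaces (plugin registration / tool registration unchanged)
Does not modify the configuration format (pluginConfig structure unchanged)
Does not affect existing create/merge/skip behavior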

Design decisions

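Kept out of retrieval scores: support statistics are used only for metadata display and do not affect retriever ranking
P1 scope covers preferences only: extending to patterns/cases will be decided later based on validation results
JSON remains a transitional storage format: size caps serve as a safety net, and migration will be evaluated once trigger conditions are met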

Out of Scope

lifecycle/tier/upgrader changes
Schema migration
Telemetry persistence
Retrieval score reweighting

Stats

9 files changed, 704 insertions(+), 47 deletions(-)

…urce typing
- store.ts: add .distanceType('cosine') to vectorSearch (critical: L2 default drops valid results)
- store.ts: add getFtsStatus(), rebuildFtsIndex() for BM25 health diagnostics
- retriever.ts: extend source typing with 'cli' for CLI trace distinction
- cli.ts: mark CLI retrievals with source='cli', add reindex-fts command
- test: add vector-search-cosine.test.mjs (4 tests)

…ce tracking
Extends OpenViking's smart memory architecture with context-aware support:
- smart-metadata.ts: add SupportInfoV2/ContextualSupport types, normalizeContext, parseSupportInfo (V1→V2 migration), updateSupportStats; fix LegacyStoreCategory missing 'reflection'
- memory-categories.ts: extend DedupDecision with support/contextualize/contradict, add contextLabel to DedupResult, supported count to ExtractionStats
- extraction-prompts.ts: extend dedup prompt with 3 new decisions + context_label
- smart-extractor.ts: add handleSupport/handleContextualize/handleContradict handlers in processCandidate pipeline, extract contextLabel in llmDedupDecision
- test: add smart-metadata-v2.mjs (6 tests, all passing)

- smart-extractor.ts: handleMerge now accepts contextLabel and updates support stats after successful merge (aligns with support/contextualize/contradict handlers)
- smart-metadata.ts: stringifySmartMetadata caps arrays to prevent JSON bloat (sources≤20, history≤50, relations≤16)
- test/context-support-e2e.mjs: 3 E2E scenarios testing support, contextualize, and contradict decisions end-to-end
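The array caps in this commit can be illustrated with a hypothetical `stringifyCapped` helper. The limits come from the commit message. Keeping the most recent entries assumes the arrays are appended in chronological order (newest last).

```typescript
// Limits from the commit message; the helper itself is hypothetical.
const ARRAY_CAPS: Record<string, number> = { sources: 20, history: 50, relations: 16 };

// Copies the metadata, trims capped arrays to their last `limit` entries
// (assumes newest-last ordering), then serializes. The input is not mutated.
function stringifyCapped(metadata: Record<string, unknown>): string {
  const capped: Record<string, unknown> = { ...metadata };
  for (const [key, limit] of Object.entries(ARRAY_CAPS)) {
    const value = capped[key];
    if (Array.isArray(value) && value.length > limit) {
      capped[key] = value.slice(-limit);
    }
  }
  return JSON.stringify(capped);
}

const meta = {
  note: "likes lattes",
  sources: Array.from({ length: 25 }, (_, i) => `session-${i}`),
  history: ["created"],
};
const roundTrip = JSON.parse(stringifyCapped(meta));
console.log(roundTrip.sources.length, roundTrip.sources[0]); // prints "20 session-5"
```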

AliceLJY

Review verdict: fix-then-merge

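This version is much more focused than #160 and targets the right base (master): 9 files, +704/-47, with no changes to index.ts or the config schema and zero compatibility risk. The cosine fix is a genuine bug fix, and the contextual-support direction is also right.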

Blocking issues

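1. New tests are not wired into the default regression chain
test/smart-metadata-v2.mjs and test/vector-search-cosine.test.mjs are not hooked into `npm test` at package.json:38. If these two tests are the core evidence for this fix, they belong in the default test chain; otherwise CI cannot keep protecting this change.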

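2. smart-metadata-v2.mjs does not actually test production code
test/smart-metadata-v2.mjs:12 re-implements the logic inline in the test file instead of importing src/smart-metadata.ts. A test like this is closer to documentation or an illustration than to an effective regression test.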

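3. vector-search-cosine.test.mjs is not bound to the production implementation either
test/vector-search-cosine.test.mjs:58 uses a hand-rolled fakeStore.vectorSearch() and does not directly constrain the real implementation path in src/store.ts:418. Suggest at least importing the real MemoryStore, or building the test around real objects that are closer to production code.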

Suggested fixes

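4. Mapping "下午" (afternoon) to evening in the context taxonomy is inaccurate
See src/smart-metadata.ts:280. The vocabulary lacks afternoon; suggest adding it or explicitly documenting why the two are merged.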

5. parseSupportInfo does not validate the numeric fields of slices
See src/smart-metadata.ts:308. It currently only checks that context is a string. It does not validate the type or range of confirmations, contradictions, strength, or last_observed_at.

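6. Total evidence may drift over time after slice truncation
See src/smart-metadata.ts:364. Only the slices truncated in the current call are folded into the aggregate. Evidence dropped in earlier calls is not persisted, so total_observations/global_strength may gradually become inaccurate on later updates.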

7. dropIndex() silently swallows errors
See src/store.ts:975. Suggest at least logging a warning that includes the index name and the error message.
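The drift in item 6 is easiest to see with numbers. In this hypothetical sketch, `MAX_SUPPORT_SLICES = 8` comes from the PR. The recency-based ordering and the `Slice` shape are assumptions.

```typescript
const MAX_SUPPORT_SLICES = 8;

interface Slice {
  context: string;
  confirmations: number;
  contradictions: number;
  last_observed_at: number;
}

const observationsIn = (list: Slice[]): number =>
  list.reduce((n, s) => n + s.confirmations + s.contradictions, 0);

// Keeps the most recently observed slices (assumed ordering rule) and reports
// how much evidence this particular call dropped.
function truncateSlices(list: Slice[]): { kept: Slice[]; droppedObservations: number } {
  const sorted = [...list].sort((a, b) => b.last_observed_at - a.last_observed_at);
  return {
    kept: sorted.slice(0, MAX_SUPPORT_SLICES),
    droppedObservations: observationsIn(sorted.slice(MAX_SUPPORT_SLICES)),
  };
}

// Ten contexts with one confirmation each, observed at t = 1..10.
const slices: Slice[] = Array.from({ length: 10 }, (_, i) => ({
  context: `ctx${i}`,
  confirmations: 1,
  contradictions: 0,
  last_observed_at: i + 1,
}));

const { kept, droppedObservations } = truncateSlices(slices);
console.log(observationsIn(slices), observationsIn(kept), droppedObservations); // prints "10 8 2"
```

The truncating call can add its 2 dropped observations to `total_observations`. A later update that recomputes totals from the retained slices, however, sees only 8, because the evidence dropped earlier is gone. This is the drift the review describes, which the PR later documents as an accepted trade-off.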

Additional notes

On the current branch, `npm test` fails at test/smart-extractor-branches.mjs. This is not a new problem introduced by #161; it is a pre-existing LLM-dependent flaky test on master.

Blocking:
1. Add smart-metadata-v2, vector-search-cosine, context-support-e2e to npm test chain (package.json)
2. Rewrite smart-metadata-v2.mjs to import production code via jiti (normalizeContext, parseSupportInfo, updateSupportStats, etc.)
3. Rewrite vector-search-cosine.test.mjs to use real MemoryStore against temp LanceDB (no more fakeStore)
Suggestions:
4. Fix '下午' mapping: evening → afternoon (add to vocabulary)
5. parseSupportInfo: validate slice numeric fields (confirmations, contradictions, strength, last_observed_at)
6. Document slice truncation drift as accepted trade-off
7. dropIndex: log warning instead of silently swallowing errors
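Suggestion 7 amounts to a try/catch that logs instead of discarding the error. Here is a hypothetical sketch in which a synchronous `drop` callback stands in for the real index-drop call. The message format and the `dropIndexSafely` name are illustrative only.

```typescript
// Hypothetical wrapper; `drop` stands in for the real (likely async) index-drop call.
function dropIndexSafely(
  indexName: string,
  drop: (indexName: string) => void,
  warn: (message: string) => void = console.warn,
): boolean {
  try {
    drop(indexName);
    return true;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    // Still non-fatal, but no longer silent: the index name and cause are logged.
    warn(`dropIndex("${indexName}") failed: ${reason}`);
    return false;
  }
}

const warnings: string[] = [];
const dropped = dropIndexSafely(
  "text_idx",
  () => {
    throw new Error("index not found");
  },
  (m) => warnings.push(m),
);
console.log(dropped, warnings);
```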

398618101 · 2026-03-11T08:24:03Z

Addressed.

AliceLJY

Verified. Fix commit da12b2c addresses each of the 7 review comments:

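All 3 blocking issues substantively fixed (tests wired into npm test, production code imported, real MemoryStore instead of fakeStore)
All 4 suggestions implemented (下午 → afternoon, slice field validation, drift comment, dropIndex warning)
All 14 new test cases pass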

The only `npm test` failure, smart-extractor-branches.mjs:291, is the pre-existing LLM flaky test on master, not an issue with this PR.

lpf added 3 commits March 11, 2026 11:10

AliceLJY reviewed Mar 11, 2026


AliceLJY approved these changes Mar 11, 2026


398618101 wants to merge 4 commits into CortexReach:master from 398618101:feat/master-retrieval-and-context
