Merged
added 5 commits
May 3, 2026 14:31
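- fts_tokenizer.py: jieba cut_for_search, separate entry points for indexing and querying, stop-word and punctuation filtering, prefix wildcard for ASCII words
- fts_store.py: FTS5 virtual table plus a UUID mapping table, add/delete/search API, RRF fusion function
- Add jieba>=0.42.1 and pytest-benchmark>=4.0.0 dependencies
- 35 unit tests covering tokenization, indexing, querying, deletion, and RRF merging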
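For reviewers unfamiliar with RRF (reciprocal rank fusion), a minimal sketch of the idea follows. It is not the code in fts_store.py: the function name `rrf_merge`, the default `k=60` (the constant commonly used in the literature), and the tie-breaking rule are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rrf_merge(
    rankings: Sequence[Sequence[Hashable]],
    k: int = 60,        # assumed default; the PR's actual rrf_k is not stated
    top_n: int = 10,
) -> List[Hashable]:
    """Fuse several ranked ID lists with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) to an ID's score, with rank
    starting at 1. IDs that appear in several lists accumulate score.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    first_seen: Dict[Hashable, int] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
            first_seen.setdefault(doc_id, len(first_seen))
    # Highest score first; ties fall back to first-appearance order.
    ordered = sorted(scores, key=lambda d: (-scores[d], first_seen[d]))
    return ordered[:top_n]


fts_hits = ["a", "b", "c"]   # e.g. keyword recall
inner_hits = ["b", "d"]      # e.g. recall from the inner backend
merged = rrf_merge([fts_hits, inner_hits])
```

Because only ranks are used, the two recall paths do not need comparable raw scores, which is the usual reason to pick RRF when mixing keyword and semantic results.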
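…y (PR2)
- fts_wrapped.py: decorates any BaseMemory; writes are fire-and-forget; cross_session_search uses RRF to fuse the two recall paths (FTS5 and the inner backend)
- factory.py: wraps the backend automatically when the memory_fts5_enabled setting is on (off by default)
- config.py: adds memory_fts5_enabled / rrf_k / candidate_multiplier
- migrations/backfill_fts.py: full-table backfill script, idempotent, with cursor pagination
- cli/commands/memory_cmd.py: adds the `nimo memory reindex` command
- 9 integration tests covering write / search / clear / factory / backfill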
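The fire-and-forget write path can be sketched roughly as below. This is an illustration only, not fts_wrapped.py: `InMemoryBackend`, `KeywordIndex`, `WrappedMemory`, and `drain` are hypothetical stand-ins, and the real BaseMemory interface may look different.

```python
import asyncio
from typing import List, Set


class InMemoryBackend:
    """Hypothetical stand-in for an inner BaseMemory backend."""

    def __init__(self) -> None:
        self.records: List[str] = []

    async def add(self, text: str) -> None:
        self.records.append(text)


class KeywordIndex:
    """Hypothetical stand-in for the FTS store."""

    def __init__(self) -> None:
        self.indexed: List[str] = []

    async def index(self, text: str) -> None:
        await asyncio.sleep(0)  # simulate an I/O suspension point
        self.indexed.append(text)


class WrappedMemory:
    """Writes to the inner backend, then indexes in a background task."""

    def __init__(self, inner: InMemoryBackend, index: KeywordIndex) -> None:
        self._inner = inner
        self._index = index
        # Keep strong references so pending tasks are not garbage collected.
        self._pending: Set[asyncio.Task] = set()

    async def add(self, text: str) -> None:
        await self._inner.add(text)
        task = asyncio.create_task(self._safe_index(text))
        self._pending.add(task)
        task.add_done_callback(self._pending.discard)

    async def _safe_index(self, text: str) -> None:
        try:
            await self._index.index(text)
        except Exception:
            pass  # an indexing failure must not break the main write path

    async def drain(self) -> None:
        """Wait for outstanding index writes (useful in tests and shutdown)."""
        if self._pending:
            await asyncio.gather(*list(self._pending))


async def demo():
    index = KeywordIndex()
    mem = WrappedMemory(InMemoryBackend(), index)
    await mem.add("qwen3.5 timeout error")
    before = list(index.indexed)  # the background task has not run yet
    await mem.drain()
    return before, list(index.indexed)
```

Holding task references in a set matters: the event loop keeps only weak references to tasks, so a task created with `asyncio.create_task` and then dropped may be collected before it finishes.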
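- data_generator.py: synthetic mixed Chinese-English conversation corpus (fixed seed, reproducible)
- queries.py: 4 query workload types + 5 target items + 10 probes
- common.py: shared helpers for creating the database, loading data, and bulk indexing
- bench_index.py: write throughput (three modes compared: no_fts / fts_sync / fts_async)
- bench_query.py: query latency p50/p95/p99 (LIKE vs FTS5)
- bench_recall.py: recall@10 (LIKE vs FTS5)
- run_all.py: runs the full suite in one step and emits a Markdown report
- report.md: baseline results on the 10k corpus: FTS5 recall 0.9 vs LIKE 0.4, long queries 3-4x faster

Also fixes the tokenizer: '.' in an FTS5 MATCH expression is also a column qualifier, so the reserved character set was extended to keep tokens such as 'qwen3.5' from producing syntax errors.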
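One common way to keep tokens like 'qwen3.5' out of FTS5's query grammar is to wrap every token in double quotes, which FTS5 treats as a string literal. The sketch below shows that approach; it is not necessarily how fts_tokenizer.py handles its reserved character set. The function names and the choice to join tokens with OR are assumptions.

```python
from typing import Iterable, List


def quote_fts5_token(token: str) -> str:
    """Wrap a token as an FTS5 string literal.

    Embedded double quotes are doubled, SQL-style, as FTS5 requires.
    """
    return '"' + token.replace('"', '""') + '"'


def build_match_query(tokens: Iterable[str]) -> str:
    """Build a MATCH expression from already-tokenized query terms.

    Pure ASCII alphanumeric tokens get a trailing * so they act as
    prefix queries; everything else is matched exactly.
    Joining with OR is an assumption made for this sketch.
    """
    parts: List[str] = []
    for tok in tokens:
        tok = tok.strip()
        if not tok:
            continue
        quoted = quote_fts5_token(tok)
        if tok.isascii() and tok.isalnum():
            quoted += "*"  # FTS5 accepts "text"* as a prefix query
        parts.append(quoted)
    return " OR ".join(parts)


query = build_match_query(["qwen3.5", "timeout", "超时"])
```

The resulting string is meant to be passed as a bound parameter (`WHERE memory_fts MATCH ?`), never interpolated into the SQL text.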
Previously _YAML_TO_SETTINGS had no mapping for memory.fts5_enabled and the related fields,
so setting fts5_enabled: true in config.yaml had no effect (the value stayed at the default, False).
Mappings added:
- memory.fts5_enabled / fts5_rrf_k / fts5_candidate_multiplier
- memory.reme_light.{working_dir,llm_api_key,llm_base_url,
embedding_api_key,embedding_base_url,vector_weight,candidate_multiplier}
DEFAULT_CONFIG now also carries defaults for memory.fts5_* / memory.reme_light.*.
2 regression tests.
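Problems:
1. The old ReMeLight tests depended on the deprecated _session_messages buffer attribute
2. test_factory could be polluted by memory.fts5_enabled / memory.backend in the user's ~/.nimo/config.yaml

Fixes:
1. Rewrote test_reme_light_adapter.py: a real db_session exercises the SQLite path, and _reme/_in_memory are mocked to bypass real service calls; the assumption of a buffer fallback is removed
2. test_factory.py gains an autouse fixture that forces fts5_enabled off; test_create_none_uses_settings_default also explicitly forces hybrid, so the tests no longer depend on the global config

Result: tests/unit/test_memory/ + tests/unit/test_config/ all 201 green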
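Background
Implements the first piece of the Hermes Agent "self-improving" loop: precise cross-session retrieval. Alongside the existing BaseMemory, this adds an inverted index built on SQLite FTS5 with jieba Chinese tokenization, so that during cross_session_search PA can recall exact keywords from past conversations (proper nouns, configuration parameters, error messages), covering the weak spot of purely semantic retrieval. Existing APIs and the frontend are untouched, and the change is transparent to callers.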
Changes at a glance
- feat(memory): add FTS5 + jieba tokenizer and FTSStore (PR1)
- feat(memory): wire FTS5 into cross_session_search via FTSWrappedMemory (PR2)
- bench: add FTS5 benchmark suite + baseline report (PR3)
- fix(config): map nested memory.fts5_* and memory.reme_light.* yaml keys
- test(memory): fix 12 long-neglected ReMeLight tests + decouple factory from global config
Design notes
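- FTSWrappedMemory wraps any BaseMemory; the inner backend is unaware of it
- asyncio.create_task writes to FTS asynchronously, without blocking the main conversation
- cut_for_search: segments at coarse and fine granularity at the same time, balancing recall and precision
- memory_fts5_enabled=False: after upgrading, users must opt in and run nimo memory reindex once
Baseline benchmark results (10k corpus)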
See backend/benchmarks/fts/report.md for details.
Tests
- tests/unit/test_memory/test_fts_*: 44 tests, all green
- tests/unit/test_memory/test_fts_wrapped.py: 9 tests, all green
- tests/unit/test_memory/ + tests/unit/test_config/: 201 tests in total, all green
cd backend
.venv/bin/pytest tests/unit/test_memory/ tests/unit/test_config/
How to enable
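Out of scope for this PR
Risks and rollback
- Setting memory_fts5_enabled: false turns the feature off immediately
- Only the memory_fts virtual table and the memory_fts_map regular table are involved; memory_records is not touched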
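To make the rollback claim concrete, the sketch below builds a toy version of the three tables and shows that dropping the two FTS tables leaves memory_records intact. Only the table names come from this PR; the column layouts, the helper functions, and the use of the default FTS5 tokenizer are assumptions, and the code needs an SQLite build with FTS5 enabled.

```python
import sqlite3
import uuid
from typing import List


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Column layouts here are assumed for illustration.
        CREATE TABLE memory_records (id TEXT PRIMARY KEY, content TEXT NOT NULL);
        CREATE VIRTUAL TABLE memory_fts USING fts5(body);
        CREATE TABLE memory_fts_map (
            fts_rowid INTEGER PRIMARY KEY,
            record_id TEXT NOT NULL UNIQUE
        );
        """
    )
    return conn


def add_record(conn: sqlite3.Connection, content: str) -> str:
    record_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO memory_records(id, content) VALUES (?, ?)",
        (record_id, content),
    )
    # Allocate the integer rowid in the map table, then reuse it for FTS.
    cur = conn.execute(
        "INSERT INTO memory_fts_map(record_id) VALUES (?)", (record_id,)
    )
    conn.execute(
        "INSERT INTO memory_fts(rowid, body) VALUES (?, ?)",
        (cur.lastrowid, content),
    )
    return record_id


def search(conn: sqlite3.Connection, match_query: str) -> List[str]:
    rows = conn.execute(
        """
        SELECT memory_fts_map.record_id
        FROM memory_fts
        JOIN memory_fts_map ON memory_fts_map.fts_rowid = memory_fts.rowid
        WHERE memory_fts MATCH ?
        ORDER BY rank
        """,
        (match_query,),
    ).fetchall()
    return [row[0] for row in rows]


def drop_fts(conn: sqlite3.Connection) -> None:
    """Remove the index tables; memory_records is left alone."""
    conn.executescript("DROP TABLE memory_fts; DROP TABLE memory_fts_map;")
```

The mapping table exists because FTS5 rows are addressed by integer rowid while records use UUIDs; keeping the index in its own tables is what makes it droppable without touching the source data.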