fix: NoisePrototypeBank degeneracy guard — fix flaky smart-extractor-branches test#164
Open
…ive noise filtering with non-discriminative embeddings

When the embedding model produces identical vectors for all inputs (e.g. deterministic mock embeddings in tests), every text matches every noise prototype with cosine similarity 1.0, causing the noise filter to reject all content. Smart extraction is skipped entirely, falling back to regex, which also captures nothing.

Add a self-diagnostic after init(): if the first two prototype vectors have cosine similarity > 0.98, the bank recognizes that the embedding model is degenerate and disables itself.

This fixes the flaky smart-extractor-branches.mjs test on master (all 9 scenarios now pass reliably).
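The failure mode and the guard described above can be illustrated with a minimal sketch. It is not the code in src/noise-prototypes.ts: the class name `PrototypeBankSketch`, the `Embedder` type, the `isDisabled` getter and the `cosineSimilarity` helper are hypothetical names chosen for illustration. Only the two thresholds (0.82 for a noise match, 0.98 for the degeneracy check) come from this PR.

```typescript
// Illustrative sketch only; names are hypothetical, not the real NoisePrototypeBank API.
const NOISE_THRESHOLD = 0.82;      // noise-match threshold quoted in the PR description
const DEGENERACY_THRESHOLD = 0.98; // self-check threshold quoted in the commit message

type Embedder = (text: string) => Promise<number[]>;

// Cosine similarity of two vectors; returns 0 if either vector has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  const denom = normA * normB;
  return denom === 0 ? 0 : dot / denom;
}

class PrototypeBankSketch {
  private embed: Embedder;
  private prototypes: string[];
  private vectors: number[][] = [];
  private disabled = false;

  constructor(embed: Embedder, prototypes: string[]) {
    this.embed = embed;
    this.prototypes = prototypes;
  }

  async init(): Promise<void> {
    this.vectors = await Promise.all(this.prototypes.map((p) => this.embed(p)));
    // Degeneracy self-check: if two different prototype texts embed to
    // (nearly) the same vector, the embedder cannot discriminate between
    // texts, so the bank turns itself off.
    const [first, second] = this.vectors;
    if (first && second && cosineSimilarity(first, second) > DEGENERACY_THRESHOLD) {
      this.disabled = true;
    }
  }

  get isDisabled(): boolean {
    return this.disabled;
  }

  // A disabled bank never flags anything as noise.
  isNoise(vector: number[]): boolean {
    if (this.disabled) return false;
    return this.vectors.some((p) => cosineSimilarity(p, vector) > NOISE_THRESHOLD);
  }
}
```

With an embedder that returns a constant vector, every prototype pair has cosine similarity 1.0, so `init()` disables the bank and `isNoise()` returns false for every input rather than true. Comparing only the first two prototypes keeps the check cheap; the assumption is that a healthy model will not embed two distinct prototype texts almost identically.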
Summary
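Fixes the flaky smart-extractor-branches.mjs test on master. When the embedding model returns the same vector for every text (for example, the deterministic mock used in tests), NoisePrototypeBank misclassifies all content as noise, so smart extraction is skipped.
1. NoisePrototypeBank degeneracy detection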
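Problem
- NoisePrototypeBank.init() embeds 14 noise prototypes, all with the same embedder
- The deterministic test mock ignores its input (void text), so every text gets the same vector
- isNoise() returns true for every input (cos = 1.0 > 0.82)
Changes
- src/noise-prototypes.ts: add a degeneracy self-check at the end of init() that compares the cosine similarity of the first two prototype vectors and disables the bank if it is > 0.98
Regression tests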
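- test/smart-extractor-branches.mjs: all 9 scenarios pass
- npm test: all 10 test files pass
Relationship to the existing architecture
Architecture diagram
Modules not involved / not modified
- index.ts: auto-capture flow unchanged
- smart-extractor.ts: the filterNoiseByEmbedding call is unchanged; only the internal behavior of noiseBank changes
- extraction-prompts.ts: prompt templates unchanged
Compatibility assessment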
✅ Fully compatible
- The isNoise() / learn() interfaces and their semantics are unchanged
❌ No incompatibilities
Design decisions
Out of Scope
- Changing createDeterministicEmbedding so that it produces text-dependent vectors (this would require rewriting all test assertions)
Statistics