Location: src/lib/transcription/SentenceBoundaryDetector.ts (around lines 423-432, 253-255)
Finding: generateCacheKey returns a 32-bit integer hash string. Different input texts can hash to the same value (collision), so performNLP may return wrong cached sentence boundaries for a different text.
Suggested fix: Make the cache key unambiguous. Options:
- Use the full input text as the key (simplest and safest for the default small cache size of 100).
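- Or at minimum prefix the hash with the text length, e.g. len:${text.length}:${hash}. This only narrows collisions to texts of the same length; it does not eliminate them. If a hash-based key is kept, also store the original text with the cached entry and compare it on lookup so a colliding text is treated as a miss.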
Update generateCacheKey accordingly and ensure performNLP and any cache lookup/add use the same key shape. If using raw text as key, ensure cache size limits (e.g. LRU) still work with potentially long keys.
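A minimal sketch of the hash-plus-verification variant, under stated assumptions: hash32 is a stand-in 32-bit string hash (the actual hash in generateCacheKey is not shown in this finding), and VerifiedCache with its maxSize parameter is a hypothetical wrapper, not the detector's real cache class. It shows a "len:hash" key combined with an equality check on the stored text, so that a collision becomes a cache miss instead of a wrong result.

```typescript
// Stand-in 32-bit hash (assumption: the real generateCacheKey uses something similar).
function hash32(text: string): number {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    // Math.imul keeps the multiplication within 32-bit integer range.
    hash = (Math.imul(hash, 31) + text.charCodeAt(i)) | 0;
  }
  return hash;
}

// Compact key: length prefix plus hash. Still not unique on its own.
function generateCacheKey(text: string): string {
  return `len:${text.length}:${hash32(text)}`;
}

// Hypothetical cache wrapper that stores the source text next to the value.
class VerifiedCache<V> {
  private readonly entries = new Map<string, { text: string; value: V }>();
  private readonly maxSize: number;

  constructor(maxSize: number = 100) {
    this.maxSize = maxSize;
  }

  get(text: string): V | undefined {
    const key = generateCacheKey(text);
    const entry = this.entries.get(key);
    // A key match with different text is a collision: report a miss.
    if (entry === undefined || entry.text !== text) {
      return undefined;
    }
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(text: string, value: V): void {
    const key = generateCacheKey(text);
    this.entries.delete(key);
    this.entries.set(key, { text, value });
    if (this.entries.size > this.maxSize) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) {
        this.entries.delete(oldest);
      }
    }
  }
}
```

With this stand-in hash, "Aa" and "BB" produce the same key even with the length prefix, which is why the text comparison in get is needed. Using the full text as the Map key avoids the comparison entirely, at the cost of holding longer keys in memory.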
Verification: generateCacheKey at 423-431 returns hash.toString() (32-bit); performNLP at 253-255 uses this key for this.cache.get(cacheKey) and cached results.