SentenceBoundaryDetector: unambiguous cache key to avoid wrong cached boundaries #151

**Location:** `src/lib/transcription/SentenceBoundaryDetector.ts` (around lines 423-432, 253-255)

**Finding:** `generateCacheKey` returns a 32-bit integer hash as a string. Distinct input texts can hash to the same value (a collision), in which case `performNLP` returns the cached sentence boundaries that were computed for a different text.
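To make the collision concrete, the sketch below assumes `generateCacheKey` uses the common Java-style `hash * 31 + charCode` string hash. The issue does not show the actual body, so `hash32` is a hypothetical stand-in. Under that assumption, `"Aa"` and `"BB"` produce the same key.

```typescript
// Hypothetical stand-in for generateCacheKey; the real body is not shown in this issue.
// Assumes the common "hash * 31 + charCode" scheme, truncated to a signed 32-bit integer.
function hash32(text: string): string {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    // (hash << 5) - hash === hash * 31; "| 0" keeps the result in 32 bits
    hash = ((hash << 5) - hash + text.charCodeAt(i)) | 0;
  }
  return hash.toString();
}

console.log(hash32("Aa")); // "2112"
console.log(hash32("BB")); // "2112", same key for a different text
```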

**Suggested fix:** Make the cache key unambiguous. Options:
- Use the full input text as the key (simplest and safest for the default small cache size of 100).
- Or, at minimum, prefix the hash with the text length, e.g. `len:${text.length}:${hash}`. This narrows collisions to texts of equal length but does not rule them out, so it is a mitigation rather than a guarantee that each key maps to exactly one original text.
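A minimal sketch of both options, again using a hypothetical `hash32` helper in place of the real hash (assumed to be the `hash * 31 + charCode` scheme). A full-text key separates `"Aa"` from `"BB"`, while the length-prefixed key still collides for them because both strings have length 2.

```typescript
// Hypothetical stand-in for the existing 32-bit hash used by generateCacheKey.
function hash32(text: string): number {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) - hash + text.charCodeAt(i)) | 0;
  }
  return hash;
}

// Option 1: the text itself is the key, so distinct texts never share an entry.
function cacheKeyFullText(text: string): string {
  return text;
}

// Option 2: length-prefixed hash. Texts of equal length can still collide.
function cacheKeyLengthPrefixed(text: string): string {
  return `len:${text.length}:${hash32(text)}`;
}

console.log(cacheKeyFullText("Aa") === cacheKeyFullText("BB")); // false
console.log(cacheKeyLengthPrefixed("Aa")); // "len:2:2112"
console.log(cacheKeyLengthPrefixed("Aa") === cacheKeyLengthPrefixed("BB")); // true
```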

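Update `generateCacheKey` accordingly and make sure `performNLP` and every cache lookup and insertion use the same key shape. If the raw text is used as the key, note that an entry-count limit (e.g. an LRU capped at 100 entries) bounds the number of keys, not their total size, so very long inputs increase cache memory.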
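One way to keep the key shape consistent is to route both the lookup and the insert in `performNLP` through a single key function. The sketch below assumes an entry-count LRU built on `Map` insertion order; `LruCache`, `DetectorSketch`, and the punctuation-based boundary logic are hypothetical placeholders, not the detector's actual code.

```typescript
// Hypothetical sketch: names and boundary logic are illustrative placeholders.
type Boundaries = number[];

// Entry-count LRU: Map iterates keys in insertion order, oldest first.
class LruCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxSize: number;

  constructor(maxSize = 100) {
    this.maxSize = maxSize;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value !== undefined) {
      // Re-insert so this entry becomes the most recently used.
      this.entries.delete(key);
      this.entries.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

class DetectorSketch {
  private readonly cache = new LruCache<Boundaries>(100);

  // Single source of truth for the key shape (full text in this sketch).
  private generateCacheKey(text: string): string {
    return text;
  }

  performNLP(text: string): Boundaries {
    const cacheKey = this.generateCacheKey(text);
    const cached = this.cache.get(cacheKey);
    if (cached !== undefined) return cached;

    // Placeholder detection: offsets just after '.', '!' or '?'.
    const boundaries: Boundaries = [];
    for (let i = 0; i < text.length; i++) {
      if (".!?".includes(text.charAt(i))) boundaries.push(i + 1);
    }
    this.cache.set(cacheKey, boundaries); // same key as the lookup above
    return boundaries;
  }
}

const demo = new DetectorSketch();
console.log(demo.performNLP("Aa. Bb.")); // [ 3, 7 ]
```

With the full text as key, the 100-entry cap still bounds how many texts are cached, but memory grows with input length; a cap on total key length would be one alternative if that becomes a concern.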

**Verification:** `generateCacheKey` at 423-431 returns `hash.toString()` (a 32-bit value); `performNLP` at 253-255 uses this key both for `this.cache.get(cacheKey)` and for the cached results it stores.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SentenceBoundaryDetector: unambiguous cache key to avoid wrong cached boundaries #151

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

SentenceBoundaryDetector: unambiguous cache key to avoid wrong cached boundaries #151

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions