Description
Opportunity
Deep search (the userMessages and fullText tiers) performs sequential file I/O on up to 500 session files per request, with no caching between requests; typical latency is ~2.5 s. The indexer already reads 256 KB per file during its normal scan but discards the message content.
Analysis
An Opus subagent validated this as a significant improvement. Recommended phased approach:
Phase 1 — Cache extracted message text during indexing: Extend updateCacheEntry() in the session indexer to extract and store all user/assistant message text alongside the existing metadata, stored as Map<sessionKey, { userMessages: string[], allMessages: string[] }>. At search time, search this in-memory cache instead of re-reading files, eliminating all file I/O from the search hot path.
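A minimal sketch of the Phase 1 shape, assuming a standalone text cache and a linear substring scan; the names (updateTextCache, deepSearch, SessionTextEntry) are illustrative, not the project's actual API:

```typescript
// Hypothetical cache shape for Phase 1: message text keyed by session,
// populated during the indexer's existing per-file scan (which already
// has the parsed messages in memory before discarding them).
type SessionTextEntry = {
  userMessages: string[]; // user message text only
  allMessages: string[];  // user + assistant text, for the fullText tier
};

const textCache = new Map<string, SessionTextEntry>();

function updateTextCache(
  sessionKey: string,
  messages: { role: string; text: string }[],
): void {
  textCache.set(sessionKey, {
    userMessages: messages.filter((m) => m.role === "user").map((m) => m.text),
    allMessages: messages.map((m) => m.text),
  });
}

// Search hot path: a pure in-memory substring scan, no file I/O.
function deepSearch(
  query: string,
  tier: "userMessages" | "allMessages",
): string[] {
  const q = query.toLowerCase();
  const hits: string[] = [];
  for (const [key, entry] of textCache) {
    if (entry[tier].some((t) => t.toLowerCase().includes(q))) hits.push(key);
  }
  return hits;
}
```

Even as a linear scan this stays in memory, which is where the ~2.5 s → sub-50 ms improvement would come from.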
Phase 2 (if needed) — Trigram inverted index: For sub-linear search at >10K sessions. The scaling design doc explicitly deferred FTS as a non-goal, so Phase 1 is the right starting point.
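If Phase 2 is ever needed, a trigram inverted index is roughly this shape (a hedged sketch, not a committed design: candidates are the intersection of postings for the query's trigrams, then verified with a substring check to drop false positives):

```typescript
// Hypothetical trigram index sketch for Phase 2.
function trigrams(text: string): Set<string> {
  const t = text.toLowerCase();
  const out = new Set<string>();
  for (let i = 0; i + 3 <= t.length; i++) out.add(t.slice(i, i + 3));
  return out;
}

const postings = new Map<string, Set<string>>(); // trigram -> session keys
const texts = new Map<string, string>();         // session key -> indexed text

function indexSession(key: string, text: string): void {
  texts.set(key, text);
  for (const g of trigrams(text)) {
    if (!postings.has(g)) postings.set(g, new Set());
    postings.get(g)!.add(key);
  }
}

function trigramSearch(query: string): string[] {
  const grams = [...trigrams(query)];
  if (grams.length === 0) return []; // queries under 3 chars need a fallback scan
  let candidates: Set<string> | null = null;
  for (const g of grams) {
    const p = postings.get(g) ?? new Set<string>();
    candidates =
      candidates === null
        ? new Set(p)
        : new Set([...candidates].filter((k) => p.has(k)));
  }
  // Verify candidates to eliminate trigram false positives.
  const q = query.toLowerCase();
  return [...candidates!].filter((k) => texts.get(k)!.toLowerCase().includes(q));
}
```

The postings intersection is what makes search sub-linear in session count; the verification pass keeps results exact.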
Key numbers
- Current: ~500 sequential file opens+reads per deep search request
- Expected: <50 ms for a typical deep search with the in-memory cache
- Memory cost: ~50-100 MB for 10K sessions (acceptable)
- The two-phase client search pattern (title first, deep second) could be simplified since both would be fast
Risks identified
- Memory usage scales with session count and conversation length; cap indexed text per session at ~50 KB
- JSONL compaction can break byte-offset tracking; detect size decrease and re-index
- Index and metadata cache must stay in sync; derive both from same source data
- Sessions beyond the 10K scaling cap would still require file I/O (by design)
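The compaction risk above reduces to one check: if a JSONL file is smaller than the byte offset the indexer last read up to, the offset is stale and the file must be re-indexed from the start. A sketch, with hypothetical names (resumeOffset, markRead):

```typescript
// Hypothetical compaction guard: track the byte offset read per file and
// invalidate it when the file shrinks (i.e. JSONL compaction rewrote it).
const cursors = new Map<string, { offset: number }>();

// Offset to resume incremental reading from, given the file's current size.
function resumeOffset(path: string, currentSize: number): number {
  const cursor = cursors.get(path);
  if (!cursor || currentSize < cursor.offset) {
    // New file, or it shrank: the saved offset is invalid. Re-index from 0.
    cursors.set(path, { offset: 0 });
    return 0;
  }
  return cursor.offset;
}

// Record how far the indexer has read after a scan completes.
function markRead(path: string, size: number): void {
  cursors.set(path, { offset: size });
}
```

Deriving both the text cache and the metadata cache from this same read keeps them in sync, per the third risk above.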
Action
Investigate and validate this opportunity. Determine the right implementation approach following TDD.