perf: trigram index v2 — integer postings, delta compression, query planner

## Summary

Redesign the trigram index for better build time, memory, and query planning. Current implementation uses string-heavy HashMaps with no compression or query optimization.

## Tier 1 — Highest Leverage

### 1. Integer-only postings
Replace `HashMap(Trigram, StringHashMap(PostingMask))` with `HashMap(Trigram, SortedDocIdList)`. Store doc IDs (u32) instead of string paths. Path lookup via separate `doc_id -> path` table.
- Better cache locality
- Smaller postings
- Faster intersections
- Less allocator churn

### 2. Delta/block-compressed postings
Store sorted doc IDs as delta-encoded blocks instead of raw arrays.
- `[381, 994, 2001] → [381, +613, +1007]` with varint encoding
- Block-based so you can skip/decode small chunks
- Reduces memory bandwidth (often the real bottleneck)

### 3. Query planner — smallest postings first
Current `candidates()` intersects trigrams in arbitrary order. Instead:
- Estimate posting size for each trigram
- Start from rarest trigram
- Progressively intersect
- Abort early when candidate set is small enough

### 4. Separate path/content/symbol indexes
Path queries and content queries behave differently. Keep separate trigram indexes for:
- File path trigrams (short, structured)
- File content trigrams (long, noisy)
- Symbol/name index (exact token lookup)

### 5. Incremental segment-based indexing
Instead of full rebuild:
- Track file content hash
- Only re-tokenize changed files
- Maintain segment-based index (LSM-tree style)
- Background compaction merges segments

## Tier 2

- Hybrid sparse list + Roaring bitmap for dense trigrams
- Thread-local build then merge (avoid global map contention)
- Mmap immutable segments (fast startup, OS page cache)
- Arena allocators / contiguous memory pools

## Tier 3

- SIMD intersection/decode
- Coarse shard/block-level indexes
- Bloom/sketch prefilters per file block

## Mental model

`speed = fewer candidates x cheaper intersections x less memory movement`

Not: smarter compression papers. Don't SIMD your way out of a bad data structure.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: trigram index v2 — integer postings, delta compression, query planner #142

Summary

Tier 1 — Highest Leverage

1. Integer-only postings

2. Delta/block-compressed postings

3. Query planner — smallest postings first

4. Separate path/content/symbol indexes

5. Incremental segment-based indexing

Tier 2

Tier 3

Mental model

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

perf: trigram index v2 — integer postings, delta compression, query planner #142

Description

Summary

Tier 1 — Highest Leverage

1. Integer-only postings

2. Delta/block-compressed postings

3. Query planner — smallest postings first

4. Separate path/content/symbol indexes

5. Incremental segment-based indexing

Tier 2

Tier 3

Mental model

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions