Problem
Current metrics capture counts and totals, but not per-phase timing breakdowns. When indexing takes 21 hours, we can't identify which phase is the bottleneck.
Current Metrics: We Know WHAT, Not WHERE
════════════════════════════════════════
┌─────────────────────────────────────────────────────────────────┐
│                     batchDuration = 5000ms                      │
│                          (total only)                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│    [Dequeue]         [ES Bulk Request]          [Commit]        │
│     SQLite           Network + Infer             SQLite         │
│                                                                 │
│      ???                    ???                    ???          │
│                                                                 │
│    Is this           Is inference slow?         Is this         │
│     slow?            Network latency?            slow?          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
We see: "batch took 5000ms"
We don't see: "4500ms was ES inference, 300ms was dequeue, 200ms was commit"
Why Not Use ES/Kibana Monitoring?
ES Stack Monitoring shows server-side metrics:
- Bulk indexing rate
- Shard-level latency
- Cluster health
But it cannot show client-side timing:
- SQLite queue operations (enqueue/dequeue/commit)
- File parsing time
- Network latency vs ES processing
- Time waiting for inference
We need client-side instrumentation to identify whether bottlenecks are in:
- Our code (parsing, SQLite)
- Network (latency to ES)
- ES processing (inference, indexing)
What Exists Today
// metrics.ts - Current metrics
ParserMetrics:
  ✓ filesProcessed    (counter)
  ✓ filesFailed       (counter)
  ✓ chunksCreated     (counter)
  ✓ chunkSize         (histogram)
  ✗ parseDuration     // MISSING
  ✗ enqueueDuration   // MISSING

QueueMetrics:
  ✓ documentsEnqueued (counter)
  ✓ documentsDequeued (counter)
  ✗ enqueueDuration   // MISSING
  ✗ dequeueDuration   // MISSING

IndexerMetrics:
  ✓ batchDuration     (histogram)  // Total only
  ✓ batchSize         (histogram)
  ✗ esRequestDuration // MISSING - ES call breakdown
  ✗ commitDuration    // MISSING
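
Expressed as TypeScript, the current surface looks roughly like this (a hypothetical reconstruction; Counter and Histogram stand in for whatever primitives metrics.ts actually uses):

```ts
// Hypothetical reconstruction of the current metrics surface.
interface Counter { inc(n?: number): void }
interface Histogram { record(value: number): void }

interface IndexerMetrics {
  batchDuration: Histogram; // total batch time only, no per-phase breakdown
  batchSize: Histogram;
  // esRequestDuration and commitDuration do not exist yet
}
```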
Proposed Solution
Add timing histograms for each phase:
Proposed: Per-Phase Timing
══════════════════════════
Producer Phase:
┌────────────────────────────────────────────────────────────────────────┐
│                                                                        │
│   [Read File] → [Parse AST] → [Extract] → [Enqueue]                    │
│                                                                        │
│   fileReadDuration   parseDuration   extractDuration   enqueueDuration │
│   (histogram)        (histogram)     (histogram)       (histogram)     │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
Worker Phase:
┌────────────────────────────────────────────────────────────────────────┐
│                                                                        │
│   [Dequeue] → [ES Bulk Request] → [Commit]                             │
│                                                                        │
│   dequeueDuration   esRequestDuration   commitDuration                 │
│   (histogram)       (histogram)         (histogram)                    │
│                                                                        │
│                             │                                          │
│                             ▼                                          │
│              (includes network + inference)                            │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
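
A sketch of how the worker loop could record these three phases. The queue and metrics shapes and BATCH_SIZE are illustrative; only indexCodeChunks refers to an existing call, and its signature here is assumed:

```ts
import { performance } from "node:perf_hooks";

// Shapes below are hypothetical; the real ones live in sqlite_queue.ts and metrics.ts.
interface Histogram { record(ms: number): void }
interface WorkerMetrics {
  dequeueDuration: Histogram;
  esRequestDuration: Histogram;
  commitDuration: Histogram;
}
interface Batch { ids: number[]; documents: unknown[] }
interface SqliteQueue {
  dequeue(limit: number): Promise<Batch>;
  commit(ids: number[]): Promise<void>;
}
declare function indexCodeChunks(docs: unknown[]): Promise<void>; // existing ES bulk call

const BATCH_SIZE = 500; // illustrative

async function runBatch(queue: SqliteQueue, metrics: WorkerMetrics): Promise<void> {
  const t0 = performance.now();
  const batch = await queue.dequeue(BATCH_SIZE);          // SQLite SELECT + UPDATE
  metrics.dequeueDuration.record(performance.now() - t0);

  const t1 = performance.now();
  await indexCodeChunks(batch.documents);                 // ES bulk: network + inference
  metrics.esRequestDuration.record(performance.now() - t1);

  const t2 = performance.now();
  await queue.commit(batch.ids);                          // SQLite DELETE
  metrics.commitDuration.record(performance.now() - t2);
}
```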
Implementation
New Metrics
// Producer timing
producer.fileReadDuration    // Time to read file from disk
producer.parseDuration       // Time for tree-sitter/parser
producer.extractDuration     // Time for symbol extraction
producer.enqueueDuration     // Time for SQLite INSERT

// Worker timing
worker.dequeueDuration       // Time for SQLite SELECT + UPDATE
worker.esRequestDuration     // Time for ES bulk API call
worker.commitDuration        // Time for SQLite DELETE
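
A sketch of how these could be registered, assuming a createHistogram factory in metrics.ts (hypothetical; the real registration API may differ):

```ts
// Hypothetical registration; the actual factory in metrics.ts may differ.
interface Histogram { record(ms: number): void }
declare function createHistogram(name: string): Histogram;

export const producerTimings = {
  fileReadDuration:  createHistogram("producer.fileReadDuration"),
  parseDuration:     createHistogram("producer.parseDuration"),
  extractDuration:   createHistogram("producer.extractDuration"),
  enqueueDuration:   createHistogram("producer.enqueueDuration"),
};

export const workerTimings = {
  dequeueDuration:   createHistogram("worker.dequeueDuration"),
  esRequestDuration: createHistogram("worker.esRequestDuration"),
  commitDuration:    createHistogram("worker.commitDuration"),
};
```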
Instrumentation Points
| File               | Add Timing For                          |
|--------------------|-----------------------------------------|
| producer_worker.ts | File read, parse, extract               |
| sqlite_queue.ts    | Enqueue, dequeue, commit operations     |
| indexer_worker.ts  | ES request (wrap indexCodeChunks call)  |
| elasticsearch.ts   | Optionally: internal bulk timing        |
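
One low-friction way to cover these points is a small helper that times an async call and records its duration into a histogram. A sketch; timed is a hypothetical helper, and the usage comment assumes the existing indexCodeChunks call:

```ts
import { performance } from "node:perf_hooks";

interface Histogram { record(ms: number): void }

// Hypothetical helper: run an async operation and record its duration.
async function timed<T>(histogram: Histogram, fn: () => Promise<T>): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    histogram.record(performance.now() - start);
  }
}

// e.g. in indexer_worker.ts, wrapping the existing ES bulk call:
//   const result = await timed(metrics.esRequestDuration, () => indexCodeChunks(docs));
```

Recording in the finally block means failed requests are timed too, so error paths show up in the histograms rather than disappearing.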
Example Output
After implementation, logs/metrics would show:
Bottleneck Analysis (P95 timings):
──────────────────────────────────
Phase           P95      % of Total
─────────────────────────────────────────
File Read       5ms         0.1%
Parse          15ms         0.3%
Extract        10ms         0.2%
Enqueue        20ms         0.4%
Dequeue        25ms         0.5%
ES Request   4800ms        96.0%  ◄── BOTTLENECK
Commit         25ms         0.5%
─────────────────────────────────────────
Recommendation: ES inference is the bottleneck.
Consider: Batch size tuning, ES scaling, or async inference.
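
The report itself can be derived from the recorded histograms. A sketch, assuming each histogram exposes a p95() accessor (hypothetical); note the percentages here are computed against the sum of per-phase P95s, which is an approximation since P95s are not additive:

```ts
interface TimingHistogram { p95(): number } // hypothetical accessor

// Build a simple bottleneck table from per-phase P95s.
function bottleneckReport(phases: Record<string, TimingHistogram>): string {
  const rows = Object.entries(phases).map(([name, h]) => [name, h.p95()] as const);
  const total = rows.reduce((sum, [, ms]) => sum + ms, 0);
  const worst = Math.max(...rows.map(([, ms]) => ms));
  return rows
    .map(([name, ms]) => {
      const pct = ((ms / total) * 100).toFixed(1);
      const flag = ms === worst ? "  ◄── BOTTLENECK" : "";
      return `${name.padEnd(12)}${String(Math.round(ms)).padStart(7)}ms${pct.padStart(7)}%${flag}`;
    })
    .join("\n");
}
```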
Acceptance Criteria
- Producer phase records fileReadDuration, parseDuration, extractDuration, enqueueDuration
- Worker phase records dequeueDuration, esRequestDuration, commitDuration
- batchDuration preserved for backward compatibility