feat: ES bulk indexing optimizations (compression, refresh, replicas) #124

@kapral18

Description

Problem

Current ES bulk indexing doesn't use the optimizations Elasticsearch provides for bulk ingest. For large repositories (21+ hours of indexing), the missed savings add up significantly.

Current: Unoptimized Bulk Indexing
══════════════════════════════════

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  Bulk Request                                                   │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  500 docs × 5KB = 2.5MB uncompressed                    │   │
│  │                                                         │   │
│  │  ──────────────► Network ──────────────►               │   │
│  │     (2.5MB payload each request)                        │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Meanwhile, ES is also doing:                                   │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  • Auto-refresh every 1 second (I/O + CPU)              │   │
│  │  • Replicating to replica shards (network + I/O)        │   │
│  │  • Competing for resources with bulk indexing           │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Current State

// elasticsearch.ts
const baseOptions: Partial<ClientOptions> = {
  requestTimeout: 90000,
  // compression: NOT SET (defaults to none)
};

const bulkOptions = {
  refresh: false,  // ✓ Good: don't wait for refresh
  operations,
  // But ES still auto-refreshes every 1 second in background
};

Note: refresh: false only means "don't wait synchronously." ES still refreshes automatically every second, consuming resources during bulk indexing.

Proposed Optimizations

1. Enable Gzip Compression (Easy Win)

Before: 2.5MB per request
════════════════════════

Client ────── 2.5MB ──────► ES Server
         (uncompressed)


After: ~250KB per request
═════════════════════════

Client ────── 250KB ──────► ES Server
          (gzipped)
          
~10x smaller payloads = faster network transfer

Implementation:

const client = new Client({
  node: elasticsearchConfig.endpoint,
  compression: 'gzip',  // ← One line change
});

2. Disable Auto-Refresh During Bulk Indexing

Current: Background refreshes compete with indexing
═══════════════════════════════════════════════════

Time ─────────────────────────────────────────────────────────►

Bulk 1 ──► Bulk 2 ──► Bulk 3 ──► Bulk 4 ──► Bulk 5
              │           │           │
              ▼           ▼           ▼
         [refresh]   [refresh]   [refresh]
          (1 sec)     (1 sec)     (1 sec)
              │           │           │
              └─────── CPU/IO work ───┘
                   competes with
                   bulk indexing


Proposed: Single refresh at end
════════════════════════════════

Time ─────────────────────────────────────────────────────────►

[Set refresh=-1] ──► Bulk 1 ──► Bulk 2 ──► ... ──► Done ──► [Refresh]
                                                                 │
                                                                 ▼
                                                        Single refresh
                                                        All data searchable

Implementation:

// Before bulk indexing
await client.indices.putSettings({
  index: indexName,
  body: { 'index.refresh_interval': '-1' }
});

// ... bulk indexing ...

// After bulk indexing
await client.indices.putSettings({
  index: indexName,
  body: { 'index.refresh_interval': '1s' }
});
await client.indices.refresh({ index: indexName });

3. Disable Replicas During Bulk Indexing

Current: Every write replicated immediately
═══════════════════════════════════════════

Primary ──► Write ──► Replicate ──► Wait for ACK
                          │
                          ▼
                    Replica shard
                    (network + I/O)


Proposed: Replicate once at end
════════════════════════════════

[Set replicas=0] ──► Bulk writes (primary only) ──► [Set replicas=1]
                                                           │
                                                           ▼
                                                    ES replicates
                                                    everything at once

Implementation:

// Before bulk indexing: remember the original replica count
const originalSettings = await client.indices.getSettings({ index: indexName });
const originalReplicas =
  originalSettings[indexName]?.settings?.index?.number_of_replicas ?? '1';
await client.indices.putSettings({
  index: indexName,
  body: { 'index.number_of_replicas': 0 }
});

// ... bulk indexing ...

// After bulk indexing: restore replicas; ES copies everything in one pass
await client.indices.putSettings({
  index: indexName,
  body: { 'index.number_of_replicas': originalReplicas }
});

Expected Impact

Optimization       Effort      Impact
─────────────────  ──────────  ─────────────────────────────
Gzip compression   one line    5-10x smaller payloads
Disable refresh    ~20 lines   2-5x faster bulk writes
Disable replicas   ~20 lines   ~2x faster (if replicas > 0)

Combined: Potentially 2-10x faster ES indexing phase.

Implementation Plan

Phase 1: Compression (Low Risk)

  • Add compression: 'gzip' to ES client options
  • No lifecycle changes needed
  • Can be toggled via env var if needed

Phase 2: Index Settings Lifecycle (Medium Risk)

  • Add startBulkIndexing(indexName) - saves settings, optimizes for bulk
  • Add finishBulkIndexing(indexName) - restores settings, refreshes
  • Wrap producer+worker in index_command.ts
  • Handle errors (restore settings on failure)

Code Changes

File              Change
────────────────  ──────────────────────────────────────────────────
elasticsearch.ts  Add compression: 'gzip' to client options
elasticsearch.ts  Add startBulkIndexing(), finishBulkIndexing()
index_command.ts  Wrap indexing in lifecycle calls
config.ts         Add ENABLE_BULK_OPTIMIZATIONS flag (default: true)

Interaction with Other Issues

  • #121 (filePaths aggregation): if #121 switches from the bulk API to the update API, verify the settings optimizations still apply. Both APIs benefit from refresh_interval=-1 and replicas=0.
  • #122 (parallel enqueue/dequeue): the settings lifecycle must handle a producer failure mid-way. Use try/finally to restore settings even on error.
  • #120 (auto-retry): no conflict. Auto-retry happens after bulk indexing, when settings are already restored.

Acceptance Criteria

  • Gzip compression enabled on ES client
  • refresh_interval set to -1 during bulk indexing
  • number_of_replicas set to 0 during bulk indexing (configurable)
  • Original settings restored after bulk indexing completes
  • Settings restored even on error (try/finally)
  • Final refresh ensures all data is searchable
  • Can be disabled via environment variable
  • Works correctly with --clean flag
  • Works correctly with incremental indexing
  • Works with both the bulk and update APIs (future-proof for #121: documents with identical content overwrite each other, leaving only one file discoverable via search)
