[IS] Compaction observability: segment inventory + candidate-selection trace logs #620

Follow-up to #616 (items 1 and 2). The Phase B checkpoint guard from #616 landed in #618; this issue tracks the remaining **compaction-observability** improvements that #618 deliberately deferred. Item 3 of #616 (automatic exclusion of `READ_IO`-failed segments + `describe-index` exposure) is out of scope here.

## Background

During a 100M-vector BIGANN benchmark in gRPC push mode, the IS accumulated 594 segments, and the compaction cycle began retrying every 10 minutes against a corrupt segment. Operators could not tell from the logs which segment was failing.

The compaction cycle today logs only aggregate counts:

```
INFO: vector store vidx: tiered compaction — 594 segments → effective maxBytes=…
INFO: vector store vidx: maxInputBytes cap (4,294,967,296 bytes) trimmed candidates to 83 segment(s) (4,269,873,288 bytes)
INFO: vector store vidx: starting graph-merge compaction (83 candidate segments)
WARNING: vector store vidx: compaction failed (READ_IO)
…streaming compaction: failed to prepare source segment graph for vidx_vector_bench_603989658950146_seg627
```

There is no per-segment inventory, no list of the 83 chosen candidates, and no per-segment selection reason. When a corrupt segment is in the selected set, there is no log trail to explain why it was included.

## Requested improvements

### 1. Segment inventory log at compaction start

Before running the selection algorithm in `PersistentVectorStore.runCompactionCycle`, log:

- **INFO summary**: ``vector store <indexName>: segment inventory — <N> segments, <totalLiveVectors> vectors total``
- **FINE per-segment** (gated on `LOGGER.isLoggable(Level.FINE)` so the per-segment loop has near-zero cost in production): one line per segment with `segUuid`, `liveCount`, `estimatedSizeBytes`, `graphFileSize`, `generation`. (No proactive S3 HEAD check — the corrupted-segment quarantine in item 3 of #616, when implemented, will break the retry loop without extra S3 traffic. A HEAD-based `minioOk` flag can be added later if operators want it.)
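A minimal sketch of the two log levels described above, assuming a hypothetical `SegmentInfo` holder and `SegmentInventoryLog` helper (neither exists in the codebase; the real code would read these values from the store's own segment objects and take `segUuid` from `segmentStorageKey`). The exact wording of the FINE lines is also an assumption.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for per-segment metadata. Field names follow the issue
// text; they are not the real VectorSegment API.
final class SegmentInfo {
    final String segUuid;
    final long liveCount;
    final long estimatedSizeBytes;
    final long graphFileSize;
    final long generation;

    SegmentInfo(String segUuid, long liveCount, long estimatedSizeBytes,
                long graphFileSize, long generation) {
        this.segUuid = segUuid;
        this.liveCount = liveCount;
        this.estimatedSizeBytes = estimatedSizeBytes;
        this.graphFileSize = graphFileSize;
        this.generation = generation;
    }
}

// Hypothetical helper: one INFO summary, plus one FINE line per segment.
final class SegmentInventoryLog {
    private SegmentInventoryLog() {
    }

    static void logInventory(Logger logger, String indexName, List<SegmentInfo> segments) {
        long totalLiveVectors = 0;
        for (SegmentInfo s : segments) {
            totalLiveVectors += s.liveCount;
        }
        // \u2014 is the dash used in the summary format proposed in this issue.
        logger.info("vector store " + indexName + ": segment inventory \u2014 "
                + segments.size() + " segments, " + totalLiveVectors + " vectors total");

        // Guard the loop so no per-segment strings are built unless FINE is enabled.
        if (logger.isLoggable(Level.FINE)) {
            for (SegmentInfo s : segments) {
                logger.fine("vector store " + indexName + ": segment " + s.segUuid
                        + " liveCount=" + s.liveCount
                        + " estimatedSizeBytes=" + s.estimatedSizeBytes
                        + " graphFileSize=" + s.graphFileSize
                        + " generation=" + s.generation);
            }
        }
    }
}
```

The summary is built unconditionally because it is a single line per cycle; only the per-segment loop sits behind the `isLoggable` check.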

### 2. Candidate selection trace log

After `VectorIndexCompactor.chooseSegmentsToMerge` returns and after the `maxInputBytes` cap has trimmed the list, log the **chosen** candidates:

- **INFO summary**: already present (`starting graph-merge compaction (<N> candidate segments)`) — extend it slightly to include the total bytes selected.
- **FINE per-candidate**: one line per chosen candidate with `segUuid`, `liveCount`, `graphFileSize`, selection-reason tag (`tier-0`, `micro-segment`, `byte-cap-trimmed`, etc.).

This makes it possible to grep the IS log stream for which segments were picked in each cycle, and to verify the selection policy from production logs.
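The candidate trace could follow the same pattern. The sketch below assumes a hypothetical `Candidate` holder that already carries its selection-reason tag, and it assumes "total bytes selected" means the sum of `graphFileSize`; how `chooseSegmentsToMerge` would actually surface the reason is not decided here.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical holder for one chosen candidate; not an existing HerdDB type.
final class Candidate {
    final String segUuid;
    final long liveCount;
    final long graphFileSize;
    final String reason; // e.g. "tier-0" or "micro-segment"

    Candidate(String segUuid, long liveCount, long graphFileSize, String reason) {
        this.segUuid = segUuid;
        this.liveCount = liveCount;
        this.graphFileSize = graphFileSize;
        this.reason = reason;
    }
}

// Hypothetical helper: extended INFO summary, plus one FINE line per candidate.
final class CandidateTraceLog {
    private CandidateTraceLog() {
    }

    static void logChosen(Logger logger, String indexName, List<Candidate> chosen) {
        // Assumption: the byte total is the sum of the candidates' graph file sizes.
        long totalBytes = 0;
        for (Candidate c : chosen) {
            totalBytes += c.graphFileSize;
        }
        logger.info("vector store " + indexName + ": starting graph-merge compaction ("
                + chosen.size() + " candidate segments, " + totalBytes + " bytes)");

        // Same guard as the inventory loop: skip string building unless FINE is on.
        if (logger.isLoggable(Level.FINE)) {
            for (Candidate c : chosen) {
                logger.fine("vector store " + indexName + ": candidate " + c.segUuid
                        + " liveCount=" + c.liveCount
                        + " graphFileSize=" + c.graphFileSize
                        + " reason=" + c.reason);
            }
        }
    }
}
```

With lines in this shape, searching the log stream for a `segUuid` returns every cycle in which that segment was selected, along with the reason tag.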

## Out of scope

- **Item 3 of #616** — automatic quarantine of `READ_IO`-failed segments + `describe-index` exposure. Not pursued in this issue.
- **Item 4 of #616** — Phase B checkpoint guard. Already landed in #618.
- **Proactive S3 HEAD checks** of segment graph files. The retry-loop symptom from the original report is best addressed by quarantine (item 3, out of scope) rather than per-cycle HEAD-request fan-out.

## Implementation notes

- All changes live in `herddb-indexing-service/src/main/java/herddb/indexing/vector/PersistentVectorStore.java`, around `runCompactionCycle` (lines ~2431–2570 on current master).
- The per-segment / per-candidate loops MUST be guarded by `LOGGER.isLoggable(Level.FINE)`. At 594 segments per cycle, building the per-segment messages unguarded would burn measurable CPU even when the handler discards the records.
- The `segUuid` is computed by `segmentStorageKey(VectorSegment)` (line 1147) — reuse, do not duplicate the formatting.

## Tests

A plain unit test (`CompactionInventoryLoggingTest` in `herddb-indexing-service/src/test/java/herddb/indexing/vector/`) that:
- Spins up a `PersistentVectorStore` with `MemoryDataStorageManager`.
- Writes a few real segments via `addVector` + `checkpoint`.
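- Attaches a JUL handler at `Level.ALL` to `PersistentVectorStore.LOGGER` and lowers the logger's own level to `FINE` or below (the handler level alone does not make `isLoggable(Level.FINE)` return true).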
- Calls `runCompactionCycle()`.
- Asserts the INFO inventory line is emitted with the right segment count, and that the FINE per-segment / per-candidate lines name the expected `segUuid`s.

No new `@Category(ClusterTest.class)` required — pure unit-test path.
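One possible shape for the capturing handler, sketched with plain `java.util.logging`. `CapturingHandler`, `attach`, `detach`, and `messagesAt` are hypothetical names for illustration. `attach` also sets the logger level to `ALL`, because the FINE loops are gated on the logger's level and would otherwise never run under a default configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Hypothetical test helper: collects log records in memory for later assertions.
final class CapturingHandler extends Handler {
    private final List<LogRecord> records = new ArrayList<>();
    private final Formatter messageFormatter = new SimpleFormatter();
    private Logger attachedTo;
    private Level previousLevel;

    // Attaches a new handler and opens the logger up to every level.
    static CapturingHandler attach(Logger logger) {
        CapturingHandler handler = new CapturingHandler();
        handler.setLevel(Level.ALL);
        handler.attachedTo = logger;
        handler.previousLevel = logger.getLevel(); // null means "inherit from parent"
        logger.setLevel(Level.ALL);
        logger.addHandler(handler);
        return handler;
    }

    // Removes the handler and restores the logger's previous level.
    void detach() {
        attachedTo.removeHandler(this);
        attachedTo.setLevel(previousLevel);
    }

    @Override
    public synchronized void publish(LogRecord record) {
        if (isLoggable(record)) {
            records.add(record);
        }
    }

    @Override
    public void flush() {
    }

    @Override
    public void close() {
    }

    // Returns the formatted messages (parameters substituted) logged at exactly this level.
    synchronized List<String> messagesAt(Level level) {
        List<String> out = new ArrayList<>();
        for (LogRecord r : records) {
            if (r.getLevel().equals(level)) {
                out.add(messageFormatter.formatMessage(r));
            }
        }
        return out;
    }
}
```

In the test, the handler would be attached before `runCompactionCycle()` and detached in a `finally` block, so the logger level does not leak into other tests in the same JVM.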
