[IS] Compaction observability: segment inventory + candidate-selection trace logs #620

@eolivelli

Description

Follow-up to #616 (items 1 and 2). The Phase B checkpoint guard from #616 landed in #618; this issue tracks the remaining compaction-observability improvements that #618 deliberately deferred. Item 3 of #616 (automatic exclusion of READ_IO-failed segments + describe-index exposure) is out of scope here.

Background

During a 100M BIGANN vector benchmark in gRPC push mode, the IS accumulated 594 segments and the compaction cycle began retrying every 10 minutes against a corrupt segment. Operators could not tell from the logs which segment was failing.

The compaction cycle today logs only aggregate counts:

INFO: vector store vidx: tiered compaction — 594 segments → effective maxBytes=…
INFO: vector store vidx: maxInputBytes cap (4,294,967,296 bytes) trimmed candidates to 83 segment(s) (4,269,873,288 bytes)
INFO: vector store vidx: starting graph-merge compaction (83 candidate segments)
WARNING: vector store vidx: compaction failed (READ_IO)
…streaming compaction: failed to prepare source segment graph for vidx_vector_bench_603989658950146_seg627

There is no per-segment inventory, no list of the 83 chosen candidates, and no per-segment selection reason. When a corrupt segment is in the selected set, there is no log trail to explain why it was included.

Requested improvements

1. Segment inventory log at compaction start

Before running the selection algorithm in PersistentVectorStore.runCompactionCycle, log:

  • INFO summary: vector store <indexName>: segment inventory — <N> segments, <totalLiveVectors> vectors total
  • FINE per-segment (gated on LOGGER.isLoggable(Level.FINE) so the per-segment loop has near-zero cost in production): one line per segment with segUuid, liveCount, estimatedSizeBytes, graphFileSize, generation. (No proactive S3 HEAD check: the corrupted-segment quarantine in item 3 of #616, when implemented, will break the retry loop without extra S3 traffic. A HEAD-based minioOk flag can be added later if operators want it.)
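
A minimal sketch of how the two inventory lines could be produced. SegmentInfo, SegmentInventoryLog and the key=value layout of the FINE line are hypothetical stand-ins for illustration; only the INFO summary format and the field names come from the request above, and the real code would read these values from VectorSegment inside runCompactionCycle.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical holder for the per-segment fields listed above (not the real VectorSegment). */
final class SegmentInfo {
    final String segUuid;
    final long liveCount;
    final long estimatedSizeBytes;
    final long graphFileSize;
    final long generation;

    SegmentInfo(String segUuid, long liveCount, long estimatedSizeBytes,
                long graphFileSize, long generation) {
        this.segUuid = segUuid;
        this.liveCount = liveCount;
        this.estimatedSizeBytes = estimatedSizeBytes;
        this.graphFileSize = graphFileSize;
        this.generation = generation;
    }
}

final class SegmentInventoryLog {
    /** INFO summary in the requested format; the escape is the dash used in that format. */
    static String summaryLine(String indexName, List<SegmentInfo> segments) {
        long totalLiveVectors = 0;
        for (SegmentInfo s : segments) {
            totalLiveVectors += s.liveCount;
        }
        return "vector store " + indexName + ": segment inventory \u2014 "
                + segments.size() + " segments, " + totalLiveVectors + " vectors total";
    }

    /** FINE per-segment line; the key=value layout is an assumption, chosen to be grep-friendly. */
    static String segmentLine(String indexName, SegmentInfo s) {
        return "vector store " + indexName + ": segment segUuid=" + s.segUuid
                + " liveCount=" + s.liveCount
                + " estimatedSizeBytes=" + s.estimatedSizeBytes
                + " graphFileSize=" + s.graphFileSize
                + " generation=" + s.generation;
    }

    static void logInventory(Logger logger, String indexName, List<SegmentInfo> segments) {
        logger.info(summaryLine(indexName, segments));
        // Guard the whole loop so no per-segment string is built unless FINE is enabled.
        if (logger.isLoggable(Level.FINE)) {
            for (SegmentInfo s : segments) {
                logger.fine(segmentLine(indexName, s));
            }
        }
    }
}
```

With the logger at INFO this emits one record per cycle; at FINE it emits one summary plus one record per segment.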

2. Candidate selection trace log

After VectorIndexCompactor.chooseSegmentsToMerge returns and after the maxInputBytes cap has trimmed the list, log the chosen candidates:

  • INFO summary: already present (starting graph-merge compaction (<N> candidate segments)) — extend it slightly to include the total bytes selected.
  • FINE per-candidate: one line per chosen candidate with segUuid, liveCount, graphFileSize, selection-reason tag (tier-0, micro-segment, byte-cap-trimmed, etc.).

This makes it possible to grep the IS log stream for which segments were picked in each cycle, and to verify the selection policy from production logs.
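
The candidate trace could look roughly like the sketch below. Candidate, SelectionReason and CandidateTraceLog are hypothetical names; the reason tags are the ones listed above, while the exact wording of the extended INFO line and the choice to sum estimatedSizeBytes for the byte total are assumptions.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical enum wrapping the selection-reason tags named in the request. */
enum SelectionReason {
    TIER_0("tier-0"),
    MICRO_SEGMENT("micro-segment"),
    BYTE_CAP_TRIMMED("byte-cap-trimmed");

    final String tag;

    SelectionReason(String tag) {
        this.tag = tag;
    }
}

/** Hypothetical view of one chosen candidate. */
final class Candidate {
    final String segUuid;
    final long liveCount;
    final long graphFileSize;
    final long estimatedSizeBytes;
    final SelectionReason reason;

    Candidate(String segUuid, long liveCount, long graphFileSize,
              long estimatedSizeBytes, SelectionReason reason) {
        this.segUuid = segUuid;
        this.liveCount = liveCount;
        this.graphFileSize = graphFileSize;
        this.estimatedSizeBytes = estimatedSizeBytes;
        this.reason = reason;
    }
}

final class CandidateTraceLog {
    /** Existing INFO line extended with the total bytes selected (wording is an assumption). */
    static String summaryLine(String indexName, List<Candidate> chosen) {
        long totalBytes = 0;
        for (Candidate c : chosen) {
            totalBytes += c.estimatedSizeBytes;
        }
        return "vector store " + indexName + ": starting graph-merge compaction ("
                + chosen.size() + " candidate segments, " + totalBytes + " bytes)";
    }

    /** FINE per-candidate line with a greppable reason tag. */
    static String candidateLine(String indexName, Candidate c) {
        return "vector store " + indexName + ": candidate segUuid=" + c.segUuid
                + " liveCount=" + c.liveCount
                + " graphFileSize=" + c.graphFileSize
                + " reason=" + c.reason.tag;
    }

    static void logCandidates(Logger logger, String indexName, List<Candidate> chosen) {
        logger.info(summaryLine(indexName, chosen));
        if (logger.isLoggable(Level.FINE)) {
            for (Candidate c : chosen) {
                logger.fine(candidateLine(indexName, c));
            }
        }
    }
}
```

With lines shaped like this, filtering the log stream on a segUuid or on a reason tag such as "reason=micro-segment" shows which cycles picked a given segment and why.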

Out of scope

  • Item 3 of #616 (automatic exclusion of READ_IO-failed segments + describe-index exposure).

Implementation notes

  • All changes live in herddb-indexing-service/src/main/java/herddb/indexing/vector/PersistentVectorStore.java, around runCompactionCycle (lines ~2431–2570 in the current master).
  • The per-segment / per-candidate loops MUST be guarded by LOGGER.isLoggable(Level.FINE). At 594 segments per cycle, building the messages unguarded would burn measurable CPU even when the handler discards the records.
  • The segUuid is computed by segmentStorageKey(VectorSegment) (line 1147); reuse it rather than duplicating the formatting.
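
To illustrate why the guard matters, the sketch below counts how many per-segment messages get built with and without it. GuardedLoggingDemo and describe are hypothetical; describe stands in for the real per-segment formatting. In Java the argument to fine(String) is evaluated before the logger checks its level, so the unguarded loop pays the formatting cost on every iteration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class GuardedLoggingDemo {
    /** Number of per-segment messages actually built so far. */
    static int formatCalls = 0;

    /** Stand-in for the real per-segment line; counts each invocation. */
    static String describe(int segmentIndex) {
        formatCalls++;
        return "segment " + segmentIndex;
    }

    /** Builds every message even when FINE is disabled. */
    static void unguarded(Logger logger, int segments) {
        for (int i = 0; i < segments; i++) {
            logger.fine(describe(i));
        }
    }

    /** Skips the loop entirely unless FINE is enabled. */
    static void guarded(Logger logger, int segments) {
        if (!logger.isLoggable(Level.FINE)) {
            return;
        }
        for (int i = 0; i < segments; i++) {
            logger.fine(describe(i));
        }
    }
}
```

With the logger at INFO and 594 segments, unguarded builds 594 strings that are immediately dropped, while guarded builds none.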

Tests

A plain unit test (CompactionInventoryLoggingTest in herddb-indexing-service/src/test/java/herddb/indexing/vector/) that:

  • Spins up a PersistentVectorStore with MemoryDataStorageManager.
  • Writes a few real segments via addVector + checkpoint.
  • Attaches a JUL handler at Level.ALL to PersistentVectorStore.LOGGER.
  • Calls runCompactionCycle().
  • Asserts the INFO inventory line is emitted with the right segment count, and that the FINE per-segment / per-candidate lines name the expected segUuids.

No new @Category(ClusterTest.class) required — pure unit-test path.
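
The log-capture part of such a test might be sketched as follows. CapturingHandler and LogCapture are hypothetical helpers, not existing HerdDB classes. One detail worth noting: a handler at Level.ALL is not enough on its own, because the logger's own level must also admit FINE or the isLoggable guard will skip the per-segment lines. The sketch lowers the logger level for the duration of the action and restores it afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Collects every record published to it so a test can assert on levels and messages. */
final class CapturingHandler extends Handler {
    private final List<LogRecord> records = new ArrayList<>();

    CapturingHandler() {
        setLevel(Level.ALL);
    }

    @Override
    public synchronized void publish(LogRecord record) {
        if (isLoggable(record)) {
            records.add(record);
        }
    }

    @Override
    public void flush() {
    }

    @Override
    public void close() {
    }

    /** Raw (unformatted) messages of all captured records at exactly the given level. */
    synchronized List<String> messagesAt(Level level) {
        List<String> out = new ArrayList<>();
        for (LogRecord r : records) {
            if (r.getLevel().equals(level)) {
                out.add(r.getMessage());
            }
        }
        return out;
    }
}

final class LogCapture {
    /** Runs the action with a capturing handler attached and the logger opened to Level.ALL. */
    static CapturingHandler capture(Logger logger, Runnable action) {
        CapturingHandler handler = new CapturingHandler();
        Level previous = logger.getLevel();
        logger.addHandler(handler);
        logger.setLevel(Level.ALL);
        try {
            action.run();
        } finally {
            // Restore the logger so other tests in the same JVM are unaffected.
            logger.setLevel(previous);
            logger.removeHandler(handler);
        }
        return handler;
    }
}
```

In the proposed test, the action would be the runCompactionCycle() call, and the assertions would check messagesAt(Level.INFO) for the inventory line and messagesAt(Level.FINE) for the expected segUuids.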
