[SPARK-57537][SS] Add metrics support for deleteRange operation in RocksDB#56598
[SPARK-57537][SS] Add metrics support for deleteRange operation in RocksDB#56598trisjain-db wants to merge 1 commit into
Conversation
fb66c64 to
2dbe50f
Compare
| cfName, | ||
| includesPrefix | ||
| ) | ||
| numKeysOnWritingVersion -= deletedKeys |
There was a problem hiding this comment.
Correctness concern: The counters are decremented before db.deleteRange() executes. If deleteRange throws (RocksDB write stall, disk full, or a closed-DB exception), the keys still exist but the counters have already been reduced - permanently under-counting for the rest of this version. Move the decrement after the successful db.deleteRange() call.
| } | ||
|
|
||
| // Count the keys that are about to be deleted (gated on trackTotalNumberOfRows). | ||
| if (conf.trackTotalNumberOfRows) { |
There was a problem hiding this comment.
deleteRange in RocksDB is O(1) - it writes a single range tombstone without touching individual keys. Scanning the full range with an iterator to count keys makes this O(n). For TTL-based state expiration (the primary deleteRange use case in structured streaming), ranges can contain millions of expired keys. When trackTotalNumberOfRows is enabled (the default), every deleteRange now iterates the full range before deleting it. Worth documenting this cost tradeoff, or considering RocksDB's approximate-count APIs (GetApproximateSizes) as an alternative for large ranges.
| val rangeReadOptions = new ReadOptions() | ||
| rangeReadOptions.setIterateUpperBound(upperBoundSlice) | ||
| val iter = db.newIterator(rangeReadOptions) | ||
| try { |
There was a problem hiding this comment.
nit: upperBoundSlice and rangeReadOptions are allocated before the try block. If db.newIterator(rangeReadOptions) throws, neither native resource is closed - the finally only runs after the try starts. Consider wrapping each in its own try-finally, or moving the allocations inside the try with a null-check in finally.
What changes were proposed in this pull request?
Added metrics update for
deleteRangeoperation (numKeysOnWritingVersionandnumInternalKeysOnWritingVersion). The metric update is gated onconf.trackTotalNumberOfRowsso callers that don't expose the metric pay no extra cost.Why are the changes needed?
deleteRangecurrently does not update the row-count metrics, which can leavenumKeysOnWritingVersionandnumInternalKeysOnWritingVersioninaccurate after range deletes. This change keeps the metrics consistent with other write paths while avoiding extra cost whentrackTotalNumberOfRowsis disabled.Does this PR introduce any user-facing change?
No.
How was this patch tested?
Four new unit tests in
RocksDBStateStoreSuite:Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude