📝 Walkthrough
Comprehensive feature expansion and architectural refactoring across Hudi: adds clustering instant expiration eligibility checking with heartbeat coordination, shredded Variant schema support for Parquet serialization, Spark SQL vector similarity search table-valued functions (single and batch modes), catalog-backed partition discovery, log compaction threshold-based scheduling, incremental query column pruning, and Flink compaction/clean handler abstraction with composite/factory patterns. Also consolidates CDC iterator implementations, enhances RocksDB metrics exposure, and enforces the LANCE format Spark-engine-only constraint.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant Client as Spark Client
    participant Planner as Vector Search Planner
    participant Analyzer as Distance Function Builder
    participant Executor as Spark Executor
    participant Corpus as Corpus DataFrame
    participant Index as Vector Index/Broadcast
    Client->>Planner: buildSingleQueryPlan(corpusDf, embeddingCol, queryVector, k, metric)
    Planner->>Planner: validate embedding column exists & is array-typed
    Planner->>Analyzer: createDistanceUdf(metric, elementType, queryVector)
    Analyzer->>Analyzer: pre-compute query vector norm
    Analyzer-->>Planner: distance UDF
    Planner->>Planner: filter null embeddings from corpus
    Planner->>Executor: compute _hudi_distance per row via UDF
    Executor->>Executor: vectorized distance calculation (cosine/L2/dot-product)
    Planner->>Planner: order by _hudi_distance ASC
    Planner->>Planner: limit to k results
    Planner-->>Client: analyzed LogicalPlan with k nearest neighbors
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 3
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java (1)
733-742: ⚠️ Potential issue | 🟠 Major
Start the clustering heartbeat before persisting the requested instant.
Lines 739-742 start the heartbeat only after `scheduleClustering(...)` returns. That leaves a race where the pending clustering instant already exists but has no heartbeat yet, so a concurrent lazy-clean / partition rollback with `hoodie.clustering.expiration.threshold.mins=0` can treat a freshly scheduled clustering as stale. Start the heartbeat first, and stop it again if no plan is produced or scheduling fails.
♻️ Proposed fix
```diff
       case CLUSTER:
         log.info("Scheduling clustering at instant time: {} for table {}", instantTime, config.getBasePath());
-        Option<HoodieClusteringPlan> clusteringPlan = table
-            .scheduleClustering(context, instantTime, extraMetadata);
-        option = clusteringPlan.map(plan -> instantTime);
-        if (option.isPresent() && config.isExpirationOfClusteringEnabled()) {
-          heartbeatClient.start(instantTime);
-          log.info("Started heartbeat for clustering instant {}", instantTime);
-        }
+        boolean heartbeatStarted = false;
+        try {
+          if (config.isExpirationOfClusteringEnabled()) {
+            heartbeatClient.start(instantTime);
+            heartbeatStarted = true;
+            log.info("Started heartbeat for clustering instant {}", instantTime);
+          }
+          Option<HoodieClusteringPlan> clusteringPlan = table
+              .scheduleClustering(context, instantTime, extraMetadata);
+          option = clusteringPlan.map(plan -> instantTime);
+          if (option.isEmpty() && heartbeatStarted) {
+            heartbeatClient.stop(instantTime);
+          }
+        } catch (Throwable t) {
+          if (heartbeatStarted) {
+            heartbeatClient.stop(instantTime);
+          }
+          throw t;
+        }
         break;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java` around lines 733 - 742, Start the clustering heartbeat before calling table.scheduleClustering(...) to avoid a race where a pending clustering instant exists without a heartbeat; in BaseHoodieTableServiceClient change the CLUSTER branch so heartbeatClient.start(instantTime) is invoked prior to calling table.scheduleClustering(context, instantTime, extraMetadata), and if scheduleClustering returns empty or throws, stop the heartbeat (heartbeatClient.stop(instantTime) or equivalent) and avoid leaving a running heartbeat. Ensure existing checks using option/isExpirationOfClusteringEnabled remain (only start/stop when config.isExpirationOfClusteringEnabled() is true) and that failures still propagate after cleanup.hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java (1)
421-447: ⚠️ Potential issue | 🟠 Major
Nested `VECTOR`s still slip through `ARRAY<STRUCT<...>>` and `MAP<STRING, STRUCT<...>>`.
`createArray`/`createMap` only reject direct VECTOR element/value schemas, and `validateNoVectorInNestedRecord` only walks child RECORDs. A schema like `array(record{embedding: VECTOR})` or `map<string, record{embedding: VECTOR}>` still passes construction even though the new contract says VECTOR inside ARRAY/MAP is unsupported. That leaves invalid schemas to fail much later instead of being rejected here.
Possible fix
```diff
-  private static void validateNoVectorInNestedRecord(List<HoodieSchemaField> fields, boolean nested) {
-    for (HoodieSchemaField field : fields) {
-      HoodieSchema nonNull = field.schema().getNonNullType();
-      if (nested && nonNull.getType() == HoodieSchemaType.VECTOR) {
-        throw new HoodieSchemaException(
-            "VECTOR column '" + field.name() + "' must be a top-level field. "
-                + "Nested VECTOR columns (inside STRUCT, ARRAY, or MAP) are not supported.");
-      }
-      if (nonNull.getType() == HoodieSchemaType.RECORD) {
-        validateNoVectorInNestedRecord(nonNull.getFields(), true);
-      }
-    }
-  }
+  private static void validateNoVectorInNestedRecord(List<HoodieSchemaField> fields, boolean nested) {
+    for (HoodieSchemaField field : fields) {
+      validateNoNestedVector(field.schema(), nested, field.name());
+    }
+  }
+
+  private static void validateNoNestedVector(HoodieSchema schema, boolean nested, String fieldName) {
+    HoodieSchema nonNull = schema.getNonNullType();
+    if (nested && nonNull.getType() == HoodieSchemaType.VECTOR) {
+      throw new HoodieSchemaException(
+          "VECTOR column '" + fieldName + "' must be a top-level field. "
+              + "Nested VECTOR columns (inside STRUCT, ARRAY, or MAP) are not supported.");
+    }
+    switch (nonNull.getType()) {
+      case RECORD:
+        validateNoVectorInNestedRecord(nonNull.getFields(), true);
+        break;
+      case ARRAY:
+        validateNoNestedVector(nonNull.getElementType(), true, fieldName);
+        break;
+      case MAP:
+        validateNoNestedVector(nonNull.getValueType(), true, fieldName);
+        break;
+      case UNION:
+        nonNull.getTypes().forEach(t -> validateNoNestedVector(t, nested, fieldName));
+        break;
+      default:
+    }
+  }
```
Also applies to: 484-516
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java` around lines 421 - 447, createArray and createMap only reject VECTOR when it's the direct element/value type but allow VECTOR nested inside RECORDs (e.g., ARRAY<RECORD{embedding: VECTOR}>); update createArray(HoodieSchema) and createMap(HoodieSchema) to recursively validate the element/value schema for nested VECTORs (either call or extend validateNoVectorInNestedRecord to traverse RECORD children and throw HoodieSchemaException on any nested VECTOR), ensuring any RECORD-containing element/value is scanned for VECTOR fields before building the Avro array/map schema; apply the same recursive check used elsewhere (see validateNoVectorInNestedRecord and the VECTOR-related error messages) so nested VECTORs are rejected early.
🟠 Major comments (21)
hudi-hadoop-common/src/main/java/org/apache/hudi/common/util/ParquetReaderIterator.java-58-63 (1)
58-63: ⚠️ Potential issue | 🟠 Major
Route error-path cleanup through `close()` to preserve idempotent close semantics.
Line 60 now closes on EOF, but error paths still close the raw `parquetReader` directly. That bypasses the new `closed` guard and can trigger duplicate `ParquetReader.close()` calls.
💡 Proposed fix
```diff
@@
     } catch (Exception e) {
-      FileIOUtils.closeQuietly(parquetReader);
+      try {
+        close();
+      } catch (Exception ignore) {
+        // no-op: preserve original failure context
+      }
       throw new HoodieException("unable to read next record from parquet file ", e);
     }
   }
@@
     } catch (Exception e) {
-      FileIOUtils.closeQuietly(parquetReader);
+      try {
+        close();
+      } catch (Exception ignore) {
+        // no-op: preserve original failure context
+      }
       throw new HoodieException("unable to read next record from parquet file ", e);
     }
   }
```
Also applies to: 88-94
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-hadoop-common/src/main/java/org/apache/hudi/common/util/ParquetReaderIterator.java` around lines 58 - 63, The catch blocks in ParquetReaderIterator should not call parquetReader.close() directly (which bypasses the iterator's closed guard) — instead route cleanup through the iterator's close() method; locate the try/catch sections around hasNext/next (references: ParquetReaderIterator, close(), parquetReader) and replace direct parquetReader.close() invocations in error paths with a call to this.close() (or close()) while rethrowing or wrapping the original exception as before so idempotent close semantics are preserved; apply the same change to the similar error-paths around lines 88-94.hudi-hadoop-common/src/test/java/org/apache/hudi/common/util/TestParquetReaderIterator.java-68-68 (1)
68-68: ⚠️ Potential issue | 🟠 Major
Avoid asserting duplicate `close()` calls; this locks in unintended behavior.
Line 68 currently validates double-close behavior. With idempotent iterator close semantics, this should assert a single close call.
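For context, the idempotent-close contract the iterator is expected to follow looks roughly like this (a minimal sketch; the class and field names are assumptions for illustration, not the actual `ParquetReaderIterator` internals):

```java
import java.io.Closeable;
import java.io.IOException;

// Minimal sketch of an idempotent close: only the first call reaches the underlying
// reader, so a mock for the wrapped reader should observe exactly one close() invocation.
class IdempotentCloseSketch implements Closeable {
  private final Closeable wrappedReader;
  private boolean closed = false;

  IdempotentCloseSketch(Closeable wrappedReader) {
    this.wrappedReader = wrappedReader;
  }

  @Override
  public void close() throws IOException {
    if (closed) {
      return; // repeated close() calls are no-ops
    }
    closed = true;
    wrappedReader.close();
  }
}
```

Under that contract, `times(1)` is the behavior the test should lock in.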
✅ Suggested assertion update
```diff
-    verify(reader, times(2)).close();
+    verify(reader, times(1)).close();
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-hadoop-common/src/test/java/org/apache/hudi/common/util/TestParquetReaderIterator.java` at line 68, The test currently asserts duplicate close calls on the Parquet reader (verify(reader, times(2)).close()), which locks in non-idempotent behavior; update the assertion in TestParquetReaderIterator to verify a single close instead (e.g., use verify(reader).close() or verify(reader, times(1)).close()) so the test expects idempotent iterator close semantics; keep the rest of the test logic and mocks (reader, iterator code paths) unchanged.hudi-spark-datasource/hudi-spark3.3.x/src/main/scala/org/apache/spark/sql/hudi/analysis/Spark33HoodiePruneFileSourcePartitions.scala-63-65 (1)
63-65: ⚠️ Potential issue | 🟠 Major
Guard `filterFileSlices` behind actual partition predicates.
At Line 63 this mutates the shared `HoodieFileIndex` before Line 65 proves there is anything to prune. Queries with only data filters, or no filters at all, will still mark pushdown as applied and then return `op` unchanged, which can suppress later pruning passes.
Suggested fix
```diff
-    // [[HudiFileIndex]] is a caching one, therefore we don't need to reconstruct new relation,
-    // instead we simply just refresh the index and update the stats
-    fileIndex.filterFileSlices(dataFilters, partitionPruningFilters, isPartitionPruned = true)
-    if (partitionPruningFilters.nonEmpty) {
+    // [[HudiFileIndex]] is a caching one, therefore we don't need to reconstruct new relation,
+    // instead we simply refresh the index and update the stats when partition pruning actually applies
+    fileIndex.filterFileSlices(dataFilters, partitionPruningFilters, isPartitionPruned = true)
+    // Change table stats based on the sizeInBytes of pruned files
     val filteredStats = FilterEstimation(Filter(partitionPruningFilters.reduce(And), lr)).estimate
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-spark-datasource/hudi-spark3.3.x/src/main/scala/org/apache/spark/sql/hudi/analysis/Spark33HoodiePruneFileSourcePartitions.scala` around lines 63 - 65, Move the call that mutates the shared HoodieFileIndex so it only runs when there are actual partition predicates: wrap or relocate the fileIndex.filterFileSlices(dataFilters, partitionPruningFilters, isPartitionPruned = true) call inside the if (partitionPruningFilters.nonEmpty) block (the same block that subsequently marks pushdown applied and returns op) so that fileIndex is not modified for queries with only data filters or no filters; update any logic that marks pushdown as applied (related to the same conditional) to only execute when partitionPruningFilters is nonEmpty to avoid suppressing later pruning passes.hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java-333-336 (1)
333-336: ⚠️ Potential issue | 🟠 Major
Don't use `executor != null` as the recovery-state check here.
`close()` shuts the executor/client resources down but leaves the fields non-null. `StreamWriteFunctionWrapper.coordinatorFails()` does `close() -> resetToCheckpoint() -> start()` on the same coordinator instance, so this branch can call `restoreEvents()` on already-closed resources before restart. Gate this on an explicit started flag, or clear the executor/client fields in `close()`.
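A minimal sketch of the explicit-flag approach (field and method names below are assumptions for illustration, not the coordinator's actual members):

```java
// Illustrative lifecycle guard: recovery work is gated on an explicit flag that
// close() clears, instead of inferring "started" from a non-null executor field.
class CoordinatorLifecycleSketch {
  private volatile boolean started = false;
  private Object executor; // stands in for the real executor/client resources

  void start() {
    executor = new Object();
    started = true;
  }

  void close() {
    started = false;
    executor = null; // clearing the fields also keeps any remaining null-checks honest
  }

  void onSubtaskRecovery(boolean rliWithBootstrap) {
    if (started && rliWithBootstrap) {
      // restoreEvents(); // safe: resources are guaranteed to be live here
    }
  }
}
```

Either the flag or clearing the fields in `close()` would make the close/reset/start sequence in `coordinatorFails()` safe.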
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java` around lines 333 - 336, The check using executor != null is unsafe because close() leaves fields non-null; change the guard around the restoreEvents call to use an explicit lifecycle flag or cleared resources: add or use a boolean like started (set true in start() and false in close()) and replace "if (executor != null && tableState.isRLIWithBootstrap)" with "if (started && tableState.isRLIWithBootstrap)"; alternatively, ensure close() nulls/cleans executor and any client fields so the existing check is valid; make sure restoreEvents(), restore/resetToCheckpoint(), and StreamWriteFunctionWrapper.coordinatorFails() lifecycle interactions are consistent with the chosen approach.
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/ScheduleCompactionActionExecutor.java-243-251 (1)
243-251: ⚠️ Potential issue | 🟠 Major
Compare the log-compaction blocks threshold against block counts, not delta-commit counts.
`numDeltaCommitsSince` is a commit count, but `config.getLogCompactionBlocksThreshold()` is a block-based threshold. A single delta commit can append multiple log blocks in metadata partitions, so this can delay log compaction well past the configured limit or trigger it too early. This decision needs to use the accumulated log-block metric instead of `countInstants()`.
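A rough sketch of a block-count-based trigger, assuming the per-delta-commit log-block counts can be obtained from commit metadata (the names below are illustrative, not the executor's real API):

```java
import java.util.List;

// Illustrative only: accumulate log blocks written since the last (log) compaction
// and compare that sum, not the number of delta commits, against the threshold.
final class LogCompactionTriggerSketch {
  static boolean shouldScheduleLogCompaction(List<Integer> logBlocksPerDeltaCommit, int blocksThreshold) {
    int accumulatedLogBlocks = logBlocksPerDeltaCommit.stream()
        .mapToInt(Integer::intValue)
        .sum();
    return accumulatedLogBlocks >= blocksThreshold;
  }
}
```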
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/ScheduleCompactionActionExecutor.java` around lines 243 - 251, needLogCompact currently compares delta-commit counts (from latestDeltaCommitInfoSinceCompact and getLatestDeltaCommitInfoSinceLogCompaction) against config.getLogCompactionBlocksThreshold(), but the threshold is in log blocks not commits; update needLogCompact to use the accumulated log-block counts instead of commit counts—obtain the log-block metric for the range represented by latestDeltaCommitInfoSinceCompact and the value from getLatestDeltaCommitInfoSinceLogCompaction (or extend those info objects to carry block counts), compute the min accumulated log blocks and compare that against config.getLogCompactionBlocksThreshold() so the decision uses block counts (references: needLogCompact, latestDeltaCommitInfoSinceCompact, getLatestDeltaCommitInfoSinceLogCompaction, config.getLogCompactionBlocksThreshold()).
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/util/DistributedRegistryUtil.java-48-51 (1)
48-51: ⚠️ Potential issue | 🟠 Major
Use `LocalRegistry` explicitly in the non-executor branch.
This branch currently derives the registry class from `Registry.getRegistry(...)`, so the implementation chosen here depends on unrelated process-global state instead of the current config. In a JVM that has already created a different registry type under the bare wrapper-FS name, executor-disabled runs can bind to the wrong registry implementation.
Suggested fix
```diff
+import org.apache.hudi.common.metrics.LocalRegistry;
 import org.apache.hudi.common.metrics.Registry;
@@
     } else {
       registry = Registry.getRegistryOfClass(config.getTableName(), HoodieWrapperFileSystem.REGISTRY_NAME,
-          Registry.getRegistry(HoodieWrapperFileSystem.REGISTRY_NAME).getClass().getName());
+          LocalRegistry.class.getName());
       registryMeta = Registry.getRegistryOfClass(config.getTableName(), HoodieWrapperFileSystem.REGISTRY_META_NAME,
-          Registry.getRegistry(HoodieWrapperFileSystem.REGISTRY_META_NAME).getClass().getName());
+          LocalRegistry.class.getName());
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/util/DistributedRegistryUtil.java` around lines 48 - 51, The non-executor branch in DistributedRegistryUtil currently infers the registry implementation via Registry.getRegistry(...).getClass().getName(), which can pick up a process-global registry type; change the calls to Registry.getRegistryOfClass(...) for HoodieWrapperFileSystem.REGISTRY_NAME and REGISTRY_META_NAME to explicitly pass LocalRegistry.class.getName() (i.e., use LocalRegistry as the registry implementation for both registry and registryMeta), and add the necessary import for LocalRegistry so the non-executor path always binds to the local implementation.hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/common/HoodieSparkEngineContext.java-282-288 (1)
282-288: ⚠️ Potential issue | 🟠 Major
Use an unambiguous cache key for distributed registries.
Line 283 builds the cache key as `tableName + "." + registryName`, which can alias distinct pairs like `("db.tbl", "rocksdb")` and `("db", "tbl.rocksdb")`. That can return the wrong `Registry` instance and cross-wire metrics between tables. Reuse `Registry.makeKey(tableName, registryName)` here as well.
♻️ Proposed fix
```diff
   @Override
   public Registry getMetricRegistry(String tableName, String registryName) {
-    final String prefixedName = tableName.isEmpty() ? registryName : tableName + "." + registryName;
-    return DISTRIBUTED_REGISTRY_MAP.computeIfAbsent(prefixedName, key -> {
+    final String registryKey = Registry.makeKey(tableName, registryName);
+    return DISTRIBUTED_REGISTRY_MAP.computeIfAbsent(registryKey, key -> {
       Registry registry = Registry.getRegistryOfClass(tableName, registryName, DistributedRegistry.class.getName());
       ((DistributedRegistry) registry).register(javaSparkContext);
       return registry;
     });
   }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/common/HoodieSparkEngineContext.java` around lines 282 - 288, The cache key in HoodieSparkEngineContext.getMetricRegistry currently concatenates tableName + "." + registryName which can alias different (tableName, registryName) pairs; change the key to use Registry.makeKey(tableName, registryName) when calling DISTRIBUTED_REGISTRY_MAP.computeIfAbsent so the map uses the unambiguous Registry key; keep the rest of the lambda intact (create via Registry.getRegistryOfClass(...), cast to DistributedRegistry, call register(javaSparkContext), and return the registry).hudi-io/src/main/java/org/apache/hudi/common/metrics/Registry.java-149-156 (1)
149-156: ⚠️ Potential issue | 🟠 Major
Preserve the original registry keys when shipping registries to executors.
Lines 153-154 re-register every registry under `makeKey("", registry.getName())`. That drops the original `tableName`/`registryName` split, so executor-side `Registry.getRegistryOfClass(tableName, registryName, ...)` won't find the pre-registered instance for table-scoped metrics and will create a second registry instead.
♻️ Proposed fix
```diff
-  static void setRegistries(Collection<Registry> registries) {
-    for (Registry registry : registries) {
-      REGISTRY_MAP.putIfAbsent(makeKey("", registry.getName()), registry);
-    }
+  static void setRegistries(Map<String, Registry> registries) {
+    registries.forEach(REGISTRY_MAP::putIfAbsent);
   }
```
Apply the corresponding call-site change in `hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/common/HoodieSparkEngineContext.java`:
```diff
   private static void setRegistries(Map<String, Registry> registries) {
-    Registry.setRegistries(registries.values());
+    Registry.setRegistries(registries);
   }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-io/src/main/java/org/apache/hudi/common/metrics/Registry.java` around lines 149 - 156, setRegistries currently re-registers every Registry under makeKey("", registry.getName()), dropping the original tableName scope so executor-side Registry.getRegistryOfClass(tableName, registryName, ...) can't find pre-registered instances; fix by preserving the original key components when inserting into REGISTRY_MAP (e.g., change the putIfAbsent call to use the registry's actual table name instead of "" — use registry.getTableName() (or the registry.getKey()/appropriate accessor that returns the original tableName) with registry.getName() in makeKey) and ensure the corresponding call-site change is applied in HoodieSparkEngineContext to keep keys consistent across driver and executors.hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcImageManager.java-169-175 (1)
169-175: ⚠️ Potential issue | 🟠 Major
Guard against `skipBytes()` returning zero.
This loop can hang forever on truncated or corrupt buffers. `DataInputStream.skipBytes(int)` is not guaranteed to make progress and can return 0, leaving `numBytes` unchanged and causing an infinite loop.
Suggested fix
```diff
   @Override
   public void skipBytesToRead(int numBytes) throws IOException {
     while (numBytes > 0) {
       int skipped = skipBytes(numBytes);
+      if (skipped <= 0) {
+        throw new IOException("Unable to skip remaining " + numBytes + " bytes");
+      }
       numBytes -= skipped;
     }
   }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcImageManager.java` around lines 169 - 175, The skipBytesToRead method can loop forever if skipBytes(numBytes) returns 0; update skipBytesToRead to detect a zero return from skipBytes and fail fast (throw an EOFException or IOException) instead of looping endlessly, or alternatively implement a fallback that reads and discards bytes (e.g., readFully into a temporary buffer) when skipBytes makes no progress; the change should be made in skipBytesToRead and reference DataInputStream.skipBytes(int) and skipBytes(numBytes) so callers of CdcImageManager.skipBytesToRead will get a clear error rather than hang.hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java-279-285 (1)
279-285: ⚠️ Potential issue | 🟠 Major
Close the spillable map when this iterator created it locally.
Lines 284-285 allocate a standalone spillable map when there is no before-slice, but Lines 334-338 never close that map. That leaks temp files / memory on the no-before-image path.
🧹 Proposed fix
```diff
   private ExternalSpillableMap<String, byte[]> beforeImages;
+  private boolean ownsBeforeImages;
@@
       } else {
         this.beforeImages = FormatUtils.spillableMap(writeConfig, maxCompactionMemoryInBytes, getClass().getSimpleName());
+        this.ownsBeforeImages = true;
       }
@@
   public void close() {
     logRecordIterator.close();
+    if (ownsBeforeImages && beforeImages != null) {
+      beforeImages.close();
+      beforeImages = null;
+    }
     imageManager.close();
   }
```
Also applies to: 334-338
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java` around lines 279 - 285, The initImages method can create a standalone spillable map via FormatUtils.spillableMap when fileSplit.getBeforeFileSlice() is absent, but that map is never closed leading to leaked temp files; update initImages to record whether beforeImages was created locally (e.g., a boolean field like beforeImagesOwned) when using FormatUtils.spillableMap, and then in the iterator/class close method (the one that currently closes resources around lines 334-338) call beforeImages.close() only if beforeImagesOwned is true; ensure similar ownership tracking/closing is added for any analogous afterImages creation path if present.hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieTimeline.java-490-493 (1)
490-493: ⚠️ Potential issue | 🟠 Major
This new abstract method on a public interface is a binary-incompatible API change.
Adding a non-default abstract method to `HoodieTimeline` breaks binary compatibility. Any external implementation of this interface will fail to compile when upgrading to this version, since it must now implement the new method. While internal implementations in this repo are already handled (`BaseHoodieTimeline` has the implementation), this breaks external adopters. Use a `default` method instead to maintain backward compatibility:
Compatibility-preserving approach
```diff
-  HoodieTimeline findInstantsModifiedBeforeOrEqualsByCompletionTime(String instantTime);
+  default HoodieTimeline findInstantsModifiedBeforeOrEqualsByCompletionTime(String instantTime) {
+    throw new UnsupportedOperationException("Not implemented");
+  }
```
Then override it in
`BaseHoodieTimeline` with the actual implementation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieTimeline.java` around lines 490 - 493, You added a new abstract method findInstantsModifiedBeforeOrEqualsByCompletionTime(String instantTime) to the public HoodieTimeline interface which is a binary-incompatible change; change it to a default method on HoodieTimeline (provide a reasonable default such as delegating to existing APIs or throwing UnsupportedOperationException) to preserve ABI, and then override/implement the real logic in BaseHoodieTimeline by adding a concrete override of findInstantsModifiedBeforeOrEqualsByCompletionTime that contains the current implementation. Ensure method signature and javadoc remain identical and reference HoodieTimeline and BaseHoodieTimeline when making the edits.hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java-922-940 (1)
922-940: ⚠️ Potential issue | 🟠 Major
Use unique Avro record names for shredded Variant helper structs.
`createShreddedFieldStruct` uses the field key as the record name, and `createVariantShreddedObject` hard-codes the enclosing typed record name to `typed_value`. Avro named types are global by full name within a schema, so two shredded objects in the same schema can collide and the later definition can alias the first one during serialization. These helper-generated record names need to be derived from the parent path/namespace, not reused raw like this.
Suggested direction
```diff
-  public static HoodieSchema createShreddedFieldStruct(String fieldName, HoodieSchema fieldType) {
+  public static HoodieSchema createShreddedFieldStruct(
+      String recordName, String fieldName, String namespace, HoodieSchema fieldType) {
     ValidationUtils.checkArgument(fieldName != null && !fieldName.isEmpty(), "Field name cannot be null or empty");
     ValidationUtils.checkArgument(fieldType != null, "Field type cannot be null");
     List<HoodieSchemaField> fields = Arrays.asList( ... );
-    return HoodieSchema.createRecord(fieldName, null, null, fields);
+    return HoodieSchema.createRecord(recordName, namespace, null, fields);
   }
 ...
-    HoodieSchema fieldStruct = createShreddedFieldStruct(entry.getKey(), entry.getValue());
+    String recordName = name + "_" + entry.getKey() + "_shredded";
+    HoodieSchema fieldStruct = createShreddedFieldStruct(recordName, entry.getKey(), namespace, entry.getValue());
 ...
-    HoodieSchema typedValueSchema = HoodieSchema.createRecord(
-        Variant.VARIANT_TYPED_VALUE_FIELD, namespace, null, typedValueFields);
+    HoodieSchema typedValueSchema = HoodieSchema.createRecord(
+        name + "_typed_value", namespace, null, typedValueFields);
```
Also applies to: 988-1007
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java` around lines 922 - 940, The helper generators createShreddedFieldStruct and createVariantShreddedObject currently produce Avro record names that can collide (createShreddedFieldStruct uses the raw fieldName and createVariantShreddedObject uses the hard-coded "typed_value"), so change both to emit globally-unique Avro record names derived from the parent's qualified path/namespace (e.g., incorporate the parent path or namespace plus the fieldName and a stable suffix like "_shredded" or "_typed_value") rather than raw names; update the calls to HoodieSchema.createRecord in createShreddedFieldStruct and the record-name used in createVariantShreddedObject to build and pass that fully-qualified unique name (ensure Variant.VARIANT_TYPED_VALUE_FIELD and Variant.VARIANT_VALUE_FIELD usage remains the same and only the record name/namespace generation is changed).hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/common/TestCatalogBackedTableMetadata.scala-45-62 (1)
45-62: ⚠️ Potential issue | 🟠 Major
This test never actually disables the metadata table.
The scenario claims to validate catalog-backed partition listing when metadata is disabled, but the table properties here never set `'hoodie.metadata.enable'='false'`. If metadata is enabled by default in the test environment, this case still exercises the metadata-table path and misses the fallback behavior it is supposed to cover.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/common/TestCatalogBackedTableMetadata.scala` around lines 45 - 62, The test in TestCatalogBackedTableMetadata.scala creates the table for the scenario but never disables the metadata table; update the CREATE TABLE statement (the SQL executed where targetTable is created) to include the table property 'hoodie.metadata.enable'='false' in the tblproperties so the test actually exercises the non-metadata fallback path. Locate the SQL block that creates targetTable and add the 'hoodie.metadata.enable'='false' property alongside the existing 'primaryKey', 'type', and 'preCombineField' entries.hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelationV2.scala-221-223 (1)
221-223: ⚠️ Potential issue | 🟠 Major
Keep datasource filter columns available until the incremental filters have run.
This has the same pruning problem as `IncrementalRelationV1`: `prunedSchema` only reflects `requiredColumns`, but the datasource-option `filters` are still applied afterwards. If those predicates reference non-projected columns, Line 292 drops them before evaluation and the scan fails with an unresolved-column error. Include filter columns in `prunedSchema`, or defer `filterRequiredColumnsFromDF` until after `filters.foldLeft(...)`.
Also applies to: 292-293
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelationV2.scala` around lines 221 - 223, prunedSchema produced by IncrementalRelationUtil.getPrunedSchema in IncrementalRelationV2 only uses requiredColumns, but datasource-option filters are applied later (filters.foldLeft(...)) and filterRequiredColumnsFromDF currently drops columns needed by those predicates causing unresolved-column errors; fix by including filter-referenced columns when building prunedSchema (e.g., compute all filter columns from filters and merge them with requiredColumns before calling IncrementalRelationUtil.getPrunedSchema) or alternatively move the call to filterRequiredColumnsFromDF to after filters.foldLeft(...) so predicates see needed columns; update references around prunedSchema, filters.foldLeft, and filterRequiredColumnsFromDF accordingly.hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelationV1.scala-231-233 (1)
231-233: ⚠️ Potential issue | 🟠 Major
Keep datasource filter columns in the scan schema until `PUSH_DOWN_INCR_FILTERS` has been applied.
`prunedSchema` is built only from `requiredColumns`, but Line 309 still applies `filters` from the datasource options. If one of those predicates references a non-projected data column, `filterRequiredColumnsFromDF` removes it first and the scan fails with an unresolved-column error. Include filter-referenced columns in `prunedSchema`, or move the `filterRequiredColumnsFromDF` call to after `filters.foldLeft(...)`.
Also applies to: 308-310
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelationV1.scala` around lines 231 - 233, The prunedSchema (built via IncrementalRelationUtil.getPrunedSchema) currently only uses requiredColumns and thus drops columns referenced by datasource `filters` before PUSH_DOWN_INCR_FILTERS is applied, causing unresolved-column errors when those predicates are applied; fix by ensuring filter-referenced columns are retained in the scan schema: either expand the input to getPrunedSchema to include columns referenced by `filters` (union requiredColumns with the columns extracted from `filters`), or move the call to filterRequiredColumnsFromDF to after the `filters.foldLeft(...)` block so datasource filters are applied before any projection removes referenced columns; update the code around prunedSchema, IncrementalRelationUtil.getPrunedSchema, filters.foldLeft, and filterRequiredColumnsFromDF accordingly.hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java-361-369 (1)
361-369: ⚠️ Potential issue | 🟠 Major
Preserve incremental-range partition filtering in the new overload.
Unlike the overload on Line 383, this path never restricts results to `TimelineUtils.getWrittenPartitions(findInstantsInRange())`. Once Spark pushes partition predicates down through this method, incremental reads can pick up matching partitions that were not written in the requested instant range, which expands the scan well beyond the incremental window. Apply the same incremental-query branch here before delegating to `tableMetadata`.
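A rough sketch of the missing guard (the helper shape is illustrative; the real code would reuse the incremental-query branch from the other overload):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative only: drop pushed-down partition matches that were not written
// within the requested incremental instant range before delegating to tableMetadata.
final class IncrementalPartitionPruneSketch {
  static List<String> restrictToWrittenPartitions(List<String> matchedPartitionPaths,
                                                  Set<String> writtenPartitionsInRange) {
    // writtenPartitionsInRange would come from
    // TimelineUtils.getWrittenPartitions(timeline.findInstantsInRange(...)).
    return matchedPartitionPaths.stream()
        .filter(writtenPartitionsInRange::contains)
        .collect(Collectors.toList());
  }
}
```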
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java` around lines 361 - 369, The listPartitionPaths(List<String>, Types.RecordType, Expression, List<Object>) path currently delegates to tableMetadata.getPartitionPathWithPathPrefixUsingFilterExpression without applying the incremental-range filter; update this method (listPartitionPaths) to restrict matchedPartitionPaths to only the partitions returned by TimelineUtils.getWrittenPartitions(findInstantsInRange())—the same incremental-query branch used in the other overload—before proceeding/returning so Spark partition predicate pushdown doesn't include partitions outside the requested instant range.
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/util/IncrementalRelationUtil.scala-46-73 (1)
46-73: ⚠️ Potential issue | 🟠 Major
Deduplicate implicit helper columns before appending them to `prunedSchema`.
If a MOR table uses the same field as both a partition field and an ordering field, this code adds that `StructField` twice. That produces duplicate column names in `prunedSchema`, which can break the later `.schema(prunedSchema)` reads and the `select(fieldNames...)` path. Track already-added names and skip duplicates when appending implicit columns.
Suggested fix
```diff
   def getPrunedSchema(requiredColumns: Array[String], usedSchema: StructType, metaClient: HoodieTableMetaClient) = {
     var prunedSchema = StructType(Seq())
+    val appended = scala.collection.mutable.HashSet.empty[String]
+
+    def appendIfPresent(col: String): Unit = {
+      if (!appended.contains(col)) {
+        usedSchema.find(_.name == col).foreach { field =>
+          prunedSchema = prunedSchema.add(field)
+          appended += col
+        }
+      }
+    }
     // _hoodie_commit_time is a required field. using which query filters are applied.
     if (!requiredColumns.contains(HoodieRecord.COMMIT_TIME_METADATA_FIELD)) {
-      prunedSchema = prunedSchema.add(usedSchema(HoodieRecord.COMMIT_TIME_METADATA_FIELD))
+      appendIfPresent(HoodieRecord.COMMIT_TIME_METADATA_FIELD)
     }
     // Add all the required columns as part of pruned schema
     requiredColumns.foreach(col => {
-      val field = usedSchema.find(_.name == col)
-      if (field.isDefined) {
-        prunedSchema = prunedSchema.add(field.get)
-      }
+      appendIfPresent(col)
     })
@@
     partitionColumns.get().foreach(col => {
       if (!requiredColumns.contains(col)) {
-        val field = usedSchema.find(_.name == col)
-        if (field.isDefined) {
-          prunedSchema = prunedSchema.add(field.get)
-        }
+        appendIfPresent(col)
       }
     })
@@
     orderingFields.forEach(col => {
       if (!requiredColumns.contains(col)) {
-        val field = usedSchema.find(_.name == col)
-        if (field.isDefined) {
-          prunedSchema = prunedSchema.add(field.get)
-        }
+        appendIfPresent(col)
       }
     })
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/util/IncrementalRelationUtil.scala` around lines 46 - 73, The code may append the same StructField twice to prunedSchema when a column is both in tableConfig.getPartitionFields and tableConfig.getOrderingFields; modify IncrementalRelationUtil to deduplicate implicit helper columns by tracking a Set of added column names (seed it with requiredColumns and any existing prunedSchema field names), and before adding a field from usedSchema (when inspecting partitionColumns or orderingFields) check the set and skip adding if already present, otherwise add the field to prunedSchema and insert its name into the set; apply this logic around the blocks referencing tableConfig.getPartitionFields, tableConfig.getOrderingFields, prunedSchema, requiredColumns, usedSchema, and HoodieTableType.MERGE_ON_READ.hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestIncrementalQueryColumnPruning.scala-129-168 (1)
129-168: ⚠️ Potential issue | 🟠 Major
These assertions don’t actually prove scan-time column pruning.
Both tests call `.select(...)` after the load, so the result schema will be narrowed even if the relation still reads every column. The `plan.contains("col1") || plan.contains("col3")` check is also satisfied by the projection itself. A regression in the actual `PrunedScan` path would still pass here.
Please assert against the scan/read schema in the physical plan instead of the projected output schema.
Also applies to: 225-240
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestIncrementalQueryColumnPruning.scala` around lines 129 - 168, The current assertions check the projected output schema (prunedDF and prunedRecords) which can be narrowed by .select(...) even when the underlying scan reads all columns; instead inspect the physical scan node to prove scan-time pruning: use prunedDF.queryExecution.executedPlan (or prunedDF.queryExecution.toRdd / executedPlan) and assert the scan node/read schema contains only "col1" and "col3" (e.g., look for FileScan/ParquetScan/HadoopFsRelation node text like "ReadSchema" or attribute names) and does NOT mention "col2","col4", etc.; update the assertions that currently use plan.contains(...) to search executedPlan for the scan/read schema and apply the same change to the other occurrence referenced (lines ~225-240) so the test fails if the physical scan still reads unselected columns.hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/metadata/CatalogBackedTableMetadata.scala-99-106 (1)
99-106: ⚠️ Potential issue | 🟠 Major
Propagate predicate filters in the filesystem fallback.
When `shouldUseCatalogPartitions` is false, this override drops `partitionPredicateExpressions` and falls back to the older super call. In the path where `CatalogBackedTableMetadata` is instantiated but the table does not track partitions in the catalog, that regresses predicate-based partition pruning back to a full filesystem listing.
Suggested fix
```diff
     } else {
-      super.getPartitionPathWithPathPrefixUsingFilterExpression(relativePathPrefix, partitionFields, pushedExpr)
+      super.getPartitionPathWithPathPrefixUsingFilterExpression(
+        relativePathPrefix,
+        partitionFields,
+        pushedExpr,
+        partitionPredicateExpressions)
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/metadata/CatalogBackedTableMetadata.scala` around lines 99 - 106, The fallback branch currently ignores partitionPredicateExpressions and does a full filesystem listing; update the else branch in getPartitionPathWithPathPrefixUsingFilterExpression to preserve predicate-based pruning by converting partitionPredicateExpressions to a Seq[Expression] (as done in the shouldUseCatalogPartitions branch) and applying the same filtering logic: call super.getPartitionPathWithPathPrefixUsingFilterExpression(...) to obtain filesystem partitions and then pass that result into filterPartitionsBasedOnRelativePathPrefixes(relativePathPrefix, <super result>) using the derived partitionPredicateExpressionSeq so partitionPredicateExpressions are honored even when shouldUseCatalogPartitions is false.hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieVectorSearchPlanBuilder.scala-108-112 (1)
108-112: ⚠️ Potential issue | 🟠 Major
Ensure internal column names cannot collide with input data.
The code unconditionally uses `_hudi_distance`, `_hudi_query_index`, `_hudi_query_emb`, `_hudi_rank`, and `_hudi_query_prefix` without checking whether the input corpus or query tables already contain these names. The `withColumn()` operations at lines 273, 303, 314, and 321 will silently overwrite user data if these collisions occur. The prefixing logic (lines 305–310) only guards columns shared between corpus and query, leaving gaps. Add a pre-validation check that raises an error if any reserved column names exist in the input DataFrames, or generate unique aliases before building the plan.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieVectorSearchPlanBuilder.scala` around lines 108 - 112, Reserved internal column names (DISTANCE_COL, QUERY_ID_COL, QUERY_EMB_ALIAS, RANK_COL, QUERY_COL_PREFIX) are used unconditionally and may overwrite user columns; add a pre-validation step in HoodieVectorSearchPlanBuilder (before building the plan / before any withColumn calls) that checks both input DataFrames for any columns that equal these constants or start with QUERY_COL_PREFIX and raise a clear error if found, or alternatively generate unique aliases (e.g., append a UUID or incremental suffix) for DISTANCE_COL/QUERY_EMB_ALIAS/QUERY_ID_COL/RANK_COL and use those generated symbols consistently throughout the plan-building methods (the same ones that call withColumn). Ensure the chosen approach updates all references in the plan construction so no reserved names collide with input data.hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/HoodieVectorSearchTableValuedFunction.scala-90-108 (1)
90-108: ⚠️ Potential issue | 🟠 Major
Add foldable validation before calling `eval()` on the `k` parameter.
`parseK` evaluates arbitrary user-provided expressions without checking if they are foldable. Non-foldable expressions, such as column references or function calls, will fail unpredictably or evaluate incorrectly during analysis. The `k` parameter must be constrained to a foldable expression before evaluation.
This pattern is already established elsewhere in the codebase; `evaluateQueryVector` in `HoodieSparkBaseAnalysis.scala` validates foldability before calling `eval()` on a similar vector search parameter.
Suggested fix
```diff
   private[logical] def parseK(funcName: String, expr: Expression): Int = {
+    if (!expr.foldable) {
+      throw new HoodieAnalysisException(
+        s"Function '$funcName': k must be a foldable positive integer expression, got: ${expr.sql}")
+    }
     val rawValue = expr.eval()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/HoodieVectorSearchTableValuedFunction.scala` around lines 90 - 108, The parseK method currently calls expr.eval() on the user-supplied k Expression without checking foldability; update parseK to first require that expr.foldable is true (throwing a HoodieAnalysisException with a clear message referencing funcName if not), then call expr.eval() and proceed with the existing number parsing and positive-int checks; follow the same pattern used by evaluateQueryVector in HoodieSparkBaseAnalysis.scala to validate foldability before evaluation so non-foldable expressions (e.g., column refs) are rejected early.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 72ad786b-9bd7-4e3f-8477-710a1e028675
📒 Files selected for processing (119)
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieClient.javahudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.javahudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.javahudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.javahudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/ScheduleCompactionActionExecutor.javahudi-client/hudi-client-common/src/main/java/org/apache/hudi/util/DistributedRegistryUtil.javahudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/TestBaseHoodieTableServiceClient.javahudi-client/hudi-client-common/src/test/java/org/apache/hudi/config/TestHoodieWriteConfig.javahudi-client/hudi-flink-client/src/main/java/org/apache/hudi/util/HoodieSchemaConverter.javahudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.javahudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/common/HoodieSparkEngineContext.javahudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/row/HoodieRowParquetWriteSupport.javahudi-client/hudi-spark-client/src/main/java/org/apache/hudi/metrics/DistributedRegistry.javahudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/avro/HoodieSparkSchemaConverters.scalahudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/hudi/SparkAdapter.scalahudi-client/hudi-spark-client/src/test/java/org/apache/hudi/common/table/log/TestLogReaderUtils.javahudi-client/hudi-spark-client/src/test/java/org/apache/hudi/metrics/TestDistributedRegistry.javahudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.javahudi-common/src/main/java/org/apache/hudi/common/engine/HoodieEngineContext.javahudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.javahudi-common/src/main/java/org/apache/hudi/common/table/log/LogReaderUtils.javahudi-common/src/main/java/org/apache/hudi/common/table/timeline/BaseHoodieTimeline.javahudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieTimeline.javahudi-common/src/main/java/org/apache/hudi/common/util/CompactionUtils.javahudi-common/src/main/java/org/apache/hudi/common/util/collection/RocksDBDAO.javahudi-common/src/main/java/org/apache/hudi/common/util/queue/DisruptorMessageQueue.javahudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.javahudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.javahudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadata.javahudi-common/src/main/java/org/apache/hudi/util/PartitionPathFilterUtil.javahudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.javahudi-common/src/test/java/org/apache/hudi/util/TestPartitionPathFilterUtil.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkRocksDBIndexMetrics.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/append/AppendWriteFunctionWithDisruptorBufferSort.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/RLIBootstrapOperator.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/event/Correspondent.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/BucketAssignFunc
tion.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/index/IndexBackend.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/index/RocksDBIndexBackend.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/EventBuffers.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/Pipelines.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/HoodieScanContext.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/HoodieSource.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/HoodieSourceReader.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/HoodieSourceSplitReader.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/RecordLimiter.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/function/AbstractSplitReaderFunction.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/function/HoodieCdcSplitReaderFunction.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/function/HoodieSplitReaderFunction.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/split/assign/HoodieSplitAssigners.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/split/assign/HoodieSplitBucketAssigner.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcImageManager.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcInputFormat.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestStreamWriteOperatorCoordinator.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteMergeOnRead.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/append/TestAppendWriteFunctionWithBufferSort.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/partitioner/index/TestRocksDBIndexBackend.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/MockStateSnapshotContext.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/StreamWriteFunctionWrapper.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/TestEventBuffers.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/TestWriteBase.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/TestHoodieSourceSplitReader.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/TestRecordLimiter.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestAbstractSplitReaderFunction.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestHoodieCdcSplitReaderFunction.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestHoodieSplitReaderFunction.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/split/TestDefaultHoodieSplitProvider.javahudi-flink-datasour
ce/hudi-flink/src/test/java/org/apache/hudi/source/split/assign/TestHoodieSplitBucketAssigner.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestHoodieDataSource.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/TestHoodieTableFactory.javahudi-hadoop-common/src/main/java/org/apache/hudi/common/util/ParquetReaderIterator.javahudi-hadoop-common/src/main/java/org/apache/hudi/hadoop/fs/HoodieWrapperFileSystem.javahudi-hadoop-common/src/test/java/org/apache/hudi/common/table/TestHoodieTableConfig.javahudi-hadoop-common/src/test/java/org/apache/hudi/common/util/TestCompactionUtils.javahudi-hadoop-common/src/test/java/org/apache/hudi/common/util/TestParquetReaderIterator.javahudi-io/src/main/java/org/apache/hudi/common/metrics/LocalRegistry.javahudi-io/src/main/java/org/apache/hudi/common/metrics/Registry.javahudi-spark-datasource/hudi-spark-common/pom.xmlhudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scalahudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scalahudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelationV1.scalahudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelationV2.scalahudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scalahudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/metadata/CatalogBackedTableMetadata.scalahudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/util/IncrementalRelationUtil.scalahudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/HoodieTableChanges.scalahudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/HoodieVectorSearchTableValuedFunction.scalahudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/HoodieSqlCommonUtils.scalahudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieSparkBaseAnalysis.scalahudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieVectorSearchPlanBuilder.scalahudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/analysis/TableValuedFunctions.scalahudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/analysis/VectorDistanceUtils.scalahudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/streaming/HoodieStreamSourceV1.scalahudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/streaming/HoodieStreamSourceV2.scalahudi-spark-datasource/hudi-spark/pom.xmlhudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scalahudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowCommitsProcedure.scalahudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scalahudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestHoodieVectorSearchFunction.scalahudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestIncrementalQueryColumnPruning.scalahudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scalahudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestVectorDataSource.scalahudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/avro/Tes
tSchemaConverters.scalahudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/common/TestCatalogBackedTableMetadata.scalahudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/common/TestInstantTimeValidation.scalahudi-spark-datasource/hudi-spark3-common/src/main/scala/org/apache/spark/sql/adapter/BaseSpark3Adapter.scalahudi-spark-datasource/hudi-spark3.3.x/src/main/scala/org/apache/spark/sql/hudi/analysis/Spark33HoodiePruneFileSourcePartitions.scalahudi-spark-datasource/hudi-spark4-common/src/main/scala/org/apache/spark/sql/adapter/BaseSpark4Adapter.scalahudi-spark-datasource/hudi-spark4.0.x/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scalahudi-spark-datasource/hudi-spark4.0.x/src/test/java/org/apache/hudi/io/storage/row/TestHoodieRowParquetWriteSupportVariant.javahudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.javahudi-utilities/src/test/java/org/apache/hudi/utilities/offlinejob/TestHoodieClusteringJob.javapom.xml
💤 Files with no reviewable changes (7)
- hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/util/HoodieSchemaConverter.java
- hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieClient.java
- hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowCommitsProcedure.scala
- hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
- hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/TestEventBuffers.java
- hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestStreamWriteOperatorCoordinator.java
- hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/event/Correspondent.java
```java
public static Predicate<String> relativePathPrefixPredicate(List<String> relativePathPrefixes) {
  return path -> relativePathPrefixes.stream().anyMatch(relativePathPrefix ->
      // Partition paths stored in metadata table do not have the slash at the end.
      // If the relativePathPrefix is empty, return all partition paths;
      // else if the relative path prefix is the same as the path, this is an exact match;
      // else, we need to make sure the path is a subdirectory of relativePathPrefix, by
      // checking if the path starts with relativePathPrefix appended by a slash ("/").
      StringUtils.isNullOrEmpty(relativePathPrefix)
          || path.equals(relativePathPrefix) || path.startsWith(relativePathPrefix + "/"));
}
```
Handle null/empty prefix lists before anyMatch to avoid dropping all partitions.
At Line 29, relativePathPrefixes.stream().anyMatch(...) returns false for an empty list (and NPE for null). That makes the predicate reject every path, while the method contract/comments imply empty-prefix behavior should match all paths.
🐛 Proposed fix
```diff
 public class PartitionPathFilterUtil {
   public static Predicate<String> relativePathPrefixPredicate(List<String> relativePathPrefixes) {
+    if (relativePathPrefixes == null || relativePathPrefixes.isEmpty()) {
+      return path -> true;
+    }
     return path -> relativePathPrefixes.stream().anyMatch(relativePathPrefix ->
         // Partition paths stored in metadata table do not have the slash at the end.
         // If the relativePathPrefix is empty, return all partition paths;
         // else if the relative path prefix is the same as the path, this is an exact match;
         // else, we need to make sure the path is a subdirectory of relativePathPrefix, by
         // checking if the path starts with relativePathPrefix appended by a slash ("/").
         StringUtils.isNullOrEmpty(relativePathPrefix)
-        || path.equals(relativePathPrefix) || path.startsWith(relativePathPrefix + "/"));
+            || path.equals(relativePathPrefix) || path.startsWith(relativePathPrefix + "/"));
   }
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hudi-common/src/main/java/org/apache/hudi/util/PartitionPathFilterUtil.java`
around lines 28 - 36, The predicate returned by relativePathPrefixPredicate
currently calls relativePathPrefixes.stream().anyMatch(...) which will throw NPE
if relativePathPrefixes is null and will return false for an empty list, causing
every path to be rejected; update relativePathPrefixPredicate to first check if
relativePathPrefixes is null or empty and, in that case, return a predicate that
always returns true, otherwise return the existing predicate that checks
StringUtils.isNullOrEmpty(relativePathPrefix) or path.equals(relativePathPrefix)
or path.startsWith(relativePathPrefix + "/"); ensure the null check prevents NPE
and preserves described empty-prefix semantics.
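To pin down the empty-prefix contract once a guard like the one above is in place, a small regression test could sit next to the fix. This is a minimal sketch, not part of the original suggestion: it assumes the proposed null/empty guard has been applied, that `PartitionPathFilterUtil` lives in `org.apache.hudi.util` as referenced above, and that JUnit 5 is on the test classpath; the test class and method names are illustrative.

```java
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.Arrays;
import java.util.Collections;
import java.util.function.Predicate;

import org.apache.hudi.util.PartitionPathFilterUtil;
import org.junit.jupiter.api.Test;

class TestPartitionPathFilterUtil {

  @Test
  void emptyPrefixListMatchesEveryPartitionPath() {
    // With no prefixes configured, the predicate should accept every partition path.
    Predicate<String> predicate =
        PartitionPathFilterUtil.relativePathPrefixPredicate(Collections.emptyList());
    assertTrue(predicate.test("2024/01/01"));
    assertTrue(predicate.test("region=us/2024/01"));
  }

  @Test
  void prefixMatchesExactPathAndSubdirectoriesOnly() {
    Predicate<String> predicate =
        PartitionPathFilterUtil.relativePathPrefixPredicate(Arrays.asList("region=us"));
    assertTrue(predicate.test("region=us"));          // exact match
    assertTrue(predicate.test("region=us/2024/01"));  // subdirectory
    assertFalse(predicate.test("region=useast"));     // shares characters but is a different partition
  }
}
```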
```java
if (fileSplitIterator.hasNext()) {
  recordIterator = recordIteratorFunc.apply(fileSplitIterator.next());
  return recordIterator.hasNext();
}
```
Skip empty split iterators instead of stopping the outer iterator.
Lines 121-124 return false as soon as one per-split iterator is empty. That drops every later CDC split, and the empty iterator is left open if the caller stops after that false.
🐛 Proposed fix
```diff
-    if (fileSplitIterator.hasNext()) {
-      recordIterator = recordIteratorFunc.apply(fileSplitIterator.next());
-      return recordIterator.hasNext();
-    }
+    while (fileSplitIterator.hasNext()) {
+      recordIterator = recordIteratorFunc.apply(fileSplitIterator.next());
+      if (recordIterator.hasNext()) {
+        return true;
+      }
+      recordIterator.close();
+      recordIterator = null;
+    }
     return false;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java`
around lines 121 - 124, The current hasNext logic in CdcIterators stops on the
first empty per-split iterator (fileSplitIterator, recordIterator) which drops
subsequent splits and leaves the empty iterator open; change the loop so it
repeatedly pulls the next split via recordIteratorFunc.apply(...) and if that
per-split recordIterator hasNext() returns true, assign and return true,
otherwise close the empty recordIterator (if it implements AutoCloseable or has
a close method) and continue to the next split until none remain, returning
false only when all split iterators are exhausted. Ensure you reference and
update the code paths that set and use recordIterator, fileSplitIterator, and
recordIteratorFunc so the empty iterators are closed and skipped rather than
causing an early stop.
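The loop suggested above can also be read as a general "flatten the splits and skip the empty ones" pattern. The sketch below shows that pattern in isolation; the split, record, and iterator types are generic placeholders rather than Hudi's actual CDC classes, and the real change should stay inside CdcIterators as the diff shows.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Flattens per-split iterators, closing and skipping splits that yield no records. */
final class SplitFlatteningIterator<S, R> implements Iterator<R>, Closeable {
  private final Iterator<S> splits;
  private final Function<S, Iterator<R>> recordIteratorFunc;
  private Iterator<R> current;

  SplitFlatteningIterator(Iterator<S> splits, Function<S, Iterator<R>> recordIteratorFunc) {
    this.splits = splits;
    this.recordIteratorFunc = recordIteratorFunc;
  }

  @Override
  public boolean hasNext() {
    // Keep opening splits until one produces a record; close empty ones right away.
    while ((current == null || !current.hasNext()) && splits.hasNext()) {
      closeCurrent();
      current = recordIteratorFunc.apply(splits.next());
    }
    return current != null && current.hasNext();
  }

  @Override
  public R next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return current.next();
  }

  @Override
  public void close() {
    closeCurrent();
  }

  private void closeCurrent() {
    if (current instanceof Closeable) {
      try {
        ((Closeable) current).close();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
    current = null;
  }
}
```

Because `hasNext()` only opens new splits and never consumes records, calling it repeatedly is safe, which also sidesteps the probe-advances-the-stream issue raised in the next comment.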
```java
@Override
public boolean hasNext() {
  if (nested.hasNext()) {
    currentRecord = nested.next();
    currentRecord.setRowKind(RowKind.INSERT);
    return true;
  }
  return false;
}

@Override
public RowData next() {
  return currentRecord;
}
```
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "CdcIterators.java" 2>/dev/null
Repository: yihua/hudi
Length of output: 153
🏁 Script executed:
# Find the file and get basic info
fd -t f "CdcIterators.java" --exec wc -l {} \;
Repository: yihua/hudi
Length of output: 157
🏁 Script executed:
git ls-files | grep -i "CdcIterators"
Repository: yihua/hudi
Length of output: 151
🏁 Script executed:
sed -n '155,180p' ./hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java
Repository: yihua/hudi
Length of output: 603
🏁 Script executed:
sed -n '285,335p' ./hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java
Repository: yihua/hudi
Length of output: 1801
🏁 Script executed:
sed -n '415,445p' ./hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java
Repository: yihua/hudi
Length of output: 832
🏁 Script executed:
grep -n "class.*Iterator\|public.*hasNext\|public.*next()" ./hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java | head -30
Repository: yihua/hudi
Length of output: 1247
🏁 Script executed:
sed -n '153,187p' ./hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java
Repository: yihua/hudi
Length of output: 938
🏁 Script executed:
sed -n '230,368p' ./hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java
Repository: yihua/hudi
Length of output: 6167
🏁 Script executed:
sed -n '370,479p' ./hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java
Repository: yihua/hudi
Length of output: 4090
🏁 Script executed:
sed -n '189,228p' ./hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java
Repository: yihua/hudi
Length of output: 1249
🏁 Script executed:
sed -n '288,332p' ./hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java | head -20
Repository: yihua/hudi
Length of output: 752
🏁 Script executed:
sed -n '420,442p' ./hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java
Repository: yihua/hudi
Length of output: 688
🏁 Script executed:
sed -n '161,174p' ./hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java
Repository: yihua/hudi
Length of output: 349
Don't advance the underlying stream from hasNext().
Lines 162-167, 289-332, and 421-442 all consume data during hasNext(). A second hasNext() before next() will skip records, and for updates it can drop the UPDATE_BEFORE row entirely. These iterators need an idempotent prefetch flag instead of advancing on every probe.
🔁 Fix pattern
```diff
   private RowData currentRecord;
+  private boolean prefetched;

   @Override
   public boolean hasNext() {
-    if (nested.hasNext()) {
+    if (prefetched) {
+      return true;
+    }
+    if (nested.hasNext()) {
       currentRecord = nested.next();
       currentRecord.setRowKind(RowKind.INSERT);
+      prefetched = true;
       return true;
     }
     return false;
   }

   @Override
   public RowData next() {
+    prefetched = false;
     return currentRecord;
   }
```
Apply the same buffering pattern to DataLogFileIterator and BaseImageIterator.
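Viewed on its own, the fix above is the classic prefetch-buffer pattern: hasNext() fills a one-element buffer at most once, and next() drains it. The following standalone sketch illustrates that pattern under the assumption of a plain java.util.Iterator delegate; it is not the Hudi code itself, and the element type is generic rather than Flink's RowData.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Makes hasNext() idempotent by buffering at most one element from the delegate. */
final class PrefetchingIterator<T> implements Iterator<T> {
  private final Iterator<T> delegate;
  private T buffered;
  private boolean hasBuffered;

  PrefetchingIterator(Iterator<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public boolean hasNext() {
    if (hasBuffered) {
      return true;           // already prefetched; do not advance the delegate again
    }
    if (delegate.hasNext()) {
      buffered = delegate.next();
      hasBuffered = true;
      return true;
    }
    return false;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    hasBuffered = false;     // hand the buffered element out exactly once
    T result = buffered;
    buffered = null;
    return result;
  }
}
```

Keeping the buffer flag separate from the buffered value also makes it obvious that per-record mutations such as setRowKind belong in the prefetch step, not in next().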
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java`
around lines 161 - 174, The hasNext() implementations in CdcIterators (and
similarly DataLogFileIterator and BaseImageIterator) currently advance the
underlying iterator (nested.next()) every probe, which can skip records when
hasNext() is called multiple times; change to an idempotent prefetch pattern:
add a nullable buffer field (e.g., bufferedRecord) and a boolean/nullable check
so hasNext() only calls nested.next() to fill bufferedRecord when it is null,
returns true if bufferedRecord != null, and next() returns bufferedRecord and
clears it (or sets it to null); update references to currentRecord to use this
buffered field so repeated hasNext() calls do not advance the nested iterator.
Actionable comments posted: 10
🧹 Nitpick comments (11)
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkRocksDBIndexMetrics.java (1)
55-58: Fail fast if backend is missing
At lines 55-58, the constructor should guard rocksDBIndexBackend with a null check to avoid deferred NPEs during metric polling.
Proposed patch
```diff
 import org.apache.hudi.sink.partitioner.index.RocksDBIndexBackend;
 import org.apache.flink.metrics.Gauge;
 import org.apache.flink.metrics.MetricGroup;
 import org.rocksdb.TickerType;
+import java.util.Objects;
@@
   public FlinkRocksDBIndexMetrics(MetricGroup metricGroup, RocksDBIndexBackend rocksDBIndexBackend) {
     super(metricGroup);
-    this.rocksDBIndexBackend = rocksDBIndexBackend;
+    this.rocksDBIndexBackend = Objects.requireNonNull(rocksDBIndexBackend, "rocksDBIndexBackend must not be null");
   }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkRocksDBIndexMetrics.java` around lines 55 - 58, The FlinkRocksDBIndexMetrics constructor currently accepts a possibly null rocksDBIndexBackend which can cause deferred NPEs; update the FlinkRocksDBIndexMetrics(MetricGroup metricGroup, RocksDBIndexBackend rocksDBIndexBackend) constructor to validate rocksDBIndexBackend is non-null and throw a clear exception (e.g., IllegalArgumentException or NullPointerException) with a descriptive message if it is null, then assign to this.rocksDBIndexBackend only after the check.hudi-io/src/test/java/org/apache/hudi/io/util/TestIOUtils.java (1)
98-100: Expand encode round-trip vectors to include negative/boundary ints.
The new encode tests are good, but they currently miss the negative/sign branch of writeVarInt. Adding a few negative boundaries will harden this coverage.
✅ Suggested test vector extension
```diff
-    int[] testValues = {0, 1, 127, 128, 146, 200, 255, 256, 300, 1000, 32080, 65535, 100000,
-        2034958, 632492350, Integer.MAX_VALUE};
+    int[] testValues = {
+        Integer.MIN_VALUE, -203, -128, -113, -112, -1,
+        0, 1, 127, 128, 146, 200, 255, 256, 300, 1000, 32080, 65535, 100000,
+        2034958, 632492350, Integer.MAX_VALUE
+    };
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-io/src/test/java/org/apache/hudi/io/util/TestIOUtils.java` around lines 98 - 100, The test's encode/decode vector in TestIOUtils misses negative cases and the sign-handling branch of writeVarInt; update the testValues array used in the loop (the array defined in TestIOUtils) to include negative and boundary integers (e.g., -1, -128, -129, -1000, Integer.MIN_VALUE) so the round-trip via writeVarInt/readVarInt (or the corresponding encode/decode methods) exercises the negative-paths and integer boundary behavior.hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/catalog/TestHoodieHiveCatalog.java (1)
394-405: Nice negative-path test; consider asserting the shared constant directly.
Using HoodieFileFormat.LANCE_SPARK_ONLY_ERROR_MSG in the message assertion would avoid drift if wording changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/catalog/TestHoodieHiveCatalog.java` around lines 394 - 405, In testLanceFormatNotSupportedByHiveCatalog(), replace the hard-coded error string in the assertion with the shared constant HoodieFileFormat.LANCE_SPARK_ONLY_ERROR_MSG: change the containsString("Lance base file format is currently only supported with the Spark engine") check to containsString(HoodieFileFormat.LANCE_SPARK_ONLY_ERROR_MSG) (add or update the import for HoodieFileFormat if necessary) so the test asserts against the canonical constant instead of a literal.hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestHoodieDataSource.java (1)
3274-3283: Strengthen filtered LIMIT assertion to avoid false positives
Line 3276 only checks row count. If filtering regresses, this can still pass with an incorrect row. Consider asserting deterministic content for the par1 case as well.
♻️ Suggested test-hardening diff
```diff
-    List<Row> result1 = CollectionUtil.iterableToList(
-        () -> tableEnv.sqlQuery("select * from t1 where `partition` = 'par1' limit 1").execute().collect());
-    assertThat(result1).hasSize(1);
+    List<Row> result1 = CollectionUtil.iterableToList(
+        () -> tableEnv.sqlQuery("select * from t1 where `partition` = 'par1' order by uuid limit 1").execute().collect());
+    assertRowsEquals(result1, "[+I[id1, Danny, 23, 1970-01-01T00:00:01, par1]]");
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestHoodieDataSource.java` around lines 3274 - 3283, The test currently only asserts the row count for the par1 filter (result1) which can hide wrong rows; replace the size-only check with a deterministic content assertion using the existing helper assertRowsEquals to compare result1 against the expected single-row string for partition 'par1'. Locate the code that builds result1 (the tableEnv.sqlQuery("select * from t1 where `partition` = 'par1' limit 1").execute().collect()) in ITTestHoodieDataSource and change assertThat(result1).hasSize(1) to assertRowsEquals(result1, "<expected single-row string matching the known par1 record>") so the test verifies exact row content rather than just count.hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/metrics/TestFlinkBucketAssignMetrics.java (1)
63-90: Add coverage for clamped sampling rate inputs.
Please add a test for indexLookupSamplingRate <= 0 to lock in the constructor clamp behavior and prevent modulo-by-zero regressions.
✅ Suggested test addition
```diff
+  @Test
+  void testSamplingRateClampToOne() {
+    MetricGroup metricGroup = mock(MetricGroup.class);
+    FlinkBucketAssignMetrics metrics = new FlinkBucketAssignMetrics(metricGroup, 0);
+    metrics.registerMetrics();
+
+    assertTrue(metrics.shouldSampleLookup());
+    assertTrue(metrics.shouldSampleLookup());
+  }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/metrics/TestFlinkBucketAssignMetrics.java` around lines 63 - 90, Add a unit test that constructs FlinkBucketAssignMetrics with indexLookupSamplingRate <= 0 (e.g., 0 and negative) to assert the constructor clamps the value and does not cause modulo-by-zero in shouldSampleLookup; call registerMetrics(), invoke shouldSampleLookup multiple times to verify deterministic sampling behavior (no exception), and exercise recordLookupDuration/markInsert/markUpdate to validate gauges still update; reference the FlinkBucketAssignMetrics constructor and shouldSampleLookup method and reuse the existing MetricGroup/gauges mocking pattern from testLookupSamplingAndCounts to capture and assert gauge values.hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeTableServiceHandler.java (1)
32-49: Consider adding an abstract close() declaration or a default implementation.
The class implements Closeable but doesn't declare or implement close(). While subclasses do implement it correctly, adding an explicit declaration would make the contract clearer and provide compile-time enforcement:
♻️ Suggested improvement
```diff
 public abstract class CompositeTableServiceHandler<H> implements Closeable {
   protected final H dataTableHandler;
   protected final H metadataTableHandler;

   protected CompositeTableServiceHandler(H dataTableHandler, H metadataTableHandler) {
     this.dataTableHandler = dataTableHandler;
     this.metadataTableHandler = metadataTableHandler;
   }

   protected H getHandler(boolean isMetadataTable) {
     return isMetadataTable ? metadataTableHandler : dataTableHandler;
   }

   protected void forEachHandler(Consumer<H> consumer) {
     consumer.accept(dataTableHandler);
     consumer.accept(metadataTableHandler);
   }
+
+  @Override
+  public abstract void close();
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeTableServiceHandler.java` around lines 32 - 49, CompositeTableServiceHandler implements Closeable but doesn't declare close(); add an explicit contract by declaring public abstract void close() throws IOException in CompositeTableServiceHandler (or provide a default implementation that closes both dataTableHandler and metadataTableHandler if they are Closeable), so subclasses are forced to implement/adhere to the Closeable contract; update subclasses to `@Override` the close() accordingly.hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeCleanHandler.java (1)
45-48: Consider exception-safe close to ensure both handlers are closed.
If dataTableHandler.close() throws, metadataTableHandler.close() won't be called, potentially leaking resources.
♻️ Suggested fix for exception-safe close
```diff
   @Override
   public void close() {
-    forEachHandler(CleanHandler::close);
+    Exception firstException = null;
+    try {
+      dataTableHandler.close();
+    } catch (Exception e) {
+      firstException = e;
+    }
+    try {
+      metadataTableHandler.close();
+    } catch (Exception e) {
+      if (firstException != null) {
+        firstException.addSuppressed(e);
+      } else {
+        firstException = e;
+      }
+    }
+    if (firstException != null) {
+      throw new RuntimeException("Failed to close handlers", firstException);
+    }
   }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeCleanHandler.java` around lines 45 - 48, CompositeCleanHandler.close currently calls forEachHandler(CleanHandler::close) which will stop if the first handler throws and prevent the other (e.g., metadataTableHandler) from closing; change close in CompositeCleanHandler to ensure exception-safe close by invoking each handler close in its own try/catch, collecting any thrown exception (store first thrown as primary and add subsequent exceptions as suppressed) and rethrowing at the end if any occurred—this ensures both dataTableHandler and metadataTableHandler (and any handlers iterated by forEachHandler) are always closed even when one throws.hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeCompactionCommitHandler.java (1)
44-47: Same exception-safety concern in close() as noted for CompositeCleanHandler.
If the data table handler's close() throws, the metadata table handler won't be closed. Consider the same exception-safe pattern.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeCompactionCommitHandler.java` around lines 44 - 47, The CompositeCompactionCommitHandler.close() currently calls forEachHandler(CompactionCommitHandler::close) which can abort closing other handlers if one throws; modify close() to invoke close on both underlying handlers (data table and metadata table) in a fail-safe way: call each handler's close() in its own try/catch, capture the first thrown exception and add subsequent exceptions as suppressed (or if none thrown, do nothing), and rethrow the first exception after both close attempts; locate CompositeCompactionCommitHandler.close(), the forEachHandler utility, and the underlying CompactionCommitHandler implementations to implement this exception-safe closing behavior.hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DataTableCompactionCommitHandler.java (1)
97-101: Add null-safety or document the contract that registerMetrics() must be called before commitIfNecessary().
The compactionMetrics field is accessed in multiple places (lines 133, 143, 180, 220-221) without null checks. If commitIfNecessary() is called before registerMetrics(), an NPE will occur. While the current usage in CompactionCommitSink.open() ensures proper initialization order, consider defensive coding:
♻️ Option 1: Add null checks at usage sites
```diff
     } finally {
       // remove commitBuffer to avoid obsolete metadata commit
       reset(instant);
-      this.compactionMetrics.markCompactionRolledBack();
+      if (this.compactionMetrics != null) {
+        this.compactionMetrics.markCompactionRolledBack();
+      }
     }
```
♻️ Option 2: Document the contract via Javadoc on registerMetrics
```diff
   @Override
+  /**
+   * Registers compaction metrics with the given metric group.
+   * <p><b>Must be called before {@link #commitIfNecessary(CompactionCommitEvent)}.</b></p>
+   */
   public void registerMetrics(MetricGroup metricGroup) {
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DataTableCompactionCommitHandler.java` around lines 97 - 101, The compactionMetrics field in DataTableCompactionCommitHandler can be null if commitIfNecessary() (and other methods that use compactionMetrics) is called before registerMetrics(); add defensive checks and clear error messaging: in commitIfNecessary() (and any other methods that access compactionMetrics such as where compactionMetrics.* is invoked) verify compactionMetrics != null and if null throw an IllegalStateException with a message like "registerMetrics() must be called before commitIfNecessary()" (or alternately return/no-op where appropriate), and also add a short Javadoc to registerMetrics() stating the initialization contract so callers know to call it prior to commitIfNecessary().hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/compact/handler/TestCompositeHandlers.java (1)
96-143: Consider covering rollbackCompaction() and close() delegation too.
This suite validates collectCompactionOperations() and registerMetrics(), but the new composite plan handler also adds recovery/cleanup behavior. A small ordered verification for rollback/close would make regressions in the failure path much harder to miss.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/compact/handler/TestCompositeHandlers.java` around lines 96 - 143, Add ordered tests to verify that CompositeCompactionPlanHandler delegates rollbackCompaction(...) and close() to both underlying CompactionPlanHandler instances in the correct order: create mocks for dataHandler and metadataHandler, construct CompositeCompactionPlanHandler(dataHandler, metadataHandler), call handler.rollbackCompaction(someCheckpointId, output) and handler.close(), then use InOrder(inOrder = inOrder(dataHandler, metadataHandler)) to verify inOrder.verify(dataHandler).rollbackCompaction(same(checkpointId), same(output)); inOrder.verify(metadataHandler).rollbackCompaction(...); and similarly verify close() is called on dataHandler then metadataHandler, finally verifyNoMoreInteractions on both handlers. Ensure you reuse the same unique symbols from the diff: CompositeCompactionPlanHandler, CompactionPlanHandler.rollbackCompaction, and close.hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompactionPlanHandler.java (1)
38-45: Document or enforce the registerMetrics() lifecycle.
This abstraction makes metric wiring a separate mandatory call, but the concrete handlers dereference compactionMetrics during collectCompactionOperations(...) without a guard. A missed registration now fails as a late NPE. Please either state the ordering contract here or give implementations a no-op/fail-fast default.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompactionPlanHandler.java` around lines 38 - 45, The interface currently requires callers to call registerMetrics before collectCompactionOperations but this is undocumented and implementations dereference compactionMetrics causing NPEs; add a clear lifecycle contract Javadoc to CompactionPlanHandler.registerMetrics stating it must be invoked before collectCompactionOperations, and supply a safe default by giving registerMetrics a default no-op implementation in the interface (so implementations that don’t need metrics don’t NPE) and/or add a fail-fast null-check at the start of collectCompactionOperations (use Objects.requireNonNull on the compactionMetrics field in concrete handlers) so missing registration produces an immediate, clear error instead of a late NPE; update all concrete handlers that reference compactionMetrics to either rely on the default no-op or explicitly guard/require registration.
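One way to capture the lifecycle contract raised in the last item is a default no-op registerMetrics plus a fail-fast accessor in a shared base class. The snippet below is only a sketch under simplifying assumptions: the handler interface is trimmed down, and CompactionMetrics stands in for whatever metrics type the real handlers use, so names and signatures would need to be aligned with the actual CompactionPlanHandler.

```java
import java.util.Objects;

import org.apache.flink.metrics.MetricGroup;

// Illustrative only: a hypothetical metrics type and a reduced handler shape.
interface CompactionMetrics { }

interface CompactionPlanHandler {
  /**
   * Optional hook; implementations that report metrics must be given a metric group
   * before {@link #collectCompactionOperations()} is called. Defaults to a no-op.
   */
  default void registerMetrics(MetricGroup metricGroup) {
    // no-op by default
  }

  void collectCompactionOperations();
}

abstract class AbstractCompactionPlanHandler implements CompactionPlanHandler {
  private CompactionMetrics compactionMetrics;

  @Override
  public void registerMetrics(MetricGroup metricGroup) {
    this.compactionMetrics = createMetrics(metricGroup);
  }

  protected CompactionMetrics metrics() {
    // Fail fast with an explicit message instead of a late NPE during plan collection.
    return Objects.requireNonNull(compactionMetrics,
        "registerMetrics() must be called before collectCompactionOperations()");
  }

  protected abstract CompactionMetrics createMetrics(MetricGroup metricGroup);
}
```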
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackHelper.java`:
- Around line 800-801: The expected rollback log version is off-by-one: when
versions 1..ctx.logVersionCount are created, the rollback should be at
maxExistingVersion + 1. Update the StoragePath/filename construction used for
the expected rollback log in TestRollbackHelper (the line creating
rollbackLogPath using partitionStoragePath and FileCreateUtils.logFileName) to
use ctx.logVersionCount + 1 (instead of ctx.logVersionCount) when computing the
log version so the expected rollback file reflects version 5 for
logVersionCount==4.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkBucketAssignMetrics.java`:
- Around line 34-37: The primitive long fields lookupCount,
indexLookupDurationMs, insertCount, and updateCount in FlinkBucketAssignMetrics
are accessed from operator code and read by Gauge callbacks on Flink's metrics
thread; replace each with java.util.concurrent.atomic.AtomicLong, initialize
them in the class, update all mutations to use incrementAndGet/addAndGet/set as
appropriate, and update gauge suppliers (registered with the MetricGroup) to
call get() on the corresponding AtomicLong so reads are thread-safe and
race-free (a hedged sketch of this pattern follows this list).
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/common/AbstractStreamWriteFunction.java`:
- Around line 237-250: The code merges all WriteMetadataEvent entries from
writeMetadataState before checking pendingTimeline, which can allow
completed-instant statuses to be merged into a pending instant; change the
stream to filter entries against pendingTimeline first (e.g.,
stream(writeMetadataState.get().spliterator()).filter(e ->
pendingTimeline.containsInstant(e.getInstantTime()))), then reduce using
WriteMetadataEvent::mergeWithRescale, and keep the existing logic to
metadataEvent.setTaskID(taskID), sendWriteMetadataEvent(metadataEvent) and
log.info; this ensures only events for pending instants are merged and sent.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeCompactionPlanHandler.java`:
- Around line 37-54: forEachHandler currently stops on the first exception, so
modify it to invoke each handler invocation (used by rollbackCompaction, close,
registerMetrics, collectCompactionOperations) inside its own try-catch to ensure
both handlers always run; collect any thrown exceptions (first as primary) and
add subsequent ones via addSuppressed, then after iterating rethrow the primary
exception (or return normally if none). Update forEachHandler's signature/usage
accordingly so lifecycle methods (rollbackCompaction, close, etc.) will execute
both handlers unconditionally and propagate a combined exception if any handler
failed.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DataTableCompactHandler.java`:
- Around line 127-147: The compaction metric is started in doCompaction
(compactionMetrics.startCompaction()) but endCompaction() is only called on the
happy path; wrap the main body of doCompaction (including
CompactionUtil.setAvroSchema(...), compactor.compact(...), and
collector.collect(createCommitEvent(...))) in a try block and invoke
compactionMetrics.endCompaction() in a finally block so the metric is always
finished even if exceptions are thrown; rethrow the exception after finally to
preserve behavior.
- Around line 107-110: The code submits work via executor.execute(...) which
calls doCompaction(...) and the failure callback that both invoke
collector.collect(...) from a background thread; change this so
collector.collect(...) is only invoked on the operator task thread: move
compaction work to the executor but send results back to the operator thread
(e.g., post a Runnable to the MailboxExecutor or use Flink Async I/O via
AsyncDataStream.unorderedWait/orderedWait) instead of calling
collector.collect(...) directly in doCompaction or the executor failure
callback; update references around executor.execute, doCompaction, and the
failure lambda that currently calls createFailedCommitEvent and
collector.collect to instead queue the result to MailboxExecutor (or return a
Future for AsyncDataStream) so the actual collector.collect(...) runs on the
task thread.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DefaultCleanHandler.java`:
- Around line 53-61: Reserve the cleaning slot by setting isCleaning = true
before calling executor.execute so the slot is held while the task is queued;
specifically, in DefaultCleanHandler.clean() (and the analogous block around
lines 90-99/startAsyncCleaning()) move the isCleaning assignment to just before
executor.execute and guard against executor rejection by catching exceptions
from executor.execute and resetting isCleaning = false if dispatch fails,
ensuring the existing finally in the worker thread still clears isCleaning after
HoodieFlinkWriteClient.clean() completes.
- Around line 105-115: The close() method currently closes the executor then
immediately calls writeClient.clean(), which can race with async work started by
startAsyncCleaning() because that async path bypasses the executor; modify
close() to detect in-flight async cleaning and either await its completion via
writeClient.waitForCleaningFinish() before invoking writeClient.clean() or skip
the shutdown clean entirely when async cleaning is active. Specifically, in
DefaultCleanHandler.close() after executor.close() (and before
writeClient.clean()), call the writeClient API that indicates or waits for
ongoing cleaning (e.g., writeClient.waitForCleaningFinish()) and only call
writeClient.clean() if no async cleaning is in progress, or return/skip the
extra clean when waitForCleaningFinish() completes indicating the async cleanup
already handled it.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/TableServiceHandlerFactory.java`:
- Around line 36-79: The factory methods createCompactionPlanHandler,
createCompactHandler, and createCompactionCommitHandler currently fall back to
DataTable*Handler when both OptionsResolver.needsAsyncCompaction(conf) and
OptionsResolver.needsAsyncMetadataCompaction(conf) are false; change each method
to detect the "both disabled" case first and return a no-op handler instead of a
DataTable handler (e.g. return a new NoOpCompactionPlanHandler(),
NoOpCompactHandler(), NoOpCompactionCommitHandler() respectively) so callers
that use the handler unconditionally won't perform unintended work; add minimal
NoOp* handler implementations if they don't exist and wire them into the factory
before the existing composite/metadata/data branching logic.
In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/common/TestSqlConf.scala`:
- Around line 157-160: The teardown currently calls unsetEnv(...) then
DFSPropertiesConfiguration.clearGlobalProps(), but if unsetEnv throws the
clearGlobalProps() call can be skipped; wrap the calls in a try/finally inside
the after block so unsetEnv is attempted and
DFSPropertiesConfiguration.clearGlobalProps() is always executed in the finally
clause (referencing unsetEnv and DFSPropertiesConfiguration.clearGlobalProps to
locate the lines).
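For the FlinkBucketAssignMetrics item above, the AtomicLong-plus-gauge wiring it asks for looks roughly like the sketch below. This assumes Flink's MetricGroup and Gauge APIs in their usual form; the class name, metric names, and the reduced set of counters are illustrative rather than the actual Hudi class.

```java
import java.util.concurrent.atomic.AtomicLong;

import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.MetricGroup;

/** Counters mutated on the operator thread and read by the metrics reporter thread via gauges. */
class BucketAssignCounters {
  private final AtomicLong lookupCount = new AtomicLong();
  private final AtomicLong indexLookupDurationMs = new AtomicLong();
  private final AtomicLong insertCount = new AtomicLong();
  private final AtomicLong updateCount = new AtomicLong();

  void registerMetrics(MetricGroup metricGroup) {
    // Gauges only read the atomics, so the reporter thread never observes torn or stale-forever values.
    metricGroup.gauge("lookupCount", (Gauge<Long>) lookupCount::get);
    metricGroup.gauge("indexLookupDurationMs", (Gauge<Long>) indexLookupDurationMs::get);
    metricGroup.gauge("insertCount", (Gauge<Long>) insertCount::get);
    metricGroup.gauge("updateCount", (Gauge<Long>) updateCount::get);
  }

  void recordLookup(long durationMs) {
    lookupCount.incrementAndGet();
    indexLookupDurationMs.addAndGet(durationMs);
  }

  void markInsert() {
    insertCount.incrementAndGet();
  }

  void markUpdate() {
    updateCount.incrementAndGet();
  }
}
```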
---
Nitpick comments:
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkRocksDBIndexMetrics.java`:
- Around line 55-58: The FlinkRocksDBIndexMetrics constructor currently accepts
a possibly null rocksDBIndexBackend which can cause deferred NPEs; update the
FlinkRocksDBIndexMetrics(MetricGroup metricGroup, RocksDBIndexBackend
rocksDBIndexBackend) constructor to validate rocksDBIndexBackend is non-null and
throw a clear exception (e.g., IllegalArgumentException or NullPointerException)
with a descriptive message if it is null, then assign to
this.rocksDBIndexBackend only after the check.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompactionPlanHandler.java`:
- Around line 38-45: The interface currently requires callers to call
registerMetrics before collectCompactionOperations but this is undocumented and
implementations dereference compactionMetrics causing NPEs; add a clear
lifecycle contract Javadoc to CompactionPlanHandler.registerMetrics stating it
must be invoked before collectCompactionOperations, and supply a safe default by
giving registerMetrics a default no-op implementation in the interface (so
implementations that don’t need metrics don’t NPE) and/or add a fail-fast
null-check at the start of collectCompactionOperations (use
Objects.requireNonNull on the compactionMetrics field in concrete handlers) so
missing registration produces an immediate, clear error instead of a late NPE;
update all concrete handlers that reference compactionMetrics to either rely on
the default no-op or explicitly guard/require registration.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeCleanHandler.java`:
- Around line 45-48: CompositeCleanHandler.close currently calls
forEachHandler(CleanHandler::close) which will stop if the first handler throws
and prevent the other (e.g., metadataTableHandler) from closing; change close in
CompositeCleanHandler to ensure exception-safe close by invoking each handler
close in its own try/catch, collecting any thrown exception (store first thrown
as primary and add subsequent exceptions as suppressed) and rethrowing at the
end if any occurred—this ensures both dataTableHandler and metadataTableHandler
(and any handlers iterated by forEachHandler) are always closed even when one
throws.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeCompactionCommitHandler.java`:
- Around line 44-47: The CompositeCompactionCommitHandler.close() currently
calls forEachHandler(CompactionCommitHandler::close) which can abort closing
other handlers if one throws; modify close() to invoke close on both underlying
handlers (data table and metadata table) in a fail-safe way: call each handler's
close() in its own try/catch, capture the first thrown exception and add
subsequent exceptions as suppressed (or if none thrown, do nothing), and rethrow
the first exception after both close attempts; locate
CompositeCompactionCommitHandler.close(), the forEachHandler utility, and the
underlying CompactionCommitHandler implementations to implement this
exception-safe closing behavior.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeTableServiceHandler.java`:
- Around line 32-49: CompositeTableServiceHandler implements Closeable but
doesn't declare close(); add an explicit contract by declaring public abstract
void close() throws IOException in CompositeTableServiceHandler (or provide a
default implementation that closes both dataTableHandler and
metadataTableHandler if they are Closeable), so subclasses are forced to
implement/adhere to the Closeable contract; update subclasses to `@Override` the
close() accordingly.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DataTableCompactionCommitHandler.java`:
- Around line 97-101: The compactionMetrics field in
DataTableCompactionCommitHandler can be null if commitIfNecessary() (and other
methods that use compactionMetrics) is called before registerMetrics(); add
defensive checks and clear error messaging: in commitIfNecessary() (and any
other methods that access compactionMetrics such as where compactionMetrics.* is
invoked) verify compactionMetrics != null and if null throw an
IllegalStateException with a message like "registerMetrics() must be called
before commitIfNecessary()" (or alternately return/no-op where appropriate), and
also add a short Javadoc to registerMetrics() stating the initialization
contract so callers know to call it prior to commitIfNecessary().
In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/metrics/TestFlinkBucketAssignMetrics.java`:
- Around line 63-90: Add a unit test that constructs FlinkBucketAssignMetrics
with indexLookupSamplingRate <= 0 (e.g., 0 and negative) to assert the
constructor clamps the value and does not cause modulo-by-zero in
shouldSampleLookup; call registerMetrics(), invoke shouldSampleLookup multiple
times to verify deterministic sampling behavior (no exception), and exercise
recordLookupDuration/markInsert/markUpdate to validate gauges still update;
reference the FlinkBucketAssignMetrics constructor and shouldSampleLookup method
and reuse the existing MetricGroup/gauges mocking pattern from
testLookupSamplingAndCounts to capture and assert gauge values.
In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/compact/handler/TestCompositeHandlers.java`:
- Around line 96-143: Add ordered tests to verify that
CompositeCompactionPlanHandler delegates rollbackCompaction(...) and close() to
both underlying CompactionPlanHandler instances in the correct order: create
mocks for dataHandler and metadataHandler, construct
CompositeCompactionPlanHandler(dataHandler, metadataHandler), call
handler.rollbackCompaction(someCheckpointId, output) and handler.close(), then
use InOrder(inOrder = inOrder(dataHandler, metadataHandler)) to verify
inOrder.verify(dataHandler).rollbackCompaction(same(checkpointId),
same(output)); inOrder.verify(metadataHandler).rollbackCompaction(...); and
similarly verify close() is called on dataHandler then metadataHandler, finally
verifyNoMoreInteractions on both handlers. Ensure you reuse the same unique
symbols from the diff: CompositeCompactionPlanHandler,
CompactionPlanHandler.rollbackCompaction, and close.
In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/catalog/TestHoodieHiveCatalog.java`:
- Around line 394-405: In testLanceFormatNotSupportedByHiveCatalog(), replace
the hard-coded error string in the assertion with the shared constant
HoodieFileFormat.LANCE_SPARK_ONLY_ERROR_MSG: change the containsString("Lance
base file format is currently only supported with the Spark engine") check to
containsString(HoodieFileFormat.LANCE_SPARK_ONLY_ERROR_MSG) (add or update the
import for HoodieFileFormat if necessary) so the test asserts against the
canonical constant instead of a literal.
In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestHoodieDataSource.java`:
- Around line 3274-3283: The test currently only asserts the row count for the
par1 filter (result1) which can hide wrong rows; replace the size-only check
with a deterministic content assertion using the existing helper
assertRowsEquals to compare result1 against the expected single-row string for
partition 'par1'. Locate the code that builds result1 (the
tableEnv.sqlQuery("select * from t1 where `partition` = 'par1' limit
1").execute().collect()) in ITTestHoodieDataSource and change
assertThat(result1).hasSize(1) to assertRowsEquals(result1, "<expected
single-row string matching the known par1 record>") so the test verifies exact
row content rather than just count.
In `@hudi-io/src/test/java/org/apache/hudi/io/util/TestIOUtils.java`:
- Around line 98-100: The test's encode/decode vector in TestIOUtils misses
negative cases and the sign-handling branch of writeVarInt; update the
testValues array used in the loop (the array defined in TestIOUtils) to include
negative and boundary integers (e.g., -1, -128, -129, -1000, Integer.MIN_VALUE)
so the round-trip via writeVarInt/readVarInt (or the corresponding encode/decode
methods) exercises the negative-paths and integer boundary behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ae81fb9a-9969-4fe8-a551-e89efe5eb175
📒 Files selected for processing (73)
hudi-cli/src/main/java/org/apache/hudi/cli/utils/InputStreamConsumer.javahudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackActionExecutor.javahudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/MarkerBasedRollbackStrategy.javahudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelper.javahudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelperFactory.javahudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelperV1.javahudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/ZeroToOneUpgradeHandler.javahudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackHelper.javahudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestMarkerBasedRollbackStrategy.javahudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileFormat.javahudi-common/src/main/java/org/apache/hudi/common/table/timeline/versioning/v1/ArchivedTimelineV1.javahudi-common/src/main/java/org/apache/hudi/common/util/collection/RocksDBDAO.javahudi-common/src/main/java/org/apache/hudi/io/storage/HoodieFileReaderFactory.javahudi-common/src/main/java/org/apache/hudi/io/storage/HoodieFileWriterFactory.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkBucketAssignMetrics.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkMdtCompactionMetrics.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkRocksDBIndexMetrics.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/CleanFunction.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/common/AbstractStreamWriteFunction.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactOperator.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactionCommitSink.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactionPlanOperator.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CleanHandler.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompactHandler.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompactionCommitHandler.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompactionPlanHandler.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeCleanHandler.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeCompactHandler.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeCompactionCommitHandler.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeCompactionPlanHandler.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeTableServiceHandler.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DataTableCompactHandler.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DataTableCompactionCommitHandler.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/com
pact/handler/DataTableCompactionPlanHandler.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DefaultCleanHandler.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/MetadataTableCompactHandler.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/MetadataTableCompactionCommitHandler.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/MetadataTableCompactionPlanHandler.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/TableServiceHandlerFactory.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/event/WriteMetadataEvent.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/BucketAssignFunction.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/index/IndexBackend.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/index/RocksDBIndexBackend.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.javahudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/FlinkRowDataReaderContext.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/metrics/TestFlinkBucketAssignMetrics.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/metrics/TestFlinkCompactionMetrics.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/compact/handler/TestCompositeHandlers.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/event/TestWriteMetadataEvent.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/partitioner/index/TestRocksDBIndexBackend.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestHoodieDataSource.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/TestHoodieTableFactory.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/catalog/TestHoodieHiveCatalog.javahudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/format/TestFlinkRowDataReaderContext.javahudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySchemaResolver.javahudi-gcp/src/test/java/org/apache/hudi/gcp/bigquery/TestBigQuerySchemaResolver.javahudi-hadoop-common/src/test/java/org/apache/hudi/common/table/timeline/TestArchivedTimelineV1.javahudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HiveHoodieReaderContext.javahudi-io/src/main/java/org/apache/hudi/io/hfile/HFileMetaIndexBlock.javahudi-io/src/main/java/org/apache/hudi/io/hfile/HFileRootIndexBlock.javahudi-io/src/main/java/org/apache/hudi/io/util/IOUtils.javahudi-io/src/test/java/org/apache/hudi/io/hfile/TestHFileWriter.javahudi-io/src/test/java/org/apache/hudi/io/util/TestIOUtils.javahudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestColumnStatsIndex.scalahudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestPartitionStatsPruning.scalahudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/common/TestSqlConf.scalahudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.javahudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestSparkSchemaUtils.javahudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/util/TestHiveSchemaUtil.javahudi-sync/hudi-sync-common/src/main/java/
org/apache/hudi/sync/common/util/SparkSchemaUtils.javapom.xml
✅ Files skipped from review due to trivial changes (3)
- hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompactionCommitHandler.java
- hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/metrics/TestFlinkCompactionMetrics.java
- hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/event/TestWriteMetadataEvent.java
🚧 Files skipped from review as they are similar to previous changes (7)
- hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/index/IndexBackend.java
- hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
- hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/partitioner/index/TestRocksDBIndexBackend.java
- hudi-common/src/main/java/org/apache/hudi/common/util/collection/RocksDBDAO.java
- hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/index/RocksDBIndexBackend.java
- hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/TestHoodieTableFactory.java
- hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java
```java
StoragePath rollbackLogPath = new StoragePath(partitionStoragePath,
    FileCreateUtils.logFileName(ctx.baseInstantTimeOfLogFiles, ctx.logFileId, ctx.logVersionCount));
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the rollback log version behavior by checking RollbackHelperV1 implementation
ast-grep --pattern $'class RollbackHelperV1 {
$$$
}'
# Also check the log file version computation logic
rg -n "logVersion|LOGFILE_BASE_VERSION|nextVersion" --type java -C3 -g '*RollbackHelper*'Repository: yihua/hudi
Length of output: 23240
🏁 Script executed:
# Get the full test method around line 800-801 to see what assertion is made
sed -n '750,820p' hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackHelper.java
Repository: yihua/hudi
Length of output: 3349
🏁 Script executed:
# Check the test at line 737 (testV1MaybeDeleteAndCollectStatsWithSingleRequestPerFileGroup)
sed -n '720,750p' hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackHelper.java
Repository: yihua/hudi
Length of output: 1705
🏁 Script executed:
# Check the test at line 702 (testV1MaybeDeleteAndCollectStatsWithMultipleRequestsPerFileGroup)
sed -n '685,715p' hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackHelper.java
Repository: yihua/hudi
Length of output: 1899
🏁 Script executed:
# Find the ROLLBACK_LOG_VERSION constant definition
rg -n "ROLLBACK_LOG_VERSION" hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackHelper.javaRepository: yihua/hudi
Length of output: 500
🏁 Script executed:
# Check the SingleRequestTestContext and MultiRequestTestContext to see what log files they create
sed -n '400,470p' hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackHelper.java
Repository: yihua/hudi
Length of output: 3798
🏁 Script executed:
# Find SingleRequestTestContext and MultiRequestTestContext builders
sed -n '80,200p' hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackHelper.javaRepository: yihua/hudi
Length of output: 6332
🏁 Script executed:
# Get the full assertion at lines 802-811
sed -n '802,811p' hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackHelper.javaRepository: yihua/hudi
Length of output: 554
🏁 Script executed:
# Let's also check if there's any actual file system check on the rollbackLogPath
sed -n '788,815p' hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackHelper.javaRepository: yihua/hudi
Length of output: 1474
🏁 Script executed:
# Check if the test verifies the rollback log exists and what version it expects
rg -n "storage.exists.*rollbackLogPath\|storage.getPathInfo.*rollbackLogPath" hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackHelper.java -A2 -B2Repository: yihua/hudi
Length of output: 36
🏁 Script executed:
# Let's understand what maybeDeleteAndCollectStats actually does - check if it creates the rollback log
rg -n "getOutputStream\|nextVersion\|rolloverVersion" hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelperV1.java -A2 -B2Repository: yihua/hudi
Length of output: 36
🏁 Script executed:
# Search for where rollback logs are written in RollbackHelperV1
rg -n "getOutputStream\|HoodieLogFile\|logFileName" hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelperV1.java -A1 -B1 | head -80Repository: yihua/hudi
Length of output: 36
🏁 Script executed:
# Check what storage.getPathInfo does and if it requires the file to exist
rg -n "getPathInfo" hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackHelper.java -B2 -A2Repository: yihua/hudi
Length of output: 1405
🏁 Script executed:
# Search for HoodieLogFile instantiation and where logs are written
rg -n "new HoodieLogFile\|\.write\|getOutputStream" hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/ -A2 -B2 | head -100Repository: yihua/hudi
Length of output: 36
🏁 Script executed:
# Check the HoodieLogFormat or HoodieLogFile class for how it determines next version
fd "HoodieLogFile.java" hudi-client/hudi-client-common/src/main/java/ -exec cat {} \; | head -150Repository: yihua/hudi
Length of output: 200
🏁 Script executed:
# Find HoodieLogFile.java
fd "HoodieLogFile.java" hudi-client/hudi-client-common/src/main/java/Repository: yihua/hudi
Length of output: 36
🏁 Script executed:
# Look for where rollback logs are actually created and at what version
rg -n "HoodieLogFile\.Builder\|withLogVersion\|logVersion.*=" hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelperV1.java -A2 -B2Repository: yihua/hudi
Length of output: 36
🏁 Script executed:
# Look at how the rollback log version is computed
sed -n '290,350p' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelperV1.javaRepository: yihua/hudi
Length of output: 4147
🏁 Script executed:
# Check if there's a comment or logic about version rollover
rg -n "rollover\|nextVersion\|logVersion\s*=" hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelperV1.java -B3 -A3 | head -60Repository: yihua/hudi
Length of output: 36
🏁 Script executed:
# Check what getPathInfo does - does it require the file to exist?
rg -n "getPathInfo" hudi-client/hudi-client-common/src/main/java/org/apache/hudi/storage/ -A3 | head -50Repository: yihua/hudi
Length of output: 163
🏁 Script executed:
# Let's trace through the actual test to understand what files should exist after rollback
# Check if there are any assertions that verify the file exists at line 801
sed -n '794,815p' hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackHelper.javaRepository: yihua/hudi
Length of output: 1096
🏁 Script executed:
# Look at lines around 135-136 where max version is stored
sed -n '130,150p' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelperV1.javaRepository: yihua/hudi
Length of output: 1156
🏁 Script executed:
# Check the comment about rollover again and surrounding context
sed -n '325,340p' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelperV1.javaRepository: yihua/hudi
Length of output: 991
🏁 Script executed:
# Search for getPathInfo implementation to see if it requires the file to exist
fd -type f -name "*.java" hudi-client/hudi-client-common/src/main/java | xargs rg -l "getPathInfo" | head -5Repository: yihua/hudi
Length of output: 784
🏁 Script executed:
# Check the test assertion to see if it validates file existence
rg -n "assertRollbackStatsEquals" hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackHelper.java -A10 | head -30Repository: yihua/hudi
Length of output: 1515
🏁 Script executed:
# Let me verify the exact pattern. In the MultiRequestTestContext:
# logFileId1: IntStream.of(1) creates version 1
# Expected rollback at line 704: version 2
# That's 1 file at version 1 → rollback at version 2 (1+1)
# In SingleRequestTestContext:
# IntStream.range(1, ROLLBACK_LOG_VERSION) where ROLLBACK_LOG_VERSION=20
# That's versions 1-19
# Expected rollback at line 737: version 20 (19+1)
# In V1LogBlockTestContext:
# IntStream.range(1, ctx.logVersionCount + 1) where logVersionCount=4
# That's versions 1-4
# Expected rollback at line 801: version 4 (which should be 4+1=5)
# Confirm ROLLBACK_LOG_VERSION value
rg -n "ROLLBACK_LOG_VERSION\s*=" hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackHelper.javaRepository: yihua/hudi
Length of output: 110
🏁 Script executed:
# Verify the exact log versions created in MultiRequestTestContext
sed -n '118,122p' hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackHelper.javaRepository: yihua/hudi
Length of output: 511
Fix off-by-one error in expected rollback log version.
Versions 1–4 are created via IntStream.range(1, ctx.logVersionCount + 1). Following the pattern established elsewhere in the file, where the rollback log is written to maxExistingVersion + 1:
- Line 737: versions 1–19 exist → rollback at version 20
- Line 704: version 1 exists → rollback at version 2
The expected rollback log here should be at version 5, not version 4.
Proposed fix
StoragePath rollbackLogPath = new StoragePath(partitionStoragePath,
- FileCreateUtils.logFileName(ctx.baseInstantTimeOfLogFiles, ctx.logFileId, ctx.logVersionCount));
+ FileCreateUtils.logFileName(ctx.baseInstantTimeOfLogFiles, ctx.logFileId, ctx.logVersionCount + 1));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackHelper.java`
around lines 800 - 801, The expected rollback log version is off-by-one: when
versions 1..ctx.logVersionCount are created, the rollback should be at
maxExistingVersion + 1. Update the StoragePath/filename construction used for
the expected rollback log in TestRollbackHelper (the line creating
rollbackLogPath using partitionStoragePath and FileCreateUtils.logFileName) to
use ctx.logVersionCount + 1 (instead of ctx.logVersionCount) when computing the
log version so the expected rollback file reflects version 5 for
logVersionCount==4.
private long lookupCount;
private long indexLookupDurationMs;
private long insertCount;
private long updateCount;
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Read-only verification: confirm current implementation uses unsynchronized mutable longs
# and where they are read/written.
rg -n "private long (lookupCount|indexLookupDurationMs|insertCount|updateCount)|gauge\\(|\\+= 1|indexLookupDurationMs\\s*=" \
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkBucketAssignMetrics.javaRepository: yihua/hudi
Length of output: 557
🏁 Script executed:
cat -n hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkBucketAssignMetrics.javaRepository: yihua/hudi
Length of output: 2763
🏁 Script executed:
fd -type f -name "HoodieFlinkMetrics.java" | head -5Repository: yihua/hudi
Length of output: 224
🏁 Script executed:
fd --type f -name "HoodieFlinkMetrics.java"Repository: yihua/hudi
Length of output: 284
🏁 Script executed:
find . -name "HoodieFlinkMetrics.java"Repository: yihua/hudi
Length of output: 150
🏁 Script executed:
cat -n ./hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/HoodieFlinkMetrics.javaRepository: yihua/hudi
Length of output: 2337
🏁 Script executed:
rg "FlinkBucketAssignMetrics" --type java -B 3 -A 3Repository: yihua/hudi
Length of output: 11020
🏁 Script executed:
rg "AtomicLong|volatile long" --type java hudi-flink-datasource/ | head -20Repository: yihua/hudi
Length of output: 1171
🏁 Script executed:
cat -n hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/RecordLimiter.java | head -60Repository: yihua/hudi
Length of output: 2709
Make gauge backing fields thread-safe.
These counters and duration fields are mutated by operator methods and read by Gauge callbacks registered with Flink's MetricGroup; those callbacks run on a separate metrics reporting thread. That is a data race and can yield torn or stale metric values. Switch the fields to AtomicLong for thread-safe access.
🔧 Suggested fix
import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.MetricGroup;
+import java.util.concurrent.atomic.AtomicLong;
@@
- private long lookupCount;
- private long indexLookupDurationMs;
- private long insertCount;
- private long updateCount;
+ private final AtomicLong lookupCount = new AtomicLong();
+ private final AtomicLong indexLookupDurationMs = new AtomicLong();
+ private final AtomicLong insertCount = new AtomicLong();
+ private final AtomicLong updateCount = new AtomicLong();
@@
- metricGroup.gauge(INDEX_LOOKUP_DURATION_MS, (Gauge<Long>) () -> indexLookupDurationMs);
- metricGroup.gauge(INSERT_COUNT, (Gauge<Long>) () -> insertCount);
- metricGroup.gauge(UPDATE_COUNT, (Gauge<Long>) () -> updateCount);
+ metricGroup.gauge(INDEX_LOOKUP_DURATION_MS, (Gauge<Long>) indexLookupDurationMs::get);
+ metricGroup.gauge(INSERT_COUNT, (Gauge<Long>) insertCount::get);
+ metricGroup.gauge(UPDATE_COUNT, (Gauge<Long>) updateCount::get);
@@
- lookupCount += 1;
- return lookupCount % indexLookupSamplingRate == 0;
+ return lookupCount.incrementAndGet() % indexLookupSamplingRate == 0;
@@
- this.indexLookupDurationMs = lookupDurationMs;
+ this.indexLookupDurationMs.set(lookupDurationMs);
@@
- insertCount += 1;
+ insertCount.incrementAndGet();
@@
- updateCount += 1;
+ updateCount.incrementAndGet();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkBucketAssignMetrics.java`
around lines 34 - 37, The primitive long fields lookupCount,
indexLookupDurationMs, insertCount, and updateCount in FlinkBucketAssignMetrics
are accessed from operator code and read by Gauge callbacks on Flink's metrics
thread; replace each with java.util.concurrent.atomic.AtomicLong, initialize
them in the class, update all mutations to use incrementAndGet/addAndGet/set as
appropriate, and update gauge suppliers (registered with the MetricGroup) to
call get() on the corresponding AtomicLong so reads are thread-safe and
race-free.
StreamSupport.stream(this.writeMetadataState.get().spliterator(), false)
    .reduce(WriteMetadataEvent::mergeWithRescale)
    .ifPresent(metadataEvent -> {
      // Must filter out the completed instants in case it is a partial failover,
      // the write status should not be accumulated in such case.
      if (pendingTimeline.containsInstant(metadataEvent.getInstantTime())) {
        // Reset taskID for event
        metadataEvent.setTaskID(taskID);
        // The checkpoint succeed but the meta does not commit,
        // re-commit the inflight instant
        sendWriteMetadataEvent(metadataEvent);
        log.info("Send uncommitted write metadata event to coordinator, task[{}].", taskID);
      }
    });
Filter pending instants before merge to avoid cross-instant status accumulation.
At lines 237-239 the events are merged first; the pending-instant filtering at lines 240-243 only happens afterwards, so completed-instant writeStatuses can be accumulated into a pending instant during restore/rescale.
🐛 Proposed fix
+import java.util.stream.Collectors;
import java.util.stream.StreamSupport;
@@
- StreamSupport.stream(this.writeMetadataState.get().spliterator(), false)
- .reduce(WriteMetadataEvent::mergeWithRescale)
- .ifPresent(metadataEvent -> {
- // Must filter out the completed instants in case it is a partial failover,
- // the write status should not be accumulated in such case.
- if (pendingTimeline.containsInstant(metadataEvent.getInstantTime())) {
- // Reset taskID for event
- metadataEvent.setTaskID(taskID);
- // The checkpoint succeed but the meta does not commit,
- // re-commit the inflight instant
- sendWriteMetadataEvent(metadataEvent);
- log.info("Send uncommitted write metadata event to coordinator, task[{}].", taskID);
- }
- });
+ StreamSupport.stream(this.writeMetadataState.get().spliterator(), false)
+ .filter(event -> pendingTimeline.containsInstant(event.getInstantTime()))
+ .collect(Collectors.groupingBy(WriteMetadataEvent::getInstantTime))
+ .values()
+ .forEach(events -> events.stream()
+ .reduce(WriteMetadataEvent::mergeWithRescale)
+ .ifPresent(metadataEvent -> {
+ // Reset taskID for event
+ metadataEvent.setTaskID(taskID);
+ // The checkpoint succeed but the meta does not commit,
+ // re-commit the inflight instant
+ sendWriteMetadataEvent(metadataEvent);
+ log.info("Send uncommitted write metadata event to coordinator, task[{}].", taskID);
+ }));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/common/AbstractStreamWriteFunction.java`
around lines 237 - 250, The code merges all WriteMetadataEvent entries from
writeMetadataState before checking pendingTimeline, which can allow
completed-instant statuses to be merged into a pending instant; change the
stream to filter entries against pendingTimeline first (e.g.,
stream(writeMetadataState.get().spliterator()).filter(e ->
pendingTimeline.containsInstant(e.getInstantTime()))), then reduce using
WriteMetadataEvent::mergeWithRescale, and keep the existing logic to
metadataEvent.setTaskID(taskID), sendWriteMetadataEvent(metadataEvent) and
log.info; this ensures only events for pending instants are merged and sent.
public void rollbackCompaction() {
  forEachHandler(CompactionPlanHandler::rollbackCompaction);
}

@Override
public void registerMetrics(MetricGroup metricGroup) {
  forEachHandler(compactionPlanHandler -> compactionPlanHandler.registerMetrics(metricGroup));
}

@Override
public void collectCompactionOperations(long checkpointId,
                                        Output<StreamRecord<CompactionPlanEvent>> output) {
  forEachHandler(compactionPlanHandler -> compactionPlanHandler.collectCompactionOperations(checkpointId, output));
}

@Override
public void close() {
  forEachHandler(CompactionPlanHandler::close);
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE=$(fd 'CompositeTableServiceHandler.java' hudi-flink-datasource/hudi-flink/src/main/java | head -n1)
sed -n '1,220p' "$FILE"
echo
echo "---- usages ----"
rg -n -C2 'forEachHandler\s*\(' hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handlerRepository: yihua/hudi
Length of output: 11386
🏁 Script executed:
rg -n 'forEachHandler' hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler --type java -A5 -B5 | head -100Repository: yihua/hudi
Length of output: 13741
🏁 Script executed:
# Check if there are any try-catch blocks wrapping the handler invocations
rg -n 'try\s*\{' hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeCompactionPlanHandler.java -A10Repository: yihua/hudi
Length of output: 36
Add exception handling to forEachHandler() to ensure both handlers execute in lifecycle methods.
forEachHandler() stops at the first exception, so the metadata handler never runs if the data handler fails. For critical operations like rollbackCompaction() and close(), that can leave the second handler's write client or the metadata table in an inconsistent state. Wrap each handler invocation inside forEachHandler() in its own try-catch so both handlers always run, then rethrow the first failure with any later ones attached as suppressed exceptions; a sketch follows below.
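A minimal sketch of that shape (assuming the composite keeps its delegates in a handlers list and that forEachHandler takes a Consumer of CompactionPlanHandler; neither detail is confirmed by the snippet above):

private void forEachHandler(java.util.function.Consumer<CompactionPlanHandler> action) {
  RuntimeException failure = null;
  for (CompactionPlanHandler handler : handlers) {
    try {
      action.accept(handler);
    } catch (RuntimeException e) {
      // keep iterating so the other handler still runs; remember the first failure
      if (failure == null) {
        failure = e;
      } else {
        failure.addSuppressed(e);
      }
    }
  }
  if (failure != null) {
    throw failure;
  }
}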
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeCompactionPlanHandler.java`
around lines 37 - 54, forEachHandler currently stops on the first exception, so
modify it to invoke each handler invocation (used by rollbackCompaction, close,
registerMetrics, collectCompactionOperations) inside its own try-catch to ensure
both handlers always run; collect any thrown exceptions (first as primary) and
add subsequent ones via addSuppressed, then after iterating rethrow the primary
exception (or return normally if none). Update forEachHandler's signature/usage
accordingly so lifecycle methods (rollbackCompaction, close, etc.) will execute
both handlers unconditionally and propagate a combined exception if any handler
failed.
if (executor != null) {
  executor.execute(
      () -> doCompaction(event, collector, needReloadMetaClient),
      (errMsg, t) -> collector.collect(createFailedCommitEvent(event)),
🧩 Analysis chain
🌐 Web query:
In Apache Flink, is org.apache.flink.util.Collector safe to call from a background thread outside the operator or mailbox thread, and what is the recommended pattern for emitting results from asynchronous operator work?
💡 Result:
No, org.apache.flink.util.Collector is not safe to call directly from a background thread outside the operator or mailbox thread. Flink enforces a single-threaded execution model for user code (processElement, callbacks, onTimer, etc.) within the task thread, as per FLIP-425 and async state documentation. Direct calls from external threads risk concurrency issues, data races, or violations of this model. The recommended pattern for emitting results from asynchronous operator work is to use Flink's Async I/O API (AsyncDataStream.unorderedWait or orderedWait). Implement an AsyncFunction where asyncInvoke returns quickly and completes a provided ResultFuture.complete(collectionOfResults) from the async callback. The operator handles buffering, ordering (if ordered), timeouts, fault tolerance, and emission via its internal emitter on the task thread. This ensures thread-safety and exactly-once guarantees. For State V2 async access, callbacks like StateFuture.thenAccept execute on the single task thread, allowing safe collector use there, but not from arbitrary background threads.
Citations:
- 1: https://stackoverflow.com/questions/79790804/are-flinks-timer-service-collector-thread-safe
- 2: https://nightlies.apache.org/flink/flink-docs-stable/api/java/org/apache/flink/util/Collector.html
- 3: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/
- 4: https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/operators/async/AsyncWaitOperator.html
- 5: https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/datastream/fault-tolerance/state_v2/
- 6: https://nightlies.apache.org/flink/flink-docs-stable/api/java/org/apache/flink/streaming/api/functions/async/AsyncFunction.html
- 7: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state_v2/
- 8: https://nightlies.apache.org/flink/flink-docs-stable/api/java/org/apache/flink/streaming/api/functions/async/ResultFuture.html
- 9: https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/common/operators/MailboxExecutor.html
- 10: https://issues.apache.org/jira/browse/FLINK-37642
🏁 Script executed:
find . -type f -name "DataTableCompactHandler.java" | head -5Repository: yihua/hudi
Length of output: 168
🏁 Script executed:
find . -type f -name "DataTableCompactHandler.java" -exec wc -l {} \;Repository: yihua/hudi
Length of output: 172
🏁 Script executed:
fd DataTableCompactHandler.java -x wc -lRepository: yihua/hudi
Length of output: 172
🏁 Script executed:
fd DataTableCompactHandler.javaRepository: yihua/hudi
Length of output: 166
🏁 Script executed:
git ls-files | grep -i datatablecompacthandlerRepository: yihua/hudi
Length of output: 283
🏁 Script executed:
cat -n hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DataTableCompactHandler.javaRepository: yihua/hudi
Length of output: 10329
🏁 Script executed:
sed -n '100,160p' hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DataTableCompactHandler.javaRepository: yihua/hudi
Length of output: 2790
🏁 Script executed:
find . -type f -name "NonThrownExecutor.java"Repository: yihua/hudi
Length of output: 152
🏁 Script executed:
cat -n ./hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/NonThrownExecutor.javaRepository: yihua/hudi
Length of output: 7888
Fix the threading violation: collector.collect() must not be called from the executor thread.
The executor.execute() calls at lines 107-110 submit work to a background thread (NonThrownExecutor uses a single-thread executor pool). Both the success path in doCompaction() (line 147) and the failure callback (line 110) invoke collector.collect() from this background thread. Flink's Collector is not thread-safe and must only be called from the operator's task thread. Calling it from background threads violates Flink's single-threaded execution model and risks data races with checkpointing and output emission.
Use Flink's Async I/O API (AsyncDataStream.unorderedWait/orderedWait) or emit results back to the task thread via MailboxExecutor instead.
Also applies to: 127-147
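For reference, the standard Async I/O shape would look roughly like the sketch below. It is not wired into the existing handler classes; runCompaction and failedEvent are placeholders standing in for the current doCompaction(...) and createFailedCommitEvent(...) logic, and their signatures are assumed.

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class AsyncCompactFunction extends RichAsyncFunction<CompactionPlanEvent, CompactionCommitEvent> {

  @Override
  public void asyncInvoke(CompactionPlanEvent event, ResultFuture<CompactionCommitEvent> resultFuture) {
    CompletableFuture
        .supplyAsync(() -> runCompaction(event))          // heavy compaction work off the task thread
        .whenComplete((commitEvent, error) -> {
          // completing the ResultFuture hands the record back to the async-wait operator,
          // which emits it on the task thread
          if (error != null) {
            resultFuture.complete(Collections.singletonList(failedEvent(event)));
          } else {
            resultFuture.complete(Collections.singletonList(commitEvent));
          }
        });
  }

  private CompactionCommitEvent runCompaction(CompactionPlanEvent event) {
    throw new UnsupportedOperationException("wire this to the existing doCompaction(...) logic");
  }

  private CompactionCommitEvent failedEvent(CompactionPlanEvent event) {
    throw new UnsupportedOperationException("wire this to createFailedCommitEvent(...)");
  }
}

Such a function would be attached with AsyncDataStream.unorderedWait(...) instead of a plain process operator, so ordering, timeouts, and emission on the task thread come from the framework.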
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DataTableCompactHandler.java`
around lines 107 - 110, The code submits work via executor.execute(...) which
calls doCompaction(...) and the failure callback that both invoke
collector.collect(...) from a background thread; change this so
collector.collect(...) is only invoked on the operator task thread: move
compaction work to the executor but send results back to the operator thread
(e.g., post a Runnable to the MailboxExecutor or use Flink Async I/O via
AsyncDataStream.unorderedWait/orderedWait) instead of calling
collector.collect(...) directly in doCompaction or the executor failure
callback; update references around executor.execute, doCompaction, and the
failure lambda that currently calls createFailedCommitEvent and
collector.collect to instead queue the result to MailboxExecutor (or return a
Future for AsyncDataStream) so the actual collector.collect(...) runs on the
task thread.
protected void doCompaction(CompactionPlanEvent event,
                            Collector<CompactionCommitEvent> collector,
                            boolean needReloadMetaClient) throws Exception {
  compactionMetrics.startCompaction();
  HoodieFlinkMergeOnReadTableCompactor<?> compactor = new HoodieFlinkMergeOnReadTableCompactor<>();
  HoodieTableMetaClient metaClient = table.getMetaClient();
  if (needReloadMetaClient) {
    // reload the timeline
    metaClient.reload();
  }
  // schema evolution
  CompactionUtil.setAvroSchema(writeClient.getConfig(), metaClient);
  List<WriteStatus> writeStatuses = compactor.compact(
      writeClient.getConfig(),
      event.getOperation(),
      event.getCompactionInstantTime(),
      table.getTaskContextSupplier(),
      createReaderContext(needReloadMetaClient),
      table);
  compactionMetrics.endCompaction();
  collector.collect(createCommitEvent(event, writeStatuses));
Always finish the compaction metric in a finally block.
startCompaction() runs for every attempt, but endCompaction() only runs after the happy path completes. Any exception from schema setup, the compactor, or even collector.collect(...) leaves the metric stuck in-progress and will skew the new compaction metrics after the first failure.
Proposed fix
protected void doCompaction(CompactionPlanEvent event,
Collector<CompactionCommitEvent> collector,
boolean needReloadMetaClient) throws Exception {
compactionMetrics.startCompaction();
- HoodieFlinkMergeOnReadTableCompactor<?> compactor = new HoodieFlinkMergeOnReadTableCompactor<>();
- HoodieTableMetaClient metaClient = table.getMetaClient();
- if (needReloadMetaClient) {
- // reload the timeline
- metaClient.reload();
- }
- // schema evolution
- CompactionUtil.setAvroSchema(writeClient.getConfig(), metaClient);
- List<WriteStatus> writeStatuses = compactor.compact(
- writeClient.getConfig(),
- event.getOperation(),
- event.getCompactionInstantTime(),
- table.getTaskContextSupplier(),
- createReaderContext(needReloadMetaClient),
- table);
- compactionMetrics.endCompaction();
- collector.collect(createCommitEvent(event, writeStatuses));
+ try {
+ HoodieFlinkMergeOnReadTableCompactor<?> compactor = new HoodieFlinkMergeOnReadTableCompactor<>();
+ HoodieTableMetaClient metaClient = table.getMetaClient();
+ if (needReloadMetaClient) {
+ metaClient.reload();
+ }
+ CompactionUtil.setAvroSchema(writeClient.getConfig(), metaClient);
+ List<WriteStatus> writeStatuses = compactor.compact(
+ writeClient.getConfig(),
+ event.getOperation(),
+ event.getCompactionInstantTime(),
+ table.getTaskContextSupplier(),
+ createReaderContext(needReloadMetaClient),
+ table);
+ collector.collect(createCommitEvent(event, writeStatuses));
+ } finally {
+ compactionMetrics.endCompaction();
+ }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DataTableCompactHandler.java`
around lines 127 - 147, The compaction metric is started in doCompaction
(compactionMetrics.startCompaction()) but endCompaction() is only called on the
happy path; wrap the main body of doCompaction (including
CompactionUtil.setAvroSchema(...), compactor.compact(...), and
collector.collect(createCommitEvent(...))) in a try block and invoke
compactionMetrics.endCompaction() in a finally block so the metric is always
finished even if exceptions are thrown; rethrow the exception after finally to
preserve behavior.
public void clean() {
  executor.execute(() -> {
    this.isCleaning = true;
    try {
      this.writeClient.clean();
    } finally {
      this.isCleaning = false;
    }
  }, "wait for cleaning finish");
Reserve the cleaning slot before dispatching work.
clean() only flips isCleaning after the executor thread starts running, so the first checkpoint can still enter startAsyncCleaning() and kick off a second clean while the initial one is merely queued. That makes overlapping clean operations possible against the same HoodieFlinkWriteClient.
Suggested fix
+import java.util.concurrent.atomic.AtomicBoolean;
+
public class DefaultCleanHandler implements CleanHandler {
private final HoodieFlinkWriteClient writeClient;
- private volatile boolean isCleaning;
+ private final AtomicBoolean isCleaning = new AtomicBoolean(false);
private final NonThrownExecutor executor;
@@
public void clean() {
+ if (!isCleaning.compareAndSet(false, true)) {
+ return;
+ }
executor.execute(() -> {
- this.isCleaning = true;
try {
this.writeClient.clean();
} finally {
- this.isCleaning = false;
+ isCleaning.set(false);
}
- }, "wait for cleaning finish");
+ }, "clean");
}
@@
public void startAsyncCleaning() {
- if (!isCleaning) {
+ if (isCleaning.compareAndSet(false, true)) {
try {
this.writeClient.startAsyncCleaning();
- this.isCleaning = true;
} catch (Throwable throwable) {
+ isCleaning.set(false);
// catch the exception to not affect the normal checkpointing
log.warn("Failed to start async cleaning", throwable);
}
}
}
Also applies to: 90-99
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DefaultCleanHandler.java`
around lines 53 - 61, Reserve the cleaning slot by setting isCleaning = true
before calling executor.execute so the slot is held while the task is queued;
specifically, in DefaultCleanHandler.clean() (and the analogous block around
lines 90-99/startAsyncCleaning()) move the isCleaning assignment to just before
executor.execute and guard against executor rejection by catching exceptions
from executor.execute and resetting isCleaning = false if dispatch fails,
ensuring the existing finally in the worker thread still clears isCleaning after
HoodieFlinkWriteClient.clean() completes.
@Override
public void close() {
  if (executor != null) {
    try {
      executor.close();
    } catch (Exception e) {
      throw new HoodieException("Failed to close executor of clean handler.", e);
    }
  }
  this.writeClient.clean();
}
Wait for in-flight async cleaning before the shutdown clean.
startAsyncCleaning() bypasses the executor, so executor.close() does not wait for that work. Line 114 can therefore race a new writeClient.clean() against an already running async clean. Either block on writeClient.waitForCleaningFinish() when cleaning is in progress, or skip the extra shutdown clean entirely; a sketch of the first option follows below.
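One possible shape for close(), reusing the isCleaning flag and the waitForCleaningFinish() call mentioned above (a sketch only, not the final implementation):

@Override
public void close() {
  if (executor != null) {
    try {
      executor.close();
    } catch (Exception e) {
      throw new HoodieException("Failed to close executor of clean handler.", e);
    }
  }
  if (isCleaning) {
    // an async clean started via startAsyncCleaning() may still be running; wait for it
    this.writeClient.waitForCleaningFinish();
  } else {
    this.writeClient.clean();
  }
}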
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DefaultCleanHandler.java`
around lines 105 - 115, The close() method currently closes the executor then
immediately calls writeClient.clean(), which can race with async work started by
startAsyncCleaning() because that async path bypasses the executor; modify
close() to detect in-flight async cleaning and either await its completion via
writeClient.waitForCleaningFinish() before invoking writeClient.clean() or skip
the shutdown clean entirely when async cleaning is active. Specifically, in
DefaultCleanHandler.close() after executor.close() (and before
writeClient.clean()), call the writeClient API that indicates or waits for
ongoing cleaning (e.g., writeClient.waitForCleaningFinish()) and only call
writeClient.clean() if no async cleaning is in progress, or return/skip the
extra clean when waitForCleaningFinish() completes indicating the async cleanup
already handled it.
public static CompactionPlanHandler createCompactionPlanHandler(Configuration conf, RuntimeContext runtimeContext) {
  HoodieFlinkWriteClient writeClient = FlinkWriteClients.createWriteClient(conf, runtimeContext);
  boolean needsDataCompaction = OptionsResolver.needsAsyncCompaction(conf);
  boolean needsMetadataCompaction = OptionsResolver.needsAsyncMetadataCompaction(conf);
  if (needsDataCompaction && needsMetadataCompaction) {
    return new CompositeCompactionPlanHandler(
        new DataTableCompactionPlanHandler(writeClient),
        new MetadataTableCompactionPlanHandler(StreamerUtil.createMetadataWriteClient(writeClient)));
  }
  if (needsMetadataCompaction) {
    return new MetadataTableCompactionPlanHandler(StreamerUtil.createMetadataWriteClient(writeClient));
  }
  return new DataTableCompactionPlanHandler(writeClient);
}

public static CompactHandler createCompactHandler(Configuration conf, RuntimeContext runtimeContext, int taskId) {
  HoodieFlinkWriteClient writeClient = FlinkWriteClients.createWriteClient(conf, runtimeContext);
  boolean needsDataCompaction = OptionsResolver.needsAsyncCompaction(conf);
  boolean needsMetadataCompaction = OptionsResolver.needsAsyncMetadataCompaction(conf);
  if (needsDataCompaction && needsMetadataCompaction) {
    return new CompositeCompactHandler(
        new DataTableCompactHandler(writeClient, taskId),
        new MetadataTableCompactHandler(StreamerUtil.createMetadataWriteClient(writeClient), taskId));
  }
  if (needsMetadataCompaction) {
    return new MetadataTableCompactHandler(StreamerUtil.createMetadataWriteClient(writeClient), taskId);
  }
  return new DataTableCompactHandler(writeClient, taskId);
}

public static CompactionCommitHandler createCompactionCommitHandler(Configuration conf, RuntimeContext runtimeContext) {
  HoodieFlinkWriteClient writeClient = FlinkWriteClients.createWriteClient(conf, runtimeContext);
  boolean needsDataCompaction = OptionsResolver.needsAsyncCompaction(conf);
  boolean needsMetadataCompaction = OptionsResolver.needsAsyncMetadataCompaction(conf);
  if (needsDataCompaction && needsMetadataCompaction) {
    return new CompositeCompactionCommitHandler(
        new DataTableCompactionCommitHandler(conf, writeClient),
        new MetadataTableCompactionCommitHandler(conf, StreamerUtil.createMetadataWriteClient(writeClient)));
  }
  if (needsMetadataCompaction) {
    return new MetadataTableCompactionCommitHandler(conf, StreamerUtil.createMetadataWriteClient(writeClient));
  }
  return new DataTableCompactionCommitHandler(conf, writeClient);
}
Do not default to data-table handlers when both compaction modes are off.
All three factory methods fall through to a DataTable*Handler when needsAsyncCompaction(conf) and needsAsyncMetadataCompaction(conf) are both false. After this refactor the callers use the returned handler unconditionally, so the disabled case stops being a no-op and starts doing rollback/planning/commit work on the data table instead.
Suggested fix
+ private static void validateCompactionEnabled(boolean needsDataCompaction, boolean needsMetadataCompaction) {
+ if (!needsDataCompaction && !needsMetadataCompaction) {
+ throw new IllegalStateException("No compaction mode is enabled");
+ }
+ }
+
public static CompactionPlanHandler createCompactionPlanHandler(Configuration conf, RuntimeContext runtimeContext) {
HoodieFlinkWriteClient writeClient = FlinkWriteClients.createWriteClient(conf, runtimeContext);
boolean needsDataCompaction = OptionsResolver.needsAsyncCompaction(conf);
boolean needsMetadataCompaction = OptionsResolver.needsAsyncMetadataCompaction(conf);
+ validateCompactionEnabled(needsDataCompaction, needsMetadataCompaction);
if (needsDataCompaction && needsMetadataCompaction) {
return new CompositeCompactionPlanHandler(
new DataTableCompactionPlanHandler(writeClient),
@@
public static CompactHandler createCompactHandler(Configuration conf, RuntimeContext runtimeContext, int taskId) {
HoodieFlinkWriteClient writeClient = FlinkWriteClients.createWriteClient(conf, runtimeContext);
boolean needsDataCompaction = OptionsResolver.needsAsyncCompaction(conf);
boolean needsMetadataCompaction = OptionsResolver.needsAsyncMetadataCompaction(conf);
+ validateCompactionEnabled(needsDataCompaction, needsMetadataCompaction);
if (needsDataCompaction && needsMetadataCompaction) {
return new CompositeCompactHandler(
new DataTableCompactHandler(writeClient, taskId),
@@
public static CompactionCommitHandler createCompactionCommitHandler(Configuration conf, RuntimeContext runtimeContext) {
HoodieFlinkWriteClient writeClient = FlinkWriteClients.createWriteClient(conf, runtimeContext);
boolean needsDataCompaction = OptionsResolver.needsAsyncCompaction(conf);
boolean needsMetadataCompaction = OptionsResolver.needsAsyncMetadataCompaction(conf);
+ validateCompactionEnabled(needsDataCompaction, needsMetadataCompaction);
if (needsDataCompaction && needsMetadataCompaction) {
return new CompositeCompactionCommitHandler(
new DataTableCompactionCommitHandler(conf, writeClient),
@@
- return new DataTableCompactionCommitHandler(conf, writeClient);
+ return new DataTableCompactionCommitHandler(conf, writeClient);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/TableServiceHandlerFactory.java`
around lines 36 - 79, The factory methods createCompactionPlanHandler,
createCompactHandler, and createCompactionCommitHandler currently fall back to
DataTable*Handler when both OptionsResolver.needsAsyncCompaction(conf) and
OptionsResolver.needsAsyncMetadataCompaction(conf) are false; change each method
to detect the "both disabled" case first and return a no-op handler instead of a
DataTable handler (e.g. return a new NoOpCompactionPlanHandler(),
NoOpCompactHandler(), NoOpCompactionCommitHandler() respectively) so callers
that use the handler unconditionally won't perform unintended work; add minimal
NoOp* handler implementations if they don't exist and wire them into the factory
before the existing composite/metadata/data branching logic.
after {
  unsetEnv(DFSPropertiesConfiguration.CONF_FILE_DIR_ENV_NAME)
  DFSPropertiesConfiguration.clearGlobalProps()
}
Make teardown cleanup failure-safe with try/finally.
If unsetEnv fails on Line 158, DFSPropertiesConfiguration.clearGlobalProps() on Line 159 is skipped, which can leak state into later tests.
Proposed fix
after {
- unsetEnv(DFSPropertiesConfiguration.CONF_FILE_DIR_ENV_NAME)
- DFSPropertiesConfiguration.clearGlobalProps()
+ try {
+ unsetEnv(DFSPropertiesConfiguration.CONF_FILE_DIR_ENV_NAME)
+ } finally {
+ DFSPropertiesConfiguration.clearGlobalProps()
+ }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/common/TestSqlConf.scala`
around lines 157 - 160, The teardown currently calls unsetEnv(...) then
DFSPropertiesConfiguration.clearGlobalProps(), but if unsetEnv throws the
clearGlobalProps() call can be skipped; wrap the calls in a try/finally inside
the after block so unsetEnv is attempted and
DFSPropertiesConfiguration.clearGlobalProps() is always executed in the finally
clause (referencing unsetEnv and DFSPropertiesConfiguration.clearGlobalProps to
locate the lines).
Actionable comments posted: 2
🧹 Nitpick comments (3)
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/partitioner/index/TestRocksDBIndexBackend.java (1)
69-76: Consider extracting shared gauge-capture setup helper.The same Mockito gauge-capture block appears three times; a small helper would reduce duplication and make these tests easier to maintain.
Also applies to: 107-113, 164-170
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/partitioner/index/TestRocksDBIndexBackend.java` around lines 69 - 76, The test repeats the Mockito gauge-capture block (using MetricGroup, gauges map, and doAnswer(...).when(metricGroup).gauge(...)) in TestRocksDBIndexBackend; extract that into a small reusable helper method (e.g., a private static method like captureGauges(MetricGroup metricGroup, Map<String,Gauge<?>> gauges) or a helper that returns the populated map) and replace the three inline blocks (around the gauge setup at lines shown) with calls to this helper to remove duplication and centralize the doAnswer(...) configuration for metricGroup.gauge.hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java (1)
356-357: Move these instant-list dumps back to DEBUG.
scanInternal()sits on the hot path for MOR reads/compaction. Emitting full instant sets at INFO on every scan will generate a lot of executor log volume, especially for long-lived file groups.Proposed change
- LOG.info("Ordered instant times seen {}", orderedInstantsList); - LOG.info("Targeted instants that are rolled back are {}", targetRollbackInstants); + LOG.debug("Ordered instant times seen {}", orderedInstantsList); + LOG.debug("Targeted instants that are rolled back are {}", targetRollbackInstants); - LOG.info("Total valid instants found are {}. Instants are {}", validBlockInstants.size(), validBlockInstants); + LOG.debug("Total valid instants found are {}. Instants are {}", validBlockInstants.size(), validBlockInstants); if (ignoredBlockCount > 0) { - LOG.info("Ignored {} log blocks from {} instants not in the range: {}", ignoredBlockCount, ignoredInstants.size(), ignoredInstants); + LOG.debug("Ignored {} log blocks from {} instants not in the range: {}", ignoredBlockCount, ignoredInstants.size(), ignoredInstants); }Also applies to: 414-417
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java` around lines 356 - 357, The INFO-level logging of the full instant lists in BaseHoodieLogRecordReader should be lowered to DEBUG to avoid hot-path log noise: inside scanInternal() replace the LOG.info calls that print orderedInstantsList and targetRollbackInstants (and the similar INFO dumps around the later block that logs those same collections) with LOG.debug so the detailed instant sets are emitted only at debug level; update both occurrences referencing orderedInstantsList and targetRollbackInstants in BaseHoodieLogRecordReader.scanInternal().hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieReadStats.java (1)
58-61: KeeptotalLogBlocksandtotalValidLogBlockssemantically separate.
totalLogBlocksis still fed from the raw scanned-block counter inBaseHoodieLogRecordReader, which is incremented before corrupt/inflight/range filtering. Describing it as "valid" here makes it overlap withtotalValidLogBlocksand muddies what each metric means.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieReadStats.java` around lines 58 - 61, The field comments for totalLogBlocks and totalValidLogBlocks are conflating semantics; update the comment for totalLogBlocks in HoodieReadStats to reflect that it counts raw scanned log blocks (incremented in BaseHoodieLogRecordReader before corrupt/inflight/range filtering) and is NOT "valid"—leave totalValidLogBlocks described as post-filtering valid blocks. Ensure the comment mentions BaseHoodieLogRecordReader and the distinction between pre-filter raw scanned blocks (totalLogBlocks) and post-filter valid blocks (totalValidLogBlocks) so future readers don't confuse the two metrics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@hudi-common/src/main/java/org/apache/hudi/common/table/read/buffer/LogScanningRecordBufferLoader.java`:
- Around line 62-65: The current setTotalLogSizeCompacted call in
LogScanningRecordBufferLoader uses getBlocksStats() and sums
BLOCK_SIZE_IN_BYTES, which undercounts delete-only blocks because
BaseHoodieLogRecordReader.processDataBlock does not populate BLOCK_SIZE_IN_BYTES
for delete blocks; change the computation to derive totalLogSizeCompacted from
the aggregated valid compacted log bytes/files produced by the reader (e.g., use
the reader's compacted file/bytes metadata or file-level stats that include
delete-only blocks) rather than re-summing data-block BLOCK_SIZE_IN_BYTES;
update the logic around readStats.setTotalLogSizeCompacted to sum the compacted
log file byte sizes (from the reader's file-level metrics or valid compacted
file list) so delete-heavy and mixed slices are counted correctly.
In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/partitioner/index/TestRocksDBIndexBackend.java`:
- Around line 129-141: The test currently writes an excessively large payload in
TestRocksDBIndexBackend by looping 32K times with 4K-sized strings; instead,
limit the writes and stop early when SST metrics become positive: in the loop
that calls rocksDBIndexBackend.update(...) reduce the repeated payload size
(e.g., 256–512 chars) and cap iterations (e.g., 1K or less), and after each
update check FlinkRocksDBIndexMetrics.ROCKSDB_DISK_TOTAL_SST_FILES_SIZE and
ROCKSDB_DISK_LIVE_SST_FILES_SIZE gauges and break once either metric > 0; this
keeps the test bounded and still verifies SST flush behavior without huge
memory/CI cost.
---
Nitpick comments:
In
`@hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java`:
- Around line 356-357: The INFO-level logging of the full instant lists in
BaseHoodieLogRecordReader should be lowered to DEBUG to avoid hot-path log
noise: inside scanInternal() replace the LOG.info calls that print
orderedInstantsList and targetRollbackInstants (and the similar INFO dumps
around the later block that logs those same collections) with LOG.debug so the
detailed instant sets are emitted only at debug level; update both occurrences
referencing orderedInstantsList and targetRollbackInstants in
BaseHoodieLogRecordReader.scanInternal().
In
`@hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieReadStats.java`:
- Around line 58-61: The field comments for totalLogBlocks and
totalValidLogBlocks are conflating semantics; update the comment for
totalLogBlocks in HoodieReadStats to reflect that it counts raw scanned log
blocks (incremented in BaseHoodieLogRecordReader before corrupt/inflight/range
filtering) and is NOT "valid"—leave totalValidLogBlocks described as
post-filtering valid blocks. Ensure the comment mentions
BaseHoodieLogRecordReader and the distinction between pre-filter raw scanned
blocks (totalLogBlocks) and post-filter valid blocks (totalValidLogBlocks) so
future readers don't confuse the two metrics.
In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/partitioner/index/TestRocksDBIndexBackend.java`:
- Around lines 69-76: The test repeats the Mockito gauge-capture block (using MetricGroup, a gauges map, and doAnswer(...).when(metricGroup).gauge(...)) in TestRocksDBIndexBackend. Extract that into a small reusable helper, e.g. a private static method like captureGauges(MetricGroup metricGroup, Map<String, Gauge<?>> gauges) or a helper that returns the populated map, and replace the three inline blocks (around the gauge setup at the lines shown) with calls to this helper to remove the duplication and centralize the doAnswer(...) configuration for metricGroup.gauge; see the sketch below.
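A hedged sketch of such a helper is below. The captureGauges name is the one proposed above, not an existing method, and the MetricGroup generics are simplified with a @SuppressWarnings:

```java
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.anyString;
import static org.mockito.Mockito.doAnswer;

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.MetricGroup;

// Sketch only: centralize the doAnswer(...) gauge capture that the test repeats three times.
final class GaugeCaptureSketch {

  @SuppressWarnings({"unchecked", "rawtypes"})
  static Map<String, Gauge<?>> captureGauges(MetricGroup metricGroup) {
    Map<String, Gauge<?>> gauges = new HashMap<>();
    doAnswer(invocation -> {
      String name = invocation.getArgument(0);
      Gauge<?> gauge = invocation.getArgument(1);
      gauges.put(name, gauge);
      return gauge;
    }).when(metricGroup).gauge(anyString(), any(Gauge.class));
    return gauges;
  }
}
```

Each of the three setup sites would then reduce to `Map<String, Gauge<?>> gauges = captureGauges(metricGroup);`.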
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7e02edcf-0aae-4ba3-915e-64fda4516988
📒 Files selected for processing (13)
- hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/FileGroupReaderBasedAppendHandle.java
- hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/FileGroupReaderBasedMergeHandle.java
- hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java
- hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieReadStats.java
- hudi-common/src/main/java/org/apache/hudi/common/table/read/buffer/LogScanningRecordBufferLoader.java
- hudi-common/src/main/java/org/apache/hudi/common/util/collection/RocksDBDAO.java
- hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieReadStats.java
- hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkRocksDBIndexMetrics.java
- hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/BucketAssignFunction.java
- hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/index/IndexBackend.java
- hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/index/RocksDBIndexBackend.java
- hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/partitioner/index/TestRocksDBIndexBackend.java
- hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/table/TestHoodieMergeOnReadTable.java
🚧 Files skipped from review as they are similar to previous changes (3)
- hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/BucketAssignFunction.java
- hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/index/IndexBackend.java
- hudi-common/src/main/java/org/apache/hudi/common/util/collection/RocksDBDAO.java
readStats.setTotalLogSizeCompacted(logRecordReader.getBlocksStats().stream()
    .filter(blockScanMetrics -> blockScanMetrics.containsKey(BLOCK_SIZE_IN_BYTES))
    .mapToLong(blockScanMetrics -> ((Number) blockScanMetrics.get(BLOCK_SIZE_IN_BYTES)).longValue())
    .sum());
totalLogSizeCompacted now under-counts delete-heavy slices.
blocksStats is only populated from BaseHoodieLogRecordReader.processDataBlock(...); valid delete blocks never contribute BLOCK_SIZE_IN_BYTES. A delete-only slice therefore records 0 here, and mixed slices under-report bytes before this value is written into HoodieWriteStat. Please derive this from all valid compacted log bytes/files instead of re-summing data-block stats.
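As a sketch only: assuming the loader still has the scanned List&lt;HoodieLogFile&gt; in scope (named logFiles here purely for illustration), the whole-file sizes could be summed via HoodieLogFile#getFileSize instead of the per-data-block stats:

```java
import java.util.List;

import org.apache.hudi.common.model.HoodieLogFile;
import org.apache.hudi.common.table.read.HoodieReadStats;

// Sketch only: count every scanned log file's bytes, so delete-only blocks are also included.
final class CompactedLogSizeSketch {

  static void populateTotalLogSizeCompacted(HoodieReadStats readStats, List<HoodieLogFile> logFiles) {
    long totalLogSizeCompacted = logFiles.stream()
        .mapToLong(HoodieLogFile::getFileSize) // whole-file size, independent of block type
        .sum();
    readStats.setTotalLogSizeCompacted(totalLogSizeCompacted);
  }
}
```

If the loader does not keep the log-file list around, the same whole-file sum could come from whatever file-level metadata the reader exposes; the key point is counting file bytes rather than data-block stats.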
// Keep writing larger records to trigger auto-flushes memtable into SST.
int nextKey = 32;
for (int i = 0; i < 32768; i++) {
  String par = "par-" + nextKey + "-" + "x".repeat(4096);
  rocksDBIndexBackend.update("id" + nextKey,
      new HoodieRecordGlobalLocation(par, "00" + nextKey, UUID.randomUUID().toString()));
  nextKey++;
}
long totalSstSize =
    ((Number) gauges.get(FlinkRocksDBIndexMetrics.ROCKSDB_DISK_TOTAL_SST_FILES_SIZE).getValue()).longValue();
long liveSstSize =
    ((Number) gauges.get(FlinkRocksDBIndexMetrics.ROCKSDB_DISK_LIVE_SST_FILES_SIZE).getValue()).longValue();
assertTrue(totalSstSize > 0 || liveSstSize > 0);
Test payload volume is too large for a unit test.
Lines 129-141 allocate/process a very large dataset (~100MB+ string content plus object overhead), which can make CI slow/flaky. Prefer bounded writes with early exit once SST metrics become positive.
💡 Proposed bounded-write approach
- // Keep writing larger records to trigger auto-flushes memtable into SST.
- int nextKey = 32;
- for (int i = 0; i < 32768; i++) {
- String par = "par-" + nextKey + "-" + "x".repeat(4096);
- rocksDBIndexBackend.update("id" + nextKey,
- new HoodieRecordGlobalLocation(par, "00" + nextKey, UUID.randomUUID().toString()));
- nextKey++;
- }
- long totalSstSize =
- ((Number) gauges.get(FlinkRocksDBIndexMetrics.ROCKSDB_DISK_TOTAL_SST_FILES_SIZE).getValue()).longValue();
- long liveSstSize =
- ((Number) gauges.get(FlinkRocksDBIndexMetrics.ROCKSDB_DISK_LIVE_SST_FILES_SIZE).getValue()).longValue();
- assertTrue(totalSstSize > 0 || liveSstSize > 0);
+ // Bounded writes with early stop once SST metrics move.
+ int nextKey = 32;
+ boolean sstObserved = false;
+ for (int i = 0; i < 4096; i++) {
+ String par = "par-" + nextKey + "-" + "x".repeat(2048);
+ rocksDBIndexBackend.update("id" + nextKey,
+ new HoodieRecordGlobalLocation(par, "00" + nextKey, UUID.randomUUID().toString()));
+ nextKey++;
+
+ if (i % 128 == 0) {
+ long totalSstSize =
+ ((Number) gauges.get(FlinkRocksDBIndexMetrics.ROCKSDB_DISK_TOTAL_SST_FILES_SIZE).getValue()).longValue();
+ long liveSstSize =
+ ((Number) gauges.get(FlinkRocksDBIndexMetrics.ROCKSDB_DISK_LIVE_SST_FILES_SIZE).getValue()).longValue();
+ if (totalSstSize > 0 || liveSstSize > 0) {
+ sstObserved = true;
+ break;
+ }
+ }
+ }
+ assertTrue(sstObserved, "Expected SST metrics to become positive after bounded writes");
Mirror of apache#18484 for automated bot review.
Original author: @cshuo
Base branch: master
Summary by CodeRabbit
New Features
Improvements
Bug Fixes