Pipe: merge batched aligned chunks in scan parser #18010
Merged
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #18010 +/- ##
============================================
+ Coverage 41.24% 41.42% +0.18%
Complexity 318 318
============================================
Files 5272 5281 +9
Lines 367956 369190 +1234
Branches 47610 47770 +160
============================================
+ Hits 151769 152946 +1177
- Misses 216187 216244 +57
jt2594838 reviewed on Jun 25, 2026

Caideyipi added a commit to Caideyipi/iotdb that referenced this pull request on Jun 26, 2026:
* Pipe: merge batched aligned chunks in scan parser
* Test pipe batched aligned chunk memory boundaries
* Pipe: fix batched aligned scan parser memory split
* Update TsFileInsertionEventParserTest.java
* Rename pending aligned chunk consumer
(cherry picked from commit f96fc58)

MileaRobertStefan pushed a commit to MileaRobertStefan/iotdb that referenced this pull request on Jun 26, 2026, with the same five-item commit message and no cherry-pick note.

jt2594838 pushed a commit that referenced this pull request on Jun 30, 2026, with the same five-item commit message (cherry picked from commit f96fc58).



Description
This PR improves the pipe TsFile scan parser for legal aligned TsFiles whose value chunks are physically written in column batches, such as files produced by batched aligned compaction.
The current scan parser emits an aligned tablet when the value chunk occurrence index changes. For batched aligned compaction output, value chunks can be laid out as:
This layout is valid, but the previous parser behavior makes the emitted tablets inherit the physical compaction batch width, commonly 10 columns from compaction_max_aligned_series_num_in_one_batch, even when pipe reader memory allows a wider aligned tablet. That increases the number of tablets and hurts pipe performance.

This PR changes the scan parser to cache pending aligned value chunk groups by time chunk index and emit them only when memory limits or chunk group boundaries require it. With enough memory, consecutive physical value column batches for the same aligned chunks are merged into wider aligned tablets instead of being split at the compaction batch boundary.
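The caching idea above can be illustrated with a small, self-contained sketch. It is not the code from this PR: the class AlignedBatchMergeSketch, the ColumnBatch type, the per-batch byte estimate, and the single memoryBudgetBytes parameter are all hypothetical names chosen for illustration. It only shows the general shape of "collect value column batches per time chunk index, flush when the memory budget would be exceeded or the chunk group ends".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: every name here is hypothetical, not taken from the IoTDB code base. */
final class AlignedBatchMergeSketch {

  /** One physical batch of value columns that belongs to the time chunk at timeChunkIndex. */
  static final class ColumnBatch {
    final int timeChunkIndex;
    final List<String> columns;
    final long estimatedBytes;

    ColumnBatch(int timeChunkIndex, List<String> columns, long estimatedBytes) {
      this.timeChunkIndex = timeChunkIndex;
      this.columns = columns;
      this.estimatedBytes = estimatedBytes;
    }
  }

  /** Returns the column lists of the emitted tablets, in emission order. */
  static List<List<String>> merge(List<ColumnBatch> chunkGroup, long memoryBudgetBytes) {
    List<List<String>> emitted = new ArrayList<>();
    // Pending columns, keyed by time chunk index, in first-seen order.
    Map<Integer, List<String>> pending = new LinkedHashMap<>();
    long pendingBytes = 0;
    for (ColumnBatch batch : chunkGroup) {
      // Flush first if caching this batch would exceed the (assumed) memory budget.
      if (!pending.isEmpty() && pendingBytes + batch.estimatedBytes > memoryBudgetBytes) {
        emitted.addAll(pending.values());
        pending = new LinkedHashMap<>();
        pendingBytes = 0;
      }
      pending.computeIfAbsent(batch.timeChunkIndex, k -> new ArrayList<>()).addAll(batch.columns);
      pendingBytes += batch.estimatedBytes;
    }
    // The end of the chunk group is the other flush trigger.
    emitted.addAll(pending.values());
    return emitted;
  }

  public static void main(String[] args) {
    // Two time chunks, each written as two physical batches of two columns.
    List<ColumnBatch> chunkGroup = Arrays.asList(
        new ColumnBatch(0, Arrays.asList("s1", "s2"), 100),
        new ColumnBatch(1, Arrays.asList("s1", "s2"), 100),
        new ColumnBatch(0, Arrays.asList("s3", "s4"), 100),
        new ColumnBatch(1, Arrays.asList("s3", "s4"), 100));
    System.out.println(merge(chunkGroup, 1000)); // large budget: batches are merged
    System.out.println(merge(chunkGroup, 250));  // small budget: flushed at the batch width
  }
}
```

With a generous budget the sketch emits one four-column tablet per time chunk; with a tight budget it falls back to two-column tablets, which mirrors the trade-off described above under these simplified assumptions.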
It also defines pipeDataStructureTabletRowSize <= 0 as disabling the row-count cap for pipe tablets. In that mode, tablet row count is calculated only from pipe_data_structure_tablet_size_in_bytes, so users can rely on the memory-size limit instead of the fixed row-count limit.

Changes
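* Merge batched aligned value chunk groups in TsFileInsertionEventScanParser.
* Treat non-positive pipeDataStructureTabletRowSize as no row-count cap in PipeMemoryWeightUtil.
* Add tests covering the batched aligned layout and 0/negative values.

Validation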
mvn spotless:apply -pl iotdb-core/datanode
git diff --check

I also tried:
mvn -Ddevelocity.off=true -pl iotdb-core/datanode -DskipTests compile
mvn -Ddevelocity.off=true -Dmaven.main.skip=true -pl iotdb-core/datanode -Dtest=TsFileInsertionEventParserTest#testScanParserMergesBatchedAlignedValueChunkGroups+testPipeTabletRowSizeCanBeDisabledByNonPositiveValue test
mvn -pl iotdb-core/datanode -Dtest=TsFileInsertionEventParserTest#testScanParserMergesBatchedAlignedValueChunkGroups+testScanParserFlushesBatchedAlignedValueChunkGroupsByMemoryLimit+testPipeTabletRowSizeCanBeDisabledByNonPositiveValue test

These Maven compile/test attempts are blocked in this workspace by existing datanode-wide compile issues outside this PR, including generated query fill/aggregation classes and IOUtils.readFully unresolved symbols in unrelated files. The focused tests did not get executed because compilation fails before Surefire runs.
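
For reference, the row-cap rule from the Description (a non-positive pipeDataStructureTabletRowSize disables the row-count cap) could look roughly like the sketch below. This is not the PipeMemoryWeightUtil implementation: the class and method names, the estimatedBytesPerRow input, and the choice to take the smaller of the two limits when the cap is positive are assumptions made for illustration.

```java
/** Illustrative only: names and the exact formula are assumptions, not the IoTDB implementation. */
final class TabletRowSizeSketch {

  /**
   * @param configuredRowCap stands in for pipeDataStructureTabletRowSize; <= 0 means "no cap"
   * @param tabletSizeInBytes stands in for pipe_data_structure_tablet_size_in_bytes
   * @param estimatedBytesPerRow hypothetical per-row size estimate, must be positive
   */
  static int rowCount(int configuredRowCap, long tabletSizeInBytes, long estimatedBytesPerRow) {
    if (estimatedBytesPerRow <= 0) {
      throw new IllegalArgumentException("estimatedBytesPerRow must be positive");
    }
    // Rows that fit in the byte budget; always allow at least one row.
    long rowsByMemory = Math.max(1L, tabletSizeInBytes / estimatedBytesPerRow);
    // A non-positive cap disables the row-count limit, leaving only the memory limit.
    long rows = configuredRowCap <= 0 ? rowsByMemory : Math.min(configuredRowCap, rowsByMemory);
    return (int) Math.min(rows, Integer.MAX_VALUE);
  }

  public static void main(String[] args) {
    System.out.println(rowCount(2048, 1_000_000L, 100L)); // cap applies
    System.out.println(rowCount(0, 1_000_000L, 100L));    // cap disabled, memory limit only
    System.out.println(rowCount(-1, 1_000_000L, 100L));   // negative behaves like 0
  }
}
```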