Pipe: merge batched aligned chunks in scan parser #18010

Merged
jt2594838 merged 5 commits into master from fix/pipe-merge-batched-aligned-chunks on Jun 26, 2026
Caideyipi (Collaborator) commented Jun 23, 2026

Description

This PR improves the pipe TsFile scan parser for valid aligned TsFiles whose value chunks are physically written in column batches, such as files produced by batched aligned compaction.

The current scan parser emits an aligned tablet when the value chunk occurrence index changes. For batched aligned compaction output, value chunks can be laid out as:

  • time chunk 0, time chunk 1
  • value columns 0-9 for chunk 0 and chunk 1
  • value columns 10-19 for chunk 0 and chunk 1
  • ...

This layout is valid, but the current parser makes the emitted tablets inherit the physical compaction batch width (commonly 10 columns, from compaction_max_aligned_series_num_in_one_batch) even when pipe reader memory allows a wider aligned tablet. The result is more tablets than necessary, which hurts pipe performance.
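The effect of splitting on every index change can be illustrated with a small model. This is only a sketch under simplifying assumptions: the class, the ValueChunkRun type, and both methods are hypothetical names for illustration and are not taken from the IoTDB code base. It replays the layout above (20 columns, batches of 10, 2 time chunks) and records the column width of each tablet that would be emitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; these names are not part of IoTDB.
final class BatchedAlignedLayoutSketch {

  /** One physical run of value chunks: a batch of columns for a single time chunk. */
  static final class ValueChunkRun {
    final int timeChunkIndex;
    final int columnCount;

    ValueChunkRun(int timeChunkIndex, int columnCount) {
      this.timeChunkIndex = timeChunkIndex;
      this.columnCount = columnCount;
    }
  }

  /** Builds the physical order: for each column batch, one run per time chunk. */
  static List<ValueChunkRun> batchedLayout(int totalColumns, int batchWidth, int timeChunks) {
    List<ValueChunkRun> runs = new ArrayList<>();
    for (int start = 0; start < totalColumns; start += batchWidth) {
      int width = Math.min(batchWidth, totalColumns - start);
      for (int t = 0; t < timeChunks; t++) {
        runs.add(new ValueChunkRun(t, width));
      }
    }
    return runs;
  }

  /** Closes a tablet whenever the time chunk index changes; returns each tablet's width. */
  static List<Integer> splitOnIndexChange(List<ValueChunkRun> runs) {
    List<Integer> tabletWidths = new ArrayList<>();
    int currentIndex = -1;
    int width = 0;
    for (ValueChunkRun run : runs) {
      if (width > 0 && run.timeChunkIndex != currentIndex) {
        tabletWidths.add(width);
        width = 0;
      }
      currentIndex = run.timeChunkIndex;
      width += run.columnCount;
    }
    if (width > 0) {
      tabletWidths.add(width);
    }
    return tabletWidths;
  }

  public static void main(String[] args) {
    // 20 columns written in batches of 10, across 2 time chunks.
    System.out.println(splitOnIndexChange(batchedLayout(20, 10, 2))); // [10, 10, 10, 10]
  }
}
```

In this model the 20-column device yields four tablets of 10 columns each, although two tablets of 20 columns would carry the same data.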

This PR changes the scan parser to cache pending aligned value chunk groups by time chunk index and emit them only when memory limits or chunk group boundaries require it. With enough memory, consecutive physical value column batches for the same aligned chunks are merged into wider aligned tablets instead of being split at the compaction batch boundary.
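A minimal sketch of that caching idea follows. It is not the code in TsFileInsertionEventScanParser: the class and method names are hypothetical, each value chunk run is reduced to a column count and a byte size, and flushing every pending group once the budget would be exceeded is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of pending groups keyed by time chunk index.
final class PendingAlignedGroupsSketch {
  private final long memoryLimitInBytes;
  // Accumulated column width per time chunk index.
  private final Map<Integer, Integer> pendingWidthByTimeChunk = new TreeMap<>();
  private final List<Integer> emittedTabletWidths = new ArrayList<>();
  private long pendingBytes = 0;

  PendingAlignedGroupsSketch(long memoryLimitInBytes) {
    this.memoryLimitInBytes = memoryLimitInBytes;
  }

  /** Caches one run of value chunks, flushing first if it would exceed the budget. */
  void onValueChunkRun(int timeChunkIndex, int columnCount, long bytes) {
    if (!pendingWidthByTimeChunk.isEmpty() && pendingBytes + bytes > memoryLimitInBytes) {
      flush();
    }
    pendingWidthByTimeChunk.merge(timeChunkIndex, columnCount, Integer::sum);
    pendingBytes += bytes;
  }

  /** A chunk group boundary always forces the pending groups out. */
  void onChunkGroupEnd() {
    flush();
  }

  /** Emits one tablet per pending time chunk, in time chunk order. */
  private void flush() {
    emittedTabletWidths.addAll(pendingWidthByTimeChunk.values());
    pendingWidthByTimeChunk.clear();
    pendingBytes = 0;
  }

  /** Replays two batches of 10 columns over 2 time chunks, 100 bytes per run. */
  static List<Integer> replay(long memoryLimitInBytes) {
    PendingAlignedGroupsSketch parser = new PendingAlignedGroupsSketch(memoryLimitInBytes);
    for (int batch = 0; batch < 2; batch++) {
      for (int timeChunk = 0; timeChunk < 2; timeChunk++) {
        parser.onValueChunkRun(timeChunk, 10, 100);
      }
    }
    parser.onChunkGroupEnd();
    return parser.emittedTabletWidths;
  }

  public static void main(String[] args) {
    System.out.println(replay(1000)); // [20, 20]
    System.out.println(replay(200));  // [10, 10, 10, 10]
  }
}
```

With a generous budget the two physical batches collapse into one 20-column tablet per time chunk. With a tight budget the model falls back to batch-width tablets, which mirrors the memory-boundary flushing described above.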

This PR also treats pipeDataStructureTabletRowSize <= 0 as disabling the row-count cap for pipe tablets. In that mode, the tablet row count is derived only from pipe_data_structure_tablet_size_in_bytes, so users can rely on the memory-size limit alone instead of a fixed row-count limit.
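The row-count rule can be sketched as follows. This is an illustration, not the actual PipeMemoryWeightUtil logic: the class name, the method name, and the per-row byte estimate parameter are assumptions made for this example.

```java
// Hypothetical helper; the real calculation in PipeMemoryWeightUtil may differ.
final class TabletRowCountSketch {

  /**
   * Derives a tablet row count from a byte budget and an estimated row size.
   * A non-positive rowCap means "no row-count cap".
   */
  static int tabletRowCount(long tabletSizeInBytes, long rowSizeInBytes, int rowCap) {
    // Rows that fit in the byte budget, never fewer than one.
    long bySize = Math.max(1, tabletSizeInBytes / Math.max(1, rowSizeInBytes));
    // Apply the cap only when it is positive.
    long capped = rowCap > 0 ? Math.min(bySize, rowCap) : bySize;
    return (int) Math.min(Integer.MAX_VALUE, capped);
  }

  public static void main(String[] args) {
    System.out.println(tabletRowCount(1 << 20, 16, 2048)); // 2048, the cap applies
    System.out.println(tabletRowCount(1 << 20, 16, 0));    // 65536, size limit only
    System.out.println(tabletRowCount(1 << 20, 16, -1));   // 65536, size limit only
  }
}
```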

Changes

  • Cache aligned value chunks in pending groups keyed by time chunk index in TsFileInsertionEventScanParser.
  • Preserve chunk/page memory protection when merging multiple physical aligned value chunk groups.
  • Keep cached value chunk replay subject to the same memory threshold checks.
  • Treat non-positive pipeDataStructureTabletRowSize as no row-count cap in PipeMemoryWeightUtil.
  • Add tests for batched aligned value chunk layout merging, memory-boundary flushing, and disabling the tablet row-size cap with 0/negative values.

Validation

  • mvn spotless:apply -pl iotdb-core/datanode
  • git diff --check

I also tried:

  • mvn -Ddevelocity.off=true -pl iotdb-core/datanode -DskipTests compile
  • mvn -Ddevelocity.off=true -Dmaven.main.skip=true -pl iotdb-core/datanode -Dtest=TsFileInsertionEventParserTest#testScanParserMergesBatchedAlignedValueChunkGroups+testPipeTabletRowSizeCanBeDisabledByNonPositiveValue test
  • mvn -pl iotdb-core/datanode -Dtest=TsFileInsertionEventParserTest#testScanParserMergesBatchedAlignedValueChunkGroups+testScanParserFlushesBatchedAlignedValueChunkGroupsByMemoryLimit+testPipeTabletRowSizeCanBeDisabledByNonPositiveValue test

These Maven compile/test attempts are blocked in this workspace by existing datanode-wide compile issues outside this PR, including generated query fill/aggregation classes and unresolved IOUtils.readFully symbols in unrelated files. The focused tests were not executed because compilation fails before Surefire runs.

codecov Bot commented Jun 24, 2026

Codecov Report

❌ Patch coverage is 96.75676% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.42%. Comparing base (511d08f) to head (028deef).
⚠️ Report is 20 commits behind head on master.

Files with missing lines | Patch % | Lines
...le/parser/scan/TsFileInsertionEventScanParser.java | 96.25% | 6 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18010      +/-   ##
============================================
+ Coverage     41.24%   41.42%   +0.18%     
  Complexity      318      318              
============================================
  Files          5272     5281       +9     
  Lines        367956   369190    +1234     
  Branches      47610    47770     +160     
============================================
+ Hits         151769   152946    +1177     
- Misses       216187   216244      +57     

jt2594838 merged commit f96fc58 into master on Jun 26, 2026
43 of 45 checks passed
jt2594838 deleted the fix/pipe-merge-batched-aligned-chunks branch on June 26, 2026 at 06:19
Caideyipi added a commit to Caideyipi/iotdb that referenced this pull request Jun 26, 2026
* Pipe: merge batched aligned chunks in scan parser

* Test pipe batched aligned chunk memory boundaries

* Pipe: fix batched aligned scan parser memory split

* Update TsFileInsertionEventParserTest.java

* Rename pending aligned chunk consumer

(cherry picked from commit f96fc58)
MileaRobertStefan pushed a commit to MileaRobertStefan/iotdb that referenced this pull request Jun 26, 2026
jt2594838 pushed a commit that referenced this pull request Jun 30, 2026
(cherry picked from commit f96fc58)