Use zstd::bulk API in IPC and Parquet with context reuse for compression and decompression#9400

Merged
Dandandan merged 6 commits into apache:main from Dandandan:zstd-bulk-context-reuse
Feb 12, 2026

Conversation

@Dandandan
Contributor

Dandandan commented Feb 12, 2026

Which issue does this PR close?

Rationale for this change

Switch parquet and IPC zstd codec from the streaming API (zstd::Encoder/Decoder) to the bulk API (zstd::bulk::Compressor/Decompressor) with reusable contexts. This avoids the overhead of reinitializing zstd contexts on every compress/decompress call, yielding ~8-11% speedup on benchmarks.

Parquet: Store Compressor and Decompressor in ZSTDCodec, reused across calls. IPC: Add DecompressionContext (mirroring existing CompressionContext) with a reusable bulk Decompressor, threaded through RecordBatchDecoder.
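
As a rough illustration of the pattern (not the actual ZSTDCodec / DecompressionContext code), here is a minimal sketch using only the zstd crate's bulk API; the ReusableZstdCodec name and the method signatures are made up for the example:

  use std::io;
  use zstd::bulk::{Compressor, Decompressor};

  struct ReusableZstdCodec {
      // One compression and one decompression context, created once and
      // reused for every page/buffer instead of being rebuilt per call.
      compressor: Compressor<'static>,
      decompressor: Decompressor<'static>,
  }

  impl ReusableZstdCodec {
      fn new(level: i32) -> io::Result<Self> {
          Ok(Self {
              compressor: Compressor::new(level)?,
              decompressor: Decompressor::new()?,
          })
      }

      fn compress(&mut self, input: &[u8]) -> io::Result<Vec<u8>> {
          // Bulk compression reuses the context held in `self.compressor`.
          self.compressor.compress(input)
      }

      fn decompress(&mut self, input: &[u8], uncompressed_size: usize) -> io::Result<Vec<u8>> {
          // The bulk decompressor needs an upper bound on the output size.
          self.decompressor.decompress(input, uncompressed_size)
      }
  }

The streaming zstd::Encoder / zstd::Decoder path, by contrast, sets up and tears down a fresh zstd context on every call, which is the per-call overhead removed here.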

  Benchmark: cargo bench -p parquet --features experimental --bench compression -- "Zstd"                                                                                                     
  ┌────────────────────────────────┬──────────┬───────────┬────────┐         
  │           Benchmark            │   Main   │ Optimized │ Change │
  ├────────────────────────────────┼──────────┼───────────┼────────┤
  │ compress ZSTD - alphanumeric   │ 866 µs   │ 789 µs    │ -9.6%  │
  ├────────────────────────────────┼──────────┼───────────┼────────┤
  │ decompress ZSTD - alphanumeric │ 1.125 ms │ 1.007 ms  │ -8.8%  │
  ├────────────────────────────────┼──────────┼───────────┼────────┤
  │ compress ZSTD - words          │ 2.869 ms │ 2.590 ms  │ -9.7%  │
  ├────────────────────────────────┼──────────┼───────────┼────────┤
  │ decompress ZSTD - words        │ 1.001 ms │ 848 µs    │ -10.6% │
  └────────────────────────────────┴──────────┴───────────┴────────┘
  IPC Reader Decompression (10 batches)

  Benchmark: cargo bench -p arrow-ipc --features zstd --bench ipc_reader -- "zstd"
  ┌─────────────────────────────────────────┬──────────┬───────────┬────────┐
  │                Benchmark                │   Main   │ Optimized │ Change │
  ├─────────────────────────────────────────┼──────────┼───────────┼────────┤
  │ StreamReader/read_10/zstd               │ 2.756 ms │ 2.540 ms  │ -7.8%  │
  ├─────────────────────────────────────────┼──────────┼───────────┼────────┤
  │ StreamReader/no_validation/read_10/zstd │ 2.601 ms │ 2.352 ms  │ -9.6%  │
  └─────────────────────────────────────────┴──────────┴───────────┴────────┘

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-actions bot added the parquet (Changes to the parquet crate) and arrow (Changes to the arrow crate) labels Feb 12, 2026
@Dandandan
Contributor Author

run benchmarks ipc_reader

@alamb-ghbot

🤖 Hi @Dandandan, thanks for the request (#9400 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

  • Standard: (none)
  • Criterion: array_from, array_iter, arrow_reader, arrow_reader_clickbench, arrow_reader_row_filter, arrow_statistics, arrow_writer, bitwise_kernel, boolean_kernels, buffer_bit_ops, builder, cast_kernels, coalesce_kernels, comparison_kernels, concatenate_kernel, csv_writer, encoding, filter_kernels, interleave_kernels, json-reader, metadata, row_format, sort_kernel, take_kernels, union_array, variant_builder, variant_kernels, variant_validation, view_types, zip_kernels

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...

You can also set environment variables on subsequent lines:

run benchmark tpch_mem
DATAFUSION_RUNTIME_MEMORY_LIMIT=1G

Unsupported benchmarks: ipc_reader.

Dandandan marked this pull request as ready for review February 12, 2026 13:18
Comment on lines +514 to +515
compressor: zstd::bulk::Compressor<'static>,
decompressor: zstd::bulk::Decompressor<'static>,
Contributor

Would it be possible to do even better than this and initialize a thread-local compressor?

Contributor Author

Theoretically perhaps, but for a lot of pages the context should already be reused many times.

Contributor

FWIW that is what the underlying compressor docs seem to suggest as well:

[screenshot of the zstd bulk compressor documentation]

However, given this one is already better than main, I think we could merge this as is and then look into a thread-local as a follow-on.

One downside of a thread-local is that it isn't clear to me how it would ever be cleared 🤔

Contributor Author

Yeah, just slapping a thread_local! on it would not be handy, as we don't know when it would be good to clear it.

Perhaps an API could be changed/designed to reuse contexts/allocations like this across multiple parquet reader instances on the same thread, though I don't think the gain would be large.
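
For reference, a minimal sketch of the thread-local variant discussed above, which is not what this PR implements; ZSTD_DECOMPRESSOR and decompress_with_thread_local are hypothetical names, and as noted there is no obvious point at which the per-thread context would ever be cleared:

  use std::cell::RefCell;
  use std::io;

  thread_local! {
      // Hypothetical: one reusable decompression context per thread.
      static ZSTD_DECOMPRESSOR: RefCell<Option<zstd::bulk::Decompressor<'static>>> =
          RefCell::new(None);
  }

  fn decompress_with_thread_local(input: &[u8], capacity: usize) -> io::Result<Vec<u8>> {
      ZSTD_DECOMPRESSOR.with(|cell| {
          let mut slot = cell.borrow_mut();
          if slot.is_none() {
              // Created lazily the first time this thread decompresses zstd data.
              *slot = Some(zstd::bulk::Decompressor::new()?);
          }
          // Reused on every later call from this thread, but never freed --
          // the clearing question raised in the discussion above.
          slot.as_mut().unwrap().decompress(input, capacity)
      })
  }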

@alamb
Contributor

alamb commented Feb 12, 2026

run benchmarks ipc_reader

I will add this to the available benchmarks

@alamb
Contributor

alamb commented Feb 12, 2026

run benchmark ipc_reader

apache deleted a comment from alamb-ghbot Feb 12, 2026

@alamb-ghbot

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing zstd-bulk-context-reuse (74a2b61) to d6168e5 diff
BENCH_NAME=ipc_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench ipc_reader
BENCH_FILTER=
BENCH_BRANCH_NAME=zstd-bulk-context-reuse
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed


group                                                       main                                   zstd-bulk-context-reuse
-----                                                       ----                                   -----------------------
arrow_ipc_reader/FileReader/no_validation/read_10           1.00   413.9±13.75µs        ? ?/sec    1.05    436.1±7.02µs        ? ?/sec
arrow_ipc_reader/FileReader/no_validation/read_10/mmap      1.00     62.0±0.86µs        ? ?/sec    1.38     85.7±0.64µs        ? ?/sec
arrow_ipc_reader/FileReader/read_10                         1.00    713.0±5.39µs        ? ?/sec    1.04    742.6±8.05µs        ? ?/sec
arrow_ipc_reader/FileReader/read_10/mmap                    1.00    602.0±8.71µs        ? ?/sec    1.04   627.8±17.43µs        ? ?/sec
arrow_ipc_reader/StreamReader/no_validation/read_10         1.00   404.7±14.00µs        ? ?/sec    1.08   435.4±13.52µs        ? ?/sec
arrow_ipc_reader/StreamReader/no_validation/read_10/zstd    1.16      3.2±0.01ms        ? ?/sec    1.00      2.7±0.02ms        ? ?/sec
arrow_ipc_reader/StreamReader/read_10                       1.00    695.3±6.83µs        ? ?/sec    1.05    729.2±7.52µs        ? ?/sec
arrow_ipc_reader/StreamReader/read_10/zstd                  1.14      3.5±0.01ms        ? ?/sec    1.00      3.1±0.03ms        ? ?/sec

@Dandandan
Contributor Author

Dandandan commented Feb 12, 2026

group                                                       main                                   zstd-bulk-context-reuse
-----                                                       ----                                   -----------------------
arrow_ipc_reader/StreamReader/no_validation/read_10/zstd    1.16      3.2±0.01ms        ? ?/sec    1.00      2.7±0.02ms        ? ?/sec
arrow_ipc_reader/StreamReader/read_10/zstd                  1.14      3.5±0.01ms        ? ?/sec    1.00      3.1±0.03ms        ? ?/sec

Those show the difference (others should be unchanged).

alamb changed the title Use zstd::bulk API with context reuse for compression and decompression Use zstd::bulk API in IPC with context reuse for compression and decompression Feb 12, 2026
Dandandan changed the title Use zstd::bulk API in IPC with context reuse for compression and decompression Use zstd::bulk API in IPC and Parquet with context reuse for compression and decompression Feb 12, 2026
Contributor

alamb left a comment


Thanks @Dandandan and @thinkharderdev -- this is a nice find


Dandandan merged commit 7d16cd0 into apache:main Feb 12, 2026
27 of 28 checks passed

Labels

arrow (Changes to the arrow crate), parquet (Changes to the parquet crate)


Development

Successfully merging this pull request may close these issues.

Zstd context reuse

4 participants