Buffer builder hot path opts#9393

Open
cetra3 wants to merge 10 commits into apache:main from pydantic:buffer-builder-hot-path-opts

Conversation

@cetra3
Contributor

@cetra3 cetra3 commented Feb 11, 2026

Which issue does this PR close?

None at the moment

Rationale for this change

Buffer Builder had a few bounds checks when adjusting the inner MutableBuffer.

What changes are included in this PR?

Adjusts the buffer builder struct to be a little more slimmed down, deferring to the MutableBuffer for a lot of things.

Are these changes tested?

  • cargo test -p arrow-buffer -- 244 tests passed
  • cargo test -p arrow-select -p arrow-ord -p arrow-array -- all tests passed
  • cargo +nightly miri test -p arrow-buffer -- buffer::mutable -- all tests passed
  • cargo +nightly miri test -p arrow-buffer -- builder -- all tests passed

Are there any user-facing changes?

Nope

cetra3 and others added 2 commits February 11, 2026 19:37
The `half::f16` type no longer implements `rand::distributions::Standard`,
so benchmarks using `rng.random::<f16>()` fail to compile. Use
`f16::from_f32(rng.random::<f32>())` instead, and route Float16 array
generation through `create_f16_array` which already handles this.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace `Layout` field in MutableBuffer with separate `capacity` and
`align` fields, keeping the struct at 32 bytes (same as original) while
enabling O(1) capacity checks in `reserve()` without going through
`Layout::size()`. Layout is reconstructed on cold paths only
(alloc/dealloc/realloc).

BufferBuilder changes:
- Cache `buffer.capacity()` in a local before the extend loop
- Replace `std::mem::size_of::<T>()` with a `const` local in hot paths
- Add `#[inline(always)]` to `BufferBuilder::advance` and
  `BufferBuilder::append`

Kernel changes (sort/concat/interleave):
- Replace `Vec<T>` + `Buffer::from(vec)` with direct
  `MutableBuffer`/`BufferBuilder` usage to avoid the memcpy on
  Vec→Buffer conversion

Benchmark results vs clean baseline (--quick):
- sort with indices: 9-14% faster
- sort primitive run: 24% faster
- interleave dict: 8-13% faster
- take stringview: up to 28% faster
- buffer create (mutable): 9% faster
- buffer create (from_slice): 9% faster

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
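
To illustrate the `capacity`/`align` split described in the commit message above, here is a minimal sketch. `RawHeader` and `has_room_for` are hypothetical names invented for this example and are not the actual `MutableBuffer` fields or methods; only `std::alloc::Layout` is a real API. The idea is that the hot path compares plain integers, while the `Layout` is rebuilt only when the allocator is involved.

```rust
use std::alloc::Layout;

/// Hypothetical stand-in for the part of the buffer that used to hold a `Layout`.
struct RawHeader {
    capacity: usize, // allocated size in bytes
    align: usize,    // allocation alignment, assumed to be a power of two
}

impl RawHeader {
    /// Hot path: a plain integer comparison, no `Layout` involved.
    #[inline]
    fn has_room_for(&self, len: usize, additional: usize) -> bool {
        match len.checked_add(additional) {
            Some(required) => required <= self.capacity,
            None => false, // overflow can never fit
        }
    }

    /// Cold path: rebuild the `Layout` only for alloc/dealloc/realloc.
    fn layout(&self) -> Layout {
        Layout::from_size_align(self.capacity, self.align)
            .expect("capacity and align must describe a valid layout")
    }
}
```

With `RawHeader { capacity: 128, align: 64 }`, `has_room_for(100, 28)` holds and `layout()` reports a size of 128 and an alignment of 64. This sketch uses the checked constructor; whether the real code should check in release builds is exactly what the review below discusses.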
@github-actions github-actions bot added the arrow Changes to the arrow crate label Feb 11, 2026
@adriangb
Contributor

@Dandandan @rluvaton any interest in reviewing this perf improvement?

@adriangb
Contributor

run benchmarks

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing buffer-builder-hot-path-opts (2e70130) to 7dbe58a diff
BENCH_NAME=concatenate_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench concatenate_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=buffer-builder-hot-path-opts
Results will be posted here when complete

@rluvaton
Member

YES, will review now

/// Only used in cold paths (alloc/dealloc/realloc).
#[inline]
fn layout(&self) -> Layout {
debug_assert!(self.align.is_power_of_two());
Member

If this is only on a cold path, should we use assert instead of debug_assert?

Contributor

See comment above -- I don't understand the rationale for inlining the Layout

pub fn append(&mut self, v: T) {
self.reserve(1);
self.buffer.push(v);
self.buffer.reserve(std::mem::size_of::<T>());
Member

Can we revert this? The existing reserve call already does that, and it should be inlined.

Suggested change
- self.buffer.reserve(std::mem::size_of::<T>());
+ self.reserve(1);

Contributor Author

I still think this is going to be faster since we don't need to multiply it by n

pub fn append_n(&mut self, n: usize, v: T) {
self.reserve(n);
self.extend(std::iter::repeat_n(v, n))
self.buffer.reserve(n * std::mem::size_of::<T>());
Member

Can we revert this, since it is the same as self.reserve(n)?

Suggested change
- self.buffer.reserve(n * std::mem::size_of::<T>());
+ self.reserve(n);
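
The two review threads above turn on the difference between an element-count reserve on the builder and a byte-count reserve on the underlying buffer. A minimal sketch of that relationship follows, assuming the builder's `reserve(n)` simply multiplies by the element size and delegates. `ByteBuffer`, `TypedBuilder`, and `reserve_one_direct` are hypothetical names for this illustration, not the arrow-buffer types.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Toy byte buffer; `reserve` takes a number of BYTES.
struct ByteBuffer {
    bytes: Vec<u8>,
}

impl ByteBuffer {
    fn reserve(&mut self, additional_bytes: usize) {
        self.bytes.reserve(additional_bytes);
    }

    fn capacity(&self) -> usize {
        self.bytes.capacity()
    }
}

/// Toy typed builder; `reserve` takes a number of ELEMENTS.
struct TypedBuilder<T> {
    buffer: ByteBuffer,
    _marker: PhantomData<T>,
}

impl<T> TypedBuilder<T> {
    fn new() -> Self {
        Self { buffer: ByteBuffer { bytes: Vec::new() }, _marker: PhantomData }
    }

    /// Element-based reserve: converts `n` elements to bytes, then delegates.
    #[inline]
    fn reserve(&mut self, n: usize) {
        self.buffer.reserve(n * size_of::<T>());
    }

    /// Byte-based spelling of `reserve(1)`, mirroring the PR's `append`.
    #[inline]
    fn reserve_one_direct(&mut self) {
        self.buffer.reserve(size_of::<T>());
    }

    fn capacity_bytes(&self) -> usize {
        self.buffer.capacity()
    }
}
```

Under this assumption both spellings request the same number of bytes, so any difference would come down to whether the compiler inlines `reserve` and folds the multiplication by the constant 1. That is something only a benchmark or the generated assembly can settle.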

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                          buffer-builder-hot-path-opts           main
-----                                                          ----------------------------           ----
concat 1024 arrays boolean 4                                   1.01     21.6±0.62µs        ? ?/sec    1.00     21.3±0.56µs        ? ?/sec
concat 1024 arrays i32 4                                       1.00     13.7±0.05µs        ? ?/sec    1.00     13.7±0.15µs        ? ?/sec
concat 1024 arrays str 4                                       1.00     37.4±0.36µs        ? ?/sec    1.00     37.3±0.34µs        ? ?/sec
concat boolean 1024                                            1.00    303.7±3.64ns        ? ?/sec    1.00    303.8±4.44ns        ? ?/sec
concat boolean 8192 over 100 arrays                            1.00      5.1±0.10µs        ? ?/sec    1.00      5.1±0.03µs        ? ?/sec
concat boolean nulls 1024                                      1.03    549.9±3.30ns        ? ?/sec    1.00    533.8±6.76ns        ? ?/sec
concat boolean nulls 8192 over 100 arrays                      1.00     18.3±0.17µs        ? ?/sec    1.00     18.2±0.21µs        ? ?/sec
concat fixed size lists                                        1.00   752.0±33.73µs        ? ?/sec    1.00   754.7±25.03µs        ? ?/sec
concat i32 1024                                                1.00    356.1±1.96ns        ? ?/sec    1.00    357.1±4.99ns        ? ?/sec
concat i32 8192 over 100 arrays                                1.05   210.3±10.64µs        ? ?/sec    1.00    199.8±5.07µs        ? ?/sec
concat i32 nulls 1024                                          1.00    567.4±2.48ns        ? ?/sec    1.01    575.4±7.28ns        ? ?/sec
concat i32 nulls 8192 over 100 arrays                          1.00    231.8±9.95µs        ? ?/sec    1.06    246.4±9.35µs        ? ?/sec
concat str 1024                                                1.12     14.7±0.59µs        ? ?/sec    1.00     13.0±1.26µs        ? ?/sec
concat str 8192 over 100 arrays                                1.01    105.9±0.89ms        ? ?/sec    1.00    105.2±1.06ms        ? ?/sec
concat str nulls 1024                                          1.20      6.6±0.37µs        ? ?/sec    1.00      5.5±0.67µs        ? ?/sec
concat str nulls 8192 over 100 arrays                          1.00     52.8±0.97ms        ? ?/sec    1.00     52.6±0.74ms        ? ?/sec
concat str_dict 1024                                           1.00      2.6±0.03µs        ? ?/sec    1.00      2.7±0.01µs        ? ?/sec
concat str_dict_sparse 1024                                    1.04      7.2±0.02µs        ? ?/sec    1.00      6.9±0.02µs        ? ?/sec
concat struct with int32 and dicts size=1024 count=2           1.00      6.4±0.08µs        ? ?/sec    1.05      6.7±0.61µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0               1.00     77.5±0.84µs        ? ?/sec    1.00     77.6±0.65µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0.2             1.00     79.1±1.04µs        ? ?/sec    1.06     83.8±0.40µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0                1.00     76.9±0.56µs        ? ?/sec    1.15     88.8±0.48µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0.2              1.00     78.9±0.54µs        ? ?/sec    1.15     90.8±0.37µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0      1.01     46.8±3.22µs        ? ?/sec    1.00     46.4±3.60µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0.2    1.06     50.7±1.51µs        ? ?/sec    1.00     48.0±2.79µs        ? ?/sec

Comment on lines 398 to 402
let before = self.buffer.len();
self.buffer.extend(iter);
let added_bytes = self.buffer.len() - before;
debug_assert_eq!(added_bytes % std::mem::size_of::<T>(), 0);
self.len += added_bytes / std::mem::size_of::<T>();
Member

Note: previously, if the iterator panicked partway through (say on the 3rd item), the builder was not left in an inconsistent state. Now it will be, because self.len is only updated after the extend finishes, so the length no longer matches the buffer.
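
The panic-safety concern above can be reproduced with a small toy model. This is only a sketch: `ToyBuilder` and its methods are hypothetical and use a `Vec<u8>` in place of `MutableBuffer`, so the real buffer may behave differently on unwind. It contrasts updating the element count once after the loop with updating it per item.

```rust
use std::mem::size_of;
use std::panic::{self, AssertUnwindSafe};

/// Toy builder: a byte buffer plus a separately tracked element count.
struct ToyBuilder {
    buffer: Vec<u8>,
    len: usize,
}

impl ToyBuilder {
    fn new() -> Self {
        Self { buffer: Vec::new(), len: 0 }
    }

    /// Updates `len` once, after the whole iterator has been consumed.
    fn extend_deferred(&mut self, iter: impl Iterator<Item = u32>) {
        let before = self.buffer.len();
        for v in iter {
            self.buffer.extend_from_slice(&v.to_ne_bytes());
        }
        let added_bytes = self.buffer.len() - before;
        self.len += added_bytes / size_of::<u32>();
    }

    /// Updates `len` after every element that was written.
    fn extend_per_item(&mut self, iter: impl Iterator<Item = u32>) {
        for v in iter {
            self.buffer.extend_from_slice(&v.to_ne_bytes());
            self.len += 1;
        }
    }

    /// True when the element count matches the bytes actually stored.
    fn is_consistent(&self) -> bool {
        self.len * size_of::<u32>() == self.buffer.len()
    }
}

/// Yields 0 and 1, then panics when the third item is requested.
fn panics_on_third() -> impl Iterator<Item = u32> {
    (0u32..).map(|i| {
        if i == 2 {
            panic!("simulated failure inside the iterator");
        }
        i
    })
}

/// Runs one of the two strategies, catches the panic, and reports consistency.
fn consistent_after_panic(deferred: bool) -> bool {
    let mut builder = ToyBuilder::new();
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        if deferred {
            builder.extend_deferred(panics_on_third());
        } else {
            builder.extend_per_item(panics_on_third());
        }
    }));
    assert!(result.is_err());
    builder.is_consistent()
}
```

In this model `consistent_after_panic(true)` is false, because two items were written while `len` stayed at 0, and `consistent_after_panic(false)` is true.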

Comment on lines 77 to 78
let mut mutable_buffer =
MutableBuffer::from_len_zeroed(primitive_values.len() * std::mem::size_of::<T::Native>());
Member

Note: this assumes that T::default_value() is 0, which is the case. I verified by reading the code that the value is just a placeholder, so this is OK 👍🏻
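
As a quick check of the assumption in the note above, the following sketch shows that an all-zero byte buffer reads back as the value 0 for common primitive types. The helper names are hypothetical, and a `Vec<u8>` stands in for the zeroed `MutableBuffer`.

```rust
/// Reads a zero-filled byte buffer back as `i32` values.
fn zeroed_as_i32(count: usize) -> Vec<i32> {
    let bytes = vec![0u8; count * std::mem::size_of::<i32>()];
    bytes
        .chunks_exact(4)
        .map(|c| i32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// Reads a zero-filled byte buffer back as `f64` values.
fn zeroed_as_f64(count: usize) -> Vec<f64> {
    let bytes = vec![0u8; count * std::mem::size_of::<f64>()];
    bytes
        .chunks_exact(8)
        .map(|c| f64::from_ne_bytes([c[0], c[1], c[2], c[3], c[4], c[5], c[6], c[7]]))
        .collect()
}
```

Both helpers return vectors of zeros (`0` and `0.0` respectively), which is why zero-filling can stand in for writing a default placeholder value when that default is 0.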

@rluvaton
Member

Thank you for the PR. It looks like it contains several optimizations, and I'm not sure whether all of them actually improve performance. It would be better if you could create one PR per optimization, so we can run the benchmarks for each one and verify that it does in fact improve perf.

@rluvaton
Member

run benchmark sort concat take interleave

@alamb-ghbot

🤖 Hi @rluvaton, thanks for the request (#9393 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

  • Standard: (none)
  • Criterion: array_from, array_iter, arrow_reader, arrow_reader_clickbench, arrow_reader_row_filter, arrow_statistics, arrow_writer, bitwise_kernel, boolean_kernels, buffer_bit_ops, builder, cast_kernels, coalesce_kernels, comparison_kernels, concatenate_kernel, csv_writer, encoding, filter_kernels, interleave_kernels, json-reader, metadata, row_format, sort_kernel, take_kernels, union_array, variant_builder, variant_kernels, variant_validation, view_types, zip_kernels

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...
Unsupported benchmarks: sort, concat, take, interleave.

@rluvaton
Member

run benchmark builder cast_kernels comparison_kernels concatenate_kernel filter_kernels interleave_kernels sort_kernel take_kernels zip_kernels

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing buffer-builder-hot-path-opts (2e70130) to 7dbe58a diff
BENCH_NAME=builder
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench builder
BENCH_FILTER=
BENCH_BRANCH_NAME=buffer-builder-hot-path-opts
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                          buffer-builder-hot-path-opts           main
-----                                          ----------------------------           ----
bench_bool/bench_bool                          1.00   1462.2±7.38µs   342.0 MB/sec    1.00  1468.1±36.20µs   340.6 MB/sec
bench_decimal128_builder                       1.24    101.4±0.46µs        ? ?/sec    1.00     82.0±0.59µs        ? ?/sec
bench_decimal256_builder                       1.25    104.9±0.23µs        ? ?/sec    1.00     84.2±0.90µs        ? ?/sec
bench_decimal32_builder                        1.00     48.6±0.22µs        ? ?/sec    1.00     48.5±0.69µs        ? ?/sec
bench_decimal64_builder                        1.10     51.4±0.32µs        ? ?/sec    1.00     46.6±0.54µs        ? ?/sec
bench_primitive/bench_primitive                1.00    165.3±5.58µs    23.6 GB/sec    1.03    170.8±4.91µs    22.9 GB/sec
bench_primitive/bench_string                   1.00      7.9±0.16ms   827.6 MB/sec    1.00      7.9±0.32ms   823.8 MB/sec
bench_primitive_nulls/bench_primitive_nulls    1.00   1285.6±9.49µs        ? ?/sec    1.25  1603.8±16.79µs        ? ?/sec
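
For reading these tables: the number in front of each timing appears to be that run's time divided by the faster of the two runs, so 1.00 marks the faster side. The sketch below reproduces that arithmetic under this assumption. `relative_ratios` and `round2` are hypothetical helpers and not part of the benchmark tooling.

```rust
/// Returns (branch_ratio, main_ratio), each relative to the faster run.
fn relative_ratios(branch_time: f64, main_time: f64) -> (f64, f64) {
    let fastest = branch_time.min(main_time);
    (branch_time / fastest, main_time / fastest)
}

/// Rounds to two decimal places, matching the precision shown in the table.
fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}
```

For `bench_decimal128_builder`, `relative_ratios(101.4, 82.0)` rounds to (1.24, 1.00), matching the row above. Because the table prints rounded timings, recomputing from the displayed values can differ from the printed ratio in the last digit for some rows.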

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing buffer-builder-hot-path-opts (2e70130) to 7dbe58a diff
BENCH_NAME=cast_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench cast_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=buffer-builder-hot-path-opts
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                              buffer-builder-hot-path-opts           main
-----                                                              ----------------------------           ----
cast binary view to string                                         1.08     78.1±0.15µs        ? ?/sec    1.00     72.1±0.44µs        ? ?/sec
cast binary view to string view                                    1.06    106.3±0.94µs        ? ?/sec    1.00    100.6±0.72µs        ? ?/sec
cast binary view to wide string                                    1.07     70.4±0.46µs        ? ?/sec    1.00     65.7±0.24µs        ? ?/sec
cast date32 to date64 512                                          1.04    310.3±4.58ns        ? ?/sec    1.00    298.9±0.64ns        ? ?/sec
cast date64 to date32 512                                          1.01    512.2±5.64ns        ? ?/sec    1.00    506.0±1.57ns        ? ?/sec
cast decimal128 to decimal128 512                                  1.00   615.1±12.22ns        ? ?/sec    1.00   617.5±18.17ns        ? ?/sec
cast decimal128 to decimal128 512 lower precision                  1.00      5.2±0.04µs        ? ?/sec    1.01      5.3±0.29µs        ? ?/sec
cast decimal128 to decimal128 512 with lower scale (infallible)    1.00      6.9±0.01µs        ? ?/sec    1.00      6.9±0.02µs        ? ?/sec
cast decimal128 to decimal128 512 with same scale                  1.06     84.2±1.19ns        ? ?/sec    1.00     79.6±1.62ns        ? ?/sec
cast decimal128 to decimal256 512                                  1.00      2.3±0.03µs        ? ?/sec    1.00      2.3±0.04µs        ? ?/sec
cast decimal256 to decimal128 512                                  1.00     48.0±0.14µs        ? ?/sec    1.01     48.6±1.18µs        ? ?/sec
cast decimal256 to decimal256 512                                  1.03     11.2±0.11µs        ? ?/sec    1.00     10.9±0.15µs        ? ?/sec
cast decimal256 to decimal256 512 with same scale                  1.07     84.8±0.20ns        ? ?/sec    1.00     79.6±0.27ns        ? ?/sec
cast decimal32 to decimal32 512                                    1.00      2.3±0.02µs        ? ?/sec    1.04      2.4±0.01µs        ? ?/sec
cast decimal32 to decimal32 512 lower precision                    1.01      2.9±0.08µs        ? ?/sec    1.00      2.9±0.01µs        ? ?/sec
cast decimal32 to decimal64 512                                    1.00    326.3±6.54ns        ? ?/sec    1.06    346.8±0.67ns        ? ?/sec
cast decimal64 to decimal32 512                                    1.00      2.8±0.16µs        ? ?/sec    1.01      2.8±0.01µs        ? ?/sec
cast decimal64 to decimal64 512                                    1.01    391.8±6.30ns        ? ?/sec    1.00    388.0±9.45ns        ? ?/sec
cast dict to string view                                           1.00     46.2±0.47µs        ? ?/sec    1.00     46.2±0.09µs        ? ?/sec
cast f32 to string 512                                             1.00     18.5±0.05µs        ? ?/sec    1.00     18.4±0.06µs        ? ?/sec
cast f64 to string 512                                             1.03     22.0±0.04µs        ? ?/sec    1.00     21.5±0.04µs        ? ?/sec
cast float32 to int32 512                                          1.00   1218.5±3.41ns        ? ?/sec    1.14   1389.7±3.67ns        ? ?/sec
cast float64 to float32 512                                        1.00    795.0±4.81ns        ? ?/sec    1.16   922.6±51.00ns        ? ?/sec
cast float64 to uint64 512                                         1.00  1465.2±17.13ns        ? ?/sec    1.07  1574.1±11.09ns        ? ?/sec
cast i64 to string 512                                             1.00     14.2±0.04µs        ? ?/sec    1.02     14.4±0.57µs        ? ?/sec
cast int32 to float32 512                                          1.00   734.3±28.27ns        ? ?/sec    1.18    866.4±6.03ns        ? ?/sec
cast int32 to float64 512                                          1.06    887.8±6.43ns        ? ?/sec    1.00    840.5±1.70ns        ? ?/sec
cast int32 to int32 512                                            1.00    180.2±2.55ns        ? ?/sec    1.01    181.3±1.84ns        ? ?/sec
cast int32 to int64 512                                            1.00    739.2±3.76ns        ? ?/sec    1.14    841.4±1.17ns        ? ?/sec
cast int32 to uint32 512                                           1.00   1267.9±8.77ns        ? ?/sec    1.08  1371.3±29.15ns        ? ?/sec
cast int64 to int32 512                                            1.00  1463.5±32.83ns        ? ?/sec    1.15  1680.6±18.78ns        ? ?/sec
cast no runs of int32s to ree<int32>                               1.00     85.8±0.24µs        ? ?/sec    1.00     85.7±0.30µs        ? ?/sec
cast runs of 10 string to ree<int32>                               1.00     15.8±0.04µs        ? ?/sec    1.00     15.7±0.10µs        ? ?/sec
cast runs of 1000 int32s to ree<int32>                             1.00      7.8±0.09µs        ? ?/sec    1.00      7.8±0.02µs        ? ?/sec
cast string single run to ree<int32>                               1.00     22.9±0.42µs        ? ?/sec    1.00     22.9±0.37µs        ? ?/sec
cast string to binary view 512                                     1.04      3.5±0.07µs        ? ?/sec    1.00      3.4±0.01µs        ? ?/sec
cast string view to binary view                                    1.02     81.4±1.31ns        ? ?/sec    1.00     79.9±0.59ns        ? ?/sec
cast string view to dict                                           1.00    210.8±3.78µs        ? ?/sec    1.03    217.5±4.69µs        ? ?/sec
cast string view to string                                         1.03     49.6±0.24µs        ? ?/sec    1.00     48.2±0.49µs        ? ?/sec
cast string view to wide string                                    1.05     52.0±0.30µs        ? ?/sec    1.00     49.6±0.39µs        ? ?/sec
cast time32s to time32ms 512                                       1.02    293.2±0.97ns        ? ?/sec    1.00    287.3±0.57ns        ? ?/sec
cast time32s to time64us 512                                       1.06    309.2±0.90ns        ? ?/sec    1.00    292.6±0.44ns        ? ?/sec
cast time64ns to time32s 512                                       1.00    510.9±1.40ns        ? ?/sec    1.00    509.0±4.13ns        ? ?/sec
cast timestamp_ms to i64 512                                       1.00    245.3±2.36ns        ? ?/sec    1.02   250.2±16.89ns        ? ?/sec
cast timestamp_ms to timestamp_ns 512                              1.00  1788.3±29.89ns        ? ?/sec    1.04  1861.7±22.03ns        ? ?/sec
cast timestamp_ns to timestamp_s 512                               1.00    178.2±1.69ns        ? ?/sec    1.01    179.6±1.62ns        ? ?/sec
cast utf8 to date32 512                                            1.00     11.2±0.02µs        ? ?/sec    1.02     11.5±0.11µs        ? ?/sec
cast utf8 to date64 512                                            1.00     42.9±0.47µs        ? ?/sec    1.07     46.1±0.16µs        ? ?/sec
cast utf8 to f32                                                   1.03     12.4±0.06µs        ? ?/sec    1.00     12.0±0.03µs        ? ?/sec
cast wide string to binary view 512                                1.00      5.8±0.11µs        ? ?/sec    1.04      6.0±0.12µs        ? ?/sec

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing buffer-builder-hot-path-opts (2e70130) to 7dbe58a diff
BENCH_NAME=comparison_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench comparison_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=buffer-builder-hot-path-opts
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                                                    buffer-builder-hot-path-opts           main
-----                                                                                                    ----------------------------           ----
StringArray: regexp_matches_utf8 scalar benchmarks/regexp_matches_utf8 scalar complex                    1.00      2.8±0.05ms        ? ?/sec    1.01      2.9±0.02ms        ? ?/sec
StringArray: regexp_matches_utf8 scalar benchmarks/regexp_matches_utf8 scalar contains                   1.00      3.1±0.06ms        ? ?/sec    1.00      3.1±0.05ms        ? ?/sec
StringArray: regexp_matches_utf8 scalar benchmarks/regexp_matches_utf8 scalar ends with                  1.00      2.6±0.05ms        ? ?/sec    1.01      2.6±0.13ms        ? ?/sec
StringArray: regexp_matches_utf8 scalar benchmarks/regexp_matches_utf8 scalar starts with                1.00      2.1±0.03ms        ? ?/sec    1.02      2.2±0.09ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar complex        1.00      2.9±0.03ms        ? ?/sec    1.01      2.9±0.03ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar contains       1.02      3.2±0.05ms        ? ?/sec    1.00      3.1±0.05ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar ends with      1.01      2.6±0.07ms        ? ?/sec    1.00      2.6±0.04ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar starts with    1.01      2.2±0.05ms        ? ?/sec    1.00      2.2±0.03ms        ? ?/sec
eq Float32                                                                                               1.00     44.2±0.13µs        ? ?/sec    1.00     44.2±0.15µs        ? ?/sec
eq Int32                                                                                                 1.00     44.2±0.09µs        ? ?/sec    1.00     44.4±0.73µs        ? ?/sec
eq MonthDayNano                                                                                          1.03     94.9±7.58µs        ? ?/sec    1.00     91.7±3.18µs        ? ?/sec
eq StringArray StringArray                                                                               1.00     30.7±0.23ms        ? ?/sec    1.03     31.5±0.42ms        ? ?/sec
eq StringViewArray StringViewArray                                                                       1.00     26.5±0.18ms        ? ?/sec    1.00     26.5±0.18ms        ? ?/sec
eq StringViewArray StringViewArray inlined bytes                                                         1.00     22.1±0.13ms        ? ?/sec    1.01     22.2±0.26ms        ? ?/sec
eq dictionary[10] string[4])                                                                             1.09    878.2±3.98µs        ? ?/sec    1.00    803.8±3.79µs        ? ?/sec
eq long same prefix strings StringArray                                                                  1.00    560.4±7.40µs        ? ?/sec    1.01    567.6±7.82µs        ? ?/sec
eq long same prefix strings StringViewArray                                                              1.00    781.9±7.22µs        ? ?/sec    1.07    836.2±5.68µs        ? ?/sec
eq scalar Float32                                                                                        1.00     44.2±0.07µs        ? ?/sec    1.00     44.2±0.12µs        ? ?/sec
eq scalar Int32                                                                                          1.00     44.2±0.05µs        ? ?/sec    1.00     44.2±0.22µs        ? ?/sec
eq scalar MonthDayNano                                                                                   1.42     71.9±2.14µs        ? ?/sec    1.00     50.7±0.51µs        ? ?/sec
eq scalar StringArray                                                                                    1.14     27.6±0.36ms        ? ?/sec    1.00     24.3±0.52ms        ? ?/sec
eq scalar StringViewArray 13 bytes                                                                       1.04     17.9±0.07ms        ? ?/sec    1.00     17.2±0.13ms        ? ?/sec
eq scalar StringViewArray 4 bytes                                                                        1.00     16.0±0.14ms        ? ?/sec    1.03     16.5±0.16ms        ? ?/sec
eq scalar StringViewArray 6 bytes                                                                        1.00     16.1±0.15ms        ? ?/sec    1.02     16.4±0.17ms        ? ?/sec
eq_dyn_utf8_scalar dictionary[10] string[4])                                                             1.00     77.7±0.13µs        ? ?/sec    1.00     77.9±1.44µs        ? ?/sec
gt Float32                                                                                               1.01     57.4±0.14µs        ? ?/sec    1.00     57.0±0.46µs        ? ?/sec
gt Int32                                                                                                 1.00     44.2±0.08µs        ? ?/sec    1.00     44.3±0.12µs        ? ?/sec
gt scalar Float32                                                                                        1.00     45.8±0.10µs        ? ?/sec    1.00     45.8±0.09µs        ? ?/sec
gt scalar Int32                                                                                          1.00     44.1±0.09µs        ? ?/sec    1.00     44.1±0.09µs        ? ?/sec
gt_eq Float32                                                                                            1.00     57.2±0.25µs        ? ?/sec    1.00     57.0±0.19µs        ? ?/sec
gt_eq Int32                                                                                              1.00     44.3±0.27µs        ? ?/sec    1.00     44.3±0.19µs        ? ?/sec
gt_eq scalar Float32                                                                                     1.00     46.4±0.14µs        ? ?/sec    1.00     46.3±0.43µs        ? ?/sec
gt_eq scalar Int32                                                                                       1.00     44.2±0.08µs        ? ?/sec    1.00     44.1±0.07µs        ? ?/sec
gt_eq_dyn_utf8_scalar scalar dictionary[10] string[4])                                                   1.00     77.7±0.11µs        ? ?/sec    1.00     77.9±1.23µs        ? ?/sec
ilike_utf8 scalar complex                                                                                1.00      3.7±0.09ms        ? ?/sec    1.02      3.8±0.12ms        ? ?/sec
ilike_utf8 scalar contains                                                                               1.05      4.7±0.09ms        ? ?/sec    1.00      4.5±0.06ms        ? ?/sec
ilike_utf8 scalar ends with                                                                              1.00  1112.2±35.59µs        ? ?/sec    1.06  1181.6±51.80µs        ? ?/sec
ilike_utf8 scalar equals                                                                                 1.10   668.3±17.11µs        ? ?/sec    1.00    609.2±6.16µs        ? ?/sec
ilike_utf8 scalar starts with                                                                            1.00  1021.5±43.95µs        ? ?/sec    1.09  1113.0±44.43µs        ? ?/sec
ilike_utf8_scalar_dyn dictionary[10] string[4])                                                          1.00     78.4±0.43µs        ? ?/sec    1.00     78.4±1.46µs        ? ?/sec
like_utf8 scalar complex                                                                                 1.04      3.1±0.24ms        ? ?/sec    1.00      2.9±0.08ms        ? ?/sec
like_utf8 scalar contains                                                                                1.03  1824.6±41.08µs        ? ?/sec    1.00  1766.2±17.27µs        ? ?/sec
like_utf8 scalar ends with                                                                               1.03   426.9±18.06µs        ? ?/sec    1.00   413.1±13.06µs        ? ?/sec
like_utf8 scalar equals                                                                                  1.00     93.8±4.32µs        ? ?/sec    1.14    107.1±0.39µs        ? ?/sec
like_utf8 scalar starts with                                                                             1.06   378.0±16.55µs        ? ?/sec    1.00   357.4±17.27µs        ? ?/sec
like_utf8_scalar_dyn dictionary[10] string[4])                                                           1.00     78.2±0.75µs        ? ?/sec    1.00     78.2±1.41µs        ? ?/sec
like_utf8view scalar complex                                                                             1.00    234.2±3.08ms        ? ?/sec    1.01    236.0±1.90ms        ? ?/sec
like_utf8view scalar contains                                                                            1.01    159.6±0.97ms        ? ?/sec    1.00    157.4±1.22ms        ? ?/sec
like_utf8view scalar ends with 13 bytes                                                                  1.00     46.6±0.62ms        ? ?/sec    1.10     51.1±0.88ms        ? ?/sec
like_utf8view scalar ends with 4 bytes                                                                   1.00     47.3±0.22ms        ? ?/sec    1.11     52.5±0.14ms        ? ?/sec
like_utf8view scalar ends with 6 bytes                                                                   1.00     47.3±0.51ms        ? ?/sec    1.11     52.3±0.28ms        ? ?/sec
like_utf8view scalar equals                                                                              1.12     38.7±0.22ms        ? ?/sec    1.00     34.6±0.28ms        ? ?/sec
like_utf8view scalar starts with 13 bytes                                                                1.00     44.8±0.31ms        ? ?/sec    1.09     48.8±0.55ms        ? ?/sec
like_utf8view scalar starts with 4 bytes                                                                 1.00     28.8±0.09ms        ? ?/sec    1.26     36.2±0.75ms        ? ?/sec
like_utf8view scalar starts with 6 bytes                                                                 1.00     45.5±0.32ms        ? ?/sec    1.10     50.0±0.52ms        ? ?/sec
long same prefix strings like_utf8 scalar complex                                                        1.00   1752.2±7.61µs        ? ?/sec    1.01  1771.8±13.74µs        ? ?/sec
long same prefix strings like_utf8 scalar contains                                                       1.03      4.5±0.09ms        ? ?/sec    1.00      4.3±0.02ms        ? ?/sec
long same prefix strings like_utf8 scalar ends with                                                      1.02      2.0±0.03ms        ? ?/sec    1.00  1957.7±12.34µs        ? ?/sec
long same prefix strings like_utf8 scalar equals                                                         1.01    637.7±8.30µs        ? ?/sec    1.00   633.5±10.22µs        ? ?/sec
long same prefix strings like_utf8 scalar starts with                                                    1.03      2.3±0.09ms        ? ?/sec    1.00      2.2±0.09ms        ? ?/sec
long same prefix strings like_utf8view scalar complex                                                    1.02   1801.9±8.88µs        ? ?/sec    1.00  1774.0±10.40µs        ? ?/sec
long same prefix strings like_utf8view scalar contains                                                   1.01      4.4±0.02ms        ? ?/sec    1.00      4.4±0.06ms        ? ?/sec
long same prefix strings like_utf8view scalar ends with                                                  1.03      2.1±0.05ms        ? ?/sec    1.00  1999.7±47.04µs        ? ?/sec
long same prefix strings like_utf8view scalar equals                                                     1.02    700.1±8.44µs        ? ?/sec    1.00   686.3±13.16µs        ? ?/sec
long same prefix strings like_utf8view scalar starts with                                                1.00      2.2±0.03ms        ? ?/sec    1.03      2.3±0.06ms        ? ?/sec
lt Float32                                                                                               1.00     56.1±2.60µs        ? ?/sec    1.02     57.0±0.11µs        ? ?/sec
lt Int32                                                                                                 1.00     44.2±0.12µs        ? ?/sec    1.00     44.3±0.34µs        ? ?/sec
lt StringViewArray StringViewArray inlined bytes                                                         1.00     26.3±0.16ms        ? ?/sec    1.08     28.3±0.32ms        ? ?/sec
lt long same prefix strings StringArray                                                                  1.00    638.0±9.86µs        ? ?/sec    1.07    682.9±3.68µs        ? ?/sec
lt long same prefix strings StringViewArray                                                              1.10    796.0±6.40µs        ? ?/sec    1.00    725.2±8.52µs        ? ?/sec
lt scalar Float32                                                                                        1.00     46.2±0.51µs        ? ?/sec    1.00     46.4±0.17µs        ? ?/sec
lt scalar Int32                                                                                          1.00     44.2±0.39µs        ? ?/sec    1.00     44.3±0.44µs        ? ?/sec
lt scalar StringArray                                                                                    1.00     43.8±0.17ms        ? ?/sec    1.02     44.7±0.53ms        ? ?/sec
lt scalar StringViewArray                                                                                1.14     37.1±0.27ms        ? ?/sec    1.00     32.6±0.14ms        ? ?/sec
lt_eq Float32                                                                                            1.02     57.4±0.83µs        ? ?/sec    1.00     56.3±2.18µs        ? ?/sec
lt_eq Int32                                                                                              1.00     44.2±0.09µs        ? ?/sec    1.00     44.4±0.84µs        ? ?/sec
lt_eq scalar Float32                                                                                     1.00     45.8±0.24µs        ? ?/sec    1.00     45.7±0.32µs        ? ?/sec
lt_eq scalar Int32                                                                                       1.00     44.1±0.13µs        ? ?/sec    1.00     44.3±0.72µs        ? ?/sec
neq Float32                                                                                              1.00     44.3±0.16µs        ? ?/sec    1.00     44.3±0.55µs        ? ?/sec
neq Int32                                                                                                1.00     44.2±0.11µs        ? ?/sec    1.00     44.4±1.46µs        ? ?/sec
neq long same prefix strings StringArray                                                                 1.00   567.8±11.01µs        ? ?/sec    1.00    568.4±6.09µs        ? ?/sec
neq long same prefix strings StringViewArray                                                             1.00    784.8±7.82µs        ? ?/sec    1.07   838.6±16.64µs        ? ?/sec
neq scalar Float32                                                                                       1.00     44.2±0.08µs        ? ?/sec    1.00     44.1±0.06µs        ? ?/sec
neq scalar Int32                                                                                         1.00     44.2±0.09µs        ? ?/sec    1.01     44.4±1.29µs        ? ?/sec
nilike_utf8 scalar complex                                                                               1.00      3.7±0.04ms        ? ?/sec    1.04      3.8±0.16ms        ? ?/sec
nilike_utf8 scalar contains                                                                              1.01      4.6±0.07ms        ? ?/sec    1.00      4.6±0.15ms        ? ?/sec
nilike_utf8 scalar ends with                                                                             1.00  1076.9±23.85µs        ? ?/sec    1.09  1177.0±50.80µs        ? ?/sec
nilike_utf8 scalar equals                                                                                1.04   669.9±40.53µs        ? ?/sec    1.00   642.6±27.22µs        ? ?/sec
nilike_utf8 scalar starts with                                                                           1.00  1002.9±15.69µs        ? ?/sec    1.07  1076.2±43.47µs        ? ?/sec
nlike_utf8 scalar complex                                                                                1.02      3.0±0.07ms        ? ?/sec    1.00      2.9±0.08ms        ? ?/sec
nlike_utf8 scalar contains                                                                               1.01  1789.5±18.44µs        ? ?/sec    1.00  1771.4±18.65µs        ? ?/sec
nlike_utf8 scalar ends with                                                                              1.01   412.4±20.38µs        ? ?/sec    1.00    407.8±8.18µs        ? ?/sec
nlike_utf8 scalar equals                                                                                 1.00     93.0±1.89µs        ? ?/sec    1.16    107.4±0.49µs        ? ?/sec
nlike_utf8 scalar starts with                                                                            1.00   344.4±13.29µs        ? ?/sec    1.01   349.3±11.87µs        ? ?/sec

@alamb-ghbot

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing buffer-builder-hot-path-opts (2e70130) to 7dbe58a diff
BENCH_NAME=concatenate_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench concatenate_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=buffer-builder-hot-path-opts
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

group                                                          buffer-builder-hot-path-opts           main
-----                                                          ----------------------------           ----
concat 1024 arrays boolean 4                                   1.00     21.3±0.48µs        ? ?/sec    1.03     21.9±0.18µs        ? ?/sec
concat 1024 arrays i32 4                                       1.00     13.7±0.33µs        ? ?/sec    1.03     14.2±0.04µs        ? ?/sec
concat 1024 arrays str 4                                       1.00     35.3±0.80µs        ? ?/sec    1.04     36.6±0.25µs        ? ?/sec
concat boolean 1024                                            1.02    303.6±6.52ns        ? ?/sec    1.00    298.3±7.87ns        ? ?/sec
concat boolean 8192 over 100 arrays                            1.00      5.0±0.04µs        ? ?/sec    1.01      5.1±0.09µs        ? ?/sec
concat boolean nulls 1024                                      1.04    552.3±6.15ns        ? ?/sec    1.00    531.1±5.21ns        ? ?/sec
concat boolean nulls 8192 over 100 arrays                      1.00     18.2±0.06µs        ? ?/sec    1.00     18.1±0.36µs        ? ?/sec
concat fixed size lists                                        1.00   791.7±26.09µs        ? ?/sec    1.01   801.0±27.52µs        ? ?/sec
concat i32 1024                                                1.00    349.6±0.86ns        ? ?/sec    1.01    352.6±2.96ns        ? ?/sec
concat i32 8192 over 100 arrays                                1.04   219.5±10.43µs        ? ?/sec    1.00    210.9±9.29µs        ? ?/sec
concat i32 nulls 1024                                          1.00    565.7±2.14ns        ? ?/sec    1.02    576.2±5.60ns        ? ?/sec
concat i32 nulls 8192 over 100 arrays                          1.00    235.3±4.64µs        ? ?/sec    1.02    239.9±5.28µs        ? ?/sec
concat str 1024                                                1.00     12.9±1.20µs        ? ?/sec    1.07     13.8±0.88µs        ? ?/sec
concat str 8192 over 100 arrays                                1.01    105.3±1.29ms        ? ?/sec    1.00    104.0±1.16ms        ? ?/sec
concat str nulls 1024                                          1.08      5.9±0.55µs        ? ?/sec    1.00      5.5±0.58µs        ? ?/sec
concat str nulls 8192 over 100 arrays                          1.01     54.2±0.48ms        ? ?/sec    1.00     53.6±1.13ms        ? ?/sec
concat str_dict 1024                                           1.01      2.6±0.03µs        ? ?/sec    1.00      2.6±0.01µs        ? ?/sec
concat str_dict_sparse 1024                                    1.06      7.3±0.08µs        ? ?/sec    1.00      6.9±0.03µs        ? ?/sec
concat struct with int32 and dicts size=1024 count=2           1.00      6.4±0.13µs        ? ?/sec    1.01      6.5±0.03µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0               1.01     78.3±0.44µs        ? ?/sec    1.00     77.7±0.81µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0.2             1.00     79.3±0.53µs        ? ?/sec    1.05     83.0±0.28µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0                1.00     77.5±0.63µs        ? ?/sec    1.16     89.7±1.86µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0.2              1.00     80.0±2.46µs        ? ?/sec    1.13     90.7±0.45µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0      1.01     47.7±3.34µs        ? ?/sec    1.00     47.2±3.41µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0.2    1.02     49.0±3.08µs        ? ?/sec    1.00     48.2±2.68µs        ? ?/sec
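
For anyone reading these tables: the layout looks like critcmp output, in which case the number in front of each timing is that column's mean time divided by the fastest mean in the row, so `1.00` marks the faster side. A minimal sketch of that arithmetic follows. `relative_ratios` is a hypothetical helper written for illustration, not part of arrow-rs or the benchmark script, and the inputs are the rounded means printed above, so results can differ from the table in the last digit.

```rust
/// Hypothetical helper: relative-slowdown factors for one table row,
/// i.e. each mean time divided by the fastest mean time in that row.
fn relative_ratios(times: &[f64]) -> Vec<f64> {
    // Smallest mean in the row; stays at infinity for an empty slice,
    // in which case the map below yields an empty Vec.
    let fastest = times.iter().copied().fold(f64::INFINITY, f64::min);
    times.iter().map(|t| t / fastest).collect()
}

fn main() {
    // Row "concat utf8_view  max_str_len=20 null_density=0", means in µs:
    // buffer-builder-hot-path-opts = 77.5, main = 89.7.
    let ratios = relative_ratios(&[77.5, 89.7]);
    println!("branch {:.2}, main {:.2}", ratios[0], ratios[1]);
}
```

On that row the sketch gives 1.00 for the branch and about 1.16 for main, matching the table. The ratio column does not account for the ± spread, so small differences such as 1.01 vs 1.00 may be within noise.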

@alamb-ghbot

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing buffer-builder-hot-path-opts (2e70130) to 7dbe58a diff
BENCH_NAME=filter_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench filter_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=buffer-builder-hot-path-opts
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

group                                                                         buffer-builder-hot-path-opts           main
-----                                                                         ----------------------------           ----
filter context decimal128 (kept 1/2)                                          1.06     45.5±5.22µs        ? ?/sec    1.00     42.9±0.46µs        ? ?/sec
filter context decimal128 high selectivity (kept 1023/1024)                   1.02     50.3±1.56µs        ? ?/sec    1.00     49.5±1.40µs        ? ?/sec
filter context decimal128 low selectivity (kept 1/1024)                       1.03    234.1±2.51ns        ? ?/sec    1.00    226.8±1.65ns        ? ?/sec
filter context f32 (kept 1/2)                                                 1.00     77.4±0.24µs        ? ?/sec    1.14     88.1±2.19µs        ? ?/sec
filter context f32 high selectivity (kept 1023/1024)                          1.01     10.2±0.39µs        ? ?/sec    1.00     10.1±0.36µs        ? ?/sec
filter context f32 low selectivity (kept 1/1024)                              1.00    442.1±6.77ns        ? ?/sec    1.00    440.1±7.84ns        ? ?/sec
filter context fsb with value length 20 (kept 1/2)                            1.00     60.5±0.11µs        ? ?/sec    1.17     70.6±0.27µs        ? ?/sec
filter context fsb with value length 20 high selectivity (kept 1023/1024)     1.00     60.6±0.14µs        ? ?/sec    1.17     70.7±0.56µs        ? ?/sec
filter context fsb with value length 20 low selectivity (kept 1/1024)         1.00     60.5±0.10µs        ? ?/sec    1.17     70.7±0.38µs        ? ?/sec
filter context fsb with value length 5 (kept 1/2)                             1.00     60.8±0.40µs        ? ?/sec    1.16     70.6±0.60µs        ? ?/sec
filter context fsb with value length 5 high selectivity (kept 1023/1024)      1.00     60.7±0.73µs        ? ?/sec    1.17     70.9±1.46µs        ? ?/sec
filter context fsb with value length 5 low selectivity (kept 1/1024)          1.00     60.6±0.53µs        ? ?/sec    1.16     70.6±0.23µs        ? ?/sec
filter context fsb with value length 50 (kept 1/2)                            1.00     60.6±0.17µs        ? ?/sec    1.17     70.7±0.81µs        ? ?/sec
filter context fsb with value length 50 high selectivity (kept 1023/1024)     1.00     60.5±0.11µs        ? ?/sec    1.17     70.8±0.88µs        ? ?/sec
filter context fsb with value length 50 low selectivity (kept 1/1024)         1.00     60.5±0.11µs        ? ?/sec    1.17     70.6±0.24µs        ? ?/sec
filter context i32 (kept 1/2)                                                 1.00     16.7±0.10µs        ? ?/sec    1.00     16.7±0.11µs        ? ?/sec
filter context i32 high selectivity (kept 1023/1024)                          1.14      7.2±0.24µs        ? ?/sec    1.00      6.3±0.48µs        ? ?/sec
filter context i32 low selectivity (kept 1/1024)                              1.03    226.5±2.57ns        ? ?/sec    1.00    220.1±0.69ns        ? ?/sec
filter context i32 w NULLs (kept 1/2)                                         1.00     77.4±0.30µs        ? ?/sec    1.13     87.7±0.29µs        ? ?/sec
filter context i32 w NULLs high selectivity (kept 1023/1024)                  1.00      9.9±0.40µs        ? ?/sec    1.01     10.0±0.43µs        ? ?/sec
filter context i32 w NULLs low selectivity (kept 1/1024)                      1.00    441.4±2.79ns        ? ?/sec    1.00    441.6±6.72ns        ? ?/sec
filter context mixed string view (kept 1/2)                                   1.00    111.0±7.69µs        ? ?/sec    1.09    121.2±8.41µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            1.08     58.3±0.85µs        ? ?/sec    1.00     54.0±0.95µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                1.01   626.9±14.95ns        ? ?/sec    1.00    622.1±5.49ns        ? ?/sec
filter context short string view (kept 1/2)                                   1.00    104.9±5.39µs        ? ?/sec    1.06    111.7±0.78µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            1.04     55.9±2.06µs        ? ?/sec    1.00     53.8±1.58µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                1.01    494.3±6.02ns        ? ?/sec    1.00    489.6±4.05ns        ? ?/sec
filter context string (kept 1/2)                                              1.00   573.6±10.47µs        ? ?/sec    1.01    577.4±6.02µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   1.00     16.8±0.15µs        ? ?/sec    1.00     16.8±0.21µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.00      7.1±0.28µs        ? ?/sec    1.05      7.4±0.28µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.04    656.6±7.25ns        ? ?/sec    1.00    632.9±3.22ns        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           1.00     78.2±1.33µs        ? ?/sec    1.13     88.3±0.48µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.02     10.7±0.35µs        ? ?/sec    1.00     10.5±0.34µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.03   889.8±11.62ns        ? ?/sec    1.00    864.5±3.01ns        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.00   653.6±17.70µs        ? ?/sec    1.03   672.3±20.63µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.05   1021.9±6.97ns        ? ?/sec    1.00    970.3±5.68ns        ? ?/sec
filter context u8 (kept 1/2)                                                  1.00     15.0±0.08µs        ? ?/sec    1.00     14.9±0.03µs        ? ?/sec
filter context u8 high selectivity (kept 1023/1024)                           1.11      2.0±0.01µs        ? ?/sec    1.00   1802.6±7.50ns        ? ?/sec
filter context u8 low selectivity (kept 1/1024)                               1.01    216.2±2.44ns        ? ?/sec    1.00    214.9±2.57ns        ? ?/sec
filter context u8 w NULLs (kept 1/2)                                          1.00     75.9±0.23µs        ? ?/sec    1.14     86.3±1.94µs        ? ?/sec
filter context u8 w NULLs high selectivity (kept 1023/1024)                   1.00      5.4±0.02µs        ? ?/sec    1.01      5.4±0.02µs        ? ?/sec
filter context u8 w NULLs low selectivity (kept 1/1024)                       1.01    438.7±3.59ns        ? ?/sec    1.00    435.4±6.46ns        ? ?/sec
filter decimal128 (kept 1/2)                                                  1.00     47.9±3.18µs        ? ?/sec    1.23     59.0±3.76µs        ? ?/sec
filter decimal128 high selectivity (kept 1023/1024)                           1.02     53.1±1.92µs        ? ?/sec    1.00     52.2±0.88µs        ? ?/sec
filter decimal128 low selectivity (kept 1/1024)                               1.05      3.1±0.02µs        ? ?/sec    1.00      2.9±0.02µs        ? ?/sec
filter f32 (kept 1/2)                                                         1.09    172.6±0.34µs        ? ?/sec    1.00    158.4±1.76µs        ? ?/sec
filter fsb with value length 20 (kept 1/2)                                    1.00    137.0±1.11µs        ? ?/sec    1.00    136.8±0.59µs        ? ?/sec
filter fsb with value length 20 high selectivity (kept 1023/1024)             1.00     69.8±2.10µs        ? ?/sec    1.08     75.7±2.03µs        ? ?/sec
filter fsb with value length 20 low selectivity (kept 1/1024)                 1.29      3.5±0.01µs        ? ?/sec    1.00      2.7±0.01µs        ? ?/sec
filter fsb with value length 5 (kept 1/2)                                     1.00    142.5±0.75µs        ? ?/sec    1.00    142.5±0.43µs        ? ?/sec
filter fsb with value length 5 high selectivity (kept 1023/1024)              1.03     11.8±0.67µs        ? ?/sec    1.00     11.4±0.51µs        ? ?/sec
filter fsb with value length 5 low selectivity (kept 1/1024)                  1.30      3.4±0.01µs        ? ?/sec    1.00      2.6±0.01µs        ? ?/sec
filter fsb with value length 50 (kept 1/2)                                    1.08    171.5±8.46µs        ? ?/sec    1.00    158.7±4.19µs        ? ?/sec
filter fsb with value length 50 high selectivity (kept 1023/1024)             1.02    216.2±1.74µs        ? ?/sec    1.00   212.4±11.02µs        ? ?/sec
filter fsb with value length 50 low selectivity (kept 1/1024)                 1.29      3.4±0.05µs        ? ?/sec    1.00      2.7±0.05µs        ? ?/sec
filter i32 (kept 1/2)                                                         1.00     53.6±0.16µs        ? ?/sec    1.01     53.9±0.50µs        ? ?/sec
filter i32 high selectivity (kept 1023/1024)                                  1.06      9.2±0.24µs        ? ?/sec    1.00      8.7±0.40µs        ? ?/sec
filter i32 low selectivity (kept 1/1024)                                      1.00      2.9±0.02µs        ? ?/sec    1.02      2.9±0.01µs        ? ?/sec
filter optimize (kept 1/2)                                                    1.04     37.0±0.05µs        ? ?/sec    1.00     35.5±0.18µs        ? ?/sec
filter optimize high selectivity (kept 1023/1024)                             1.00      2.6±0.01µs        ? ?/sec    1.17      3.0±0.02µs        ? ?/sec
filter optimize low selectivity (kept 1/1024)                                 1.00   1832.3±7.24ns        ? ?/sec    1.38      2.5±0.01µs        ? ?/sec
filter run array (kept 1/2)                                                   1.00    462.1±1.85µs        ? ?/sec    1.04    479.2±2.08µs        ? ?/sec
filter run array high selectivity (kept 1023/1024)                            1.00    495.3±2.20µs        ? ?/sec    1.00    494.5±4.76µs        ? ?/sec
filter run array low selectivity (kept 1/1024)                                1.00    381.1±1.87µs        ? ?/sec    1.00    381.0±2.91µs        ? ?/sec
filter single record batch                                                    1.37     53.6±0.10µs        ? ?/sec    1.00     39.0±0.11µs        ? ?/sec
filter u8 (kept 1/2)                                                          1.00     36.0±0.14µs        ? ?/sec    1.24     44.6±0.15µs        ? ?/sec
filter u8 high selectivity (kept 1023/1024)                                   1.00      3.7±0.01µs        ? ?/sec    1.06      3.9±0.02µs        ? ?/sec
filter u8 low selectivity (kept 1/1024)                                       1.00      2.9±0.02µs        ? ?/sec    1.16      3.3±0.01µs        ? ?/sec

@alamb-ghbot

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing buffer-builder-hot-path-opts (2e70130) to 7dbe58a diff
BENCH_NAME=interleave_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench interleave_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=buffer-builder-hot-path-opts
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

group                                                                                        buffer-builder-hot-path-opts            main
-----                                                                                        ----------------------------            ----
interleave dict(20, 0.0) 100 [0..100, 100..230, 450..1000]                                   1.00    756.6±3.32ns        ? ?/sec     1.01   767.5±21.03ns        ? ?/sec
interleave dict(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                         1.01      2.2±0.01µs        ? ?/sec     1.00      2.2±0.01µs        ? ?/sec
interleave dict(20, 0.0) 1024 [0..100, 100..230, 450..1000]                                  1.02      2.2±0.01µs        ? ?/sec     1.00      2.2±0.04µs        ? ?/sec
interleave dict(20, 0.0) 400 [0..100, 100..230, 450..1000]                                   1.00   1255.8±7.59ns        ? ?/sec     1.00   1254.0±9.12ns        ? ?/sec
interleave dict_distinct 100                                                                 1.02      3.0±0.09µs        ? ?/sec     1.00      2.9±0.02µs        ? ?/sec
interleave dict_distinct 1024                                                                1.02      3.0±0.02µs        ? ?/sec     1.00      2.9±0.02µs        ? ?/sec
interleave dict_distinct 2048                                                                1.02      3.0±0.02µs        ? ?/sec     1.00      2.9±0.01µs        ? ?/sec
interleave dict_sparse(20, 0.0) 100 [0..100, 100..230, 450..1000]                            1.06      2.9±0.12µs        ? ?/sec     1.00      2.7±0.16µs        ? ?/sec
interleave dict_sparse(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                  1.00      4.9±0.10µs        ? ?/sec     1.03      5.0±0.28µs        ? ?/sec
interleave dict_sparse(20, 0.0) 1024 [0..100, 100..230, 450..1000]                           1.00      4.1±0.12µs        ? ?/sec     1.05      4.3±0.17µs        ? ?/sec
interleave dict_sparse(20, 0.0) 400 [0..100, 100..230, 450..1000]                            1.00      3.2±0.17µs        ? ?/sec     1.06      3.4±0.11µs        ? ?/sec
interleave i32(0.0) 100 [0..100, 100..230, 450..1000]                                        1.07    296.8±6.79ns        ? ?/sec     1.00    276.7±5.93ns        ? ?/sec
interleave i32(0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                              1.03  1567.5±83.96ns        ? ?/sec     1.00  1525.3±37.84ns        ? ?/sec
interleave i32(0.0) 1024 [0..100, 100..230, 450..1000]                                       1.02  1552.4±100.77ns        ? ?/sec    1.00  1515.3±19.18ns        ? ?/sec
interleave i32(0.0) 400 [0..100, 100..230, 450..1000]                                        1.02   734.6±37.75ns        ? ?/sec     1.00   718.4±30.95ns        ? ?/sec
interleave i32(0.5) 100 [0..100, 100..230, 450..1000]                                        1.02   587.5±12.25ns        ? ?/sec     1.00    576.1±2.05ns        ? ?/sec
interleave i32(0.5) 1024 [0..100, 100..230, 450..1000, 0..1000]                              1.00      4.0±0.01µs        ? ?/sec     1.01      4.0±0.02µs        ? ?/sec
interleave i32(0.5) 1024 [0..100, 100..230, 450..1000]                                       1.01      4.1±0.14µs        ? ?/sec     1.00      4.1±0.01µs        ? ?/sec
interleave i32(0.5) 400 [0..100, 100..230, 450..1000]                                        1.00   1710.3±6.91ns        ? ?/sec     1.03  1761.9±30.20ns        ? ?/sec
interleave list<i64>(0.0,0.0,20) 100 [0..100, 100..230, 450..1000]                           1.00      2.7±0.01µs        ? ?/sec     1.00      2.7±0.01µs        ? ?/sec
interleave list<i64>(0.0,0.0,20) 1024 [0..100, 100..230, 450..1000, 0..1000]                 1.00     26.1±0.72µs        ? ?/sec     1.02     26.6±1.02µs        ? ?/sec
interleave list<i64>(0.0,0.0,20) 1024 [0..100, 100..230, 450..1000]                          1.00     25.9±0.26µs        ? ?/sec     1.00     26.0±0.84µs        ? ?/sec
interleave list<i64>(0.0,0.0,20) 400 [0..100, 100..230, 450..1000]                           1.00     10.4±0.03µs        ? ?/sec     1.00     10.5±0.03µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 100 [0..100, 100..230, 450..1000]                           1.00      5.7±0.02µs        ? ?/sec     1.00      5.7±0.12µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 1024 [0..100, 100..230, 450..1000, 0..1000]                 1.00     46.3±0.10µs        ? ?/sec     1.00     46.5±0.67µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 1024 [0..100, 100..230, 450..1000]                          1.00     46.7±0.28µs        ? ?/sec     1.00     46.8±0.38µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 400 [0..100, 100..230, 450..1000]                           1.00     18.7±0.41µs        ? ?/sec     1.00     18.7±0.09µs        ? ?/sec
interleave str(20, 0.0) 100 [0..100, 100..230, 450..1000]                                    1.00    761.7±1.69ns        ? ?/sec     1.04   791.2±13.19ns        ? ?/sec
interleave str(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                          1.00      5.9±0.09µs        ? ?/sec     1.05      6.2±0.02µs        ? ?/sec
interleave str(20, 0.0) 1024 [0..100, 100..230, 450..1000]                                   1.00      5.9±0.23µs        ? ?/sec     1.04      6.1±0.03µs        ? ?/sec
interleave str(20, 0.0) 400 [0..100, 100..230, 450..1000]                                    1.00      2.4±0.01µs        ? ?/sec     1.07      2.6±0.04µs        ? ?/sec
interleave str(20, 0.5) 100 [0..100, 100..230, 450..1000]                                    1.01  1056.4±14.36ns        ? ?/sec     1.00   1045.7±4.25ns        ? ?/sec
interleave str(20, 0.5) 1024 [0..100, 100..230, 450..1000, 0..1000]                          1.00     10.1±0.07µs        ? ?/sec     1.00     10.1±0.05µs        ? ?/sec
interleave str(20, 0.5) 1024 [0..100, 100..230, 450..1000]                                   1.00     10.0±0.13µs        ? ?/sec     1.02     10.2±0.10µs        ? ?/sec
interleave str(20, 0.5) 400 [0..100, 100..230, 450..1000]                                    1.00      3.6±0.02µs        ? ?/sec     1.03      3.7±0.02µs        ? ?/sec
interleave str_view(0.0) 100 [0..100, 100..230, 450..1000]                                   1.00    805.6±3.32ns        ? ?/sec     1.08    873.3±1.72ns        ? ?/sec
interleave str_view(0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                         1.00      4.9±0.02µs        ? ?/sec     1.01      5.0±0.03µs        ? ?/sec
interleave str_view(0.0) 1024 [0..100, 100..230, 450..1000]                                  1.00      4.8±0.01µs        ? ?/sec     1.01      4.9±0.01µs        ? ?/sec
interleave str_view(0.0) 400 [0..100, 100..230, 450..1000]                                   1.00   1925.1±7.45ns        ? ?/sec     1.11      2.1±0.01µs        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 100 [0..100, 100..230, 450..1000]                       1.00    843.9±5.15ns        ? ?/sec     1.04    878.4±8.20ns        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]             1.00      3.3±0.03µs        ? ?/sec     1.00      3.3±0.10µs        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 1024 [0..100, 100..230, 450..1000]                      1.00      3.4±0.08µs        ? ?/sec     1.02      3.5±0.11µs        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 400 [0..100, 100..230, 450..1000]                       1.00   1718.0±9.86ns        ? ?/sec     1.00  1723.0±21.21ns        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 100 [0..100, 100..230, 450..1000]                   1.00   1295.7±4.07ns        ? ?/sec     1.02   1323.2±7.64ns        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]         1.00      7.8±0.13µs        ? ?/sec     1.03      8.0±0.30µs        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 1024 [0..100, 100..230, 450..1000]                  1.00      7.7±0.15µs        ? ?/sec     1.03      8.0±0.17µs        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 400 [0..100, 100..230, 450..1000]                   1.00      3.5±0.05µs        ? ?/sec     1.02      3.5±0.17µs        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 100 [0..100, 100..230, 450..1000]              1.00  1797.6±17.43ns        ? ?/sec     1.02  1832.9±25.16ns        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 1024 [0..100, 100..230, 450..1000, 0..1000]    1.00     12.2±0.06µs        ? ?/sec     1.04     12.7±0.05µs        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 1024 [0..100, 100..230, 450..1000]             1.00     12.1±0.06µs        ? ?/sec     1.04     12.6±0.03µs        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 400 [0..100, 100..230, 450..1000]              1.00      5.4±0.05µs        ? ?/sec     1.02      5.5±0.06µs        ? ?/sec

@alamb-ghbot

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing buffer-builder-hot-path-opts (2e70130) to 7dbe58a diff
BENCH_NAME=sort_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench sort_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=buffer-builder-hot-path-opts
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

group                                                   buffer-builder-hot-path-opts           main
-----                                                   ----------------------------           ----
lexsort (bool, bool) 2^12                               1.00    115.3±3.03µs        ? ?/sec    1.04    119.4±2.87µs        ? ?/sec
lexsort (bool, bool) nulls 2^12                         1.06   162.8±10.79µs        ? ?/sec    1.00    153.6±0.41µs        ? ?/sec
lexsort (f32, f32) 2^10                                 1.01     45.7±0.49µs        ? ?/sec    1.00     45.3±2.40µs        ? ?/sec
lexsort (f32, f32) 2^12                                 1.01    212.6±0.95µs        ? ?/sec    1.00    211.5±2.27µs        ? ?/sec
lexsort (f32, f32) 2^12 limit 10                        1.02     39.1±2.17µs        ? ?/sec    1.00     38.4±0.24µs        ? ?/sec
lexsort (f32, f32) 2^12 limit 100                       1.01     41.4±1.17µs        ? ?/sec    1.00     40.8±0.16µs        ? ?/sec
lexsort (f32, f32) 2^12 limit 1000                      1.01     79.6±1.26µs        ? ?/sec    1.00     78.7±0.79µs        ? ?/sec
lexsort (f32, f32) 2^12 limit 2^12                      1.00    213.6±4.30µs        ? ?/sec    1.00    213.3±2.43µs        ? ?/sec
lexsort (f32, f32) nulls 2^10                           1.08     55.4±0.75µs        ? ?/sec    1.00     51.3±0.14µs        ? ?/sec
lexsort (f32, f32) nulls 2^12                           1.07    259.4±2.81µs        ? ?/sec    1.00    243.2±0.58µs        ? ?/sec
lexsort (f32, f32) nulls 2^12 limit 10                  1.06     90.2±0.62µs        ? ?/sec    1.00     84.9±1.26µs        ? ?/sec
lexsort (f32, f32) nulls 2^12 limit 100                 1.06     91.2±0.56µs        ? ?/sec    1.00     85.7±0.60µs        ? ?/sec
lexsort (f32, f32) nulls 2^12 limit 1000                1.09    103.9±1.50µs        ? ?/sec    1.00     95.6±0.67µs        ? ?/sec
lexsort (f32, f32) nulls 2^12 limit 2^12                1.07    259.8±0.95µs        ? ?/sec    1.00    243.4±0.72µs        ? ?/sec
rank f32 2^12                                           1.04     72.3±0.24µs        ? ?/sec    1.00     69.6±0.49µs        ? ?/sec
rank f32 nulls 2^12                                     1.05     37.2±0.07µs        ? ?/sec    1.00     35.4±0.33µs        ? ?/sec
rank string[10] 2^12                                    1.00    252.8±4.34µs        ? ?/sec    1.01    255.4±2.99µs        ? ?/sec
rank string[10] nulls 2^12                              1.00    122.4±1.33µs        ? ?/sec    1.00    122.2±1.95µs        ? ?/sec
sort f32 2^12                                           1.00     66.8±0.74µs        ? ?/sec    1.05     70.1±1.41µs        ? ?/sec
sort f32 nulls 2^12                                     1.00     29.3±0.35µs        ? ?/sec    1.02     30.0±0.39µs        ? ?/sec
sort f32 nulls to indices 2^12                          1.04     39.1±0.06µs        ? ?/sec    1.00     37.6±0.31µs        ? ?/sec
sort f32 to indices 2^12                                1.05     75.5±1.29µs        ? ?/sec    1.00     72.2±1.39µs        ? ?/sec
sort i32 2^10                                           1.18      8.7±0.13µs        ? ?/sec    1.00      7.3±0.02µs        ? ?/sec
sort i32 2^12                                           1.18     42.3±1.00µs        ? ?/sec    1.00     36.0±0.86µs        ? ?/sec
sort i32 nulls 2^10                                     1.05      5.0±0.01µs        ? ?/sec    1.00      4.8±0.05µs        ? ?/sec
sort i32 nulls 2^12                                     1.05     21.1±0.07µs        ? ?/sec    1.00     20.1±0.04µs        ? ?/sec
sort i32 nulls to indices 2^10                          1.01      7.0±0.01µs        ? ?/sec    1.00      7.0±0.02µs        ? ?/sec
sort i32 nulls to indices 2^12                          1.02     29.2±0.43µs        ? ?/sec    1.00     28.7±0.26µs        ? ?/sec
sort i32 to indices 2^10                                1.01     11.5±0.10µs        ? ?/sec    1.00     11.4±0.02µs        ? ?/sec
sort i32 to indices 2^12                                1.00     53.5±0.31µs        ? ?/sec    1.00     53.4±0.56µs        ? ?/sec
sort primitive run 2^12                                 1.05      7.2±0.35µs        ? ?/sec    1.00      6.8±0.07µs        ? ?/sec
sort primitive run to indices 2^12                      1.03      8.4±0.24µs        ? ?/sec    1.00      8.1±0.10µs        ? ?/sec
sort string[0-100] nulls to indices 2^12                1.02     43.1±0.26µs        ? ?/sec    1.00     42.4±0.10µs        ? ?/sec
sort string[0-100] to indices 2^12                      1.01     92.2±0.87µs        ? ?/sec    1.00     91.3±0.21µs        ? ?/sec
sort string[0-10] nulls to indices 2^12                 1.00     46.9±0.29µs        ? ?/sec    1.04     48.6±0.25µs        ? ?/sec
sort string[0-10] to indices 2^12                       1.00    125.1±0.18µs        ? ?/sec    1.00    125.0±0.41µs        ? ?/sec
sort string[0-400] nulls to indices 2^12                1.00     42.5±0.40µs        ? ?/sec    1.01     42.9±0.17µs        ? ?/sec
sort string[0-400] to indices 2^12                      1.01     91.8±1.94µs        ? ?/sec    1.00     91.1±0.19µs        ? ?/sec
sort string[1000] nulls to indices 2^12                 1.00     43.6±0.30µs        ? ?/sec    1.01     43.9±0.61µs        ? ?/sec
sort string[1000] to indices 2^12                       1.00     88.7±2.50µs        ? ?/sec    1.02     90.2±0.99µs        ? ?/sec
sort string[100] nulls to indices 2^12                  1.00     42.4±0.82µs        ? ?/sec    1.01     42.7±0.21µs        ? ?/sec
sort string[100] to indices 2^12                        1.00     87.4±1.34µs        ? ?/sec    1.02     89.1±0.37µs        ? ?/sec
sort string[10] dict nulls to indices 2^12              1.00    150.9±0.27µs        ? ?/sec    1.01    151.7±1.86µs        ? ?/sec
sort string[10] dict to indices 2^12                    1.00    314.3±3.07µs        ? ?/sec    1.01    316.9±0.83µs        ? ?/sec
sort string[10] nulls to indices 2^12                   1.00     43.1±0.55µs        ? ?/sec    1.01     43.5±0.36µs        ? ?/sec
sort string[10] to indices 2^12                         1.00     87.1±0.97µs        ? ?/sec    1.02     88.9±0.72µs        ? ?/sec
sort string_view[0-400] nulls to indices 2^12           1.04     57.4±0.39µs        ? ?/sec    1.00     55.5±0.50µs        ? ?/sec
sort string_view[0-400] to indices 2^12                 1.02    122.1±1.15µs        ? ?/sec    1.00    119.4±0.88µs        ? ?/sec
sort string_view[10] nulls to indices 2^12              1.00     44.6±0.82µs        ? ?/sec    1.02     45.3±1.92µs        ? ?/sec
sort string_view[10] to indices 2^12                    1.00    105.5±2.12µs        ? ?/sec    1.01    107.1±4.30µs        ? ?/sec
sort string_view_inlined[0-12] nulls to indices 2^12    1.00     42.9±0.30µs        ? ?/sec    1.00     42.9±0.26µs        ? ?/sec
sort string_view_inlined[0-12] to indices 2^12          1.03     96.1±0.36µs        ? ?/sec    1.00     93.1±0.61µs        ? ?/sec

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing buffer-builder-hot-path-opts (2e70130) to 7dbe58a diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench take_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=buffer-builder-hot-path-opts
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                     buffer-builder-hot-path-opts           main
-----                                                                     ----------------------------           ----
take bool 1024                                                            1.00   1329.2±1.76ns        ? ?/sec    1.01  1337.6±16.87ns        ? ?/sec
take bool 512                                                             1.00    729.1±6.37ns        ? ?/sec    1.01   737.5±16.48ns        ? ?/sec
take bool null indices 1024                                               1.17   1465.4±5.26ns        ? ?/sec    1.00  1256.5±32.78ns        ? ?/sec
take bool null values 1024                                                1.00      2.6±0.00µs        ? ?/sec    1.01      2.6±0.05µs        ? ?/sec
take bool null values null indices 1024                                   1.00      2.9±0.03µs        ? ?/sec    1.02      2.9±0.03µs        ? ?/sec
take check bounds i32 1024                                                1.00    850.7±7.04ns        ? ?/sec    1.01   855.1±26.84ns        ? ?/sec
take check bounds i32 512                                                 1.00   464.1±16.45ns        ? ?/sec    1.28    592.5±5.59ns        ? ?/sec
take i32 1024                                                             1.00    588.4±1.09ns        ? ?/sec    1.23   726.3±13.20ns        ? ?/sec
take i32 512                                                              1.00    384.9±2.06ns        ? ?/sec    1.16    445.9±4.34ns        ? ?/sec
take i32 null indices 1024                                                1.00    991.8±1.76ns        ? ?/sec    1.00    994.0±1.96ns        ? ?/sec
take i32 null values 1024                                                 1.04      2.1±0.00µs        ? ?/sec    1.00      2.0±0.02µs        ? ?/sec
take i32 null values null indices 1024                                    1.19      2.6±0.03µs        ? ?/sec    1.00      2.2±0.03µs        ? ?/sec
take primitive fsb value len: 12, indices: 1024                           1.07      3.7±0.11µs        ? ?/sec    1.00      3.5±0.04µs        ? ?/sec
take primitive fsb value len: 12, null values, indices: 1024              1.05      5.0±0.04µs        ? ?/sec    1.00      4.8±0.03µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.04     21.8±0.44µs        ? ?/sec    1.00     21.0±0.68µs        ? ?/sec
take str 1024                                                             1.00     11.1±0.08µs        ? ?/sec    1.01     11.2±0.21µs        ? ?/sec
take str 512                                                              1.00      5.3±0.03µs        ? ?/sec    1.02      5.4±0.03µs        ? ?/sec
take str null indices 1024                                                1.00      6.8±0.04µs        ? ?/sec    1.15      7.8±0.14µs        ? ?/sec
take str null indices 512                                                 1.00      3.3±0.05µs        ? ?/sec    1.13      3.8±0.02µs        ? ?/sec
take str null values 1024                                                 1.00      8.7±0.23µs        ? ?/sec    1.02      8.8±0.11µs        ? ?/sec
take str null values null indices 1024                                    1.00      6.4±0.02µs        ? ?/sec    1.09      7.0±0.09µs        ? ?/sec
take stringview 1024                                                      1.00    894.1±3.97ns        ? ?/sec    1.00   893.7±14.80ns        ? ?/sec
take stringview 512                                                       1.00    586.1±1.12ns        ? ?/sec    1.01    592.2±5.88ns        ? ?/sec
take stringview null indices 1024                                         1.00  1430.9±21.22ns        ? ?/sec    1.01  1440.0±14.81ns        ? ?/sec
take stringview null indices 512                                          1.00    711.9±3.14ns        ? ?/sec    1.04    738.6±3.18ns        ? ?/sec
take stringview null values 1024                                          1.00      2.1±0.04µs        ? ?/sec    1.01      2.1±0.17µs        ? ?/sec
take stringview null values null indices 1024                             1.12      2.8±0.11µs        ? ?/sec    1.00      2.5±0.04µs        ? ?/sec

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing buffer-builder-hot-path-opts (2e70130) to 7dbe58a diff
BENCH_NAME=zip_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench zip_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=buffer-builder-hot-path-opts
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                                   buffer-builder-hot-path-opts           main
-----                                                                                   ----------------------------           ----
zip_8192_from_i32/array_vs_array/10pct_true                                             1.03     36.1±0.13µs        ? ?/sec    1.00     35.1±0.11µs        ? ?/sec
zip_8192_from_i32/array_vs_array/1pct_true                                              1.02      5.2±0.02µs        ? ?/sec    1.00      5.1±0.01µs        ? ?/sec
zip_8192_from_i32/array_vs_array/50pct_nulls                                            1.01     76.1±0.20µs        ? ?/sec    1.00     75.5±0.79µs        ? ?/sec
zip_8192_from_i32/array_vs_array/50pct_true                                             1.01    103.5±2.09µs        ? ?/sec    1.00    102.7±1.13µs        ? ?/sec
zip_8192_from_i32/array_vs_array/90pct_true                                             1.02     37.2±0.51µs        ? ?/sec    1.00     36.4±0.33µs        ? ?/sec
zip_8192_from_i32/array_vs_array/99pct_true                                             1.00      6.0±0.06µs        ? ?/sec    1.02      6.1±0.04µs        ? ?/sec
zip_8192_from_i32/array_vs_array/all_false                                              1.00      2.6±0.05µs        ? ?/sec    1.03      2.7±0.07µs        ? ?/sec
zip_8192_from_i32/array_vs_array/all_true                                               1.00      2.4±0.04µs        ? ?/sec    1.06      2.6±0.12µs        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/10pct_true                                   1.01     33.2±0.37ns        ? ?/sec    1.00     33.1±0.17ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/1pct_true                                    1.00     32.4±0.41ns        ? ?/sec    1.02     33.0±0.16ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/50pct_nulls                                  1.00     33.1±0.26ns        ? ?/sec    1.00     33.2±0.39ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/50pct_true                                   1.00     32.3±0.16ns        ? ?/sec    1.03     33.1±0.21ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/90pct_true                                   1.00     32.3±0.14ns        ? ?/sec    1.02     33.1±0.17ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/99pct_true                                   1.00     32.2±0.14ns        ? ?/sec    1.03     33.1±0.18ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/all_false                                    1.00     33.1±0.59ns        ? ?/sec    1.00     33.1±0.17ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/all_true                                     1.00     32.3±0.45ns        ? ?/sec    1.05     33.8±0.48ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/10pct_true                                   1.01     30.4±0.44ns        ? ?/sec    1.00     30.2±0.20ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/1pct_true                                    1.01     30.6±0.31ns        ? ?/sec    1.00     30.1±0.17ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/50pct_nulls                                  1.00     30.3±0.27ns        ? ?/sec    1.01     30.5±1.97ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/50pct_true                                   1.00     30.2±0.21ns        ? ?/sec    1.01     30.5±0.23ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/90pct_true                                   1.00     30.2±0.27ns        ? ?/sec    1.00     30.2±0.16ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/99pct_true                                   1.00     30.2±0.15ns        ? ?/sec    1.01     30.5±0.49ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/all_false                                    1.00     30.2±0.15ns        ? ?/sec    1.00     30.2±0.29ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/all_true                                     1.00     30.2±0.16ns        ? ?/sec    1.01     30.6±0.67ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/10pct_true                             1.00  1177.4±11.10ns        ? ?/sec    1.00   1173.5±7.66ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/1pct_true                              1.00  1168.6±10.49ns        ? ?/sec    1.00  1173.2±16.84ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/50pct_nulls                            1.00  1291.1±11.01ns        ? ?/sec    1.01   1298.3±9.53ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/50pct_true                             1.04  1173.1±11.72ns        ? ?/sec    1.00   1126.2±9.01ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/90pct_true                             1.00  1127.8±33.25ns        ? ?/sec    1.07   1204.8±5.43ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/99pct_true                             1.01  1174.5±42.03ns        ? ?/sec    1.00  1167.6±13.43ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/all_false                              1.00   1178.2±4.69ns        ? ?/sec    1.03   1214.1±9.89ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/all_true                               1.05   1167.4±5.24ns        ? ?/sec    1.00  1115.8±17.70ns        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/10pct_true                                          1.00      9.1±0.09µs        ? ?/sec    1.00      9.2±0.05µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/1pct_true                                           1.00      9.2±0.11µs        ? ?/sec    1.00      9.2±0.04µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/50pct_nulls                                         1.00      9.2±0.07µs        ? ?/sec    1.00      9.3±0.03µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/50pct_true                                          1.00      9.1±0.07µs        ? ?/sec    1.00      9.1±0.06µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/90pct_true                                          1.00      9.2±0.02µs        ? ?/sec    1.00      9.2±0.13µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/99pct_true                                          1.00      9.1±0.09µs        ? ?/sec    1.00      9.1±0.02µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/all_false                                           1.00      9.2±0.21µs        ? ?/sec    1.00      9.1±0.04µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/all_true                                            1.00      9.0±0.02µs        ? ?/sec    1.00      9.0±0.02µs        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/10pct_true                                    1.07  1348.6±11.80ns        ? ?/sec    1.00   1264.4±6.93ns        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/1pct_true                                     1.05   1263.2±9.88ns        ? ?/sec    1.00  1204.3±17.84ns        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/50pct_nulls                                   1.03   1378.1±7.81ns        ? ?/sec    1.00  1335.9±16.46ns        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/50pct_true                                    1.05  1237.0±17.94ns        ? ?/sec    1.00   1182.0±7.43ns        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/90pct_true                                    1.09  1358.2±12.22ns        ? ?/sec    1.00  1245.1±13.34ns        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/99pct_true                                    1.03  1321.0±14.78ns        ? ?/sec    1.00  1286.6±22.31ns        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/all_false                                     1.01  1332.2±11.20ns        ? ?/sec    1.00   1321.1±7.41ns        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/all_true                                      1.06  1252.0±16.45ns        ? ?/sec    1.00  1180.6±10.69ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/10pct_true                           1.00    323.2±3.75µs        ? ?/sec    1.03    331.9±8.15µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/1pct_true                            1.01    293.2±4.48µs        ? ?/sec    1.00    290.8±6.52µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/50pct_nulls                          1.01   396.1±14.12µs        ? ?/sec    1.00   392.9±13.22µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/50pct_true                           1.00    414.3±8.66µs        ? ?/sec    1.04   430.1±13.51µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/90pct_true                           1.00    343.9±9.55µs        ? ?/sec    1.00    343.7±8.68µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/99pct_true                           1.01    285.0±7.58µs        ? ?/sec    1.00    281.9±6.36µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/all_false                            1.00    114.1±4.56µs        ? ?/sec    1.00    114.3±3.37µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/all_true                             1.00    120.0±4.18µs        ? ?/sec    1.00    120.1±6.12µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/10pct_true                 1.00     33.0±0.98ns        ? ?/sec    1.01     33.3±0.32ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/1pct_true                  1.00     32.5±0.48ns        ? ?/sec    1.03     33.3±0.33ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/50pct_nulls                1.00     32.5±0.64ns        ? ?/sec    1.06     34.3±1.76ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/50pct_true                 1.00     32.7±0.31ns        ? ?/sec    1.04     33.9±0.19ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/90pct_true                 1.00     32.5±0.28ns        ? ?/sec    1.04     33.9±1.34ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/99pct_true                 1.00     32.3±0.15ns        ? ?/sec    1.05     33.9±0.17ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/all_false                  1.00     32.4±0.34ns        ? ?/sec    1.03     33.3±0.33ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/all_true                   1.00     32.3±0.16ns        ? ?/sec    1.03     33.3±0.32ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/10pct_true                 1.00     30.7±0.25ns        ? ?/sec    1.01     31.1±0.55ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/1pct_true                  1.00     30.5±0.33ns        ? ?/sec    1.02     31.3±0.57ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/50pct_nulls                1.00     30.6±0.86ns        ? ?/sec    1.02     31.3±0.66ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/50pct_true                 1.00     30.7±0.51ns        ? ?/sec    1.00     30.8±0.53ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/90pct_true                 1.00     30.5±0.40ns        ? ?/sec    1.02     31.1±0.61ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/99pct_true                 1.00     30.5±0.29ns        ? ?/sec    1.03     31.3±0.58ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/all_false                  1.00     30.6±0.32ns        ? ?/sec    1.01     30.8±0.55ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/all_true                   1.00     30.5±0.43ns        ? ?/sec    1.01     30.9±0.59ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/10pct_true           1.03     20.6±0.19µs        ? ?/sec    1.00     20.0±0.48µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/1pct_true            1.00     10.9±0.11µs        ? ?/sec    1.00     10.9±0.12µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/50pct_nulls          1.03     39.5±0.64µs        ? ?/sec    1.00     38.5±0.74µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/50pct_true           1.02     67.1±0.74µs        ? ?/sec    1.00     66.1±1.19µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/90pct_true           1.01     77.2±0.85µs        ? ?/sec    1.00     76.7±0.60µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/99pct_true           1.02     84.8±0.89µs        ? ?/sec    1.00     83.4±0.47µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/all_false            1.00   955.8±49.32ns        ? ?/sec    1.03   982.8±76.39ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/all_true             1.01     85.9±1.20µs        ? ?/sec    1.00     84.9±0.52µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/10pct_true                        1.00     75.9±1.20µs        ? ?/sec    1.01     76.3±0.85µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/1pct_true                         1.00     55.2±0.70µs        ? ?/sec    1.05     58.1±0.72µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/50pct_nulls                       1.00     91.4±1.05µs        ? ?/sec    1.00     91.2±1.26µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/50pct_true                        1.00    106.9±1.18µs        ? ?/sec    1.00    107.1±1.18µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/90pct_true                        1.00    101.2±2.72µs        ? ?/sec    1.00    101.4±2.17µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/99pct_true                        1.00     87.6±1.78µs        ? ?/sec    1.05     92.3±0.61µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/all_false                         1.00     44.1±0.86µs        ? ?/sec    1.00     43.9±0.80µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/all_true                          1.00     76.9±1.34µs        ? ?/sec    1.00     76.7±1.05µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/10pct_true                  1.01     78.3±1.10µs        ? ?/sec    1.00     77.4±0.58µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/1pct_true                   1.02     86.0±1.04µs        ? ?/sec    1.00     84.0±0.69µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/50pct_nulls                 1.00     74.5±1.22µs        ? ?/sec    1.00     74.6±1.76µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/50pct_true                  1.01     66.2±1.14µs        ? ?/sec    1.00     65.3±0.99µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/90pct_true                  1.01     20.8±0.52µs        ? ?/sec    1.00     20.6±0.50µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/99pct_true                  1.00     11.3±0.06µs        ? ?/sec    1.01     11.3±0.03µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/all_false                   1.02     86.7±1.17µs        ? ?/sec    1.00     84.8±0.97µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/all_true                    1.00  1120.7±36.84ns        ? ?/sec    1.05  1178.9±42.29ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/10pct_true                         1.00   332.3±10.76µs        ? ?/sec    1.00   333.8±16.99µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/1pct_true                          1.06    303.3±7.18µs        ? ?/sec    1.00    286.2±5.49µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/50pct_nulls                        1.02   394.0±12.46µs        ? ?/sec    1.00    386.7±8.61µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/50pct_true                         1.00    421.5±7.22µs        ? ?/sec    1.03   432.3±18.66µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/90pct_true                         1.00    325.9±3.04µs        ? ?/sec    1.03   334.5±10.89µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/99pct_true                         1.02    264.6±4.61µs        ? ?/sec    1.00    260.2±5.92µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/all_false                          1.00    115.9±6.00µs        ? ?/sec    1.01    117.1±5.03µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/all_true                           1.00    114.1±3.76µs        ? ?/sec    1.01    115.2±3.83µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/10pct_true               1.00     32.5±0.82ns        ? ?/sec    1.02     33.4±0.45ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/1pct_true                1.00     32.4±0.32ns        ? ?/sec    1.03     33.3±0.34ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/50pct_nulls              1.00     32.3±0.14ns        ? ?/sec    1.03     33.3±0.41ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/50pct_true               1.00     32.4±0.33ns        ? ?/sec    1.03     33.4±0.40ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/90pct_true               1.00     32.4±0.14ns        ? ?/sec    1.03     33.4±0.43ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/99pct_true               1.00     32.4±0.31ns        ? ?/sec    1.03     33.4±0.69ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/all_false                1.00     32.4±0.49ns        ? ?/sec    1.03     33.3±0.33ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/all_true                 1.00     32.3±0.13ns        ? ?/sec    1.03     33.3±0.32ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/10pct_true               1.00     30.5±0.42ns        ? ?/sec    1.01     30.9±0.52ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/1pct_true                1.00     30.5±0.32ns        ? ?/sec    1.01     30.9±0.55ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/50pct_nulls              1.00     30.5±0.30ns        ? ?/sec    1.01     30.9±0.55ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/50pct_true               1.00     30.6±0.51ns        ? ?/sec    1.02     31.1±1.91ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/90pct_true               1.00     30.8±1.31ns        ? ?/sec    1.00     30.9±0.61ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/99pct_true               1.00     30.5±0.29ns        ? ?/sec    1.01     30.8±0.52ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/all_false                1.00     30.5±0.33ns        ? ?/sec    1.01     30.9±0.79ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/all_true                 1.00     30.5±0.30ns        ? ?/sec    1.01     30.9±0.63ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/10pct_true         1.03     20.4±0.95µs        ? ?/sec    1.00     19.9±0.54µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/1pct_true          1.01     11.0±0.20µs        ? ?/sec    1.00     10.8±0.05µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/50pct_nulls        1.02     38.0±1.12µs        ? ?/sec    1.00     37.3±1.21µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/50pct_true         1.03     66.9±1.17µs        ? ?/sec    1.00     64.9±1.20µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/90pct_true         1.01     78.4±1.07µs        ? ?/sec    1.00     77.6±1.31µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/99pct_true         1.00     85.4±0.61µs        ? ?/sec    1.01     86.5±1.41µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/all_false          1.00   925.2±42.51ns        ? ?/sec    1.00   929.7±62.63ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/all_true           1.00     86.1±0.71µs        ? ?/sec    1.02     87.5±2.43µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/10pct_true                      1.02    126.3±3.39µs        ? ?/sec    1.00    123.7±1.67µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/1pct_true                       1.00    144.5±1.95µs        ? ?/sec    1.00    143.9±1.26µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/50pct_nulls                     1.00    125.7±1.64µs        ? ?/sec    1.00    125.3±1.87µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/50pct_true                      1.00    126.2±3.92µs        ? ?/sec    1.00    125.6±1.60µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/90pct_true                      1.04    108.3±2.09µs        ? ?/sec    1.00    104.0±1.83µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/99pct_true                      1.02     91.0±1.48µs        ? ?/sec    1.00     89.2±1.43µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/all_false                       1.01    136.5±2.20µs        ? ?/sec    1.00    135.0±1.46µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/all_true                        1.00     76.3±0.65µs        ? ?/sec    1.01     77.2±0.73µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/10pct_true                1.00     77.4±1.26µs        ? ?/sec    1.02     78.7±1.32µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/1pct_true                 1.00     84.8±0.84µs        ? ?/sec    1.03     87.7±2.40µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/50pct_nulls               1.00     75.2±1.03µs        ? ?/sec    1.01     75.9±2.00µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/50pct_true                1.01     66.7±1.09µs        ? ?/sec    1.00     66.3±1.50µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/90pct_true                1.00     20.6±0.50µs        ? ?/sec    1.04     21.5±0.44µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/99pct_true                1.00     11.4±0.11µs        ? ?/sec    1.00     11.3±0.11µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/all_false                 1.00     86.6±0.69µs        ? ?/sec    1.02     88.2±2.02µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/all_true                  1.00  1133.9±24.52ns        ? ?/sec    1.08  1222.0±46.57ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/10pct_true                             1.00     62.4±0.40µs        ? ?/sec    1.01     63.1±0.65µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/1pct_true                              1.00     20.6±0.17µs        ? ?/sec    1.09     22.4±0.11µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/50pct_nulls                            1.01    123.2±1.17µs        ? ?/sec    1.00    121.9±0.74µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/50pct_true                             1.01    162.6±0.33µs        ? ?/sec    1.00    161.2±0.75µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/90pct_true                             1.00     64.8±0.26µs        ? ?/sec    1.00     64.6±0.22µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/99pct_true                             1.00     21.7±0.19µs        ? ?/sec    1.07     23.2±0.15µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/all_false                              1.00     15.8±0.12µs        ? ?/sec    1.14     18.0±0.08µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/all_true                               1.00     15.5±0.16µs        ? ?/sec    1.15     17.8±0.17µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/10pct_true                   1.00     33.2±1.29ns        ? ?/sec    1.01     33.4±0.58ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/1pct_true                    1.00     32.4±0.74ns        ? ?/sec    1.03     33.4±0.58ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/50pct_nulls                  1.00     32.4±0.34ns        ? ?/sec    1.03     33.5±0.62ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/50pct_true                   1.00     32.6±0.80ns        ? ?/sec    1.03     33.4±0.57ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/90pct_true                   1.00     33.1±1.23ns        ? ?/sec    1.01     33.5±0.87ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/99pct_true                   1.00     33.3±0.16ns        ? ?/sec    1.01     33.7±0.76ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/all_false                    1.00     33.4±0.33ns        ? ?/sec    1.01     33.5±0.77ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/all_true                     1.00     32.4±0.17ns        ? ?/sec    1.03     33.4±0.57ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/10pct_true                   1.00     31.0±2.05ns        ? ?/sec    1.00     30.9±0.55ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/1pct_true                    1.00     30.5±0.30ns        ? ?/sec    1.01     30.8±0.54ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/50pct_nulls                  1.00     30.6±0.46ns        ? ?/sec    1.02     31.1±0.54ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/50pct_true                   1.00     30.7±0.30ns        ? ?/sec    1.01     31.1±0.54ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/90pct_true                   1.00     30.5±0.32ns        ? ?/sec    1.01     30.8±0.54ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/99pct_true                   1.00     30.7±0.22ns        ? ?/sec    1.01     30.9±0.55ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/all_false                    1.00     30.5±0.30ns        ? ?/sec    1.02     31.0±0.61ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/all_true                     1.00     30.7±0.22ns        ? ?/sec    1.01     31.0±0.69ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/10pct_true             1.02     15.8±0.12µs        ? ?/sec    1.00     15.5±0.23µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/1pct_true              1.01     10.5±0.09µs        ? ?/sec    1.00     10.4±0.07µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/50pct_nulls            1.04     26.3±0.27µs        ? ?/sec    1.00     25.2±0.22µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/50pct_true             1.01     38.5±0.45µs        ? ?/sec    1.00     38.1±0.53µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/90pct_true             1.00     18.8±0.15µs        ? ?/sec    1.00     18.8±0.12µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/99pct_true             1.03     13.8±1.45µs        ? ?/sec    1.00     13.4±0.12µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/all_false              1.00   900.0±53.70ns        ? ?/sec    1.00   901.7±37.35ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/all_true               1.00     12.8±0.08µs        ? ?/sec    1.00     12.8±0.08µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/10pct_true                          1.03     33.9±0.41µs        ? ?/sec    1.00     33.0±0.30µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/1pct_true                           1.00     15.0±0.16µs        ? ?/sec    1.00     15.0±0.19µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/50pct_nulls                         1.00     54.5±0.18µs        ? ?/sec    1.00     54.5±0.67µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/50pct_true                          1.03     72.6±0.50µs        ? ?/sec    1.00     70.3±0.35µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/90pct_true                          1.05     33.9±0.52µs        ? ?/sec    1.00     32.2±1.70µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/99pct_true                          1.02     16.2±0.29µs        ? ?/sec    1.00     15.9±0.11µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/all_false                           1.02      2.6±0.03µs        ? ?/sec    1.00      2.6±0.05µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/all_true                            1.01      3.1±0.06µs        ? ?/sec    1.00      3.1±0.04µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/10pct_true                    1.00     18.4±0.10µs        ? ?/sec    1.01     18.6±0.41µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/1pct_true                     1.01     13.4±0.26µs        ? ?/sec    1.00     13.3±0.13µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/50pct_nulls                   1.00     27.5±0.33µs        ? ?/sec    1.00     27.4±0.19µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/50pct_true                    1.01     38.5±0.24µs        ? ?/sec    1.00     38.1±0.26µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/90pct_true                    1.02     16.2±0.04µs        ? ?/sec    1.00     16.0±0.62µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/99pct_true                    1.00     10.8±0.04µs        ? ?/sec    1.00     10.8±0.34µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/all_false                     1.00     12.8±0.09µs        ? ?/sec    1.00     12.9±0.11µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/all_true                      1.13  1167.2±53.41ns        ? ?/sec    1.00  1037.5±42.34ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/10pct_true                           1.02     63.2±1.55µs        ? ?/sec    1.00     61.7±0.80µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/1pct_true                            1.00     20.6±0.22µs        ? ?/sec    1.08     22.2±0.17µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/50pct_nulls                          1.03    124.6±0.64µs        ? ?/sec    1.00    121.2±2.09µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/50pct_true                           1.01    163.6±1.62µs        ? ?/sec    1.00    162.0±2.19µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/90pct_true                           1.00     64.6±0.28µs        ? ?/sec    1.00     64.7±3.20µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/99pct_true                           1.00     21.4±0.17µs        ? ?/sec    1.08     23.1±0.16µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/all_false                            1.00     15.6±0.19µs        ? ?/sec    1.15     18.0±0.08µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/all_true                             1.00     15.7±0.19µs        ? ?/sec    1.14     17.8±0.17µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/10pct_true                 1.00     32.4±0.49ns        ? ?/sec    1.03     33.3±0.34ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/1pct_true                  1.00     32.5±0.35ns        ? ?/sec    1.03     33.3±0.36ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/50pct_nulls                1.00     32.4±0.21ns        ? ?/sec    1.03     33.3±0.32ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/50pct_true                 1.00     32.3±0.14ns        ? ?/sec    1.03     33.4±0.72ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/90pct_true                 1.00     33.3±0.15ns        ? ?/sec    1.03     34.3±0.64ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/99pct_true                 1.00     32.4±0.33ns        ? ?/sec    1.03     33.4±0.46ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/all_false                  1.00     33.4±1.78ns        ? ?/sec    1.02     33.9±0.16ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/all_true                   1.00     32.5±0.53ns        ? ?/sec    1.05     34.0±0.52ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/10pct_true                 1.00     30.8±0.56ns        ? ?/sec    1.01     31.1±0.61ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/1pct_true                  1.00     30.6±0.48ns        ? ?/sec    1.01     30.8±0.53ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/50pct_nulls                1.00     30.5±0.31ns        ? ?/sec    1.01     30.8±0.54ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/50pct_true                 1.00     30.5±0.30ns        ? ?/sec    1.01     30.9±0.79ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/90pct_true                 1.00     30.7±0.55ns        ? ?/sec    1.01     31.1±0.56ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/99pct_true                 1.00     30.7±0.39ns        ? ?/sec    1.00     30.8±0.51ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/all_false                  1.00     30.6±0.28ns        ? ?/sec    1.01     31.0±0.54ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/all_true                   1.00     30.5±0.31ns        ? ?/sec    1.02     31.0±0.54ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/10pct_true           1.02     15.8±0.03µs        ? ?/sec    1.00     15.5±0.03µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/1pct_true            1.01     10.4±0.01µs        ? ?/sec    1.00     10.4±0.04µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/50pct_nulls          1.04     26.3±0.28µs        ? ?/sec    1.00     25.3±0.22µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/50pct_true           1.01     38.4±0.21µs        ? ?/sec    1.00     38.0±0.10µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/90pct_true           1.00     18.6±0.17µs        ? ?/sec    1.00     18.7±0.12µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/99pct_true           1.01     13.4±0.30µs        ? ?/sec    1.00     13.3±0.10µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/all_false            1.08   957.0±36.23ns        ? ?/sec    1.00   889.0±39.95ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/all_true             1.00     12.7±0.09µs        ? ?/sec    1.00     12.6±0.07µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/10pct_true                        1.00     33.4±0.08µs        ? ?/sec    1.01     33.7±0.10µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/1pct_true                         1.01     15.4±0.02µs        ? ?/sec    1.00     15.3±0.03µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/50pct_nulls                       1.00     54.6±0.32µs        ? ?/sec    1.00     54.7±3.59µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/50pct_true                        1.00     70.0±0.23µs        ? ?/sec    1.00     70.1±0.14µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/90pct_true                        1.02     32.4±0.09µs        ? ?/sec    1.00     31.9±0.07µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/99pct_true                        1.01     16.0±0.12µs        ? ?/sec    1.00     15.9±0.03µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/all_false                         1.01      2.9±0.14µs        ? ?/sec    1.00      2.9±0.07µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/all_true                          1.01      3.1±0.08µs        ? ?/sec    1.00      3.1±0.04µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/10pct_true                  1.01     18.6±0.39µs        ? ?/sec    1.00     18.5±0.10µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/1pct_true                   1.00     13.3±0.12µs        ? ?/sec    1.00     13.2±0.11µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/50pct_nulls                 1.00     27.3±0.37µs        ? ?/sec    1.01     27.6±0.25µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/50pct_true                  1.02     38.7±0.18µs        ? ?/sec    1.00     38.1±0.12µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/90pct_true                  1.02     16.2±0.02µs        ? ?/sec    1.00     15.8±0.05µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/99pct_true                  1.01     10.7±0.03µs        ? ?/sec    1.00     10.6±0.02µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/all_false                   1.00     12.8±0.08µs        ? ?/sec    1.00     12.7±0.15µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/all_true                    1.00  1088.3±45.55ns        ? ?/sec    1.02  1109.2±30.83ns        ? ?/sec
zip_8192_from_string_views size (10..100)/array_vs_array/10pct_true                     1.01     53.3±0.31µs        ? ?/sec    1.00     52.7±0.42µs        ? ?/sec
zip_8192_from_string_views size (10..100)/array_vs_array/1pct_true                      1.00     16.6±0.18µs        ? ?/sec    1.16     19.3±0.23µs        ? ?/sec
zip_8192_from_string_views size (10..100)/array_vs_array/50pct_nulls                    1.00     97.6±0.41µs        ? ?/sec    1.01     98.5±0.29µs        ? ?/sec
zip_8192_from_string_views size (10..100)/array_vs_array/50pct_true                     1.01    128.8±2.25µs        ? ?/sec    1.00    127.8±0.62µs        ? ?/sec
zip_8192_from_string_views size (10..100)/array_vs_array/90pct_true                     1.00     53.8±0.23µs        ? ?/sec    1.02     54.7±3.51µs        ? ?/sec
zip_8192_from_string_views size (10..100)/array_vs_array/99pct_true                     1.00     17.7±0.50µs        ? ?/sec    1.14     20.2±1.01µs        ? ?/sec
zip_8192_from_string_views size (10..100)/array_vs_array/all_false                      1.00     13.3±0.06µs        ? ?/sec    1.23     16.3±0.43µs        ? ?/sec
zip_8192_from_string_views size (10..100)/array_vs_array/all_true                       1.00     13.4±0.67µs        ? ?/sec    1.21     16.3±0.61µs        ? ?/sec
zip_8192_from_string_views size (10..100)/array_vs_non_null_scalar/10pct_true           1.04     34.3±1.33ns        ? ?/sec    1.00     33.1±0.18ns        ? ?/sec
zip_8192_from_string_views size (10..100)/array_vs_non_null_scalar/1pct_true            1.02     33.9±1.46ns        ? ?/sec    1.00     33.1±0.16ns        ? ?/sec
zip_8192_from_string_views size (10..100)/array_vs_non_null_scalar/50pct_nulls          1.02     33.9±1.43ns        ? ?/sec    1.00     33.2±0.32ns        ? ?/sec
zip_8192_from_string_views size (10..100)/array_vs_non_null_scalar/50pct_true           1.02     33.9±1.45ns        ? ?/sec    1.00     33.2±0.21ns        ? ?/sec
zip_8192_from_string_views size (10..100)/array_vs_non_null_scalar/90pct_true           1.02     33.9±1.44ns        ? ?/sec    1.00     33.3±0.70ns        ? ?/sec
zip_8192_from_string_views size (10..100)/array_vs_non_null_scalar/99pct_true           1.04     34.4±1.34ns        ? ?/sec    1.00     33.2±0.22ns        ? ?/sec
zip_8192_from_string_views size (10..100)/array_vs_non_null_scalar/all_false            1.04     34.4±1.34ns        ? ?/sec    1.00     33.2±0.16ns        ? ?/sec
zip_8192_from_string_views size (10..100)/array_vs_non_null_scalar/all_true             1.02     33.9±1.51ns        ? ?/sec    1.00     33.2±0.22ns        ? ?/sec
zip_8192_from_string_views size (10..100)/non_null_scalar_vs_array/10pct_true           1.01     30.9±0.51ns        ? ?/sec    1.00     30.6±1.05ns        ? ?/sec
zip_8192_from_string_views size (10..100)/non_null_scalar_vs_array/1pct_true            1.01     30.8±0.49ns        ? ?/sec    1.00     30.5±0.19ns        ? ?/sec
zip_8192_from_string_views size (10..100)/non_null_scalar_vs_array/50pct_nulls          1.01     30.8±0.50ns        ? ?/sec    1.00     30.5±0.18ns        ? ?/sec
zip_8192_from_string_views size (10..100)/non_null_scalar_vs_array/50pct_true           1.01     30.9±0.56ns        ? ?/sec    1.00     30.5±0.47ns        ? ?/sec
zip_8192_from_string_views size (10..100)/non_null_scalar_vs_array/90pct_true           1.03     31.4±0.76ns        ? ?/sec    1.00     30.5±0.20ns        ? ?/sec
zip_8192_from_string_views size (10..100)/non_null_scalar_vs_array/99pct_true           1.01     30.9±0.71ns        ? ?/sec    1.00     30.6±0.93ns        ? ?/sec
zip_8192_from_string_views size (10..100)/non_null_scalar_vs_array/all_false            1.01     30.8±0.49ns        ? ?/sec    1.00     30.4±0.17ns        ? ?/sec
zip_8192_from_string_views size (10..100)/non_null_scalar_vs_array/all_true             1.02     31.1±0.66ns        ? ?/sec    1.00     30.5±1.01ns        ? ?/sec
zip_8192_from_string_views size (10..100)/non_null_scalar_vs_null_scalar/10pct_true     1.00      5.4±0.03µs        ? ?/sec    1.00      5.4±0.02µs        ? ?/sec
zip_8192_from_string_views size (10..100)/non_null_scalar_vs_null_scalar/1pct_true      1.00      5.4±0.08µs        ? ?/sec    1.00      5.4±0.02µs        ? ?/sec
zip_8192_from_string_views size (10..100)/non_null_scalar_vs_null_scalar/50pct_nulls    1.00      5.5±0.04µs        ? ?/sec    1.00      5.5±0.09µs        ? ?/sec
zip_8192_from_string_views size (10..100)/non_null_scalar_vs_null_scalar/50pct_true     1.00      5.4±0.07µs        ? ?/sec    1.00      5.4±0.07µs        ? ?/sec
zip_8192_from_string_views size (10..100)/non_null_scalar_vs_null_scalar/90pct_true     1.00      5.4±0.10µs        ? ?/sec    1.00      5.4±0.01µs        ? ?/sec
zip_8192_from_string_views size (10..100)/non_null_scalar_vs_null_scalar/99pct_true     1.00      5.4±0.08µs        ? ?/sec    1.00      5.4±0.07µs        ? ?/sec
zip_8192_from_string_views size (10..100)/non_null_scalar_vs_null_scalar/all_false      1.15      2.7±0.40µs        ? ?/sec    1.00      2.3±0.27µs        ? ?/sec
zip_8192_from_string_views size (10..100)/non_null_scalar_vs_null_scalar/all_true       1.00      5.5±0.04µs        ? ?/sec    1.00      5.5±0.01µs        ? ?/sec
zip_8192_from_string_views size (10..100)/non_nulls_scalars/10pct_true                  1.02     19.4±0.35µs        ? ?/sec    1.00     19.0±0.08µs        ? ?/sec
zip_8192_from_string_views size (10..100)/non_nulls_scalars/1pct_true                   1.00      5.0±0.06µs        ? ?/sec    1.00      5.0±0.08µs        ? ?/sec
zip_8192_from_string_views size (10..100)/non_nulls_scalars/50pct_nulls                 1.00     39.2±0.14µs        ? ?/sec    1.01     39.5±0.43µs        ? ?/sec
zip_8192_from_string_views size (10..100)/non_nulls_scalars/50pct_true                  1.00     57.0±0.46µs        ? ?/sec    1.01     57.3±0.28µs        ? ?/sec
zip_8192_from_string_views size (10..100)/non_nulls_scalars/90pct_true                  1.00     19.8±0.07µs        ? ?/sec    1.01     19.9±0.50µs        ? ?/sec
zip_8192_from_string_views size (10..100)/non_nulls_scalars/99pct_true                  1.00      5.3±0.08µs        ? ?/sec    1.01      5.4±0.22µs        ? ?/sec
zip_8192_from_string_views size (10..100)/non_nulls_scalars/all_false                   1.00      5.3±0.02µs        ? ?/sec    1.00      5.3±0.07µs        ? ?/sec
zip_8192_from_string_views size (10..100)/non_nulls_scalars/all_true                    1.00      5.3±0.04µs        ? ?/sec    1.00      5.3±0.02µs        ? ?/sec
zip_8192_from_string_views size (10..100)/null_vs_non_null_scalar/10pct_true            1.00      5.5±0.03µs        ? ?/sec    1.00      5.6±0.02µs        ? ?/sec
zip_8192_from_string_views size (10..100)/null_vs_non_null_scalar/1pct_true             1.00      5.5±0.01µs        ? ?/sec    1.00      5.6±0.03µs        ? ?/sec
zip_8192_from_string_views size (10..100)/null_vs_non_null_scalar/50pct_nulls           1.00      5.6±0.09µs        ? ?/sec    1.00      5.6±0.02µs        ? ?/sec
zip_8192_from_string_views size (10..100)/null_vs_non_null_scalar/50pct_true            1.00      5.5±0.01µs        ? ?/sec    1.00      5.6±0.07µs        ? ?/sec
zip_8192_from_string_views size (10..100)/null_vs_non_null_scalar/90pct_true            1.00      5.6±0.01µs        ? ?/sec    1.00      5.5±0.01µs        ? ?/sec
zip_8192_from_string_views size (10..100)/null_vs_non_null_scalar/99pct_true            1.02      5.6±0.03µs        ? ?/sec    1.00      5.5±0.01µs        ? ?/sec
zip_8192_from_string_views size (10..100)/null_vs_non_null_scalar/all_false             1.00      5.5±0.06µs        ? ?/sec    1.00      5.5±0.01µs        ? ?/sec
zip_8192_from_string_views size (10..100)/null_vs_non_null_scalar/all_true              1.09      2.6±0.31µs        ? ?/sec    1.00      2.4±0.28µs        ? ?/sec
zip_8192_from_string_views size (3..10)/array_vs_array/10pct_true                       1.00     45.9±0.13µs        ? ?/sec    1.02     46.7±0.43µs        ? ?/sec
zip_8192_from_string_views size (3..10)/array_vs_array/1pct_true                        1.00     12.5±0.05µs        ? ?/sec    1.31     16.4±0.23µs        ? ?/sec
zip_8192_from_string_views size (3..10)/array_vs_array/50pct_nulls                      1.01     87.8±0.35µs        ? ?/sec    1.00     87.4±2.82µs        ? ?/sec
zip_8192_from_string_views size (3..10)/array_vs_array/50pct_true                       1.00    117.4±1.01µs        ? ?/sec    1.00    117.9±0.43µs        ? ?/sec
zip_8192_from_string_views size (3..10)/array_vs_array/90pct_true                       1.00     47.1±0.38µs        ? ?/sec    1.03     48.5±1.55µs        ? ?/sec
zip_8192_from_string_views size (3..10)/array_vs_array/99pct_true                       1.00     13.7±0.07µs        ? ?/sec    1.29     17.7±0.07µs        ? ?/sec
zip_8192_from_string_views size (3..10)/array_vs_array/all_false                        1.00      8.6±0.04µs        ? ?/sec    1.48     12.7±0.03µs        ? ?/sec
zip_8192_from_string_views size (3..10)/array_vs_array/all_true                         1.00      8.6±0.11µs        ? ?/sec    1.50     12.9±0.19µs        ? ?/sec
zip_8192_from_string_views size (3..10)/array_vs_non_null_scalar/10pct_true             1.01     33.5±1.67ns        ? ?/sec    1.00     33.2±0.32ns        ? ?/sec
zip_8192_from_string_views size (3..10)/array_vs_non_null_scalar/1pct_true              1.02     34.6±1.87ns        ? ?/sec    1.00     33.9±0.75ns        ? ?/sec
zip_8192_from_string_views size (3..10)/array_vs_non_null_scalar/50pct_nulls            1.04     34.6±1.91ns        ? ?/sec    1.00     33.1±0.16ns        ? ?/sec
zip_8192_from_string_views size (3..10)/array_vs_non_null_scalar/50pct_true             1.03     34.9±1.86ns        ? ?/sec    1.00     33.9±0.22ns        ? ?/sec
zip_8192_from_string_views size (3..10)/array_vs_non_null_scalar/90pct_true             1.00     33.5±1.69ns        ? ?/sec    1.01     33.9±0.70ns        ? ?/sec
zip_8192_from_string_views size (3..10)/array_vs_non_null_scalar/99pct_true             1.01     33.6±1.89ns        ? ?/sec    1.00     33.2±0.28ns        ? ?/sec
zip_8192_from_string_views size (3..10)/array_vs_non_null_scalar/all_false              1.01     33.5±1.67ns        ? ?/sec    1.00     33.1±0.17ns        ? ?/sec
zip_8192_from_string_views size (3..10)/array_vs_non_null_scalar/all_true               1.02     34.7±2.02ns        ? ?/sec    1.00     33.9±0.34ns        ? ?/sec
zip_8192_from_string_views size (3..10)/non_null_scalar_vs_array/10pct_true             1.02     31.4±0.79ns        ? ?/sec    1.00     30.7±0.64ns        ? ?/sec
zip_8192_from_string_views size (3..10)/non_null_scalar_vs_array/1pct_true              1.03     31.3±0.76ns        ? ?/sec    1.00     30.5±0.18ns        ? ?/sec
zip_8192_from_string_views size (3..10)/non_null_scalar_vs_array/50pct_nulls            1.00     30.8±0.58ns        ? ?/sec    1.00     30.7±0.20ns        ? ?/sec
zip_8192_from_string_views size (3..10)/non_null_scalar_vs_array/50pct_true             1.03     31.4±0.88ns        ? ?/sec    1.00     30.6±0.76ns        ? ?/sec
zip_8192_from_string_views size (3..10)/non_null_scalar_vs_array/90pct_true             1.01     30.8±0.56ns        ? ?/sec    1.00     30.5±0.19ns        ? ?/sec
zip_8192_from_string_views size (3..10)/non_null_scalar_vs_array/99pct_true             1.03     31.5±0.96ns        ? ?/sec    1.00     30.4±0.17ns        ? ?/sec
zip_8192_from_string_views size (3..10)/non_null_scalar_vs_array/all_false              1.03     31.4±0.79ns        ? ?/sec    1.00     30.4±0.16ns        ? ?/sec
zip_8192_from_string_views size (3..10)/non_null_scalar_vs_array/all_true               1.03     31.4±0.78ns        ? ?/sec    1.00     30.5±0.46ns        ? ?/sec
zip_8192_from_string_views size (3..10)/non_null_scalar_vs_null_scalar/10pct_true       1.00      5.4±0.08µs        ? ?/sec    1.00      5.4±0.01µs        ? ?/sec
zip_8192_from_string_views size (3..10)/non_null_scalar_vs_null_scalar/1pct_true        1.00      5.4±0.02µs        ? ?/sec    1.00      5.4±0.01µs        ? ?/sec
zip_8192_from_string_views size (3..10)/non_null_scalar_vs_null_scalar/50pct_nulls      1.00      5.5±0.10µs        ? ?/sec    1.00      5.5±0.02µs        ? ?/sec
zip_8192_from_string_views size (3..10)/non_null_scalar_vs_null_scalar/50pct_true       1.00      5.4±0.12µs        ? ?/sec    1.00      5.4±0.07µs        ? ?/sec
zip_8192_from_string_views size (3..10)/non_null_scalar_vs_null_scalar/90pct_true       1.00      5.4±0.07µs        ? ?/sec    1.00      5.4±0.01µs        ? ?/sec
zip_8192_from_string_views size (3..10)/non_null_scalar_vs_null_scalar/99pct_true       1.00      5.4±0.05µs        ? ?/sec    1.00      5.4±0.01µs        ? ?/sec
zip_8192_from_string_views size (3..10)/non_null_scalar_vs_null_scalar/all_false        1.03      2.5±0.40µs        ? ?/sec    1.00      2.4±0.26µs        ? ?/sec
zip_8192_from_string_views size (3..10)/non_null_scalar_vs_null_scalar/all_true         1.00      5.4±0.06µs        ? ?/sec    1.00      5.4±0.01µs        ? ?/sec
zip_8192_from_string_views size (3..10)/non_nulls_scalars/10pct_true                    1.01     19.2±0.42µs        ? ?/sec    1.00     19.0±0.10µs        ? ?/sec
zip_8192_from_string_views size (3..10)/non_nulls_scalars/1pct_true                     1.00      4.7±0.03µs        ? ?/sec    1.00      4.7±0.04µs        ? ?/sec
zip_8192_from_string_views size (3..10)/non_nulls_scalars/50pct_nulls                   1.00     39.0±0.37µs        ? ?/sec    1.01     39.3±1.06µs        ? ?/sec
zip_8192_from_string_views size (3..10)/non_nulls_scalars/50pct_true                    1.00     56.8±0.24µs        ? ?/sec    1.01     57.1±0.39µs        ? ?/sec
zip_8192_from_string_views size (3..10)/non_nulls_scalars/90pct_true                    1.01     19.7±0.13µs        ? ?/sec    1.00     19.6±0.06µs        ? ?/sec
zip_8192_from_string_views size (3..10)/non_nulls_scalars/99pct_true                    1.00      5.2±0.05µs        ? ?/sec    1.00      5.2±0.02µs        ? ?/sec
zip_8192_from_string_views size (3..10)/non_nulls_scalars/all_false                     1.00      5.3±0.01µs        ? ?/sec    1.00      5.3±0.03µs        ? ?/sec
zip_8192_from_string_views size (3..10)/non_nulls_scalars/all_true                      1.00      5.3±0.02µs        ? ?/sec    1.00      5.3±0.01µs        ? ?/sec
zip_8192_from_string_views size (3..10)/null_vs_non_null_scalar/10pct_true              1.00      5.5±0.02µs        ? ?/sec    1.00      5.5±0.02µs        ? ?/sec
zip_8192_from_string_views size (3..10)/null_vs_non_null_scalar/1pct_true               1.00      5.5±0.09µs        ? ?/sec    1.00      5.5±0.16µs        ? ?/sec
zip_8192_from_string_views size (3..10)/null_vs_non_null_scalar/50pct_nulls             1.00      5.6±0.05µs        ? ?/sec    1.00      5.6±0.01µs        ? ?/sec
zip_8192_from_string_views size (3..10)/null_vs_non_null_scalar/50pct_true              1.00      5.5±0.01µs        ? ?/sec    1.00      5.5±0.02µs        ? ?/sec
zip_8192_from_string_views size (3..10)/null_vs_non_null_scalar/90pct_true              1.00      5.5±0.04µs        ? ?/sec    1.00      5.5±0.02µs        ? ?/sec
zip_8192_from_string_views size (3..10)/null_vs_non_null_scalar/99pct_true              1.00      5.5±0.01µs        ? ?/sec    1.00      5.5±0.06µs        ? ?/sec
zip_8192_from_string_views size (3..10)/null_vs_non_null_scalar/all_false               1.00      5.5±0.08µs        ? ?/sec    1.00      5.5±0.16µs        ? ?/sec
zip_8192_from_string_views size (3..10)/null_vs_non_null_scalar/all_true                1.12      2.6±0.23µs        ? ?/sec    1.00      2.3±0.14µs        ? ?/sec

Contributor

@alamb alamb left a comment

Thanks @cetra3 and @rluvaton

I agree with @rluvaton's comments

Your PR comments say this:

"There are a number of places within the code that go from MutableBuffer to Vec and then to Buffer. This causes extra allocations and is ripe for a performance refactor."

But I can't figure out where this extra allocation is (see comments inline). I am probably missing something.

  // invariant: len <= capacity
  len: usize,
- layout: Layout,
+ capacity: usize,

What is the rationale for inlining Layout here?

It seems that Layout (https://doc.rust-lang.org/beta/std/alloc/struct.Layout.html) has the same representation and already has accessors that return the capacity and alignment as usize 🤔

https://doc.rust-lang.org/beta/std/alloc/struct.Layout.html#method.size
https://doc.rust-lang.org/beta/std/alloc/struct.Layout.html#method.align

If we reverted this change and kept Layout, I think the diff would be easier to understand.
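
The point about the accessors can be checked with the standard library alone. The sketch below is not code from this PR: `split_layout` and `rebuild_layout` are hypothetical helper names, and the 1024-byte / 64-byte-aligned layout is an arbitrary illustration. It shows that `Layout::size()` and `Layout::align()` return plain `usize` values and that a `Layout` rebuilt from those two values compares equal to the original.

```rust
use std::alloc::Layout;
use std::mem::size_of;

/// Hypothetical helper: pull the two `usize` values out of a `Layout`.
fn split_layout(layout: Layout) -> (usize, usize) {
    (layout.size(), layout.align())
}

/// Hypothetical helper: rebuild a `Layout` from separately stored fields.
/// Panics if the pair is not a valid size/alignment combination.
fn rebuild_layout(capacity: usize, align: usize) -> Layout {
    Layout::from_size_align(capacity, align).expect("valid size/align pair")
}

fn main() {
    // Arbitrary example values, not taken from the PR.
    let layout = Layout::from_size_align(1024, 64).expect("valid size/align pair");

    let (capacity, align) = split_layout(layout);
    assert_eq!(capacity, 1024);
    assert_eq!(align, 64);

    // Storing the two fields separately loses no information.
    assert_eq!(rebuild_layout(capacity, align), layout);

    // Printed rather than asserted: the in-memory size of `Layout`
    // is an implementation detail of the standard library.
    println!(
        "size_of::<Layout>() = {}, 2 * size_of::<usize>() = {}",
        size_of::<Layout>(),
        2 * size_of::<usize>()
    );
}
```

Whether reading the fields directly is measurably cheaper than calling the accessors is a question for the benchmarks; this sketch only shows that the two representations carry the same information.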

/// Only used in cold paths (alloc/dealloc/realloc).
#[inline]
fn layout(&self) -> Layout {
    debug_assert!(self.align.is_power_of_two());

See comment above -- I don't understand the rationale for inlining the Layout

Comment on lines 160 to 165
  let values: ScalarBuffer<T::Native> = indices
      .iter()
      .map(|(a, b)| interleaved.arrays[*a].value(*b))
-     .collect::<Vec<_>>();
+     .collect();

- let array = PrimitiveArray::<T>::try_new(values.into(), interleaved.nulls)?;
+ let array = PrimitiveArray::<T>::try_new(values, interleaved.nulls)?;

I don't understand why this would help.

The old code went through ScalarBuffer::from:

impl<T: ArrowNativeType> From<Vec<T>> for ScalarBuffer<T> {
    fn from(value: Vec<T>) -> Self {
        Self {
            buffer: Buffer::from_vec(value),
            phantom: Default::default(),
        }
    }
}

That calls Buffer::from_vec, which makes a MutableBuffer by taking over the Vec allocation:

fn from(value: Vec<T>) -> Self {
    // Safety
    // Vec::as_ptr guaranteed to not be null and ArrowNativeType are trivially transmutable
    let data = unsafe { NonNull::new_unchecked(value.as_ptr() as _) };
    let len = value.len() * mem::size_of::<T>();
    // Safety
    // Vec guaranteed to have a valid layout matching that of `Layout::array`
    // This is based on `RawVec::current_memory`
    let layout = unsafe { Layout::array::<T>(value.capacity()).unwrap_unchecked() };
    mem::forget(value);
    Self {
        data,
        len,
        capacity: layout.size(),
        align: layout.align(),
        #[cfg(feature = "pool")]
        reservation: std::sync::Mutex::new(None),
    }
}

That MutableBuffer is then turned into a Buffer.

So, TL;DR: I don't understand why this saves a memory copy.

Or is the theory that all the shenanigans to make Vec -> MutableBuffer -> Bytes/Buffer can be reduced?
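
The "taking over the Vec allocation" step can be illustrated without arrow at all. The following is a minimal std-only sketch, not the arrow-buffer implementation: `OwnedBytes` is a hypothetical stand-in for a byte buffer, and the `T: Copy` bound is an assumption used here so that elements never need dropping. It adopts the allocation of a `Vec<T>` and shows that the data pointer is unchanged, which is the sense in which no copy happens.

```rust
use std::alloc::{dealloc, Layout};
use std::mem::{self, ManuallyDrop};
use std::ptr::NonNull;

/// Hypothetical byte buffer that adopts a Vec's allocation (illustration only).
struct OwnedBytes {
    data: NonNull<u8>,
    len: usize,     // length in bytes
    layout: Layout, // layout the Vec originally allocated with
}

impl OwnedBytes {
    /// Takes ownership of the Vec's memory without copying it.
    fn from_vec<T: Copy>(value: Vec<T>) -> Self {
        // Prevent the Vec destructor from freeing the allocation we adopt.
        let mut value = ManuallyDrop::new(value);
        // Safety: Vec's pointer is never null (it is dangling when capacity is 0).
        let data = unsafe { NonNull::new_unchecked(value.as_mut_ptr() as *mut u8) };
        let len = value.len() * mem::size_of::<T>();
        // Same layout the Vec used for its allocation.
        let layout = Layout::array::<T>(value.capacity()).expect("capacity fits in a Layout");
        Self { data, len, layout }
    }

    fn as_ptr(&self) -> *const u8 {
        self.data.as_ptr()
    }

    fn len(&self) -> usize {
        self.len
    }
}

impl Drop for OwnedBytes {
    fn drop(&mut self) {
        // A zero-sized layout means nothing was ever allocated.
        if self.layout.size() != 0 {
            // Safety: data was allocated by the global allocator with this layout.
            unsafe { dealloc(self.data.as_ptr(), self.layout) };
        }
    }
}

fn main() {
    let v: Vec<u32> = vec![1, 2, 3, 4];
    let original_ptr = v.as_ptr() as *const u8;

    let bytes = OwnedBytes::from_vec(v);

    // Same allocation, no copy: the pointer is unchanged.
    assert_eq!(bytes.as_ptr(), original_ptr);
    assert_eq!(bytes.len(), 4 * mem::size_of::<u32>());
}
```

If the real conversion works along these lines, any saving from collecting directly would have to come from somewhere other than a skipped memcpy, for example from the intermediate bookkeeping or from capacity differences.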


Basically, I think there are a bunch of places in the code that make a ScalarBuffer from a Vec and assume that is fast. If we can make that conversion faster, maybe we can optimize that path (in one place) rather than having to change all call sites to collect into a ScalarBuffer directly 🤔

@Dandandan
Contributor

Based on the benchmark runner, this does not yet seem to show consistent improvements.

@cetra3
Contributor Author

cetra3 commented Feb 11, 2026

Let me address the comments today.

There is one other reason I would like to move away from Vec: if we are making adjustments to how we cap memory in DataFusion (DF), it will be easier to integrate with a memory pool if all our internals use data structures that support memory pools.

@cetra3
Contributor Author

cetra3 commented Feb 12, 2026

OK I've slimmed this down to just the builder changes and left the changes around adjusting the kernels to a later PR.

I think that at some point, if we are doing memory accounting, we need to be able to ensure we don't lose the memory pool provenance. That was the intent behind some of the changes, but I can raise it in a separate PR.

I also think that the allocation strategies for Vec and MutableBuffer are different: Vec grows its allocation in powers of two, while MutableBuffer uses round_upto_multiple_of_64.
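
To make the rounding half of that comparison concrete, here is a small std-only sketch. The function below reuses the name `round_upto_multiple_of_64` from the comment, but it is a standalone re-implementation written for illustration, so arrow's own helper may differ in detail. The Vec capacities are only printed, not asserted, because the standard library does not document a specific growth strategy.

```rust
/// Standalone re-implementation for illustration only:
/// rounds `num` up to the next multiple of 64.
fn round_upto_multiple_of_64(num: usize) -> usize {
    num.checked_add(63).expect("overflow while rounding") & !63
}

fn main() {
    // Requested byte counts and the capacity a 64-byte rounding policy yields.
    for required in [1usize, 64, 65, 1000] {
        println!(
            "{required:>5} bytes requested -> {:>5} bytes after rounding",
            round_upto_multiple_of_64(required)
        );
    }

    // For comparison, observe how a Vec<u8> grows on this toolchain.
    // The exact capacities are an implementation detail of std.
    let mut v: Vec<u8> = Vec::new();
    let mut last = v.capacity();
    for i in 0..200u8 {
        v.push(i);
        if v.capacity() != last {
            last = v.capacity();
            println!("Vec<u8> grew to capacity {last} at len {}", v.len());
        }
    }
}
```

Running both loops side by side shows where the two policies produce different capacities for the same number of elements, which matters when a Vec's allocation is later adopted by a buffer.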

Labels

arrow Changes to the arrow crate performance

6 participants