Sort performance by cetra3 · Pull Request #9407 · apache/arrow-rs

cetra3 · 2026-02-13T07:47:11Z

Which issue does this PR close?

N/A

Rationale for this change

There are a few small wins we can get adjusting how sort works by avoiding some copies in memory & eliminates some unsafe code around the partition scan. I tried a few attempts at bringing down the numbers of the partition scan even more by chunking the bitmaps, but this "simpler" implementation looks like it compiles/branches pretty well

I also optimised the sort_native_type to avoid some double copies and bounds checks, which gets us a nice +17% speedup in some of the primitive benchmarks including null values (i.e, sort i32 nulls 2^12).

Admittedly after changing this code it doesn't even look like DataFusion uses that path... it only uses lexsort_to_indices.

One other extension I thought of was for smaller arrays, whether it makes sense to delegate to the sort_native_type and then slice it, rather than do the indices rigmarole

What changes are included in this PR?

Adjustments to two inner functions:

sort_native_type
& partition_validity_scan

Are these changes tested?

Yes via:

cargo test -p arrow-ord
cargo +nightly miri test -p arrow-ord

& benched with main via:

cargo bench --features test_utils -p arrow --bench sort_kernel -- --baseline original

Are there any user-facing changes?

Nope

Benchmark Results against `main`

Here's a comparison against main on my PC.

It would be interesting to run this via the standard benchmark and see if we have the same improvements there:

sort i32 2^10           time:   [3.6062 µs 3.6126 µs 3.6198 µs]
                        change: [−0.4850% −0.2379% +0.0180%] (p = 0.08 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

sort i32 to indices 2^10
                        time:   [5.1211 µs 5.1294 µs 5.1390 µs]
                        change: [−3.1047% −2.9580% −2.7578%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

sort i32 2^12           time:   [16.895 µs 16.907 µs 16.919 µs]
                        change: [−0.0119% +0.0442% +0.1068%] (p = 0.17 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

sort i32 to indices 2^12
                        time:   [23.050 µs 23.071 µs 23.090 µs]
                        change: [−5.1415% −5.0592% −4.9845%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

sort i32 nulls 2^10     time:   [2.0931 µs 2.0955 µs 2.0983 µs]
                        change: [−17.734% −17.538% −17.307%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe

sort i32 nulls to indices 2^10
                        time:   [3.0490 µs 3.0500 µs 3.0510 µs]
                        change: [−2.0059% −1.9109% −1.8199%] (p = 0.00 < 0.05)
                        Performance has improved.

sort i32 nulls 2^12     time:   [8.9355 µs 8.9408 µs 8.9468 µs]
                        change: [−18.961% −18.893% −18.827%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

sort i32 nulls to indices 2^12
                        time:   [12.599 µs 12.603 µs 12.608 µs]
                        change: [−5.2886% −5.2491% −5.2092%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

sort f32 2^12           time:   [25.314 µs 25.329 µs 25.345 µs]
                        change: [+0.0265% +0.1125% +0.1923%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild

sort f32 to indices 2^12
                        time:   [30.457 µs 30.490 µs 30.523 µs]
                        change: [−1.2373% −1.1522% −1.0705%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

sort f32 nulls 2^12     time:   [12.506 µs 12.516 µs 12.527 µs]
                        change: [−1.3180% −1.2207% −1.1189%] (p = 0.00 < 0.05)
                        Performance has improved.

sort f32 nulls to indices 2^12
                        time:   [16.185 µs 16.221 µs 16.263 µs]
                        change: [−1.9818% −1.8358% −1.6704%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

sort string[0-10] to indices 2^12
                        time:   [33.546 µs 33.661 µs 33.817 µs]
                        change: [−0.3347% −0.1203% +0.1008%] (p = 0.32 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

sort string[0-10] nulls to indices 2^12
                        time:   [17.370 µs 17.383 µs 17.396 µs]
                        change: [−3.9307% −3.8159% −3.7119%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) low mild
  1 (1.00%) high mild

sort string[0-100] to indices 2^12
                        time:   [31.321 µs 31.394 µs 31.469 µs]
                        change: [−1.7726% −1.5633% −1.3371%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

sort string[0-100] nulls to indices 2^12
                        time:   [16.468 µs 16.488 µs 16.511 µs]
                        change: [−1.4143% −0.9178% −0.2647%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

sort string[0-400] to indices 2^12
                        time:   [31.458 µs 31.500 µs 31.547 µs]
                        change: [−2.7636% −2.5729% −2.3756%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

sort string[0-400] nulls to indices 2^12
                        time:   [16.734 µs 16.800 µs 16.866 µs]
                        change: [−3.4989% −3.2443% −2.9793%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe

sort string[10] to indices 2^12
                        time:   [30.367 µs 30.404 µs 30.446 µs]
                        change: [−3.1487% −2.9660% −2.7843%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

sort string[10] nulls to indices 2^12
                        time:   [16.809 µs 16.829 µs 16.852 µs]
                        change: [−3.2575% −3.1146% −2.9765%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

sort string[100] to indices 2^12
                        time:   [31.771 µs 31.864 µs 31.978 µs]
                        change: [−0.2878% +0.1187% +0.5950%] (p = 0.58 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  10 (10.00%) high mild
  3 (3.00%) high severe

sort string[100] nulls to indices 2^12
                        time:   [16.676 µs 16.716 µs 16.762 µs]
                        change: [−2.9266% −2.7685% −2.6018%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

sort string[1000] to indices 2^12
                        time:   [30.735 µs 30.766 µs 30.796 µs]
                        change: [−1.6116% −1.4391% −1.2690%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

sort string[1000] nulls to indices 2^12
                        time:   [17.003 µs 17.017 µs 17.032 µs]
                        change: [−0.0323% +0.2819% +0.5431%] (p = 0.09 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe

sort string_view[10] to indices 2^12
                        time:   [39.257 µs 39.371 µs 39.502 µs]
                        change: [+0.0916% +0.3817% +0.7433%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  10 (10.00%) high mild
  4 (4.00%) high severe

sort string_view[10] nulls to indices 2^12
                        time:   [20.450 µs 20.462 µs 20.474 µs]
                        change: [−2.4581% −2.3353% −2.2270%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

sort string_view[0-400] to indices 2^12
                        time:   [45.049 µs 45.080 µs 45.115 µs]
                        change: [−1.7400% −1.5494% −1.3678%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

sort string_view[0-400] nulls to indices 2^12
                        time:   [22.724 µs 22.743 µs 22.760 µs]
                        change: [−1.2593% −1.1669% −1.0752%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

sort string_view_inlined[0-12] to indices 2^12
                        time:   [36.892 µs 36.904 µs 36.918 µs]
                        change: [−1.4699% −1.2349% −1.0770%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

sort string_view_inlined[0-12] nulls to indices 2^12
                        time:   [19.937 µs 19.963 µs 19.991 µs]
                        change: [−0.7254% −0.5934% −0.4552%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

sort string[10] dict to indices 2^12
                        time:   [142.71 µs 142.98 µs 143.29 µs]
                        change: [−5.5558% −2.9225% −0.7838%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

sort string[10] dict nulls to indices 2^12
                        time:   [69.301 µs 69.416 µs 69.563 µs]
                        change: [−13.972% −9.2852% −4.7013%] (p = 0.00 < 0.05)
                        Performance has improved.

sort primitive run 2^12 time:   [3.3347 µs 3.3552 µs 3.3795 µs]
                        change: [+0.7883% +1.5862% +2.4305%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 20 outliers among 100 measurements (20.00%)
  1 (1.00%) high mild
  19 (19.00%) high severe

sort primitive run to indices 2^12
                        time:   [3.8023 µs 3.8040 µs 3.8057 µs]
                        change: [+0.3625% +0.4246% +0.4842%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

lexsort (f32, f32) 2^10 time:   [21.543 µs 21.559 µs 21.578 µs]
                        change: [−1.1606% −0.9424% −0.7230%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

lexsort (f32, f32) 2^12 time:   [101.30 µs 101.35 µs 101.42 µs]
                        change: [−1.2541% −1.0419% −0.8720%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

lexsort (f32, f32) nulls 2^10
                        time:   [20.862 µs 20.875 µs 20.891 µs]
                        change: [+0.2650% +0.4623% +0.7238%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

lexsort (f32, f32) nulls 2^12
                        time:   [90.433 µs 90.501 µs 90.572 µs]
                        change: [−1.1384% −0.8178% −0.4471%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe

lexsort (bool, bool) 2^12
                        time:   [35.253 µs 35.268 µs 35.282 µs]
                        change: [−2.4786% −2.4183% −2.3630%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

lexsort (bool, bool) nulls 2^12
                        time:   [47.627 µs 47.669 µs 47.708 µs]
                        change: [−4.2046% −4.0162% −3.8511%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

lexsort (f32, f32) 2^12 limit 10
                        time:   [18.287 µs 18.304 µs 18.324 µs]
                        change: [−3.0634% −2.5423% −2.0626%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  8 (8.00%) high mild

lexsort (f32, f32) 2^12 limit 100
                        time:   [19.535 µs 19.560 µs 19.585 µs]
                        change: [−0.9957% −0.8426% −0.6985%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

lexsort (f32, f32) 2^12 limit 1000
                        time:   [37.494 µs 37.511 µs 37.526 µs]
                        change: [−1.3760% −1.2316% −1.0851%] (p = 0.00 < 0.05)
                        Performance has improved.

lexsort (f32, f32) 2^12 limit 2^12
                        time:   [101.54 µs 101.63 µs 101.72 µs]
                        change: [−0.1362% −0.0357% +0.0559%] (p = 0.47 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

lexsort (f32, f32) nulls 2^12 limit 10
                        time:   [30.107 µs 30.129 µs 30.154 µs]
                        change: [+0.3201% +0.4125% +0.5164%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

lexsort (f32, f32) nulls 2^12 limit 100
                        time:   [30.469 µs 30.486 µs 30.504 µs]
                        change: [+0.3045% +0.3768% +0.4462%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe

lexsort (f32, f32) nulls 2^12 limit 1000
                        time:   [34.089 µs 34.103 µs 34.117 µs]
                        change: [−1.0657% −0.9015% −0.7530%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

lexsort (f32, f32) nulls 2^12 limit 2^12
                        time:   [90.160 µs 90.201 µs 90.247 µs]
                        change: [−2.6910% −2.5248% −2.3647%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

rank f32 2^12           time:   [30.088 µs 30.103 µs 30.119 µs]
                        change: [−1.2426% −1.1095% −0.9916%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

rank f32 nulls 2^12     time:   [15.402 µs 15.415 µs 15.430 µs]
                        change: [−1.9693% −1.8289% −1.6885%] (p = 0.00 < 0.05)
                        Performance has improved.

rank string[10] 2^12    time:   [117.06 µs 117.10 µs 117.13 µs]
                        change: [−3.0992% −2.9805% −2.8630%] (p = 0.00 < 0.05)
                        Performance has improved.

rank string[10] nulls 2^12
                        time:   [55.871 µs 55.906 µs 55.945 µs]
                        change: [−0.5642% −0.4729% −0.3840%] (p = 0.00 < 0.05)
                        Change within noise threshold.

cetra3 · 2026-02-13T07:49:19Z

arrow-ord/src/sort.rs

+    let mut cur_idx = 0u32;

-        // 2) Write null indices by inverting
-        let inv_buf = !bitmap.inner();


I think this is where most of the saving comes in. Recreating the bitmap with inverse is another pass over it, so we have to read it twice.

Makes sense

Interesting, I think this was changed in the other direction in 625e6ee#diff-2c3ee46148ccba8c1092294caab66ba73416314595ce2eef9b2bdd9cd898b27d

I wonder what changed now that makes the benchmarks go the other way.

Dandandan · 2026-02-14T07:21:53Z

run benchmark sort_kernel

alamb-ghbot · 2026-02-14T07:22:01Z

🤖 Hi @Dandandan, thanks for the request (#9407 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

Standard: (none)
Criterion: array_from, array_iter, arrow_reader, arrow_reader_clickbench, arrow_reader_row_filter, arrow_statistics, arrow_writer, bitwise_kernel, boolean_kernels, buffer_bit_ops, builder, cast_kernels, coalesce_kernels, comparison_kernels, concatenate_kernel, csv_writer, encoding, filter_kernels, interleave_kernels, ipc_reader, json-reader, metadata, row_format, sort_kernel, take_kernels, union_array, variant_builder, variant_kernels, variant_validation, view_types, zip_kernels

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...

You can also set environment variables on subsequent lines:

run benchmark tpch_mem
DATAFUSION_RUNTIME_MEMORY_LIMIT=1G

Unsupported benchmarks: sort_kernels.

alamb-ghbot · 2026-02-14T07:22:27Z

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing sort_performance (8657ae6) to 7d16cd0 diff
BENCH_NAME=sort_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench sort_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=sort_performance
Results will be posted here when complete

alamb-ghbot · 2026-02-14T07:41:42Z

🤖: Benchmark completed

Details

group                                                   main                                   sort_performance
-----                                                   ----                                   ----------------
lexsort (bool, bool) 2^12                               1.00    115.9±0.37µs        ? ?/sec    1.00    115.6±1.33µs        ? ?/sec
lexsort (bool, bool) nulls 2^12                         1.03    164.7±0.32µs        ? ?/sec    1.00    160.6±0.70µs        ? ?/sec
lexsort (f32, f32) 2^10                                 1.00     44.8±0.25µs        ? ?/sec    1.00     45.0±0.23µs        ? ?/sec
lexsort (f32, f32) 2^12                                 1.01    211.9±2.48µs        ? ?/sec    1.00    210.8±0.39µs        ? ?/sec
lexsort (f32, f32) 2^12 limit 10                        1.00     38.4±0.07µs        ? ?/sec    1.00     38.5±0.22µs        ? ?/sec
lexsort (f32, f32) 2^12 limit 100                       1.00     41.1±0.06µs        ? ?/sec    1.00     40.9±0.43µs        ? ?/sec
lexsort (f32, f32) 2^12 limit 1000                      1.00     78.6±0.18µs        ? ?/sec    1.01     79.2±2.54µs        ? ?/sec
lexsort (f32, f32) 2^12 limit 2^12                      1.00    211.6±0.60µs        ? ?/sec    1.01    212.8±1.46µs        ? ?/sec
lexsort (f32, f32) nulls 2^10                           1.00     54.7±0.46µs        ? ?/sec    1.02     55.6±0.10µs        ? ?/sec
lexsort (f32, f32) nulls 2^12                           1.00    255.9±0.57µs        ? ?/sec    1.03   263.2±10.94µs        ? ?/sec
lexsort (f32, f32) nulls 2^12 limit 10                  1.02     89.4±0.27µs        ? ?/sec    1.00     87.9±1.58µs        ? ?/sec
lexsort (f32, f32) nulls 2^12 limit 100                 1.02     90.6±0.22µs        ? ?/sec    1.00     89.1±1.69µs        ? ?/sec
lexsort (f32, f32) nulls 2^12 limit 1000                1.02    102.1±0.32µs        ? ?/sec    1.00     99.7±2.08µs        ? ?/sec
lexsort (f32, f32) nulls 2^12 limit 2^12                1.00    256.0±0.88µs        ? ?/sec    1.02    261.4±2.25µs        ? ?/sec
rank f32 2^12                                           1.00     68.7±0.37µs        ? ?/sec    1.00     68.6±0.42µs        ? ?/sec
rank f32 nulls 2^12                                     1.00     35.5±0.09µs        ? ?/sec    1.02     36.1±0.29µs        ? ?/sec
rank string[10] 2^12                                    1.00    252.7±1.35µs        ? ?/sec    1.03    259.2±1.43µs        ? ?/sec
rank string[10] nulls 2^12                              1.00    120.9±0.48µs        ? ?/sec    1.01    122.5±0.26µs        ? ?/sec
sort f32 2^12                                           1.00     70.5±0.56µs        ? ?/sec    1.00     70.4±0.23µs        ? ?/sec
sort f32 nulls 2^12                                     1.00     28.8±0.13µs        ? ?/sec    1.04     29.8±0.20µs        ? ?/sec
sort f32 nulls to indices 2^12                          1.00     37.4±0.12µs        ? ?/sec    1.35     50.5±1.69µs        ? ?/sec
sort f32 to indices 2^12                                1.00     70.6±0.34µs        ? ?/sec    1.02     71.8±0.33µs        ? ?/sec
sort i32 2^10                                           1.02      7.5±0.16µs        ? ?/sec    1.00      7.4±0.22µs        ? ?/sec
sort i32 2^12                                           1.03     37.1±0.75µs        ? ?/sec    1.00     36.1±0.53µs        ? ?/sec
sort i32 nulls 2^10                                     1.04      4.8±0.02µs        ? ?/sec    1.00      4.6±0.01µs        ? ?/sec
sort i32 nulls 2^12                                     1.03     19.8±0.15µs        ? ?/sec    1.00     19.3±0.04µs        ? ?/sec
sort i32 nulls to indices 2^10                          1.00      6.7±0.07µs        ? ?/sec    1.06      7.2±0.02µs        ? ?/sec
sort i32 nulls to indices 2^12                          1.00     28.5±0.24µs        ? ?/sec    1.52     43.3±0.13µs        ? ?/sec
sort i32 to indices 2^10                                1.00     10.9±0.55µs        ? ?/sec    1.05     11.4±0.20µs        ? ?/sec
sort i32 to indices 2^12                                1.00     51.8±1.34µs        ? ?/sec    1.03     53.3±0.47µs        ? ?/sec
sort primitive run 2^12                                 1.11      7.1±0.07µs        ? ?/sec    1.00      6.4±0.07µs        ? ?/sec
sort primitive run to indices 2^12                      1.04      7.7±0.04µs        ? ?/sec    1.00      7.5±0.03µs        ? ?/sec
sort string[0-100] nulls to indices 2^12                1.00     42.3±0.20µs        ? ?/sec    1.37     58.2±0.83µs        ? ?/sec
sort string[0-100] to indices 2^12                      1.00     90.4±0.71µs        ? ?/sec    1.02     92.5±1.87µs        ? ?/sec
sort string[0-10] nulls to indices 2^12                 1.00     51.0±1.41µs        ? ?/sec    1.29     65.9±0.71µs        ? ?/sec
sort string[0-10] to indices 2^12                       1.01    128.8±0.49µs        ? ?/sec    1.00    127.2±3.80µs        ? ?/sec
sort string[0-400] nulls to indices 2^12                1.00     42.7±0.57µs        ? ?/sec    1.34     57.3±0.15µs        ? ?/sec
sort string[0-400] to indices 2^12                      1.00     92.1±0.61µs        ? ?/sec    1.00     91.7±0.33µs        ? ?/sec
sort string[1000] nulls to indices 2^12                 1.00     43.9±0.90µs        ? ?/sec    1.33     58.5±2.90µs        ? ?/sec
sort string[1000] to indices 2^12                       1.00     88.3±0.43µs        ? ?/sec    1.00     88.0±1.18µs        ? ?/sec
sort string[100] nulls to indices 2^12                  1.00     42.5±0.13µs        ? ?/sec    1.36     57.6±1.96µs        ? ?/sec
sort string[100] to indices 2^12                        1.00     88.1±0.90µs        ? ?/sec    1.00     87.7±1.97µs        ? ?/sec
sort string[10] dict nulls to indices 2^12              1.00    153.0±6.48µs        ? ?/sec    1.17    179.2±5.24µs        ? ?/sec
sort string[10] dict to indices 2^12                    1.00    315.4±2.45µs        ? ?/sec    1.01    318.8±0.77µs        ? ?/sec
sort string[10] nulls to indices 2^12                   1.00     43.6±0.36µs        ? ?/sec    1.33     58.0±0.26µs        ? ?/sec
sort string[10] to indices 2^12                         1.01     87.6±1.15µs        ? ?/sec    1.00     87.0±0.18µs        ? ?/sec
sort string_view[0-400] nulls to indices 2^12           1.00     58.0±1.85µs        ? ?/sec    1.30     75.3±0.31µs        ? ?/sec
sort string_view[0-400] to indices 2^12                 1.00    122.2±0.12µs        ? ?/sec    1.00    122.2±2.12µs        ? ?/sec
sort string_view[10] nulls to indices 2^12              1.00     44.6±0.26µs        ? ?/sec    1.44     64.1±1.39µs        ? ?/sec
sort string_view[10] to indices 2^12                    1.00    104.2±0.36µs        ? ?/sec    1.03    107.2±0.57µs        ? ?/sec
sort string_view_inlined[0-12] nulls to indices 2^12    1.00     43.1±1.54µs        ? ?/sec    1.42     61.2±0.20µs        ? ?/sec
sort string_view_inlined[0-12] to indices 2^12          1.00     95.1±3.86µs        ? ?/sec    1.00     94.9±0.89µs        ? ?/sec

github-actions bot added the arrow Changes to the arrow crate label Feb 13, 2026

cetra3 commented Feb 13, 2026

View reviewed changes

Sort perf

8657ae6

cetra3 force-pushed the sort_performance branch from 32667d6 to 8657ae6 Compare February 13, 2026 07:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sort performance#9407

Sort performance#9407
cetra3 wants to merge 1 commit intoapache:mainfrom
pydantic:sort_performance

cetra3 commented Feb 13, 2026 •

edited

Loading

Uh oh!

cetra3 Feb 13, 2026

Uh oh!

Dandandan Feb 13, 2026

Uh oh!

jhorstmann Feb 13, 2026

Uh oh!

Dandandan commented Feb 14, 2026 •

edited

Loading

Uh oh!

alamb-ghbot commented Feb 14, 2026

Uh oh!

alamb-ghbot commented Feb 14, 2026

Uh oh!

alamb-ghbot commented Feb 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

cetra3 commented Feb 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Benchmark Results against main

Uh oh!

cetra3 Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

Dandandan Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

jhorstmann Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

Dandandan commented Feb 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alamb-ghbot commented Feb 14, 2026

Uh oh!

alamb-ghbot commented Feb 14, 2026

Uh oh!

alamb-ghbot commented Feb 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

cetra3 commented Feb 13, 2026 •

edited

Loading

Benchmark Results against `main`

Dandandan commented Feb 14, 2026 •

edited

Loading