[feature] additional optimizations#16

Open
iceseer wants to merge 11 commits into paritytech:main from iceseer:feature/optimization

Conversation


@iceseer iceseer commented Oct 20, 2025

Features

[feature] parallel calculations
[feature] branch prediction
[feature] prefetches
[feature] arena allocator

Additional options (features=)

parallel - enables most of the optimizations
arena - enables the arena allocator (enabling this option is currently not recommended)

The number of threads for parallel computing can be set using the RAYON_NUM_THREADS environment variable. By default, it is equal to the number of logical cores.

RAYON_NUM_THREADS=4 cargo build --features parallel,simd
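The thread-count rule above can be sketched as follows. This is an illustrative std-only sketch of the behavior described (explicit RAYON_NUM_THREADS wins, otherwise the logical core count), not the crate's actual code; `worker_count` is a hypothetical helper.

```rust
use std::thread;

// Hypothetical helper mirroring the rule described in the PR text:
// a valid RAYON_NUM_THREADS value takes precedence, otherwise fall
// back to the number of logical cores (at least 1).
fn worker_count(env_value: Option<&str>) -> usize {
    env_value
        .and_then(|v| v.parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or_else(|| {
            thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
        })
}

fn main() {
    // RAYON_NUM_THREADS=4 -> four workers
    assert_eq!(worker_count(Some("4")), 4);
    // unset or invalid -> logical core count, always at least 1
    assert!(worker_count(None) >= 1);
    assert!(worker_count(Some("not-a-number")) >= 1);
}
```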

Results

master

construct/PoV: 131072 Chunks: 1023
                        time:   [734.44 µs 735.22 µs 735.74 µs]
                        thrpt:  [169.90 MiB/s 170.02 MiB/s 170.20 MiB/s]
construct/PoV: 131072 Chunks: 1024
                        time:   [614.16 µs 616.57 µs 618.48 µs]
                        thrpt:  [202.11 MiB/s 202.74 MiB/s 203.53 MiB/s]
construct/PoV: 1048576 Chunks: 1023
                        time:   [4.1591 ms 4.1618 ms 4.1641 ms]
                        thrpt:  [240.15 MiB/s 240.28 MiB/s 240.44 MiB/s]
construct/PoV: 1048576 Chunks: 1024
                        time:   [4.0365 ms 4.0396 ms 4.0422 ms]
                        thrpt:  [247.39 MiB/s 247.55 MiB/s 247.74 MiB/s]
construct/PoV: 5242880 Chunks: 1023
                        time:   [20.881 ms 20.907 ms 20.943 ms]
                        thrpt:  [238.74 MiB/s 239.16 MiB/s 239.45 MiB/s]
construct/PoV: 5242880 Chunks: 1024
                        time:   [22.498 ms 22.520 ms 22.535 ms]
                        thrpt:  [221.88 MiB/s 222.03 MiB/s 222.24 MiB/s]

reconstruct_regular/PoV: 131072 Chunks: 1023
                        time:   [942.23 µs 944.27 µs 945.90 µs]
                        thrpt:  [132.15 MiB/s 132.38 MiB/s 132.66 MiB/s]
reconstruct_regular/PoV: 131072 Chunks: 1024
                        time:   [910.97 µs 912.27 µs 913.86 µs]
                        thrpt:  [136.78 MiB/s 137.02 MiB/s 137.22 MiB/s]
reconstruct_regular/PoV: 1048576 Chunks: 1023
                        time:   [5.3444 ms 5.3599 ms 5.3695 ms]
                        thrpt:  [186.24 MiB/s 186.57 MiB/s 187.11 MiB/s]
reconstruct_regular/PoV: 1048576 Chunks: 1024
                        time:   [5.3483 ms 5.3549 ms 5.3616 ms]
                        thrpt:  [186.51 MiB/s 186.75 MiB/s 186.97 MiB/s]
reconstruct_regular/PoV: 5242880 Chunks: 1023
                        time:   [28.427 ms 28.457 ms 28.476 ms]
                        thrpt:  [175.59 MiB/s 175.70 MiB/s 175.89 MiB/s]
reconstruct_regular/PoV: 5242880 Chunks: 1024
                        time:   [28.557 ms 28.575 ms 28.593 ms]
                        thrpt:  [174.87 MiB/s 174.98 MiB/s 175.09 MiB/s]

reconstruct_systematic/PoV: 131072 Chunks: 1023
                        time:   [3.8896 µs 3.8971 µs 3.9025 µs]
                        thrpt:  [31.280 GiB/s 31.324 GiB/s 31.384 GiB/s]
reconstruct_systematic/PoV: 131072 Chunks: 1024
                        time:   [3.6804 µs 3.6950 µs 3.7231 µs]
                        thrpt:  [32.787 GiB/s 33.037 GiB/s 33.168 GiB/s]
reconstruct_systematic/PoV: 1048576 Chunks: 1023
                        time:   [40.176 µs 40.293 µs 40.367 µs]
                        thrpt:  [24.192 GiB/s 24.237 GiB/s 24.307 GiB/s]
reconstruct_systematic/PoV: 1048576 Chunks: 1024
                        time:   [40.882 µs 41.266 µs 41.478 µs]
                        thrpt:  [23.544 GiB/s 23.665 GiB/s 23.887 GiB/s]
reconstruct_systematic/PoV: 5242880 Chunks: 1023
                        time:   [202.30 µs 203.64 µs 204.52 µs]
                        thrpt:  [23.875 GiB/s 23.978 GiB/s 24.136 GiB/s]
reconstruct_systematic/PoV: 5242880 Chunks: 1024
                        time:   [200.78 µs 201.15 µs 201.55 µs]
                        thrpt:  [24.226 GiB/s 24.275 GiB/s 24.320 GiB/s]

merklize/PoV: 131072 Chunks: 1023
                        time:   [747.82 µs 749.57 µs 751.27 µs]
                        thrpt:  [166.38 MiB/s 166.76 MiB/s 167.15 MiB/s]
merklize/PoV: 131072 Chunks: 1024
                        time:   [622.28 µs 623.07 µs 623.77 µs]
                        thrpt:  [200.39 MiB/s 200.62 MiB/s 200.87 MiB/s]
merklize/PoV: 1048576 Chunks: 1023
                        time:   [3.3391 ms 3.3513 ms 3.3614 ms]
                        thrpt:  [297.50 MiB/s 298.40 MiB/s 299.48 MiB/s]
merklize/PoV: 1048576 Chunks: 1024
                        time:   [3.2344 ms 3.2420 ms 3.2498 ms]
                        thrpt:  [307.71 MiB/s 308.46 MiB/s 309.18 MiB/s]
merklize/PoV: 5242880 Chunks: 1023
                        time:   [15.130 ms 15.152 ms 15.166 ms]
                        thrpt:  [329.68 MiB/s 330.00 MiB/s 330.47 MiB/s]
merklize/PoV: 5242880 Chunks: 1024
                        time:   [15.034 ms 15.066 ms 15.090 ms]
                        thrpt:  [331.34 MiB/s 331.88 MiB/s 332.57 MiB/s]

verify_chunk/PoV: 131072 Chunks: 1023
                        time:   [1.8991 µs 1.9013 µs 1.9046 µs]
                        thrpt:  [64.093 GiB/s 64.204 GiB/s 64.276 GiB/s]
verify_chunk/PoV: 131072 Chunks: 1024
                        time:   [1.7756 µs 1.7800 µs 1.7852 µs]
                        thrpt:  [68.380 GiB/s 68.580 GiB/s 68.750 GiB/s]
verify_chunk/PoV: 1048576 Chunks: 1023
                        time:   [4.2870 µs 4.3015 µs 4.3138 µs]
                        thrpt:  [226.38 GiB/s 227.03 GiB/s 227.80 GiB/s]
verify_chunk/PoV: 1048576 Chunks: 1024
                        time:   [4.1736 µs 4.1893 µs 4.2050 µs]
                        thrpt:  [232.24 GiB/s 233.11 GiB/s 233.99 GiB/s]
verify_chunk/PoV: 5242880 Chunks: 1023
                        time:   [15.244 µs 15.272 µs 15.298 µs]
                        thrpt:  [319.18 GiB/s 319.73 GiB/s 320.31 GiB/s]
verify_chunk/PoV: 5242880 Chunks: 1024
                        time:   [15.069 µs 15.114 µs 15.170 µs]
                        thrpt:  [321.88 GiB/s 323.07 GiB/s 324.02 GiB/s]

feature/optimized "simd,parallel"

construct/PoV: 131072 Chunks: 1023
                        time:   [510.44 µs 518.70 µs 523.39 µs]
                        thrpt:  [238.83 MiB/s 240.99 MiB/s 244.89 MiB/s]
construct/PoV: 131072 Chunks: 1024
                        time:   [491.78 µs 499.87 µs 505.45 µs]
                        thrpt:  [247.30 MiB/s 250.06 MiB/s 254.18 MiB/s]
construct/PoV: 1048576 Chunks: 1023
                        time:   [3.8102 ms 4.0263 ms 4.2182 ms]
                        thrpt:  [237.07 MiB/s 248.37 MiB/s 262.45 MiB/s]
construct/PoV: 1048576 Chunks: 1024
                        time:   [3.3056 ms 3.4247 ms 3.5951 ms]
                        thrpt:  [278.16 MiB/s 291.99 MiB/s 302.52 MiB/s]
construct/PoV: 5242880 Chunks: 1023
                        time:   [11.770 ms 12.076 ms 12.341 ms]
                        thrpt:  [405.16 MiB/s 414.05 MiB/s 424.81 MiB/s]
construct/PoV: 5242880 Chunks: 1024
                        time:   [10.573 ms 10.729 ms 10.889 ms]
                        thrpt:  [459.17 MiB/s 466.01 MiB/s 472.90 MiB/s]

reconstruct_regular/PoV: 131072 Chunks: 1023
                        time:   [1.1142 ms 1.1174 ms 1.1207 ms]
                        thrpt:  [111.54 MiB/s 111.86 MiB/s 112.19 MiB/s]
reconstruct_regular/PoV: 131072 Chunks: 1024
                        time:   [1.0434 ms 1.0461 ms 1.0485 ms]
                        thrpt:  [119.22 MiB/s 119.50 MiB/s 119.81 MiB/s]
reconstruct_regular/PoV: 1048576 Chunks: 1023
                        time:   [5.7017 ms 5.7171 ms 5.7359 ms]
                        thrpt:  [174.34 MiB/s 174.92 MiB/s 175.39 MiB/s]
reconstruct_regular/PoV: 1048576 Chunks: 1024
                        time:   [5.9081 ms 5.9204 ms 5.9383 ms]
                        thrpt:  [168.40 MiB/s 168.91 MiB/s 169.26 MiB/s]
reconstruct_regular/PoV: 5242880 Chunks: 1023
                        time:   [30.485 ms 30.520 ms 30.559 ms]
                        thrpt:  [163.62 MiB/s 163.83 MiB/s 164.01 MiB/s]
reconstruct_regular/PoV: 5242880 Chunks: 1024
                        time:   [30.348 ms 30.410 ms 30.463 ms]
                        thrpt:  [164.13 MiB/s 164.42 MiB/s 164.76 MiB/s]

reconstruct_systematic/PoV: 131072 Chunks: 1023
                        time:   [4.4633 µs 4.4745 µs 4.4908 µs]
                        thrpt:  [27.183 GiB/s 27.281 GiB/s 27.350 GiB/s]
reconstruct_systematic/PoV: 131072 Chunks: 1024
                        time:   [4.2831 µs 4.2918 µs 4.3010 µs]
                        thrpt:  [28.382 GiB/s 28.442 GiB/s 28.500 GiB/s]
reconstruct_systematic/PoV: 1048576 Chunks: 1023
                        time:   [44.509 µs 44.593 µs 44.663 µs]
                        thrpt:  [21.865 GiB/s 21.899 GiB/s 21.941 GiB/s]
reconstruct_systematic/PoV: 1048576 Chunks: 1024
                        time:   [43.186 µs 43.294 µs 43.375 µs]
                        thrpt:  [22.515 GiB/s 22.557 GiB/s 22.613 GiB/s]
reconstruct_systematic/PoV: 5242880 Chunks: 1023
                        time:   [225.61 µs 226.08 µs 226.63 µs]
                        thrpt:  [21.546 GiB/s 21.598 GiB/s 21.643 GiB/s]
reconstruct_systematic/PoV: 5242880 Chunks: 1024
                        time:   [213.87 µs 214.43 µs 214.87 µs]
                        thrpt:  [22.724 GiB/s 22.771 GiB/s 22.831 GiB/s]

merklize/PoV: 131072 Chunks: 1023
                        time:   [449.20 µs 452.96 µs 455.20 µs]
                        thrpt:  [274.60 MiB/s 275.97 MiB/s 278.27 MiB/s]
merklize/PoV: 131072 Chunks: 1024
                        time:   [452.59 µs 460.99 µs 469.91 µs]
                        thrpt:  [266.01 MiB/s 271.15 MiB/s 276.19 MiB/s]
merklize/PoV: 1048576 Chunks: 1023
                        time:   [1.1936 ms 1.2028 ms 1.2104 ms]
                        thrpt:  [826.17 MiB/s 831.42 MiB/s 837.79 MiB/s]
merklize/PoV: 1048576 Chunks: 1024
                        time:   [1.1964 ms 1.2112 ms 1.2219 ms]
                        thrpt:  [818.39 MiB/s 825.64 MiB/s 835.82 MiB/s]
merklize/PoV: 5242880 Chunks: 1023
                        time:   [3.9726 ms 4.0436 ms 4.1057 ms]
                        thrpt:  [1.1893 GiB/s 1.2075 GiB/s 1.2291 GiB/s]
merklize/PoV: 5242880 Chunks: 1024
                        time:   [3.8041 ms 3.8699 ms 3.9361 ms]
                        thrpt:  [1.2405 GiB/s 1.2617 GiB/s 1.2836 GiB/s]

verify_chunk/PoV: 131072 Chunks: 1023
                        time:   [1.6454 µs 1.6549 µs 1.6701 µs]
                        thrpt:  [73.093 GiB/s 73.762 GiB/s 74.191 GiB/s]
verify_chunk/PoV: 131072 Chunks: 1024
                        time:   [1.4936 µs 1.5016 µs 1.5135 µs]
                        thrpt:  [80.653 GiB/s 81.294 GiB/s 81.730 GiB/s]
verify_chunk/PoV: 1048576 Chunks: 1023
                        time:   [3.8267 µs 3.8389 µs 3.8525 µs]
                        thrpt:  [253.48 GiB/s 254.38 GiB/s 255.20 GiB/s]
verify_chunk/PoV: 1048576 Chunks: 1024
                        time:   [3.6486 µs 3.6626 µs 3.6818 µs]
                        thrpt:  [265.24 GiB/s 266.63 GiB/s 267.66 GiB/s]
verify_chunk/PoV: 5242880 Chunks: 1023
                        time:   [13.767 µs 13.829 µs 13.892 µs]
                        thrpt:  [351.49 GiB/s 353.08 GiB/s 354.68 GiB/s]
verify_chunk/PoV: 5242880 Chunks: 1024
                        time:   [13.548 µs 13.617 µs 13.692 µs]
                        thrpt:  [356.62 GiB/s 358.57 GiB/s 360.41 GiB/s]

feature/optimized "simd"

construct/PoV: 131072 Chunks: 1023
                        time:   [617.80 µs 618.19 µs 618.52 µs]
                        thrpt:  [202.09 MiB/s 202.20 MiB/s 202.33 MiB/s]
construct/PoV: 131072 Chunks: 1024
                        time:   [509.04 µs 509.35 µs 509.51 µs]
                        thrpt:  [245.34 MiB/s 245.41 MiB/s 245.56 MiB/s]
construct/PoV: 1048576 Chunks: 1023
                        time:   [3.7070 ms 3.7095 ms 3.7114 ms]
                        thrpt:  [269.44 MiB/s 269.58 MiB/s 269.76 MiB/s]
construct/PoV: 1048576 Chunks: 1024
                        time:   [3.5996 ms 3.6012 ms 3.6029 ms]
                        thrpt:  [277.56 MiB/s 277.69 MiB/s 277.81 MiB/s]
construct/PoV: 5242880 Chunks: 1023
                        time:   [18.909 ms 18.966 ms 19.050 ms]
                        thrpt:  [262.46 MiB/s 263.62 MiB/s 264.43 MiB/s]
construct/PoV: 5242880 Chunks: 1024
                        time:   [20.120 ms 20.139 ms 20.154 ms]
                        thrpt:  [248.09 MiB/s 248.27 MiB/s 248.51 MiB/s]

reconstruct_regular/PoV: 131072 Chunks: 1023
                        time:   [941.99 µs 944.66 µs 947.65 µs]
                        thrpt:  [131.90 MiB/s 132.32 MiB/s 132.70 MiB/s]
reconstruct_regular/PoV: 131072 Chunks: 1024
                        time:   [933.19 µs 937.18 µs 940.97 µs]
                        thrpt:  [132.84 MiB/s 133.38 MiB/s 133.95 MiB/s]
reconstruct_regular/PoV: 1048576 Chunks: 1023
                        time:   [5.3582 ms 5.3729 ms 5.3848 ms]
                        thrpt:  [185.71 MiB/s 186.12 MiB/s 186.63 MiB/s]
reconstruct_regular/PoV: 1048576 Chunks: 1024
                        time:   [5.1137 ms 5.1234 ms 5.1289 ms]
                        thrpt:  [194.97 MiB/s 195.18 MiB/s 195.55 MiB/s]
reconstruct_regular/PoV: 5242880 Chunks: 1023
                        time:   [28.670 ms 28.712 ms 28.743 ms]
                        thrpt:  [173.96 MiB/s 174.14 MiB/s 174.40 MiB/s]
reconstruct_regular/PoV: 5242880 Chunks: 1024
                        time:   [29.408 ms 29.455 ms 29.499 ms]
                        thrpt:  [169.50 MiB/s 169.75 MiB/s 170.02 MiB/s]

reconstruct_systematic/PoV: 131072 Chunks: 1023
                        time:   [4.4606 µs 4.4706 µs 4.4823 µs]
                        thrpt:  [27.234 GiB/s 27.305 GiB/s 27.366 GiB/s]
reconstruct_systematic/PoV: 131072 Chunks: 1024
                        time:   [3.5982 µs 3.6523 µs 3.7379 µs]
                        thrpt:  [32.658 GiB/s 33.423 GiB/s 33.925 GiB/s]
reconstruct_systematic/PoV: 1048576 Chunks: 1023
                        time:   [40.327 µs 40.892 µs 41.366 µs]
                        thrpt:  [23.608 GiB/s 23.881 GiB/s 24.216 GiB/s]
reconstruct_systematic/PoV: 1048576 Chunks: 1024
                        time:   [44.721 µs 45.411 µs 45.861 µs]
                        thrpt:  [21.294 GiB/s 21.505 GiB/s 21.837 GiB/s]
reconstruct_systematic/PoV: 5242880 Chunks: 1023
                        time:   [213.77 µs 214.14 µs 214.76 µs]
                        thrpt:  [22.736 GiB/s 22.802 GiB/s 22.842 GiB/s]
reconstruct_systematic/PoV: 5242880 Chunks: 1024
                        time:   [204.88 µs 207.04 µs 209.81 µs]
                        thrpt:  [23.272 GiB/s 23.584 GiB/s 23.832 GiB/s]

merklize/PoV: 131072 Chunks: 1023
                        time:   [574.87 µs 579.48 µs 587.38 µs]
                        thrpt:  [212.81 MiB/s 215.71 MiB/s 217.44 MiB/s]
merklize/PoV: 131072 Chunks: 1024
                        time:   [485.66 µs 486.89 µs 488.34 µs]
                        thrpt:  [255.97 MiB/s 256.73 MiB/s 257.38 MiB/s]
merklize/PoV: 1048576 Chunks: 1023
                        time:   [2.6842 ms 2.6933 ms 2.7004 ms]
                        thrpt:  [370.31 MiB/s 371.30 MiB/s 372.55 MiB/s]
merklize/PoV: 1048576 Chunks: 1024
                        time:   [2.6215 ms 2.6251 ms 2.6272 ms]
                        thrpt:  [380.63 MiB/s 380.94 MiB/s 381.46 MiB/s]
merklize/PoV: 5242880 Chunks: 1023
                        time:   [12.834 ms 12.862 ms 12.887 ms]
                        thrpt:  [388.00 MiB/s 388.73 MiB/s 389.58 MiB/s]
merklize/PoV: 5242880 Chunks: 1024
                        time:   [12.780 ms 12.800 ms 12.824 ms]
                        thrpt:  [389.88 MiB/s 390.64 MiB/s 391.23 MiB/s]

verify_chunk/PoV: 131072 Chunks: 1023
                        time:   [1.3788 µs 1.3822 µs 1.3851 µs]
                        thrpt:  [88.128 GiB/s 88.314 GiB/s 88.532 GiB/s]
verify_chunk/PoV: 131072 Chunks: 1024
                        time:   [1.3032 µs 1.3047 µs 1.3065 µs]
                        thrpt:  [93.431 GiB/s 93.559 GiB/s 93.672 GiB/s]
verify_chunk/PoV: 1048576 Chunks: 1023
                        time:   [3.2892 µs 3.3222 µs 3.3598 µs]
                        thrpt:  [290.66 GiB/s 293.95 GiB/s 296.90 GiB/s]
verify_chunk/PoV: 1048576 Chunks: 1024
                        time:   [3.1637 µs 3.1658 µs 3.1686 µs]
                        thrpt:  [308.20 GiB/s 308.47 GiB/s 308.68 GiB/s]
verify_chunk/PoV: 5242880 Chunks: 1023
                        time:   [12.013 µs 12.021 µs 12.029 µs]
                        thrpt:  [405.91 GiB/s 406.18 GiB/s 406.46 GiB/s]
verify_chunk/PoV: 5242880 Chunks: 1024
                        time:   [11.938 µs 11.949 µs 11.959 µs]
                        thrpt:  [408.29 GiB/s 408.64 GiB/s 409.02 GiB/s]

Comparison

Benchmark comparison charts (images in the original PR): b-const, b-rec-reg, b-rec-sys, b-merk, b-ver

@iceseer iceseer force-pushed the feature/optimization branch from 4502f99 to dd9ae45 Compare October 20, 2025 09:54
[feature] prefetches
[feature] parallel calculations
[feature] arena allocator
@iceseer iceseer force-pushed the feature/optimization branch from 6e48681 to 251b82a Compare October 21, 2025 08:44
@alexggh alexggh self-requested a review October 29, 2025 13:34
@alindima alindima self-requested a review October 31, 2025 09:07

@alexggh alexggh left a comment


Good work, thank you!

@cheme, @ordian, @arkpar any idea if this crate is used by any project ?

src/lib.rs Outdated
{
// Parallel version: create shards in parallel
(0..original_count as usize)
.into_par_iter()
alexggh


I understand we want to speed it up, but this is a bit unsafe in the context of polkadot-sdk because all these threads will eat CPU from other important tasks. Could we at least bound it to a fixed number of threads, if it is not already?

iceseer (Author)


By default, the number of threads is equal to the number of logical cores.

alexggh


Yeah, but that could be a bit problematic if these spawned threads eat time from things like PVF execution for other candidates, which is time-bound.

Can we make it configurable somehow, so that users of the API, i.e. polkadot-sdk, can select how many threads to use?
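Bounding the work to a caller-chosen thread count could look like the following std-only sketch. `process_bounded` is a hypothetical function, not the crate's API; with rayon, `ThreadPoolBuilder::num_threads(n)` serves the same purpose.

```rust
use std::thread;

// Hypothetical sketch: split `items` across at most `num_threads`
// scoped threads instead of one worker per logical core.
fn process_bounded<T: Sync, R: Send>(
    items: &[T],
    num_threads: usize,
    f: impl Fn(&T) -> R + Sync,
) -> Vec<R> {
    let f = &f;
    let n = num_threads.max(1);
    // Ceiling division so every item lands in exactly one chunk.
    let per_chunk = (items.len() + n - 1) / n;
    if per_chunk == 0 {
        return Vec::new();
    }
    thread::scope(|s| {
        // One scoped thread per chunk; at most `n` chunks exist.
        let handles: Vec<_> = items
            .chunks(per_chunk)
            .map(|c| s.spawn(move || c.iter().map(f).collect::<Vec<R>>()))
            .collect();
        // Joining in spawn order preserves the input ordering.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let data: Vec<u32> = (0..10).collect();
    let doubled = process_bounded(&data, 3, |x: &u32| x * 2);
    assert_eq!(doubled, (0..10).map(|x| x * 2).collect::<Vec<u32>>());
}
```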

ordian


I agree with Alex that for the purposes of polkadot-sdk, we likely want the erasure-coding to be single threaded as the parallelism comes from running it in parallel for different PoVs - otherwise it might starve other important computation. So i would be in favor of removing the parallel feature altogether

Signed-off-by: iceseer <iceseer@gmail.com>
@iceseer iceseer requested a review from alexggh November 11, 2025 14:16
src/subshard.rs Outdated
Comment on lines +97 to +101
let mut result = Vec::with_capacity(segments.len());

for _ in 0..segments.len() {
result.push(Box::new([[0u8; SUBSHARD_SIZE]; TOTAL_SHARDS]));
}


Suggested change:

macro_rules! vec_no_clone {
  ($elem:expr; $n:expr) => ({
    let n = $n;
    let mut result = Vec::with_capacity(n);
    for _ in 0..n {
      result.push($elem);
    }
    result
  })
}

let mut result = vec_no_clone![Box::new([[0u8; SUBSHARD_SIZE]; TOTAL_SHARDS]); segments.len()];

src/merklize.rs Outdated
.par_iter()
.map(|chunk| Hash::from(hash_fn(chunk)))
.collect::<Vec<_>>();
h.resize(target_size, Hash::default());


resize is called on both branches, may deduplicate
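The suggested deduplication can be sketched like this: each cfg branch produces the hash vector, and the shared `resize` tail lives once after the branches. A minimal std-only sketch with stand-in types (`Hash`, `hash_stub`, and `hashes_padded` are illustrative, not the crate's items; the `parallel` build would use `par_iter` in the branch):

```rust
#[derive(Clone, Default, PartialEq, Debug)]
struct Hash(u64);

// Stand-in for the real hash function.
fn hash_stub(chunk: &[u8]) -> Hash {
    Hash(chunk.iter().map(|&b| b as u64).sum())
}

fn hashes_padded(chunks: &[Vec<u8>], target_size: usize) -> Vec<Hash> {
    // Branch-specific part: sequential here; the parallel variant
    // would map with par_iter instead.
    let mut h: Vec<Hash> = chunks.iter().map(|c| hash_stub(c)).collect();
    // Shared tail, written once instead of duplicated in both branches.
    h.resize(target_size, Hash::default());
    h
}

fn main() {
    let chunks = vec![vec![1u8, 2], vec![3u8]];
    let h = hashes_padded(&chunks, 4);
    assert_eq!(h.len(), 4);
    assert_eq!(h[0], Hash(3));
    assert_eq!(h[2], Hash::default());
}
```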

src/merklize.rs Outdated
#[cfg(not(feature = "parallel"))]
let hashes = {
let mut h = Vec::with_capacity(target_size);
for chunk in chunks.iter() {


May simplify changes by replacing .collect() with .collect_with_capacity()

trait CollectWithCapacity<T> : Iterator<Item = T> {
    fn collect_with_capacity(self) -> Vec<T>;
}
impl<I: Iterator> CollectWithCapacity<I::Item> for I {
    fn collect_with_capacity(self) -> Vec<I::Item> {
        let mut result = Vec::with_capacity(self.size_hint().0);
        for x in self {
            result.push(x);
        }
        result
    }
}

iceseer (Author)


size_hint().0 equals chunks_len, which can be less than target_size. That causes an additional reallocation in the case where target_size > chunks_len.
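The point can be demonstrated with a tiny std-only example: reserving by the iterator's size hint (chunks_len) and then resizing to a larger target_size forces the Vec to grow again, while reserving target_size up front does not.

```rust
fn main() {
    let chunks_len = 4usize;
    let target_size = 8usize;

    // Reserve based on size_hint().0 (= chunks_len), as
    // collect_with_capacity would.
    let mut v: Vec<u64> = Vec::with_capacity(chunks_len);
    v.extend(0..chunks_len as u64);
    v.resize(target_size, 0);
    // The resize past the initial reservation forced the Vec to grow.
    assert_eq!(v.len(), target_size);
    assert!(v.capacity() >= target_size);

    // Reserving target_size up front avoids that second allocation.
    let mut w: Vec<u64> = Vec::with_capacity(target_size);
    w.extend(0..chunks_len as u64);
    w.resize(target_size, 0);
    assert!(w.capacity() >= target_size);
}
```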


@ordian ordian left a comment


any idea if this crate is used by any project ?

i don't know if it's used in polkajam anymore since i don't have access to private paritytech repos, but i am assuming it will be used for https://polkadot-fellows.github.io/RFCs/approved/0139-faster-erasure-coding.html

[target.x86_64-unknown-linux-gnu]
rustflags = [
    "-C", "target-cpu=native",
    "-C", "target-feature=+avx2,+fma",
]


We need to be careful about this: maybe distribute two versions of the binaries, optimized and unoptimized, since otherwise this might trigger an illegal instruction on older hardware. Alternatively, communicate the expected hardware requirements clearly.


Comment on lines +168 to +176
#[cfg(feature = "arena")]
{
construct_chunks_arena(n_chunks, data)
}

#[cfg(not(feature = "arena"))]
{
construct_chunks_default(n_chunks, data)
}


not a fan of introducing features especially if they are not recommended - what are the expected gains here from the arena allocator?


@iceseer iceseer Nov 24, 2025


I don’t really understand what kind of starvation you’re talking about. Processing the PoV involves too many different computations to say that it needs all of the CPU time. In the tests, CPU utilization doesn’t even reach 30% of a core for multiple nodes. Asynchronous IO calls, syscalls, etc. let the CPU sit idle, not to mention bubbles when reading from memory.

Even if at peak these computations overlap with some other computations and cause a local slowdown, in the overall workflow the performance gain will be greater than this local collision.


alexggh commented Nov 24, 2025

any idea if this crate is used by any project ?

i don't know if its used in polkajam anymore since i don't have access to private paritytech repos, but i am assuming it will be used for https://polkadot-fellows.github.io/RFCs/approved/0139-faster-erasure-coding.html

yeah, that's what triggered this PR, I was just looking to understand how hardened/used this crate is.

Signed-off-by: iceseer <iceseer@gmail.com>
Signed-off-by: iceseer <iceseer@gmail.com>
Signed-off-by: iceseer <iceseer@gmail.com>
Signed-off-by: iceseer <iceseer@gmail.com>
@iceseer iceseer requested review from ordian and turuslan December 1, 2025 10:35
Signed-off-by: iceseer <iceseer@gmail.com>

arkpar commented Dec 10, 2025

any idea if this crate is used by any project ?

i don't know if its used in polkajam anymore since i don't have access to private paritytech repos, but i am assuming it will be used for https://polkadot-fellows.github.io/RFCs/approved/0139-faster-erasure-coding.html

JAM does not use this crate. It uses reed-solomon-simd directly in a way that is compatible with this RFC I believe.

[fix] `num_threads` for parallel execution

Signed-off-by: iceseer <iceseer@gmail.com>
@iceseer iceseer requested a review from alexggh December 11, 2025 06:06

iceseer commented Dec 11, 2025

@alexggh I made thread pool local and cached and made num_threads for parallel execution as a function arg.
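The "local and cached" pool idea can be sketched with std only. With rayon this would be a `ThreadPoolBuilder::num_threads(n).build()` result stored in a `OnceLock`; here we cache just the resolved size to show the build-once-then-reuse behavior. `pool_size` is a hypothetical helper, not the PR's actual code.

```rust
use std::sync::OnceLock;

// Cached pool configuration: built on first use with the
// caller-supplied thread count, then reused by later calls.
static POOL_SIZE: OnceLock<usize> = OnceLock::new();

fn pool_size(requested: usize) -> usize {
    *POOL_SIZE.get_or_init(|| requested.max(1))
}

fn main() {
    // First call builds the "pool" with the requested size...
    assert_eq!(pool_size(4), 4);
    // ...subsequent calls reuse the cached pool regardless of the argument.
    assert_eq!(pool_size(8), 4);
}
```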

Signed-off-by: iceseer <iceseer@gmail.com>
Signed-off-by: Alexander Lednev <iceseer@gmail.com>
2. Added a mechanism for determining data boundaries

Signed-off-by: Alexander Lednev <iceseer@gmail.com>