Conversation
force-pushed from 4502f99 to dd9ae45
[feature] prefetches [feature] parallel calculations [feature] arena allocator
force-pushed from 6e48681 to 251b82a
src/lib.rs
Outdated
```rust
{
    // Parallel version: create shards in parallel
    (0..original_count as usize)
        .into_par_iter()
```
I understand we want to speed it up, but this is a bit unsafe in the context of polkadot-sdk because all these threads will eat CPU from other important tasks. Could we at least bound it to a fixed number of threads, if it is not already?
By default, the number of threads is equal to the number of logical cores.
Yeah, but that could be a bit problematic if these spawned threads eat time from things like PVF execution for other candidates, which is time-bound.
Can we make it configurable somehow, so that users of the API, i.e. polkadot-sdk, can select how many threads they want to use.
I agree with Alex that for the purposes of polkadot-sdk, we likely want the erasure coding to be single-threaded, as the parallelism comes from running it in parallel for different PoVs; otherwise it might starve other important computation. So I would be in favor of removing the parallel feature altogether.
src/subshard.rs
Outdated
```rust
let mut result = Vec::with_capacity(segments.len());

for _ in 0..segments.len() {
    result.push(Box::new([[0u8; SUBSHARD_SIZE]; TOTAL_SHARDS]));
}
```
Suggested change:
```diff
-let mut result = Vec::with_capacity(segments.len());
-for _ in 0..segments.len() {
-    result.push(Box::new([[0u8; SUBSHARD_SIZE]; TOTAL_SHARDS]));
-}
+let mut result = vec_no_clone![Box::new([[0u8; SUBSHARD_SIZE]; TOTAL_SHARDS]); segments.len()];
```
```rust
macro_rules! vec_no_clone {
    ($elem:expr; $n:expr) => ({
        let n = $n;
        let mut result = Vec::with_capacity(n);
        for _ in 0..n {
            result.push($elem);
        }
        result
    })
}
```
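For illustration, the point of the macro is that `vec![elem; n]` requires `Clone`, while re-evaluating `$elem` on each push sidesteps that bound. The macro is repeated verbatim below so the demo compiles on its own.

```rust
// Repeated verbatim from the suggestion above so this example is
// self-contained.
macro_rules! vec_no_clone {
    ($elem:expr; $n:expr) => ({
        let n = $n;
        let mut result = Vec::with_capacity(n);
        for _ in 0..n {
            result.push($elem);
        }
        result
    })
}

fn main() {
    // `$elem` is re-evaluated on every push, so `Box::new` runs n times
    // and no `Clone` bound is needed (unlike `vec![elem; n]`).
    let boxes: Vec<Box<[u8; 4]>> = vec_no_clone![Box::new([7u8; 4]); 3];
    assert_eq!(boxes.len(), 3);
    assert_eq!(*boxes[0], [7u8; 4]);
}
```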
src/merklize.rs
Outdated
```rust
    .par_iter()
    .map(|chunk| Hash::from(hash_fn(chunk)))
    .collect::<Vec<_>>();
h.resize(target_size, Hash::default());
```
resize is called on both branches; it may be deduplicated.
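One hedged sketch of that deduplication, with stand-in `Hash` and `hash_fn` definitions (the real ones live in src/merklize.rs): keep only the mapping inside the `cfg` branches and hoist the single `resize` below them.

```rust
// Stand-ins for the real types in src/merklize.rs; the toy hash is for
// illustration only.
type Hash = [u8; 32];

fn hash_fn(chunk: &[u8]) -> Hash {
    let mut h = [0u8; 32];
    for (i, b) in chunk.iter().enumerate() {
        h[i % 32] ^= *b; // toy mixing, not a real hash
    }
    h
}

fn hash_chunks(chunks: &[&[u8]], target_size: usize) -> Vec<Hash> {
    // Only the mapping differs between the branches...
    #[cfg(feature = "parallel")]
    let mut hashes: Vec<Hash> = chunks.par_iter().map(|c| hash_fn(c)).collect();
    #[cfg(not(feature = "parallel"))]
    let mut hashes: Vec<Hash> = chunks.iter().map(|c| hash_fn(c)).collect();
    // ...so the padding is written once, after both cfg blocks.
    hashes.resize(target_size, [0u8; 32]);
    hashes
}

fn main() {
    let v = hash_chunks(&[&[1u8][..], &[2u8, 3u8][..]], 4);
    assert_eq!(v.len(), 4);
}
```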
src/merklize.rs
Outdated
```rust
#[cfg(not(feature = "parallel"))]
let hashes = {
    let mut h = Vec::with_capacity(target_size);
    for chunk in chunks.iter() {
```
May simplify changes by replacing .collect() with .collect_with_capacity()
```rust
trait CollectWithCapacity<T>: Iterator<Item = T> {
    fn collect_with_capacity(self) -> Vec<T>;
}

impl<I: Iterator> CollectWithCapacity<I::Item> for I {
    fn collect_with_capacity(self) -> Vec<I::Item> {
        let mut result = Vec::with_capacity(self.size_hint().0);
        for x in self {
            result.push(x);
        }
        result
    }
}
```
size_hint().0 is equal to chunks_len, which can be less than target_size. That causes an additional reallocation in the case where target_size > chunks_len.
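A hypothetical variant that addresses this concern: pass the known target size in as a parameter and reserve the larger of it and the size hint, so the later resize cannot force a reallocation. The names here are made up for illustration.

```rust
// Hypothetical variant: `min_capacity` carries the known target size so
// that neither the size hint nor a later resize forces a reallocation.
fn collect_with_min_capacity<I: Iterator>(iter: I, min_capacity: usize) -> Vec<I::Item> {
    let cap = iter.size_hint().0.max(min_capacity);
    let mut result = Vec::with_capacity(cap);
    for x in iter {
        result.push(x);
    }
    result
}

fn main() {
    // 3 items collected, but capacity reserved for the eventual target of 10.
    let v = collect_with_min_capacity(0..3, 10);
    assert_eq!(v, vec![0, 1, 2]);
    assert!(v.capacity() >= 10);
}
```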
ordian left a comment
any idea if this crate is used by any project?
I don't know if it's used in polkajam anymore since I don't have access to private paritytech repos, but I am assuming it will be used for https://polkadot-fellows.github.io/RFCs/approved/0139-faster-erasure-coding.html
.cargo/config.toml
Outdated
```toml
[target.x86_64-unknown-linux-gnu]
rustflags = [
    "-C", "target-cpu=native",
    "-C", "target-feature=+avx2,+fma",
```
we need to be careful about this - maybe distribute 2 versions of the binaries - optimized and unoptimized, otherwise it might trigger an illegal instruction on older hardware - or communicate clearly the expected hardware changes
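A common alternative to baking `+avx2` into the build flags is runtime detection with `std::arch::is_x86_feature_detected!`, dispatching to a portable fallback on older hardware. A minimal sketch (not this crate's code):

```rust
// Sketch of runtime dispatch as an alternative to `-C target-feature`:
// detect AVX2 at startup and fall back on hardware without it.
fn has_avx2() -> bool {
    #[cfg(target_arch = "x86_64")]
    {
        std::arch::is_x86_feature_detected!("avx2")
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        false
    }
}

fn main() {
    if has_avx2() {
        println!("using the AVX2 fast path");
    } else {
        println!("falling back to the portable path");
    }
}
```

This keeps a single distributed binary working everywhere, at the cost of one branch at the dispatch point.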
src/lib.rs
Outdated
```rust
#[cfg(feature = "arena")]
{
    construct_chunks_arena(n_chunks, data)
}

#[cfg(not(feature = "arena"))]
{
    construct_chunks_default(n_chunks, data)
}
```
not a fan of introducing features especially if they are not recommended - what are the expected gains here from the arena allocator?
I don’t really understand what kind of starvation you’re talking about. Processing the PoV involves too many different computations to say that it needs all of the CPU time. In the tests, CPU utilization doesn’t even reach 30% of a core for multiple nodes. Asynchronous IO calls, syscalls, etc. let the CPU sit idle, not to mention bubbles when reading from memory.
Even if at peak these computations overlap with some other computations and cause a local slowdown, in the overall workflow the performance gain will be greater than this local collision.
yeah, that's what triggered this PR, I was just looking to understand how hardened/used this crate is.
Signed-off-by: iceseer <iceseer@gmail.com>
JAM does not use this crate. It uses …
[fix] `num_threads` for parallel execution Signed-off-by: iceseer <iceseer@gmail.com>
@alexggh I made the thread pool local and cached, and made …
Signed-off-by: Alexander Lednev <iceseer@gmail.com>
2. Added a mechanism for determining data boundaries Signed-off-by: Alexander Lednev <iceseer@gmail.com>
Features
[feature] parallel calculations
[feature] branch prediction
[feature] prefetches
[feature] arena allocator
Additional options (features=)
parallel - enables most of the optimizations
arena - enables the arena allocator (at the moment, enabling this option is not recommended)
The number of threads for `parallel` computing can be set using the `RAYON_NUM_THREADS` environment variable. By default, it is equal to the number of logical cores.

Results
master
feature/optimized "simd,parallel"
feature/optimized "simd"
Comparison