92 commits
1015096
parallel: in-house fixed thread pool; port eq_mle and sumcheck off rayon
TomWambsgans May 29, 2026
2c46cf9
sumcheck: port clean reductions to parallel::map_reduce
TomWambsgans May 29, 2026
aa773e0
parallel/sumcheck: per-worker scratch in sumcheck_compute_core
TomWambsgans May 29, 2026
56d2a59
parallel: lock-free spin-then-park dispatch (replaces std::Barrier)
TomWambsgans May 29, 2026
bde9674
parallel: nested dispatch falls back to sequential (no deadlock)
TomWambsgans May 29, 2026
9d67ff9
parallel: port field/poly/sumcheck off rayon (evals, utils, product_c…
TomWambsgans May 29, 2026
b19131a
parallel: port symetric merkle + fiat-shamir grinding off rayon
TomWambsgans May 29, 2026
dc3b813
parallel: port whir (dft/merkle/open/utils) off rayon
TomWambsgans May 29, 2026
f5dd79c
parallel: port sub_protocols (logup/gkr/air_sumcheck) off rayon; re-e…
TomWambsgans May 29, 2026
c861081
parallel: port lean_vm/lean_prover/xmss/rec_aggregation/lean_compiler…
TomWambsgans May 29, 2026
ad94319
parallel: teardown rayon — drop deps, flush_rayon hack, backend re-ex…
TomWambsgans May 29, 2026
60eb3c2
parallel: guided self-scheduling + flatten dft butterflies + chunk pr…
TomWambsgans May 29, 2026
258c38e
parallel: range-based core dispatch (for_each_chunk)
TomWambsgans May 29, 2026
36f7412
parallel: lower spin limit to 2^12 to reduce variance during sequenti…
TomWambsgans May 29, 2026
da01c45
sumcheck: product reductions use in-place per-worker accumulation
TomWambsgans May 29, 2026
acdf8ac
gitignore: ignore sibling /target-* build dirs and /.lm-* scratch bin…
TomWambsgans May 29, 2026
e638df9
lean_vm: in-place get_slice_into for Poseidon inputs (avoid ~100K Vec…
TomWambsgans May 29, 2026
22c66f7
alloc: replace zk-alloc bump arena with tuned mimalloc
TomWambsgans May 29, 2026
29bb74b
poseidon: stack-allocate per-row column pointer arrays in trace gen
TomWambsgans May 29, 2026
c7af020
whir: stack-allocate single-element dimensions array in verify
TomWambsgans May 29, 2026
6eabeef
sumcheck: reuse per-worker point/rows scratch in fold+split-eq comput…
TomWambsgans May 29, 2026
c594a79
poseidon: bind fixed-size array ref to elide per-row bounds checks in…
TomWambsgans May 29, 2026
de4a5e4
Merge branch 'main' into remove-small-allocs
TomWambsgans May 29, 2026
dc884da
Merge branch 'remove-small-allocs' into custom-thread-pool
TomWambsgans May 29, 2026
96c0ee9
alloc: disable THP for the prover (fixes ~50% intermittent slowdown o…
TomWambsgans May 29, 2026
cb39573
parallel: centralize per-worker-slot unsafe; tidy dispatch
TomWambsgans May 29, 2026
c7d6883
parallel: centralize SendPtr + chunk-size heuristic across the workspace
TomWambsgans May 29, 2026
c5e90c8
eq_mle: trim the duplicated 10-line packing-width caveat to 2 lines
TomWambsgans May 29, 2026
d3fdad1
eq_mle: extract shared par_eval_eq tail from the compute_eval_eq vari…
TomWambsgans May 29, 2026
39d2cf9
sumcheck: tighten build_evals body to a one-line map_or
TomWambsgans May 29, 2026
b6d26c2
fmt
TomWambsgans May 29, 2026
8be26e5
lean_compiler: collapse if into match guard (clippy -Dwarnings)
TomWambsgans May 29, 2026
3df0314
Merge branch 'main' into custom-thread-pool
TomWambsgans May 29, 2026
5313b1a
simplify
TomWambsgans May 30, 2026
a9e53ea
remove tests
TomWambsgans May 30, 2026
e27782f
much faster prepare_evals_for_fft_unpacked
TomWambsgans May 30, 2026
3c2239c
Merge remote-tracking branch 'origin/main' into custom-thread-pool
TomWambsgans May 30, 2026
4bddf8d
simplify
TomWambsgans May 30, 2026
72d8198
perf: faster `prepare_evals_for_fft_packed_extension`
TomWambsgans May 30, 2026
ad84254
forbid nested parallelism
TomWambsgans May 30, 2026
b89e978
Merge branch 'main' into custom-thread-pool
TomWambsgans May 30, 2026
6b3e6e9
Merge branch 'main' into custom-thread-pool
TomWambsgans May 30, 2026
4abdbe3
Merge branch 'main' into custom-thread-pool
TomWambsgans May 30, 2026
34bf194
improve parallel lib.rs
TomWambsgans May 30, 2026
63cfa7b
wip
TomWambsgans May 30, 2026
2467601
fix: forbidden nested parallel dispatch in XMSS benchmark cache gener…
TomWambsgans May 31, 2026
a152f02
fix: forbidden nested parallel dispatch in XMSS benchmark cache gener…
TomWambsgans May 31, 2026
bad57a0
Merge branch 'main' into custom-thread-pool
TomWambsgans May 31, 2026
52eba27
Merge branch 'main' into custom-thread-pool
TomWambsgans May 31, 2026
c2b77b0
replace mimalloc by smalloc
TomWambsgans Jun 2, 2026
5ea02c0
Merge branch 'custom-thread-pool-with-zkalloc' into custom-thread-pool
TomWambsgans Jun 2, 2026
ba681d0
Merge branch 'main' into custom-thread-pool
TomWambsgans Jun 2, 2026
87f4205
cleaning + handle panics gracefully in parallel crate
TomWambsgans Jun 2, 2026
ef6b00f
wip
TomWambsgans Jun 2, 2026
4a24bc7
fix perf
TomWambsgans Jun 2, 2026
7b7e1ba
clean
TomWambsgans Jun 3, 2026
583eead
wip
TomWambsgans Jun 3, 2026
f69feb0
remove "Disable Transparent Huge Pages"
TomWambsgans Jun 3, 2026
d603440
wip
TomWambsgans Jun 3, 2026
be7ca95
par_map_collect
TomWambsgans Jun 3, 2026
19210e2
wip
TomWambsgans Jun 3, 2026
83b9c4d
Merge branch 'main' into custom-thread-pool
TomWambsgans Jun 4, 2026
899187f
wip
TomWambsgans Jun 4, 2026
da59c36
wip
TomWambsgans Jun 4, 2026
e992569
par_for_each_mut2
TomWambsgans Jun 4, 2026
a5f578d
simplify
TomWambsgans Jun 4, 2026
9c290e9
wip
TomWambsgans Jun 4, 2026
2b47175
wip
TomWambsgans Jun 4, 2026
260f818
wip
TomWambsgans Jun 4, 2026
0adbef2
wip
TomWambsgans Jun 4, 2026
8d388ba
wip
TomWambsgans Jun 4, 2026
2f69524
parallel: chunk par_map_collect over recommended_chunk_size
TomWambsgans Jun 4, 2026
8dd5f7b
allocator_api2
TomWambsgans Jun 4, 2026
3b4d531
air_sumcheck: revert a5f578dd's par_map_collect swap (thin-LTO inlini…
TomWambsgans Jun 4, 2026
a94fbfd
Revert "parallel: chunk par_map_collect over recommended_chunk_size"
TomWambsgans Jun 4, 2026
8cb6b20
wip
TomWambsgans Jun 5, 2026
6dca18b
faster combine_statement
TomWambsgans Jun 5, 2026
b7be815
Merge branch 'main' into custom-thread-pool
TomWambsgans Jun 5, 2026
e48882d
Merge branch 'custom-thread-pool' into custom-thread-pool-without-glo…
TomWambsgans Jun 5, 2026
4e681c4
wip
TomWambsgans Jun 5, 2026
70d2882
update benchmarks
TomWambsgans Jun 5, 2026
d082b6e
Merge branch 'custom-thread-pool' into custom-thread-pool-without-glo…
TomWambsgans Jun 5, 2026
dd152f0
Merge branch 'main' into custom-thread-pool-without-global-allocator
TomWambsgans Jun 5, 2026
88c2fb7
koala-bear: avoid `[T;N]::map` in packed quintic arithmetic (thin-LTO…
TomWambsgans Jun 5, 2026
f08b13b
Merge branch 'main' into custom-thread-pool-without-global-allocator
TomWambsgans Jun 6, 2026
e736aad
Merge branch 'custom-thread-pool-without-global-allocator' of https:/…
TomWambsgans Jun 6, 2026
513039f
remove global allocator
TomWambsgans Jun 6, 2026
7b8c170
wip
TomWambsgans Jun 6, 2026
05a9f1e
wip
TomWambsgans Jun 7, 2026
cf16e2d
Merge branch 'main' into custom-thread-pool-without-global-allocator
TomWambsgans Jun 7, 2026
bd48d80
much better API (no more zkalloc stuff to handle for users)
TomWambsgans Jun 7, 2026
29dec71
Merge remote-tracking branch 'origin/HEAD' into custom-thread-pool-wi…
TomWambsgans Jun 7, 2026
4 changes: 3 additions & 1 deletion Cargo.lock


3 changes: 1 addition & 2 deletions Cargo.toml
@@ -83,8 +83,6 @@ include_dir = "0.7"

[features]
prox-gaps-conjecture = ["rec_aggregation/prox-gaps-conjecture"]
# Build with the plain system allocator instead of zk-alloc (for comparison/debugging).
standard-alloc = ["rec_aggregation/standard-alloc"]

[dependencies]
clap.workspace = true
@@ -102,3 +100,4 @@ system-info.workspace = true

[profile.release]
lto = "thin"
codegen-units = 1
1 change: 1 addition & 0 deletions crates/backend/field/Cargo.toml
@@ -5,6 +5,7 @@ edition.workspace = true

[dependencies]
utils = { path = "../utils", package = "utils" }
zk-alloc.workspace = true

itertools.workspace = true
num-bigint = "*"
25 changes: 25 additions & 0 deletions crates/backend/field/src/field.rs
@@ -526,6 +526,31 @@ pub trait BasedVectorSpace<F: PrimeCharacteristicRing>: Sized {
/// different basis might have been used.
#[must_use]
fn reconstitute_from_base(vec: Vec<F>) -> Vec<Self>;

/// [`flatten_to_base`](Self::flatten_to_base) for `ArenaVec` proof buffers. Defaults to a layout
/// reinterpret; override alongside `flatten_to_base` if `Self` isn't `[F; DIMENSION]`.
///
/// # Safety
/// Same basis-portability caveat as [`flatten_to_base`](Self::flatten_to_base).
#[must_use]
fn flatten_to_base_in(vec: zk_alloc::ArenaVec<Self>) -> zk_alloc::ArenaVec<F> {
// SAFETY: `Self` is `[F; DIMENSION]` (the `flatten_to_base` contract); the const asserts reject mismatches.
unsafe { utils::flatten_to_base_in::<F, Self>(vec) }
}

/// [`reconstitute_from_base`](Self::reconstitute_from_base) for `ArenaVec` proof buffers (see
/// [`flatten_to_base_in`](Self::flatten_to_base_in)).
///
/// # Safety
/// Same basis-portability caveat as [`reconstitute_from_base`](Self::reconstitute_from_base).
#[must_use]
fn reconstitute_from_base_in(vec: zk_alloc::ArenaVec<F>) -> zk_alloc::ArenaVec<Self>
where
Self: Clone,
{
// SAFETY: as above.
unsafe { utils::reconstitute_from_base_in::<F, Self>(vec) }
}
}

impl<F: PrimeCharacteristicRing> BasedVectorSpace<F> for F {
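As a rough illustration of the "layout reinterpret" mentioned in the `flatten_to_base_in` doc comment, the sketch below flattens a `Vec<[T; N]>` into a `Vec<T>` without copying, and regroups it again. It uses plain `Vec` as a stand-in for `ArenaVec`, and the names `flatten_arrays` and `regroup_arrays` are hypothetical; the actual `utils::flatten_to_base_in` and `utils::reconstitute_from_base_in` are not shown in this diff and may differ.

```rust
use std::mem::ManuallyDrop;

/// Reinterpret a `Vec<[T; N]>` as a `Vec<T>` of `N` times the length (no copy).
/// Hypothetical stand-in, using `Vec` instead of `ArenaVec`.
fn flatten_arrays<T, const N: usize>(v: Vec<[T; N]>) -> Vec<T> {
    assert!(N > 0 && std::mem::size_of::<T>() > 0);
    // Prevent the original vector from freeing the buffer we are about to reuse.
    let mut v = ManuallyDrop::new(v);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    // SAFETY: `[T; N]` has the alignment of `T` and size `N * size_of::<T>()`,
    // so the same allocation holds exactly `cap * N` slots of `T`,
    // of which the first `len * N` are initialized.
    unsafe { Vec::from_raw_parts(ptr.cast::<T>(), len * N, cap * N) }
}

/// Inverse direction: group a `Vec<T>` into `[T; N]` chunks.
/// Falls back to a copy when the capacity is not a multiple of `N`.
fn regroup_arrays<T: Clone, const N: usize>(v: Vec<T>) -> Vec<[T; N]> {
    assert!(N > 0 && std::mem::size_of::<T>() > 0);
    assert_eq!(v.len() % N, 0, "length must be a multiple of N");
    if v.capacity() % N != 0 {
        // The allocation size would not match `cap / N` arrays, so copy instead.
        return v
            .chunks_exact(N)
            .map(|chunk| std::array::from_fn(|i| chunk[i].clone()))
            .collect();
    }
    let mut v = ManuallyDrop::new(v);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    // SAFETY: alignment matches as above; `len` and `cap` are multiples of `N`,
    // so the allocation size is unchanged and every array is fully initialized.
    unsafe { Vec::from_raw_parts(ptr.cast::<[T; N]>(), len / N, cap / N) }
}
```

For the flattening direction, recent standard libraries also offer `Vec::into_flattened`. The sketch only covers the memory layout; it says nothing about the basis-portability caveat in the doc comments.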
26 changes: 15 additions & 11 deletions crates/backend/koala-bear/src/quintic_extension/packed_extension.rs
@@ -48,7 +48,7 @@ impl<F: Field, PF: PackedField<Scalar = F>> From<QuinticExtensionField<F>> for P
#[inline]
fn from(x: QuinticExtensionField<F>) -> Self {
Self {
value: x.value.map(Into::into),
value: array::from_fn(|i| x.value[i].into()),
}
}
}
@@ -117,10 +117,11 @@ macro_rules! impl_packed_ext_scalar_ops {
impl Mul<KoalaBear> for PackedQuinticExtensionField<KoalaBear, $pf> {
type Output = Self;
#[inline]
fn mul(self, rhs: KoalaBear) -> Self {
Self {
value: self.value.map(|x| x * rhs),
fn mul(mut self, rhs: KoalaBear) -> Self {
for v in &mut self.value {
*v *= rhs;
}
self
}
}

@@ -281,10 +282,12 @@ where
type Output = Self;

#[inline]
fn neg(self) -> Self {
Self {
value: self.value.map(PF::neg),
fn neg(mut self) -> Self {
// Loop, not `self.value.map(..)`: avoids a thin-LTO de-inlined `Wrapped` closure.
for v in &mut self.value {
*v = -*v;
}
self
}
}

@@ -478,7 +481,7 @@ where

#[inline(always)]
fn mul(self, rhs: QuinticExtensionField<F>) -> Self {
let b: [PF; 5] = rhs.value.map(|x| x.into());
let b: [PF; 5] = array::from_fn(|i| rhs.value[i].into());
Self {
value: super::extension::quintic_mul(&self.value, &b, PF::dot_product::<5>),
}
@@ -493,10 +496,11 @@ where
type Output = Self;

#[inline]
fn mul(self, rhs: PF) -> Self {
Self {
value: self.value.map(|x| x * rhs),
fn mul(mut self, rhs: PF) -> Self {
for v in &mut self.value {
*v *= rhs;
}
self
}
}
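The hunks in this file replace `[T; N]::map` with either `array::from_fn` or an in-place loop. The sketch below shows that the three forms compute the same result on a toy type; `Ext5`, `scale_with_map`, `scale_in_place`, and `widen` are hypothetical names, and plain integers stand in for the packed field types. It makes no claim about code generation: the thin-LTO inlining motivation comes from the commit message and the code comment, and is not something this example measures.

```rust
use std::array;

/// Toy stand-in for a degree-5 extension element (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Ext5 {
    value: [u64; 5],
}

/// Old style: build a new array with `[T; N]::map`.
fn scale_with_map(x: Ext5, rhs: u64) -> Ext5 {
    Ext5 {
        value: x.value.map(|v| v * rhs),
    }
}

/// New style: take `self` by value as `mut` and update each coefficient in place.
fn scale_in_place(mut x: Ext5, rhs: u64) -> Ext5 {
    for v in &mut x.value {
        *v *= rhs;
    }
    x
}

/// New style for conversions: index-based `array::from_fn` instead of `map(Into::into)`.
fn widen(x: [u32; 5]) -> [u64; 5] {
    array::from_fn(|i| x[i].into())
}
```

Both scaling functions return the same value for any input that does not overflow, so the rewrite is a like-for-like change at the source level.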

2 changes: 2 additions & 0 deletions crates/backend/poly/Cargo.toml
@@ -8,6 +8,8 @@ field = { path = "../field", package = "mt-field" }
utils = { path = "../utils", package = "utils" }
system-info.workspace = true
parallel.workspace = true
zk-alloc.workspace = true

tracing.workspace = true
itertools.workspace = true
rand.workspace = true
63 changes: 63 additions & 0 deletions crates/backend/poly/src/alloc.rs
@@ -0,0 +1,63 @@
//! Explicit arena allocation for proof buffers.
//!
//! [`ArenaVec`] is [`zk_alloc::ArenaVec`]: an owning vector that bumps from the proving arena inside a
//! phase and falls back to the system allocator outside one. `Deref<Target = [T]>` lets it drop
//! into slice-based APIs unchanged.

pub use zk_alloc::{ArenaVec, PhaseGuard, enter_phase};

// Construct empty / pre-sized arena vectors with the inherent `ArenaVec::new()` and
// `ArenaVec::with_capacity()`. The helpers below cover the cases with no inherent equivalent
// (`vec![v; n]`, collect, `to_vec`, uninitialized, parallel fill).

/// Arena-backed `vec![value; n]`.
#[inline]
#[must_use]
pub fn arena_filled<T: Clone>(value: T, n: usize) -> ArenaVec<T> {
let mut v = ArenaVec::with_capacity(n);
v.resize(n, value);
v
}

/// Collect an iterator into an `ArenaVec`.
#[inline]
#[must_use]
pub fn arena_collect<T, I: IntoIterator<Item = T>>(iter: I) -> ArenaVec<T> {
let iter = iter.into_iter();
let mut v = ArenaVec::with_capacity(iter.size_hint().0);
v.extend(iter);
v
}

/// Arena-backed `slice.to_vec()`.
#[inline]
#[must_use]
pub fn arena_from_slice<T: Clone>(slice: &[T]) -> ArenaVec<T> {
let mut v = ArenaVec::with_capacity(slice.len());
v.extend_from_slice(slice);
v
}

/// Arena-backed [`uninitialized_vec`](crate::uninitialized_vec): `len` uninitialized slots.
///
/// # Safety
/// Every element must be overwritten before it is read.
#[inline]
#[must_use]
pub unsafe fn uninitialized_arena_vec<T>(len: usize) -> ArenaVec<T> {
let mut v = ArenaVec::with_capacity(len);
// SAFETY: caller guarantees all `len` slots are written before being read.
unsafe { v.set_len(len) };
v
}

/// Arena-backed parallel `(0..n).map(f).collect()`: fill an `ArenaVec` of length `n` in parallel.
/// The single allocation happens on the calling thread; workers write disjoint slots.
#[inline]
#[must_use]
pub fn arena_par_collect<T: Send, F: Fn(usize) -> T + Sync>(n: usize, f: F) -> ArenaVec<T> {
// SAFETY: `par_fill` writes every slot in `0..n` exactly once before any is read.
let mut v = unsafe { uninitialized_arena_vec(n) };
parallel::par_fill(&mut v, f);
v
}
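To make the helper patterns in `alloc.rs` easier to follow, here is a minimal sketch of the same shapes using `std::vec::Vec` as a stand-in for `ArenaVec` (an assumption: the diff only shows that `ArenaVec` supports `with_capacity`, `resize`, `extend`, `extend_from_slice`, and `set_len`). The names `filled`, `collect_with_hint`, and `fill_by_index` are hypothetical. `fill_by_index` is a sequential analogue of `arena_par_collect`, and it writes through `spare_capacity_mut` before calling `set_len`, which is a more conservative ordering than the one used in the diff.

```rust
/// `vec![value; n]` built from `with_capacity` + `resize`.
fn filled<T: Clone>(value: T, n: usize) -> Vec<T> {
    let mut v = Vec::with_capacity(n);
    v.resize(n, value);
    v
}

/// Collect an iterator, reserving the lower bound of its size hint up front.
fn collect_with_hint<T, I: IntoIterator<Item = T>>(iter: I) -> Vec<T> {
    let iter = iter.into_iter();
    let mut v = Vec::with_capacity(iter.size_hint().0);
    v.extend(iter);
    v
}

/// Sequential `(0..n).map(f).collect()` with a single allocation:
/// every slot is written exactly once, then the length is set.
fn fill_by_index<T, F: Fn(usize) -> T>(n: usize, f: F) -> Vec<T> {
    let mut v: Vec<T> = Vec::with_capacity(n);
    // `spare_capacity_mut` exposes the uninitialized tail as `MaybeUninit<T>` slots.
    for (i, slot) in v.spare_capacity_mut()[..n].iter_mut().enumerate() {
        slot.write(f(i));
    }
    // SAFETY: the loop above initialized slots `0..n`, and the capacity is at least `n`.
    unsafe { v.set_len(n) };
    v
}
```

For example, `fill_by_index(4, |i| i * 2)` yields the same vector as `(0..4).map(|i| i * 2).collect::<Vec<_>>()`. In the parallel version, the closure's slots would be written by different workers, which is why the diff requires `T: Send` and `F: Sync`.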
18 changes: 8 additions & 10 deletions crates/backend/poly/src/evals.rs
@@ -3,8 +3,6 @@ use crate::{EFPacking, PF};
use ::utils::log2_ceil_usize;
use field::{ExtensionField, Field, PrimeCharacteristicRing};
use itertools::Itertools;
use std::borrow::Borrow;

pub trait EvaluationsList<F: Field> {
fn num_variables(&self) -> usize;
fn num_evals(&self) -> usize;
@@ -14,30 +12,30 @@ pub trait EvaluationsList<F: Field> {
fn evaluate_sparse<EF: ExtensionField<F>>(&self, selector: usize, point: &MultilinearPoint<EF>) -> EF;
}

impl<F: Field, EL: Borrow<[F]>> EvaluationsList<F> for EL {
impl<F: Field, EL: AsRef<[F]>> EvaluationsList<F> for EL {
fn num_variables(&self) -> usize {
self.borrow().len().ilog2() as usize
self.as_ref().len().ilog2() as usize
}

fn num_evals(&self) -> usize {
self.borrow().len()
self.as_ref().len()
}

fn evaluate<EF: ExtensionField<F>>(&self, point: &MultilinearPoint<EF>) -> EF {
eval_multilinear::<_, _, true>(self.borrow(), point)
eval_multilinear::<_, _, true>(self.as_ref(), point)
}

fn evaluate_sequential<EF: ExtensionField<F>>(&self, point: &MultilinearPoint<EF>) -> EF {
eval_multilinear::<_, _, false>(self.borrow(), point)
eval_multilinear::<_, _, false>(self.as_ref(), point)
}

fn as_constant(&self) -> F {
assert_eq!(self.borrow().len(), 1);
self.borrow()[0]
assert_eq!(self.as_ref().len(), 1);
self.as_ref()[0]
}

fn evaluate_sparse<EF: ExtensionField<F>>(&self, selector: usize, point: &MultilinearPoint<EF>) -> EF {
(&self.borrow()[selector << point.len()..][..(1 << point.len())]).evaluate(point)
(&self.as_ref()[selector << point.len()..][..(1 << point.len())]).evaluate(point)
}
}
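The change above swaps the blanket impl bound from `Borrow<[F]>` to `AsRef<[F]>`. The diff does not say why; one possible reason (an assumption) is that the new arena-backed buffers expose `AsRef<[T]>`. The sketch below shows how such a blanket impl picks up any slice-like container, including a custom wrapper. `EvalsLen` and `Buffer` are hypothetical, simplified stand-ins for `EvaluationsList` and an arena vector.

```rust
/// Simplified stand-in for `EvaluationsList`: only the length-based methods.
trait EvalsLen<T> {
    fn num_evals(&self) -> usize;
    fn num_variables(&self) -> usize;
}

/// Blanket impl: anything that can be viewed as `&[T]` gets the trait.
impl<T, EL: AsRef<[T]>> EvalsLen<T> for EL {
    fn num_evals(&self) -> usize {
        let evals: &[T] = self.as_ref();
        evals.len()
    }

    fn num_variables(&self) -> usize {
        let evals: &[T] = self.as_ref();
        // Assumes a non-empty, power-of-two length, as in the original code.
        evals.len().ilog2() as usize
    }
}

/// Hypothetical custom buffer type; it only needs `AsRef<[u32]>` to participate.
struct Buffer(Vec<u32>);

impl AsRef<[u32]> for Buffer {
    fn as_ref(&self) -> &[u32] {
        &self.0
    }
}
```

With this in place, `Vec<u32>`, `[u32; 8]`, and `Buffer` all implement `EvalsLen<u32>` without any per-type code, for instance `<Buffer as EvalsLen<u32>>::num_variables(&Buffer(vec![0; 16]))`.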

3 changes: 3 additions & 0 deletions crates/backend/poly/src/lib.rs
@@ -1,5 +1,8 @@
#![cfg_attr(not(test), warn(unused_crate_dependencies))]

mod alloc;
pub use alloc::*;

mod mle;
pub use mle::*;

14 changes: 7 additions & 7 deletions crates/backend/poly/src/mle/mle_group_owned.rs
@@ -4,28 +4,28 @@ use field::ExtensionField;

#[derive(Debug)]
pub enum MleGroupOwned<EF: ExtensionField<PF<EF>>> {
Base(Vec<Vec<PF<EF>>>),
Extension(Vec<Vec<EF>>),
BasePacked(Vec<Vec<PFPacking<EF>>>),
ExtensionPacked(Vec<Vec<EFPacking<EF>>>),
Base(Vec<ArenaVec<PF<EF>>>),
Extension(Vec<ArenaVec<EF>>),
BasePacked(Vec<ArenaVec<PFPacking<EF>>>),
ExtensionPacked(Vec<ArenaVec<EFPacking<EF>>>),
}

impl<EF: ExtensionField<PF<EF>>> MleGroupOwned<EF> {
pub fn as_extension_mut(&mut self) -> Option<&mut Vec<Vec<EF>>> {
pub fn as_extension_mut(&mut self) -> Option<&mut Vec<ArenaVec<EF>>> {
match self {
Self::Extension(e) => Some(e),
_ => None,
}
}

pub fn as_extension_packed_mut(&mut self) -> Option<&mut Vec<Vec<EFPacking<EF>>>> {
pub fn as_extension_packed_mut(&mut self) -> Option<&mut Vec<ArenaVec<EFPacking<EF>>>> {
match self {
Self::ExtensionPacked(e) => Some(e),
_ => None,
}
}

pub fn as_extension(self) -> Option<Vec<Vec<EF>>> {
pub fn as_extension(self) -> Option<Vec<ArenaVec<EF>>> {
match self {
Self::Extension(e) => Some(e),
_ => None,
14 changes: 8 additions & 6 deletions crates/backend/poly/src/mle/mle_group_ref.rs
@@ -85,7 +85,7 @@ impl<'a, EF: ExtensionField<PF<EF>>> MleGroupRef<'a, EF> {
}
Self::Extension(ext) => {
// the only case where there is real work
MleGroupOwned::ExtensionPacked(ext.iter().map(|v| pack_extension(v)).collect()).into()
MleGroupOwned::ExtensionPacked(ext.iter().map(|v| pack_extension_in(v)).collect()).into()
}
Self::BasePacked(_) | Self::ExtensionPacked(_) => self.soft_clone().into(),
}
@@ -99,7 +99,7 @@ impl<'a, EF: ExtensionField<PF<EF>>> MleGroupRef<'a, EF> {
MleGroupRef::Base(pols.iter().map(|v| PFPacking::<EF>::unpack_slice(v)).collect()).into()
}
Self::ExtensionPacked(pols) => {
MleGroupOwned::Extension(pols.iter().map(|v| unpack_extension(v)).collect()).into()
MleGroupOwned::Extension(pols.iter().map(|v| unpack_extension_in(v)).collect()).into()
}
}
}
@@ -158,10 +158,12 @@ impl<'a, EF: ExtensionField<PF<EF>>> MleGroupRef<'a, EF> {

pub fn clone_to_owned(&self) -> MleGroupOwned<EF> {
match self {
Self::Base(pols) => MleGroupOwned::Base(pols.iter().map(|v| v.to_vec()).collect()),
Self::Extension(pols) => MleGroupOwned::Extension(pols.iter().map(|v| v.to_vec()).collect()),
Self::BasePacked(pols) => MleGroupOwned::BasePacked(pols.iter().map(|v| v.to_vec()).collect()),
Self::ExtensionPacked(pols) => MleGroupOwned::ExtensionPacked(pols.iter().map(|v| v.to_vec()).collect()),
Self::Base(pols) => MleGroupOwned::Base(pols.iter().map(|v| arena_from_slice(v)).collect()),
Self::Extension(pols) => MleGroupOwned::Extension(pols.iter().map(|v| arena_from_slice(v)).collect()),
Self::BasePacked(pols) => MleGroupOwned::BasePacked(pols.iter().map(|v| arena_from_slice(v)).collect()),
Self::ExtensionPacked(pols) => {
MleGroupOwned::ExtensionPacked(pols.iter().map(|v| arena_from_slice(v)).collect())
}
}
}
}