Add streaming one-wesolowski compaction APIs by hoffmang9 · Pull Request #325 · Chia-Network/chiavdf

hoffmang9 · 2026-02-24T07:35:45Z

Summary

- Add a new c_bindings/fast_wrapper API surface for streaming one-wesolowski proof generation when y_ref is known up front.
- Include incremental GetBlock and memory-budgeted (k, l) tuning to improve compaction-worker throughput and memory behavior.
- Add minimal embedding/build support (vdf_fast_pairindex, quiet_mode, fastlib target, PIC/PIE options) and document the path in docs/bluebox_compaction.md.
- Fix merge blockers from this review cycle: guard fast counter-slot configuration against zero, and export Homebrew cmake into PATH for macOS wheel/c-library workflows.

Scope and follow-up plan

- This PR is intentionally scoped to PR1 (streaming prover + supporting infra).
- enable_threads remains unchanged (true) upstream; downstream can force single-thread proving via repeated_square_fast_single_thread, as discussed with @Ealrann.
- @Ealrann clarified that Trick 2 is already merged on bbr (f3c73bf046e155aad2a8a9417496550b1e2d8cd5) and highlighted the important RSS/memory-leak fix (445cb0d54fbd43e479e9479c2605bbd3d0c2ddd5).
- A planned follow-up PR will upstream discriminant-reuse batch proving (Trick 2) and include the memory-leak/RSS stabilization work.

Test plan

- Verify commit signature on branch tip (git log -1 --show-signature)
- Confirm branch diff against origin/main only includes intended PR1 files
- Address unresolved review thread(s)
- Build/test on target platforms (Linux x86_64, macOS Intel, macOS ARM, Windows)

Made with Cursor

Note

Medium Risk
Adds a new C API and a substantial new proving path that hooks into repeated_square, plus changes fast-path counter-slot selection and output suppression; regressions could affect VDF correctness/performance in embedded or multi-threaded use.

Overview
Adds a new src/c_bindings/fast_wrapper C API for fast one-Wesolowski proving, including a streaming variant that requires known y_ref and supports progress callbacks, batch jobs, optional incremental GetBlock mapping, and memory-budgeted (k,l) tuning with per-thread debug stats.

Updates the VDF engine to better support embedding/multi-worker execution by introducing quiet_mode (to suppress stdout diagnostics), adding WesolowskiCallback batch lifecycle hooks (OnBatchStart/OnBatchReplay) for fast-path fallback handling, and allocating per-thread vdf_fast counter slots via vdf_fast_pairindex() with a configurable CHIA_VDF_FAST_COUNTER_SLOTS guard.

Build/CI support is extended by adding a fastlib static library target (libchiavdf_fastc.a), optional PIC/PIE flags in Makefile.vdf-client, improved clean rules/asm compilation, macOS workflows that ensure Homebrew cmake is installed/on PATH, and new documentation in docs/bluebox_compaction.md.

Reviewed by Cursor Bugbot for commit 91a2af9. Bugbot is set up for automated code reviews on this repo.

Introduce a fast C wrapper with streaming proof generation, incremental GetBlock optimization, and memory-budgeted (k,l) tuning, plus the minimal runtime/build infrastructure needed to embed chiavdf in multi-worker clients. Co-authored-by: Cursor <cursoragent@cursor.com>

hoffmang9 · 2026-02-24T07:39:00Z

@Ealrann I'm hoping to cleanly upstream your changes so you can rely on chiavdf directly and not a fork. See the plan here:
https://gist.github.com/hoffmang9/6a848ff22f7cd29f4b6507600b099a5c

Guard the fast pairindex slot selection behind the existing x86/asm feature checks and return slot 0 on non-x86 targets, where threading counters are not compiled. Co-authored-by: Cursor <cursoragent@cursor.com>

Install cmake via Homebrew and export its bin path in the C libraries and wheel workflows so self-hosted macOS jobs don't fail when cmake is missing from PATH. Co-authored-by: Cursor <cursoragent@cursor.com>

…ocation. Track and roll back per-batch checkpoints when replaying a failed fast batch, and switch pairindex slot allocation to unsigned atomics to avoid negative modulo indexing after counter wraparound. Co-authored-by: Cursor <cursoragent@cursor.com>
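The wraparound problem that the unsigned switch addresses can be reproduced in isolation. The sketch below is illustrative only and is not the PR's code: `signed_slot` and `next_slot` are hypothetical helpers, and `kSlots = 100` is assumed from the default slot count mentioned elsewhere in this thread. It relies on two C++ facts: the remainder of a negative signed value is negative, and an unsigned counter wraps to zero with well-defined behavior.

```cpp
#include <atomic>
#include <cstdint>

// Assumed slot count for illustration; the real constant lives in the PR.
constexpr std::uint32_t kSlots = 100;

// Signed variant: once the counter has wrapped to a negative value, the
// remainder is negative too and cannot be used as an array index.
constexpr int signed_slot(int counter) {
    return counter % static_cast<int>(kSlots);
}

// Unsigned variant: fetch_add wraps modulo 2^32, so the remainder always
// lies in [0, kSlots).
inline std::uint32_t next_slot(std::atomic<std::uint32_t>& counter) {
    return counter.fetch_add(1, std::memory_order_relaxed) % kSlots;
}
```

With a signed counter, `signed_slot(-1)` yields `-1`; with the unsigned allocator, a counter sitting at its maximum value hands out one more valid slot and then continues from slot 0.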

Document that batch bounds use completed-iteration base values while OnIteration is normalized to 1-based indices to avoid ambiguity in replay tracking. Co-authored-by: Cursor <cursoragent@cursor.com>
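One way to read the indexing convention described above: a batch that starts with `base` completed iterations covers the 1-based indices `base + 1` onward, and a replay discards whatever the failed attempt recorded. The following is a minimal sketch under that reading. `ReplayTracker` and its method names are hypothetical stand-ins, not the actual WesolowskiCallback interface, whose signatures are not shown in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tracker: records checkpoints by 1-based iteration index and
// rolls back to the batch boundary when a batch is replayed.
struct ReplayTracker {
    std::uint64_t batch_base = 0;              // iterations completed before the batch
    std::size_t checkpoints_at_batch_start = 0;
    std::vector<std::uint64_t> checkpoints;    // 1-based indices that were recorded
    std::uint64_t interval;                    // checkpoint spacing, must be > 0

    explicit ReplayTracker(std::uint64_t interval_) : interval(interval_) {}

    // Batch bound is a completed-iteration count (0 for the first batch).
    void on_batch_start(std::uint64_t base) {
        batch_base = base;
        checkpoints_at_batch_start = checkpoints.size();
    }

    // Iteration index is 1-based: the first call of a batch gets base + 1.
    void on_iteration(std::uint64_t iteration) {
        if (iteration % interval == 0) checkpoints.push_back(iteration);
    }

    // Drop everything the failed batch recorded before it is re-run.
    void on_batch_replay() { checkpoints.resize(checkpoints_at_batch_start); }
};
```

Keeping the two conventions separate means a replayed batch restarts at `batch_base + 1` and never records the same checkpoint twice.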

Expose missing batch C bindings and debug visibility so downstream Rust tests can validate tuner behavior end-to-end. Co-authored-by: Cursor <cursoragent@cursor.com>

Default CHIA_VDF_FAST_COUNTER_SLOTS to 100 in threading.h so upstream builds keep lower BSS usage while allowing embedded deployments to override via compiler defines. Co-authored-by: Cursor <cursoragent@cursor.com>

Use one program-wide atomic slot allocator for `vdf_fast_pairindex()` so concurrent VDF computations started from different translation units cannot collide on shared fast counter slots. Co-authored-by: Cursor <cursoragent@cursor.com>

Reject k>=64 before any 64-bit left-shift and reuse validated bucket spans for allocation, indexing, and finalization loops so invalid parameter tuning cannot trigger undefined behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
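The guard described above matters because shifting a 64-bit value by 64 or more bits is undefined behavior in C++. A minimal sketch of the pattern follows. `bucket_span` and `total_buckets` are hypothetical names, and the layout of `l` rows of `2^k` buckets is an assumption made for illustration rather than a description of the PR's data structures.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Validate k once and return the span 2^k, or nothing if the shift would be
// undefined (shift count >= bit width of the operand).
inline std::optional<std::uint64_t> bucket_span(std::uint32_t k) {
    if (k >= 64) return std::nullopt;
    return std::uint64_t{1} << k;
}

// Reuse the validated span for sizing, here with an overflow check on l * 2^k.
inline std::optional<std::uint64_t> total_buckets(std::uint32_t k, std::uint32_t l) {
    const auto span = bucket_span(k);
    if (!span || l == 0) return std::nullopt;
    if (*span > std::numeric_limits<std::uint64_t>::max() / l) return std::nullopt;
    return *span * l;
}
```

Computing the span once and passing it to allocation, indexing, and finalization code avoids repeating the shift, so no later loop can disagree with the validated value.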

Ealrann · 2026-02-24T11:52:53Z

Oh, yes the plan is perfect. I'll definitely update the wesoforge client to use the main chiavdf.
It's ok if you keep enable_threads=true indeed, I'll use repeated_square_fast_single_thread instead.

Note: the plan mentions "Trick 2 (unreleased, on side branch)", but in fact trick 2 was properly merged on bbr branch (it's the commit Ealrann@f3c73bf). This one is important because it allows the group optimisation.
Commit Ealrann@445cb0d is also very important because it fixes a memory leak (also in bbr branch), it was problematic on high core count servers.

github-actions · 2026-04-26T11:17:37Z

This PR has been flagged as stale due to no activity for over 60 days. It will not be automatically closed, but it has been given a stale-pr label and should be manually reviewed.

Add compile-time guards that reject zero fast-counter slot configurations before modulo indexing, and export Homebrew's cmake path in macOS workflows so cmake is available within the same step on Intel runners. Co-authored-by: Cursor <cursoragent@cursor.com>
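A compile-time guard of this kind is typically a `static_assert` next to the macro's default. The sketch below assumes the default of 100 stated elsewhere in this thread. `slot_for` is a hypothetical helper, and the actual guard in the PR may be written differently.

```cpp
#include <cstdint>

// Allow builds to override the slot count with -DCHIA_VDF_FAST_COUNTER_SLOTS=N.
#ifndef CHIA_VDF_FAST_COUNTER_SLOTS
#define CHIA_VDF_FAST_COUNTER_SLOTS 100
#endif

// Fail the build instead of reaching a modulo by zero at run time.
static_assert(CHIA_VDF_FAST_COUNTER_SLOTS > 0,
              "CHIA_VDF_FAST_COUNTER_SLOTS must be greater than zero");

constexpr std::uint32_t kSlots = CHIA_VDF_FAST_COUNTER_SLOTS;

// Safe because kSlots is known to be non-zero at compile time.
constexpr std::uint32_t slot_for(std::uint32_t ticket) {
    return ticket % kSlots;
}
```

Building with `-DCHIA_VDF_FAST_COUNTER_SLOTS=0` then stops at the `static_assert` rather than compiling a division by zero.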

hoffmang9 · 2026-05-13T02:19:00Z

Pushed follow-up commit 0c11002 to address remaining blockers.

- Fixed the unresolved Bugbot thread by hard-guarding the fast counter-slot count (CHIA_VDF_FAST_COUNTER_SLOTS > 0) so % kSlots cannot be undefined.
- Fixed macOS workflow fragility by exporting Homebrew cmake into PATH in both wheel and C-library workflows before cmake --version.
- Updated the PR description scope to match @Ealrann’s note: Trick 2 is already on bbr (f3c73bf046e155aad2a8a9417496550b1e2d8cd5) and the memory-leak/RSS fix (445cb0d54fbd43e479e9479c2605bbd3d0c2ddd5) is tracked for follow-up upstreaming.

I’ll keep watching CI and triage anything else that pops up.

Drop the root-level development patch file that diverged from the live implementation, and adjust the streaming tuner cost model so bucket-update work scales with checkpoint count and `l` instead of only `k`. Co-authored-by: Cursor <cursoragent@cursor.com>

Replace per-iteration modulo checks with next-checkpoint tracking in the streaming callback, and integrate the scheduling update with batch replay boundaries so rollback/replay semantics remain correct in the current upstreamed implementation. Co-authored-by: Cursor <cursoragent@cursor.com>
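The change described above trades a division per iteration for a single comparison against a stored target. A minimal sketch of that idea, including the rewind needed when a batch is replayed, is shown here. `CheckpointSchedule`, `rewind_to`, and `checkpoints_modulo` are hypothetical names, and the real streaming callback may track its state differently.

```cpp
#include <cstdint>
#include <vector>

// Tracks the next 1-based iteration at which a checkpoint is due.
// The interval must be greater than zero.
class CheckpointSchedule {
public:
    explicit CheckpointSchedule(std::uint64_t interval)
        : interval_(interval), next_(interval) {}

    // One comparison per iteration instead of `iteration % interval == 0`.
    bool hit(std::uint64_t iteration) {
        if (iteration != next_) return false;
        next_ += interval_;
        return true;
    }

    // After a rollback to `completed` iterations, aim at the first checkpoint
    // strictly after that point so replayed iterations are not skipped.
    void rewind_to(std::uint64_t completed) {
        next_ = (completed / interval_ + 1) * interval_;
    }

private:
    std::uint64_t interval_;
    std::uint64_t next_;
};

// Reference implementation using the modulo check, for comparison.
inline std::vector<std::uint64_t> checkpoints_modulo(std::uint64_t first,
                                                     std::uint64_t last,
                                                     std::uint64_t interval) {
    std::vector<std::uint64_t> out;
    for (std::uint64_t i = first; i <= last; ++i)
        if (i % interval == 0) out.push_back(i);
    return out;
}
```

Without the rewind, a replay that restarts below the stored target would still be correct, but a replay that restarts after a rollback past an already-advanced target would never fire again, which is why the schedule has to be updated at batch replay boundaries.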

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 61e9280.

Lease fast counter slots with per-slot in-use tracking so long-lived processes can recycle released slots safely, and restore the one-weso proof diagnostic behind quiet_mode to keep client logging behavior consistent. Co-authored-by: Cursor <cursoragent@cursor.com>
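Slot leasing with in-use tracking can be sketched with one atomic flag per slot and an RAII guard that releases the slot on scope exit. This is an illustration of the general technique, not the PR's implementation: `SlotPool` and `SlotLease` are hypothetical types, and the real code may handle exhaustion or ordering differently.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Fixed pool of N slots; each flag records whether its slot is leased.
template <std::size_t N>
class SlotPool {
public:
    // Claim the first free slot, or return nothing if all are in use.
    std::optional<std::size_t> acquire() {
        for (std::size_t i = 0; i < N; ++i) {
            bool expected = false;
            if (in_use_[i].compare_exchange_strong(expected, true)) return i;
        }
        return std::nullopt;
    }

    // Mark a slot free so a later caller can reuse it.
    void release(std::size_t slot) { in_use_[slot].store(false); }

private:
    std::array<std::atomic<bool>, N> in_use_{};
};

// RAII guard: holds a slot for its lifetime and returns it on destruction.
template <std::size_t N>
class SlotLease {
public:
    explicit SlotLease(SlotPool<N>& pool) : pool_(pool), slot_(pool.acquire()) {}
    ~SlotLease() { if (slot_) pool_.release(*slot_); }
    SlotLease(const SlotLease&) = delete;
    SlotLease& operator=(const SlotLease&) = delete;

    std::optional<std::size_t> slot() const { return slot_; }

private:
    SlotPool<N>& pool_;
    std::optional<std::size_t> slot_;
};
```

Compared with a monotonically increasing counter taken modulo the slot count, this approach cannot hand the same slot to two live users, at the cost of a linear scan and the possibility that `acquire` finds no free slot.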

Fix non-x86 build break in vdf_fast_pairindex.

7be0752

cursor Bot reviewed Feb 24, 2026

Comment thread src/c_bindings/fast_wrapper.cpp

Comment thread src/vdf.h Outdated

hoffmang9 and others added 2 commits February 23, 2026 23:57

Ensure cmake is present on macOS CI runners.

3755be2

cursor Bot reviewed Feb 24, 2026

Comment thread src/vdf.h

Clarify batch iteration indexing in streaming callback.

fd000ab

cursor Bot reviewed Feb 24, 2026

Comment thread src/vdf.h

Comment thread src/c_bindings/fast_wrapper.cpp Outdated

hoffmang9 and others added 4 commits February 24, 2026 00:48

Add streaming tuner diagnostics and batch fast-wrapper APIs.

3f82dc2

Make fast-thread counter slots build-configurable.

95f8ff1

cursor Bot reviewed Feb 24, 2026

Comment thread src/vdf.h

github-actions Bot added the stale-pr label Apr 26, 2026

hoffmang9 removed the stale-pr label May 13, 2026

cursor Bot reviewed May 13, 2026

Comment thread pr1_upstream_ready.patch Outdated

Comment thread src/c_bindings/fast_wrapper.cpp Outdated

hoffmang9 and others added 2 commits May 12, 2026 19:37

cursor Bot reviewed May 13, 2026

Comment thread src/vdf.h

Comment thread src/vdf.h Outdated

hoffmang9 mentioned this pull request May 13, 2026

Batch proving with discriminant reuse (stacked on #325) #359

Open

3 tasks

Add streaming one-wesolowski compaction APIs #325
hoffmang9 wants to merge 13 commits into Chia-Network:main from hoffmang9:pr1-streaming-prover-upstream

2 participants
