Add streaming one-wesolowski compaction APIs #325
Introduce a fast C wrapper with streaming proof generation, incremental GetBlock optimization, and memory-budgeted (k,l) tuning, plus the minimal runtime/build infrastructure needed to embed chiavdf in multi-worker clients. Co-authored-by: Cursor <cursoragent@cursor.com>
@Ealrann I'm hoping to cleanly upstream your changes so you can rely on chiavdf directly and not a fork. See the plan here:
Guard the fast pairindex slot selection behind the existing x86/asm feature checks and return slot 0 on non-x86 targets, where threading counters are not compiled. Co-authored-by: Cursor <cursoragent@cursor.com>
Install cmake via Homebrew and export its bin path in the C libraries and wheel workflows so self-hosted macOS jobs don't fail when cmake is missing from PATH. Co-authored-by: Cursor <cursoragent@cursor.com>
…ocation. Track and roll back per-batch checkpoints when replaying a failed fast batch, and switch pairindex slot allocation to unsigned atomics to avoid negative modulo indexing after counter wraparound. Co-authored-by: Cursor <cursoragent@cursor.com>
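The wraparound problem mentioned in this commit can be sketched as follows. This is a minimal illustration, not the code from the PR: `kSlots`, `slot_from_signed`, `slot_from_unsigned`, and `next_slot` are hypothetical names chosen for the example.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical slot count, for illustration only.
constexpr uint32_t kSlots = 100;

// Signed arithmetic: once the counter has wrapped to a negative value,
// `%` yields a negative remainder, which is not a valid array index.
inline int32_t slot_from_signed(int32_t counter) {
    return counter % static_cast<int32_t>(kSlots);
}

// Unsigned arithmetic: wraparound is well defined and the remainder is
// always in [0, kSlots).
inline uint32_t slot_from_unsigned(uint32_t counter) {
    return counter % kSlots;
}

// Allocator sketch: each caller takes the next counter value and maps it
// to a slot. fetch_add returns the value held before the increment.
inline uint32_t next_slot(std::atomic<uint32_t>& counter) {
    return counter.fetch_add(1, std::memory_order_relaxed) % kSlots;
}
```

With a signed counter, a wrapped value such as `-1` maps to index `-1`; with an unsigned counter the same bit pattern maps to a valid slot.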
Document that batch bounds use completed-iteration base values while OnIteration is normalized to 1-based indices to avoid ambiguity in replay tracking. Co-authored-by: Cursor <cursoragent@cursor.com>
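One way to read this convention is sketched below. The `Batch` struct and both helpers are hypothetical and only illustrate the arithmetic; the actual callback signatures in chiavdf may differ.

```cpp
#include <cstdint>

// Hypothetical batch descriptor: `base` is the number of iterations already
// completed when the batch starts, so the batch covers (base, base + count].
struct Batch {
    uint64_t base;
    uint64_t count;
};

// 1-based iteration index reported for the j-th step (j is 0-based)
// inside the batch.
inline uint64_t reported_index(const Batch& b, uint64_t j) {
    return b.base + j + 1;
}

// True when a 1-based iteration index belongs to the batch.
inline bool in_batch(const Batch& b, uint64_t index) {
    return index > b.base && index <= b.base + b.count;
}
```

Under this reading, a batch with base 100 and 50 iterations reports indices 101 through 150, so a replay that rolls back to the base discards exactly those indices.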
Expose missing batch C bindings and debug visibility so downstream Rust tests can validate tuner behavior end-to-end. Co-authored-by: Cursor <cursoragent@cursor.com>
Default CHIA_VDF_FAST_COUNTER_SLOTS to 100 in threading.h so upstream builds keep lower BSS usage while allowing embedded deployments to override via compiler defines. Co-authored-by: Cursor <cursoragent@cursor.com>
Use one program-wide atomic slot allocator for `vdf_fast_pairindex()` so concurrent VDF computations started from different translation units cannot collide on shared fast counter slots. Co-authored-by: Cursor <cursoragent@cursor.com>
Reject k>=64 before any 64-bit left-shift and reuse validated bucket spans for allocation, indexing, and finalization loops so invalid parameter tuning cannot trigger undefined behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
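The shift guard described here can be sketched as below. Shifting a 64-bit value by 64 or more bits is undefined behavior in C++, so the check has to happen before the shift. `bucket_span` is a hypothetical helper name, not a function from the PR.

```cpp
#include <cstdint>

// Computes the bucket count 2^k for a tuning parameter k.
// Returns false without touching *span_out when the shift would be
// undefined for a 64-bit operand (k >= 64).
inline bool bucket_span(uint32_t k, uint64_t* span_out) {
    if (k >= 64) {
        return false;
    }
    *span_out = uint64_t{1} << k;
    return true;
}
```

Computing the span once and passing the validated value to the allocation, indexing, and finalization loops keeps all three consistent with the same check.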
Oh, yes the plan is perfect. I'll definitely update the wesoforge client to use the main chiavdf. Note: the plan mentions "Trick 2 (unreleased, on side branch)", but in fact trick 2 was properly merged on
This PR has been flagged as stale due to no activity for over 60
Add compile-time guards that reject zero fast-counter slot configurations before modulo indexing, and export Homebrew's cmake path in macOS workflows so cmake is available within the same step on Intel runners. Co-authored-by: Cursor <cursoragent@cursor.com>
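A compile-time guard of this kind might look like the sketch below. The macro name and the default of 100 come from the commit messages in this PR; the exact form of the guard in `threading.h` is an assumption, and `slot_index` is a hypothetical helper.

```cpp
#include <cstdint>

// Embedders can override with -DCHIA_VDF_FAST_COUNTER_SLOTS=<n>;
// otherwise fall back to the default.
#ifndef CHIA_VDF_FAST_COUNTER_SLOTS
#define CHIA_VDF_FAST_COUNTER_SLOTS 100
#endif

// Reject a zero (or negative) configuration at compile time, since
// `counter % 0` would be undefined behavior at runtime.
static_assert(CHIA_VDF_FAST_COUNTER_SLOTS > 0,
              "CHIA_VDF_FAST_COUNTER_SLOTS must be greater than zero");

// Hypothetical helper that maps a counter value to a slot index.
inline uint32_t slot_index(uint32_t counter) {
    return counter % static_cast<uint32_t>(CHIA_VDF_FAST_COUNTER_SLOTS);
}
```

Building with `-DCHIA_VDF_FAST_COUNTER_SLOTS=0` then fails at the `static_assert` instead of producing a binary that divides by zero.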
Pushed follow-up commit
I’ll keep watching CI and triage anything else that pops up.
Drop the root-level development patch file that diverged from the live implementation, and adjust the streaming tuner cost model so bucket-update work scales with checkpoint count and `l` instead of only `k`. Co-authored-by: Cursor <cursoragent@cursor.com>
Replace per-iteration modulo checks with next-checkpoint tracking in the streaming callback, and integrate the scheduling update with batch replay boundaries so rollback/replay semantics remain correct in the current upstreamed implementation. Co-authored-by: Cursor <cursoragent@cursor.com>
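The idea of next-checkpoint tracking can be sketched as follows. `CheckpointTracker` is a hypothetical type written for this example; it assumes 1-based iteration indices and a fixed checkpoint interval, and it is not the callback from the PR.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical tracker that records iterations landing on a checkpoint.
struct CheckpointTracker {
    uint64_t interval;         // checkpoint spacing, must be > 0
    uint64_t next_checkpoint;  // next iteration index to record
    std::vector<uint64_t> recorded;

    explicit CheckpointTracker(uint64_t interval_)
        : interval(interval_), next_checkpoint(interval_) {}

    // Called once per completed iteration (1-based index).
    // The hot path is a single comparison instead of `iteration % interval`.
    void on_iteration(uint64_t iteration) {
        if (iteration != next_checkpoint) {
            return;
        }
        recorded.push_back(iteration);
        next_checkpoint += interval;
    }

    // Rewind to a batch base (count of completed iterations) before a replay:
    // drop checkpoints recorded past the base and recompute the next target.
    void rollback_to(uint64_t base) {
        while (!recorded.empty() && recorded.back() > base) {
            recorded.pop_back();
        }
        next_checkpoint = (base / interval + 1) * interval;
    }
};
```

Because the tracker carries state, a replayed batch must reset `next_checkpoint` along with the recorded checkpoints; otherwise the replay would skip checkpoints it had already passed.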
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 61e9280.
Lease fast counter slots with per-slot in-use tracking so long-lived processes can recycle released slots safely, and restore the one-weso proof diagnostic behind quiet_mode to keep client logging behavior consistent. Co-authored-by: Cursor <cursoragent@cursor.com>
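Slot leasing with per-slot in-use tracking could be sketched like this. `SlotPool` and `kSlotCount` are hypothetical names, and the pool is deliberately tiny; the PR's own allocator may be structured differently.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical pool size, kept small for demonstration.
constexpr std::size_t kSlotCount = 4;

class SlotPool {
public:
    // Leases the lowest free slot. Returns its index, or -1 when every
    // slot is currently in use.
    int acquire() {
        for (std::size_t i = 0; i < kSlotCount; ++i) {
            bool expected = false;
            // Atomically flip false -> true; only one thread can win a slot.
            if (in_use_[i].compare_exchange_strong(expected, true,
                                                   std::memory_order_acq_rel)) {
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Returns a previously leased slot to the pool so it can be reused.
    void release(int slot) {
        in_use_[static_cast<std::size_t>(slot)].store(false,
                                                      std::memory_order_release);
    }

private:
    // Value-initialized, so every flag starts as false (free).
    std::array<std::atomic<bool>, kSlotCount> in_use_{};
};
```

Compared with a monotonically increasing counter, a lease makes exhaustion visible to the caller (the `-1` result) and lets a long-lived process reuse slots once computations finish.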

Summary
- Add a `c_bindings/fast_wrapper` API surface for streaming one-wesolowski proof generation when `y_ref` is known up front
- Add incremental `GetBlock` and memory-budgeted `(k, l)` tuning to improve compaction-worker throughput/memory behavior
- Add the minimal runtime/build infrastructure for embedding (`vdf_fast_pairindex`, `quiet_mode`, `fastlib` target, PIC/PIE options) and document the path in `docs/bluebox_compaction.md`
- Export Homebrew `cmake` into PATH for macOS wheel/c-library workflows

Scope and follow-up plan
- `enable_threads` remains unchanged (`true`) upstream; downstream can force single-thread proving via `repeated_square_fast_single_thread`, as discussed with @Ealrann.
- … `bbr` (f3c73bf046e155aad2a8a9417496550b1e2d8cd5) and highlighted the important RSS/memory-leak fix (445cb0d54fbd43e479e9479c2605bbd3d0c2ddd5).

Test plan
- … (`git log -1 --show-signature`)
- … `origin/main` only includes intended PR1 files

Made with Cursor
Note
Medium Risk
Adds a new C API and a substantial new proving path that hooks into `repeated_square`, plus changes fast-path counter-slot selection and output suppression; regressions could affect VDF correctness/performance in embedded or multi-threaded use.

Overview
Adds a new `src/c_bindings/fast_wrapper` C API for fast one-Wesolowski proving, including a streaming variant that requires known `y_ref` and supports progress callbacks, batch jobs, optional incremental `GetBlock` mapping, and memory-budgeted `(k,l)` tuning with per-thread debug stats.

Updates the VDF engine to better support embedding/multi-worker execution by introducing `quiet_mode` (to suppress stdout diagnostics), adding `WesolowskiCallback` batch lifecycle hooks (`OnBatchStart`/`OnBatchReplay`) for fast-path fallback handling, and allocating per-thread `vdf_fast` counter slots via `vdf_fast_pairindex()` with a configurable `CHIA_VDF_FAST_COUNTER_SLOTS` guard.

Build/CI support is extended by adding a `fastlib` static library target (`libchiavdf_fastc.a`), optional PIC/PIE flags in `Makefile.vdf-client`, improved clean rules/asm compilation, macOS workflows that ensure Homebrew `cmake` is installed/on `PATH`, and new documentation in `docs/bluebox_compaction.md`.

Reviewed by Cursor Bugbot for commit 91a2af9. Bugbot is set up for automated code reviews on this repo.