Batch proving with discriminant reuse (stacked on #325) #359
Open
hoffmang9 wants to merge 21 commits into
Introduce a fast C wrapper with streaming proof generation, incremental GetBlock optimization, and memory-budgeted (k,l) tuning, plus the minimal runtime/build infrastructure needed to embed chiavdf in multi-worker clients. Co-authored-by: Cursor <cursoragent@cursor.com>
Guard the fast pairindex slot selection behind the existing x86/asm feature checks and return slot 0 on non-x86 targets, where threading counters are not compiled. Co-authored-by: Cursor <cursoragent@cursor.com>
Install cmake via Homebrew and export its bin path in the C libraries and wheel workflows so self-hosted macOS jobs don't fail when cmake is missing from PATH. Co-authored-by: Cursor <cursoragent@cursor.com>
…ocation. Track and roll back per-batch checkpoints when replaying a failed fast batch, and switch pairindex slot allocation to unsigned atomics to avoid negative modulo indexing after counter wraparound. Co-authored-by: Cursor <cursoragent@cursor.com>
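The wraparound hazard mentioned in this commit can be sketched in isolation. The snippet below is only an illustration: `kSlots`, `signed_slot`, and `unsigned_slot` are hypothetical names, not chiavdf identifiers. It shows why a signed counter can produce a negative index after overflow while an unsigned atomic counter cannot.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical slot count for illustration; not taken from chiavdf.
constexpr uint32_t kSlots = 100;

// In C++ the remainder takes the sign of the dividend, so a signed counter
// that has become negative yields a negative (out-of-bounds) index.
inline int32_t signed_slot(int32_t counter) {
    return counter % static_cast<int32_t>(kSlots);
}

// Unsigned arithmetic wraps modulo 2^32, so the remainder always lies
// in [0, kSlots), even after the counter wraps around.
inline uint32_t unsigned_slot(std::atomic<uint32_t>& counter) {
    return counter.fetch_add(1, std::memory_order_relaxed) % kSlots;
}
```

With the unsigned variant the slot sequence is no longer perfectly round-robin at the wrap point unless the slot count divides 2^32, but every returned index stays in range.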
Document that batch bounds use completed-iteration base values while OnIteration is normalized to 1-based indices to avoid ambiguity in replay tracking. Co-authored-by: Cursor <cursoragent@cursor.com>
Expose missing batch C bindings and debug visibility so downstream Rust tests can validate tuner behavior end-to-end. Co-authored-by: Cursor <cursoragent@cursor.com>
Default CHIA_VDF_FAST_COUNTER_SLOTS to 100 in threading.h so upstream builds keep lower BSS usage while allowing embedded deployments to override via compiler defines. Co-authored-by: Cursor <cursoragent@cursor.com>
Use one program-wide atomic slot allocator for `vdf_fast_pairindex()` so concurrent VDF computations started from different translation units cannot collide on shared fast counter slots. Co-authored-by: Cursor <cursoragent@cursor.com>
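One common way to get a single program-wide counter from a header is a function-local static inside an `inline` function. The sketch below assumes that approach purely for illustration; the PR may implement its allocator differently, and `global_slot_counter` and `next_slot` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// An inline function has a single definition program-wide, so its local
// static is one object shared by every translation unit that includes
// this header. A namespace-scope `static` variable in a header would
// instead give each translation unit its own private counter.
inline std::atomic<uint32_t>& global_slot_counter() {
    static std::atomic<uint32_t> counter{0};
    return counter;
}

// Hypothetical helper: hand out the next slot index in [0, slots).
inline uint32_t next_slot(uint32_t slots) {
    assert(slots > 0);
    return global_slot_counter().fetch_add(1, std::memory_order_relaxed) % slots;
}
```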
Reject k>=64 before any 64-bit left-shift and reuse validated bucket spans for allocation, indexing, and finalization loops so invalid parameter tuning cannot trigger undefined behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
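The shift guard described here can be illustrated with a small helper. This is a minimal sketch, assuming the bucket span is `2^k`; `bucket_span` is a hypothetical name and not the function used in the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Returns 2^k, or std::nullopt when the shift would be undefined.
// Shifting a 64-bit value by 64 or more bits is undefined behavior in C++,
// so the check must happen before the shift, not after.
inline std::optional<uint64_t> bucket_span(uint32_t k) {
    if (k >= 64) {
        return std::nullopt;
    }
    return uint64_t{1} << k;
}
```

Computing the span once and passing the validated value to allocation, indexing, and finalization code keeps every loop bound consistent with the same check.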
Add compile-time guards that reject zero fast-counter slot configurations before modulo indexing, and export Homebrew's cmake path in macOS workflows so cmake is available within the same step on Intel runners. Co-authored-by: Cursor <cursoragent@cursor.com>
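A compile-time guard of this kind might look like the following sketch. The macro name and the default of 100 come from the commit messages above; the `slot_for` helper and the exact wording of the assertion are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Default taken from the commit message; a build may override it with
// -DCHIA_VDF_FAST_COUNTER_SLOTS=<n>.
#ifndef CHIA_VDF_FAST_COUNTER_SLOTS
#define CHIA_VDF_FAST_COUNTER_SLOTS 100
#endif

// Fail the build instead of dividing by zero at run time.
static_assert(CHIA_VDF_FAST_COUNTER_SLOTS > 0,
              "CHIA_VDF_FAST_COUNTER_SLOTS must be positive: it is used as a modulus");

// Hypothetical helper: map a raw counter value to a slot index.
constexpr std::size_t slot_for(std::size_t raw) {
    return raw % CHIA_VDF_FAST_COUNTER_SLOTS;
}
```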
Drop the root-level development patch file that diverged from the live implementation, and adjust the streaming tuner cost model so bucket-update work scales with checkpoint count and `l` instead of only `k`. Co-authored-by: Cursor <cursoragent@cursor.com>
Replace per-iteration modulo checks with next-checkpoint tracking in the streaming callback, and integrate the scheduling update with batch replay boundaries so rollback/replay semantics remain correct in the current upstreamed implementation. Co-authored-by: Cursor <cursoragent@cursor.com>
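The idea of tracking the next checkpoint instead of testing `iteration % interval` on every call can be sketched as follows. `CheckpointTracker` and its members are hypothetical; the rollback rule is an assumption about how replay boundaries could be honored, not a copy of the PR's logic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical callback state; names are illustrative only.
struct CheckpointTracker {
    uint64_t interval;               // checkpoint spacing, must be > 0
    uint64_t next_checkpoint;        // next 1-based iteration to record
    std::vector<uint64_t> recorded;  // iterations recorded so far

    explicit CheckpointTracker(uint64_t interval_)
        : interval(interval_), next_checkpoint(interval_) {
        assert(interval_ > 0);
    }

    // Called once per iteration with a 1-based index.
    void on_iteration(uint64_t iteration) {
        if (iteration != next_checkpoint) return;  // one compare, no division
        recorded.push_back(iteration);
        next_checkpoint += interval;
    }

    // Rewind so that iterations after `base` (completed count) can be replayed.
    void rollback_to(uint64_t base) {
        while (!recorded.empty() && recorded.back() > base) recorded.pop_back();
        next_checkpoint = (base / interval + 1) * interval;
    }
};
```

The division moves from the hot per-iteration path to the rare rollback path, which is the trade this commit describes.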
Lease fast counter slots with per-slot in-use tracking so long-lived processes can recycle released slots safely, and restore the one-weso proof diagnostic behind quiet_mode to keep client logging behavior consistent. Co-authored-by: Cursor <cursoragent@cursor.com>
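Slot leasing with per-slot in-use tracking could be sketched like this. `SlotPool` is a hypothetical class written for illustration under the assumption that each slot carries one atomic flag; it is not the data structure from the PR.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

template <std::size_t N>
class SlotPool {
public:
    // Try to claim a free slot; returns std::nullopt if all are in use.
    std::optional<std::size_t> lease() {
        for (std::size_t i = 0; i < N; ++i) {
            bool expected = false;
            // Only one thread can flip a given flag from false to true.
            if (in_use_[i].compare_exchange_strong(expected, true,
                                                   std::memory_order_acq_rel)) {
                return i;
            }
        }
        return std::nullopt;
    }

    // Return a previously leased slot so later computations can reuse it.
    void release(std::size_t slot) {
        assert(slot < N);
        in_use_[slot].store(false, std::memory_order_release);
    }

private:
    std::array<std::atomic<bool>, N> in_use_{};  // all flags start as false
};
```

Unlike a monotonically increasing counter taken modulo the slot count, a lease never hands out a slot that another live computation still holds.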
(cherry picked from commit 750df66)
(cherry picked from commit 445cb0d)
Member
Author
@Ealrann Assuming these two PRs are merged - is there anything else you need to be able to abandon your chiavdf fork and just use a release of this repo?
Ensure batch proving joins pending finalizer work and frees allocated output arrays on exceptions so stack-referenced state cannot outlive the call frame. Also remove an unused internal batch-free helper that duplicated the public C API. Co-authored-by: Cursor <cursoragent@cursor.com>
Handle replay notifications in streaming Wesolowski callbacks by rejecting replayed batches instead of reusing irreversibly accumulated bucket state. This prevents silent incorrect proofs when the fast squaring path replays a corrupted batch. Co-authored-by: Cursor <cursoragent@cursor.com>
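The reject-on-replay policy can be illustrated with a toy accumulator. Everything here is hypothetical: `StreamingAccumulator` stands in for the bucket state, and a plain sum stands in for the irreversible bucket updates.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy stand-in for streaming state that cannot be undone once updated.
struct StreamingAccumulator {
    uint64_t bucket_sum = 0;  // placeholder for accumulated bucket state
    bool poisoned = false;    // set once a replay has been observed

    void on_iteration(uint64_t value) {
        if (!poisoned) bucket_sum += value;
    }

    // A replay means some earlier updates came from a batch that is being
    // recomputed. They cannot be subtracted out, so the result is rejected.
    void on_batch_replay() { poisoned = true; }

    // Returns the result only if no replay contaminated the state.
    std::optional<uint64_t> finalize() const {
        if (poisoned) return std::nullopt;
        return bucket_sum;
    }
};
```

A caller that receives no result would fall back to a non-streaming proof path or retry from scratch; which of these the PR does is not stated here.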
Keep discriminant and stop-flag state alive for catch-path finalizer joins so callback references never dangle during exception unwinding. This preserves safe cleanup when batch proving fails mid-flight. Co-authored-by: Cursor <cursoragent@cursor.com>
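The lifetime rule behind these cleanup commits (state referenced by workers must outlive the join) can be sketched with an RAII guard. `JoinGuard` and `run_batch` are hypothetical names, and the worker body is a placeholder rather than real finalization code.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <thread>
#include <vector>

// Joins every worker on scope exit, including during exception unwinding.
class JoinGuard {
public:
    explicit JoinGuard(std::vector<std::thread>& workers) : workers_(workers) {}
    ~JoinGuard() {
        for (auto& t : workers_) {
            if (t.joinable()) t.join();
        }
    }
    JoinGuard(const JoinGuard&) = delete;
    JoinGuard& operator=(const JoinGuard&) = delete;

private:
    std::vector<std::thread>& workers_;
};

// Locals are destroyed in reverse declaration order: the guard joins first,
// then the thread vector is destroyed, and only then the stop flag.
inline void run_batch(std::atomic<int>& finished, bool fail) {
    std::atomic<bool> stop{false};  // stack state referenced by workers
    std::vector<std::thread> workers;
    JoinGuard guard(workers);
    for (int i = 0; i < 4; ++i) {
        workers.emplace_back([&stop, &finished] {
            if (!stop.load()) { /* finalization work would go here */ }
            finished.fetch_add(1);
        });
    }
    if (fail) {
        stop.store(true);
        throw std::runtime_error("batch failed mid-flight");
    }
}
```

Declaring the shared state before the guard is the whole point: reversing the order would let workers touch a destroyed flag during unwinding.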
Add a reducer-aware streaming proof finalization method used by batch worker threads and remove stale unused bucket replay members. This keeps batch finalization functional while cleaning dead scaffolding flagged by review. Co-authored-by: Cursor <cursoragent@cursor.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 88602ba.
Drop the unused batch progress trampoline scaffolding in fast_wrapper to reduce maintenance noise and keep the batch callback flow aligned with current direct progress handling. Co-authored-by: Cursor <cursoragent@cursor.com>

Summary
… `bbr` fork (f3c73bf, 750df66) so jobs sharing `(challenge, size_bits, x0)` can reuse one squaring trajectory
… 445cb0d) for the batch finalization path
Notes
Test plan
… `main`
Made with Cursor
Note
High Risk
Adds a large new C/C++ proving path that changes how Wesolowski proofs are generated and introduces new multi-threaded batch finalization and fast-path counter slot allocation, which could affect correctness and stability under concurrency.
Overview
Adds a new `fast_wrapper` C ABI for compaction-oriented one-Wesolowski proving that can stream bucket updates during squaring (requires known `y_ref`), optionally using an incremental `GetBlock` mapping to avoid expensive per-block exponentiation, plus a batch API that reuses a single squaring trajectory across many jobs sharing `(challenge, x, discriminant_size_bits)` and offloads per-job finalization to a small worker pool.

Updates the core squaring loop integration to support batch lifecycle hooks (`OnBatchStart`/`OnBatchReplay`), introduces `quiet_mode` to suppress stdout in library use, and assigns per-thread fast-counter slots via `vdf_fast_pairindex()` to reduce collisions when running multiple VDFs in-process.

Build/CI updates ensure `cmake` is present on macOS runners and extend `Makefile.vdf-client` with optional `PIC` builds and a new `libchiavdf_fastc.a` target; adds docs describing the new bluebox compaction optimizations.

Reviewed by Cursor Bugbot for commit 8e76d33. Bugbot is set up for automated code reviews on this repo.
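The batching idea in the overview, where jobs with the same `(challenge, x, discriminant_size_bits)` share one squaring trajectory, can be sketched as a grouping step. The `Job` struct, `JobKey` alias, and `group_jobs` function are hypothetical and do not reflect the actual `fast_wrapper` C ABI.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical job description; field names are illustrative.
struct Job {
    std::string challenge;
    std::string x;
    uint32_t discriminant_size_bits;
    uint64_t num_iterations;
};

using JobKey = std::tuple<std::string, std::string, uint32_t>;

// Group jobs that can share a trajectory and sort each group's iteration
// targets, so one squaring run up to the largest target serves them all.
inline std::map<JobKey, std::vector<uint64_t>> group_jobs(const std::vector<Job>& jobs) {
    std::map<JobKey, std::vector<uint64_t>> groups;
    for (const auto& job : jobs) {
        JobKey key(job.challenge, job.x, job.discriminant_size_bits);
        groups[key].push_back(job.num_iterations);
    }
    for (auto& entry : groups) {
        std::sort(entry.second.begin(), entry.second.end());
    }
    return groups;
}
```

Within a group the squaring cost is paid once for the largest iteration count, and each smaller target becomes a point along the same run where per-job finalization can be handed to a worker.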