
Batch proving with discriminant reuse (stacked on #325) #359

Open
hoffmang9 wants to merge 21 commits into Chia-Network:main from hoffmang9:pr2-batch-discriminant-reuse


hoffmang9 (Member) commented May 13, 2026

Summary

Notes

Test plan

Made with Cursor


Note

High Risk
Adds a large new C/C++ proving path that changes how Wesolowski proofs are generated and introduces new multi-threaded batch finalization and fast-path counter slot allocation, which could affect correctness and stability under concurrency.

Overview
Adds a new fast_wrapper C ABI for compaction-oriented one-Wesolowski proving. The wrapper can stream bucket updates during squaring (this requires a known y_ref) and can optionally use an incremental GetBlock mapping to avoid expensive per-block exponentiation. It also adds a batch API that reuses a single squaring trajectory across many jobs sharing the same (challenge, x, discriminant_size_bits) and offloads per-job finalization to a small worker pool.
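To make the batching idea concrete, here is a minimal C++ sketch of how jobs could be grouped so that each group shares one squaring trajectory. The `Job` and `Group` types and `group_jobs` are hypothetical and are not the fast_wrapper C ABI. The sketch also assumes that jobs in one group may request different iteration counts, so the shared trajectory would run to the largest of them.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical job description; field names are illustrative only.
struct Job {
    std::string challenge;          // serialized challenge bytes
    std::string x;                  // serialized initial form
    int discriminant_size_bits;
    std::uint64_t iterations;       // iteration count this job needs a proof for
};

// Jobs that agree on this key can share one squaring trajectory.
using GroupKey = std::tuple<std::string, std::string, int>;

struct Group {
    std::uint64_t max_iterations = 0;      // how far the shared squaring must run
    std::vector<std::size_t> job_indices;  // jobs finalized from this trajectory
};

std::map<GroupKey, Group> group_jobs(const std::vector<Job>& jobs) {
    std::map<GroupKey, Group> groups;
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        const Job& j = jobs[i];
        Group& g = groups[GroupKey{j.challenge, j.x, j.discriminant_size_bits}];
        g.max_iterations = std::max(g.max_iterations, j.iterations);
        g.job_indices.push_back(i);
    }
    return groups;
}
```

Each group would then square once up to `max_iterations`, and the per-job finalization for `job_indices` could be handed to worker threads.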

Updates the core squaring loop integration to support batch lifecycle hooks (OnBatchStart/OnBatchReplay), introduces quiet_mode to suppress stdout when chiavdf is used as a library, and assigns per-thread fast-counter slots via vdf_fast_pairindex() to reduce slot collisions when multiple VDFs run in the same process.

Build/CI updates ensure cmake is present on macOS runners and extend Makefile.vdf-client with optional PIC builds and a new libchiavdf_fastc.a target. The PR also adds docs describing the new bluebox compaction optimizations.

Reviewed by Cursor Bugbot for commit 8e76d33.

hoffmang9 and others added 16 commits February 23, 2026 23:34
Introduce a fast C wrapper with streaming proof generation, incremental GetBlock optimization, and memory-budgeted (k,l) tuning, plus the minimal runtime/build infrastructure needed to embed chiavdf in multi-worker clients.

Co-authored-by: Cursor <cursoragent@cursor.com>
Guard the fast pairindex slot selection behind the existing x86/asm feature checks and return slot 0 on non-x86 targets, where threading counters are not compiled.

Co-authored-by: Cursor <cursoragent@cursor.com>
Install cmake via Homebrew and export its bin path in the C libraries and wheel workflows so self-hosted macOS jobs don't fail when cmake is missing from PATH.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ocation.

Track and roll back per-batch checkpoints when replaying a failed fast batch, and switch pairindex slot allocation to unsigned atomics to avoid negative modulo indexing after counter wraparound.

Co-authored-by: Cursor <cursoragent@cursor.com>
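Why unsigned atomics matter for the slot index can be shown in a few lines. This is an illustrative sketch, not the chiavdf code: `kSlots`, `signed_slot`, `unsigned_slot`, and `next_slot` are hypothetical names. A `std::atomic<int>` counter wraps to a negative value after `INT_MAX`, and C++ `%` keeps the sign of the dividend, so the slot index goes negative. An unsigned counter wraps to 0 and always yields an in-range index.

```cpp
#include <atomic>

// Hypothetical slot count; the commits describe a default of 100.
constexpr unsigned kSlots = 100;

// Signed counter: once it has wrapped to a negative value, '%' returns a
// negative remainder (for example -1 % 100 == -1), i.e. an out-of-range index.
int signed_slot(int counter) { return counter % static_cast<int>(kSlots); }

// Unsigned counter: wraparound is well defined and lands on 0, so the
// remainder is always in [0, kSlots).
unsigned unsigned_slot(unsigned counter) { return counter % kSlots; }

// Hypothetical allocator built on an unsigned atomic counter.
std::atomic<unsigned> g_next_slot{0};

unsigned next_slot() {
    // fetch_add on an unsigned atomic wraps from UINT_MAX to 0 without UB.
    return g_next_slot.fetch_add(1, std::memory_order_relaxed) % kSlots;
}
```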
Document that batch bounds use completed-iteration base values while OnIteration is normalized to 1-based indices to avoid ambiguity in replay tracking.

Co-authored-by: Cursor <cursoragent@cursor.com>
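The convention described in these two commits (batch bounds expressed as completed-iteration base values, `OnIteration` reporting 1-based indices, and checkpoints rolled back when a failed batch is replayed) can be sketched as follows. The hook names echo the PR text, but `CheckpointTracker` and all signatures are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tracker; not the chiavdf callback interface.
class CheckpointTracker {
public:
    explicit CheckpointTracker(std::uint64_t interval) : interval_(interval) {}

    // `base` is the number of iterations already completed before this batch.
    void OnBatchStart(std::uint64_t base) {
        batch_base_ = base;
        checkpoints_at_batch_start_ = checkpoints_.size();
    }

    // `iteration` is 1-based: the first squaring of a batch that starts at
    // `base` is reported as base + 1.
    void OnIteration(std::uint64_t iteration) {
        if (iteration % interval_ == 0) checkpoints_.push_back(iteration);
    }

    // The failed batch will be re-run from `batch_base_`, so discard every
    // checkpoint it recorded.
    void OnBatchReplay() { checkpoints_.resize(checkpoints_at_batch_start_); }

    const std::vector<std::uint64_t>& checkpoints() const { return checkpoints_; }
    std::uint64_t batch_base() const { return batch_base_; }

private:
    std::uint64_t interval_;
    std::uint64_t batch_base_ = 0;
    std::size_t checkpoints_at_batch_start_ = 0;
    std::vector<std::uint64_t> checkpoints_;
};
```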
Expose missing batch C bindings and debug visibility so downstream Rust tests can validate tuner behavior end-to-end.

Co-authored-by: Cursor <cursoragent@cursor.com>
Default CHIA_VDF_FAST_COUNTER_SLOTS to 100 in threading.h so upstream builds keep lower BSS usage while allowing embedded deployments to override via compiler defines.

Co-authored-by: Cursor <cursoragent@cursor.com>
Use one program-wide atomic slot allocator for `vdf_fast_pairindex()` so concurrent VDF computations started from different translation units cannot collide on shared fast counter slots.

Co-authored-by: Cursor <cursoragent@cursor.com>
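A program-wide counter can be expressed in C++17 with an `inline` variable. The sketch below assumes that approach; `g_pairindex_counter` and `hypothetical_pairindex` are invented names and are not the body of `vdf_fast_pairindex()`.

```cpp
#include <atomic>

// Placed in a header, an `inline` variable has exactly one definition in the
// whole program, so every translation unit that includes the header increments
// the same counter. A namespace-scope `static` (or unnamed-namespace) variable
// would instead give each translation unit its own copy, and two units could
// hand out the same slot.
inline std::atomic<unsigned> g_pairindex_counter{0};

// Hypothetical stand-in for slot selection; `slots` must be non-zero.
inline unsigned hypothetical_pairindex(unsigned slots) {
    return g_pairindex_counter.fetch_add(1, std::memory_order_relaxed) % slots;
}
```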
Reject k>=64 before any 64-bit left-shift and reuse validated bucket spans for allocation, indexing, and finalization loops so invalid parameter tuning cannot trigger undefined behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
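Here is a minimal sketch of validating `k` before the shift and deriving one checked bucket count that allocation, indexing, and finalization loops could all share. The layout of `l` groups of `2^k` buckets is an assumption for illustration, and `validated_bucket_count` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>

// Returns the total bucket count, or nullopt if (k, l) is invalid.
std::optional<std::size_t> validated_bucket_count(std::uint32_t k, std::uint32_t l) {
    // Shifting a 64-bit value by 64 or more is undefined behavior, so reject
    // k >= 64 before any shift; l == 0 would divide by zero below.
    if (k >= 64 || l == 0) return std::nullopt;
    const std::uint64_t per_group = std::uint64_t{1} << k;  // safe: k < 64
    // Reject counts whose product would not fit in size_t.
    if (per_group > std::numeric_limits<std::size_t>::max() / l) return std::nullopt;
    return static_cast<std::size_t>(per_group * l);
}
```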
Add compile-time guards that reject zero fast-counter slot configurations before modulo indexing, and export Homebrew's cmake path in macOS workflows so cmake is available within the same step on Intel runners.

Co-authored-by: Cursor <cursoragent@cursor.com>
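The compile-time guard can be written as a `static_assert` on the slot macro. This sketch reuses the `CHIA_VDF_FAST_COUNTER_SLOTS` name and the default of 100 described in the commits above, but it is not the contents of threading.h; `kFastCounterSlots` and `slot_for` are hypothetical.

```cpp
#include <cstddef>

// Default of 100; an embedded build can override it on the compiler command
// line, e.g. -DCHIA_VDF_FAST_COUNTER_SLOTS=512.
#ifndef CHIA_VDF_FAST_COUNTER_SLOTS
#define CHIA_VDF_FAST_COUNTER_SLOTS 100
#endif

// Fail the build on a zero slot count instead of dividing by zero in '%'.
static_assert(CHIA_VDF_FAST_COUNTER_SLOTS > 0,
              "CHIA_VDF_FAST_COUNTER_SLOTS must be positive");

constexpr std::size_t kFastCounterSlots = CHIA_VDF_FAST_COUNTER_SLOTS;

inline std::size_t slot_for(std::size_t counter) { return counter % kFastCounterSlots; }
```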
Drop the root-level development patch file that diverged from the live implementation, and adjust the streaming tuner cost model so bucket-update work scales with checkpoint count and `l` instead of only `k`.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace per-iteration modulo checks with next-checkpoint tracking in the streaming callback, and integrate the scheduling update with batch replay boundaries so rollback/replay semantics remain correct in the current upstreamed implementation.

Co-authored-by: Cursor <cursoragent@cursor.com>
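Next-checkpoint tracking replaces a per-iteration division with a comparison. The class below is a hypothetical sketch (its name and rollback interface are assumptions) showing how the next checkpoint can be recomputed when a batch is rolled back to a completed-iteration base.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical scheduler; `interval` must be > 0.
class NextCheckpointScheduler {
public:
    explicit NextCheckpointScheduler(std::uint64_t interval)
        : interval_(interval), next_(interval) {}

    // Called once per squaring with a 1-based iteration index. The hot path is
    // a single comparison instead of `iteration % interval_`.
    void OnIteration(std::uint64_t iteration) {
        if (iteration == next_) {
            hits_.push_back(iteration);
            next_ += interval_;
        }
    }

    // Roll back to `base` completed iterations (e.g. before a batch replay):
    // drop checkpoints past `base` and recompute the next one strictly above it.
    void RollbackTo(std::uint64_t base) {
        while (!hits_.empty() && hits_.back() > base) hits_.pop_back();
        next_ = (base / interval_ + 1) * interval_;
    }

    const std::vector<std::uint64_t>& hits() const { return hits_; }

private:
    std::uint64_t interval_;
    std::uint64_t next_;
    std::vector<std::uint64_t> hits_;
};
```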
Lease fast counter slots with per-slot in-use tracking so long-lived processes can recycle released slots safely, and restore the one-weso proof diagnostic behind quiet_mode to keep client logging behavior consistent.

Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit f3c73bf)
(cherry picked from commit 750df66)
(cherry picked from commit 445cb0d)
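Slot leasing with per-slot in-use tracking can be sketched with one atomic flag per slot and `compare_exchange_strong` to claim a free slot. `SlotLeases` is a hypothetical type. The real leasing code, and what it does when every slot is taken, may differ.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

template <std::size_t N>
class SlotLeases {
public:
    // Claim the first free slot, or return nullopt if all N are leased.
    std::optional<std::size_t> acquire() {
        for (std::size_t i = 0; i < N; ++i) {
            bool expected = false;
            // Atomically flip false -> true; fails if another thread holds it.
            if (in_use_[i].compare_exchange_strong(expected, true,
                                                   std::memory_order_acquire))
                return i;
        }
        return std::nullopt;
    }

    // Return a slot so a later computation in a long-lived process can reuse it.
    void release(std::size_t i) { in_use_[i].store(false, std::memory_order_release); }

private:
    std::array<std::atomic<bool>, N> in_use_{};  // all flags start false
};
```

A failed lease would need some fallback, such as sharing a slot; the sketch simply returns `std::nullopt`.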
hoffmang9 (Member, Author)

@Ealrann Assuming these two PRs are merged, is there anything else you need to be able to abandon your chiavdf fork and just use a release of this repo?

hoffmang9 marked this pull request as ready for review May 13, 2026 03:59
Ensure batch proving joins pending finalizer work and frees allocated output arrays on exceptions so stack-referenced state cannot outlive the call frame. Also remove an unused internal batch-free helper that duplicated the public C API.

Co-authored-by: Cursor <cursoragent@cursor.com>
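One way to guarantee that pending finalizer threads are joined on every exit path is an RAII guard. In this sketch `JoinAll` and `prove_batch_sketch` are hypothetical, and squaring `i` stands in for per-job finalization. The point is that the guard joins the workers during stack unwinding, before the catch block frees the output array.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <thread>
#include <vector>

// Joins every worker on scope exit, including during stack unwinding.
class JoinAll {
public:
    explicit JoinAll(std::vector<std::thread>& threads) : threads_(threads) {}
    ~JoinAll() {
        for (auto& t : threads_)
            if (t.joinable()) t.join();
    }
    JoinAll(const JoinAll&) = delete;
    JoinAll& operator=(const JoinAll&) = delete;

private:
    std::vector<std::thread>& threads_;
};

// Returns a malloc'ed result array, or nullptr on failure (nothing leaks).
int* prove_batch_sketch(std::size_t n, bool fail_midway) {
    int* out = static_cast<int*>(std::calloc(n, sizeof(int)));
    if (!out) return nullptr;
    try {
        std::vector<std::thread> workers;
        JoinAll guard(workers);  // destroyed before `workers`
        for (std::size_t i = 0; i < n; ++i) {
            if (fail_midway && i == n / 2) throw std::runtime_error("squaring failed");
            workers.emplace_back([out, i] { out[i] = static_cast<int>(i * i); });
        }
        return out;  // guard joins all workers before the caller sees `out`
    } catch (...) {
        std::free(out);  // safe: workers were joined during unwinding
        return nullptr;
    }
}
```

Because `guard` is declared after `workers`, it is destroyed first, so every thread is joined before the `std::vector<std::thread>` destructor runs; destroying a still-joinable `std::thread` would call `std::terminate`.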
Handle replay notifications in streaming Wesolowski callbacks by rejecting replayed batches instead of reusing irreversibly accumulated bucket state. This prevents silent incorrect proofs when the fast squaring path replays a corrupted batch.

Co-authored-by: Cursor <cursoragent@cursor.com>
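The replay rejection can be illustrated with a consumer whose accumulated state cannot be rewound. In this hypothetical sketch a running sum stands in for the irreversible bucket updates, and `StreamingAccumulator` is not a chiavdf type.

```cpp
#include <cstdint>
#include <optional>

class StreamingAccumulator {
public:
    // Each squaring result is folded in immediately and cannot be undone.
    void OnIteration(std::uint64_t value) {
        if (!replayed_) acc_ += value;
    }

    // A replayed batch would fold the same iterations in twice, so mark the
    // accumulated state as unusable instead of continuing.
    void OnBatchReplay() { replayed_ = true; }

    // No proof is produced from state that saw a replay; the caller must
    // treat this as an error rather than emit a silently wrong proof.
    std::optional<std::uint64_t> Finalize() const {
        if (replayed_) return std::nullopt;
        return acc_;
    }

private:
    std::uint64_t acc_ = 0;
    bool replayed_ = false;
};
```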
Keep discriminant and stop-flag state alive for catch-path finalizer joins so callback references never dangle during exception unwinding. This preserves safe cleanup when batch proving fails mid-flight.

Co-authored-by: Cursor <cursoragent@cursor.com>
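The lifetime rule behind this fix is that automatic objects declared inside a `try` block are destroyed before its `catch` handler runs. The hypothetical sketch below therefore declares the shared state (a stand-in `discriminant`, the `stop` flag, and the worker vector) before the `try`, so the catch-path join never waits on threads that reference destroyed objects.

```cpp
#include <atomic>
#include <stdexcept>
#include <thread>
#include <vector>

// Returns true on success, false if the simulated batch failed.
bool run_with_catch_path_join(bool fail) {
    // Declared before `try`, so still alive inside the catch handler.
    long discriminant = -23;            // hypothetical stand-in for the real discriminant
    std::atomic<bool> stop{false};
    std::vector<std::thread> workers;   // outside `try`: no joinable thread is destroyed early
    auto join_all = [&workers] {
        for (auto& t : workers)
            if (t.joinable()) t.join();
    };

    try {
        for (int i = 0; i < 2; ++i) {
            workers.emplace_back([&discriminant, &stop] {
                // Keep reading shared state until told to stop.
                while (!stop.load()) {
                    volatile long d = discriminant;
                    (void)d;
                    std::this_thread::yield();
                }
            });
        }
        if (fail) throw std::runtime_error("simulated batch failure");
        stop = true;
        join_all();
        return true;
    } catch (...) {
        stop = true;   // `stop` and `discriminant` have not been destroyed yet
        join_all();
        return false;
    }
}
```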
Add a reducer-aware streaming proof finalization method used by batch worker threads and remove stale unused bucket replay members. This keeps batch finalization functional while cleaning dead scaffolding flagged by review.

Co-authored-by: Cursor <cursoragent@cursor.com>
cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 88602ba.

Drop the unused batch progress trampoline scaffolding in fast_wrapper to reduce maintenance noise and keep the batch callback flow aligned with current direct progress handling.

Co-authored-by: Cursor <cursoragent@cursor.com>