Add streaming one-wesolowski compaction APIs #325

Open
hoffmang9 wants to merge 13 commits into Chia-Network:main from hoffmang9:pr1-streaming-prover-upstream

@hoffmang9
Member

hoffmang9 commented Feb 24, 2026

Summary

  • add a new c_bindings/fast_wrapper API surface for streaming one-wesolowski proof generation when y_ref is known up front
  • include incremental GetBlock and memory-budgeted (k, l) tuning to improve compaction-worker throughput and memory behavior
  • add minimal embedding/build support (vdf_fast_pairindex, quiet_mode, fastlib target, PIC/PIE options) and document the path in docs/bluebox_compaction.md
  • fix merge blockers from this review cycle: guard fast counter-slot configuration against zero and export Homebrew cmake into PATH for macOS wheel/c-library workflows

Scope and follow-up plan

  • This PR is intentionally scoped to PR1 (streaming prover + supporting infra).
  • enable_threads remains unchanged (true) upstream; downstream can force single-thread proving via repeated_square_fast_single_thread as discussed with @Ealrann.
  • @Ealrann clarified that Trick 2 is already merged on bbr (f3c73bf046e155aad2a8a9417496550b1e2d8cd5) and highlighted the important RSS/memory-leak fix (445cb0d54fbd43e479e9479c2605bbd3d0c2ddd5).
  • Planned follow-up PR will upstream discriminant-reuse batch proving (Trick 2) and include the memory-leak/RSS stabilization work.

Test plan

  • Verify commit signature on branch tip (git log -1 --show-signature)
  • Confirm branch diff against origin/main only includes intended PR1 files
  • Address unresolved review thread(s)
  • Build/test on target platforms (Linux x86_64, macOS Intel, macOS ARM, Windows)

Made with Cursor


Note

Medium Risk
Adds a new C API and a substantial new proving path that hooks into repeated_square, plus changes fast-path counter-slot selection and output suppression; regressions could affect VDF correctness/performance in embedded or multi-threaded use.

Overview
Adds a new src/c_bindings/fast_wrapper C API for fast one-Wesolowski proving, including a streaming variant that requires known y_ref and supports progress callbacks, batch jobs, optional incremental GetBlock mapping, and memory-budgeted (k,l) tuning with per-thread debug stats.

Updates the VDF engine to better support embedding/multi-worker execution by introducing quiet_mode (to suppress stdout diagnostics), adding WesolowskiCallback batch lifecycle hooks (OnBatchStart/OnBatchReplay) for fast-path fallback handling, and allocating per-thread vdf_fast counter slots via vdf_fast_pairindex() with a configurable CHIA_VDF_FAST_COUNTER_SLOTS guard.

Build/CI support is extended by adding a fastlib static library target (libchiavdf_fastc.a), optional PIC/PIE flags in Makefile.vdf-client, improved clean rules/asm compilation, macOS workflows that ensure Homebrew cmake is installed/on PATH, and new documentation in docs/bluebox_compaction.md.

Reviewed by Cursor Bugbot for commit 91a2af9. Bugbot is set up for automated code reviews on this repo.

Introduce a fast C wrapper with streaming proof generation, incremental GetBlock optimization, and memory-budgeted (k,l) tuning, plus the minimal runtime/build infrastructure needed to embed chiavdf in multi-worker clients.

Co-authored-by: Cursor <cursoragent@cursor.com>
@hoffmang9
Member Author

hoffmang9 commented Feb 24, 2026

@Ealrann I'm hoping to cleanly upstream your changes so you can rely on chiavdf directly and not a fork. See the plan here:
https://gist.github.com/hoffmang9/6a848ff22f7cd29f4b6507600b099a5c

Guard the fast pairindex slot selection behind the existing x86/asm feature checks and return slot 0 on non-x86 targets, where threading counters are not compiled.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread src/c_bindings/fast_wrapper.cpp
Comment thread src/vdf.h Outdated
hoffmang9 and others added 2 commits February 23, 2026 23:57
Install cmake via Homebrew and export its bin path in the C libraries and wheel workflows so self-hosted macOS jobs don't fail when cmake is missing from PATH.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ocation.

Track and roll back per-batch checkpoints when replaying a failed fast batch, and switch pairindex slot allocation to unsigned atomics to avoid negative modulo indexing after counter wraparound.

Co-authored-by: Cursor <cursoragent@cursor.com>
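
To illustrate the wraparound concern in the commit above, here is a minimal sketch and not the PR's actual code. The names `kSlots`, `slot_from_signed`, `slot_from_unsigned`, and `next_slot` are hypothetical, and the slot count of 100 is only an assumed value. In C++, `%` on a negative signed operand yields a negative result, while unsigned arithmetic wraps modulo 2^32 and always stays in range.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical slot count, chosen for illustration only.
constexpr std::uint32_t kSlots = 100;

// Signed variant: once the ticket has wrapped to a negative value,
// the remainder is negative and would index before the array.
inline std::int32_t slot_from_signed(std::int32_t ticket) {
    return ticket % static_cast<std::int32_t>(kSlots);
}

// Unsigned variant: the result is always in [0, kSlots).
inline std::uint32_t slot_from_unsigned(std::uint32_t ticket) {
    return ticket % kSlots;
}

// Illustrative allocator: fetch_add on an unsigned atomic wraps
// modulo 2^32, so the ticket never becomes negative.
inline std::uint32_t next_slot(std::atomic<std::uint32_t>& counter) {
    return slot_from_unsigned(counter.fetch_add(1, std::memory_order_relaxed));
}
```

With these definitions, `slot_from_signed(-7)` returns `-7`, whereas `slot_from_unsigned` can never leave the valid range regardless of how many tickets have been handed out.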
Comment thread src/vdf.h
Document that batch bounds use completed-iteration base values while OnIteration is normalized to 1-based indices to avoid ambiguity in replay tracking.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread src/vdf.h
Comment thread src/c_bindings/fast_wrapper.cpp Outdated
hoffmang9 and others added 4 commits February 24, 2026 00:48
Expose missing batch C bindings and debug visibility so downstream Rust tests can validate tuner behavior end-to-end.

Co-authored-by: Cursor <cursoragent@cursor.com>
Default CHIA_VDF_FAST_COUNTER_SLOTS to 100 in threading.h so upstream builds keep lower BSS usage while allowing embedded deployments to override via compiler defines.

Co-authored-by: Cursor <cursoragent@cursor.com>
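
The overridable default described in the commit above could look roughly like the following sketch. Only the macro name `CHIA_VDF_FAST_COUNTER_SLOTS` and the default of 100 come from this PR; `FastCounter`, `kFastCounterSlots`, and `g_fast_counters` are hypothetical stand-ins, and the real layout in threading.h may differ.

```cpp
#include <cassert>
#include <cstddef>

// Keep the small default unless the build overrides it, for example
// with -DCHIA_VDF_FAST_COUNTER_SLOTS=512 on the compiler command line.
#ifndef CHIA_VDF_FAST_COUNTER_SLOTS
#define CHIA_VDF_FAST_COUNTER_SLOTS 100
#endif

constexpr std::size_t kFastCounterSlots = CHIA_VDF_FAST_COUNTER_SLOTS;

// Hypothetical per-slot counter record.
struct FastCounter {
    unsigned long long value;
};

// Zero-initialized static storage is typically placed in BSS, so the
// array length directly determines how much BSS the slots consume.
static FastCounter g_fast_counters[kFastCounterSlots];
```

An embedded deployment that runs many workers would raise the value through its build flags without patching the header.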
Use one program-wide atomic slot allocator for `vdf_fast_pairindex()` so concurrent VDF computations started from different translation units cannot collide on shared fast counter slots.

Co-authored-by: Cursor <cursoragent@cursor.com>
Reject k>=64 before any 64-bit left-shift and reuse validated bucket spans for allocation, indexing, and finalization loops so invalid parameter tuning cannot trigger undefined behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
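
A minimal sketch of the shift guard from the commit above, assuming the bucket span is computed as `1 << k` on a 64-bit value. The helper name `checked_bucket_span` is hypothetical and not taken from the PR. Shifting a 64-bit integer by 64 or more is undefined behavior in C++, so the check has to come before the shift.

```cpp
#include <cassert>
#include <cstdint>

// Computes 2^k into *span and returns true, or returns false without
// shifting when k would make the 64-bit left shift undefined.
inline bool checked_bucket_span(std::uint32_t k, std::uint64_t* span) {
    if (span == nullptr || k >= 64) {
        return false;  // reject before any shift is evaluated
    }
    *span = std::uint64_t{1} << k;
    return true;
}
```

Callers would compute the span once through such a helper and reuse the validated value for allocation, indexing, and finalization, instead of re-deriving it from `k` in each loop.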
Comment thread src/vdf.h
@Ealrann

Ealrann commented Feb 24, 2026

Oh, yes the plan is perfect. I'll definitely update the wesoforge client to use the main chiavdf.
It's ok if you keep enable_threads=true indeed, I'll use repeated_square_fast_single_thread instead.

Note: the plan mentions "Trick 2 (unreleased, on side branch)", but in fact Trick 2 was properly merged on the bbr branch (it's the commit Ealrann@f3c73bf). This one is important because it allows the group optimisation.
Commit Ealrann@445cb0d (also on the bbr branch) is very important too, because it fixes a memory leak that was problematic on high-core-count servers.

@github-actions

'This PR has been flagged as stale due to no activity for over 60
days. It will not be automatically closed, but it has been given
a stale-pr label and should be manually reviewed.'

Add compile-time guards that reject zero fast-counter slot configurations before modulo indexing, and export Homebrew's cmake path in macOS workflows so cmake is available within the same step on Intel runners.

Co-authored-by: Cursor <cursoragent@cursor.com>
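
One way to express such a compile-time guard is sketched below. It is an illustration under assumptions, not the code in this PR: the template `slot_index` is a hypothetical helper, and the PR may apply the check directly to `CHIA_VDF_FAST_COUNTER_SLOTS` instead.

```cpp
#include <cassert>
#include <cstdint>

// Maps a ticket to a slot. A zero slot count fails to compile, so the
// modulo below can never divide by zero at run time.
template <std::uint32_t Slots>
constexpr std::uint32_t slot_index(std::uint32_t ticket) {
    static_assert(Slots > 0, "fast counter slot count must be non-zero");
    return ticket % Slots;
}
```

For example, `slot_index<100>(250)` evaluates to `50`, while instantiating `slot_index<0>` is rejected by the compiler with the message above.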
@hoffmang9
Member Author

Pushed follow-up commit 0c11002 to address remaining blockers.

  • fixed unresolved Bugbot thread by hard-guarding fast counter-slot count (CHIA_VDF_FAST_COUNTER_SLOTS > 0) so % kSlots cannot be undefined
  • fixed macOS workflow fragility by exporting Homebrew cmake into PATH in both wheel and C-library workflows before cmake --version
  • updated PR description scope to match @Ealrann’s note: Trick 2 is already on bbr (f3c73bf046e155aad2a8a9417496550b1e2d8cd5) and memory-leak/RSS fix (445cb0d54fbd43e479e9479c2605bbd3d0c2ddd5) is tracked for follow-up upstreaming

I’ll keep watching CI and triage anything else that pops up.

Comment thread pr1_upstream_ready.patch Outdated
Comment thread src/c_bindings/fast_wrapper.cpp Outdated
hoffmang9 and others added 2 commits May 12, 2026 19:37
Drop the root-level development patch file that diverged from the live implementation, and adjust the streaming tuner cost model so bucket-update work scales with checkpoint count and `l` instead of only `k`.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace per-iteration modulo checks with next-checkpoint tracking in the streaming callback, and integrate the scheduling update with batch replay boundaries so rollback/replay semantics remain correct in the current upstreamed implementation.

Co-authored-by: Cursor <cursoragent@cursor.com>
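
The idea of next-checkpoint tracking can be sketched as follows. This is a simplified illustration with assumed semantics: `CheckpointTracker`, `on_iteration`, and `rollback_to` are hypothetical names, iterations are taken to be 1-based, and a rollback is taken to receive the count of completed iterations at the batch boundary.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct CheckpointTracker {
    std::uint64_t interval;               // checkpoint spacing, must be > 0
    std::uint64_t next_checkpoint;        // next 1-based iteration to record
    std::vector<std::uint64_t> recorded;  // iterations recorded so far

    explicit CheckpointTracker(std::uint64_t step)
        : interval(step), next_checkpoint(step) {
        assert(step > 0);
    }

    // Hot path: a single comparison per iteration instead of a modulo.
    void on_iteration(std::uint64_t iteration) {
        if (iteration != next_checkpoint) {
            return;
        }
        recorded.push_back(iteration);
        next_checkpoint += interval;
    }

    // Replay support: drop checkpoints recorded after the batch base and
    // re-arm the tracker at the first checkpoint past that base.
    void rollback_to(std::uint64_t completed_iterations) {
        while (!recorded.empty() && recorded.back() > completed_iterations) {
            recorded.pop_back();
        }
        next_checkpoint = (completed_iterations / interval + 1) * interval;
    }
};
```

The division only runs on a rollback, which is rare compared with the per-iteration callback, so the common path stays free of modulo work while replayed batches still hit the same checkpoints.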

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 61e9280.

Comment thread src/vdf.h
Comment thread src/vdf.h Outdated
Lease fast counter slots with per-slot in-use tracking so long-lived processes can recycle released slots safely, and restore the one-weso proof diagnostic behind quiet_mode to keep client logging behavior consistent.

Co-authored-by: Cursor <cursoragent@cursor.com>
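
Slot leasing with per-slot in-use flags can be sketched as below. This is a minimal illustration, not the implementation in this PR: `SlotPool`, `acquire`, `release`, and the pool size of 4 are all hypothetical, and the real code may signal exhaustion differently.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLeaseSlots = 4;  // hypothetical size, small for illustration

class SlotPool {
public:
    // Claims the lowest free slot, or returns -1 when every slot is leased.
    int acquire() {
        for (std::size_t i = 0; i < kLeaseSlots; ++i) {
            bool expected = false;
            // Only one thread can flip a given flag from false to true.
            if (in_use_[i].compare_exchange_strong(expected, true,
                                                   std::memory_order_acq_rel)) {
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Returns a previously acquired slot to the pool for reuse.
    void release(int slot) {
        assert(slot >= 0 && static_cast<std::size_t>(slot) < kLeaseSlots);
        in_use_[static_cast<std::size_t>(slot)].store(false,
                                                      std::memory_order_release);
    }

private:
    std::array<std::atomic<bool>, kLeaseSlots> in_use_{};  // all start free
};
```

Unlike a monotonically increasing counter, a released index becomes available again, so a long-lived process that starts and finishes many computations does not hand the same slot to two live computations.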