Implement ROADMAP.md Phase 4: all 5 deferred features #36

Merged
gvonness-apolitical merged 1 commit into main from roadmap/phase-4-deferred-features on Mar 14, 2026

@gvonness-apolitical
Contributor

Summary

  • 4.1 Indefinite dense STDE: dense_stde_2nd_indefinite() — eigendecomposition + sign-splitting for arbitrary symmetric C matrices, with epsilon clamping and single-pass optimization for same-sign eigenvalues (requires stde + nalgebra)
  • 4.5 GpuBackend trait method: taylor_forward_2nd_batch lifted into GpuBackend trait; all stde_gpu functions now generic over B: GpuBackend; old backend-specific functions deprecated
  • 4.4 Generic laplacian_with_control_gpu: works with any GpuBackend, enabling CUDA variance-reduced Laplacian
  • 4.3 Chunked GPU Taylor dispatch: taylor_forward_2nd_batch_chunked — splits large batches by buffer size (128 MiB default) and WebGPU workgroup dispatch limits (65535×256)
  • 4.2 General-K GPU Taylor kernels: runtime codegen (taylor_codegen.rs) generates K-specialized WGSL/CUDA shaders for K=1..5 with fully unrolled Cauchy products and recurrences for all 43 opcodes; taylor_forward_kth_batch returns TaylorKthBatchResult

Test plan

  • 6 indefinite STDE tests (PD cross-validation, diagonal indefinite, full indefinite, all-negative, zero matrix, near-zero eigenvalue clamping)
  • 19 existing GPU STDE tests pass (trait method backward compat)
  • 4 chunked dispatch tests (single-chunk, multi-chunk with c1s, exact boundary, zero batch)
  • 5 general-K Taylor tests (polynomial all orders K=1..5, K=3 cross-validation, exp higher-order, multi-batch deinterleaving, unsupported K error)
  • CUDA tested on vast.ai A100 (CUDA 12.8) — compilation, trait impl, doctest all pass
  • Full test suite: cargo test --features "bytecode,taylor,laurent,stde,serde,faer,nalgebra,ndarray,parallel,diffop,gpu-wgpu" — all pass
  • cargo clippy -- -D warnings clean
  • cargo fmt --check clean

Phase 4 items, previously deferred for lack of concrete use cases:

4.1 — Indefinite dense STDE: dense_stde_2nd_indefinite() with
eigendecomposition, epsilon-clamped sign-splitting, and optimized
single-pass for same-sign eigenvalues. Requires stde+nalgebra.
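
The sign-splitting idea in 4.1 can be sketched as follows. This is a minimal CPU illustration, not the code in this PR: it assumes the eigenpairs of the symmetric matrix C are already available (for example from an eigensolver), and the names `SignedDirection`, `sign_split`, `quad_form`, and `contract` are hypothetical. Each eigenpair with |λ| > eps becomes a direction a = sqrt(|λ|)·v tagged with the sign of λ, so that tr(C·H) = Σ sign·aᵀHa. When all surviving signs agree, the sum needs only one pass with a single global sign.

```rust
/// One scaled direction a = sqrt(|lambda|) * v, tagged with the sign of lambda.
/// (Hypothetical type for illustration only.)
struct SignedDirection {
    sign: f64,
    dir: Vec<f64>,
}

/// Split eigenpairs of a symmetric matrix C into signed, scaled directions.
/// Eigenvalues with |lambda| <= eps are treated as zero and dropped.
fn sign_split(eigenvalues: &[f64], eigenvectors: &[Vec<f64>], eps: f64) -> Vec<SignedDirection> {
    eigenvalues
        .iter()
        .zip(eigenvectors)
        .filter(|(lambda, _)| lambda.abs() > eps)
        .map(|(lambda, v)| {
            let scale = lambda.abs().sqrt();
            SignedDirection {
                sign: lambda.signum(),
                dir: v.iter().map(|x| scale * x).collect(),
            }
        })
        .collect()
}

/// Quadratic form a^T H a, standing in for a second-order directional derivative.
fn quad_form(h: &[Vec<f64>], a: &[f64]) -> f64 {
    let n = a.len();
    let mut acc = 0.0;
    for i in 0..n {
        for j in 0..n {
            acc += a[i] * h[i][j] * a[j];
        }
    }
    acc
}

/// sum_i sign_i * a_i^T H a_i, which equals tr(C H) for the C that was split.
fn contract(h: &[Vec<f64>], dirs: &[SignedDirection]) -> f64 {
    dirs.iter().map(|d| d.sign * quad_form(h, &d.dir)).sum()
}
```

For C = [[0, 1], [1, 0]] (eigenvalues +1 and -1) and H = [[2, 3], [3, 5]], the positive part contributes 6.5 and the negative part 0.5, giving tr(C·H) = 6.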

4.5 — GpuBackend trait method: taylor_forward_2nd_batch lifted from
inherent methods into the trait. All stde_gpu functions (laplacian_gpu,
hessian_diagonal_gpu, laplacian_with_control_gpu) are now generic over
B: GpuBackend. Old backend-specific functions deprecated.
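
The shape of the 4.5 refactor can be illustrated with a toy example. Everything below is a simplified assumption rather than the crate's real API: the trait `Backend`, its method signature, and the CPU type `SumOfSquares` are stand-ins, and the Laplacian here uses exact coordinate directions instead of a stochastic estimator. The point is only that once the batch method lives on the trait, a helper is written once and works for any backend.

```rust
/// Hypothetical, simplified stand-in for a backend trait.
trait Backend {
    /// For each direction v, return the second-order Taylor coefficient
    /// c2 = 0.5 * v^T H(x) v of the backend's function at x.
    fn taylor_forward_2nd_batch(&self, x: &[f64], dirs: &[Vec<f64>]) -> Vec<f64>;
}

/// Toy CPU "backend" for f(x) = sum_i x_i^2, whose Hessian is 2 * I.
struct SumOfSquares;

impl Backend for SumOfSquares {
    fn taylor_forward_2nd_batch(&self, _x: &[f64], dirs: &[Vec<f64>]) -> Vec<f64> {
        // 0.5 * v^T (2 I) v = v . v
        dirs.iter()
            .map(|v| v.iter().map(|vi| vi * vi).sum::<f64>())
            .collect()
    }
}

/// Laplacian from the n coordinate directions, written once for any backend.
fn laplacian<B: Backend>(backend: &B, x: &[f64]) -> f64 {
    let n = x.len();
    let basis: Vec<Vec<f64>> = (0..n)
        .map(|i| (0..n).map(|j| if i == j { 1.0 } else { 0.0 }).collect())
        .collect();
    // Each c2 is half a Hessian diagonal entry, so the trace is 2 * sum(c2).
    2.0 * backend
        .taylor_forward_2nd_batch(x, &basis)
        .iter()
        .sum::<f64>()
}
```

A second backend would only need its own `impl Backend`; `laplacian` stays unchanged.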

4.4 — Generic laplacian_with_control_gpu: works with any GpuBackend,
enabling CUDA variance-reduced Laplacian without separate functions.

4.3 — Chunked GPU Taylor dispatch: taylor_forward_2nd_batch_chunked
splits large batches by buffer size limits (128 MiB default) and
WebGPU workgroup dispatch limits (65535×256).
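
A minimal sketch of how such chunk boundaries might be computed, assuming one batch item per GPU invocation and a fixed byte cost per item. The function `chunk_ranges` and both constants are hypothetical names chosen for illustration; only the 128 MiB and 65535×256 figures come from the description above.

```rust
use std::ops::Range;

/// 128 MiB default buffer budget (figure taken from the PR description).
const DEFAULT_MAX_BUFFER_BYTES: usize = 128 * 1024 * 1024;
/// 65535 workgroups x 256 invocations, assuming one batch item per invocation.
const MAX_ITEMS_PER_DISPATCH: usize = 65_535 * 256;

/// Split `batch` items into contiguous ranges that respect both limits.
/// Returns an empty list for an empty batch.
fn chunk_ranges(
    batch: usize,
    bytes_per_item: usize,
    max_buffer_bytes: usize,
    max_items_per_dispatch: usize,
) -> Vec<Range<usize>> {
    // Items that fit in one buffer; guard against a zero per-item size.
    let by_buffer = max_buffer_bytes / bytes_per_item.max(1);
    // Take the tighter of the two limits, but always make progress.
    let chunk = by_buffer.min(max_items_per_dispatch).max(1);
    (0..batch)
        .step_by(chunk)
        .map(|start| start..(start + chunk).min(batch))
        .collect()
}
```

Each returned range would then be dispatched separately and the per-chunk results concatenated in order.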

4.2 — General-K GPU Taylor kernels: runtime codegen (taylor_codegen.rs)
generates K-specialized WGSL and CUDA shaders for K=1..5 with fully
unrolled Cauchy products and recurrences across all 43 opcodes.
WgpuContext compiles K=1..5 pipelines at init; taylor_forward_kth_batch
returns TaylorKthBatchResult with K coefficient vectors.
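
For reference, the two building blocks named in 4.2 can be written as plain CPU loops. This is a sketch under assumptions, not the generated shader code: coefficients are taken to be normalized Taylor coefficients (f⁽ᵏ⁾/k!), the function names `cauchy_product` and `taylor_exp` are hypothetical, and the loops run at runtime where the generated kernels would presumably have them unrolled for a fixed K.

```rust
/// Truncated product of two Taylor series:
/// c_k = sum_{j=0..=k} a_j * b_{k-j}.
fn cauchy_product(a: &[f64], b: &[f64]) -> Vec<f64> {
    let n = a.len().min(b.len());
    (0..n)
        .map(|k| (0..=k).map(|j| a[j] * b[k - j]).sum::<f64>())
        .collect()
}

/// Coefficients of exp(a(t)) via the standard recurrence
/// e_0 = exp(a_0), e_k = (1/k) * sum_{j=1..=k} j * a_j * e_{k-j}.
fn taylor_exp(a: &[f64]) -> Vec<f64> {
    let mut e: Vec<f64> = Vec::with_capacity(a.len());
    for k in 0..a.len() {
        if k == 0 {
            e.push(a[0].exp());
        } else {
            let s: f64 = (1..=k).map(|j| (j as f64) * a[j] * e[k - j]).sum();
            e.push(s / (k as f64));
        }
    }
    e
}
```

As a check, squaring 1 + t with `cauchy_product` yields 1 + 2t + t², and `taylor_exp` applied to a(t) = t reproduces the coefficients 1, 1, 1/2, 1/6 of exp(t).
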
@gvonness-apolitical merged commit 32a8dd7 into main on Mar 14, 2026
6 checks passed
@gvonness-apolitical deleted the roadmap/phase-4-deferred-features branch on March 14, 2026 at 21:24