Implement ROADMAP.md Phase 4: all 5 deferred features by gvonness-apolitical · Pull Request #36 · Entrolution/echidna

gvonness-apolitical · 2026-03-14T21:23:30Z

Summary

4.1 Indefinite dense STDE: dense_stde_2nd_indefinite() — eigendecomposition + sign-splitting for arbitrary symmetric C matrices, with epsilon clamping and single-pass optimization for same-sign eigenvalues (requires stde + nalgebra)
4.5 GpuBackend trait method: taylor_forward_2nd_batch lifted into GpuBackend trait; all stde_gpu functions now generic over B: GpuBackend; old backend-specific functions deprecated
4.4 Generic laplacian_with_control_gpu: works with any GpuBackend, enabling CUDA variance-reduced Laplacian
4.3 Chunked GPU Taylor dispatch: taylor_forward_2nd_batch_chunked — splits large batches by buffer size (128 MiB default) and WebGPU workgroup dispatch limits (65535×256)
4.2 General-K GPU Taylor kernels: runtime codegen (taylor_codegen.rs) generates K-specialized WGSL/CUDA shaders for K=1..5 with fully unrolled Cauchy products and recurrences for all 43 opcodes; taylor_forward_kth_batch returns TaylorKthBatchResult
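To illustrate the sign-splitting idea behind item 4.1: an indefinite symmetric C can be written as C = C+ - C-, where each part is positive semidefinite and built from the eigenpairs of one sign, so that tr(C H) becomes a difference of two sums of quadratic forms. The sketch below is not the crate's code. It assumes a 2x2 matrix with a closed-form eigendecomposition (the PR uses nalgebra), an explicit Hessian H in place of Taylor-mode evaluation, and hypothetical names `sym_eig_2x2`, `quad_form` and `split_trace`.

```rust
/// Eigenpairs of the symmetric 2x2 matrix [[a, b], [b, d]] in closed form.
/// (Illustrative stand-in for a general symmetric eigensolver.)
fn sym_eig_2x2(a: f64, b: f64, d: f64) -> [(f64, [f64; 2]); 2] {
    let mean = 0.5 * (a + d);
    let diff = 0.5 * (a - d);
    let r = (diff * diff + b * b).sqrt();
    let (l1, l2) = (mean + r, mean - r);
    // Eigenvector for l1: (l1 - d, b) when b != 0, otherwise axis-aligned.
    let v1 = if b != 0.0 {
        let (x, y) = (l1 - d, b);
        let n = (x * x + y * y).sqrt();
        [x / n, y / n]
    } else if a >= d {
        [1.0, 0.0]
    } else {
        [0.0, 1.0]
    };
    // The second eigenvector is orthogonal to the first.
    let v2 = [-v1[1], v1[0]];
    [(l1, v1), (l2, v2)]
}

/// Quadratic form v^T H v for a symmetric 2x2 H.
fn quad_form(h: [[f64; 2]; 2], v: [f64; 2]) -> f64 {
    v[0] * (h[0][0] * v[0] + h[0][1] * v[1]) + v[1] * (h[1][0] * v[0] + h[1][1] * v[1])
}

/// Returns (pos, neg) such that pos - neg = tr(C H), where C = [[a, b], [b, d]]
/// is passed as (a, b, d). Eigenvalues with |lambda| <= eps are dropped.
fn split_trace(c: (f64, f64, f64), h: [[f64; 2]; 2], eps: f64) -> (f64, f64) {
    let (mut pos, mut neg) = (0.0, 0.0);
    for (lam, v) in sym_eig_2x2(c.0, c.1, c.2) {
        if lam.abs() <= eps {
            continue; // epsilon clamping: skip near-zero eigenvalues
        }
        // Scale the eigenvector by sqrt(|lambda|) so the direction carries the weight.
        let s = lam.abs().sqrt();
        let q = quad_form(h, [s * v[0], s * v[1]]);
        if lam > 0.0 {
            pos += q;
        } else {
            neg += q;
        }
    }
    (pos, neg)
}
```

For C = [[1, 2], [2, -3]] and H = [[2, 0.5], [0.5, 4]], `split_trace` returns a pair whose difference is tr(C H) = -8 up to rounding. When all eigenvalues share a sign, one of the two sums is empty, which is the case the PR describes as handled in a single pass.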

Test plan

6 indefinite STDE tests (PD cross-validation, diagonal indefinite, full indefinite, all-negative, zero matrix, near-zero eigenvalue clamping)
19 existing GPU STDE tests pass (trait method backward compat)
4 chunked dispatch tests (single-chunk, multi-chunk with c1s, exact boundary, zero batch)
5 general-K Taylor tests (polynomial all orders K=1..5, K=3 cross-validation, exp higher-order, multi-batch deinterleaving, unsupported K error)
CUDA tested on vast.ai A100 (CUDA 12.8) — compilation, trait impl, doctest all pass
Full test suite: cargo test --features "bytecode,taylor,laurent,stde,serde,faer,nalgebra,ndarray,parallel,diffop,gpu-wgpu" — all pass
cargo clippy -- -D warnings clean
cargo fmt --check clean
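The chunked dispatch cases in the test plan (single-chunk, multi-chunk, exact boundary, zero batch) can be pictured with a small chunk-planning sketch. This is only an assumption about the general shape of such logic, not the body of taylor_forward_2nd_batch_chunked. The constants mirror the limits quoted in the summary, while `max_chunk_len`, `plan_chunks` and the `bytes_per_item` parameter are hypothetical.

```rust
use std::ops::Range;

/// Limits quoted in the PR summary.
const DEFAULT_MAX_BUFFER_BYTES: u64 = 128 * 1024 * 1024; // 128 MiB
const MAX_WORKGROUPS: u64 = 65_535;
const WORKGROUP_SIZE: u64 = 256;

/// Largest number of batch items that fits both the buffer budget and the
/// dispatch limit. `bytes_per_item` is a hypothetical per-item buffer cost.
fn max_chunk_len(bytes_per_item: u64, max_buffer_bytes: u64) -> usize {
    let by_buffer = max_buffer_bytes / bytes_per_item.max(1);
    let by_dispatch = MAX_WORKGROUPS * WORKGROUP_SIZE;
    // Always allow at least one item per chunk so planning makes progress.
    by_buffer.min(by_dispatch).max(1) as usize
}

/// Splits `0..batch` into consecutive ranges of at most `chunk_len` items.
/// An empty batch yields no chunks.
fn plan_chunks(batch: usize, chunk_len: usize) -> Vec<Range<usize>> {
    assert!(chunk_len > 0, "chunk_len must be positive");
    (0..batch)
        .step_by(chunk_len)
        .map(|start| start..(start + chunk_len).min(batch))
        .collect()
}
```

With a chunk length of 4, a batch of 8 splits into exactly two full chunks, a batch of 9 adds a final chunk of one item, and a batch of 0 produces no dispatches at all.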

Phase 4 items, previously deferred for lack of concrete use cases:

4.1 Indefinite dense STDE: dense_stde_2nd_indefinite() with eigendecomposition, epsilon-clamped sign-splitting, and optimized single-pass for same-sign eigenvalues. Requires stde+nalgebra.
4.5 GpuBackend trait method: taylor_forward_2nd_batch lifted from inherent methods into the trait. All stde_gpu functions (laplacian_gpu, hessian_diagonal_gpu, laplacian_with_control_gpu) are now generic over B: GpuBackend. Old backend-specific functions deprecated.
4.4 Generic laplacian_with_control_gpu: works with any GpuBackend, enabling CUDA variance-reduced Laplacian without separate functions.
4.3 Chunked GPU Taylor dispatch: taylor_forward_2nd_batch_chunked splits large batches by buffer size limits (128 MiB default) and WebGPU workgroup dispatch limits (65535×256).
4.2 General-K GPU Taylor kernels: runtime codegen (taylor_codegen.rs) generates K-specialized WGSL and CUDA shaders for K=1..5 with fully unrolled Cauchy products and recurrences across all 43 opcodes. WgpuContext compiles K=1..5 pipelines at init; taylor_forward_kth_batch returns TaylorKthBatchResult with K coefficient vectors.
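As background for the Cauchy products and recurrences mentioned under 4.2, here is a CPU sketch of truncated Taylor arithmetic on normalized coefficients (the k-th entry is the k-th derivative divided by k!). It uses plain loops over a const-generic length N, whereas the PR describes shaders with these loops fully unrolled per K. The function names `taylor_mul` and `taylor_exp` are hypothetical and do not come from taylor_codegen.rs.

```rust
/// Truncated Cauchy product: c[k] = sum_{j=0..=k} a[j] * b[k - j].
fn taylor_mul<const N: usize>(a: [f64; N], b: [f64; N]) -> [f64; N] {
    let mut c = [0.0; N];
    for k in 0..N {
        for j in 0..=k {
            c[k] += a[j] * b[k - j];
        }
    }
    c
}

/// Taylor coefficients of exp(a(t)) via the standard recurrence
/// e[0] = exp(a[0]),  e[k] = (1/k) * sum_{j=1..=k} j * a[j] * e[k - j].
fn taylor_exp<const N: usize>(a: [f64; N]) -> [f64; N] {
    let mut e = [0.0; N];
    if N == 0 {
        return e;
    }
    e[0] = a[0].exp();
    for k in 1..N {
        let mut acc = 0.0;
        for j in 1..=k {
            acc += (j as f64) * a[j] * e[k - j];
        }
        e[k] = acc / (k as f64);
    }
    e
}
```

Squaring the series 1 + t with `taylor_mul` gives the coefficients of 1 + 2t + t², and applying `taylor_exp` to the series t reproduces 1, 1, 1/2, 1/6, the leading coefficients of exp(t).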

gvonness-apolitical merged commit 32a8dd7 into main Mar 14, 2026
6 checks passed

gvonness-apolitical deleted the roadmap/phase-4-deferred-features branch March 14, 2026 21:24

gvonness-apolitical merged 1 commit into main from roadmap/phase-4-deferred-features
