feat(kde): kernel functions with statistical properties and LOESS support #360
thisisamirv wants to merge 2 commits into statrs-dev:master
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #360      +/-   ##
==========================================
- Coverage   94.99%   94.73%   -0.27%
==========================================
  Files          61       59       -2
  Lines       13615    14004     +389
==========================================
+ Hits        12934    13266     +332
- Misses        681      738      +57
Hey, would you be willing to break this one out across a few PRs? One partition could be,
Hi @YeungOnion, sorry for going radio silent on this PR earlier. I got excited about exploring optimizations and ended up building a much more feature-complete and performant standalone LOWESS/LOESS implementation instead. I've since added parallelism, streaming/online modes, confidence intervals, better robustness, cross-validation, and no-std support, and validated it against R's implementations, which are highly optimized and built on Fortran/BLAS (https://github.com/thisisamirv/lowess.git and https://github.com/thisisamirv/loess-rs.git). The lowess crate depends only on num-traits and wide; the loess crate depends on num-traits, wide, and nalgebra.
Add kernel functions with statistical properties and LOESS support
This PR prepares the foundation for KDE and LOESS implementations.
Summary
Major enhancement to the kernel module adding:
- `consts.rs`
- `kernel.rs` with AMISE efficiency metrics, dual evaluation modes, and LOESS integration
Changes to `consts.rs`
- `∫ u² K(u) du` for 9 kernels
- `∫ K(u)² du` for 9 kernels
- `SQRT_PI` constant
Changes to `kernel.rs`
1. Statistical Correctness
- `evaluate()`: Normalized for KDE (integrates to 1)
- `evaluate_weight()`: Unnormalized for LOESS (local regression weights)
2. New Kernels & Renaming
- `Quartic` → `Bisquare` (it's a more standard name)
- `Cosine` kernel (high efficiency ≈ 0.9995)
- `Logistic` kernel (heavy-tailed unbounded)
- `Sigmoid` kernel (hyperbolic secant)
- `Sigmoid` formula: was `exp(πx)`, now correctly `exp(x)`
3. Enhanced API
- `KernelType` enum: Runtime kernel selection for LOESS and other applications
- `CustomKernel`: User-defined kernels with metadata
- `evaluate_batch()`, `compute_distance_weights()`
- `robust_reweights()`, `normalize_weights()`
- `recommended_for_kde()`, `recommended_for_loess()`, `most_efficient()`
4. Boundary Behavior Fix
- `|x| <= 1` to `|x| >= 1` for consistency
- `(-1, 1)` as mathematically correct
- `x = ±1`
5. Documentation & Testing
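The difference between the normalized and unnormalized evaluation modes, and the boundary rule at `|x| = 1`, can be sketched with the tricube kernel, whose normalizing constant 70/81 appears under Breaking Changes. This is only an illustration: `tricube_weight`, `tricube_density`, and `integrate` are hypothetical free functions, not the PR's `evaluate()` / `evaluate_weight()` methods.

```rust
/// Unnormalized tricube weight (1 - |u|^3)^3, the form typically used for
/// LOESS local-regression weights. Hypothetical helper, not the PR's API.
fn tricube_weight(u: f64) -> f64 {
    let a = u.abs();
    if a >= 1.0 {
        // Support is the open interval (-1, 1), so |u| = 1 yields 0.
        0.0
    } else {
        (1.0 - a.powi(3)).powi(3)
    }
}

/// Normalized tricube density (70/81)(1 - |u|^3)^3, which integrates to 1
/// and is therefore suitable for KDE.
fn tricube_density(u: f64) -> f64 {
    (70.0 / 81.0) * tricube_weight(u)
}

/// Midpoint-rule integral of `f` over [-1, 1]; a numerical sanity check only.
fn integrate(f: impl Fn(f64) -> f64, n: usize) -> f64 {
    let h = 2.0 / n as f64;
    (0..n).map(|i| f(-1.0 + (i as f64 + 0.5) * h)).sum::<f64>() * h
}
```

Integrating `tricube_density` numerically should give a value close to 1, while `tricube_weight` integrates to 81/70 and peaks at exactly 1 for `u = 0`, which is the convenient scale for regression weights.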
Breaking Changes
|x| = 1now returns 0 (was non-zero)(70/81)(1-|x|³)³(was unnormalized)Fixed
- `distribution` (removed unused import `crate::distribution::internal::testing_boiler`).
- `Geometric::inverse_cdf` platform-dependent behavior on Windows:
Problem
The `test_inverse_cdf` test was failing on Windows with:
12 for `inverse_cdf(0.0)`
Root Cause
Floating-point precision differences across platforms caused `inverse_cdf(0.0)` to compute inconsistent results when using the formula `ceil(log(1-p) / log(1-self.p))`.
Solution
Added an explicit implementation of `inverse_cdf` for the `Geometric` distribution that handles edge cases consistently:
- `min()` (1) when input probability `p <= 0.0`
- `1` when distribution parameter `self.p == 1.0`
- `max()` (u64::MAX) when input probability `p >= 1.0`
This ensures consistent behavior across all platforms (macOS, Linux, Windows).
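
The edge-case handling described above can be sketched as a standalone function. This is a minimal illustration under assumptions, not the code in this PR: `geometric_inverse_cdf` and its `success_prob` parameter (standing in for `self.p`) are hypothetical, the order of the checks follows the order of the list, and the final `.max(1)` guard is an extra assumption.

```rust
/// Sketch of an inverse CDF for a geometric distribution supported on
/// 1, 2, 3, ... with explicit edge cases. `success_prob` stands in for
/// `self.p`; both names are hypothetical.
fn geometric_inverse_cdf(success_prob: f64, p: f64) -> u64 {
    if p <= 0.0 {
        // Lower edge: return the distribution minimum without touching
        // the logarithm formula at all.
        return 1;
    }
    if success_prob == 1.0 {
        // Every trial succeeds, so all mass sits at 1.
        return 1;
    }
    if p >= 1.0 {
        // Upper edge: return the distribution maximum.
        return u64::MAX;
    }
    // General case: ceil(log(1 - p) / log(1 - success_prob)).
    let k = ((1.0 - p).ln() / (1.0 - success_prob).ln()).ceil();
    // Float-to-integer `as` casts saturate in Rust; the max(1) guard
    // (an assumption) keeps the result inside the support.
    (k as u64).max(1)
}
```

Handling `p <= 0.0` before any floating-point arithmetic is the point of the fix: the result for `inverse_cdf(0.0)` no longer depends on how a platform's `log` rounds near zero.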