sparse strips: Bump to fearless_simd v0.4.0 (#1462)
Conversation
This brings a nice little boost to flattening, but badly regresses analytic AA in strip rendering. I've traced the regression back to the `max`/`max_precise` semantics introduced here: linebender/fearless_simd#136. I'll open a PR based on top of this one to fix the issue.

Benchmarks:

Flattening

```
flatten/Ghostscript_Tiger
                        time:   [180.70 µs 181.08 µs 181.46 µs]
                        change: [-6.4035% -6.0571% -5.6921%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
flatten/paris-30k       time:   [9.2019 ms 9.2259 ms 9.2511 ms]
                        change: [-2.0521% -1.5976% -1.1616%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
```

Strip generation

```
render_strips/Ghostscript_Tiger_simd
                        time:   [244.27 µs 244.83 µs 245.40 µs]
                        change: [+25.354% +25.929% +26.391%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high severe
render_strips/paris-30k_simd
                        time:   [26.936 ms 27.010 ms 27.093 ms]
                        change: [+14.664% +15.809% +16.826%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 50 measurements (6.00%)
  3 (6.00%) high mild
```
Have you already tested whether that fix works?
Yep! Some more info here: #simd > fearless_simd v0.4 @ 💬. |
Force-pushed from 8214859 to 37fb31e
For testing, since we want to explicitly test the fallback, I think the safest option would be to enable the `force_support_fallback` feature and then request `fallback`, like before. That way we can be sure the fallback path is always the one exercised.
(This is also why you are getting the CI failures in WASM)
`vello_sparse_tests` now requires that feature, and I've reverted the tests that previously requested fallback explicitly so they do that again.
That covers `vello_sparse_tests`; maybe we also want it for the tests in `vello_cpu` and `vello_hybrid`? Not sure. They currently request `baseline`.
I think it's fine to leave those as is. The visreg tests are the ones where it actually matters.
`vello_cpu` seems to mostly have been constructing `Level::try_detect().unwrap_or(Level::fallback())` in its tests. `vello_hybrid` had a lot of explicit `Level::fallback()`.
…min` semantics) (#1463)

Part One relaxes `f32x4::min_precise` to `f32x4::min`, fixing part of the regression caused by bumping to `fearless_simd` 0.4 in #1462. I have a Part 2 that completely fixes the regression, but that probably requires more discussion, whereas the changes here will probably be uncontroversial. Relative to `main` before bumping `fearless_simd`, this now benches as follows on my x86 machine (i7-13700K):

```
render_strips/Ghostscript_Tiger_simd
                        time:   [214.62 µs 214.97 µs 215.34 µs]
                        change: [+10.312% +10.793% +11.168%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
render_strips/paris-30k_simd
                        time:   [24.577 ms 24.668 ms 24.763 ms]
                        change: [+4.6786% +5.7653% +6.7171%] (p = 0.00 < 0.05)
                        Performance has regressed.
```