sparse strips: Bump to `fearless_simd` v0.4.0 by tomcur · Pull Request #1462 · linebender/vello

tomcur · 2026-02-20T21:26:33Z

This brings a nice little boost to flattening, but badly regresses analytic AA in strip rendering.

I've traced the regression back to the `max`/`max_precise` semantics introduced here: linebender/fearless_simd#136.

I'll open a PR based on top of this one to fix the issue.

Benchmarks:

Flattening

flatten/Ghostscript_Tiger
                        time:   [180.70 µs 181.08 µs 181.46 µs]
                        change: [-6.4035% -6.0571% -5.6921%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
flatten/paris-30k       time:   [9.2019 ms 9.2259 ms 9.2511 ms]
                        change: [-2.0521% -1.5976% -1.1616%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Strip generation

render_strips/Ghostscript_Tiger_simd
                        time:   [244.27 µs 244.83 µs 245.40 µs]
                        change: [+25.354% +25.929% +26.391%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high severe
render_strips/paris-30k_simd
                        time:   [26.936 ms 27.010 ms 27.093 ms]
                        change: [+14.664% +15.809% +16.826%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 50 measurements (6.00%)
  3 (6.00%) high mild
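
To illustrate why the choice between `max`/`min` and their `_precise` variants can matter for performance, here is a minimal scalar sketch. It assumes that the relaxed operation behaves like a single compare-and-select (as x86 `minps` does, returning the second operand when either input is NaN) and that the precise operation must return the non-NaN operand. The function names `min_relaxed` and `min_precise` below are illustrative scalar stand-ins, not the `fearless_simd` API, and the exact semantics in `fearless_simd` may differ.

```rust
/// Relaxed minimum: one compare and one select.
/// Mirrors x86 `minps`, which yields the second operand
/// whenever the comparison is false, including when either input is NaN.
fn min_relaxed(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}

/// NaN-aware minimum (assumed "precise" semantics): if exactly one
/// operand is NaN, return the other one. The extra NaN checks are the
/// additional work a SIMD backend would have to do per lane.
fn min_precise(a: f32, b: f32) -> f32 {
    if a.is_nan() {
        b
    } else if b.is_nan() {
        a
    } else if a < b {
        a
    } else {
        b
    }
}
```

For inputs that are never NaN the two functions agree, which is the situation in which relaxing to the cheaper operation is safe.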

LaurenzV · 2026-02-20T21:38:32Z

Have you already tested whether that fix works?

tomcur · 2026-02-20T21:39:12Z

> Have you already tested whether that fix works?

Yep! Some more info here: #simd > fearless_simd v0.4 @ 💬.

LaurenzV · 2026-02-20T21:45:48Z

sparse_strips/vello_dev_macros/src/test.rs

For testing, since we want to explicitly test the fallback, I think the safest option would be to enable the force_support_fallback feature and then use fallback, like before. This way, we can be sure that it will always use that.

(This is also why you are getting the CI failures in WASM)

vello_sparse_tests now requires that feature, and I've reverted the tests that explicitly requested fallback before to do that again.

For vello_sparse_tests that is, maybe we also want that for tests in vello_cpu and vello_hybrid? Not sure. They currently request baseline.

I think it's fine to leave as is for those. The visreg tests are the ones where it actually matters.

vello_cpu seems to mostly have been constructing Level::try_detect().unwrap_or(Level::fallback()) in its tests. vello_hybrid had a lot of explicit Level::fallback().
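
The difference between the two test patterns discussed in this thread can be sketched with stand-in types. The `Level` enum below is hypothetical and is not the real `fearless_simd::Level`; in particular, `try_detect` takes a `simd_available` flag here purely so the sketch is self-contained. Only the `try_detect().unwrap_or(fallback())` shape is taken from the thread.

```rust
/// Hypothetical stand-in for a SIMD capability level
/// (not the real `fearless_simd::Level`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    Fallback,
    Simd,
}

impl Level {
    /// Stand-in for runtime detection: `Some` only if SIMD is available.
    fn try_detect(simd_available: bool) -> Option<Level> {
        if simd_available { Some(Level::Simd) } else { None }
    }

    /// Stand-in for explicitly requesting the scalar fallback.
    fn fallback() -> Level {
        Level::Fallback
    }
}

/// Detect-or-fallback pattern: the level under test depends on the host.
fn detected_or_fallback(simd_available: bool) -> Level {
    Level::try_detect(simd_available).unwrap_or(Level::fallback())
}

/// Explicit-fallback pattern: the level under test is the same everywhere.
fn explicit_fallback() -> Level {
    Level::fallback()
}
```

With the first pattern, a test that is meant to exercise the fallback only does so on hosts where detection fails; the second pattern always exercises it.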

…min` semantics) (#1463)

Part One relaxes `f32x4::min_precise` to `f32x4::min`, fixing part of the regression caused by bumping to `fearless_simd` 0.4 in #1462. I have a Part 2 that completely fixes the regression, but that probably requires more discussion, whereas the changes here will probably be uncontroversial.

Relative to `main` before bumping `fearless_simd`, this now benches as follows on my x86 machine (i7-13700k).

```
render_strips/Ghostscript_Tiger_simd
                        time:   [214.62 µs 214.97 µs 215.34 µs]
                        change: [+10.312% +10.793% +11.168%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
render_strips/paris-30k_simd
                        time:   [24.577 ms 24.668 ms 24.763 ms]
                        change: [+4.6786% +5.7653% +6.7171%] (p = 0.00 < 0.05)
                        Performance has regressed.
```
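
As a rough cross-check of the numbers above, the pre-bump baseline and the share of the regression recovered by Part One can be derived from the reported point estimates. This sketch assumes that the `change` figure is `(new - old) / old` and that the runs in #1462 and #1463 are comparable (same machine and baseline), which the thread does not state explicitly. The helper names are illustrative.

```rust
/// Baseline time implied by a new time and a relative change in percent,
/// using new = baseline * (1 + change / 100).
fn implied_baseline(new_time: f64, change_percent: f64) -> f64 {
    new_time / (1.0 + change_percent / 100.0)
}

/// Fraction of a regression that is recovered when the time drops
/// from `regressed` to `improved`, relative to `baseline`.
fn fraction_recovered(baseline: f64, regressed: f64, improved: f64) -> f64 {
    (regressed - improved) / (regressed - baseline)
}
```

Applied to the point estimates for `render_strips/Ghostscript_Tiger_simd`, `implied_baseline(244.83, 25.929)` and `implied_baseline(214.97, 10.793)` both come out at roughly 194 µs, and `fraction_recovered` with those values gives roughly 59%. The same calculation for `render_strips/paris-30k_simd` gives a baseline of roughly 23.3 ms and roughly 64% recovered.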

Level fixes

37fb31e

tomcur force-pushed the push-vzsvxnpmplwo branch from 8214859 to 37fb31e on February 20, 2026 21:41

LaurenzV reviewed Feb 20, 2026

tomcur added 2 commits February 20, 2026 23:01

Test fallback

e678cd5

Clippy, fmt

18da368

LaurenzV approved these changes Feb 20, 2026

tomcur enabled auto-merge February 20, 2026 22:15

tomcur added this pull request to the merge queue Feb 20, 2026

Merged via the queue into linebender:main with commit 953a475 Feb 20, 2026
17 checks passed

tomcur deleted the push-vzsvxnpmplwo branch February 20, 2026 22:31

tomcur mentioned this pull request Feb 20, 2026

vello_common: Part One of the strip rendering regression fix (relax min semantics) #1463

Merged

tomcur merged 4 commits into linebender:main from tomcur:push-vzsvxnpmplwo
