sparse strips: Bump to fearless_simd v0.4.0 #1462

Merged
tomcur merged 4 commits into linebender:main from tomcur:push-vzsvxnpmplwo
Feb 20, 2026

@tomcur
Member

@tomcur tomcur commented Feb 20, 2026

This brings a nice little boost to flattening, but badly regresses
analytic AA in strip rendering.

I've traced the regression back to `max`/`max_precise` semantics
introduced here: linebender/fearless_simd#136.
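
As a rough illustration of why the choice between a relaxed and a precise `min`/`max` can show up in benchmarks, here is a scalar sketch. It assumes the difference between the two variants is how NaN inputs are handled, which this thread does not spell out. `relaxed_min` and `precise_min` are hypothetical names that model the idea; they are not `fearless_simd` functions.

```rust
/// Hypothetical scalar model of a "relaxed" min.
/// It mirrors a single compare-and-select (as x86 `minps` does):
/// when the comparison is false, including when either input is NaN,
/// the second operand is returned.
fn relaxed_min(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}

/// Hypothetical scalar model of a "precise" min.
/// `f32::min` returns the non-NaN operand when exactly one input is NaN,
/// which on x86 typically costs extra compare/blend work per lane.
fn precise_min(a: f32, b: f32) -> f32 {
    a.min(b)
}
```

For ordinary inputs both functions agree, so code that never sees NaN (or does not care which operand wins) can take the cheaper form. They only diverge in cases such as `relaxed_min(1.0, f32::NAN)`, which yields NaN, versus `precise_min(1.0, f32::NAN)`, which yields `1.0`.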

I'll open a PR based on top of this one to fix the issue.

Benchmarks:

Flattening

```
flatten/Ghostscript_Tiger
                        time:   [180.70 µs 181.08 µs 181.46 µs]
                        change: [-6.4035% -6.0571% -5.6921%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
flatten/paris-30k       time:   [9.2019 ms 9.2259 ms 9.2511 ms]
                        change: [-2.0521% -1.5976% -1.1616%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
```

Strip generation

```
render_strips/Ghostscript_Tiger_simd
                        time:   [244.27 µs 244.83 µs 245.40 µs]
                        change: [+25.354% +25.929% +26.391%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high severe
render_strips/paris-30k_simd
                        time:   [26.936 ms 27.010 ms 27.093 ms]
                        change: [+14.664% +15.809% +16.826%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 50 measurements (6.00%)
  3 (6.00%) high mild
```
@LaurenzV
Collaborator

Have you already tested whether that fix works?

@tomcur
Member Author

tomcur commented Feb 20, 2026

> Have you already tested whether that fix works?

Yep! Some more info here: #simd > fearless_simd v0.4 @ 💬.

Collaborator

For testing, since we want to explicitly test the fallback, I think the safest option would be to enable the `force_support_fallback` feature and then use `fallback`, like before. This way, we can be sure that the tests will always use the fallback.

Collaborator

(This is also why you are getting the CI failures in WASM)

Member Author

`vello_sparse_tests` now requires that feature, and I've reverted the tests that previously requested `fallback` explicitly so that they do so again.

Member Author

For `vello_sparse_tests`, that is. Maybe we also want that for the tests in `vello_cpu` and `vello_hybrid`? Not sure. They currently request baseline.

Collaborator

I think it's fine to leave those as is. The visreg tests are the ones where it actually matters.

Member Author

@tomcur tomcur Feb 20, 2026

`vello_cpu` seems to mostly have been constructing `Level::try_detect().unwrap_or(Level::fallback())` in its tests. `vello_hybrid` had a lot of explicit `Level::fallback()`.
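
The difference between the two test setups discussed above can be sketched with a small self-contained model. Everything here is a hypothetical stand-in (`MockLevel`, `mock_try_detect`, and the `host_has_simd` flag are invented for illustration and are not the `fearless_simd` API); it only shows why "detect, else fall back" does not guarantee that the fallback path is exercised.

```rust
/// Hypothetical stand-in for a SIMD capability level; NOT the fearless_simd type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MockLevel {
    Fallback,
    Native,
}

/// Models runtime detection: yields a native level only when the host supports one.
fn mock_try_detect(host_has_simd: bool) -> Option<MockLevel> {
    if host_has_simd {
        Some(MockLevel::Native)
    } else {
        None
    }
}

/// "Detect, else fall back": the level a test runs with depends on the host.
fn detect_or_fallback(host_has_simd: bool) -> MockLevel {
    mock_try_detect(host_has_simd).unwrap_or(MockLevel::Fallback)
}

/// "Always fall back": the level a test runs with is the same on every host.
fn forced_fallback() -> MockLevel {
    MockLevel::Fallback
}
```

In this model, `detect_or_fallback(true)` picks the native level, so a test written that way only covers the fallback on hosts without SIMD support, whereas `forced_fallback()` covers it everywhere.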

@tomcur tomcur enabled auto-merge February 20, 2026 22:15
@tomcur tomcur added this pull request to the merge queue Feb 20, 2026
Merged via the queue into linebender:main with commit 953a475 Feb 20, 2026
17 checks passed
@tomcur tomcur deleted the push-vzsvxnpmplwo branch February 20, 2026 22:31
tomcur added a commit to tomcur/vello that referenced this pull request Feb 20, 2026
Part One of fixing the performance regression caused by bumping to
`fearless_simd` 0.4 in linebender#1462.

I have a Part 2 that completely fixes the regression, but that probably
requires more discussion, whereas the changes here will probably be
uncontroversial.

Relative to `main` before bumping `fearless_simd`, this now benches as
follows on my x86 machine (i7-13700k).

```
render_strips/Ghostscript_Tiger_simd
                        time:   [214.62 µs 214.97 µs 215.34 µs]
                        change: [+10.312% +10.793% +11.168%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
render_strips/paris-30k_simd
                        time:   [24.577 ms 24.668 ms 24.763 ms]
                        change: [+4.6786% +5.7653% +6.7171%] (p = 0.00 < 0.05)
                        Performance has regressed.
```
github-merge-queue bot pushed a commit that referenced this pull request Feb 21, 2026
…min` semantics) (#1463)

Part One relaxes `f32x4::min_precise` to `f32x4::min`, fixing part of
the regression caused by bumping to `fearless_simd` 0.4 in
#1462.

I have a Part 2 that completely fixes the regression, but that probably
requires more discussion, whereas the changes here will probably be
uncontroversial.

Relative to `main` before bumping `fearless_simd`, this now benches as
follows on my x86 machine (i7-13700k).

```
render_strips/Ghostscript_Tiger_simd
                        time:   [214.62 µs 214.97 µs 215.34 µs]
                        change: [+10.312% +10.793% +11.168%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
render_strips/paris-30k_simd
                        time:   [24.577 ms 24.668 ms 24.763 ms]
                        change: [+4.6786% +5.7653% +6.7171%] (p = 0.00 < 0.05)
                        Performance has regressed.
```