fix(standalone): enable flush-to-zero on the JACK RT thread #252
Closed
OpenSauce wants to merge 2 commits into
Denormal (subnormal) float arithmetic is extremely slow, especially on ARM (Raspberry Pi). As signals decay toward silence, the IR convolver and filter tails can drive intermediate values into the denormal range. This causes erratic CPU spikes that don't track IR length: some IRs run fine, others struggle, with no relation to how heavily they're trimmed.

There was no global flush-to-zero anywhere. Only a few amp stages manually flush their state at `1e-20` (itself a normal `f32`), and none of that covers the convolver. The VST3/CLAP plugin already gets FTZ from nih-plug's process wrapper, but the standalone JACK thread set nothing.

Set the CPU flush-to-zero flag on the JACK process thread (MXCSR bit 15 on x86 SSE, FPCR bit 24 on AArch64), mirroring nih-plug's approach via inline asm, since Rust 1.75 deprecated the `_mm_setcsr` intrinsics. The operation is idempotent and cheap, so it runs on every process callback. The per-stage manual flushes stay as belt-and-suspenders.

Refs #251.
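For scale: the smallest normal `f32` is `f32::MIN_POSITIVE` (about 1.18e-38), so a `1e-20` threshold sits far above the subnormal range. The sketch below is a hypothetical helper modeled on the per-stage flushes described above; `flush_small` and `MANUAL_FLUSH_THRESHOLD` are illustrative names, not the project's code. It shows why such flushes are incomplete: they only touch the state they are explicitly called on, whereas the CPU flag applies to every float operation on the thread.

```rust
/// Threshold used by the per-stage manual flushes (assumed value from the description).
/// It is a *normal* f32, roughly 18 orders of magnitude above f32::MIN_POSITIVE.
const MANUAL_FLUSH_THRESHOLD: f32 = 1e-20;

/// Hypothetical per-stage flush: zero the value if its magnitude is below the
/// threshold, otherwise pass it through unchanged. It only affects the values a
/// stage explicitly passes in; intermediate results elsewhere (e.g. inside a
/// convolver) are untouched.
fn flush_small(x: f32) -> f32 {
    if x.abs() < MANUAL_FLUSH_THRESHOLD {
        0.0
    } else {
        x
    }
}
```

A stage would call `flush_small` on each piece of filter state after updating it; anything that never goes through such a call can still drift into the subnormal range.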
Pull request overview
This PR addresses standalone JACK real-time audio performance spikes by enabling CPU flush-to-zero behavior for denormal floating-point values on the audio process thread.
Changes:
- Adds a standalone `audio::denormals` module with architecture-specific inline assembly.
- Calls denormal handling at the start of each JACK `process()` callback.
- Exposes the new module from `audio::mod`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `rustortion-standalone/src/audio/mod.rs` | Exposes the new denormal-handling module. |
| `rustortion-standalone/src/audio/jack.rs` | Enables denormal handling at the top of the JACK RT callback. |
| `rustortion-standalone/src/audio/denormals.rs` | Implements FTZ setup for x86 SSE and AArch64. |
Comment on lines +24 to +33
```rust
// MXCSR bit 15 = Flush-To-Zero.
const SSE_FTZ_BIT: u32 = 1 << 15;
let mut mxcsr: u32 = 0;
// SAFETY: stmxcsr/ldmxcsr only read/write the current thread's MXCSR register.
unsafe {
    std::arch::asm!("stmxcsr [{}]", in(reg) std::ptr::addr_of_mut!(mxcsr));
    if mxcsr & SSE_FTZ_BIT == 0 {
        let updated = mxcsr | SSE_FTZ_BIT;
        std::arch::asm!("ldmxcsr [{}]", in(reg) std::ptr::addr_of!(updated));
    }
```
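The quoted hunk is a fragment of the x86 path. Below is a self-contained sketch of the whole routine, assuming a single function named `enable_flush_to_zero`; that name is hypothetical, since the real function name in `denormals.rs` isn't shown here. The AArch64 branch, which reads and writes `FPCR` with `mrs`/`msr`, and the `options(...)` flags on each `asm!` block are this sketch's own choices rather than code from the PR.

```rust
/// Sets the flush-to-zero bit for the *current thread* if it is not already set.
/// Idempotent: once the bit is on, each call is a single register read.
/// On architectures not listed below, this compiles to an empty function.
#[inline]
pub fn enable_flush_to_zero() {
    #[cfg(any(target_arch = "x86_64", all(target_arch = "x86", target_feature = "sse")))]
    {
        // MXCSR bit 15 = Flush-To-Zero (SSE).
        const SSE_FTZ_BIT: u32 = 1 << 15;
        let mut mxcsr: u32 = 0;
        // SAFETY: stmxcsr/ldmxcsr only read/write this thread's MXCSR register, and the
        // memory operands point at valid local u32 values.
        unsafe {
            std::arch::asm!(
                "stmxcsr [{}]",
                in(reg) std::ptr::addr_of_mut!(mxcsr),
                options(nostack, preserves_flags)
            );
            if mxcsr & SSE_FTZ_BIT == 0 {
                let updated = mxcsr | SSE_FTZ_BIT;
                std::arch::asm!(
                    "ldmxcsr [{}]",
                    in(reg) std::ptr::addr_of!(updated),
                    options(nostack, preserves_flags)
                );
            }
        }
    }

    #[cfg(target_arch = "aarch64")]
    {
        // FPCR bit 24 = FZ (flush-to-zero).
        const AARCH64_FZ_BIT: u64 = 1 << 24;
        let fpcr: u64;
        // SAFETY: mrs/msr on FPCR only touch this thread's floating-point control register.
        unsafe {
            std::arch::asm!("mrs {}, fpcr", out(reg) fpcr, options(nostack, preserves_flags));
            if fpcr & AARCH64_FZ_BIT == 0 {
                std::arch::asm!(
                    "msr fpcr, {}",
                    in(reg) fpcr | AARCH64_FZ_BIT,
                    options(nostack, preserves_flags)
                );
            }
        }
    }
}
```

Calling it at the top of every process callback costs one register read after the first call. The flag is per-thread, so it must run on the JACK RT thread itself, not on the thread that sets up the client.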
Closes #251.
Problem
IR cabinet performance on the Raspberry Pi is inconsistent in a way that doesn't track IR length: some IRs run fine, others struggle, with no relation to how heavily they're trimmed. The root cause is denormal (subnormal) float arithmetic. As signals decay toward silence, the IR convolver and filter tails drive intermediate values into the denormal range, which is ~10–100× slower per op on ARM without flush-to-zero.
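The mechanism can be reproduced without any audio code. The sketch below uses a hypothetical `decay_tail` helper, not from the codebase. It runs a one-pole feedback loop with no input, the same shape as a filter or reverb tail decaying to silence, and records when its state first becomes subnormal. With a gain close to 1 (0.999 here), the state never reaches zero in default IEEE arithmetic. Once the value is a few hundred multiples of the smallest subnormal, multiplying by the gain rounds back to the same value, so the loop keeps doing subnormal arithmetic indefinitely.

```rust
/// Simulates the tail of a one-pole feedback loop, y[n] = gain * y[n-1], with no input.
/// Returns the first step at which the state is subnormal (if any) and the final state.
fn decay_tail(start: f32, gain: f32, steps: u32) -> (Option<u32>, f32) {
    let mut y = start;
    let mut first_subnormal = None;
    for n in 1..=steps {
        y *= gain;
        if first_subnormal.is_none() && y.is_subnormal() {
            first_subnormal = Some(n);
        }
    }
    (first_subnormal, y)
}
```

With `decay_tail(1.0, 0.999, 200_000)`, the first subnormal appears after roughly 87,000 steps (under two seconds at an assumed 48 kHz sample rate), and after 200,000 steps the state is still a nonzero subnormal. With FTZ enabled on the thread, the multiply that would produce the first subnormal returns zero instead.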
There was no global FTZ anywhere. A few amp stages (`eq`, `delay`, `reverb`, `poweramp`) manually flush their state at `1e-20`, but that's incomplete: it doesn't cover the convolver, and `1e-20` is itself a normal `f32`. The VST3/CLAP plugin already gets FTZ from nih-plug's process wrapper; only the standalone JACK thread set nothing.

Fix
Set the CPU flush-to-zero flag on the JACK process thread:

- x86 SSE: MXCSR bit 15 (FTZ)
- AArch64: FPCR bit 24 (FZ)
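The flags in question are MXCSR bit 15 on x86 SSE and FPCR bit 24 on AArch64. The worked example below shows the masks and the OR that sets each bit, and why repeating it is harmless. The helper names are hypothetical, and the starting values are assumptions: `0x1F80` is the documented power-on default of MXCSR, and `0` stands in for an FPCR with no control bits set.

```rust
/// MXCSR bit 15: Flush-To-Zero (x86 SSE). Equals 0x8000.
const SSE_FTZ_BIT: u32 = 1 << 15;
/// FPCR bit 24: FZ, flush-to-zero (AArch64). Equals 0x0100_0000.
const AARCH64_FZ_BIT: u64 = 1 << 24;

/// Returns `mxcsr` with FTZ set; every other control bit is preserved.
fn mxcsr_with_ftz(mxcsr: u32) -> u32 {
    mxcsr | SSE_FTZ_BIT
}

/// Returns `fpcr` with FZ set; every other control bit is preserved.
fn fpcr_with_fz(fpcr: u64) -> u64 {
    fpcr | AARCH64_FZ_BIT
}

/// True only while the bit is still clear. This check lets a per-callback
/// call skip the register write after the first time.
fn mxcsr_needs_write(mxcsr: u32) -> bool {
    mxcsr & SSE_FTZ_BIT == 0
}
```

Starting from `0x1F80`, setting FTZ gives `0x9F80`. Applying the OR again leaves `0x9F80` unchanged, and `mxcsr_needs_write(0x9F80)` is `false`, so subsequent calls only read the register.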
Mirrors nih-plug's `ScopedFtz` approach using inline asm (Rust 1.75 deprecated the `_mm_setcsr` intrinsics). Idempotent and cheap (a register read plus a conditional write), so it runs at the top of each process callback. The existing per-stage manual flushes stay as belt-and-suspenders.

Scope / notes
- `make lint` clean (the inline-asm operands needed `addr_of!` to satisfy pedantic clippy).

I couldn't measure the Pi effect from here, but FTZ on the RT thread is standard, non-optional real-time-audio practice and is correct regardless.
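Anyone who wants to check the effect on a Pi could start from a rough micro-benchmark like the one below. It is not part of the PR, and `time_scale_loop` and `compare_normal_vs_subnormal` are hypothetical names. It runs the same arithmetic on a normal and a subnormal operand. Scaling by 0.5 and then 2.0 is exact for these powers of two, so each operand stays in its own range for the whole loop. No numbers are claimed here; build in release mode and compare the two durations over several runs.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Times `iters` rounds of (y * 0.5) * 2.0 starting from `x`; returns (elapsed, final y).
/// For powers of two this round trip is exact, so a subnormal `x` keeps every
/// multiply in the subnormal range (as long as FTZ is off on this thread).
fn time_scale_loop(x: f32, iters: u32) -> (Duration, f32) {
    let mut y = x;
    let start = Instant::now();
    for _ in 0..iters {
        // black_box stops the optimizer from folding the loop away.
        y = black_box(black_box(y * 0.5) * 2.0);
    }
    (start.elapsed(), y)
}

/// Runs the loop on 1.0 (normal) and on 2^-128 (subnormal, below f32::MIN_POSITIVE = 2^-126).
/// Returns (normal_time, subnormal_time).
fn compare_normal_vs_subnormal(iters: u32) -> (Duration, Duration) {
    let subnormal = f32::MIN_POSITIVE / 4.0;
    let (t_normal, _) = time_scale_loop(1.0, iters);
    let (t_subnormal, _) = time_scale_loop(subnormal, iters);
    (t_normal, t_subnormal)
}
```

If FTZ is enabled on the measuring thread first, the subnormal run's first multiply flushes to zero, its final value becomes `0.0`, and the remaining iterations operate on zero, so the gap between the two timings should largely disappear.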