fix(standalone): enable flush-to-zero on the JACK RT thread #252

Closed
OpenSauce wants to merge 2 commits into main from fix/standalone-ftz-denormals

Conversation

@OpenSauce
Owner

Closes #251.

Problem

IR cabinet performance on the Raspberry Pi is inconsistent in a way that doesn't track IR length: some IRs run fine, others struggle, with no relation to how heavily they're trimmed. The root cause is denormal (subnormal) float arithmetic. As signals decay toward silence, the IR convolver and filter tails drive intermediate values into the denormal range, where each operation can be roughly 10–100× slower on ARM without flush-to-zero.
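To make the decay-into-denormals mechanism concrete, here is a minimal sketch of a one-pole feedback tail. The function name `steps_until_subnormal` and its parameters are illustrative and not taken from the codebase; it assumes the default floating-point environment (no FTZ set on the calling thread).

```rust
/// Repeatedly applies `y = y * feedback`, starting from `start`, and returns
/// the number of steps after which the state first becomes subnormal.
/// Returns `None` if that does not happen within `max_steps`.
/// (Illustrative helper, not part of the PR.)
fn steps_until_subnormal(start: f32, feedback: f32, max_steps: usize) -> Option<usize> {
    let mut y = start;
    for step in 1..=max_steps {
        // black_box keeps the compiler from folding the loop away.
        y = std::hint::black_box(y * feedback);
        if y.is_subnormal() {
            return Some(step);
        }
    }
    None
}
```

With `start = 1.0` and `feedback = 0.5` the state halves each step, so it passes the smallest normal f32 (2^-126) and becomes subnormal on step 127. A slower tail such as `feedback = 0.999` takes roughly 87,000 steps, which is under two seconds of audio at 48 kHz, so any IR or filter tail that rings out into silence can plausibly get there.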

There was no global FTZ anywhere. A few amp stages (eq, delay, reverb, poweramp) manually flush their state at 1e-20, but that's incomplete: it doesn't cover the convolver, and 1e-20 is itself a normal f32. The VST3/CLAP plugin already gets FTZ from nih-plug's process wrapper — only the standalone JACK thread set nothing.
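The point about the 1e-20 threshold can be checked directly. The sketch below uses a hypothetical `flush_small` helper and `FLUSH_THRESHOLD` constant to stand in for the per-stage manual flush; the real stages may be written differently.

```rust
/// Hypothetical stand-in for the per-stage manual flush threshold.
const FLUSH_THRESHOLD: f32 = 1e-20;

/// Zeroes a state value whose magnitude is below the threshold.
/// (Illustrative helper, not the actual stage code.)
fn flush_small(x: f32) -> f32 {
    if x.abs() < FLUSH_THRESHOLD { 0.0 } else { x }
}

/// Reports whether the threshold itself sits in the normal f32 range,
/// i.e. at or above `f32::MIN_POSITIVE` (about 1.18e-38).
fn threshold_is_normal() -> bool {
    FLUSH_THRESHOLD.is_normal() && FLUSH_THRESHOLD >= f32::MIN_POSITIVE
}
```

A helper like this only protects the state variables it is applied to. Intermediate products inside a convolver that never passes through it can still land in the subnormal range, which is the gap the hardware flag closes.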

Fix

Set the CPU flush-to-zero flag on the JACK process thread:

  • x86 SSE: MXCSR bit 15
  • AArch64: FPCR bit 24

Mirrors nih-plug's ScopedFtz approach using inline asm (Rust 1.75 deprecated the _mm_setcsr intrinsics). Idempotent and cheap (a register read + conditional write), so it runs at the top of each process callback. The existing per-stage manual flushes stay as belt-and-suspenders.
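As a rough sketch of the read-then-conditionally-write pattern described above, covering both architectures: the names `enable_ftz`, `MXCSR_FTZ` and `FPCR_FZ` are illustrative, and this is not the PR's actual `denormals` module or nih-plug's implementation.

```rust
/// Sets the flush-to-zero bit for the calling thread if it is not already set.
/// Returns `true` only when the control register was actually modified.
#[cfg(target_arch = "x86_64")]
pub fn enable_ftz() -> bool {
    const MXCSR_FTZ: u32 = 1 << 15; // MXCSR bit 15 = flush-to-zero
    let mut mxcsr: u32 = 0;
    // SAFETY: stmxcsr/ldmxcsr access only this thread's MXCSR, through
    // valid pointers to local u32 values.
    unsafe {
        std::arch::asm!("stmxcsr [{}]", in(reg) std::ptr::addr_of_mut!(mxcsr), options(nostack));
        if mxcsr & MXCSR_FTZ != 0 {
            return false; // already enabled, skip the write
        }
        let updated = mxcsr | MXCSR_FTZ;
        std::arch::asm!("ldmxcsr [{}]", in(reg) std::ptr::addr_of!(updated), options(nostack));
    }
    true
}

#[cfg(target_arch = "aarch64")]
pub fn enable_ftz() -> bool {
    const FPCR_FZ: u64 = 1 << 24; // FPCR bit 24 = flush-to-zero
    let fpcr: u64;
    // SAFETY: mrs/msr on FPCR affect only the current thread's FP control state.
    unsafe {
        std::arch::asm!("mrs {}, fpcr", out(reg) fpcr, options(nomem, nostack));
        if fpcr & FPCR_FZ != 0 {
            return false; // already enabled, skip the write
        }
        std::arch::asm!("msr fpcr, {}", in(reg) fpcr | FPCR_FZ, options(nomem, nostack));
    }
    true
}

/// Fallback for other targets: nothing to set, so report "not modified".
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
pub fn enable_ftz() -> bool {
    false
}
```

Calling `enable_ftz()` a second time on the same thread returns `false` without writing the register, which is why running it at the top of every process callback costs little more than a register read.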

Scope / notes

  • Standalone only — the plugin is already covered by nih-plug.
  • No behavioural change on desktop beyond consistency; the win is on ARM.
  • Complements #250 (IR cabinet: make convolver type and IR trim length user-configurable): FTZ makes FIR cost consistent, while that issue lets users tune the absolute level.
  • Verified locally: make lint clean (the inline-asm operands needed addr_of! to satisfy pedantic clippy).

I couldn't measure the Pi effect from here, but FTZ on the RT thread is standard, non-optional real-time-audio practice and is correct regardless.

Denormal (subnormal) float arithmetic is extremely slow, especially on ARM
(Raspberry Pi). As signals decay toward silence, the IR convolver and filter
tails can drive intermediate values into the denormal range, causing erratic
CPU spikes that don't track IR length — some IRs run fine, others struggle, with
no relation to how heavily they're trimmed.

There was no global flush-to-zero anywhere; only a few amp stages manually flush
their state at 1e-20 (itself a normal f32, and not covering the convolver). The
VST3/CLAP plugin already gets FTZ from nih-plug's process wrapper, but the
standalone JACK thread set nothing.

Set the CPU flush-to-zero flag on the JACK process thread (MXCSR bit 15 on x86
SSE, FPCR bit 24 on AArch64), mirroring nih-plug's approach via inline asm since
Rust 1.75 deprecated the _mm_setcsr intrinsics. Idempotent and cheap, so it runs
each process callback. The per-stage manual flushes stay as belt-and-suspenders.

Refs #251.
Copilot AI review requested due to automatic review settings May 31, 2026 15:51
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses standalone JACK real-time audio performance spikes by enabling CPU flush-to-zero behavior for denormal floating-point values on the audio process thread.

Changes:

  • Adds a standalone audio::denormals module with architecture-specific inline assembly.
  • Calls denormal handling at the start of each JACK process() callback.
  • Exposes the new module from audio::mod.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| rustortion-standalone/src/audio/mod.rs | Exposes the new denormal-handling module. |
| rustortion-standalone/src/audio/jack.rs | Enables denormal handling at the top of the JACK RT callback. |
| rustortion-standalone/src/audio/denormals.rs | Implements FTZ setup for x86 SSE and AArch64. |

Comment on lines +24 to +33
// MXCSR bit 15 = Flush-To-Zero.
const SSE_FTZ_BIT: u32 = 1 << 15;
let mut mxcsr: u32 = 0;
// SAFETY: stmxcsr/ldmxcsr only read/write the current thread's MXCSR register.
unsafe {
    std::arch::asm!("stmxcsr [{}]", in(reg) std::ptr::addr_of_mut!(mxcsr));
    if mxcsr & SSE_FTZ_BIT == 0 {
        let updated = mxcsr | SSE_FTZ_BIT;
        std::arch::asm!("ldmxcsr [{}]", in(reg) std::ptr::addr_of!(updated));
    }

OpenSauce closed this Jun 5, 2026

Development

Successfully merging this pull request may close these issues.

Enable flush-to-zero (FTZ/DAZ) on the real-time audio thread

2 participants