The audio path lives in src-tauri/src/audio/. It is a 3-thread lock-free pipeline; see audio architecture for the wider topology and invariants.
- Decoder: symphonia 0.5 over MP3, FLAC, WAV, OGG Vorbis, AAC, ALAC (M4A). Source samples are converted to interleaved f32, channel-mapped (mono → stereo, 5.1 → stereo Lo/Ro per ITU BS.775), then resampled to the device rate by rubato 2.0 (Fft<f32> + FixedSync::Input, with a fast Passthrough variant when the source rate already matches the device).
- DSD pipeline: symphonia 0.5 doesn't decode 1-bit DSD, so DSF (Sony) and DFF (Philips) containers route through audio/dsd/. A custom container parser reads the layout (DSD64 to DSD1024, mono / stereo / multichannel), and a 256-tap windowed-sinc FIR with a Blackman-Harris envelope decimates the bitstream by 64 to land DSD64 at 44.1 kHz, DSD128 at 88.2 kHz, etc. The resulting PCM joins the same channel-convert + resample + ring-buffer pipeline as symphonia output. ActiveStream carries a StreamBackend enum (Symphonia / Dsd) so seeking and decoder reset stay uniform from the engine's perspective. Limitation: real audiophile players use multi-stage halfband cascades for lower CPU at the same SNR; ours prioritises code clarity. DoP (DSD-over-PCM) is not yet wired, so the converter always produces PCM.
- Output: cpal 0.17 on a dedicated thread because cpal::Stream is !Send on Windows. Samples cross the thread via an rtrb 0.3 SPSC ring (RING_CAPACITY = 96 000 f32s ≈ 1 s @ 48 kHz stereo).
- Hot-path rules: the cpal callback never allocates, locks or logs. It only reads the rtrb::Consumer and Atomic* fields in SharedPlayback.
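For illustration, the 5.1 → stereo fold-down could look like the sketch below. It applies the -3 dB (≈ 0.7071) centre and surround coefficients that ITU-R BS.775 gives for a stereo downmix. The function name downmix_51_to_stereo, the L, R, C, LFE, Ls, Rs channel order and the choice to drop the LFE channel are assumptions made for this example, not details taken from the project's code.

```rust
/// Hypothetical helper: folds one 5.1 frame to stereo Lo/Ro.
/// Assumed channel order: L, R, C, LFE, Ls, Rs (LFE is discarded).
fn downmix_51_to_stereo(frame: [f32; 6]) -> [f32; 2] {
    // -3 dB weight for the centre and surround channels.
    const K: f32 = std::f32::consts::FRAC_1_SQRT_2;
    let [l, r, c, _lfe, ls, rs] = frame;
    [l + K * c + K * ls, r + K * c + K * rs]
}
```

A production path would usually also scale or limit the result, since the sum of three channels can exceed full scale.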
Real-time FFT bars surfaced in the immersive Now Playing overlay. Implementation:
- Backend: audio/spectrum.rs runs on the decoder thread (NOT in the cpal callback, which is too constrained). Post-EQ samples go through SpectrumAnalyzer::feed, which mono-mixes, applies a Hann window, runs a 2048-pt real FFT via realfft, then buckets the magnitudes into 48 log-spaced bands (30 Hz to 16 kHz). Successive frames overlap by 50% so the visual feels continuous. Throttled to ~30 Hz via a manual Instant clock.
- Output is a player:spectrum Tauri event carrying a Vec<f32> of normalised band magnitudes (0..1, peaks may briefly overshoot).
- A SharedPlayback::visualizer_enabled atomic gates the entire path: when off, feed returns at the first atomic load, with zero allocations and zero FFT cost. Persisted in profile_setting['ui.visualizer'], default OFF.
- Frontend: SpectrumVisualizer subscribes to the event and drives a <canvas> with requestAnimationFrame. Asymmetric decay (jump up fast, fall slow) so transients pop without making the bars look glitchy. Auto-fades to zero on pause so the bars don't freeze mid-pose.
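The bucketing step can be sketched as follows: compute 49 geometrically spaced edges between 30 Hz and 16 kHz, then map each edge to an FFT bin. Both helper names and the 48 kHz sample rate used in the note below are assumptions for the example; the actual SpectrumAnalyzer may place its edges differently.

```rust
/// Returns `bands + 1` geometrically spaced edge frequencies (Hz)
/// running from `f_lo` to `f_hi`.
fn log_band_edges(bands: usize, f_lo: f32, f_hi: f32) -> Vec<f32> {
    let ratio = f_hi / f_lo;
    (0..=bands)
        .map(|i| f_lo * ratio.powf(i as f32 / bands as f32))
        .collect()
}

/// Maps a frequency to the nearest bin of a real FFT of size `fft_size`,
/// capped at the Nyquist bin.
fn freq_to_bin(freq: f32, fft_size: usize, sample_rate: f32) -> usize {
    let bin = (freq * fft_size as f32 / sample_rate).round() as usize;
    bin.min(fft_size / 2)
}
```

At 48 kHz a 2048-point FFT has bins about 23.4 Hz wide, so the lowest bands are narrower than one bin; an implementation has to make sure such bands still read at least one bin.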
Real dual-decoder mix in crossfade.rs. When the user enables crossfade, the decoder maintains two ActiveStreams during the fade window and feeds an equal-power gain pair (cos(t·π/2) / sin(t·π/2)) into each so the summed RMS stays flat, with no mid-fade dip. The window is clamped to min(user_ms, duration / 2), so a fade never covers more than half of a short track: a 20 s clip with a 12 s setting fades for 10 s and starts mixing at the 10 s mark.
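A minimal sketch of the gain pair and the window clamp, assuming fade progress t runs from 0 to 1. The function names are illustrative and are not the ones used in crossfade.rs.

```rust
use std::f32::consts::FRAC_PI_2;

/// Equal-power gains for fade progress `t` in [0, 1]:
/// (gain for the outgoing track, gain for the incoming track).
fn equal_power_gains(t: f32) -> (f32, f32) {
    let t = t.clamp(0.0, 1.0);
    ((t * FRAC_PI_2).cos(), (t * FRAC_PI_2).sin())
}

/// Fade window limited to at most half of the track duration.
fn clamp_window_ms(user_ms: u64, duration_ms: u64) -> u64 {
    user_ms.min(duration_ms / 2)
}
```

Because cos² + sin² = 1, the squared gains always sum to one, which is what keeps the combined power constant across the fade. For example, clamp_window_ms(12_000, 20_000) yields 10_000.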
A separate SharedPlayback::smart_crossfade_enabled toggle (default OFF: opt-in because it's an opinionated behaviour change, persisted in profile_setting['audio.smart_crossfade']) suppresses the fade for two consecutive tracks belonging to the same album, so concept records / live sets hand off naturally instead of getting smeared. Mechanism:
- The analytics worker's PrefetchNext handler looks up the current track's album_id and the upcoming track's album_id in a single SQLite round trip and writes the boolean result to SharedPlayback::pending_next_same_album right before sending SetNextTrack.
- The decoder, at mix-decision time, checks both atomics: if smart crossfade is on AND the prefetched track shares an album, it skips the mix branch and falls through to the existing gapless EOF swap (which already handles a sample-accurate hand-off when pending_next.is_some()).
- The hint is naturally one-shot: each new prefetch overwrites it, and LoadAndPlay paths (manual user clicks) don't go through the mix decision at all, so a stale value can't bleed into an unrelated transition.
A separate SharedPlayback::dynamic_crossfade_enabled toggle (default OFF, persisted in profile_setting['audio.dynamic_crossfade']) scales each upcoming fade by the BPM gap between the current and next tracks. Same one-shot hint pattern as smart crossfade:
- The analytics PrefetchNext handler reads track_analysis.bpm for both tracks. If either is missing or zero, no override is written and the decoder falls back to the user's static crossfade_ms.
- When both BPMs are known, the worker scales crossfade_ms by a tier factor (≤8 BPM gap → 100%, ≤20 → 75%, ≤40 → 50%, otherwise 30%) with a 1500 ms floor (clamped against the base when the user picked a shorter window). The result lands in SharedPlayback::pending_next_crossfade_ms right before SetNextTrack.
- The decoder reads the override as the effective cf_ms when non-zero and clears it the instant the mix actually starts so the next prefetch starts from a clean slate. Toggling dynamic OFF also clears any in-flight override so the next transition snaps back to the static window immediately.
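The tier table above can be written as a small pure function. This is a sketch under two assumptions: the function name is hypothetical, and the floor is read as min(1500, base), so that a user window shorter than 1500 ms is never lengthened. A return value of 0 stands for "no override written".

```rust
/// Hypothetical helper: scales the user's crossfade window by BPM gap.
/// Returns 0 when no override should be written.
fn dynamic_crossfade_ms(base_ms: u32, bpm_a: f32, bpm_b: f32) -> u32 {
    // Missing or zero BPM (or NaN): leave the static window in charge.
    if !(bpm_a > 0.0 && bpm_b > 0.0) {
        return 0;
    }
    let gap = (bpm_a - bpm_b).abs();
    let percent: u64 = if gap <= 8.0 {
        100
    } else if gap <= 20.0 {
        75
    } else if gap <= 40.0 {
        50
    } else {
        30
    };
    let scaled = ((base_ms as u64) * percent / 100) as u32;
    // Assumed reading of the floor: 1500 ms, but never above the base.
    let floor = base_ms.min(1500);
    scaled.max(floor)
}
```

With an 8 s base window, a 15 BPM gap gives 6 s and an 80 BPM gap gives 2.4 s; with a 4 s base, the 30% tier would give 1.2 s and is lifted to the 1.5 s floor.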
Smart and dynamic crossfade compose: the album skip wins (it's a hard "no fade" decision); when the album differs, the dynamic scaling applies.
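The precedence between the two toggles can be sketched as a single decision function. The enum, the function name and the flat argument list are illustrative assumptions (the real decoder reads these values from atomics on SharedPlayback), and the sketch assumes crossfade itself is enabled.

```rust
/// What the decoder does at the end of the current track.
#[derive(Debug, PartialEq)]
enum Transition {
    /// Skip the mix and use the gapless EOF swap.
    Gapless,
    /// Run the dual-decoder mix for this many milliseconds.
    Crossfade(u32),
}

/// Hypothetical decision helper: the album skip wins, otherwise a
/// non-zero dynamic override replaces the static window.
fn decide_transition(
    smart_enabled: bool,
    same_album: bool,
    override_ms: u32,
    static_ms: u32,
) -> Transition {
    if smart_enabled && same_album {
        return Transition::Gapless;
    }
    let cf_ms = if override_ms != 0 { override_ms } else { static_ms };
    Transition::Crossfade(cf_ms)
}
```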
ReplayGain is applied per-stream before the mix so the two tracks can have very different gains without the louder one swamping the fade.
A seek runs format.seek() + decoder.reset() + resampler.flush(). The cpal callback enters drain_silent mode, which (since 70c1968) drains the ring in one bulk while consumer.pop() pass instead of one sample per output slot. The total perceived gap on seek dropped from ~270 ms (one full ring at 44.1 kHz × 8 ch) to ~10-15 ms (one cpal callback period).
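The ~270 ms figure follows directly from the ring capacity, and the bulk drain is a plain pop-until-empty loop. The sketch below models both, with a std VecDeque standing in for the rtrb consumer; the function names are assumptions and the real callback works on rtrb::Consumer instead.

```rust
use std::collections::VecDeque;

/// Time (ms) needed to play out a full ring of interleaved samples.
fn ring_duration_ms(capacity_samples: u64, sample_rate: u64, channels: u64) -> u64 {
    capacity_samples * 1000 / (sample_rate * channels)
}

/// Stand-in for the bulk drain: empties the ring in one pass, writes
/// silence to the output buffer and returns the number of dropped samples.
fn drain_silent(ring: &mut VecDeque<f32>, out: &mut [f32]) -> usize {
    let mut dropped = 0;
    while ring.pop_front().is_some() {
        dropped += 1;
    }
    out.fill(0.0);
    dropped
}
```

With RING_CAPACITY = 96 000, ring_duration_ms gives 1000 ms at 48 kHz stereo and 272 ms at 44.1 kHz with 8 channels, which matches the figures quoted above.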
After the drain, MP3 sources will emit a few invalid main_data_begin, underflow warnings from symphonia: the bit reservoir is invalidated by the seek and the codec recovers within 3-4 frames. Inherent to the format; not a bug.
commands/player.rs::list_output_devices: cpal device enumeration. The display name uses description().extended()[0] (Windows DEVPKEY_Device_FriendlyName → Speakers (Logitech PRO X Wireless Gaming Headset)) instead of description().name() (DEVPKEY_Device_DeviceDesc → just Speakers) so multiple endpoints in the same device class stay distinguishable.
The chosen device's name is persisted in profile_setting['audio.output_device']. lib.rs::setup reads it during boot and forwards it to the audio engine, so playback resumes on the user's preferred sink without waiting for the frontend to settle.
On Linux, enumeration uses ALSA's hint database (snd_device_name_hint("pcm")) instead of cpal's output_devices() to avoid a 1-2 s freeze + pcm_dmix / pcm_route stderr spam from probing every PCM card.
media_controls.rs bridges the engine to souvlaki 0.8:
- Windows: SMTC. Now-Playing artwork is served to SMTC over a tiny localhost HTTP shim because Windows expects a URL, not a file path.
- Linux: MPRIS via D-Bus.
- macOS: MediaRemote (NowPlayingInfoCenter).
Initialised after the main window exists (needs an HWND on Windows). State transitions are driven through transition_state() so the OS overlay flips at the same instant as the in-app controls; the brief Loading state is skipped to avoid a 50 ms "controls flash off" between tracks.
The same transition_state() hook also feeds discord_presence.rs so the user's Discord profile mirrors the playing/paused state. Documented separately under Integrations → Discord Rich Presence.
Resampler-shift approach: the same trick VLC uses for its default playback rate. It costs ~zero CPU and works uniformly across every codec (symphonia + DSD). Pitch is NOT preserved: 1.5× speed lifts the pitch by ~7 semitones. Proper pitch-locked time-stretching needs a phase vocoder; this is out of scope for the MVP.
The decoder feeds rubato a fake source rate of actual_rate × speed. Each cpal output sample then represents speed source samples of audio, so the device clock plays the track faster (speed > 1) or slower (speed < 1) without changing the device's real sample rate. Concretely:
- SharedPlayback::playback_speed_bits (AtomicU32 holding f32::to_bits, clamped to [0.5, 2.0]).
- SharedPlayback::speed_dirty: flipped by set_playback_speed; the decoder polls it once per 'pkt loop iteration and rebuilds every active stream's resampler (primary + crossfade prefetched secondary). Rebuild cost is a single Resampler::new call; rubato's Fft<f32> is fixed-rate and can't be reconfigured in place.
- Local already-resampled buffers (primary_resampled, secondary_resampled) are cleared on rebuild so old-speed samples don't get pushed alongside new-speed ones, and drain_silent flushes the rtrb ring so the audible transition is < 20 ms.
- ActiveStream caches its true src_sample_rate the first time decode_next builds a resampler so subsequent rebuilds (mid-track speed change) know what to multiply by. New tracks (LoadAndPlay, SetNextTrack) inherit the active speed before their first decode, so the lazy resampler init picks the right effective rate from packet #1.
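A minimal sketch of the speed plumbing described above, using only std atomics. The SpeedState struct, its method names and the memory orderings are assumptions for illustration; in the project these fields live on SharedPlayback. The two free functions show the fake source rate handed to the resampler and the resulting pitch shift.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

/// Hypothetical stand-in for the speed fields on SharedPlayback.
struct SpeedState {
    playback_speed_bits: AtomicU32,
    speed_dirty: AtomicBool,
}

impl SpeedState {
    fn new() -> Self {
        Self {
            playback_speed_bits: AtomicU32::new(1.0f32.to_bits()),
            speed_dirty: AtomicBool::new(false),
        }
    }

    /// Control side: store the clamped speed, then raise the dirty flag.
    fn set_speed(&self, speed: f32) {
        let clamped = if speed.is_finite() { speed.clamp(0.5, 2.0) } else { 1.0 };
        self.playback_speed_bits.store(clamped.to_bits(), Ordering::Relaxed);
        self.speed_dirty.store(true, Ordering::Release);
    }

    fn speed(&self) -> f32 {
        f32::from_bits(self.playback_speed_bits.load(Ordering::Relaxed))
    }

    /// Decoder side: true exactly once per change, i.e. "rebuild resamplers".
    fn take_dirty(&self) -> bool {
        self.speed_dirty.swap(false, Ordering::Acquire)
    }
}

/// Fake source rate handed to the resampler.
fn effective_source_rate(src_sample_rate: u32, speed: f32) -> u32 {
    (src_sample_rate as f32 * speed).round() as u32
}

/// Pitch shift caused by the speed change, in semitones.
fn pitch_shift_semitones(speed: f32) -> f32 {
    12.0 * speed.log2()
}
```

For a 44.1 kHz source at 1.5×, the resampler is told the source runs at 66 150 Hz, and 12 · log2(1.5) ≈ 7.02 semitones is the pitch lift mentioned earlier.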
set_playback_speed snapshots the current position at the old speed, rebases samples_played to 0 and stores the snapshot in base_offset_ms before flipping the speed atomic. Without this, the next call to current_position_ms() would re-scale the existing samples_played counter by the new factor, and the progress bar would jump backwards (slowing down) or forwards (speeding up) at the exact moment the user changed speed. Tested in audio/state.rs::speed_change_preserves_position_continuity.
Both current_position_ms() and session_listened_ms() multiply the wall-clock delta by the active speed, so analytics credit and the 15 s "Recently played" threshold fire on track-time covered, not wall-clock listened. Listening to a 6 min track at 2× for 3 min wall-clock counts as 6 min of that track for the heatmap / Top Tracks aggregates.
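The rebase can be modelled with a simplified, single-threaded struct. This is a sketch, not the code in audio/state.rs: the real fields are atomics, and treating samples_played as device frames since the last rebase is an assumption.

```rust
/// Simplified model of the position bookkeeping.
struct Position {
    base_offset_ms: u64,
    /// Assumed to count device frames since the last rebase.
    samples_played: u64,
    device_rate: u32,
    speed: f32,
}

impl Position {
    /// Track-time position: wall-clock delta scaled by the active speed.
    fn current_position_ms(&self) -> u64 {
        let wall_ms = self.samples_played as f64 * 1000.0 / self.device_rate as f64;
        self.base_offset_ms + (wall_ms * self.speed as f64).round() as u64
    }

    /// Snapshot at the old speed, rebase the counter, then switch speed.
    fn set_playback_speed(&mut self, new_speed: f32) {
        let snapshot = self.current_position_ms();
        self.base_offset_ms = snapshot;
        self.samples_played = 0;
        self.speed = new_speed;
    }
}
```

After 10 s at 1×, switching to 2× leaves the position at 10 000 ms; five more wall-clock seconds then advance it to 20 000 ms, which is the track-time accounting described above.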
Persisted in profile_setting['audio.playback_speed'] (float). Restored at boot in player_get_state via a raw atomic write, NOT through set_playback_speed, because the rebase would otherwise move the resume point away from its persisted value. Tauri surface: player_set_speed(value) + player_get_speed. Frontend hydrates via playerGetSpeed on mount.
Speed lives inside the player-bar overflow ("⋯") menu as a range slider (step 0.05) plus five preset buttons (0.75 / 1 / 1.25 / 1.5 / 2) rather than a dedicated pill, since most users never touch it. When speed ≠ 1×, the "⋯" trigger surfaces a compact badge (e.g. 1.25×) in emerald so the user keeps a live indicator without opening the menu. Hidden entirely in Spotify mode (the Web Playback SDK has no speed control).
Musicolet-style intra-track loop. Two AtomicU64 endpoints on SharedPlayback (loop_a_ms, loop_b_ms): when both are set and b > a, the decoder loop in audio/decoder.rs::play_track checks the playhead once per packet and seeks back to A whenever it crosses B. Skipped during a crossfade because the loop is a single-track concern (looping mid-fade would fight the cross-track mix). Auto-cleared on every LoadAndPlay so the new track doesn't inherit stale endpoints from the previous one.
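The per-packet check can be sketched with two atomics and a sentinel. The AbLoop struct, its method names and the use of u64::MAX for "endpoint not set" are assumptions for this example; the source does not document how an unset endpoint is encoded.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Assumed sentinel meaning "endpoint not set".
const UNSET: u64 = u64::MAX;

/// Hypothetical stand-in for the loop fields on SharedPlayback.
struct AbLoop {
    loop_a_ms: AtomicU64,
    loop_b_ms: AtomicU64,
}

impl AbLoop {
    fn new() -> Self {
        Self { loop_a_ms: AtomicU64::new(UNSET), loop_b_ms: AtomicU64::new(UNSET) }
    }

    fn set_a(&self, ms: u64) {
        self.loop_a_ms.store(ms, Ordering::Relaxed);
    }

    fn set_b(&self, ms: u64) {
        self.loop_b_ms.store(ms, Ordering::Relaxed);
    }

    /// Called on every LoadAndPlay so a new track starts without endpoints.
    fn clear(&self) {
        self.set_a(UNSET);
        self.set_b(UNSET);
    }

    /// Returns Some(A) when the playhead has crossed B and should seek back.
    fn seek_target(&self, playhead_ms: u64, crossfading: bool) -> Option<u64> {
        if crossfading {
            return None; // the loop is a single-track concern
        }
        let a = self.loop_a_ms.load(Ordering::Relaxed);
        let b = self.loop_b_ms.load(Ordering::Relaxed);
        if a != UNSET && b != UNSET && b > a && playhead_ms >= b {
            Some(a)
        } else {
            None
        }
    }
}
```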
Three commands cover the lifecycle: player_set_ab_loop (set one or both endpoints), player_clear_ab_loop, player_get_ab_loop. Each one emits player:ab-loop so the UI button + ProgressBar markers stay in sync across views without polling.
UI is a tri-state click cycle in AbLoopButton: idle → A captured (amber) → A+B armed (emerald) → clear, with an "A" / "AB" badge over the icon. The PlayerBar's ProgressBar renders the endpoints as coloured pin markers (amber A, rose B) with a tinted region between them so the loop is legible at a glance. By default the button lives in the player-bar overflow ("⋯") menu wrapped as a labelled row; pinning it to a primary slot is a one-click toggle in Settings → Lecture (profile_setting['ui.show_ab_loop']).
queue.rs: persistent SQLite-backed queue with shuffle (Fisher-Yates with seeded xorshift), repeat (off/all/one), auto-advance and drag-and-drop reorder. The frontend operates on a virtualised list so a 6000-track shuffle doesn't lock the UI.
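A seeded Fisher-Yates shuffle can be sketched as below. Which xorshift variant queue.rs uses is not stated, so the 64-bit generator with shifts 13 / 7 / 17, the type name and the zero-seed fallback are assumptions for the example.

```rust
/// Minimal xorshift64 generator; its state must never be zero.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // A zero seed would lock the generator at zero, so substitute a constant.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// In-place Fisher-Yates shuffle; the same seed gives the same order.
fn shuffle<T>(items: &mut [T], seed: u64) {
    let mut rng = XorShift64::new(seed);
    for i in (1..items.len()).rev() {
        // Modulo introduces a tiny bias, negligible for queue-sized inputs.
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}
```

Seeding makes the order reproducible, which allows a persisted queue to rebuild the same shuffle from the stored seed instead of storing every position.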