
feat: Add streaming support #3


What's missing

Every audio command currently works on files. You point it at a path, it does its thing, done. That's fine for basic use but it means audio is a dead end in the pipeline — you can't feed data in from an HTTP request, you can't chain commands together, and you can't build anything more complex without touching the filesystem.

The --data flag on sound make already spits out raw bytes, which is a start. But sound play can't consume them. The two halves exist and don't talk to each other.

What streaming would look like

sound make 1000 200ms --data | sound play
open audio.mp3 | sound play
http get https://example.com/audio.mp3 | sound play
sound decode audio.flac | sound play

That last one is new — an explicit sound decode command that takes a file or binary input in a container format and outputs a raw audio stream. This makes decoding a visible pipeline step rather than something that happens silently inside sound play.

Wire format

Rather than a custom binary format, the pipeline stream format is a valid MKV file — an A_PCM/FLOAT/IEEE audio track carrying the raw f32 samples, plus a Matroska Tags block carrying whatever metadata was read from the source file.

This is a real format that other tools already understand, which matters a lot for the ffmpeg interop story. It also means tags ride along for free through any chain of commands, since they're just part of the container.

The read side is already handled by Symphonia — symphonia-format-mkv explicitly supports A_PCM/FLOAT/IEEE at 32-bit. The write side is covered by mkv-element, which provides typed Rust structs for every MKV element and a write_to() method on any std::io::Write. Constructing the stream is just building a Segment with a Tracks entry, a Tags block, and Cluster blocks as samples come in.
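
A rough sketch of that write path follows, with a loud caveat: the struct and field names below just mirror the Matroska element names (Tracks, TrackEntry, Cluster, SimpleBlock) and are assumptions about mkv-element's API, not verified against it; only the write_to() call on a std::io::Write comes from the crate itself.

    // Hypothetical mkv-element usage: type and field names follow the
    // Matroska spec and need checking against the crate's actual API.
    // EBML header and Segment envelope are omitted for brevity.
    fn write_stream(
        out: &mut impl std::io::Write,
        sample_rate: f64,
        channels: u64,
        chunks: impl Iterator<Item = Vec<f32>>,
    ) -> std::io::Result<()> {
        Tracks {
            entries: vec![TrackEntry {
                track_number: 1,
                track_type: TrackType::Audio,
                codec_id: "A_PCM/FLOAT/IEEE".into(),
                audio: Audio { sampling_frequency: sample_rate, channels, bit_depth: Some(32) },
                ..Default::default()
            }],
        }
        .write_to(out)?; // header elements go out once, up front
        // The Tags element with the source metadata would be written
        // here, right after Tracks.

        // One Cluster per chunk, written as samples arrive, so nothing
        // ever buffers the whole signal.
        for (i, chunk) in chunks.enumerate() {
            let payload: Vec<u8> = chunk.iter().flat_map(|s| s.to_le_bytes()).collect();
            Cluster {
                timestamp: i as u64,
                blocks: vec![SimpleBlock::new(1, 0, payload)],
            }
            .write_to(out)?;
        }
        Ok(())
    }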

Where the actual work is

Rodio's Source trait is already an iterator. Symphonia already decodes by packet. The pieces are there.
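
On the Symphonia side, the packet loop is settled API (symphonia 0.5): a FormatReader hands out packets, a Decoder turns each one into an audio buffer, and SampleBuffer flattens that into interleaved f32. The feed callback here is a stand-in for whatever consumes the samples.

    use symphonia::core::audio::SampleBuffer;
    use symphonia::core::codecs::Decoder;
    use symphonia::core::errors::Error;
    use symphonia::core::formats::FormatReader;

    // Drain a probed reader packet by packet, handing interleaved f32
    // samples to `feed`. `format` and `decoder` come from symphonia's
    // probe and codec registry respectively.
    fn drain(
        format: &mut dyn FormatReader,
        decoder: &mut dyn Decoder,
        mut feed: impl FnMut(&[f32]),
    ) -> Result<(), Error> {
        loop {
            let packet = match format.next_packet() {
                Ok(p) => p,
                Err(Error::IoError(_)) => return Ok(()), // end of stream
                Err(e) => return Err(e),
            };
            let decoded = decoder.decode(&packet)?;
            let mut buf =
                SampleBuffer::<f32>::new(decoded.capacity() as u64, *decoded.spec());
            buf.copy_interleaved_ref(decoded);
            feed(buf.samples());
        }
    }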

The real work is the Nushell boundary. nu-protocol has ByteStream for binary pipeline data, and bridging between that and a Source implementation is the core task:

  • On the write side: a helper that consumes any Source and writes it out as a streaming MKV via mkv-element, flushing clusters incrementally rather than buffering everything
  • On the read side: a PcmStream struct that implements Source by reading the MKV stream via Symphonia's MkvReader (sketched below)
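
A sketch of that read side, assuming rodio's Source shape as of 0.17 through 0.19 (newer releases rename some of these methods) and simplifying the reader to a boxed FormatReader/Decoder pair rather than naming MkvReader directly:

    use std::time::Duration;
    use rodio::Source;
    use symphonia::core::audio::SampleBuffer;
    use symphonia::core::codecs::Decoder;
    use symphonia::core::formats::FormatReader;

    struct PcmStream {
        format: Box<dyn FormatReader>,
        decoder: Box<dyn Decoder>,
        pending: std::vec::IntoIter<f32>, // leftover samples from the last packet
        channels: u16,
        sample_rate: u32,
    }

    impl Iterator for PcmStream {
        type Item = f32;
        fn next(&mut self) -> Option<f32> {
            loop {
                if let Some(s) = self.pending.next() {
                    return Some(s);
                }
                // Refill from the next packet; a clean None is how
                // end-of-stream reaches rodio. Real code would surface
                // decode errors instead of swallowing them here.
                let packet = self.format.next_packet().ok()?;
                let decoded = self.decoder.decode(&packet).ok()?;
                let mut buf =
                    SampleBuffer::<f32>::new(decoded.capacity() as u64, *decoded.spec());
                buf.copy_interleaved_ref(decoded);
                self.pending = buf.samples().to_vec().into_iter();
            }
        }
    }

    impl Source for PcmStream {
        fn current_frame_len(&self) -> Option<usize> { None }
        fn channels(&self) -> u16 { self.channels }
        fn sample_rate(&self) -> u32 { self.sample_rate }
        fn total_duration(&self) -> Option<Duration> { None }
    }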

Tags come along automatically — when writing, Lofty-read tag data gets serialized into the MKV Tags block. When reading back, Symphonia's existing tag parsing picks them up from the same place.
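
The write half of that is a small mapping. In this sketch, SimpleTag is a stand-in for whatever mkv-element calls that element, and the lofty module paths shift between versions; Tag::items() and ItemValue are real lofty API.

    use lofty::tag::{ItemValue, Tag};

    // Hypothetical stand-in for mkv-element's SimpleTag element.
    struct SimpleTag { name: String, value: String }

    // Flatten Lofty's tag items into Matroska SimpleTag entries.
    fn to_simple_tags(tag: &Tag) -> Vec<SimpleTag> {
        tag.items()
            .filter_map(|item| match item.value() {
                ItemValue::Text(v) => Some(SimpleTag {
                    name: format!("{:?}", item.key()), // e.g. "TrackTitle"
                    value: v.clone(),
                }),
                _ => None, // binary and locator values skipped in this sketch
            })
            .collect()
    }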

Error handling across the boundary needs attention. A truncated or malformed stream mid-playback should produce a clean LabeledError, not a panic.
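
nu-protocol's LabeledError builder covers this; a minimal shape, with the span taken from the calling command:

    use nu_protocol::{LabeledError, Span};

    // Turn a mid-stream decode failure into a proper plugin error
    // instead of a panic.
    fn stream_error(err: impl std::fmt::Display, span: Span) -> LabeledError {
        LabeledError::new(format!("audio stream decode failed: {err}"))
            .with_label("stream truncated or malformed", span)
    }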

Command changes

sound play — file path argument becomes optional when binary input is present. Detects whether input is the MKV stream format or another container format and routes accordingly.
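
The routing check itself is cheap: every Matroska stream opens with the four-byte EBML magic, so sniffing the head of the input is enough.

    // Matroska (and any EBML document) starts with 1A 45 DF A3.
    // Anything else falls through to the general container-decode path.
    fn is_mkv_stream(header: &[u8]) -> bool {
        header.starts_with(&[0x1A, 0x45, 0xDF, 0xA3])
    }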

sound make — the --data flag changes to output the MKV stream format instead of WAV, so it composes naturally with other commands. A new --wav flag covers the case where you actually want a WAV file on disk.
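
Usage after the change would look something like this (the argument values are illustrative):

sound make 440 2sec --data | sound play
sound make 440 2sec --wav | save tone.wav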

sound decode — new command. Accepts a file path or binary pipeline input in any supported container format, outputs MKV stream with tags preserved. Handles Opus streams by routing through the Opus decoder wrapper rather than Symphonia's codec path — this covers the common case of yt-dlp downloads, which default to Opus-in-WebM.

Dependencies

  • mkv-element for writing the stream format. Symphonia already handles reading it.
  • ffmpeg-next for gap formats Symphonia can't decode — Opus, HE-AAC, WMA, AC3, DTS. The tiered decode path (try Symphonia first, fall through to ffmpeg-next for everything else) is sketched after this list. It requires the FFmpeg system libraries to be installed; if they're missing, sound decode should error clearly rather than fail silently. That's a reasonable assumption for the yt-dlp use case and likely already satisfied on most systems where this plugin is useful.
  • Everything else is already in the tree.
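
The dispatch for that tiered path is small. decode_with_symphonia, decode_with_ffmpeg, and ffmpeg_available are hypothetical helper names here, and PcmStream is the read-side struct sketched earlier:

    use nu_protocol::LabeledError;

    // Try Symphonia first; fall through to ffmpeg-next only when the
    // FFmpeg system libraries are actually present, and say so clearly
    // when they are not.
    fn decode(input: &[u8]) -> Result<PcmStream, LabeledError> {
        match decode_with_symphonia(input) {
            Ok(stream) => Ok(stream),
            Err(_) if ffmpeg_available() => decode_with_ffmpeg(input),
            Err(_) => Err(LabeledError::new(
                "codec unsupported by Symphonia and FFmpeg system libraries not found",
            )),
        }
    }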

Why this comes first

The manipulation and analysis work planned for later both want to operate on a sample stream. Building them against files first would mean refactoring them again once streaming arrives. Getting the streaming interface settled now means everything that comes after it just slots in.

