
Add support for Audio Processing Unit. #19

Open
toots wants to merge 23 commits into linoscope:main from toots:apu

Conversation

@toots
Contributor

toots commented Jan 10, 2026

I've been experimenting with the Claude Code assistant to implement audio processing in CAMLBOY.

I apologize in advance if this is a complicated issue for you.

Here's what I have tried to do:

  • This is an exercise for me to learn and understand how the hardware emulation of the audio layer works.
  • Instead of coding it from scratch, I have been directing the code assistant to use existing code and documentation to generate its own.
Original prompt

This is the repository for CAMLBOY, a gameboy emulator written in OCaml.

The design and implementation of the emulator has been discussed here: https://linoscope.github.io/writing-a-game-boy-emulator-in-ocaml/

During our work, I would like to add support for sounds to the emulator. Here are the key points:

I want the sound support to be as natural as possible w.r.t. the original design.
The original design was also incredibly good at both mapping the original hardware specs and finding the right paradigms from OCaml to match it the most naturally. Let's work the same way.
I also want to use this opportunity to learn how sound is supported, generated and etc on gameboy hardware.

For reference, you can use this code:
https://github.com/LIJI32/SameBoy/blob/master/Core/apu.h and https://github.com/LIJI32/SameBoy/blob/master/Core/apu.c from SameBoy
A description of the gameboy sound chip here: https://gbdev.gg8.se/wiki/articles/Gameboy_sound_hardware

We do not have to support all possible hardware first. We could focus on a reasonable subset and design a great implementation with the understanding that it would be easily extensible to more hardware.

  • I am aware of the maintenance burden that large sections of automatically generated code can create, so I have made every effort to keep each commit self-explanatory.

  • I have also tweaked the following explicitly:

    • When adding the BLEP module to correct audio aliasing, I explicitly kept hardware emulation and BLEP separate by using a functor.
    • I decided to let the audio callback drive the latency when running with audio. Modern hardware has separate clocks for audio and CPU, while the Game Boy has a single clock for both, so it makes sense to delegate running the emulation to the audio clock.
    • I did as much code cleanup as possible, in particular collecting constants in one place.

I have reviewed most of the modules and now understand most of what they are doing and the code makes sense to me.

I am still reviewing and updating, but I wanted to send this now to get your opinion on the approach.

Thanks.

toots and others added 21 commits January 9, 2026 19:04
Add README.md documenting the Game Boy APU architecture and our
implementation approach. Covers:
- Memory map and register layout
- Frame sequencer timing
- Channel signal pipeline
- Module structure for the implementation

This is the first step toward adding sound support to CAMLBOY.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add the APU module with basic register storage and power control:
- Accepts addresses 0xFF10-0xFF26 (sound registers) and 0xFF30-0xFF3F (wave RAM)
- Implements NR52 power on/off behavior (bit 7 controls power)
- When powered off, all registers are cleared and writes are ignored
- Wave RAM can be accessed regardless of power state

Integration:
- Add APU to bus address routing
- Create APU in camlboy and call Apu.run each instruction cycle
- Add unit tests for APU register access and power control

The APU skeleton accepts all sound register reads/writes but does not
yet generate audio. This allows games to run without crashing on
sound register access.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement the frame sequencer that drives modulation at specific intervals:
- Runs at 512 Hz (2048 M-cycles per step)
- 8 steps cycling 0-7
- Step 0,4: Length counter clock
- Step 2,6: Length counter + Sweep clock
- Step 7: Envelope clock

The timing parameters (cpu_freq, frame_seq_freq, tcycles_per_mcycle) are
configurable via optional arguments for testing purposes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
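The step schedule in the commit above can be sketched as follows. This is only an illustration of the 8-step cycle; the type and function names (`tick`, `clocks_for_step`, `run`) are hypothetical and not taken from the PR's code.

```ocaml
(* Hypothetical names; a sketch of the schedule, not the PR's module. *)
type tick = { length : bool; sweep : bool; envelope : bool }

(* 4194304 Hz / 4 = 1048576 M-cycles per second; 1048576 / 512 = 2048. *)
let mcycles_per_step = 2048

(* Which units are clocked on a given step (0-7). *)
let clocks_for_step step =
  match step land 7 with
  | 0 | 4 -> { length = true; sweep = false; envelope = false }
  | 2 | 6 -> { length = true; sweep = true; envelope = false }
  | 7 -> { length = false; sweep = false; envelope = true }
  | _ -> { length = false; sweep = false; envelope = false }

type t = { mutable counter : int; mutable step : int }

let create () = { counter = 0; step = 0 }

(* Advance by [mcycles]; return the ticks that fired, oldest first. *)
let run t ~mcycles =
  t.counter <- t.counter + mcycles;
  let fired = ref [] in
  while t.counter >= mcycles_per_step do
    t.counter <- t.counter - mcycles_per_step;
    fired := clocks_for_step t.step :: !fired;
    t.step <- (t.step + 1) land 7
  done;
  List.rev !fired
```

Over one full cycle of 8 steps this yields four length clocks (256 Hz), two sweep clocks (128 Hz) and one envelope clock (64 Hz).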
Implement the length counter that can automatically disable channels:
- Clocked at 256 Hz by frame sequencer (steps 0, 2, 4, 6)
- Decrements when enabled, disables channel when reaching 0
- Supports configurable max_length (64 for square/noise, 256 for wave)
- Provides trigger() for channel re-trigger behavior
- load_from_register() converts hardware register format (max - value)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
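A minimal sketch of the length counter behaviour listed above, assuming a simple mutable record. The field names and the boolean return value of `clock` are assumptions made for illustration, and the trigger and enable quirks covered by later commits are deliberately left out.

```ocaml
(* Hypothetical record layout; the PR's actual module may differ. *)
type t = {
  max_length : int;        (* 64 for square/noise, 256 for wave *)
  mutable counter : int;
  mutable enabled : bool;
}

let create ~max_length = { max_length; counter = 0; enabled = false }

(* The register stores the amount to subtract: counter = max - value. *)
let load_from_register t value = t.counter <- t.max_length - value

(* Called on the 256 Hz length clock. Returns [true] when the counter
   has just reached 0, meaning the channel must be disabled. *)
let clock t =
  if t.enabled && t.counter > 0 then begin
    t.counter <- t.counter - 1;
    t.counter = 0
  end
  else false

(* A trigger reloads an expired counter to the maximum. *)
let trigger t = if t.counter = 0 then t.counter <- t.max_length
```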
Implement the volume envelope that automatically adjusts channel volume:
- Clocked at 64 Hz by frame sequencer (step 7)
- Volume ranges 0-15, adjusts by 1 each period
- Direction can be Up or Down
- Period 0 disables envelope (but timer treats 0 as 8)
- Volume clamps at 0 and 15
- is_dac_enabled() checks if DAC would produce output

Register format (NRx2):
- Bits 7-4: Initial volume
- Bit 3: Direction (0=down, 1=up)
- Bits 2-0: Period

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
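The NRx2 layout and clocking rules above could look roughly like this. It is a sketch with hypothetical names (`of_register`, `clock`); the DAC check relies on the hardware rule that the DAC is on whenever any of the upper five bits of NRx2 is set.

```ocaml
type direction = Up | Down

(* Hypothetical record; not the PR's actual type. *)
type t = {
  mutable volume : int;          (* 0-15 *)
  mutable direction : direction;
  mutable period : int;          (* 0-7; 0 disables the envelope *)
  mutable timer : int;
}

(* Decode an NRx2 byte: bits 7-4 volume, bit 3 direction, bits 2-0 period. *)
let of_register byte =
  let period = byte land 0x07 in
  {
    volume = (byte lsr 4) land 0x0F;
    direction = (if byte land 0x08 <> 0 then Up else Down);
    period;
    timer = (if period = 0 then 8 else period);
  }

(* The DAC is enabled when any of the upper five bits of NRx2 is set. *)
let is_dac_enabled byte = byte land 0xF8 <> 0

(* Called on the 64 Hz envelope clock (frame sequencer step 7). *)
let clock t =
  t.timer <- t.timer - 1;
  if t.timer <= 0 then begin
    (* The timer treats a period of 0 as 8 when reloading. *)
    t.timer <- (if t.period = 0 then 8 else t.period);
    if t.period <> 0 then
      match t.direction with
      | Up -> if t.volume < 15 then t.volume <- t.volume + 1
      | Down -> if t.volume > 0 then t.volume <- t.volume - 1
  end
```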
Implement the square wave channel for pulse wave generation:
- Four duty cycles: 12.5%, 25%, 50%, 75%
- 11-bit frequency controls waveform speed
- Integrates length counter and envelope for modulation
- DAC enable depends on envelope settings
- has_sweep flag distinguishes Square 1 from Square 2

Frequency Timer:
The timer period = (2048 - frequency) M-cycles, where 2048 = 2^11
is one more than the max 11-bit frequency value. The waveform advances
one of its 8 positions per timer period, so:
- frequency=0    -> slowest (512 Hz step rate, 64 Hz tone)
- frequency=2047 -> fastest (~1 MHz step rate, 131072 Hz tone)

Timer period is clamped to a minimum of 1 to prevent infinite loops
when frequency reaches 2048 (which can occur during sweep overflow
calculations before the channel is disabled).

The channel advances through 8 waveform positions, outputting
0 or 1 multiplied by envelope volume to produce samples 0-15.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
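To make the timer arithmetic concrete, here is a small sketch. The function names are hypothetical; the duty table follows the ordering given in the gbdev wiki page referenced in the PR description, and the tone frequency is the step rate divided by the 8 waveform positions.

```ocaml
(* Duty waveforms as listed on the gbdev wiki: 12.5%, 25%, 50%, 75%. *)
let duty_table =
  [| [| 0; 0; 0; 0; 0; 0; 0; 1 |];
     [| 1; 0; 0; 0; 0; 0; 0; 1 |];
     [| 1; 0; 0; 0; 0; 1; 1; 1 |];
     [| 0; 1; 1; 1; 1; 1; 1; 0 |] |]

let mcycles_per_second = 1_048_576

(* Timer period in M-cycles, clamped to at least 1. *)
let timer_period frequency = max 1 (2048 - frequency)

(* How often the waveform position advances, in Hz. *)
let step_rate_hz frequency =
  float_of_int mcycles_per_second /. float_of_int (timer_period frequency)

(* One full waveform takes 8 steps, so the audible tone is 8x slower. *)
let tone_hz frequency = step_rate_hz frequency /. 8.0

(* Digital output: 0 or 1 from the duty table, scaled by the envelope. *)
let sample ~duty ~position ~volume =
  duty_table.(duty).(position land 7) * volume
```

For example, `tone_hz 0` is 64 Hz and `tone_hz 2047` is 131072 Hz.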
Implement the frequency sweep for Square 1 channel:
- Clocked at 128 Hz by frame sequencer (steps 2 and 6)
- Modulates frequency up or down based on shift amount
- Uses shadow register to track calculated frequency
- Overflow (frequency >= 2048) disables the channel
- Period 0 means sweep is disabled
- Tracks negate usage for obscure hardware behavior

Frequency calculation:
  new_freq = shadow_freq +/- (shadow_freq >> shift)

The sweep performs overflow checks both at trigger time (if shift > 0)
and after each sweep calculation (including a second check after updating).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
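The frequency calculation and the double overflow check can be sketched as two pure functions. The names `calculate` and `step` and the `outcome` type are hypothetical; timer handling and the negate quirk are omitted.

```ocaml
type outcome = New_frequency of int | Overflow

(* new_freq = shadow_freq +/- (shadow_freq >> shift) *)
let calculate ~shadow_freq ~shift ~negate =
  let delta = shadow_freq lsr shift in
  let new_freq =
    if negate then shadow_freq - delta else shadow_freq + delta
  in
  if new_freq >= 2048 then Overflow else New_frequency new_freq

(* One sweep step. Returns [None] when the channel must be disabled,
   otherwise the shadow frequency to keep. With a non-zero shift the
   new value is stored and checked for overflow a second time. *)
let step ~shadow_freq ~shift ~negate =
  match calculate ~shadow_freq ~shift ~negate with
  | Overflow -> None
  | New_frequency f ->
    if shift = 0 then Some shadow_freq
    else (
      match calculate ~shadow_freq:f ~shift ~negate with
      | Overflow -> None
      | New_frequency _ -> Some f)
```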
Wire Square 1 and Square 2 channels into the APU with full register handling:

- NR10 (0xFF10): Square 1 sweep - period, negate, shift
- NR11 (0xFF11): Square 1 duty/length
- NR12 (0xFF12): Square 1 envelope
- NR13/NR14: Square 1 frequency (11-bit split) and trigger
- NR21-NR24: Square 2 registers (same as Square 1 minus sweep)
- NR50/NR51: Master volume and panning (stored for later use)
- NR52: Power control and channel status

The APU now:
- Runs frame sequencer to clock length/envelope/sweep
- Processes sweep frequency changes and overflow detection
- Reports channel enable status in NR52 bits 0-1
- Properly resets all channels on power off
- Applies correct read masks (unused bits read as 1)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
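Two of the register details above, the 11-bit frequency split and the read masks, can be illustrated as below. The helper names are hypothetical, and `read_nr52` only covers the two square channels wired in at this point of the series.

```ocaml
(* NR13 holds the low 8 bits; NR14 bits 2-0 hold the high 3 bits. *)
let frequency_of_registers ~nr13 ~nr14 =
  ((nr14 land 0x07) lsl 8) lor (nr13 land 0xFF)

(* Unused or write-only bits read back as 1: OR the stored value with
   a per-register mask (for example 0x80 for NR10, 0x3F for NR11). *)
let read_masked ~mask value = (value lor mask) land 0xFF

(* NR52: bit 7 = power, bits 6-4 unused (read as 1),
   bit 0 = Square 1 status, bit 1 = Square 2 status. *)
let read_nr52 ~power ~ch1 ~ch2 =
  let bit b n = if b then 1 lsl n else 0 in
  bit power 7 lor 0x70 lor bit ch1 0 lor bit ch2 1
```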
- Add Audio_buffer module: ring buffer using interleaved stereo bigarray
- Use int16 bigarray format (L R L R ...) for efficient SDL blitting
- Add pop_into with blit for zero-copy transfer to SDL buffers
- Add sample generation at configurable rate (default 44100 Hz)
- Add mixing logic with NR50/NR51 master volume and panning
- Use Int64 fixed-point timing for accurate sample rate
  (prevents integer overflow in js_of_ocaml which uses 32-bit ints)
- Generate silent samples when APU is powered off
- Expose audio buffer access for SDL2 callback

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
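One way to get drift-free sample timing with Int64 arithmetic is an accumulator that carries the remainder forward. This is an assumed scheme for illustration only; the PR's actual fixed-point representation is not shown in this conversation, and the names `create` and `advance` are hypothetical.

```ocaml
(* Game Boy clock in T-cycles per second. *)
let cpu_freq = 4_194_304L

type t = { sample_rate : int64; mutable acc : int64 }

let create ~sample_rate = { sample_rate = Int64.of_int sample_rate; acc = 0L }

(* Advance by [tcycles] and return how many output samples are due.
   The remainder stays in [acc], so no fraction of a sample is lost.
   Int64 keeps the product safe where native ints are only 32 bits. *)
let advance t ~tcycles =
  t.acc <- Int64.add t.acc (Int64.mul (Int64.of_int tcycles) t.sample_rate);
  let due = Int64.div t.acc cpu_freq in
  t.acc <- Int64.rem t.acc cpu_freq;
  Int64.to_int due
```

Feeding one emulated second of cycles through `advance`, in any chunk size, produces exactly `sample_rate` samples.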
- Add Wave_channel module for custom 32-sample waveform playback
- Support 4 volume levels (mute, 100%, 50%, 25%)
- Integrate into APU with register read/write support (NR30-NR34)
- Add length counter support for wave channel
- Include wave channel in NR52 status and audio mixing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
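The four volume levels map naturally onto right shifts of the 4-bit sample. The sketch below assumes wave RAM is held in a 16-byte `Bytes.t` with two samples per byte, high nibble first; the function names are hypothetical.

```ocaml
(* Volume code from NR32 bits 6-5: 0 = mute, 1 = 100%, 2 = 50%, 3 = 25%. *)
let volume_shift code =
  match code land 3 with
  | 0 -> 4 (* a 4-bit sample shifted right by 4 is always 0 *)
  | 1 -> 0
  | 2 -> 1
  | _ -> 2

(* Fetch one of the 32 samples; each byte packs two 4-bit samples. *)
let sample_at wave_ram position =
  let byte = Char.code (Bytes.get wave_ram ((position land 31) / 2)) in
  if position land 1 = 0 then byte lsr 4 else byte land 0x0F

let output wave_ram ~position ~volume_code =
  sample_at wave_ram position lsr volume_shift volume_code
```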
- Add Noise_channel module with LFSR-based pseudo-random generation
- Support 15-bit and 7-bit LFSR modes
- Configurable polynomial counter (clock shift, divisor code)
- Integrate into APU with register read/write support (NR41-NR44)
- Add length counter and envelope support for noise channel
- Include noise channel in NR52 status and audio mixing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
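A sketch of the LFSR step as described on the gbdev wiki: XOR the two low bits, shift right, and feed the result into bit 14 (and also bit 6 in 7-bit mode). Function names are hypothetical, and the period is expressed in T-cycles following the wiki's divisor table.

```ocaml
(* One LFSR clock. [lfsr] is a 15-bit state. *)
let step ~short_mode lfsr =
  let x = (lfsr lxor (lfsr lsr 1)) land 1 in
  let lfsr = (lfsr lsr 1) lor (x lsl 14) in
  if short_mode then (lfsr land lnot 0x40) lor (x lsl 6) else lfsr

(* The channel output is the inverted low bit. *)
let output lfsr = if lfsr land 1 = 0 then 1 else 0

(* Timer period in T-cycles, or [None] when the LFSR gets no clocks.
   Divisor code 0 means 8; codes 1-7 mean 16, 32, ..., 112. *)
let timer_period ~divisor_code ~clock_shift =
  if clock_shift >= 14 then None
  else
    let divisor = if divisor_code = 0 then 8 else divisor_code * 16 in
    Some (divisor lsl clock_shift)
```

In 15-bit mode the state repeats after 32767 clocks; in 7-bit mode the low seven bits repeat after 127.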
Implement polyBLEP (polynomial Band-Limited stEP) synthesis to reduce
aliasing artifacts in square wave channels. This provides higher audio
quality by smoothing discontinuities in the waveforms.

Key changes:
- Add Blep module with polyBLEP implementation and functor over hardware
- Add Square_channel.mli exposing hardware state for BLEP
- Add Mixer module parameterized by square channel sampling strategy
- Add --no-blep CLI option to disable BLEP for raw hardware output
- Update APU to support both BLEP and naive sampling modes
- Add comprehensive unit tests for BLEP module

The implementation uses a functor design that keeps hardware modules pure
(implementing only specs) while wrapper modules add emulation adjustments.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
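The functor idea can be sketched as follows, using the textbook two-sample polyBLEP residual. Everything here is an assumption for illustration: the `SQUARE` signature, the float phase representation and the `Band_limited` functor are hypothetical and are not the PR's `Blep` or `Mixer` modules.

```ocaml
(* What a hardware channel would need to expose (hypothetical). *)
module type SQUARE = sig
  type t
  val phase : t -> float (* position within the period, in [0, 1) *)
  val duty : t -> float  (* fraction of the period spent high *)
end

(* Polynomial correction around a step at phase 0.
   [dt] is the phase increment per output sample. *)
let poly_blep t dt =
  if t < dt then (
    let x = t /. dt in
    x +. x -. (x *. x) -. 1.0)
  else if t > 1.0 -. dt then (
    let x = (t -. 1.0) /. dt in
    (x *. x) +. x +. x +. 1.0)
  else 0.0

(* Wrapper that smooths both edges without touching the hardware model. *)
module Band_limited (H : SQUARE) = struct
  let sample h ~dt =
    let t = H.phase h and d = H.duty h in
    let naive = if t < d then 1.0 else -1.0 in
    (* Rising edge at phase 0, falling edge at phase [d]. *)
    naive +. poly_blep t dt -. poly_blep (mod_float (t +. 1.0 -. d) 1.0) dt
end
```

Away from the edges the output equals the naive square wave; within one sample of an edge it is pulled toward zero, which is what reduces aliasing.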
Integrate APU audio output into the SDL2 frontend:
- Audio callback pulls samples from APU buffer via direct blit
- Main thread runs emulation, signals via mutex/condition
- Tight lock-step synchronization between audio and main threads
- Performance stats helper tracking FPS and CPU usage
- Frame rate naturally syncs to ~59.7 fps based on audio timing
- Modes: default (audio-sync), 60fps, withtrace, no-throttle
- Optional --no-blep flag to disable band-limited synthesis
- Optional --save-audio flag to save audio to WAV file
- Logs audio configuration at startup

The synchronization strategy between audio callback and main loop
is due to a limitation with how tsdl implements OCaml callbacks.
When calling the main processing loop inside the audio callback, the
program segfaults.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add Web Audio API support to the JS/WASM frontend using ScriptProcessorNode.
The audio callback drives emulation similar to SDL2's audio-sync mode.

- Add web_audio.ml/mli with bindings for AudioContext, ScriptProcessorNode
- Add audio checkbox to UI to toggle between frame-driven and audio-driven modes
- Extract FPS counter into reusable helper (matching SDL2 pattern)
- Audio mode shows buffer fill percentage alongside FPS

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Mark APU as implemented in TODO section
- Document available modes for SDL2 frontend:
  - default (audio-synced), 60fps, withtrace, no-throttle
- Document audio options: --no-blep, --save-audio
- Note audio checkbox in web UI
- Remove outdated footnote about missing APU

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
On trigger, the sample buffer is NOT updated. The first sample played
is whatever was previously in the buffer; the new position 0 sample
isn't read until the waveform advances.

Reference: https://gbdev.gg8.se/wiki/articles/Gameboy_sound_hardware#Trigger_Event

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Using clock shift 14 or 15 results in the LFSR receiving no clocks,
producing a static output.

Reference: https://gbdev.gg8.se/wiki/articles/Gameboy_sound_hardware#Noise_Channel

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Clearing the sweep negate bit in NR10 after at least one sweep
calculation used negate mode (since the last trigger) immediately
disables the channel.

Reference: https://gbdev.gg8.se/wiki/articles/Gameboy_sound_hardware#Frequency_Sweep

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
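The negate quirk amounts to one extra flag. A minimal sketch with hypothetical names, assuming the flag is set by each sweep calculation and cleared on trigger:

```ocaml
(* Hypothetical state; only the parts relevant to the quirk. *)
type t = { mutable negate : bool; mutable negate_used : bool }

let create () = { negate = false; negate_used = false }

(* Record that a sweep calculation ran while negate mode was on. *)
let on_calculation t = if t.negate then t.negate_used <- true

(* A trigger starts a fresh tracking window. *)
let on_trigger t = t.negate_used <- false

(* Handle the negate bit of an NR10 write. Returns [true] when the
   channel must be disabled immediately. *)
let write_negate t new_negate =
  let disable = t.negate_used && not new_negate in
  t.negate <- new_negate;
  disable
```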
Both envelope and sweep timers treat a period of 0 as 8 for timer
reload purposes. This prevents infinite-speed operation.

Reference: https://gbdev.gg8.se/wiki/articles/Gameboy_sound_hardware

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When triggering a channel with length enabled, if the counter was 0
(being reloaded to max) and the next frame sequencer step won't clock
length, the counter is set to max-1 instead of max.

This affects all channels and is tested for both 64-max (square/noise)
and 256-max (wave) length counters.

Reference: https://gbdev.gg8.se/wiki/articles/Gameboy_sound_hardware#Trigger_Event

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When writing to NRx4 and the next frame sequencer step won't clock
length, if length was previously disabled and is now enabled and
the counter is non-zero, the counter is decremented. If it reaches
0, the channel is disabled.

This quirk interacts with the trigger length reload quirk in subtle
ways - if extra clocking brings the counter to 0, trigger will
reload it to max (or max-1 depending on timing).

Reference: https://gbdev.gg8.se/wiki/articles/Gameboy_sound_hardware#Length_Counter

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
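The two length quirks from the last two commits interact on a single NRx4 write, which the sketch below tries to capture in one function. The record, the `write_nrx4` name and the `next_step_clocks_length` parameter are all hypothetical; the ordering (extra clock first, then trigger reload) follows the gbdev wiki description.

```ocaml
type t = {
  max_length : int;
  mutable counter : int;
  mutable enabled : bool;
}

(* Handle the length-related part of an NRx4 write. Returns [true]
   when the channel must be disabled by the extra length clock.
   [next_step_clocks_length] is true when the upcoming frame sequencer
   step is 0, 2, 4 or 6. *)
let write_nrx4 t ~enable ~trigger ~next_step_clocks_length =
  let was_enabled = t.enabled in
  t.enabled <- enable;
  (* Extra clock: length goes from disabled to enabled off-step. *)
  let expired =
    if (not next_step_clocks_length) && (not was_enabled) && enable
       && t.counter > 0
    then begin
      t.counter <- t.counter - 1;
      t.counter = 0
    end
    else false
  in
  if trigger && t.counter = 0 then begin
    (* Reload quirk: max - 1 instead of max when enabled off-step. *)
    t.counter <-
      (if enable && not next_step_clocks_length then t.max_length - 1
       else t.max_length);
    false
  end
  else expired
```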
@linoscope
Owner

@toots Wow, sounds cool, would love to have APU support! And thanks for being upfront about using AI & taking your time to review the code, really appreciate that (wish more people did the same, based on experience in other repos). For complexity, it seems mostly contained within the lib/apu directory, so I think it's ok.

I will be happy to review & merge once you are finished with the review and have tested that it works locally.

Also, haven't checked if it's already supported, but is it possible to run without audio processing? I know that some teams use CAMLBOY for benchmarking, and it might be helpful if we could let them run without audio, so the audio processing won't introduce unexpected benchmark regression.

@toots
Contributor Author

toots commented Jan 10, 2026

> Also, haven't checked if it's already supported, but is it possible to run without audio processing? I know that some teams use CAMLBOY for benchmarking, and it might be helpful if we could let them run without audio, so the audio processing won't introduce unexpected benchmark regression.

Absolutely! The default for SDL is now with audio, but all the previous modes remain. The previous default has been renamed to 60fps mode.

Also, for web, the default is actually without audio. Browsers need some user interaction to kick-start the audio engine, and I have not tried to work around that more cleverly. For now you have to manually check the audio checkbox 🙂

- Explain the aliasing problem for digital audio emulation
- Add visual comparison showing BLEP vs aliased output
- Document why original hardware didn't have this problem
- Explain how polyBLEP works with the polynomial formulas
- Add architecture diagram showing hardware/synthesis separation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@toots
Contributor Author

toots commented Jan 16, 2026

I've been re-reading and testing for a while and this seems worth checking out now @linoscope.

I added some documentation and left the comments explaining the code in place. Let me know if you want to change any of that.

Thanks!

@linoscope
Owner

@toots Ah, missed this comment. Will take a look next week, thanks!

@toots
Contributor Author

toots commented Feb 8, 2026

For sure. Let me know if there is anything I can do to help.
