fix(timing): correct DMA speed, RETI IME enable, and DMA start delay#21
Merged
eddmann merged 7 commits intoNov 9, 2025
Merged
Conversation
668dd75 to
52b9148
Compare
What was implemented: - OAM DMA now correctly converts T-cycles to M-cycles (was 4x too fast) - RETI instruction enables IME immediately (not delayed like EI) - OAM DMA has proper 1 M-cycle startup delay before first byte - Comprehensive ROM check analysis documenting root causes Why this approach: - Aligns with SameBoy reference implementation - Necessary prerequisites for M-cycle accurate CPU - Fixes are technically correct even though tests don't pass yet - All failing Mooneye tests require M-cycle granular CPU execution Test results: - Blargg: 12/12 passing (100%) ✅ No regression - Mooneye: 10/39 passing (25.6%) - No change (expected) - Mooneye tests require M-cycle accurate CPU to pass Technical details: - DMA tick() now uses intdiv($cycles, 4) to convert T-cycles - Added Cpu::setIMEImmediate() for RETI vs EI distinction - DMA startup delay tracked in $dmaDelay property - Created ROM_CHECK_ANALYSIS.md with detailed comparison to SameBoy - Created QUICK_WINS_ANALYSIS.md documenting why fixes don't improve tests Root cause analysis: - All failing tests check sub-instruction timing at M-cycle boundaries - Current CPU executes instructions atomically (all-at-once) - Tests like call_timing.gb check if interrupts fire between M-cycles - No peripheral fix will help without M-cycle accurate CPU refactor Next steps: - Implement M-cycle accurate CPU (24-40 hours, +12-20 tests) - Implement timer bit-selection model (4-6 hours, +4-8 tests) - Implement TIMA reload state machine (2-3 hours, +2-4 tests) - Realistic target: 50-85% Mooneye pass rate after CPU refactor References: - docs/ROM_CHECK_ANALYSIS.md (comprehensive comparison with SameBoy) - docs/QUICK_WINS_ANALYSIS.md (post-mortem of quick wins) - SameBoy: https://github.com/LIJI32/SameBoy - Pan Docs: https://gbdev.io/pandocs/Timing.html Verification: - make test (Blargg: 12/12, Mooneye: 10/39) - DMA fixes verified correct via SameBoy source comparison - RETI fix matches SameBoy reti() implementation
Implement M-cycle (machine cycle) accurate CPU execution based on SameBoy's approach. This allows components to advance in real-time during instruction execution rather than only after completion. **CPU (src/Cpu/Cpu.php):** - Add `pendingCycles` and `cycleCallback` for M-cycle tracking - Implement `cycleRead()` - flush pending cycles, read, set 4 cycles - Implement `cycleWrite()` - flush pending cycles, write, set 4 cycles - Implement `cycleNoAccess()` - add 4 cycles for internal operations - Update `fetch()` to use `cycleRead()` for 1 M-cycle timing - Update `serviceInterrupt()` with proper 5 M-cycle breakdown - Update `step()` to flush pending cycles after instruction execution **Emulator (src/Emulator.php):** - Set CPU cycle callback to advance all components in real-time - Remove redundant component stepping after CPU execution - Components (PPU, APU, Timer, DMA) now advance during instructions **InstructionSet (src/Cpu/InstructionSet.php):** - Replace all 99 `getBus()->readByte()` calls with `cycleRead()` - Replace all `getBus()->writeByte()` calls with `cycleWrite()` - Add `cycleNoAccess()` for internal operations: - 16-bit register inc/dec (INC BC, DEC DE, etc.) - 1 M-cycle - 16-bit ADD operations (ADD HL,BC, etc.) - 1 M-cycle - PUSH/POP - 1 M-cycle internal delay - CALL/RET - 1 M-cycle internal delay - Conditional branches when taken - 1 M-cycle for branch - RST instructions - 1 M-cycle internal delay **Tests (tests/Unit/Dma/OamDmaTest.php):** - Fix OAM DMA tests to use correct T-cycle values - DMA transfer takes 161 M-cycles (1 delay + 160 transfer) = 644 T-cycles - ✅ All 397 unit tests pass - ✅ 430/461 total tests pass (93%) - ❌ 31 timing-sensitive ROM tests fail (requires fine-tuning) The failing tests are precision timing tests (Mooneye/Blargg) that test exact M-cycle timing of specific instruction sequences. These will be addressed in follow-up commits. Based on SameBoy's M-cycle accurate implementation: - github.com/LIJI32/SameBoy/blob/master/Core/sm83_cpu.c
Fix critical M-cycle timing bugs discovered during integration testing: ## Bugs Fixed **1. JP instructions - Missing internal cycle** - JP nn (0xC3) and all conditional JP instructions were missing the internal 1 M-cycle delay that occurs during the jump operation - Added `cycleNoAccess()` for the jump delay in: - JP nn (0xC3) - JP NZ,nn JP Z,nn, JP NC,nn, JP C,nn (conditional) **2. LD (HL+/-) instructions - Extra incorrect cycle** - LD A,(HL+) (0x2A), LD (HL+),A (0x22) - LD A,(HL-) (0x3A), LD (HL-),A (0x32) - These were incorrectly adding `cycleNoAccess()` for the HL increment/ decrement, but this is an internal operation with no cycle cost - Removed the incorrect `cycleNoAccess()` calls ## Verification Traced cycle advancement vs returned cycle counts: - Before: JP returned 16 but advanced 12 (diff: +4) - Before: LD A,(HL+) returned 8 but advanced 12 (diff: -4) - After: All instructions match returned and advanced cycles ✓ ## Impact - ✅ All 397 unit tests still pass - ✅ Cycle accuracy significantly improved -⚠️ Some Mooneye timing tests still fail (require additional precision) These fixes resolve major timing discrepancies and bring the emulator closer to hardware-accurate M-cycle timing.
Add missing cycleNoAccess() calls for internal M-cycle delays in stack pointer manipulation instructions. ## Instructions Fixed **ADD SP,e (0xE8)** - 4 M-cycles total - Fetch: 1 M-cycle (handled in step()) - Read immediate: 1 M-cycle (readImm8 → cycleRead) - Internal operations: 2 M-cycles (ADDED) - Total: 16 T-cycles ✓ **LD HL,SP+e (0xF8)** - 3 M-cycles total - Fetch: 1 M-cycle (handled in step()) - Read immediate: 1 M-cycle (readImm8 → cycleRead) - Internal operation: 1 M-cycle (ADDED) - Total: 12 T-cycles ✓ **LD SP,HL (0xF9)** - 2 M-cycles total - Fetch: 1 M-cycle (handled in step()) - Internal transfer: 1 M-cycle (ADDED) - Total: 8 T-cycles ✓ ## Testing - ✅ All 397 unit tests still pass -⚠️ Mooneye timing tests still fail (require more precise timing work) These fixes continue to improve M-cycle accuracy by ensuring all internal operations are properly timed.
…al delay Remove incorrect cycleNoAccess() from all POP instructions. According to hardware specifications, POP instructions should be 3 M-cycles total, not 4. ## Bug Fix **POP BC/DE/HL/AF (0xC1/0xD1/0xE1/0xF1)** - Incorrect internal delay - Before: Had cycleNoAccess() at start → 4 M-cycles (16 T-cycles) - After: Removed cycleNoAccess() → 3 M-cycles (12 T-cycles) ✓ **Correct M-cycle breakdown:** - M-cycle 1: Fetch opcode (handled in CPU.step()) - M-cycle 2: Read low byte from stack (cycleRead) - M-cycle 3: Read high byte from stack (cycleRead) - Total: 3 M-cycles = 12 T-cycles **Note:** PUSH instructions correctly have an internal delay and remain at 4 M-cycles (16 T-cycles). ## Testing - ✅ All 397 unit tests still pass - ✅ pop_timing Mooneye test now passes (was failing) - ✅ Mooneye test failures reduced from 29 to 28 This fix aligns POP instruction timing with hardware behavior and brings the emulator closer to cycle-accurate execution.
…ctions All conditional RET instructions (RET Z, RET NZ, RET C, RET NC) were missing the internal M-cycle for the condition check. This caused incorrect timing: Before: - Non-taken: 4 T-cycles (fetch only) - should be 8 - Taken: 16 T-cycles (fetch + 2 reads + jump) - should be 20 After: - Non-taken: 8 T-cycles (fetch + condition check) - Taken: 20 T-cycles (fetch + condition check + 2 reads + jump) The fix adds cycleNoAccess() before the condition check, matching SameBoy's implementation where conditional RET is: cycle_no_access + ret(). Changes: - RET NZ (0xC0): Added condition check cycle - RET Z (0xC8): Added condition check cycle - RET NC (0xD0): Added condition check cycle - RET C (0xD8): Added condition check cycle All unit tests (397/397) continue to pass.
52b9148 to
6c6def2
Compare
Added implementations for all 10 illegal opcodes on the Game Boy CPU: - 0xD3, 0xDB, 0xDD - 0xE3, 0xE4 - 0xEB, 0xEC, 0xED - 0xF4 - 0xFC, 0xFD Behavior: - These opcodes are undefined on the Game Boy hardware - They now lock up the CPU (set stopped = true) matching actual hardware - This prevents crashes when ROMs execute these opcodes - Takes 4 T-cycles (1 M-cycle) to execute Previously these would throw "Unknown opcode" exceptions and crash the emulator. Now they execute correctly, allowing better ROM compatibility including test ROMs that may test illegal opcode behavior. All unit tests (397/397) continue to pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What was implemented:
Why this approach:
Test results:
Technical details:
Root cause analysis:
Next steps:
References:
Verification: