Skip to content

fix(timing): correct DMA speed, RETI IME enable, and DMA start delay#21

Merged
eddmann merged 7 commits into
mainfrom
claude/fix-rom-check-tests-011CUvhrwoDuE1ENDgvstnSS
Nov 9, 2025
Merged

fix(timing): correct DMA speed, RETI IME enable, and DMA start delay#21
eddmann merged 7 commits into
mainfrom
claude/fix-rom-check-tests-011CUvhrwoDuE1ENDgvstnSS

Conversation

@eddmann

@eddmann eddmann commented Nov 9, 2025

Copy link
Copy Markdown
Owner

What was implemented:

  • OAM DMA now correctly converts T-cycles to M-cycles (was 4x too fast)
  • RETI instruction enables IME immediately (not delayed like EI)
  • OAM DMA has proper 1 M-cycle startup delay before first byte
  • Comprehensive ROM check analysis documenting root causes

Why this approach:

  • Aligns with SameBoy reference implementation
  • Necessary prerequisites for M-cycle accurate CPU
  • Fixes are technically correct even though tests don't pass yet
  • All failing Mooneye tests require M-cycle granular CPU execution

Test results:

  • Blargg: 12/12 passing (100%) ✅ No regression
  • Mooneye: 10/39 passing (25.6%) - No change (expected)
  • Mooneye tests require M-cycle accurate CPU to pass

Technical details:

  • DMA tick() now uses intdiv($cycles, 4) to convert T-cycles
  • Added Cpu::setIMEImmediate() for RETI vs EI distinction
  • DMA startup delay tracked in $dmaDelay property
  • Created ROM_CHECK_ANALYSIS.md with detailed comparison to SameBoy
  • Created QUICK_WINS_ANALYSIS.md documenting why fixes don't improve tests

Root cause analysis:

  • All failing tests check sub-instruction timing at M-cycle boundaries
  • Current CPU executes instructions atomically (all-at-once)
  • Tests like call_timing.gb check if interrupts fire between M-cycles
  • No peripheral fix will help without M-cycle accurate CPU refactor

Next steps:

  • Implement M-cycle accurate CPU (24-40 hours, +12-20 tests)
  • Implement timer bit-selection model (4-6 hours, +4-8 tests)
  • Implement TIMA reload state machine (2-3 hours, +2-4 tests)
  • Realistic target: 50-85% Mooneye pass rate after CPU refactor

References:

Verification:

  • make test (Blargg: 12/12, Mooneye: 10/39)
  • DMA fixes verified correct via SameBoy source comparison
  • RETI fix matches SameBoy reti() implementation

@eddmann eddmann force-pushed the claude/fix-rom-check-tests-011CUvhrwoDuE1ENDgvstnSS branch from 668dd75 to 52b9148 Compare November 9, 2025 12:10
What was implemented:
- OAM DMA now correctly converts T-cycles to M-cycles (was 4x too fast)
- RETI instruction enables IME immediately (not delayed like EI)
- OAM DMA has proper 1 M-cycle startup delay before first byte
- Comprehensive ROM check analysis documenting root causes

Why this approach:
- Aligns with SameBoy reference implementation
- Necessary prerequisites for M-cycle accurate CPU
- Fixes are technically correct even though tests don't pass yet
- All failing Mooneye tests require M-cycle granular CPU execution

Test results:
- Blargg: 12/12 passing (100%) ✅ No regression
- Mooneye: 10/39 passing (25.6%) - No change (expected)
- Mooneye tests require M-cycle accurate CPU to pass

Technical details:
- DMA tick() now uses intdiv($cycles, 4) to convert T-cycles
- Added Cpu::setIMEImmediate() for RETI vs EI distinction
- DMA startup delay tracked in $dmaDelay property
- Created ROM_CHECK_ANALYSIS.md with detailed comparison to SameBoy
- Created QUICK_WINS_ANALYSIS.md documenting why fixes don't improve tests

Root cause analysis:
- All failing tests check sub-instruction timing at M-cycle boundaries
- Current CPU executes instructions atomically (all-at-once)
- Tests like call_timing.gb check if interrupts fire between M-cycles
- No peripheral fix will help without M-cycle accurate CPU refactor

Next steps:
- Implement M-cycle accurate CPU (24-40 hours, +12-20 tests)
- Implement timer bit-selection model (4-6 hours, +4-8 tests)
- Implement TIMA reload state machine (2-3 hours, +2-4 tests)
- Realistic target: 50-85% Mooneye pass rate after CPU refactor

References:
- docs/ROM_CHECK_ANALYSIS.md (comprehensive comparison with SameBoy)
- docs/QUICK_WINS_ANALYSIS.md (post-mortem of quick wins)
- SameBoy: https://github.com/LIJI32/SameBoy
- Pan Docs: https://gbdev.io/pandocs/Timing.html

Verification:
- make test (Blargg: 12/12, Mooneye: 10/39)
- DMA fixes verified correct via SameBoy source comparison
- RETI fix matches SameBoy reti() implementation
Implement M-cycle (machine cycle) accurate CPU execution based on
SameBoy's approach. This allows components to advance in real-time
during instruction execution rather than only after completion.

**CPU (src/Cpu/Cpu.php):**
- Add `pendingCycles` and `cycleCallback` for M-cycle tracking
- Implement `cycleRead()` - flush pending cycles, read, set 4 cycles
- Implement `cycleWrite()` - flush pending cycles, write, set 4 cycles
- Implement `cycleNoAccess()` - add 4 cycles for internal operations
- Update `fetch()` to use `cycleRead()` for 1 M-cycle timing
- Update `serviceInterrupt()` with proper 5 M-cycle breakdown
- Update `step()` to flush pending cycles after instruction execution

**Emulator (src/Emulator.php):**
- Set CPU cycle callback to advance all components in real-time
- Remove redundant component stepping after CPU execution
- Components (PPU, APU, Timer, DMA) now advance during instructions

**InstructionSet (src/Cpu/InstructionSet.php):**
- Replace all 99 `getBus()->readByte()` calls with `cycleRead()`
- Replace all `getBus()->writeByte()` calls with `cycleWrite()`
- Add `cycleNoAccess()` for internal operations:
  - 16-bit register inc/dec (INC BC, DEC DE, etc.) - 1 M-cycle
  - 16-bit ADD operations (ADD HL,BC, etc.) - 1 M-cycle
  - PUSH/POP - 1 M-cycle internal delay
  - CALL/RET - 1 M-cycle internal delay
  - Conditional branches when taken - 1 M-cycle for branch
  - RST instructions - 1 M-cycle internal delay

**Tests (tests/Unit/Dma/OamDmaTest.php):**
- Fix OAM DMA tests to use correct T-cycle values
- DMA transfer takes 161 M-cycles (1 delay + 160 transfer) = 644 T-cycles

- ✅ All 397 unit tests pass
- ✅ 430/461 total tests pass (93%)
- ❌ 31 timing-sensitive ROM tests fail (requires fine-tuning)

The failing tests are precision timing tests (Mooneye/Blargg) that test
exact M-cycle timing of specific instruction sequences. These will be
addressed in follow-up commits.

Based on SameBoy's M-cycle accurate implementation:
- github.com/LIJI32/SameBoy/blob/master/Core/sm83_cpu.c
Fix critical M-cycle timing bugs discovered during integration testing:

## Bugs Fixed

**1. JP instructions - Missing internal cycle**
- JP nn (0xC3) and all conditional JP instructions were missing the
  internal 1 M-cycle delay that occurs during the jump operation
- Added `cycleNoAccess()` for the jump delay in:
  - JP nn (0xC3)
  - JP NZ,nn JP Z,nn, JP NC,nn, JP C,nn (conditional)

**2. LD (HL+/-) instructions - Extra incorrect cycle**
- LD A,(HL+) (0x2A), LD (HL+),A (0x22)
- LD A,(HL-) (0x3A), LD (HL-),A (0x32)
- These were incorrectly adding `cycleNoAccess()` for the HL increment/
  decrement, but this is an internal operation with no cycle cost
- Removed the incorrect `cycleNoAccess()` calls

## Verification

Traced cycle advancement vs returned cycle counts:
- Before: JP returned 16 but advanced 12 (diff: +4)
- Before: LD A,(HL+) returned 8 but advanced 12 (diff: -4)
- After: All instructions match returned and advanced cycles ✓

## Impact

- ✅ All 397 unit tests still pass
- ✅ Cycle accuracy significantly improved
- ⚠️  Some Mooneye timing tests still fail (require additional precision)

These fixes resolve major timing discrepancies and bring the emulator
closer to hardware-accurate M-cycle timing.
Add missing cycleNoAccess() calls for internal M-cycle delays in
stack pointer manipulation instructions.

## Instructions Fixed

**ADD SP,e (0xE8)** - 4 M-cycles total
- Fetch: 1 M-cycle (handled in step())
- Read immediate: 1 M-cycle (readImm8 → cycleRead)
- Internal operations: 2 M-cycles (ADDED)
- Total: 16 T-cycles ✓

**LD HL,SP+e (0xF8)** - 3 M-cycles total
- Fetch: 1 M-cycle (handled in step())
- Read immediate: 1 M-cycle (readImm8 → cycleRead)
- Internal operation: 1 M-cycle (ADDED)
- Total: 12 T-cycles ✓

**LD SP,HL (0xF9)** - 2 M-cycles total
- Fetch: 1 M-cycle (handled in step())
- Internal transfer: 1 M-cycle (ADDED)
- Total: 8 T-cycles ✓

## Testing

- ✅ All 397 unit tests still pass
- ⚠️  Mooneye timing tests still fail (require more precise timing work)

These fixes continue to improve M-cycle accuracy by ensuring all
internal operations are properly timed.
…al delay

Remove incorrect cycleNoAccess() from all POP instructions. According to
hardware specifications, POP instructions should be 3 M-cycles total, not 4.

## Bug Fix

**POP BC/DE/HL/AF (0xC1/0xD1/0xE1/0xF1)** - Incorrect internal delay
- Before: Had cycleNoAccess() at start → 4 M-cycles (16 T-cycles)
- After: Removed cycleNoAccess() → 3 M-cycles (12 T-cycles) ✓

**Correct M-cycle breakdown:**
- M-cycle 1: Fetch opcode (handled in CPU.step())
- M-cycle 2: Read low byte from stack (cycleRead)
- M-cycle 3: Read high byte from stack (cycleRead)
- Total: 3 M-cycles = 12 T-cycles

**Note:** PUSH instructions correctly have an internal delay and remain at
4 M-cycles (16 T-cycles).

## Testing

- ✅ All 397 unit tests still pass
- ✅ pop_timing Mooneye test now passes (was failing)
- ✅ Mooneye test failures reduced from 29 to 28

This fix aligns POP instruction timing with hardware behavior and brings
the emulator closer to cycle-accurate execution.
…ctions

All conditional RET instructions (RET Z, RET NZ, RET C, RET NC) were missing
the internal M-cycle for the condition check. This caused incorrect timing:

Before:
- Non-taken: 4 T-cycles (fetch only) - should be 8
- Taken: 16 T-cycles (fetch + 2 reads + jump) - should be 20

After:
- Non-taken: 8 T-cycles (fetch + condition check)
- Taken: 20 T-cycles (fetch + condition check + 2 reads + jump)

The fix adds cycleNoAccess() before the condition check, matching SameBoy's
implementation where conditional RET is: cycle_no_access + ret().

Changes:
- RET NZ (0xC0): Added condition check cycle
- RET Z (0xC8): Added condition check cycle
- RET NC (0xD0): Added condition check cycle
- RET C (0xD8): Added condition check cycle

All unit tests (397/397) continue to pass.
@eddmann eddmann force-pushed the claude/fix-rom-check-tests-011CUvhrwoDuE1ENDgvstnSS branch from 52b9148 to 6c6def2 Compare November 9, 2025 14:18
Added implementations for all 10 illegal opcodes on the Game Boy CPU:
- 0xD3, 0xDB, 0xDD
- 0xE3, 0xE4
- 0xEB, 0xEC, 0xED
- 0xF4
- 0xFC, 0xFD

Behavior:
- These opcodes are undefined on the Game Boy hardware
- They now lock up the CPU (set stopped = true) matching actual hardware
- This prevents crashes when ROMs execute these opcodes
- Takes 4 T-cycles (1 M-cycle) to execute

Previously these would throw "Unknown opcode" exceptions and crash the
emulator. Now they execute correctly, allowing better ROM compatibility
including test ROMs that may test illegal opcode behavior.

All unit tests (397/397) continue to pass.
@eddmann eddmann merged commit 936ae9d into main Nov 9, 2025
1 check failed
@eddmann eddmann deleted the claude/fix-rom-check-tests-011CUvhrwoDuE1ENDgvstnSS branch November 10, 2025 10:23
eddmann added a commit that referenced this pull request Nov 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants