Restore OTA CRC integrity check and add hard reset after BLE DFU #22

Draft
4np wants to merge 2 commits into oltaco:master from 4np:scratch/fix-ota-crc-discard-and-no-reset-after-ble-dfu

@4np 4np commented Mar 22, 2026

Improve BLE OTA reliability: fix discarded CRC and add hard reset after DFU

Disclaimer: This investigation and the resulting code changes were performed with the assistance of Claude Code (Anthropic). All changes have been reviewed and verified by the author.

Description of Change

Two improvements to the BLE OTA DFU path, plus a GCC build-compatibility fix in the Makefile. The OTA changes were verified on RAK4631 (nRF52840, S140 6.1.1) with MeshCore 1.14.1 via the iOS nRF DFU app. The serial/USB DFU path is unaffected.


1. OTA CRC was silently discarded (dfu_init.c, dfu_single_bank.c, dfu_init.h)

dfu_init_postvalidate() computed and validated the image CRC but never returned it to the caller. m_image_crc stayed 0 after every OTA, so bank_0_crc = 0 was persisted to bootloader settings. Because bootloader_app_is_valid() skips the CRC check when bank_0_crc is zero, the boot-time integrity check was permanently disabled for any OTA-flashed image.

Fix: added uint16_t * p_crc_out to dfu_init_postvalidate() and updated the call site in dfu_single_bank.c to pass &m_image_crc. The validated CRC is now stored in bank_0_crc and verified on every subsequent boot.

// before
err_code = dfu_init_postvalidate((uint8_t *)mp_storage_handle_active->block_id, m_image_size);

// after
err_code = dfu_init_postvalidate((uint8_t *)mp_storage_handle_active->block_id, m_image_size, &m_image_crc);
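To make the failure mode concrete, here is a minimal host-side sketch of the interaction described above. It is not the bootloader code: crc16_sketch(), postvalidate_sketch() and app_is_valid_sketch() are hypothetical stand-ins, and the CRC is a plain bitwise CRC-16 (polynomial 0x1021, initial value 0xFFFF) chosen for illustration. The only behaviour taken from this PR is that a stored CRC of zero makes the boot-time check pass unconditionally.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in CRC-16 (polynomial 0x1021, initial value 0xFFFF), bitwise.
   Used here only so the sketch is self-contained. */
static uint16_t crc16_sketch(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Hypothetical model of post-validation with the new out-parameter:
   the CRC is checked against the expected value AND handed back. */
static bool postvalidate_sketch(const uint8_t *image, size_t len,
                                uint16_t expected_crc, uint16_t *p_crc_out)
{
    uint16_t crc = crc16_sketch(image, len);
    if (crc != expected_crc) {
        return false;
    }
    if (p_crc_out != NULL) {
        *p_crc_out = crc; /* without this write the caller's copy stays 0 */
    }
    return true;
}

/* Hypothetical model of the boot-time check: a stored CRC of zero
   means "no CRC recorded", so the image is accepted without checking. */
static bool app_is_valid_sketch(const uint8_t *image, size_t len,
                                uint16_t bank_0_crc)
{
    if (bank_0_crc == 0) {
        return true;
    }
    return crc16_sketch(image, len) == bank_0_crc;
}
```

With a stored CRC of 0, app_is_valid_sketch() accepts an image even after a byte has been flipped; with the value written through p_crc_out, the same corrupted image is rejected.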

2. No hard reset after BLE OTA (main.c)

After image activation, the bootloader jumped directly to the application without NVIC_SystemReset(). This left nRF52840 peripheral registers and radio state in a post-DFU condition. Additionally, sd_softdevice_vector_table_base_set() silently fails when the SoftDevice is already disabled, and its SRAM fallback at 0x20000000 can be zeroed by the application's .bss initialisation before the SD is re-enabled.

GPREGRET is already cleared to 0 earlier in check_dfu_mode(), so a hard reset produces a clean boot directly into the application without re-entering DFU mode.

Fix: added NVIC_SystemReset() after BLE OTA teardown.

// before
if (_ota_dfu) {
    sd_softdevice_disable();
    usb_teardown();
}

// after
if (_ota_dfu) {
    sd_softdevice_disable();
    usb_teardown();
    NVIC_SystemReset(); // clean reset; GPREGRET is already 0, boots straight to app
}
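The reasoning behind the reset can be illustrated with a small host-side model. This is only a sketch under assumptions: model_chip_t, model_finish_ota() and the MODEL_MAGIC_OTA value are hypothetical names invented for illustration, and the "dirty peripherals" flag is a simplification of the radio and peripheral state described above. It shows that, because the retained register is already 0, both paths reach the application, but only the reset path reaches it with hardware back at its defaults.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative magic value only; the real constants live in the
   bootloader sources and are not reproduced here. */
#define MODEL_MAGIC_OTA 0xA8u

typedef enum { BOOT_TARGET_APP, BOOT_TARGET_DFU } boot_target_t;

typedef struct {
    uint8_t gpregret;          /* retained across a system reset */
    bool    peripherals_dirty; /* radio/peripheral state left over from DFU */
} model_chip_t;

/* A system reset returns peripherals to their defaults; the retained
   register keeps whatever value it held. */
static void model_system_reset(model_chip_t *chip)
{
    chip->peripherals_dirty = false;
}

/* Boot decision: enter DFU only if the retained register asks for it. */
static boot_target_t model_boot_decision(const model_chip_t *chip)
{
    return (chip->gpregret == MODEL_MAGIC_OTA) ? BOOT_TARGET_DFU
                                               : BOOT_TARGET_APP;
}

/* One OTA session: the request flag is cleared early, the transfer
   leaves peripherals in a used state, then we either jump or reset. */
static boot_target_t model_finish_ota(model_chip_t *chip, bool hard_reset)
{
    chip->gpregret = 0;
    chip->peripherals_dirty = true;
    if (hard_reset) {
        model_system_reset(chip);
    }
    return model_boot_decision(chip);
}
```

In this model, model_finish_ota(&chip, false) and model_finish_ota(&chip, true) both return BOOT_TARGET_APP; the difference is that peripherals_dirty is still set in the first case.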

3. GCC 12–15 compatibility (Makefile)

Extended the existing GCC version workaround to cover GCC 15, and added -Wno-array-bounds and -Wno-unterminated-string-initialization to suppress false-positive warnings on memory-mapped address dereferences in SDK headers and intentional FAT 8.3 filename fields in ghostfat.c.
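For context on the second flag, the snippet below is a hedged sketch of the kind of initialiser that -Wunterminated-string-initialization reports. The struct and names (dir_entry_sketch_t, info_entry, name_matches) are hypothetical and are not copied from ghostfat.c. A FAT 8.3 short name is exactly 11 bytes with no terminator, so filling the field with an 11-character literal is intentional and valid C.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical directory entry: a FAT short name is 8 name bytes plus
   3 extension bytes, space-padded, with no NUL terminator. */
typedef struct {
    char name[11];
} dir_entry_sketch_t;

/* The 11-character literal exactly fills the field, so the implicit
   trailing NUL is dropped. Valid C, but newer compilers may warn. */
static const dir_entry_sketch_t info_entry = { .name = "INFO_UF2TXT" };

/* Compare by fixed length, never with strcmp(): the field is not a
   C string. short_name must point to at least 11 readable bytes. */
static bool name_matches(const dir_entry_sketch_t *entry,
                         const char *short_name)
{
    return memcmp(entry->name, short_name, sizeof entry->name) == 0;
}
```

Because the field is only ever read with fixed-length operations such as memcmp(), the missing terminator is harmless here, which is why the warning is treated as a false positive.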


Building and flashing

I used the following commands to build and flash the bootloader:

xcode-select --install
brew install --cask gcc-arm-embedded
pip3 install adafruit-nrfutil intelhex
git submodule update --init --recursive
python3 -m venv .venv
source .venv/bin/activate
pip install intelhex adafruit-nrfutil
make BOARD=wiscore_rak4631_board flash-dfu SERIAL=/dev/cu.usbmodem2101

Testing

  • BLE OTA of MeshCore 1.14.1 on RAK4631 via iOS nRF DFU app: application boots successfully
    after OTA completes.
  • Serial DFU path (adafruit-nrfutil dfu serial) unaffected.
  • Consecutive OTA cycles complete and boot correctly.

4np and others added 2 commits March 22, 2026 16:03
  Two bugs caused BLE OTA to silently succeed while the application never
  booted:

  1. dfu_init_postvalidate() computed and validated the image CRC but
     discarded it without writing it back to the caller. m_image_crc
     remained 0 after every OTA, so bank_0_crc = 0 was persisted to
     bootloader settings. bootloader_app_is_valid() skips the CRC check
     when bank_0_crc is 0, meaning any image — including a corrupted one —
     was unconditionally accepted at boot.

     Fix: add uint16_t *p_crc_out to dfu_init_postvalidate() and write the
     validated CRC through it. Update the call site in dfu_single_bank.c to
     pass &m_image_crc so the value is captured and stored in
     bootloader_settings_t.bank_0_crc.

  2. After BLE OTA activation, check_dfu_mode() tore down the SoftDevice
     and USB and then returned to main(), which jumped directly to the
     application without issuing NVIC_SystemReset(). This left nRF52840
     peripheral registers and radio state in a post-DFU condition. BLE
     applications (e.g. MeshCore on RAK4631) depend on a clean hardware
     reset to initialise their radio, SoftDevice, and peripheral stack.
     The direct jump caused silent initialisation failure and the device
     never came online. Additionally, sd_softdevice_vector_table_base_set()
     fails when the SD is already disabled, falling back to writing the
     forwarding address to 0x20000000, which the application's .bss
     initialisation can overwrite before the SD is re-enabled.

     Fix: add NVIC_SystemReset() after BLE OTA teardown. GPREGRET is
     already cleared to 0 earlier in check_dfu_mode(), so the subsequent
     boot goes straight to the application with a fully reset hardware
     state. The serial/USB DFU path is unaffected.