Restore OTA CRC integrity check and add hard reset after BLE DFU#21
Closed
4np wants to merge 3 commits into
Closed
Restore OTA CRC integrity check and add hard reset after BLE DFU#214np wants to merge 3 commits into
4np wants to merge 3 commits into
Conversation
Two bugs caused BLE OTA to silently succeed while the application never
booted:
1. dfu_init_postvalidate() computed and validated the image CRC but
discarded it without writing it back to the caller. m_image_crc
remained 0 after every OTA, so bank_0_crc = 0 was persisted to
bootloader settings. bootloader_app_is_valid() skips the CRC check
when bank_0_crc is 0, meaning any image — including a corrupted one —
was unconditionally accepted at boot.
Fix: add uint16_t *p_crc_out to dfu_init_postvalidate() and write the
validated CRC through it. Update the call site in dfu_single_bank.c to
pass &m_image_crc so the value is captured and stored in
bootloader_settings_t.bank_0_crc.
2. After BLE OTA activation, check_dfu_mode() tore down the SoftDevice
and USB and then returned to main(), which jumped directly to the
application without issuing NVIC_SystemReset(). This left nRF52840
peripheral registers and radio state in a post-DFU condition. BLE
applications (e.g. MeshCore on RAK4631) depend on a clean hardware
reset to initialise their radio, SoftDevice, and peripheral stack.
The direct jump caused silent initialisation failure and the device
never came online. Additionally, sd_softdevice_vector_table_base_set()
fails when the SD is already disabled, falling back to writing the
forwarding address to 0x20000000, which the application's .bss
initialisation can overwrite before the SD is re-enabled.
Fix: add NVIC_SystemReset() after BLE OTA teardown. GPREGRET is
already cleared to 0 earlier in check_dfu_mode(), so the subsequent
boot goes straight to the application with a fully reset hardware
state. The serial/USB DFU path is unaffected.
Two bugs caused BLE OTA to silently succeed while the application never
booted:
1. dfu_init_postvalidate() computed and validated the image CRC but
discarded it without writing it back to the caller. m_image_crc
remained 0 after every OTA, so bank_0_crc = 0 was persisted to
bootloader settings. bootloader_app_is_valid() skips the CRC check
when bank_0_crc is 0, meaning any image — including a corrupted one —
was unconditionally accepted at boot.
Fix: add uint16_t *p_crc_out to dfu_init_postvalidate() and write the
validated CRC through it. Update the call site in dfu_single_bank.c to
pass &m_image_crc so the value is captured and stored in
bootloader_settings_t.bank_0_crc.
2. After BLE OTA activation, check_dfu_mode() tore down the SoftDevice
and USB and then returned to main(), which jumped directly to the
application without issuing NVIC_SystemReset(). This left nRF52840
peripheral registers and radio state in a post-DFU condition. BLE
applications (e.g. MeshCore on RAK4631) depend on a clean hardware
reset to initialise their radio, SoftDevice, and peripheral stack.
The direct jump caused silent initialisation failure and the device
never came online. Additionally, sd_softdevice_vector_table_base_set()
fails when the SD is already disabled, falling back to writing the
forwarding address to 0x20000000, which the application's .bss
initialisation can overwrite before the SD is re-enabled.
Fix: add NVIC_SystemReset() after BLE OTA teardown. GPREGRET is
already cleared to 0 earlier in check_dfu_mode(), so the subsequent
boot goes straight to the application with a fully reset hardware
state. The serial/USB DFU path is unaffected.
efce1e7 to
1256ae2
Compare
Author
|
Closing in favor of #22 (some commits got borked). |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
BLE OTA firmware update succeeds but application never boots
This Pull Requests addresses #15 by fixing two bugs in the BLE OTA DFU path that together cause the application to never start after a successful over-the-air firmware update. The DFU upload completes without error, but the device silently fails to boot the new firmware and does not come online.
Bug 1 — OTA CRC is discarded, permanently disabling the boot-time integrity check.
dfu_init_postvalidate()computed and validated the image CRC but never returned it to the caller.m_image_crcremained0after every OTA, causingbank_0_crc = 0to be written to bootloader settings. Becausebootloader_app_is_valid()skips the CRC check whenbank_0_crcis zero, any image — including a corrupted one — was unconditionally accepted at boot.Bug 2 — No hard reset after BLE OTA causes silent application initialisation failure.
After activating the new image, the bootloader jumped directly to the application without
NVIC_SystemReset(). BLE applications depend on a clean hardware reset to initialise their radio, SoftDevice, and peripheral stack. The dirty post-DFU hardware state caused silent initialisation failure. This is the direct cause of the application never coming online.Impact: Any board using BLE OTA DFU is affected. Verified on RAK4631 (nRF52840, S140 6.1.1)
with MeshCore 1.14.1. The serial/USB DFU path is unaffected.
Summary
After a successful BLE OTA firmware update via the iOS nRF DFU app, the newly flashed application
never starts. The device silently fails to boot and does not come online. This PR fixes two distinct
bugs that together cause this behaviour:
m_image_crcis never set — the post-OTA CRC computed during image validation isdiscarded, so
bank_0_crc = 0is always written to bootloader settings. This permanentlydisables the boot-time integrity check, meaning a corrupted image would still be booted.
directly to the application without issuing
NVIC_SystemReset(). This leaves nRF52840peripheral registers, radio state, and NVIC configuration in a post-DFU dirty state that
BLE applications such as MeshCore cannot recover from.
Both issues must be fixed together. Fix 1 ensures image integrity is enforced after every OTA. Fix 2 is the direct cause of the application failing to start.
Reproduction
0.9.2-OTAFIX2.1-BP1.2, SoftDevice S140 6.1.1Root Cause Analysis
Bug 1 —
m_image_crcis never assigneddfu_init_postvalidate()insrc/dfu_init.ccorrectly computes a CRC-16 over the received image in flash and compares it against the CRC supplied in the DFU init packet. If they match it returnsNRF_SUCCESS. However, the computed CRC value was never communicated back to the caller.The original call site in
dfu_single_bank.c:The static variable
m_image_crc(initialised to0indfu_init()) was therefore neverupdated:
This zero propagates into
bootloader_settings_t.bank_0_crc, which is persisted to flash atBOOTLOADER_SETTINGS_ADDRESS(0x000FF000 on nRF52840).At the next boot,
bootloader_app_is_valid()readsbank_0_crc:Because
bank_0_crcis always0, the CRC check is unconditionally bypassed and any image -including a corrupted one- will be considered valid. This is a silent integrity regression introduced whenever a BLE OTA is performed.Bug 2 — No hard reset after BLE OTA (root cause of boot failure)
After
dfu_image_activate()completes and bootloader settings are saved, execution incheck_dfu_mode()(src/main.c) resumes:Control then returns to
main(), which callsbootloader_app_start():The direct jump is performed with the following hardware state:
bootloader_app_start)sd_softdevice_disable(), but register state not guaranteed cleansd_softdevice_vector_table_base_set()fails when SD is disabled; SRAM fallback at0x20000000may be zeroed by the application's.bssinitialisation before the SD is re-enabledApplications such as MeshCore initialise the SoftDevice, BLE stack, LoRa (SX1262 via SPI), and their mesh networking layer from scratch on startup. They depend on the hardware reset vector sequence to put all peripherals into a known state. A direct jump from a post-DFU bootloader provides no such guarantee and the application's initialisation silently fails.
Critically,
NVIC_SystemReset()was the correct post-OTA action.GPREGRETis already cleared to0earlier incheck_dfu_mode()(line 245), so a reset will produce a clean boot: no magic value inGPREGRET→check_dfu_mode()does not enter DFU →bootloader_app_is_valid()returnstrue→bootloader_app_start()is called with a fully reset hardware state.The direct-jump behaviour was introduced in OTAFIX 2.1 as "auto-boot after OTA" to avoid the BLE OTA timeout when USB is connected. However, it was applied unconditionally to the BLE OTA path, not only to the USB/UF2 path it was intended for. A
NVIC_SystemReset()after BLE OTA costs approximately one additional boot cycle (~100–200 ms) and is the only reliable way to guarantee a clean hardware state for the incoming application.Changes
src/dfu_init.candlib/sdk11/components/libraries/bootloader_dfu/dfu_init.hAdded an output parameter
uint16_t * p_crc_outtodfu_init_postvalidate(). When post-validation succeeds, the validated CRC is written through this pointer before returningNRF_SUCCESS.lib/sdk11/components/libraries/bootloader_dfu/dfu_single_bank.cUpdated the call to
dfu_init_postvalidate()to pass&m_image_crc, so the validated CRC is captured and subsequently stored inbootloader_settings_t.bank_0_crc.After this change,
bootloader_app_is_valid()will compute a CRC over the application in flash on every boot and compare it against the stored value. A corrupted or incomplete OTA will be caught and the bootloader will re-enter DFU mode rather than jumping to a broken image.src/main.cAdded
NVIC_SystemReset()incheck_dfu_mode()immediately after BLE OTA teardown. This replaces the direct-jump path with a clean hardware reset, giving the application a fully initialised register and peripheral state on first boot.Why both fixes are needed
bank_0_crcafter OTA00Fix 2 alone would make the application start, but with the integrity check still bypassed.
Fix 1 alone restores the integrity check but does not fix the boot failure.
Both fixes together restore correct, safe, and reliable OTA behaviour.
Testing
adafruit-nrfutil dfu serial) unaffected;NVIC_SystemReset()is only added to the_ota_dfubranch.verified clean between cycles).
NRF_ERROR_INVALID_DATA), DFU is rejected before activation — image in flash is unchanged.