DroneCAN: ISR-driven TX for STM32F7 bxCAN driver#14
Closed
daijoubu wants to merge 30 commits into
Closed
Conversation
… verbosity Remove LOG_DEBUG from canardSTM32Transmit() failure path — this will run in ISR callback context after the TX ISR migration and calling printf from an ISR is undefined behaviour. Collapse 5 intermediate canardSTM32ComputeTimings() debug calls to the single summary line already present at the end of the function. Remove per-frame success-path LOG_DEBUG calls in dronecan.c transfer handlers. These fire on every received GPS fix, NodeStatus and battery update and are too noisy for upstream submission.
Replace blocking HAL_CAN_AddTxMessage() polling in canardSTM32Transmit()
with a SPSC SW ring buffer (TX_QUEUE_SIZE=32) drained by CAN1_TX_IRQHandler
via HAL_CAN_TxMailbox{0,1,2}CompleteCallback().
Bootstrapping: when all 3 mailboxes are idle RQCP=0 so TMEIE alone won't
fire; seed the HW directly from the main-loop path when
GetTxMailboxesFreeLevel()==3. ISR handles all subsequent frames.
CAN_IT_TX_MAILBOX_EMPTY is activated on each enqueue and deactivated when
the queue drains, keeping the interrupt silent at idle.
Adds four debug visibility hooks to the STM32F7 CAN TX ISR driver: - canTxDropped counter: incremented when SW TX queue (depth 32) is full and canardSTM32Transmit() silently drops a frame. Cumulative across the session; exposed via canardProtocolStatus_t.tx_dropped. - ESR register fields in canardProtocolStatus_t: tec (TX error counter), rec (RX error counter), lec (last error code) read from hcan1.Instance->ESR. Reads BusOff/ErrorPassive atomically with the ESR snapshot. - canardSTM32GetTxQueueFillLevel(): returns current SW queue depth as (head - tail + TX_QUEUE_SIZE) % TX_QUEUE_SIZE. - CLI 'dronecan' command (USE_DRONECAN guard): prints all of the above plus RX buffer fill level in a single readable block. Useful for polling during long-running hardware validation tests without needing the configurator or MSP.
- Add NVIC_PRIO_CAN = 4 to nvic.h; use it for CAN1_RX0 and CAN1_TX IRQs (was 0, violating NVIC_PRIO_MAX=1 and risking preemption of motor timers at priority 3) - Add canTxQueueHWM / canRxBufferHWM tracking; expose via canardProtocolStatus_t and dronecan CLI command as "N / 32 (hwm: N)" for both TX queue and RX buffer - Include drivers/nvic.h in canard_stm32f7xx_driver.c
…pace - Replace direct canTxQueuePop() in main-loop seed path with canTxDrainQueue() called under __disable_irq()/__enable_irq(), making the ISR the sole consumer of the SW TX queue at all times - Only call ActivateNotification if queue is non-empty after seeding, eliminating the spurious ISR invocation on single-frame bursts - Add forward declaration for canTxDrainQueue() - Strip trailing whitespace throughout file
Add __DMB() in canTxQueuePush() between writing frame data and advancing head, and in canTxQueuePop() between reading head and reading frame data. Prevents the M7 write buffer from reordering the head store before the frame data store is globally visible.
Enable TransmitFifoPriority (TXFP) so the peripheral transmits mailboxes in request order rather than mailbox-number order. Without TXFP, loading the next frame into mailbox 0 after it completes causes it to transmit before frames still pending in mailboxes 1 and 2, corrupting the toggle bit sequence of multi-frame UAVCAN transfers. With TXFP enabled, restore the while loop in canTxDrainQueue to fill all free mailboxes per ISR callback for full pipeline utilization. Also fix remaining code review findings: - Check HAL_CAN_AddTxMessage return value; count failures as dropped frames - Add CanTxQueue_t struct tag to match RxBuffer_t convention - Remove blank line artifact at top of onTransferReceived body
- Add volatile to RxBuffer writeIndex/readIndex: written in ISR and read in main loop, missing volatile allows compiler to cache stale values causing silent receive stalls - Peek-then-pop in canTxDrainQueue: advance tail only after HAL_CAN_ AddTxMessage succeeds so a HAL rejection never silently loses a frame - Move ActivateNotification inside critical section: eliminates race where ISR could drain queue and deactivate between __enable_irq and the ActivateNotification call, leaving a spurious notification armed - Snapshot tail before advancing head in canTxQueuePush: prevents ISR advancing tail between the HWM read and head write, which understated the high water mark - Remove dead canTxQueuePop (inlined into peek-then-pop drain logic) - Remove dead RxFrame_t rxMsg global - Fix 3-space indent on first if in canardSTM32Transmit - Fix missing space in ErrorPassive CLI output - Add missing newline at EOF
…eNotification error handling - Fix REC register extraction: ESR bits 7:0 are error flags, REC lives at bits 31:24 (CAN_ESR_REC_Pos=24). Previous code always read near- zero instead of the actual receive error counter. - Remove canTxDropped increment from ISR drain path: a HAL_CAN_ AddTxMessage failure leaves tail unadvanced so the frame is retried on the next ISR — it is not dropped. Incrementing the drop counter here gave wrong semantics and created a dual-writer race on the volatile uint16_t counter. - Check ActivateNotification return in seed path: failure means the TX ISR will never fire for in-flight completions, stalling the queue with no indication. Return 0 on failure so the DroneCAN task retries.
…e access Replace direct struct field access in canTxDrainQueue with two new queue functions: canTxQueuePeek() returns a const pointer to the tail frame (including the DMB barrier) without consuming it, and canTxQueueConsume() advances the tail after the HW accepts the frame. This keeps queue invariants — barrier placement, index arithmetic, and the peek-then-consume ordering — inside the queue API rather than spread across canTxDrainQueue.
…, flag CLI denominators - Fix ESR bit-range annotation on rec field in header: was ESR[7:0] (error flags), correct position is ESR[31:24] per CAN_ESR_REC_Pos=24. Extraction code was already correct; wrong comment risked a future maintainer reverting it. - Add comment explaining why TxMailboxAbortCallback is not wired: AutoRetransmission=ENABLE prevents ALST; TERR leaves the frame in the queue (tail not advanced) so the next Transmit call re-seeds the HW. - Add coupling comments on hardcoded /32 denominators in CLI output so they stay in sync if TX_QUEUE_SIZE or RX_BUFFER_SIZE ever changes.
Add @brief/@param/@RetVal doc blocks to all previously undocumented functions in canard_stm32f7xx_driver.c. Fix stale FDCAN1/hfdcan copy-paste in the canardSTM32CAN1_Init block — this is bxCAN, not FDCAN.
Group functions by subsystem in all three files to aid navigation: F7 driver: Initialization → RX subsystem → TX subsystem → Diagnostics. Removes forward declarations (each function now defined before use). H7 driver: Initialization → RX → TX → Diagnostics. Same section layout as F7 for consistency. Also adds missing canardSTM32GetTxQueueFillLevel stub (declared in shared header, never implemented for H7 — caused undefined-reference link error on MATEKH743). dronecan.c: Message handlers → Libcanard callbacks → Message senders → TX/periodic processing → Public API. No logic changed. Both MATEKF765SE and MATEKH743 build clean.
…ions Public API (dronecanInit, dronecanUpdate) at the top, internals below: TX/periodic processing → message senders → libcanard callbacks → message handlers. Add forward declarations for all internal functions. Mark all internal functions static.
- canardSTM32Transmit: return 1 after successful canTxQueuePush regardless of ActivateNotification result — prevents libcanard from re-pushing the same frame and producing duplicate CAN transmissions that corrupt the UAVCAN transfer-ID toggle sequence - handle_GetNodeInfo: replace printf() with LOG_DEBUG() — printf has no output backend in firmware and risks a fault or linker error - rxBufferPushFrame/rxBufferPopFrame: change return type uint8_t → int8_t so the -1 error sentinel is type-correct
Issue 4 — AbortCallback comment: previous text incorrectly described TERR as a synchronous AddTxMessage failure and misattributed the abort path suppression. Correct explanation: AutoRetransmission=ENABLE causes the hardware to retransmit automatically on arbitration loss (ALST), so the software abort path is never triggered; bus-off is handled by the state machine in dronecanUpdate(). Issue 5 — critical section: replace __disable_irq()/__enable_irq() with ATOMIC_BLOCK(NVIC_PRIO_CAN) from build/atomic.h. This raises BASEPRI to mask only the CAN TX ISR (priority 4 and below), allowing higher-priority IRQs such as the gyro timer (priority 3) to continue firing during the mailbox-seeding window. Also simplifies the two return-1 paths into one.
- Fix 7: Clarify SJW=3 comment in canardSTM32ComputeTimings - Fix 8: Remove commented-out CubeMX scaffold from canardSTM32CAN1_Init - Fix 9: Convert tab indentation to 4-space in canardSTM32Recieve (F7) - Fix 10: Remove USER CODE BEGIN/END sentinel comments from H7 FDCAN1_Init - Fix 11: Replace Unicode section separators with ASCII dashes in all three files
… and 12) - Fix 6: Clarify hardcoded 32 denominators in CLI status output — add comments naming the matching TX_QUEUE_SIZE / RX_BUFFER_SIZE constants - Fix 12: Remove WHAT-comment "// Transmitting" from processCanardTxQueue
F7 driver: - Add __DMB() before writeIndex advance in rxBufferPushFrame (ISR→main memory ordering: ensures frame data visible before consumer sees new index) - Add __DMB() before memcpy in rxBufferPopFrame (acquire barrier: ensures ISR's writes are visible before main loop reads frame data) - Check canardSTM32ComputeTimings() return value; return error if no valid timing solution found for the requested bitrate - Mark rxBufferPushFrame, rxBufferPopFrame, rxBufferNumMessages static H7 driver: - Fix prescaler upper bound: 1024 → 512 (FDCAN NBTP.NBRP is 9-bit) - Zero-initialize canardProtocolStatus_t in GetProtocolStatus so TEC/REC/LEC/queue fields are not garbage on H7 builds - Remove always-true DataLength guard in canardSTM32Transmit; libcanard constrains data_len to 0-8 so the else branch was unreachable
- Mark canard and memory_pool static in dronecan.c (internal state, no reason for external linkage) - Fix spelling: "incremeneted" -> "incremented" in dronecan.c - Clarify H7 FDCAN_FILTER_DUAL comment: the specific filter is a no-op because ConfigGlobalFilter accepts all non-matching frames
Both F7 (max_quanta_per_bit=18) and H7 (max_quanta_per_bit=17) differ from the reference paper maximum of 17. Mark both for revisit alongside SJW tuning — SJW=3 was set conservatively and should be validated against actual crystal performance on the board.
Per INAV coding standards — dead debug logging should be deleted, not left commented out.
The raw sjw=3 value is written to the HAL without the N-1 offset that BS1/BS2 correctly apply, so the hardware actually runs 4TQ SJW not 3TQ. Mark with TODO for the planned SJW/timing tuning pass.
F7 driver: - Fix false-full race in canTxQueuePush: read tail fresh at the full-check so a concurrent ISR drain cannot cause a spurious TX drop; move snapshot after the push for HWM calculation only H7 driver: - Replace ErrorCode/!= 1 timing check with if (!canardSTM32ComputeTimings()) to match F7 pattern and remove fragile bool-stored-as-int16_t - Fix @RetVal for canardSTM32CAN1_Init: documents CANARD_OK (0) on success, not 1 as previously stated - Remove stray USER CODE BEGIN FDCAN1_Init 2 CubeMX sentinel
- Set optional_field_flags to OPTIONAL_FIELD_FLAG_VCS_COMMIT in handle_GetNodeInfo: the field is a bitmask indicating which optional fields are valid, not a storage field for patch version - Remove stale TODO comment from handle_GetNodeInfo - Drop '/ 32' denominator from CLI TX queue and RX buffer lines: value is misleading on H7 (no software queue exists); fill level alone is sufficient on both targets
- Remove rhetorical question comments from GPIO init in F7 and H7 drivers (AF selection is target-specific by design; cmake selects the right driver) - Remove "Wheres the data?" comment from canardSTM32Recieve in F7 driver - Replace "unsure about this one" on H7 TX header fields with factual notes: ESI_ACTIVE (not error-passive), NO_TX_EVENTS (no FIFO allocated), MessageMarker=0 (unused without TX events) - Normalize tab/space mixed indentation in dronecanInit in dronecan.c
- Fix F7 canardSTM32CAN1_Init @RetVal: CANARD_OK is 0, not 1 - Enable AutoRetransmission on H7 to match F7; without it frames lost to bus arbitration are silently dropped, breaking multi-frame transfers - Add volatile qualifier to RX buffer function parameters so the compiler cannot cache index reads across ISR/main-loop boundary - Rename canardSTM32Recieve -> canardSTM32Receive (fix persistent misspelling in header, F7/H7/SITL drivers, and dronecan.c call site)
…N drivers - Correct file header names in canard_stm32f7xx_driver.c and canard_stm32h7xx_driver.c - Remove leading space before #ifdef CAN1_STANDBY in both drivers - Replace all tabs with spaces in dronecan.c (RX loop, 1Hz service block, processCanardTxQueue body)
…neCAN drivers
- Add canardSTM32GetTxQueueFillLevel() stub to canard_sitl_driver.c to fix
linker error when building SITL target with USE_DRONECAN defined
- Add memset in sitlCANGetStatsStub and sitlCANGetStatsSocketCAN to zero all
canardProtocolStatus_t fields, preventing stack garbage in CLI output
- Fix stale comment in processCanardTxQueue: "TX FIFO full" -> "SW TX queue full"
- Remove redundant LOG_DEBUG("Battery Info") from onTransferReceived
- Fix 6-space indent to 4-space in CAN1_RX0_IRQHandler
…ership docs - rxBufferPopFrame: snapshot readIndex before __DMB() so the barrier precedes the data dereference, matching the push-side store-release pattern - canardSTM32Transmit: seed HW mailboxes when any mailbox is free (> 0) rather than only when all three are idle (== 3), preventing a permanent TX stall when HAL_CAN_AddTxMessage fails with the SW queue non-empty - Add counter ownership comments: canTxDropped/canTxQueueHWM written by main loop, canRxBufferHWM written by ISR; all are single-instruction atomic reads - H7 AutoRetransmission=ENABLE: add detailed comment explaining frame ordering guarantee (FDCAN TX FIFO is strictly ordered) and the TODO to evaluate whether DISABLE would be safer for the H7 polling model
Owner
Author
|
Closing in favour of a direct PR to iNavFlight/inav targeting maintenance-10.x. Depends on iNavFlight#11514. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Migrates the STM32F7 bxCAN TX implementation from polling/blocking to ISR-driven transmission using a 32-deep SPSC ring queue. Also fixes several latent bugs in H7 and SITL drivers found during the review.
Changes
canardSTM32GetTxQueueFillLevelstub, improved diagnosticscanardSTM32GetTxQueueFillLevelstub (was causing linker error), fixed uninitialized status fieldscanardSTM32Receive(was "Recieve")optional_field_flagsmisuse, made canard/memory_pool static, removed printf from ISR contextdronecancommand showing TX/RX queue stats and protocol statusTesting
ISR-driven TX verified queuing and transmitting frames correctly. SPSC ring buffer (32-deep) operating with no overruns. CLI
dronecancommand confirmed showing live queue fill levels and protocol status. No regression in RX or frame parsing.F4 build failure is a pre-existing issue in PR iNavFlight#11514 (
__FPU_PRESENTredefined incmake/stm32f4.cmake). The sameSYSTEM_INCLUDE_DIRECTORIESfix applied tocmake/stm32f7.cmake(commit 37e6b23) needs to be applied tocmake/stm32f4.cmake. This must be resolved in iNavFlight#11514 before that PR can land.Dependencies
This PR depends on iNavFlight#11514 (STM32F7 HAL v1.3.3 update) being merged first.
When iNavFlight#11514 merges into iNavFlight/inav master, retarget this PR to master before merging.
Known Deferred Items (Non-Blocking)
max_quanta_per_bit=18on F7 vs paper max of 17 (H7 uses 17) — TODO comment in place for future harmonizationhandle_NodeStatus/handle_GNSSRCTMStream— scaffolding/silent discard, needs follow-upNone of these block merging.