Skip to content

DroneCAN: ISR-driven TX for STM32F7 bxCAN driver#14

Closed
daijoubu wants to merge 30 commits into
feature/stm32f7-hal-v1.3.3-updatefrom
feature/stm32f7-can-tx-isr
Closed

DroneCAN: ISR-driven TX for STM32F7 bxCAN driver#14
daijoubu wants to merge 30 commits into
feature/stm32f7-hal-v1.3.3-updatefrom
feature/stm32f7-can-tx-isr

Conversation

@daijoubu
Copy link
Copy Markdown
Owner

Summary

Migrates the STM32F7 bxCAN TX implementation from polling/blocking to ISR-driven transmission using a 32-deep SPSC ring queue. Also fixes several latent bugs in H7 and SITL drivers found during the review.

Changes

  • STM32F7: Full rewrite to ISR-driven TX (CAN1_TX_IRQHandler), SPSC ring queue (TX_QUEUE_SIZE=32), TXFP=ENABLE for FIFO ordering, Cortex-M7 DMB barriers, ATOMIC_BLOCK critical sections
  • STM32H7: Fixed AutoRetransmission (was incorrectly DISABLE, now ENABLE), added missing canardSTM32GetTxQueueFillLevel stub, improved diagnostics
  • SITL: Added missing canardSTM32GetTxQueueFillLevel stub (was causing linker error), fixed uninitialized status fields
  • Header: Fixed spelling of canardSTM32Receive (was "Recieve")
  • dronecan.c: Reordered with forward declarations, fixed optional_field_flags misuse, made canard/memory_pool static, removed printf from ISR context
  • CLI: Added dronecan command showing TX/RX queue stats and protocol status

Testing

Target MCU Result
MATEKF765SE F7 PASS
MATEKH743 H7 PASS
SITL PASS
SPEEDYBEEF405WING F4 FAIL (pre-existing — see note)

ISR-driven TX verified queuing and transmitting frames correctly. SPSC ring buffer (32-deep) operating with no overruns. CLI dronecan command confirmed showing live queue fill levels and protocol status. No regression in RX or frame parsing.

F4 build failure is a pre-existing issue in PR iNavFlight#11514 (__FPU_PRESENT redefined in cmake/stm32f4.cmake). The same SYSTEM_INCLUDE_DIRECTORIES fix applied to cmake/stm32f7.cmake (commit 37e6b23) needs to be applied to cmake/stm32f4.cmake. This must be resolved in iNavFlight#11514 before that PR can land.

Dependencies

This PR depends on iNavFlight#11514 (STM32F7 HAL v1.3.3 update) being merged first.

When iNavFlight#11514 merges into iNavFlight/inav master, retarget this PR to master before merging.

Known Deferred Items (Non-Blocking)

  • max_quanta_per_bit=18 on F7 vs paper max of 17 (H7 uses 17) — TODO comment in place for future harmonization
  • SJW raw encoding discrepancy (stores 3, encodes 4TQ) — pre-existing bug, TODO comment added
  • H7 tec/rec/lec not populated in protocol status — diagnostic gap only
  • handle_NodeStatus / handle_GNSSRCTMStream — scaffolding/silent discard, needs follow-up

None of these block merging.

daijoubu added 30 commits May 2, 2026 15:22
… verbosity

Remove LOG_DEBUG from canardSTM32Transmit() failure path — this will run
in ISR callback context after the TX ISR migration and calling printf from
an ISR is undefined behaviour.

Collapse 5 intermediate canardSTM32ComputeTimings() debug calls to the
single summary line already present at the end of the function.

Remove per-frame success-path LOG_DEBUG calls in dronecan.c transfer
handlers. These fire on every received GPS fix, NodeStatus and battery
update and are too noisy for upstream submission.
Replace blocking HAL_CAN_AddTxMessage() polling in canardSTM32Transmit()
with a SPSC SW ring buffer (TX_QUEUE_SIZE=32) drained by CAN1_TX_IRQHandler
via HAL_CAN_TxMailbox{0,1,2}CompleteCallback().

Bootstrapping: when all 3 mailboxes are idle RQCP=0 so TMEIE alone won't
fire; seed the HW directly from the main-loop path when
GetTxMailboxesFreeLevel()==3. ISR handles all subsequent frames.

CAN_IT_TX_MAILBOX_EMPTY is activated on each enqueue and deactivated when
the queue drains, keeping the interrupt silent at idle.
Adds four debug visibility hooks to the STM32F7 CAN TX ISR driver:

- canTxDropped counter: incremented when SW TX queue (depth 32) is full
  and canardSTM32Transmit() silently drops a frame. Cumulative across
  the session; exposed via canardProtocolStatus_t.tx_dropped.

- ESR register fields in canardProtocolStatus_t: tec (TX error counter),
  rec (RX error counter), lec (last error code) read from hcan1.Instance->ESR.
  Reads BusOff/ErrorPassive atomically with the ESR snapshot.

- canardSTM32GetTxQueueFillLevel(): returns current SW queue depth
  as (head - tail + TX_QUEUE_SIZE) % TX_QUEUE_SIZE.

- CLI 'dronecan' command (USE_DRONECAN guard): prints all of the above
  plus RX buffer fill level in a single readable block. Useful for
  polling during long-running hardware validation tests without
  needing the configurator or MSP.
- Add NVIC_PRIO_CAN = 4 to nvic.h; use it for CAN1_RX0 and CAN1_TX
  IRQs (was 0, violating NVIC_PRIO_MAX=1 and risking preemption of
  motor timers at priority 3)
- Add canTxQueueHWM / canRxBufferHWM tracking; expose via
  canardProtocolStatus_t and dronecan CLI command as
  "N / 32 (hwm: N)" for both TX queue and RX buffer
- Include drivers/nvic.h in canard_stm32f7xx_driver.c
…pace

- Replace direct canTxQueuePop() in main-loop seed path with
  canTxDrainQueue() called under __disable_irq()/__enable_irq(),
  making the ISR the sole consumer of the SW TX queue at all times
- Only call ActivateNotification if queue is non-empty after seeding,
  eliminating the spurious ISR invocation on single-frame bursts
- Add forward declaration for canTxDrainQueue()
- Strip trailing whitespace throughout file
Add __DMB() in canTxQueuePush() between writing frame data and
advancing head, and in canTxQueuePop() between reading head and
reading frame data. Prevents the M7 write buffer from reordering
the head store before the frame data store is globally visible.
Enable TransmitFifoPriority (TXFP) so the peripheral transmits mailboxes
in request order rather than mailbox-number order.  Without TXFP, loading
the next frame into mailbox 0 after it completes causes it to transmit
before frames still pending in mailboxes 1 and 2, corrupting the toggle
bit sequence of multi-frame UAVCAN transfers.

With TXFP enabled, restore the while loop in canTxDrainQueue to fill all
free mailboxes per ISR callback for full pipeline utilization.

Also fix remaining code review findings:
- Check HAL_CAN_AddTxMessage return value; count failures as dropped frames
- Add CanTxQueue_t struct tag to match RxBuffer_t convention
- Remove blank line artifact at top of onTransferReceived body
- Add volatile to RxBuffer writeIndex/readIndex: written in ISR and
  read in main loop, missing volatile allows compiler to cache stale
  values causing silent receive stalls
- Peek-then-pop in canTxDrainQueue: advance tail only after HAL_CAN_
  AddTxMessage succeeds so a HAL rejection never silently loses a frame
- Move ActivateNotification inside critical section: eliminates race
  where ISR could drain queue and deactivate between __enable_irq and
  the ActivateNotification call, leaving a spurious notification armed
- Snapshot tail before advancing head in canTxQueuePush: prevents ISR
  advancing tail between the HWM read and head write, which understated
  the high water mark
- Remove dead canTxQueuePop (inlined into peek-then-pop drain logic)
- Remove dead RxFrame_t rxMsg global
- Fix 3-space indent on first if in canardSTM32Transmit
- Fix missing space in ErrorPassive CLI output
- Add missing newline at EOF
…eNotification error handling

- Fix REC register extraction: ESR bits 7:0 are error flags, REC lives
  at bits 31:24 (CAN_ESR_REC_Pos=24). Previous code always read near-
  zero instead of the actual receive error counter.
- Remove canTxDropped increment from ISR drain path: a HAL_CAN_
  AddTxMessage failure leaves tail unadvanced so the frame is retried
  on the next ISR — it is not dropped. Incrementing the drop counter
  here gave wrong semantics and created a dual-writer race on the
  volatile uint16_t counter.
- Check ActivateNotification return in seed path: failure means the TX
  ISR will never fire for in-flight completions, stalling the queue with
  no indication. Return 0 on failure so the DroneCAN task retries.
…e access

Replace direct struct field access in canTxDrainQueue with two new
queue functions: canTxQueuePeek() returns a const pointer to the tail
frame (including the DMB barrier) without consuming it, and
canTxQueueConsume() advances the tail after the HW accepts the frame.

This keeps queue invariants — barrier placement, index arithmetic, and
the peek-then-consume ordering — inside the queue API rather than
spread across canTxDrainQueue.
…, flag CLI denominators

- Fix ESR bit-range annotation on rec field in header: was ESR[7:0]
  (error flags), correct position is ESR[31:24] per CAN_ESR_REC_Pos=24.
  Extraction code was already correct; wrong comment risked a future
  maintainer reverting it.
- Add comment explaining why TxMailboxAbortCallback is not wired:
  AutoRetransmission=ENABLE prevents ALST; TERR leaves the frame in the
  queue (tail not advanced) so the next Transmit call re-seeds the HW.
- Add coupling comments on hardcoded /32 denominators in CLI output so
  they stay in sync if TX_QUEUE_SIZE or RX_BUFFER_SIZE ever changes.
Add @brief/@param/@RetVal doc blocks to all previously undocumented
functions in canard_stm32f7xx_driver.c. Fix stale FDCAN1/hfdcan
copy-paste in the canardSTM32CAN1_Init block — this is bxCAN, not FDCAN.
Group functions by subsystem in all three files to aid navigation:

F7 driver: Initialization → RX subsystem → TX subsystem → Diagnostics.
Removes forward declarations (each function now defined before use).

H7 driver: Initialization → RX → TX → Diagnostics. Same section layout
as F7 for consistency. Also adds missing canardSTM32GetTxQueueFillLevel
stub (declared in shared header, never implemented for H7 — caused
undefined-reference link error on MATEKH743).

dronecan.c: Message handlers → Libcanard callbacks → Message senders →
TX/periodic processing → Public API.

No logic changed. Both MATEKF765SE and MATEKH743 build clean.
…ions

Public API (dronecanInit, dronecanUpdate) at the top, internals below:
TX/periodic processing → message senders → libcanard callbacks →
message handlers. Add forward declarations for all internal functions.
Mark all internal functions static.
- canardSTM32Transmit: return 1 after successful canTxQueuePush regardless
  of ActivateNotification result — prevents libcanard from re-pushing the
  same frame and producing duplicate CAN transmissions that corrupt the
  UAVCAN transfer-ID toggle sequence
- handle_GetNodeInfo: replace printf() with LOG_DEBUG() — printf has no
  output backend in firmware and risks a fault or linker error
- rxBufferPushFrame/rxBufferPopFrame: change return type uint8_t → int8_t
  so the -1 error sentinel is type-correct
Issue 4 — AbortCallback comment: previous text incorrectly described TERR
as a synchronous AddTxMessage failure and misattributed the abort path
suppression. Correct explanation: AutoRetransmission=ENABLE causes the
hardware to retransmit automatically on arbitration loss (ALST), so the
software abort path is never triggered; bus-off is handled by the state
machine in dronecanUpdate().

Issue 5 — critical section: replace __disable_irq()/__enable_irq() with
ATOMIC_BLOCK(NVIC_PRIO_CAN) from build/atomic.h. This raises BASEPRI to
mask only the CAN TX ISR (priority 4 and below), allowing higher-priority
IRQs such as the gyro timer (priority 3) to continue firing during the
mailbox-seeding window. Also simplifies the two return-1 paths into one.
- Fix 7: Clarify SJW=3 comment in canardSTM32ComputeTimings
- Fix 8: Remove commented-out CubeMX scaffold from canardSTM32CAN1_Init
- Fix 9: Convert tab indentation to 4-space in canardSTM32Recieve (F7)
- Fix 10: Remove USER CODE BEGIN/END sentinel comments from H7 FDCAN1_Init
- Fix 11: Replace Unicode section separators with ASCII dashes in all three files
… and 12)

- Fix 6: Clarify hardcoded 32 denominators in CLI status output — add
  comments naming the matching TX_QUEUE_SIZE / RX_BUFFER_SIZE constants
- Fix 12: Remove WHAT-comment "// Transmitting" from processCanardTxQueue
F7 driver:
- Add __DMB() before writeIndex advance in rxBufferPushFrame (ISR→main
  memory ordering: ensures frame data visible before consumer sees new index)
- Add __DMB() before memcpy in rxBufferPopFrame (acquire barrier: ensures
  ISR's writes are visible before main loop reads frame data)
- Check canardSTM32ComputeTimings() return value; return error if no
  valid timing solution found for the requested bitrate
- Mark rxBufferPushFrame, rxBufferPopFrame, rxBufferNumMessages static

H7 driver:
- Fix prescaler upper bound: 1024 → 512 (FDCAN NBTP.NBRP is 9-bit)
- Zero-initialize canardProtocolStatus_t in GetProtocolStatus so
  TEC/REC/LEC/queue fields are not garbage on H7 builds
- Remove always-true DataLength guard in canardSTM32Transmit; libcanard
  constrains data_len to 0-8 so the else branch was unreachable
- Mark canard and memory_pool static in dronecan.c (internal state,
  no reason for external linkage)
- Fix spelling: "incremeneted" -> "incremented" in dronecan.c
- Clarify H7 FDCAN_FILTER_DUAL comment: the specific filter is a no-op
  because ConfigGlobalFilter accepts all non-matching frames
Both F7 (max_quanta_per_bit=18) and H7 (max_quanta_per_bit=17) differ from
the reference paper maximum of 17. Mark both for revisit alongside SJW
tuning — SJW=3 was set conservatively and should be validated against
actual crystal performance on the board.
Per INAV coding standards — dead debug logging should be deleted,
not left commented out.
The raw sjw=3 value is written to the HAL without the N-1 offset that
BS1/BS2 correctly apply, so the hardware actually runs 4TQ SJW not 3TQ.
Mark with TODO for the planned SJW/timing tuning pass.
F7 driver:
- Fix false-full race in canTxQueuePush: read tail fresh at the full-check
  so a concurrent ISR drain cannot cause a spurious TX drop; move snapshot
  after the push for HWM calculation only

H7 driver:
- Replace ErrorCode/!= 1 timing check with if (!canardSTM32ComputeTimings())
  to match F7 pattern and remove fragile bool-stored-as-int16_t
- Fix @RetVal for canardSTM32CAN1_Init: documents CANARD_OK (0) on success,
  not 1 as previously stated
- Remove stray USER CODE BEGIN FDCAN1_Init 2 CubeMX sentinel
- Set optional_field_flags to OPTIONAL_FIELD_FLAG_VCS_COMMIT in
  handle_GetNodeInfo: the field is a bitmask indicating which optional
  fields are valid, not a storage field for patch version
- Remove stale TODO comment from handle_GetNodeInfo
- Drop '/ 32' denominator from CLI TX queue and RX buffer lines:
  value is misleading on H7 (no software queue exists); fill level
  alone is sufficient on both targets
- Remove rhetorical question comments from GPIO init in F7 and H7 drivers
  (AF selection is target-specific by design; cmake selects the right driver)
- Remove "Wheres the data?" comment from canardSTM32Recieve in F7 driver
- Replace "unsure about this one" on H7 TX header fields with factual notes:
  ESI_ACTIVE (not error-passive), NO_TX_EVENTS (no FIFO allocated),
  MessageMarker=0 (unused without TX events)
- Normalize tab/space mixed indentation in dronecanInit in dronecan.c
- Fix F7 canardSTM32CAN1_Init @RetVal: CANARD_OK is 0, not 1
- Enable AutoRetransmission on H7 to match F7; without it frames lost
  to bus arbitration are silently dropped, breaking multi-frame transfers
- Add volatile qualifier to RX buffer function parameters so the compiler
  cannot cache index reads across ISR/main-loop boundary
- Rename canardSTM32Recieve -> canardSTM32Receive (fix persistent
  misspelling in header, F7/H7/SITL drivers, and dronecan.c call site)
…N drivers

- Correct file header names in canard_stm32f7xx_driver.c and canard_stm32h7xx_driver.c
- Remove leading space before #ifdef CAN1_STANDBY in both drivers
- Replace all tabs with spaces in dronecan.c (RX loop, 1Hz service block,
  processCanardTxQueue body)
…neCAN drivers

- Add canardSTM32GetTxQueueFillLevel() stub to canard_sitl_driver.c to fix
  linker error when building SITL target with USE_DRONECAN defined
- Add memset in sitlCANGetStatsStub and sitlCANGetStatsSocketCAN to zero all
  canardProtocolStatus_t fields, preventing stack garbage in CLI output
- Fix stale comment in processCanardTxQueue: "TX FIFO full" -> "SW TX queue full"
- Remove redundant LOG_DEBUG("Battery Info") from onTransferReceived
- Fix 6-space indent to 4-space in CAN1_RX0_IRQHandler
…ership docs

- rxBufferPopFrame: snapshot readIndex before __DMB() so the barrier precedes
  the data dereference, matching the push-side store-release pattern
- canardSTM32Transmit: seed HW mailboxes when any mailbox is free (> 0) rather
  than only when all three are idle (== 3), preventing a permanent TX stall when
  HAL_CAN_AddTxMessage fails with the SW queue non-empty
- Add counter ownership comments: canTxDropped/canTxQueueHWM written by main
  loop, canRxBufferHWM written by ISR; all are single-instruction atomic reads
- H7 AutoRetransmission=ENABLE: add detailed comment explaining frame ordering
  guarantee (FDCAN TX FIFO is strictly ordered) and the TODO to evaluate whether
  DISABLE would be safer for the H7 polling model
@daijoubu
Copy link
Copy Markdown
Owner Author

Closing in favour of a direct PR to iNavFlight/inav targeting maintenance-10.x. Depends on iNavFlight#11514.

@daijoubu daijoubu closed this May 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant