Skip to content

Router::allocForSending: null-check packet pool allocation#10261

Open
nightjoker7 wants to merge 4 commits into
meshtastic:developfrom
nightjoker7:fix/router-allocforsending-null-check
Open

Router::allocForSending: null-check packet pool allocation#10261
nightjoker7 wants to merge 4 commits into
meshtastic:developfrom
nightjoker7:fix/router-allocforsending-null-check

Conversation

@nightjoker7
Copy link
Copy Markdown
Contributor

Summary

Router::allocForSending() calls `packetPool.allocZeroed()` and dereferences the result (`p->which_payload_variant = ...`) before any null check. When the static MemoryPool is exhausted, `allocZeroed()` returns nullptr (with a LOG_WARN) rather than asserting — so we crash reading `p->which_payload_variant`.

From MemoryPool.h:149-160 — static pools return nullptr when no free slot is available. That is a real failure mode under burst conditions (traceroute flood response, simultaneous module sends, ACK/NAK storms). On a ROUTER_LATE relaying traffic for a large neighborhood, the `MAX_PACKETS` limit can be saturated.

Fix

Null-check the allocation and return nullptr on failure. Every upstream caller already handles a nullptr return (eg. `RoutingModule::sendAckNak`, PositionModule, TelemetryModule, admin reply paths) — we just need to let them.

Test plan

  • Grep confirms the callers of `allocForSending()` all handle nullptr
  • Simulate pool exhaustion (fill txQueue) and confirm graceful degradation instead of crash

Related

This is the same bug class as #10252 (RadioLib RX path hardening, which included a similar allocZeroed NULL-check). Ongoing pool-allocation safety sweep.

packetPool.allocZeroed() returns nullptr when the static MemoryPool
(MAX_PACKETS slots) is exhausted — it logs a WARN and returns NULL
rather than asserting. Every line after the allocation dereferences
p->field, so an exhausted pool crashes instead of letting the caller
handle the failure.

Callers of allocForSending() already null-check the return value
(e.g. RoutingModule::sendAckNak, PositionModule, TelemetryModule,
admin paths), but we never reach those checks because we crash first.

Under burst send conditions — eg. a node flooding traceroute responses,
or multiple modules queueing replies simultaneously — the pool can
genuinely run out. On hw71 (ROUTER_LATE, relaying traffic for 100+
neighbors) this is reachable.
@github-actions github-actions Bot added needs-review Needs human review bugfix Pull request that fixes bugs labels Apr 23, 2026
@thebentern thebentern requested a review from Copilot April 23, 2026 11:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Hardens Router::allocForSending() against MemoryPool exhaustion by checking the allocZeroed() result before dereferencing, avoiding a crash and allowing existing callers to gracefully handle nullptr.

Changes:

  • Add a null-check after packetPool.allocZeroed() and return nullptr on allocation failure.
  • Emit a log when the packet pool is exhausted.

Comment thread src/mesh/Router.cpp Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@thebentern
Copy link
Copy Markdown
Contributor

@nightjoker7 I'm not sure every module actually handles nullptr gracefully. I believe I've seen this crash before when I have implemented an early return before. We need unit tests for each one before I will feel remotely comfortable with this change.

Review feedback from @thebentern on PR meshtastic#10261: 'I'm not sure every
module actually handles nullptr gracefully.'

Full audit of allocForSending() call sites:

Already NULL-safe (no change):
  - modules/TraceRouteModule.cpp:591  (if (p) { ... })
  - modules/TraceRouteModule.cpp:712  (if (p) { ... })
  - modules/TrafficManagementModule.cpp:1237  (if (!reply) return false;)

Were UNSAFE — would deref NULL on first line after alloc — now fixed:
  - mesh/MeshModule.cpp:59 (allocAckNak)    → return nullptr
  - mesh/MeshModule.cpp:85 (allocErrorResponse → forwards from allocAckNak) → return nullptr
  - mesh/SinglePortModule.h:34 (allocDataPacket base) → return nullptr
  - modules/SerialModule.h:73 (allocDataPacket base) → return nullptr
  - modules/SerialModule.cpp:311 (sendTelemetry) → LOG_WARN + return
  - graphics/niche/InkHUD/Applets/System/Menu/MenuApplet.cpp:1995 (sendText) → LOG_WARN + return
  - mqtt/MQTT.cpp:191 (JSON downlink sendtext) → LOG_WARN + return
  - mqtt/MQTT.cpp:222 (JSON downlink sendposition) → LOG_WARN + return

MeshModule::allocAckNak and ::allocErrorResponse change signatures
semantically (may now return nullptr). All upstream callers I can find
already check for nullptr before dereferencing the return (they use it
with router->send() / packetPool.release() / service->sendToMesh()
which tolerate NULL).

So after this change: every allocForSending() caller EITHER early-
returns on NULL, OR forwards NULL up its own call chain for the
outer layer to handle. The remaining question 'does every consumer
of the outer alloc function handle NULL' is beyond the scope of this
PR — that's a caller-audit of allocDataPacket / allocAckNak /
allocErrorResponse, not allocForSending itself.
@nightjoker7
Copy link
Copy Markdown
Contributor Author

@thebentern done full caller audit in 43bb5841f. All 10 call sites of allocForSending() now either early-return on NULL or forward NULL up their own call chain. See commit message for per-site classification. The three that were already NULL-safe (TraceRouteModule x2, TrafficManagement) unchanged; the seven that weren't now are.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/modules/SerialModule.h:79

  • SerialModuleRadio::allocDataPacket() now returns nullptr when the packet pool is exhausted, but there are call paths (e.g., SerialModuleRadio::sendPayload() via allocReply()) that dereference the returned packet without a null-check. To actually avoid pool-exhaustion crashes, propagate the nullptr handling to those callers (or ensure allocReply() never returns nullptr).
    meshtastic_MeshPacket *allocDataPacket()
    {
        // Update our local node info with our position (even if we don't decide to update anyone else)
        meshtastic_MeshPacket *p = router->allocForSending();
        if (!p)
            return nullptr;
        p->decoded.portnum = ourPortNum;

        return p;
    }

Comment thread src/mqtt/MQTT.cpp
Comment on lines 191 to +195
meshtastic_MeshPacket *p = router->allocForSending();
if (!p) {
LOG_WARN("MQTT downlink sendtext dropped: packet pool exhausted");
return;
}
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the sendtext downlink path, a packet is allocated before validating jsonPayloadStr.length(). If the payload is too long, the code logs and drops but never releases the allocated packet back to the pool, which can permanently reduce available pool slots and make exhaustion more likely. Ensure the allocated packet is released (or avoid allocating until after length validation) on the oversize-payload drop path.

Copilot uses AI. Check for mistakes.
Comment on lines 33 to 39
// Update our local node info with our position (even if we don't decide to update anyone else)
meshtastic_MeshPacket *p = router->allocForSending();
if (!p)
return nullptr;
p->decoded.portnum = ourPortNum;

return p;
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allocDataPacket() can now return nullptr when the packet pool is exhausted. Many existing callers (including ProtobufModule::allocDataProtobuf() and several modules that call allocDataPacket() directly) assume a non-null return and will still dereference the pointer, so the pool-exhaustion crash is not fully addressed. Either update the common helper(s) to propagate/handle nullptr safely (e.g., make allocDataProtobuf() return nullptr when allocation fails) and/or update call sites to check for nullptr before dereferencing.

Copilot uses AI. Check for mistakes.
Comment thread src/mesh/MeshModule.cpp Outdated
// auto p = allocDataProtobuf(c);
meshtastic_MeshPacket *p = router->allocForSending();
if (!p) {
LOG_WARN("allocAckNak: packet pool exhausted, dropping ack/nak");
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

packetPool.allocZeroed() already emits a LOG_WARN when the static pool has no free slots. Adding another unthrottled LOG_WARN here can double-log the same event and potentially flood logs during ACK/NAK storms (which are also a scenario where pool exhaustion occurs). Consider lowering this to LOG_DEBUG, throttling it, or removing it in favor of the pool's existing warning.

Suggested change
LOG_WARN("allocAckNak: packet pool exhausted, dropping ack/nak");

Copilot uses AI. Check for mistakes.
…endtext leak + MeshModule dedupe log

Copilot follow-up feedback after the allocForSending() caller audit. Three
distinct issues it identified on call sites the audit didn't cover:

1. ProtobufModule::allocDataProtobuf() still assumed a non-null return from
   allocDataPacket() and dereferenced `p->decoded` without a check. The
   audit fixed the direct allocDataPacket callers but missed this helper
   that many modules layer on top. Propagate nullptr to the caller; every
   current caller either null-checks or forwards the return.

2. MQTT JSON downlink 'sendtext' path allocated a MeshPacket before
   validating payload length — if the payload was too long the LOG_WARN
   branch returned without releasing the packet, permanently reducing pool
   slots. Add packetPool.release(p) on the oversize drop.

3. MeshModule::allocAckNak added an unthrottled LOG_WARN on pool
   exhaustion. The allocator itself already LOG_WARNs, so this double-logs
   — worst-case during an ACK/NAK storm, which is exactly when the pool
   exhausts. Drop the redundant log.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Pull request that fixes bugs needs-review Needs human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants