Router::allocForSending: null-check packet pool allocation#10261
Router::allocForSending: null-check packet pool allocation#10261nightjoker7 wants to merge 4 commits into
Conversation
packetPool.allocZeroed() returns nullptr when the static MemoryPool (MAX_PACKETS slots) is exhausted — it logs a WARN and returns NULL rather than asserting. Every line after the allocation dereferences p->field, so an exhausted pool crashes instead of letting the caller handle the failure. Callers of allocForSending() already null-check the return value (e.g. RoutingModule::sendAckNak, PositionModule, TelemetryModule, admin paths), but we never reach those checks because we crash first. Under burst send conditions — eg. a node flooding traceroute responses, or multiple modules queueing replies simultaneously — the pool can genuinely run out. On hw71 (ROUTER_LATE, relaying traffic for 100+ neighbors) this is reachable.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Hardens Router::allocForSending() against MemoryPool exhaustion by checking the allocZeroed() result before dereferencing, avoiding a crash and allowing existing callers to gracefully handle nullptr.
Changes:
- Add a null-check after
packetPool.allocZeroed()and returnnullptron allocation failure. - Emit a log when the packet pool is exhausted.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
@nightjoker7 I'm not sure every module actually handles nullptr gracefully. I believe I've seen this crash before when I have implemented an early return before. We need unit tests for each one before I will feel remotely comfortable with this change. |
Review feedback from @thebentern on PR meshtastic#10261: 'I'm not sure every module actually handles nullptr gracefully.' Full audit of allocForSending() call sites: Already NULL-safe (no change): - modules/TraceRouteModule.cpp:591 (if (p) { ... }) - modules/TraceRouteModule.cpp:712 (if (p) { ... }) - modules/TrafficManagementModule.cpp:1237 (if (!reply) return false;) Were UNSAFE — would deref NULL on first line after alloc — now fixed: - mesh/MeshModule.cpp:59 (allocAckNak) → return nullptr - mesh/MeshModule.cpp:85 (allocErrorResponse → forwards from allocAckNak) → return nullptr - mesh/SinglePortModule.h:34 (allocDataPacket base) → return nullptr - modules/SerialModule.h:73 (allocDataPacket base) → return nullptr - modules/SerialModule.cpp:311 (sendTelemetry) → LOG_WARN + return - graphics/niche/InkHUD/Applets/System/Menu/MenuApplet.cpp:1995 (sendText) → LOG_WARN + return - mqtt/MQTT.cpp:191 (JSON downlink sendtext) → LOG_WARN + return - mqtt/MQTT.cpp:222 (JSON downlink sendposition) → LOG_WARN + return MeshModule::allocAckNak and ::allocErrorResponse change signatures semantically (may now return nullptr). All upstream callers I can find already check for nullptr before dereferencing the return (they use it with router->send() / packetPool.release() / service->sendToMesh() which tolerate NULL). So after this change: every allocForSending() caller EITHER early- returns on NULL, OR forwards NULL up its own call chain for the outer layer to handle. The remaining question 'does every consumer of the outer alloc function handle NULL' is beyond the scope of this PR — that's a caller-audit of allocDataPacket / allocAckNak / allocErrorResponse, not allocForSending itself.
|
@thebentern done full caller audit in 43bb5841f. All 10 call sites of |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/modules/SerialModule.h:79
SerialModuleRadio::allocDataPacket()now returnsnullptrwhen the packet pool is exhausted, but there are call paths (e.g.,SerialModuleRadio::sendPayload()viaallocReply()) that dereference the returned packet without a null-check. To actually avoid pool-exhaustion crashes, propagate the nullptr handling to those callers (or ensureallocReply()never returns nullptr).
meshtastic_MeshPacket *allocDataPacket()
{
// Update our local node info with our position (even if we don't decide to update anyone else)
meshtastic_MeshPacket *p = router->allocForSending();
if (!p)
return nullptr;
p->decoded.portnum = ourPortNum;
return p;
}
| meshtastic_MeshPacket *p = router->allocForSending(); | ||
| if (!p) { | ||
| LOG_WARN("MQTT downlink sendtext dropped: packet pool exhausted"); | ||
| return; | ||
| } |
There was a problem hiding this comment.
In the sendtext downlink path, a packet is allocated before validating jsonPayloadStr.length(). If the payload is too long, the code logs and drops but never releases the allocated packet back to the pool, which can permanently reduce available pool slots and make exhaustion more likely. Ensure the allocated packet is released (or avoid allocating until after length validation) on the oversize-payload drop path.
| // Update our local node info with our position (even if we don't decide to update anyone else) | ||
| meshtastic_MeshPacket *p = router->allocForSending(); | ||
| if (!p) | ||
| return nullptr; | ||
| p->decoded.portnum = ourPortNum; | ||
|
|
||
| return p; |
There was a problem hiding this comment.
allocDataPacket() can now return nullptr when the packet pool is exhausted. Many existing callers (including ProtobufModule::allocDataProtobuf() and several modules that call allocDataPacket() directly) assume a non-null return and will still dereference the pointer, so the pool-exhaustion crash is not fully addressed. Either update the common helper(s) to propagate/handle nullptr safely (e.g., make allocDataProtobuf() return nullptr when allocation fails) and/or update call sites to check for nullptr before dereferencing.
| // auto p = allocDataProtobuf(c); | ||
| meshtastic_MeshPacket *p = router->allocForSending(); | ||
| if (!p) { | ||
| LOG_WARN("allocAckNak: packet pool exhausted, dropping ack/nak"); |
There was a problem hiding this comment.
packetPool.allocZeroed() already emits a LOG_WARN when the static pool has no free slots. Adding another unthrottled LOG_WARN here can double-log the same event and potentially flood logs during ACK/NAK storms (which are also a scenario where pool exhaustion occurs). Consider lowering this to LOG_DEBUG, throttling it, or removing it in favor of the pool's existing warning.
| LOG_WARN("allocAckNak: packet pool exhausted, dropping ack/nak"); |
…endtext leak + MeshModule dedupe log Copilot follow-up feedback after the allocForSending() caller audit. Three distinct issues it identified on call sites the audit didn't cover: 1. ProtobufModule::allocDataProtobuf() still assumed a non-null return from allocDataPacket() and dereferenced `p->decoded` without a check. The audit fixed the direct allocDataPacket callers but missed this helper that many modules layer on top. Propagate nullptr to the caller; every current caller either null-checks or forwards the return. 2. MQTT JSON downlink 'sendtext' path allocated a MeshPacket before validating payload length — if the payload was too long the LOG_WARN branch returned without releasing the packet, permanently reducing pool slots. Add packetPool.release(p) on the oversize drop. 3. MeshModule::allocAckNak added an unthrottled LOG_WARN on pool exhaustion. The allocator itself already LOG_WARNs, so this double-logs — worst-case during an ACK/NAK storm, which is exactly when the pool exhausts. Drop the redundant log.
Summary
Router::allocForSending() calls `packetPool.allocZeroed()` and dereferences the result (`p->which_payload_variant = ...`) before any null check. When the static MemoryPool is exhausted, `allocZeroed()` returns nullptr (with a LOG_WARN) rather than asserting — so we crash reading `p->which_payload_variant`.
From MemoryPool.h:149-160 — static pools return nullptr when no free slot is available. That is a real failure mode under burst conditions (traceroute flood response, simultaneous module sends, ACK/NAK storms). On a ROUTER_LATE relaying traffic for a large neighborhood, the `MAX_PACKETS` limit can be saturated.
Fix
Null-check the allocation and return nullptr on failure. Every upstream caller already handles a nullptr return (eg. `RoutingModule::sendAckNak`, PositionModule, TelemetryModule, admin reply paths) — we just need to let them.
Test plan
Related
This is the same bug class as #10252 (RadioLib RX path hardening, which included a similar allocZeroed NULL-check). Ongoing pool-allocation safety sweep.