Skip to content

fix tcp partial write#2

Open
JimMoen wants to merge 2 commits into
neugates:neuronfrom
JimMoen:fix/tcp-partial-write-neugates-neuron
Open

fix tcp partial write#2
JimMoen wants to merge 2 commits into
neugates:neuronfrom
JimMoen:fix/tcp-partial-write-neugates-neuron

Conversation

@JimMoen
Copy link
Copy Markdown

@JimMoen JimMoen commented Feb 28, 2026

Summary

Fix a bug in tcp_dowrite() where sendmsg() partial writes cause byte interleaving between concurrent aios on the same TCP connection.

Problem

posix_tcpconn.c:tcp_dowrite() assumes that sendmsg() never returns a partial result (as stated in the original comment). This assumption is incorrect for TCP sockets when the kernel send buffer is nearly full, sendmsg() can write fewer bytes than requested.

When a partial write occurs, the old code immediately finishes the current aio and moves on to the next one in the writeq. This causes data from different aios to be interleaved in the TCP byte stream.

How this manifests in MQTT

In the NanoSDK MQTT transport, txaio (PUBLISH) and qsaio (PUBREL/PUBREC ACK) can be queued concurrently on the same TCP writeq. A PUBLISH message is sent as two iovs:

  • iov[0]: MQTT fixed header (2 5 bytes, very small)
  • iov[1]: variable header + payload (topic_length + topic + packet_id + payload, large)

When sendmsg() does a partial write on a PUBLISH (e.g. only sends iov[0]):

  1. tcp_dowrite() finishes the PUBLISH aio with a short count
  2. The while loop processes the next aio a PUBREL (0x62 0x02 ...) writing it into the TCP stream
  3. The PUBLISH send_cb detects the short count and resubmits iov[1] (the body)

The broker receives:

[PUBLISH header]  [PUBREL bytes]  [PUBLISH body]
                    interleaved!

The broker interprets the PUBREL bytes (0x62 0x02) at the position where it expects topic_length, reading it as 25090 (0x6202), which exceeds the remaining packet length, causing:

frame_error: #{cause => invalid_topic, parsed_length => 25090}

Fix

After sendmsg(), advance the aio's iov by the number of bytes actually sent. If data remains unsent, continue the loop to retry sendmsg() with the same aio instead of moving to the next one. Only after the aio is fully transmitted do we finish it and process the next aio in the writeq.

nni_aio_bump_count(aio, n);

nni_aio_iov_advance(aio, n);
if (nni_aio_iov_count(aio) > 0) {
    continue;  // retry with remaining data
}

nni_aio_list_remove(aio);
nni_aio_finish(aio, 0, nni_aio_count(aio));

If sendmsg() returns EAGAIN on retry, the function returns normally and the aio stays in the writeq when the socket becomes writable again, tcp_dowrite() resumes with the same aio.

Test

Added test_tcp_concurrent_send_data_integrity which submits 4 concurrent aios (each 64KB with a unique byte pattern) to the same TCP stream and verifies on the receiving end that all data blocks arrive contiguously in FIFO order with no interleaving.

…aving

sendmsg() on TCP sockets CAN return partial results when the send
buffer is nearly full. The old code assumed this never happens and
immediately finished the aio, allowing the next queued aio to be
processed. This caused byte interleaving when multiple aios were
queued concurrently (e.g. MQTT PUBLISH via txaio and PUBREL via
qsaio), resulting in corrupted MQTT frames on the broker side.

Fix: after sendmsg(), advance the aio's iov by the number of bytes
actually sent. If data remains, continue the loop to retry sendmsg
with the same aio instead of moving to the next one. This ensures
each aio is fully transmitted before processing the next.
Verify that multiple aios submitted concurrently to the same TCP
stream produce contiguous data blocks on the receiving end, with no
byte interleaving between aios. Each aio sends 64KB filled with a
unique byte pattern; the receiver verifies all patterns arrive as
intact contiguous blocks in FIFO order.

This test validates the partial write fix in tcp_dowrite() which
ensures each aio is fully transmitted before processing the next.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant