fix tcp partial write#2
Open
JimMoen wants to merge 2 commits into
Open
Conversation
…aving sendmsg() on TCP sockets CAN return partial results when the send buffer is nearly full. The old code assumed this never happens and immediately finished the aio, allowing the next queued aio to be processed. This caused byte interleaving when multiple aios were queued concurrently (e.g. MQTT PUBLISH via txaio and PUBREL via qsaio), resulting in corrupted MQTT frames on the broker side. Fix: after sendmsg(), advance the aio's iov by the number of bytes actually sent. If data remains, continue the loop to retry sendmsg with the same aio instead of moving to the next one. This ensures each aio is fully transmitted before processing the next.
Verify that multiple aios submitted concurrently to the same TCP stream produce contiguous data blocks on the receiving end, with no byte interleaving between aios. Each aio sends 64KB filled with a unique byte pattern; the receiver verifies all patterns arrive as intact contiguous blocks in FIFO order. This test validates the partial write fix in tcp_dowrite() which ensures each aio is fully transmitted before processing the next.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fix a bug in
tcp_dowrite()wheresendmsg()partial writes cause byte interleaving between concurrent aios on the same TCP connection.Problem
posix_tcpconn.c:tcp_dowrite()assumes thatsendmsg()never returns a partial result (as stated in the original comment). This assumption is incorrect for TCP sockets when the kernel send buffer is nearly full,sendmsg()can write fewer bytes than requested.When a partial write occurs, the old code immediately finishes the current aio and moves on to the next one in the
writeq. This causes data from different aios to be interleaved in the TCP byte stream.How this manifests in MQTT
In the NanoSDK MQTT transport,
txaio(PUBLISH) andqsaio(PUBREL/PUBREC ACK) can be queued concurrently on the same TCPwriteq. A PUBLISH message is sent as two iovs:iov[0]: MQTT fixed header (2 5 bytes, very small)iov[1]: variable header + payload (topic_length + topic + packet_id + payload, large)When
sendmsg()does a partial write on a PUBLISH (e.g. only sendsiov[0]):tcp_dowrite()finishes the PUBLISH aio with a short countwhileloop processes the next aio a PUBREL (0x62 0x02 ...) writing it into the TCP streamsend_cbdetects the short count and resubmitsiov[1](the body)The broker receives:
The broker interprets the PUBREL bytes (
0x62 0x02) at the position where it expectstopic_length, reading it as 25090 (0x6202), which exceeds the remaining packet length, causing:frame_error: #{cause => invalid_topic, parsed_length => 25090}
Fix
After
sendmsg(), advance the aio's iov by the number of bytes actually sent. If data remains unsent,continuethe loop to retrysendmsg()with the same aio instead of moving to the next one. Only after the aio is fully transmitted do we finish it and process the next aio in the writeq.If sendmsg() returns EAGAIN on retry, the function returns normally and the aio stays in the writeq when the socket becomes writable again, tcp_dowrite() resumes with the same aio.
Test
Added test_tcp_concurrent_send_data_integrity which submits 4 concurrent aios (each 64KB with a unique byte pattern) to the same TCP stream and verifies on the receiving end that all data blocks arrive contiguously in FIFO order with no interleaving.