Skip to content

discard rejected mavlink frames entirely#508

Open
onur-ozkan wants to merge 7 commits into
mavlink:masterfrom
Orkavian:strict-packet-handling
Open

discard rejected mavlink frames entirely#508
onur-ozkan wants to merge 7 commits into
mavlink:masterfrom
Orkavian:strict-packet-handling

Conversation

@onur-ozkan

@onur-ozkan onur-ozkan commented Jun 6, 2026

Copy link
Copy Markdown
Member

Fixes packet-in-packet parsing for rejected mavlink frames.

Mavlink frames are length delimited so bytes inside a frame payload must remain part of that frame even if validation fails.

Previously mavlink frame (for both v1 and v2) failed CRC validation and the parser advanced by one byte and resumed scanning for the next MAV_STX / MAV_STX_V2. This allowed a marker byte inside the rejected frame payload to be treated as a new packet start.

Now rejected frames are consumed entirely before parsing resumes to prevent payload bytes from being decoded as new separate packets.

When a mavlink 2 frame fails CRC validation, the parser advanced
by one byte and resumed scanning for MAV_STX_V2. This allowed a
marker byte inside the rejected frame payload to be treated as a
new packet start.

Instead, consume the whole candidate frame before looking for the
next packet. Do the same for unsupported incompatibility flags so
rejected mavlink 2 frames are skipped consistently.

Signed-off-by: Onur Özkan <work@onurozkan.dev>
When a mavlink 1 frame fails CRC validation, the parser advanced
by one byte and resumed scanning for MAV_STX. This allowed a marker
byte inside the rejected frame payload to be treated as a new packet
start.

Instead, consume the whole candidate frame before looking for the
next packet so rejected mavlink 1 frames are skipped consistently.

Signed-off-by: Onur Özkan <work@onurozkan.dev>
@onur-ozkan onur-ozkan added kind:improvement Change that makes existing behavior better. priority:0 Urgent, highest priority. risk:breaking-change Change may break backward compatibility. risk:security May involve vulnerabilities. scope:core Affects the core library logic. labels Jun 6, 2026
Signed-off-by: Onur Özkan <work@onurozkan.dev>
This test no longer makes sense with the recent packet handling update.

Signed-off-by: Onur Özkan <work@onurozkan.dev>
@pv42

pv42 commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

This has the issue that any "rouge" stx causes a frame detection and then interprets a garbage byte as frame length to skip. It is conceivable that a data stream cuts of the beginning of the first message, then a stx byte that is inside of a periodicly sent message could cause a skip over the stx at the next message start so that the next stx found is once again inside of the payload and a large number of messages get skipped.

Signed-off-by: Onur Özkan <work@onurozkan.dev>
@onur-ozkan

Copy link
Copy Markdown
Member Author

This has the issue that any "rouge" stx causes a frame detection and then interprets a garbage byte as frame length to skip. It is conceivable that a data stream cuts of the beginning of the first message, then a stx byte that is inside of a periodicly sent message could cause a skip over the stx at the next message start so that the next stx found is once again inside of the payload and a large number of messages get skipped.

As far as I remember there's no escape literal on mavlink specs so I wonder how other implementations avoid this.

@onur-ozkan

Copy link
Copy Markdown
Member Author

By the way I would rather accept potential over skipping than a packet-in-packet vulnerability. In both cases the input is already malformed so this seems like a tradeoff where we have to choose which failure mode is less harmful. If the data is malformed, skipping it sounds acceptable to me (at least compared to packet-in-packet).

@pv42

pv42 commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

This doesn't really fix packet-in-packet, you could still send a large packet with a header claiming a length of 1 byte and then have an embedded packet at the appropriate offset.
Trusting the length field of a message that fails CRC can lead to bad outcomes.

@onur-ozkan

Copy link
Copy Markdown
Member Author

We already have that bad outcome on malformed input. :) I will check other implementations later today to see how they solve this.

Signed-off-by: Onur Özkan <work@onurozkan.dev>
@onur-ozkan

Copy link
Copy Markdown
Member Author

I checked mavsdk and they use the same length delimited parsing approach. In practice the length field would not be useful for framing if the parser ignored it after validation failed (which doesn't make any sense to have that information in the mav spec).

image

Also, I added a regression test that I think disproves your claim here, please correct me if I am missing something.

@pv42

pv42 commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Also, I added a regression test that I think disproves your claim #508 (comment), please correct me if I am missing something.

This test would also succeed if we just return an error on crc fail and don't change the parser advancement.

0, // fake message id byte 0
0, // fake message id byte 1
0, // fake message id byte 2
0, // fake payload byte 0

@pv42 pv42 Jun 6, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is missing a 2 fake crc byte

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do that then we provide a full frame but what we are trying to do here is packet-in-packet injection.

Signed-off-by: Onur Özkan <work@onurozkan.dev>
@onur-ozkan

Copy link
Copy Markdown
Member Author

Also, I added a regression test that I think disproves your claim #508 (comment), please correct me if I am missing something.

This test would also succeed if we just return an error on crc fail and don't change the parser advancement.

Improved the coverage in 728cff2.

@pv42

pv42 commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

I think if we don't want to jump back we could potentially remove the whole [Async]PeekReader code, but I am a bit concerned about the amount of packet lost is the file test. I am not sure where the data came from but I assumed its a real packet capture and then losing 3.5% of packets seems pretty high.

I also would prefer if some form of the agnostic recv test would stay, just a version where v1 and v2 messages are interleaved, no grabage data in between.

@onur-ozkan

Copy link
Copy Markdown
Member Author

We can't know whether they were real packets at all. The easiest way to verify that would be to write a similar test using the same file with other mavlink libraries.

@pv42

pv42 commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

We can't know whether they were real packets at all.

They all pass the crc check so its rather unlikely that they are not real.

@onur-ozkan

Copy link
Copy Markdown
Member Author

I need to review reports generated from this log file by rust-mavlink and other mavlink implementations and then compare them. I'm not sure when I will be able to allocate time for this.

@pv42

pv42 commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

I'm not sure when I will be able to allocate time for this.

I think I can look into it further tomorrow.

@onur-ozkan

Copy link
Copy Markdown
Member Author

I'm not sure when I will be able to allocate time for this.

I think I can look into it further tomorrow.

Thanks!

@onur-ozkan

Copy link
Copy Markdown
Member Author

Luckily I had a bit of time to review the C implementation this morning.

The main difference compared to our implementation, here, the C version only skips the incompat flags on incompat flags, where our version skips the packet frame based on the length.

So if the missing packets are caused by this path my understanding is that the log producer is not doing a great work on generating messages. I don't know if it's exported from ardupilot or px4. Either way I need to test this against a PX4 log before changing the behavior. A stable producer should not lose packets here unless it is emitting unsupported incompatibility flags or malformed frames.

@joaoantoniocardoso

Copy link
Copy Markdown
Contributor

I may be blindsided, but sincerely, I don't see a solution that doesn't open a window for other security/robustness problems. I believe we are dealing with a fundamental limitation of the protocol, and perhaps the best answer we can have is to document it and recommend signing, which would solve the PIP until the attacker is able to forge the signing. Then, the next solution was already given by Goodspeed et al.: encryption, which anyone can implement in a layer around/above this protocol.

@onur-ozkan

Copy link
Copy Markdown
Member Author

There is no single solution indeed. We just need to pick one that ends with less damage than the other.

@pv42

pv42 commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

The .tlog file format seems to insert 8 bytes of timestamp in between each message so it does not really contain a pure mavlink message stream. There are 4 Mav2 STX in those bytes. In the current impl that does not impact the parsing (all messages are found, no false ones are detected) but on a sufficiently large file the inc flags would be 0 or 1 and crc would (randomly) be correct.

c_library_v2 only finds 1366 messages compared to 1426 in master and 1374 in this PR. It does abort parsing on an unknown inc flag but does not walk back the parser in that case. This causes it to skip a bit of data since it detects a few more STX bytes that dont belong to a real message.

@joaoantoniocardoso

Copy link
Copy Markdown
Contributor

There is no single solution indeed. We just need to pick one that ends with less damage than the other.

Right, but be aware that we'll (possibly) be swapping 1 known for N unknown problems, and the solution to this problem exists without code change.

@onur-ozkan

onur-ozkan commented Jun 7, 2026

Copy link
Copy Markdown
Member Author

There is no single solution indeed. We just need to pick one that ends with less damage than the other.

Right, but be aware that we'll (possibly) be swapping 1 known for N unknown problems, and the solution to this problem exists without code change.

We are not trying something new. The approach we are taking is used by other libraries under this organization. Current problem is packet-in-packet injection and we are putting our best efforts to avoid this by switching to "may skip more messages than we need". In both cases the input is malformed, but still, skipping is much safer than accepting malformed packets. This is the trade-off and like I said, is already followed by other libraries in this organization. In fact, I don't think any mavlink library is doing what this one does (continiously looking for new packets in a single frame/packet).

Did you read the e-mail we (all the rust-mavlink maintainers) received recently? I assume you haven't checked that yet.

@joaoantoniocardoso

Copy link
Copy Markdown
Contributor

"may skip more messages than we need"

Now I see it. Well, "may skip more messages than we need" is also known as a DoS attack, which will be a known vulnerability: an attacker could use the current PoC idea to perform a DoS instead of injecting messages, which I agree is potentially less harmful. I also agree that the right move is to adhere to what's been practiced in the other parsers.

@onur-ozkan

Copy link
Copy Markdown
Member Author

c_library_v2 only finds 1366 messages compared to 1426 in master and 1374 in this PR.

Thanks for testing. Unless C library is considered unreliable parser, then this PR doesn't seem to lose any packets?

Btw, like I mentioned earlier, skipping incompat flags on incompat flags (not the entire frame) would increase the number of messages found in the test file. But then I would like to know why we find so many more messages than the C version. 😁 Maybe we should just copy C state machine into Rust and sync with their approach entirely.

@pv42

pv42 commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

But then I would like to know why we find so many more messages than the C version.

It's basically random, and just depends on which STX bytes are encountered as the start of messages.

@onur-ozkan

Copy link
Copy Markdown
Member Author

But then I would like to know why we find so many more messages than the C version.

It's basically random, and just depends on which STX bytes are encountered as the start of messages.

What do you mean by random? Having random behaviour in a parser sounds terrible.

@pv42

pv42 commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

No the parsing itself is not random, but it is not predictable which parser will find more messages, which messages it will find, and there are messages that are only found by either one parser.

@onur-ozkan

Copy link
Copy Markdown
Member Author

No the parsing itself is not random, but it is not predictable which parser will find more messages, which messages it will find, and there are messages that are only found by either one parser.

Yeah sure, it depens on the implementation of the parser. That's not a concern.

Ideally, from a reliable message producer, current PR should never miss any packet. If that ever happens it's likely a bug on the message source or it's a malformed data (in this case there is no good end, it's just the less "worst end" we can have which is skipping frames).

P.s: Users who are concerned about system security or who require maximum input stability should enable signing to ensure their data remains secure and reliable.

@onur-ozkan

Copy link
Copy Markdown
Member Author

Any concerns?

@pv42

pv42 commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

I would prefer actual parity with an existing parser. If there are immedeate concerns this seems ok but the file input test should maybe be adjusted with a file that contains a valid mavlink stream. Since the input stream does not jump back anymore we could also possbile get some performance improvements by removing the code for that at some point.

@onur-ozkan

Copy link
Copy Markdown
Member Author

Based on your comparison with C parser, this didn't lose any packet right? We can improve the test coverage later I think. We are already extremely behind in terms of test coverage so I wouldn't block this PR with that.

@pv42

pv42 commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

I it still loses some packets that C finds, but just finds more that C misses. It would be possible to construct an example where C finds more packets.

@onur-ozkan

Copy link
Copy Markdown
Member Author

We could add a CI action that reads a mavlink stream using both rust-mavlink and the C mavlink implementation, then compares the results and fails if rust-mavlink misses a packet that the C implementation successfully parses. I am not sure when I will be able to work on that but it would likely provide more value than a file based test that just asserts arbitrary numbers (which are unconfirmed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind:improvement Change that makes existing behavior better. priority:0 Urgent, highest priority. risk:breaking-change Change may break backward compatibility. risk:security May involve vulnerabilities. scope:core Affects the core library logic.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants