refactor(proto): do not set timers in poll_transmit#721
Conversation
We had one case where we are setting timers in poll_transmit, this happened to work around a bug in noq that was not waking up proto in a timely manner. However the code style is very much not following this. In general poll_transmit only tries to only concern itself with building packets that need to be build and contains the bare minimum of application logic that we can manage. Other timers which need to be set when packets are sent are also set at the time the packets are scheduled. The time in between should be negligible if the driver is functioning correctly. Furthermore path challenges are special in that there can be several reasons why they are being sent. Only when they are being sent is this known, and poll_transmit having to figure out what the reason was so it can set the right timers is triky. Bringing this separation back will also allow us to schedule on-path path challenges for liveness checks for example, without needing strict validation, something RFC9000 suggests and is a nice thing to have available in our toolbox should we have a use. Fixes #686
|
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/noq/pr/721/docs/noq/ Last updated: 2026-06-25T14:33:57Z |
Performance Comparison Report
|
| Scenario | noq | upstream | Delta | CPU (avg/max) |
|---|---|---|---|---|
| large-single | 5440.2 Mbps | 8001.4 Mbps | -32.0% | 93.9% / 98.1% |
| medium-concurrent | 5409.3 Mbps | 7836.7 Mbps | -31.0% | 94.4% / 98.8% |
| medium-single | 4453.8 Mbps | 4718.5 Mbps | -5.6% | 96.2% / 147.0% |
| small-concurrent | 3734.7 Mbps | 5315.7 Mbps | -29.7% | 92.9% / 102.0% |
| small-single | 3335.1 Mbps | 4758.3 Mbps | -29.9% | 92.1% / 102.0% |
Netsim Benchmarks (network simulation)
| Condition | noq | upstream | Delta |
|---|---|---|---|
| ideal | 3058.5 Mbps | 4005.8 Mbps | -23.6% |
| lan | 796.3 Mbps | 805.1 Mbps | -1.1% |
| lossy | 69.9 Mbps | 69.8 Mbps | ~0% |
| wan | 83.8 Mbps | 83.8 Mbps | ~0% |
Summary
noq is 25.9% slower on average
c4f853b85429cf61b0b04f57702c1edb257b0c11 - artifacts
Raw Benchmarks (localhost)
| Scenario | noq | upstream | Delta | CPU (avg/max) |
|---|---|---|---|---|
| large-single | 5332.9 Mbps | 7924.7 Mbps | -32.7% | 91.9% / 97.0% |
| medium-concurrent | 5545.7 Mbps | 7804.5 Mbps | -28.9% | 93.8% / 99.0% |
| medium-single | 3352.6 Mbps | 4645.4 Mbps | -27.8% | 93.9% / 101.0% |
| small-concurrent | 3779.1 Mbps | 5329.3 Mbps | -29.1% | 98.4% / 153.0% |
| small-single | 3439.0 Mbps | 4734.7 Mbps | -27.4% | 89.5% / 97.3% |
Netsim Benchmarks (network simulation)
| Condition | noq | upstream | Delta |
|---|---|---|---|
| ideal | 2989.1 Mbps | 4050.6 Mbps | -26.2% |
| lan | 782.4 Mbps | 810.3 Mbps | -3.4% |
| lossy | 69.8 Mbps | 55.9 Mbps | +25.0% |
| wan | 83.8 Mbps | 83.8 Mbps | ~0% |
Summary
noq is 28.4% slower on average
4e5eb9b8208e288087361ae22513981ad0c10695 - artifacts
Raw Benchmarks (localhost)
| Scenario | noq | upstream | Delta | CPU (avg/max) |
|---|---|---|---|---|
| large-single | 5299.2 Mbps | 7914.9 Mbps | -33.0% | 92.9% / 98.0% |
| medium-concurrent | 5460.9 Mbps | 7807.9 Mbps | -30.1% | 94.5% / 98.4% |
| medium-single | 3792.2 Mbps | 4749.6 Mbps | -20.2% | 93.6% / 100.0% |
| small-concurrent | 3775.4 Mbps | 5281.1 Mbps | -28.5% | 99.9% / 152.0% |
| small-single | 3412.6 Mbps | 4847.0 Mbps | -29.6% | 98.6% / 153.0% |
Netsim Benchmarks (network simulation)
| Condition | noq | upstream | Delta |
|---|---|---|---|
| ideal | 2997.8 Mbps | 4064.7 Mbps | -26.2% |
| lan | 782.4 Mbps | 810.3 Mbps | -3.4% |
| lossy | 69.9 Mbps | 69.8 Mbps | ~0% |
| wan | 83.8 Mbps | 83.8 Mbps | ~0% |
Summary
noq is 27.9% slower on average
divagant-martian
left a comment
There was a problem hiding this comment.
Sorry but I don't think this is a desirable change. Style should not prevail over correctness. Regardless of reason, a timer for a response to an action should be set when the action is performed, not when it's scheduled. For sending frames, there are multiple reasons why one might not be immediately be sent beside bugs. Pacing? queued frames of higher priority? It doesn't matter, the principle is the same, a timer should start only once what it's timing has begun, not before.
Trying to find all the finicky reasons surrounding making this change sound, and worse, attempting to maintain it, only makes for brittle code
Description
We had one case where we are setting timers in poll_transmit, this
happened to work around a bug in noq that was not waking up proto in a
timely manner. However the code style is very much not following
this. In general poll_transmit only tries to only concern itself with
building packets that need to be build and contains the bare minimum
of application logic that we can manage. Other timers which need to be
set when packets are sent are also set at the time the packets are
scheduled. The time in between should be negligible if the driver is
functioning correctly.
Furthermore path challenges are special in that there can be several
reasons why they are being sent. Only when they are being sent is this
known, and poll_transmit having to figure out what the reason was so
it can set the right timers is triky. Bringing this separation back
will also allow us to schedule on-path path challenges for liveness
checks for example, without needing strict validation, something
RFC9000 suggests and is a nice thing to have available in our toolbox
should we have a use.
Fixes #686
Breaking Changes
none
Notes & open questions
This replaces 3 tests with one new one. I believe this together with
the open_path_validation_fails_* tests ensures the timers are set as
intended. The previous manual triggering of path challenges had a very
tight coupling between the code under test and how to trigger it. So
much that I felt directly adopting it would just be re-implementing
the same logic again.
The new test opens a new path and ensures lost challenges are reset
appropriately, which is exactly what needs to be tested. On an
existing path we never send new challenges, so those tests were very
artificial.
Change checklist