Skip to content

refactor(proto): do not set timers in poll_transmit#721

Draft
flub wants to merge 3 commits into
mainfrom
flub/no-timers-in-poll-transmit
Draft

refactor(proto): do not set timers in poll_transmit#721
flub wants to merge 3 commits into
mainfrom
flub/no-timers-in-poll-transmit

Conversation

@flub

@flub flub commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Description

We had one case where we are setting timers in poll_transmit, this
happened to work around a bug in noq that was not waking up proto in a
timely manner. However the code style is very much not following
this. In general poll_transmit only tries to only concern itself with
building packets that need to be build and contains the bare minimum
of application logic that we can manage. Other timers which need to be
set when packets are sent are also set at the time the packets are
scheduled. The time in between should be negligible if the driver is
functioning correctly.

Furthermore path challenges are special in that there can be several
reasons why they are being sent. Only when they are being sent is this
known, and poll_transmit having to figure out what the reason was so
it can set the right timers is triky. Bringing this separation back
will also allow us to schedule on-path path challenges for liveness
checks for example, without needing strict validation, something
RFC9000 suggests and is a nice thing to have available in our toolbox
should we have a use.

Fixes #686

Breaking Changes

none

Notes & open questions

This replaces 3 tests with one new one. I believe this together with
the open_path_validation_fails_* tests ensures the timers are set as
intended. The previous manual triggering of path challenges had a very
tight coupling between the code under test and how to trigger it. So
much that I felt directly adopting it would just be re-implementing
the same logic again.

The new test opens a new path and ensures lost challenges are reset
appropriately, which is exactly what needs to be tested. On an
existing path we never send new challenges, so those tests were very
artificial.

Change checklist

  • Self-review.
  • Tests if relevant.

We had one case where we are setting timers in poll_transmit, this
happened to work around a bug in noq that was not waking up proto in a
timely manner. However the code style is very much not following
this. In general poll_transmit only tries to only concern itself with
building packets that need to be build and contains the bare minimum
of application logic that we can manage. Other timers which need to be
set when packets are sent are also set at the time the packets are
scheduled. The time in between should be negligible if the driver is
functioning correctly.

Furthermore path challenges are special in that there can be several
reasons why they are being sent. Only when they are being sent is this
known, and poll_transmit having to figure out what the reason was so
it can set the right timers is triky. Bringing this separation back
will also allow us to schedule on-path path challenges for liveness
checks for example, without needing strict validation, something
RFC9000 suggests and is a nice thing to have available in our toolbox
should we have a use.

Fixes #686
@github-actions

github-actions Bot commented Jun 24, 2026

Copy link
Copy Markdown

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/noq/pr/721/docs/noq/

Last updated: 2026-06-25T14:33:57Z

@github-actions

github-actions Bot commented Jun 24, 2026

Copy link
Copy Markdown

Performance Comparison Report

f9931ee292ea66f2c008c701f45bd15b735326db - artifacts

Raw Benchmarks (localhost)

Scenario noq upstream Delta CPU (avg/max)
large-single 5440.2 Mbps 8001.4 Mbps -32.0% 93.9% / 98.1%
medium-concurrent 5409.3 Mbps 7836.7 Mbps -31.0% 94.4% / 98.8%
medium-single 4453.8 Mbps 4718.5 Mbps -5.6% 96.2% / 147.0%
small-concurrent 3734.7 Mbps 5315.7 Mbps -29.7% 92.9% / 102.0%
small-single 3335.1 Mbps 4758.3 Mbps -29.9% 92.1% / 102.0%

Netsim Benchmarks (network simulation)

Condition noq upstream Delta
ideal 3058.5 Mbps 4005.8 Mbps -23.6%
lan 796.3 Mbps 805.1 Mbps -1.1%
lossy 69.9 Mbps 69.8 Mbps ~0%
wan 83.8 Mbps 83.8 Mbps ~0%

Summary

noq is 25.9% slower on average

---
c4f853b85429cf61b0b04f57702c1edb257b0c11 - artifacts

Raw Benchmarks (localhost)

Scenario noq upstream Delta CPU (avg/max)
large-single 5332.9 Mbps 7924.7 Mbps -32.7% 91.9% / 97.0%
medium-concurrent 5545.7 Mbps 7804.5 Mbps -28.9% 93.8% / 99.0%
medium-single 3352.6 Mbps 4645.4 Mbps -27.8% 93.9% / 101.0%
small-concurrent 3779.1 Mbps 5329.3 Mbps -29.1% 98.4% / 153.0%
small-single 3439.0 Mbps 4734.7 Mbps -27.4% 89.5% / 97.3%

Netsim Benchmarks (network simulation)

Condition noq upstream Delta
ideal 2989.1 Mbps 4050.6 Mbps -26.2%
lan 782.4 Mbps 810.3 Mbps -3.4%
lossy 69.8 Mbps 55.9 Mbps +25.0%
wan 83.8 Mbps 83.8 Mbps ~0%

Summary

noq is 28.4% slower on average

---
4e5eb9b8208e288087361ae22513981ad0c10695 - artifacts

Raw Benchmarks (localhost)

Scenario noq upstream Delta CPU (avg/max)
large-single 5299.2 Mbps 7914.9 Mbps -33.0% 92.9% / 98.0%
medium-concurrent 5460.9 Mbps 7807.9 Mbps -30.1% 94.5% / 98.4%
medium-single 3792.2 Mbps 4749.6 Mbps -20.2% 93.6% / 100.0%
small-concurrent 3775.4 Mbps 5281.1 Mbps -28.5% 99.9% / 152.0%
small-single 3412.6 Mbps 4847.0 Mbps -29.6% 98.6% / 153.0%

Netsim Benchmarks (network simulation)

Condition noq upstream Delta
ideal 2997.8 Mbps 4064.7 Mbps -26.2%
lan 782.4 Mbps 810.3 Mbps -3.4%
lossy 69.9 Mbps 69.8 Mbps ~0%
wan 83.8 Mbps 83.8 Mbps ~0%

Summary

noq is 27.9% slower on average

@divagant-martian divagant-martian left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry but I don't think this is a desirable change. Style should not prevail over correctness. Regardless of reason, a timer for a response to an action should be set when the action is performed, not when it's scheduled. For sending frames, there are multiple reasons why one might not be immediately be sent beside bugs. Pacing? queued frames of higher priority? It doesn't matter, the principle is the same, a timer should start only once what it's timing has begun, not before.

Trying to find all the finicky reasons surrounding making this change sound, and worse, attempting to maintain it, only makes for brittle code

@n0bot n0bot Bot added this to iroh Jun 24, 2026
@github-project-automation github-project-automation Bot moved this to 🚑 Needs Triage in iroh Jun 24, 2026
@github-project-automation github-project-automation Bot moved this from 🚑 Needs Triage to 🏗 In progress in iroh Jun 24, 2026
@flub flub marked this pull request as draft June 25, 2026 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: 🏗 In progress

Development

Successfully merging this pull request may close these issues.

Do not set timers inside poll_transmit

2 participants