Skip to content

Remove custom till_deferred_has_result(...) in favor of HomeserverTestCase.wait_on_thread(...) to drive async Rust (Tokio runtime/thread pool)#19867

Closed
MadLittleMods wants to merge 6 commits into
developfrom
madlittlemods/remove-till_deferred_has_result
Closed

Remove custom till_deferred_has_result(...) in favor of HomeserverTestCase.wait_on_thread(...) to drive async Rust (Tokio runtime/thread pool)#19867
MadLittleMods wants to merge 6 commits into
developfrom
madlittlemods/remove-till_deferred_has_result

Conversation

@MadLittleMods

@MadLittleMods MadLittleMods commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Remove custom till_deferred_has_result(...) in favor of HomeserverTestCase.wait_on_thread(...) to drive async Rust (Tokio runtime/thread pool).

Spawning from adding some more async Rust things in #19846 and noticing that we have an existing pattern to use instead of the custom till_deferred_has_result(...) that has crept in to a few files.

Dev notes

std::thread::sleep(std::time::Duration::from_secs(1));

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

Comment thread tests/unittest.py
events.USE_FROZEN_DICTS = False

def wait_on_thread(self, deferred: Deferred, timeout: int = 10) -> None:
def wait_on_thread(

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait_on_thread(...) was first introduced in matrix-org/synapse#5475. It's unclear from that PR itself but I'm guessing this is because of the defer_to_thread(...) usage in the media repo.

Comment thread tests/unittest.py
def wait_on_thread(self, deferred: Deferred, timeout: int = 10) -> None:
def wait_on_thread(
self,
awaitable: Awaitable[TV],

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this to accept any awaitable (Deferred or Coroutine)

Comment thread tests/unittest.py
time.sleep(0.01)
# Advance the Twisted reactor as the thread may have scheduled something on
# the reactor to run (like `reactor.callFromThread(...)`)
self.reactor.advance(0)

@MadLittleMods MadLittleMods Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this from advancing time by 0.01 each iteration to 0. We shouldn't need to advance the Twisted reactor time at all bolstered by the fact that the tests still pass.

Comment thread tests/unittest.py
Comment on lines +515 to +517
# Advance the Twisted reactor as the thread may have scheduled something on
# the reactor to run (like `reactor.callFromThread(...)`)
self.reactor.advance(0)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the order to advance the Twisted reactor after we give some time to other threads to make some progress. The thinking is that the thread can do some work, potentially scheduling some things on reactor, and then we unblock that.

Since we're iterating in a loop, the order probably doesn't matter that much but perhaps this makes more sense.

Comment on lines 859 to 863
requester = self.get_success(
self.till_deferred_has_result(
self._auth.get_user_by_access_token("some_token")
)
# We have to wait for the async Rust HTTP client (running on the Tokio
# thread pool) to do its thing (see `create_deferred(...)` usage)
self.wait_on_thread(self._auth.get_user_by_access_token("some_token"))
)

@MadLittleMods MadLittleMods Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, not very satisfied with this API shape. I think it's good enough as another iteration. Perhaps a better name?

Ideally, we could just do self.get_success(self._auth.get_user_by_access_token("some_token"))) which would drive things to completion regardless of the kind of work necessary (Python or Rust).

But adding real sleeps to get_success(...) for everything would result in slowing down the entire test suite.

We could potentially add a new attribute to HomeserverTestCase like drive_work_on_threads (like the existing needs_threadpool) which would conditionally sleep for real in get_success(...). This is half-decent but having to realize this obscure detail makes things work kinda sucks.

For example, this is how needs_threadpool is defined on a test case basis:

class MediaStorageTests(unittest.HomeserverTestCase):
needs_threadpool = True

Or even better if we could automatically detect when there is work to be done on the Tokio thread pool and do the real sleep loop. Probably have to detect this by using the Tokio RuntimeMetrics. I think it would be better to explore this as a follow-up though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think that we can actually change the sleep to be time.sleep(0), since that is apparently enough to signal that other threads should run. I've tried it locally and the tests tests/handlers/test_oauth_delegation.py tests/synapse_rust/test_http_client.py seem to pass?

At which point this could be added to get_success I think?

@MadLittleMods MadLittleMods Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time.sleep(0) does seem to work 👍. If I look at the docs for time.sleep(...) ("Suspend execution of the calling thread [...]"), it also mentions that you can use os.sched_yield() ("Voluntarily relinquish the CPU.") which also works.

https://discuss.python.org/t/time-sleep-0-yield-behaviour/27185/5 mentions that time.sleep(0) releases the GIL while os.sched_yield() doesn't. But I think that is now fixed (looks like it was part of Python 3.10): python/cpython#96078

It's unclear what's better, constant context switching or just allowing some time. I would have assumed that constant context switching would have some impact but the results from #19871 show that it doesn't seem to slow things down in a noticeable way.

Comment on lines +883 to +885
# We have to wait for the async Rust HTTP client (running on the Tokio
# thread pool) to do its thing (see `create_deferred(...)` usage)
self.wait_on_thread(self._auth.get_user_by_access_token("some_token"))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is repetitive but it's nice to know why the wait_on_thread(...) complication is being used here.

…s not subscriptable`

```
  File "/home/runner/work/synapse/synapse/tests/unittest.py", line 481, in HomeserverTestCase
    ) -> Deferred[TV]:
builtins.TypeError: 'type' object is not subscriptable
```
@MadLittleMods MadLittleMods changed the title Remove custom till_deferred_has_result(...) in favor of HomeserverTestCase.wait_on_thread(...) Remove custom till_deferred_has_result(...) in favor of HomeserverTestCase.wait_on_thread(...) to drive async Rust (Tokio runtime/thread pool) Jun 18, 2026
@MadLittleMods MadLittleMods marked this pull request as ready for review June 18, 2026 23:27
@MadLittleMods MadLittleMods requested a review from a team as a code owner June 18, 2026 23:27
@MadLittleMods

Copy link
Copy Markdown
Contributor Author

Closing in favor of #19871

MadLittleMods added a commit that referenced this pull request Jul 2, 2026
…nc Rust (Tokio runtime/thread pool) (#19871)

This means you can use `get_success(...)` anywhere regardless
of what kind of work needs to be done.

Spawning from adding some more async Rust things in
#19846 and wanting something
more standard instead of the custom `till_deferred_has_result(...)` that
has crept in to a few files.

Alternative to #19867 spurred
on by [this
comment](#19867 (comment))
from @erikjohnston


### How does this work?

Previously, `get_success(...)` just ran in a hot-loop advancing the
Twisted reactor clock which didn't give any time for other threads to do
some work or acquire the GIL if necessary (whenever there is a hand-off
from Rust to Python, we need the GIL).

Now, `get_success(...)` loops until we see a result (until we hit the
~0.1s real-time timeout). In the loop, we call
[`time.sleep(0)`](https://docs.python.org/3/library/time.html#time.sleep)
which will "Suspend execution of the calling thread [...]" (CPU and GIL)
to allow other threads to do some work. Then like before, we advance the
Twisted reactor clock to run any scheduled callbacks which includes
anything the other threads may have scheduled.


### Does this slow down the entire test suite?

Seems just as fast as before. There is minutes variance in what we had
before and after but both are within the same range of each other.

(see PR for actual before/after timings)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants