Skip to content

Rust database access via Python database connection pool v2#19878

Open
MadLittleMods wants to merge 34 commits into
developfrom
madlittlemods/rust-db-access-using-python-db-pool2
Open

Rust database access via Python database connection pool v2#19878
MadLittleMods wants to merge 34 commits into
developfrom
madlittlemods/rust-db-access-using-python-db-pool2

Conversation

@MadLittleMods

@MadLittleMods MadLittleMods commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Based on the explorations done in #19824 and #19846,


Rust database access via Python database connection pool

This is a stepping stone before we can go full Rust everywhere. We're providing a generic interface as we want database access to work in Synapse and synapse-rust-apps. In synapse-rust-apps, we will use a tokio-postgres based database connection pool so it's full Rust.

We want to avoid the situation where we have two database connection pools (one for Python, one for Rust) as we've run into connection exhaustion problems on Matrix.org before.

Why runInteraction(...)?

Using the same runInteraction pattern that we already have in Synapse means we can port over existing Synapse code/endpoints without much thought. But this pattern also makes sense because we want1 transactions to have repeatable-read isolation (easy to think about, less foot-guns). Having everything thappen in a function callback means we can do retries for serialization/deadlock errors.

When an application receives this error message, it should abort the current transaction and retry the whole transaction from the beginning. The second time through, the transaction will see the previously-committed change as part of its initial view of the database, so there is no logical conflict in using the new version of the row as the starting point for the new transaction's update.

Note that only updating transactions might need to be retried; read-only transactions will never have serialization conflicts.

-- https://www.postgresql.org/docs/current/transaction-iso.html#XACT-REPEATABLE-READ

As a note, this strategy is less of an impedance mismatch (aligns more closely) with Synapse so the glue code for the python_db_pool should also be simpler.

How does this interact with logcontext (LoggingContext)?

See docs on log contexts for more background.

We already support normal logging from Rust -> Python with pyo3-log and log but as soon as we pass a thread boundary, everything is logged against the sentinel log context. Normally, we want logs and CPU/DB usage correlated with the request that spawned the work.

You can see how I took a stab at fixing this in #19846 by capturing the logcontext in a Tokio task local and re-activating as necessary. For example, in that PR, I reactivated the logcontext in run_python_awaitable(...) which we use to call runInteraction(...) from the Rust side which means all of the database usage is correlated with the request as expected. It also means any log:info!(...) done in run_interaction(...) is correlated correctly. But there needs to be a better story for when you want to log everywhere else.

I haven't explored tracking CPU usage on the Rust side.

I've left all of this out of this PR as I think it will be better to tackle this as a dedicated follow-up. For example, I'm thinking about instead creating a new LoggingContext with the parent_context set to the calling context and try to avoid needing to call set_current_context(...) on the Python side where possible (like tracking CPU).

Testing strategy

Added some tests that exercise some async Rust handlers for the /versions endpoint:

SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.rest.client.test_versions.VersionsTestCase

Real-world:

  1. poetry run synapse_homeserver --config-path homeserver.yaml
  2. GET http://localhost:8008/_matrix/client/versions

Dev notes

Example INSERT
self.db_pool
    .run_interaction("asdf", move |txn| {
        async move {
            let _rows = txn
                .query(
                    "INSERT INTO delayed_events (delay_id, user_localpart, delay, send_ts, room_id, event_type, content, is_processed) \
                     VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                    &["delay_id", "user_localpart", "10", "999991782342348349", "!room", "m.room.message", "content", "0"],
                )
                .await?;

            Ok(())
        }
        .boxed()
    })
    .await?;

Todo

  • Audit panic usage (unwrap, expect, panic!, unreachable!)

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

Footnotes

  1. To note: Ideally, we'd want the least isolation possible but the problem is that there is no tooling to yell at you when your queries/logic is wrong so repeatable-read isolation is a great balance.

`/versions` is a good candidate as its is the simplest endpoint that uses the database.

We use `tests/rest/client/test_versions.py` as a sanity check that all of this
new Rust database access works.
Comment thread changelog.d/19878.misc
@@ -0,0 +1 @@
Allow Rust code to have database access via Python database connection pool.

@MadLittleMods MadLittleMods Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest reviewing this commit by commit.

It first introduces the database interface structure, then Store usage, and then finally refactoring the /versions endpoint to be handled in Rust.

The /versions endpoint is the simplest endpoint I could find that still had some database access. Hopefully the refactor on /versions isn't that controversial as it's not really the point of this PR. We can always remove it from this PR but it's just here as a sanity check that all of this works.

I've looked over and refined all of this but feel free to call out whatever as I might be cargo-culting too much from the LLM splat that this is based on, #19846 (that PR is a combination of LLM splats and manual refinement)

Comment on lines +43 to +46
# XXX: We must create the Rust HTTP client before we call `reactor.run()` below.
# Twisted's `MemoryReactor` doesn't invoke `callWhenRunning` callbacks if it's
# already running and we rely on that to start the Tokio thread pool in Rust. In
# the future, this may not matter, see https://github.com/twisted/twisted/pull/12514

@MadLittleMods MadLittleMods Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an update, twisted/twisted#12514 is finally part of a Twisted release 26.4.0 (2026-05-11). If we updated our Twisted version, we could probably get rid of all of this ugliness.

But that may be a few years away given our deprecation policy considers a "no-brainer" upgrade once it's available in both the latest Debian Stable (currently Twisted 24.11.0) and Ubuntu LTS repositories (currently Twisted 25.5.0)

Comment thread rust/src/deferred.rs
Comment on lines +282 to +283
// We can't check this here because of circular import issues
// logging_context_module(py)?;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this will get fixed by #19876 (comment)

Comment thread synapse/config/room.py
RoomCreationPreset.TRUSTED_PRIVATE_CHAT,
RoomCreationPreset.PUBLIC_CHAT,
]
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made these a set to better represent what it is and more efficient lookups. Touched this because I had to model it in SynapseHomeServerConfig on the Rust side


/// Experimental features the server supports
#[derive(Serialize, Debug, Clone)]
pub struct UnstableFeatureMap {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, this is a manual translation from synapse/rest/client/versions.py#L105-L213 (not a prompt and pray it's right). I also already asked an LLM to review and spot any mistakes.

@@ -0,0 +1,343 @@
/*

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, you can also see what a Rust tokio-postgres based DatabasePool looks like in #19846

It's completely unscrutinized but the point is just that it's possible and we can care about it once we start using it in synapse-rust-apps.

Comment thread rust/src/deferred.rs
Comment on lines +205 to +209
// We fire-and-forget using `run_in_background`. Re-using
// `run_in_background` also makes sure the awaitable gets run with the
// current logcontext while following the logcontext rules.
//
// FIXME: Currently runs in the sentinel logcontext because we don't manage it here

@MadLittleMods MadLittleMods Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see how handling logcontext might look like in #19846 but per my explanation in the PR description here, it's being left out in favor of a follow-up PR.

logger = logging.getLogger(__name__)


class VersionsTestCase(unittest.HomeserverTestCase):

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm relying on these tests to sanity check that the whole Rust database access thing works.

Is this good enough? Should I be writing tests on the Rust side? Seems hard/weird as our only DatabasePool implementation requires Synapse to be running. How do we make that happen?

I wish I could write some specific Rust db_pool.run_interaction(...)/txn.query(...) scenarios and test results better but generally I guess this is good enough. We could expose that function and run it from here 🤷

Comment thread tests/server.py
Comment thread tests/server.py
@@ -301,15 +302,59 @@ def transport(self) -> "FakeChannel":
def await_result(self, timeout_ms: int = 1000) -> None:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are based on the same decisions being made in #19871

@MadLittleMods MadLittleMods Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to cause our trial tests not to finish in CI 🤔

Even tried these changes by themselves in #19879 which also reproduces.

I'll have to figure it out tomorrow but I think this PR is still good to review otherwise.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split off to #19879

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the tests will fail here until we figure out #19879

Comment thread rust/src/handlers/mod.rs
Comment on lines +31 to +33
struct RustHandlers {
versions: Py<versions::VersionsHandler>,
}

@MadLittleMods MadLittleMods Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opinions on self.rust_handlers.versions vs exposing handlers individually?

If this is the pattern we like, we should also move rust/src/rendezvous / rust/src/msc4388_rendezvous here as well (follow-up PR).


#[pyclass]
pub struct VersionsHandler {
pub global_unstable_feature_map: Arc<UnstableFeatureMap>,

@MadLittleMods MadLittleMods Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the UnstableFeatureMap abstraction separate from SynapseHomeServerConfig as we potentially want to re-use these handlers as part of synapse-rust-apps, Rusty homeserver, etc. We don't want to bog it down with Synapse specific things.

"v1.10".to_string(),
"v1.11".to_string(),
"v1.12".to_string(),
]),

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this actually generic (usable outside of Synapse), we probably want to pass this in as well but I've left it for now ⏩

@MadLittleMods MadLittleMods marked this pull request as ready for review June 25, 2026 01:59
@MadLittleMods MadLittleMods requested a review from a team as a code owner June 25, 2026 01:59
@erikjohnston erikjohnston self-assigned this Jun 25, 2026
Comment on lines +98 to +102
// ```
// db_pool.run_interaction("description", async move |txn| {
// /* do stuff with txn */
// })
// ```

@MadLittleMods MadLittleMods Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of thing may be possible. It seems like sea-orm accomplished this somehow:

In any case, we can address this in a follow-up PR.

@erikjohnston erikjohnston left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architecturally this looks sensible. Other than deciding when it's appropriate to panic, the rest are either some simplifications, stylistic nits or comments.

"v1.9".to_string(),
"v1.10".to_string(),
"v1.11".to_string(),
"v1.12".to_string(),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be easier if these were &'static str instead

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to leave this as-is for now. We probably need to pass these in anyway per #19878 (comment)

Comment thread rust/src/handlers/versions.rs Outdated
Comment thread rust/src/storage/db/python_db_pool.rs Outdated
Comment thread rust/src/storage/db/python_db_pool.rs Outdated
Comment thread rust/src/storage/db/python_db_pool.rs Outdated
Comment thread rust/src/storage/db/python_db_pool.rs Outdated
.expect("`database_engine` type must have a name")
.to_str()
.expect("`database_engine` type name must be valid UTF-8")
.to_owned();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced we should be using panics outside of very specific cases, and can end up with much worse UX depending on the consumer if they're hit. E.g. in synapse-rust-apps this will cause us to abort the HTTP connection rather than return a 500-error I believe?

Plus in this instance we're making things much more verbose, compared with: let name = txn_py.getattr("database_engine")?.get_type().name()?;

I don't really see any upside in using relying on panic, given we're returning a PyErr throughout the call chain anyway.

If we wanted to better catch e.g. people renaming database_engine, then we should do this lookup during startup so that we catch it early (and can abort startup).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been updated to compare the actual Python types and return normal errors ✅

If we wanted to better catch e.g. people renaming database_engine, then we should do this lookup during startup so that we catch it early (and can abort startup).

If there was a good way to connect the lookups (the one at start-up to the one we have to do at runtime), that sounds great to me but I don't see a way to do it here.


In terms of UX in Synapse, both give a 500 error response when used in async Rust code.

For panic, we don't get any details in the logs, just synapse.synapse_rust.http_client.RustPanicError: unknown error
2026-07-03 17:07:07-0500 [-] synapse.http.server - 147 - ERROR - GET-6 - Failed handle request via 'VersionsRestServlet': <SynapseRequest at 0x7fba6f73fc50 method='GET' uri='/_matrix/client/versions' clientproto='1.1' site='test'>
	Traceback (most recent call last):
	  File "/home/eric/Documents/github/element/synapse/synapse/http/server.py", line 335, in _async_render_wrapper
	    callback_return = await self._async_render(request)
	                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/eric/Documents/github/element/synapse/synapse/http/server.py", line 576, in _async_render
	    callback_return = await raw_callback_return
	                      ^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/eric/Documents/github/element/synapse/synapse/rest/client/versions.py", line 81, in on_GET
	    versions_response_body = await self.rust_handlers.versions.get_versions(user_id)
	                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/eric/.cache/pypoetry/virtualenvs/matrix-synapse-xCtC9ulO-py3.14/lib/python3.14/site-packages/twisted/internet/defer.py", line 1187, in __iter__
	    yield self
	synapse.synapse_rust.http_client.RustPanicError: unknown error
For error: we get to see the error message in the logs: RuntimeError: Failed to build /versions response: test error
2026-07-03 17:07:57-0500 [-] synapse.http.server - 147 - ERROR - GET-6 - Failed handle request via 'VersionsRestServlet': <SynapseRequest at 0x7fb1f8c23c50 method='GET' uri='/_matrix/client/versions' clientproto='1.1' site='test'>
	Traceback (most recent call last):
	  File "/home/eric/Documents/github/element/synapse/synapse/http/server.py", line 335, in _async_render_wrapper
	    callback_return = await self._async_render(request)
	                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/eric/Documents/github/element/synapse/synapse/http/server.py", line 576, in _async_render
	    callback_return = await raw_callback_return
	                      ^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/eric/Documents/github/element/synapse/synapse/rest/client/versions.py", line 81, in on_GET
	    versions_response_body = await self.rust_handlers.versions.get_versions(user_id)
	                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/eric/.cache/pypoetry/virtualenvs/matrix-synapse-xCtC9ulO-py3.14/lib/python3.14/site-packages/twisted/internet/defer.py", line 1187, in __iter__
	    yield self
	RuntimeError: Failed to build /versions response: test error

Overall, for either, I wish we had actual stack traces to the Rust itself.

Panic doesn't actually stop Synapse so there is no benefit there (fail fast so developer can notice). We get more information from errors currently (although I assume we could also make it better for panics).

Error wins ⏩

Comment thread rust/src/storage/store.rs Outdated
Comment thread rust/src/deferred.rs Outdated
Comment thread tests/rest/client/test_versions.py Outdated
Comment thread rust/src/config/mod.rs Outdated
}

#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum RoomCreationPreset {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest of the file could all use some docstrings

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved RoomCreationPreset to its own file so its not muddying up the config structure and added a docstring.

I think only SynapseHomeServerConfig deserves a docstring (which it already has). The rest don't really correspond to anything directly. They are just mimicking the config structure.

For example, you could think that RoomConfig in Rust corresponds to RoomConfig in Synapse but that's not really true.

Comment thread rust/src/handlers/versions.rs Outdated
Comment thread rust/src/storage/store.rs
MSC3575,
#[serde(rename = "msc4222")]
MSC4222,
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than using serde for this, feels cleaner to do:

impl PerUserExperimentalFeature {
    fn as_str(&self) -> &'static str {
        match self {
            Self::MSC3881 => "msc3881",
            Self::MSC3575 => "msc3575",
            Self::MSC4222 => "msc4222",
        }
    }
}

especially since we don't Deserialize

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, I'd use some string enum type but I don't want to add another crate for this.

This is the same pattern I've used in SBG and TI-Messenger Proxy 2 3 4.

If this was composed as part of some larger structure that we Serialize, I would most definitely prefer to rely on serde to have a single source of truth. In this case, it gets serialized on its own so it could be more applicable to go with the plain strategy. My current leaning is still just to use the familiar (to me) pattern but I would be willing to update (just poke again).

…-db-pool2

Conflicts:
	synapse/rest/client/versions.py
So we can stop muddying up the actual config
Comment thread rust/src/storage/store.rs
/// Checks whether a given feature is enabled/disabled for this user
///
/// If there is no entry, returns None
pub async fn is_feature_enabled_for_user(

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that is different from the Python is_feature_enabled(...) is that we don't have any cached lookups on the Rust version

@cached()
async def is_feature_enabled(
self, user_id: str, feature: "ExperimentalFeature"
) -> bool:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants