
Conversation

@WofWca (Collaborator) commented on Feb 1, 2026

Before this commit we would not start another `getNextEvent()` request until we had received a response to the previous one. This commit creates a pool of `getNextEvent()` requests, so that the server always has a request that it can readily respond to.
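
As a minimal sketch (hypothetical names; not the actual deltachat-jsonrpc client code), the event loop becomes a set of identical workers:

```ts
// Minimal sketch of a getNextEvent() request pool. Instead of one
// request-response cycle at a time, keep POOL_SIZE requests in flight.
const POOL_SIZE = 20

async function runEventLoopWorker(
  rpc: { getNextEvent: () => Promise<unknown> }, // placeholder client type
  handleEvent: (event: unknown) => void
): Promise<void> {
  for (;;) {
    // As soon as one response arrives, immediately issue the next request;
    // with POOL_SIZE workers there are always POOL_SIZE requests pending
    // on the server, so it never waits for a client round trip to respond.
    handleEvent(await rpc.getNextEvent())
  }
}

export function startEventLoop(
  rpc: { getNextEvent: () => Promise<unknown> },
  handleEvent: (event: unknown) => void
): void {
  for (let i = 0; i < POOL_SIZE; i++) {
    void runEventLoopWorker(rpc, handleEvent)
  }
}
```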

Measurements

I tested this on Delta Chat Desktop. It seems to provide a measurable speedup in handling event bursts.

I called `console.time()` when we start a `BackendRemote.rpc.maybeNetwork()` request (which we do every time the main window is focused), and `console.timeEnd()` when we receive the first "IDLE entering wait-on-remote state" entry, inside of [`ipcBackend.on('json-rpc-message'`](https://github.com/deltachat/deltachat-desktop/blob/3846aef67c1644d8ec8b57b229329855a378ee4e/packages/target-electron/runtime-electron/runtime.ts#L52). For these measurements I also disabled request-response logging with `config['log-debug'] = false` (edit: `logJsonrpcConnection = false`).
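
Roughly, the instrumentation looked like this (a sketch; `BackendRemote` and `ipcBackend` stand in for the deltachat-desktop runtime objects, and the exact shape of messages on the `'json-rpc-message'` channel is an assumption here):

```ts
// Hypothetical ambient declarations standing in for the desktop runtime:
declare const BackendRemote: { rpc: { maybeNetwork(): Promise<void> } }
declare const ipcBackend: {
  on(channel: string, cb: (message: unknown) => void): void
}

let stopped = false
ipcBackend.on('json-rpc-message', (message: unknown) => {
  // Stop the timer at the first "IDLE entering wait-on-remote state" entry.
  if (
    !stopped &&
    JSON.stringify(message).includes('IDLE entering wait-on-remote state')
  ) {
    stopped = true
    console.timeEnd('maybeNetwork -> first IDLE')
  }
})

console.time('maybeNetwork -> first IDLE')
void BackendRemote.rpc.maybeNetwork()
```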

Each such `maybeNetwork()` resulted in ~1000 `getNextEvent()` responses.

With the original event loop (without a pool), the average over 150 measurements was 1152.28 ms. With the new event loop and a pool size of 20, the average over 150 measurements was 774.58 ms.

That is 67.22% of the original average duration.

TODO:

  • Please check that this doesn't break stuff.
    I heard that there was some issue with Delta Chat bots
    trying to not miss a single event.

    If that's the case, or if in doubt, we can set
    the default pool size to 1 instead of 20,
    and only set it to 20 in Delta Chat Desktop.

Related:

- deltachat/deltachat-desktop#5282
@WofWca force-pushed the wofwca/perf-jsonrpc-event-loop-buffer branch from 264a1e3 to 7a7f040 on February 1, 2026 at 11:19
@link2xt (Collaborator) commented on Feb 3, 2026

I'm not sure what exactly the problem is and how this pooling improves performance. We already have concurrent calls working on the JSON-RPC level, tested with a Python test. Maybe there are some large responses and just reading them blocks reception of events?

The other option is adding a batched API that fetches multiple events at once (up to some limit) and returns them all at once. This way, as soon as you get an event call response, you get all events that were in the queue at the moment the call was processed, and you do not have to wait for every other response to be read. Getting all events at once also makes it easier to deduplicate them, e.g. if you have 10 events in the queue about some chat being modified, you are only interested in the last one.
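
As a sketch of how a client might consume such a batched call (the `getNextEvents()` name, its limit parameter, and the event shape are all hypothetical, mirroring the proposal rather than an existing core API):

```ts
// Hypothetical batched event loop: one call returns all queued events
// (up to some limit), so deduplication can happen in a single pass.
type CoreEvent = { kind: string; chatId?: number }
declare const rpc: { getNextEvents(limit: number): Promise<CoreEvent[]> }

async function batchedEventLoop(handle: (ev: CoreEvent) => void) {
  for (;;) {
    const events = await rpc.getNextEvents(100)
    // Of several "chat modified" events for the same chat,
    // only the last one matters.
    const lastModified = new Map<number, CoreEvent>()
    for (const ev of events) {
      if (ev.kind === 'ChatModified' && ev.chatId !== undefined) {
        lastModified.set(ev.chatId, ev)
      }
    }
    for (const ev of events) {
      if (
        ev.kind === 'ChatModified' &&
        ev.chatId !== undefined &&
        lastModified.get(ev.chatId) !== ev
      ) {
        continue // superseded by a later event for the same chat
      }
      handle(ev)
    }
  }
}
```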

@WofWca (Collaborator, Author) commented on Feb 3, 2026

> I'm not sure what exactly the problem is and how this pooling improves performance.

IDK, but it just makes sense that there is a non-zero delay between the `getNextEvent()` call on the client and the time the Core receives that request. That delay is what the pooling eliminates: the Core no longer has to wait to receive a `get_next_event` request before it can send a response.

@link2xt (Collaborator) commented on Feb 3, 2026

> IDK, but it just makes sense that there is a non-zero delay between the `getNextEvent()` call on the client and the time the Core receives that request.

I doubt there is any significant delay in passing the event from one process to another via stdio, even with JSON serialization. More likely the difference comes from different scheduling: if you get events faster than heavy API responses and have some debouncing logic in desktop, you may do less work.

@WofWca (Collaborator, Author) commented on Feb 4, 2026

So you're saying that I might have taken the measurements wrong? Good idea. But I just checked and saw that merely focusing the window, besides the `get_next_event` requests, only produces ~2 `get_connectivity` requests, which get responded to in ~10 ms, plus, obviously, `maybe_network`.

[image: screenshot of the requests produced by focusing the window]

> I doubt there is any significant delay in passing the event from one process to another via stdio even with JSON serialization.

Even if the delay is 0.5 ms, in the end it's multiplied by 1000 (for 1000 events), i.e. ~500 ms, which is in the same ballpark as the ~378 ms difference measured above.

@WofWca (Collaborator, Author) commented on Feb 4, 2026

> The other option is adding a batched API that fetches multiple events at once (up to some limit) and returns them all at once.

I don't want to go off-topic, but I had an idea of removing the event queue from the Core, and instead only sending an event when there is a pending `get_next_event` request. The client would then need an even bigger pool of `get_next_event` requests to be able to handle bursts.

@link2xt (Collaborator) commented on Feb 4, 2026

I implemented batching in #7825; it would be great if you could measure it in the same way. It also applies to Python.

@link2xt (Collaborator) commented on Feb 4, 2026

> Please check that this doesn't break stuff.

The problem with this approach is that it may introduce reordering:

1. First `get_next_event()` call starts
2. Second `get_next_event()` call starts
3. Event A is emitted
4. Event B is emitted
5. Event A is consumed by the second `get_next_event()` call
6. Event B is consumed by the first `get_next_event()` call
7. First `get_next_event()` call returns, UI processes event B
8. Second `get_next_event()` call returns, UI processes event A

So far event loops have always delivered events in the order they are produced, and tests depend on this. There are not many JS tests, but in general this kind of reordering may make tests flaky, and bots may assume that events are always generated in the same order: e.g. if you receive two messages, the event about the first message should be delivered before the event about the second message.
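
A client could only restore ordering itself under a strong assumption: every event would need to carry a monotonically increasing sequence number, which the core does not currently promise. A hedged sketch of such re-sequencing:

```ts
// Sketch of client-side re-sequencing, ASSUMING each event carried a
// monotonically increasing sequence number starting at 0 (the core does
// not currently guarantee this).
type SeqEvent = { seq: number; payload: unknown }

function makeResequencer(handle: (ev: SeqEvent) => void) {
  let nextSeq = 0
  const buffered = new Map<number, SeqEvent>()
  return (ev: SeqEvent) => {
    // Buffer events that arrive early because a later pooled request
    // happened to return first, and deliver strictly in sequence order.
    buffered.set(ev.seq, ev)
    for (
      let e = buffered.get(nextSeq);
      e !== undefined;
      e = buffered.get(nextSeq)
    ) {
      buffered.delete(nextSeq)
      handle(e)
      nextSeq++
    }
  }
}
```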

@WofWca (Collaborator, Author) commented on Feb 4, 2026

Ah, good catch. Our `get_next_event` doesn't make promises about the order; it's basically up to the async runtime, as I understand it.

```rust
async fn get_next_event(&self) -> Result<Event> {
    self.event_emitter
        .recv()
        .await
        .map(|event| event.into())
        .context("event channel is closed")
}
```

core/src/events.rs, lines 72 to 82 at 53ebf2c:

```rust
pub async fn recv(&self) -> Option<Event> {
    let mut lock = self.0.lock().await;
    match lock.recv().await {
        Err(async_broadcast::RecvError::Overflowed(n)) => Some(Event {
            id: 0,
            typ: EventType::EventChannelOverflow { n },
        }),
        Err(async_broadcast::RecvError::Closed) => None,
        Ok(event) => Some(event),
    }
}
```

OK, so this PR is not ready then, I'd say.

I will try your PR.

@WofWca closed this on Feb 4, 2026