
Merge ered_client and ered_connection #157

Merged
zuiderkwast merged 9 commits into Ericsson:single-process-client from zuiderkwast:single-process-client
Feb 27, 2026
Conversation

@zuiderkwast
Collaborator

@zuiderkwast zuiderkwast commented Feb 23, 2026

  • Get rid of the two processes in ered_connection and duplicated logic.
  • Use active mode.
  • Response timeout handling exists but is incomplete.
  • Batching is not fixed: only one command is sent at a time.
  • Sending may block, leaving the client process unresponsive while it does.

Batching and non-blocking send are to be fixed in follow-up PRs.

* Fix unit tests for ered_client.
* Fix spelling and formatting.
* Change ered_parser API to return parse errors instead of throwing them as exceptions.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
@zuiderkwast zuiderkwast marked this pull request as ready for review February 23, 2026 17:53
@drmull
Collaborator

drmull commented Feb 24, 2026

According to the documentation, gen_tcp:send can hang. You don't do any load shedding, so in such a situation messages would just pile up in the process message queue rather than in the bounded "waiting" queue.

Also, I'm not sure what the purpose of the change is: fixing a bug, or an optimization? You should add or run some benchmarks with these types of changes.

@zuiderkwast
Collaborator Author

zuiderkwast commented Feb 24, 2026

Hi Patrik! Good to see you here.

It's true that gen_tcp:send can hang, but only if the TCP option inet_backend = socket is used. Otherwise, inet buffers the data when the OS send buffer is full, so send never blocks unless the socket backend is used. It's possible that someone wants to use the socket backend, though. Currently, we rely on limiting the pending queue. We could also limit how much we send in bytes rather than only by the number of commands.
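A byte limit on top of a count limit could be sketched roughly like this (illustrative only; the record fields, limits, and function names are hypothetical, not ered's actual internals):

```erlang
%% Sketch: admit a command into the pending queue only if both a count
%% limit and a byte limit allow it; otherwise it stays in the bounded
%% waiting queue. All names here are illustrative.
-record(st, {pending = queue:new(),
             pending_count = 0,
             pending_bytes = 0}).

-define(MAX_PENDING_COUNT, 128).
-define(MAX_PENDING_BYTES, 1024 * 1024).

maybe_enqueue(Command, #st{pending_count = N, pending_bytes = B} = St) ->
    Size = iolist_size(Command),
    if
        N >= ?MAX_PENDING_COUNT; B + Size > ?MAX_PENDING_BYTES ->
            {rejected, St};
        true ->
            {ok, St#st{pending = queue:in(Command, St#st.pending),
                       pending_count = N + 1,
                       pending_bytes = B + Size}}
    end.
```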

The main reason for the change is that batching didn't work as intended previously. When the pending queue got full and one command returned, ered_connection would send the reply back to ered_client, which returned the reply to the caller. Then process_commands would send only one command from the waiting queue to ered_connection. The send_loop process would try to read more commands to build a batch, but since ered_client only sent one, the send loop sent only one command at a time to the socket.

This PR doesn't fix the batching problem, but it becomes much easier to fix when all the queues are in the same process.

A secondary reason is to avoid copying between processes and the complexity with that extra communication and bookkeeping.

You have a good point about benchmarking. We should write a small Erlang program that just bombs a single ered_client with as much data as possible from multiple Erlang processes. But for that, we should also fix the batching bug.

@drmull
Collaborator

drmull commented Feb 24, 2026

It's true that gen_tcp:send can hang, but only if the TCP option inet_backend = socket is used

I was thinking about this passage in the documentation:

(the function can "hang" even when using the inet backend if the internal buffers are full).

But I am not familiar with the specific details.

The main reason for the change is that batching didn't work as intended previously. When the pending queue got full, then when one command returned, ered_connection would send the reply back to ered_client which returned the reply to the caller. Then process_commands would send only one command from the waiting queue to ered_connection. The send_loop process would try to read more commands to build a batch, but since ered_client only sent one, the send loop only sends one command at a time to the socket.

Yes, that sounds suboptimal! Wouldn't it be simple to solve with a high and a low limit on the pending queue? When a command or reply arrives, don't call the processing function if the pending queue is above the low limit. Allow the processing function to fill it to the high limit when it is called.
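That watermark idea could be sketched roughly as follows (hypothetical names throughout; `take_waiting/1` and `send_command/2` stand in for whatever the real queue helpers are):

```erlang
%% Sketch of a high/low watermark on the pending queue: refill only once
%% it has drained below LOW, then fill up to HIGH in one go, so commands
%% naturally accumulate into batches. All names are illustrative.
-define(LOW, 32).
-define(HIGH, 128).

maybe_refill(#st{pending_count = N} = St) when N > ?LOW ->
    St;                                  %% still above the low limit: wait
maybe_refill(St) ->
    fill_to_high(St).

fill_to_high(#st{pending_count = N} = St) when N >= ?HIGH ->
    St;
fill_to_high(St) ->
    case take_waiting(St) of             %% hypothetical: pop from waiting queue
        empty          -> St;
        {Command, St1} -> fill_to_high(send_command(Command, St1))
    end.
```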

A secondary reason is to avoid copying between processes and the complexity with that extra communication and bookkeeping.

I can see that it could be worthwhile if the send is guaranteed not to block.

@zuiderkwast
Collaborator Author

zuiderkwast commented Feb 25, 2026

I was thinking about this passage in the documentation:

(the function can "hang" even when using the inet backend if the internal buffers are full).

Yes, you're right. And from the inet documentation under setopts/2, there's more about this.

I haven't seen this happen, and I'm wondering why other TCP tools like gun and cowboy don't seem to worry about this, but I can see some possible solutions:

  1. Use a relay process, like the old send_loop, but simpler, just relaying the send.
  2. Send from the main process, but use send_timeout 0, and if it returns timeout, try again later after receiving some data from the server.
  3. A combination of the two. Use send_timeout 0 by default, and if it returns a timeout, spawn a temporary process just to do the blocking send. We expect it to be rare, so this code path doesn't have to be that optimal.
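Option 3 might look something like this sketch (assuming the socket is opened with `{send_timeout, 0}`; note that after a send timeout the data may have been partially sent, so a real implementation needs more care than shown here):

```erlang
%% Sketch of option 3: try a non-blocking send first; if the OS buffers
%% are full, hand the blocking send to a short-lived helper process so
%% the client process stays responsive. Illustrative only.
try_send(Socket, Data, State) ->
    case gen_tcp:send(Socket, Data) of
        ok ->
            {ok, State};
        {error, timeout} ->
            %% Expected to be rare, so this path need not be optimal.
            %% The caller must not send again until {send_done, Socket}
            %% arrives, to keep writes to the socket ordered.
            Parent = self(),
            spawn_link(fun() ->
                               ok = gen_tcp:send(Socket, Data),
                               Parent ! {send_done, Socket}
                       end),
            {blocked, State};
        {error, _} = Error ->
            Error
    end.
```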

@zuiderkwast zuiderkwast changed the base branch from main to single-process-client February 27, 2026 00:24
@zuiderkwast
Collaborator Author

@bjosv @JeppeTh @WilliamVoong @drmull I've created a feature branch and changed the target branch of this PR to the feature branch. I'm going to merge this PR to the feature branch now so we can open follow-up PRs targeting the feature branch. Later, when the follow-ups are done, I'll open another PR for merging the whole feature into main. We can do a more thorough review when that happens. Or, if you post review comments on this PR even after it's merged, I can address them on the feature branch.

TODOs to be done in separate PRs:

  • Batching (@Hippo)
  • Non-blocking send (me)

@zuiderkwast zuiderkwast merged commit acb5ce2 into Ericsson:single-process-client Feb 27, 2026
14 of 15 checks passed
%% This is risky behavior. If some of the commands sent are not idempotent, we
%% can't just reconnect and send them again. We may just want to return an error
%% instead.
abort_pending_commands(State) ->

@WilliamVoong WilliamVoong Feb 27, 2026


This is interesting - maybe we should discuss this in office?

I'll try to think of some scenarios in our code where it may be relevant.

Collaborator Author


So #157 (comment) was referring to this.

Yes, you can discuss it with your team. :)

I didn't want to change it in case you rely on this behavior.

Collaborator Author


Yes, you may discuss resending with your team. :)

Perhaps you don't use any non-idempotent commands. Then it's safe for you, but if we want the client to be generic, it's better not to assume that.

If you need to keep the resending behavior, then we can perhaps add an option for it.

@zuiderkwast zuiderkwast deleted the single-process-client branch March 2, 2026 15:49
{error, Reason} ->
Transport = State#st.opts#opts.transport,
Transport:close(Socket),
{noreply, connection_down({socket_closed, Reason}, State#st{socket = undefined})}
Collaborator Author


This should have been none. Need to fix in a follow-up.

Suggested change
{noreply, connection_down({socket_closed, Reason}, State#st{socket = undefined})}
{noreply, connection_down({socket_closed, Reason}, State#st{socket = none})}

