Cache EndpointChannel across sticky clients#2647
Cache EndpointChannel across sticky clients#2647fsamuel-bs wants to merge 12 commits intodevelopfrom
EndpointChannel across sticky clients#2647Conversation
Generate changelog in
|
| public EndpointChannel getChannel(Endpoint endpoint, Function<Endpoint, EndpointChannel> channelFactory) { | ||
| return cache.get( | ||
| new StickyCacheKey(endpoint.serviceName(), endpoint.httpMethod(), endpoint.endpointName()), | ||
| _unused -> channelFactory.apply(endpoint)); | ||
| } |
There was a problem hiding this comment.
We should probably make this a loading cache, so we guarantee the channelFactory is always the same (it is, but that's not enforced by this API).
There was a problem hiding this comment.
I'll address this comment + #2647 (comment) here.
channelFactory it's potentially different. The issue is that the lifecycle of the cache is associated with the factory that can construct any kind of client (i.e. the same class constructs FooBlocking.class or BarBlocking.class). And the channelFactory that we're getting here is an endpointChannel factory (not the outer channel factory), which is constructed per DialogueChannel - so we don't have it at the time of cache creation.
Note that, for a given endpoint - it's likely that the channelFactory will have the same configs, so this is not an issue in practice. But it's hard to make the code apply this restriction.
There was a problem hiding this comment.
As a middle-ground I've moved the sticky endpoints cache to ChannelCache, which is constructed once per DialogueFactory, thus making it clearer that this is related to the factory lifecycle.
|
|
||
| import com.palantir.dialogue.Channel; | ||
|
|
||
| public interface StickyChannelFactory { |
There was a problem hiding this comment.
nit: Can we give this a different name to disambiguate from DialogueClients.StickyChannelFactory? The DialgoueClients interfaces are already confusing enough...
There was a problem hiding this comment.
Agreed, but I think we can now remove StickyChannelFactory and leave only StickyChannelFactory2 - StickyChannelFactory is not referenced by any code internally but this repo.
| import com.palantir.dialogue.Channel; | ||
|
|
||
| public interface StickyChannelFactory { | ||
| Channel stickyChannel(StickyEndpointChannelCache cache); |
There was a problem hiding this comment.
It's weird to pass the cache as a parameter - the StickyChannelFactory should own the cache.
| Channel stickyChannel(StickyEndpointChannelCache cache); | |
| Channel stickyChannel(); |
EndpointChannel across sticky clients
There was a problem hiding this comment.
Pull Request Overview
This PR optimizes garbage collection pressure for sticky clients by caching EndpointChannel instances across clients while preserving per-client stateful behavior.
- Introduces caching of
EndpointChannelobjects via a shared cache to reduce recreation overhead - Refactors
StickyEndpointChannels2to separate stateful routing logic from cacheable channel instances - Updates the API to use
StickyEndpointChannelsFactoryinstead ofSupplier<Channel>for better abstraction
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| StickyEndpointChannelsFactory.java | New factory interface for creating sticky channels |
| StickyEndpointChannels2.java | Major refactoring to support caching with separated concerns between routing and channel creation |
| DialogueChannel.java | Updated to use new factory interface and support sticky endpoint channel cache |
| Config.java | Added cache configuration for sticky endpoint channels |
| ReloadingClientFactory.java | Updated to use new sticky channel factory API |
| ChannelCache.java | Added shared cache for sticky endpoint channels |
| Strategy.java | Updated test to pass cache parameter |
| StickyEndpointChannels2Test.java | Updated test setup to include cache parameter |
| // Avoid holding onto old targets, which is now more common as we bind to resolved IP addresses | ||
| .weakValues() | ||
| .build(this::createNonLiveReloadingChannel); | ||
| // TODO: add docs |
There was a problem hiding this comment.
The TODO comment indicates missing documentation for the stickyEndpointChannelsCache field. This cache should be documented to explain its purpose, configuration rationale (max size 1000, weak values), and relationship to sticky client optimization.
| // TODO: add docs | |
| /** | |
| * Cache for sticky {@link EndpointChannel} instances keyed by {@link Endpoint}. | |
| * <p> | |
| * This cache is used to optimize sticky client behavior by reusing {@link EndpointChannel} | |
| * instances for the same endpoint, reducing overhead from repeated channel creation. | |
| * <p> | |
| * The cache is configured with a maximum size of 1000 to prevent unbounded memory usage, | |
| * and uses weak values to allow unused channels to be garbage collected, minimizing | |
| * the risk of memory leaks. This configuration strikes a balance between performance | |
| * (by enabling reuse) and resource management. | |
| */ |
| public Supplier<Channel> getSticky2NonReloading(Simulation simulation, Map<String, SimulationServer> servers) { | ||
| Preconditions.checkArgument(servers.size() == 1, "Only one server supported"); | ||
| return dialogueChannelWithDefaults(simulation, servers).stickyChannels(); | ||
| return dialogueChannelWithDefaults(simulation, servers).stickyChannels(stickyEndpointChannelCache); |
There was a problem hiding this comment.
The variable stickyEndpointChannelCache is referenced but not defined in this context. This will cause a compilation error as the variable is not declared or imported in this class.
|
This PR has been automatically marked as stale because it has not been touched in the last 14 days. If you'd like to keep it open, please leave a comment or add the 'long-lived' label, otherwise it'll be closed in 7 days. |
Before this PR
Internally, we're seeing clients that use the sticky client incur in significant GC pressure because they re-create the client sticky per-session, and re-creating the client causes all metrics in all channels to be re-created.
Attempts to solve this, e.g. #2026, had limited success.
This PR attempts a new approach: the stateful bits of the sticky client are in
StickyChannels2:StickyRouter, which addsstickyTargetto requests upon the first request succeeding + adds them to following requestsStickyEndpointChannels2EndpointFactorySupplier(what a name) that createdqueueOverrides.We attempt to cache the inner
EndpointChannelblobs across clients, while preserving the per-client stateful messages.See prior art in #1334 + #1336
After this PR
==COMMIT_MSG==
==COMMIT_MSG==
Possible downsides?