Skip to content

Cache EndpointChannel across sticky clients#2647

Draft
fsamuel-bs wants to merge 12 commits intodevelopfrom
ssouza/cache-sticky
Draft

Cache EndpointChannel across sticky clients#2647
fsamuel-bs wants to merge 12 commits intodevelopfrom
ssouza/cache-sticky

Conversation

@fsamuel-bs
Copy link
Copy Markdown
Contributor

@fsamuel-bs fsamuel-bs commented Aug 7, 2025

Before this PR

Internally, we're seeing clients that use the sticky client incur in significant GC pressure because they re-create the client sticky per-session, and re-creating the client causes all metrics in all channels to be re-created.

Attempts to solve this, e.g. #2026, had limited success.

This PR attempts a new approach: the stateful bits of the sticky client are in StickyChannels2:

  • the StickyRouter, which adds stickyTarget to requests upon the first request succeeding + adds them to following requests
  • StickyEndpointChannels2EndpointFactorySupplier (what a name) that created queueOverrides.

We attempt to cache the inner EndpointChannel blobs across clients, while preserving the per-client stateful messages.

See prior art in #1334 + #1336

After this PR

==COMMIT_MSG==

==COMMIT_MSG==

Possible downsides?

@changelog-app
Copy link
Copy Markdown

changelog-app Bot commented Aug 7, 2025

Generate changelog in changelog/@unreleased

Type (Select exactly one)

  • Feature (Adding new functionality)
  • Improvement (Improving existing functionality)
  • Fix (Fixing an issue with existing functionality)
  • Break (Creating a new major version by breaking public APIs)
  • Deprecation (Removing functionality in a non-breaking way)
  • Migration (Automatically moving data/functionality to a new system)

Description

Check the box to generate changelog(s)

  • Generate changelog entry

@fsamuel-bs fsamuel-bs changed the title Cache sticky clients Cache endpoint creation for sticky clients Aug 7, 2025
Comment on lines +32 to +36
public EndpointChannel getChannel(Endpoint endpoint, Function<Endpoint, EndpointChannel> channelFactory) {
return cache.get(
new StickyCacheKey(endpoint.serviceName(), endpoint.httpMethod(), endpoint.endpointName()),
_unused -> channelFactory.apply(endpoint));
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably make this a loading cache, so we guarantee the channelFactory is always the same (it is, but that's not enforced by this API).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll address this comment + #2647 (comment) here.

channelFactory it's potentially different. The issue is that the lifecycle of the cache is associated with the factory that can construct any kind of client (i.e. the same class constructs FooBlocking.class or BarBlocking.class). And the channelFactory that we're getting here is an endpointChannel factory (not the outer channel factory), which is constructed per DialogueChannel - so we don't have it at the time of cache creation.

Note that, for a given endpoint - it's likely that the channelFactory will have the same configs, so this is not an issue in practice. But it's hard to make the code apply this restriction.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a middle-ground I've moved the sticky endpoints cache to ChannelCache, which is constructed once per DialogueFactory, thus making it clearer that this is related to the factory lifecycle.


import com.palantir.dialogue.Channel;

public interface StickyChannelFactory {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we give this a different name to disambiguate from DialogueClients.StickyChannelFactory? The DialgoueClients interfaces are already confusing enough...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but I think we can now remove StickyChannelFactory and leave only StickyChannelFactory2 - StickyChannelFactory is not referenced by any code internally but this repo.

import com.palantir.dialogue.Channel;

public interface StickyChannelFactory {
Channel stickyChannel(StickyEndpointChannelCache cache);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird to pass the cache as a parameter - the StickyChannelFactory should own the cache.

Suggested change
Channel stickyChannel(StickyEndpointChannelCache cache);
Channel stickyChannel();

@fsamuel-bs fsamuel-bs changed the title Cache endpoint creation for sticky clients Cache EndpointChannel across sticky clients Aug 8, 2025
@schlosna schlosna requested a review from Copilot August 8, 2025 17:32
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes garbage collection pressure for sticky clients by caching EndpointChannel instances across clients while preserving per-client stateful behavior.

  • Introduces caching of EndpointChannel objects via a shared cache to reduce recreation overhead
  • Refactors StickyEndpointChannels2 to separate stateful routing logic from cacheable channel instances
  • Updates the API to use StickyEndpointChannelsFactory instead of Supplier<Channel> for better abstraction

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
StickyEndpointChannelsFactory.java New factory interface for creating sticky channels
StickyEndpointChannels2.java Major refactoring to support caching with separated concerns between routing and channel creation
DialogueChannel.java Updated to use new factory interface and support sticky endpoint channel cache
Config.java Added cache configuration for sticky endpoint channels
ReloadingClientFactory.java Updated to use new sticky channel factory API
ChannelCache.java Added shared cache for sticky endpoint channels
Strategy.java Updated test to pass cache parameter
StickyEndpointChannels2Test.java Updated test setup to include cache parameter

// Avoid holding onto old targets, which is now more common as we bind to resolved IP addresses
.weakValues()
.build(this::createNonLiveReloadingChannel);
// TODO: add docs
Copy link

Copilot AI Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO comment indicates missing documentation for the stickyEndpointChannelsCache field. This cache should be documented to explain its purpose, configuration rationale (max size 1000, weak values), and relationship to sticky client optimization.

Suggested change
// TODO: add docs
/**
* Cache for sticky {@link EndpointChannel} instances keyed by {@link Endpoint}.
* <p>
* This cache is used to optimize sticky client behavior by reusing {@link EndpointChannel}
* instances for the same endpoint, reducing overhead from repeated channel creation.
* <p>
* The cache is configured with a maximum size of 1000 to prevent unbounded memory usage,
* and uses weak values to allow unused channels to be garbage collected, minimizing
* the risk of memory leaks. This configuration strikes a balance between performance
* (by enabling reuse) and resource management.
*/

Copilot uses AI. Check for mistakes.
public Supplier<Channel> getSticky2NonReloading(Simulation simulation, Map<String, SimulationServer> servers) {
Preconditions.checkArgument(servers.size() == 1, "Only one server supported");
return dialogueChannelWithDefaults(simulation, servers).stickyChannels();
return dialogueChannelWithDefaults(simulation, servers).stickyChannels(stickyEndpointChannelCache);
Copy link

Copilot AI Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable stickyEndpointChannelCache is referenced but not defined in this context. This will cause a compilation error as the variable is not declared or imported in this class.

Copilot uses AI. Check for mistakes.
@stale
Copy link
Copy Markdown

stale Bot commented Oct 18, 2025

This PR has been automatically marked as stale because it has not been touched in the last 14 days. If you'd like to keep it open, please leave a comment or add the 'long-lived' label, otherwise it'll be closed in 7 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants