Clustering for signals #8703

@Legioth

Description

Support automatic sharing of shared signal values between nodes in a cluster.

The solution should enable building a view with cluster-shared state along these lines:

@Route
public class ClickCount extends VerticalLayout {
  public ClickCount(@Autowired ClusteredSignalFactory factory) {
    SharedNumberSignal countSignal = factory.getNumber("clickCount");

    Button button = new Button();
    button.bindText(() -> "Click count: " + countSignal.get());
    button.addClickListener(click -> countSignal.incrementBy(1));

    add(button);
  }
}

Tier

Enterprise

License

Proprietary

Motivation

Background

Vaadin 25.1 introduces signals for reactive UI state management in Flow. One limitation in 25.1 is that it's not possible to share signal state between nodes in a cluster. To avoid surprises in production, the shared signal types throw an exception when serialized so that developers realize this use case is explicitly unsupported.

Problem

Signals can be a powerful mechanism for implementing collaborative functionality in a cluster by sharing UI state between collaborating users regardless of which nodes in the cluster those users are connected to. This requires new framework functionality for synchronizing signal values so that state can be shared between users who are connected to different nodes in a cluster.

Solution

The core idea is that multiple nodes in the cluster can have signal instances that are connected to each other. The application developer controls how instances are connected based on a string identifier provided when acquiring the instance. The state in instances that use the same identifier value is kept in sync, whereas nothing is shared between instances using different identifiers.

AbstractSharedSignal and all subclasses are already designed for asynchronous operation. There's support for using a custom SignalTree implementation that would submit and confirm changes through an external system. Rather than manually managing SignalTree implementations, the application would configure a ClusteredSignalFactory instance as e.g. a Spring bean that can be injected into views that use collaborative features.
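To illustrate the identifier-based sharing model, here is a minimal single-JVM sketch. The class name `ClusteredSignalFactorySketch` and the use of a plain `AtomicLong` as stand-in state are assumptions for illustration; the real factory would hand out signal instances backed by a `SignalTree` connected to the cluster.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch (not the real API): a factory that hands out shared
// state keyed by a string identifier. Handles acquired with the same
// identifier see the same state; different identifiers are independent.
class ClusteredSignalFactorySketch {
    // A real implementation would back each entry with a SignalTree that
    // submits and confirms changes through the cluster-wide event log for
    // that identifier.
    private final Map<String, AtomicLong> signals = new ConcurrentHashMap<>();

    AtomicLong getNumber(String identifier) {
        return signals.computeIfAbsent(identifier, id -> new AtomicLong());
    }
}
```

In a Spring application the factory would be registered as a bean, so views can receive it through injection as in the `ClickCount` example above.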

Strongly consistent event log

Synchronization between nodes happens through an event log abstraction. Each signal identifier corresponds to a separate event log which means that signals with different identifiers are completely independent from each other and might be hosted on different nodes in a distributed system.

Events in the log carry the signal modification operations that are already used by AbstractSharedSignal. The log must have strongly consistent ordering, meaning that there's no possibility that event order will change after a submitted event has been accepted by the log. Events can be distributed to clients in the cluster only when ordering is guaranteed to no longer change, e.g. through a consensus mechanism.
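The event log contract described above can be sketched as a small interface. This is a hypothetical single-node model: the `EventLog` and `InMemoryEventLog` names are assumptions, and in a real cluster the point at which ordering becomes final would be decided by a consensus mechanism or the backend's own total-order guarantee (e.g. a Kafka partition), not by a local lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of the event log abstraction.
interface EventLog<E> {
    // Appends an event; returns its final position once ordering is decided.
    long append(E event);

    // Replays all confirmed events from the given offset, then keeps
    // receiving new events in confirmed order.
    void subscribe(long fromOffset, Consumer<E> listener);
}

// Single-node stand-in: synchronization makes ordering final at append time.
class InMemoryEventLog<E> implements EventLog<E> {
    private final List<E> events = new ArrayList<>();
    private final List<Consumer<E>> listeners = new ArrayList<>();

    @Override
    public synchronized long append(E event) {
        events.add(event);
        long offset = events.size() - 1;
        // Ordering can no longer change here; only now may listeners see it.
        for (Consumer<E> listener : listeners) {
            listener.accept(event);
        }
        return offset;
    }

    @Override
    public synchronized void subscribe(long fromOffset, Consumer<E> listener) {
        for (long i = fromOffset; i < events.size(); i++) {
            listener.accept(events.get((int) i));
        }
        listeners.add(listener);
    }
}
```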

When a new signal instance is created with an identifier for which an event log already exists, it's necessary to re-apply all previous events from that log so that the signal reaches the same state as other signal instances using the same identifier. As the event log grows longer, this becomes increasingly expensive. This should be mitigated with the help of state snapshots that are occasionally created and stored in the cluster. A new signal instance can then find the latest snapshot for a given event log and only replay events that have been added to the log after the snapshot was created.
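The snapshot-assisted replay described above can be sketched as follows. The `Snapshot` record and the modeling of events as plain increments to a counter are assumptions for illustration; real events would carry the signal modification operations used by AbstractSharedSignal.

```java
import java.util.List;

// Hypothetical sketch: a new signal instance starts from the latest
// snapshot and only replays log events appended after it, instead of
// re-applying the full event log.
class SnapshotReplay {
    // A snapshot captures the materialized state plus the log offset of
    // the last event it includes.
    record Snapshot(long lastOffset, long value) {}

    // Rebuilds counter state from a snapshot and the tail of the log.
    static long restore(Snapshot snapshot, List<Long> increments) {
        long value = snapshot.value();
        // Replay only events after the snapshot's offset.
        for (long i = snapshot.lastOffset() + 1; i < increments.size(); i++) {
            value += increments.get((int) i);
        }
        return value;
    }
}
```

Starting from a snapshot must produce exactly the same state as replaying the whole log from the beginning; that equivalence is what makes snapshots a pure optimization.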

Cluster backends

We will not build our own distributed event log from scratch. Instead, we should integrate with existing clustering technology that has solved hard problems such as partitioning and consensus.

A wide range of existing systems can provide the building blocks that are needed.

  • Actual event log systems such as Kafka or RabbitMQ (note that a traditional event queue is not enough since it doesn't preserve old events for later replay)
  • Databases with a notification mechanism like PostgreSQL or MongoDB
  • Distributed in-memory data grids like Hazelcast
  • Caches like Redis or Ehcache
  • Agent systems like Akka

We assume that a user who runs their application in a cluster already has an existing solution for communicating across that cluster and that they want to keep using that solution rather than deploying another one.

We need to find out which such systems would be most useful for typical users so that we can choose which ones to integrate with initially. Some promising candidates include Hazelcast thanks to the way it can be embedded in the application rather than run as a standalone system, and PostgreSQL thanks to its wide adoption.

Serialization

Vaadin applications running in a cluster often use session serialization for high availability or to migrate sessions from a node that is about to be shut down. Serializing a shared signal instance causes problems if that instance has listeners from other sessions, since doing so effectively serializes multiple sessions at the same time. The serialized data might also be stale by the time it is deserialized.

Conceptually, what's needed is to serialize not the whole signal with all its data but only the metadata needed to reconnect to the same underlying event log. The main challenge is how to handle listeners. One potential solution is to not share actual signal instances between sessions at all: each user would have their own signal and tree instances, and only the event log abstraction would be shared between sessions. This might be straightforward to implement but would impose application constraints that could be challenging to enforce even at runtime. Another alternative is to collect only the data needed to recreate listeners from the current session while ignoring other listeners. Some research is needed on this topic before choosing an approach.
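The "metadata only" idea can be sketched with standard Java serialization hooks. Everything here is an assumption for illustration: `SignalHandle` is hypothetical, a static map stands in for the per-node factory/registry, and an `AtomicLong` stands in for live signal state. The point is that only the identifier crosses the wire, and `readResolve` reattaches to live state on the receiving side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: serialization carries only the identifier; the live
// state and listeners are never written out.
class SignalHandle implements Serializable {
    // Stand-in for the per-node factory/registry of live signal state.
    private static final Map<String, AtomicLong> REGISTRY = new ConcurrentHashMap<>();

    private final String identifier;
    // Transient: excluded from the serialized form.
    private transient AtomicLong state;

    SignalHandle(String identifier) {
        this.identifier = identifier;
        this.state = REGISTRY.computeIfAbsent(identifier, id -> new AtomicLong());
    }

    long get() { return state.get(); }
    void increment() { state.incrementAndGet(); }

    // On deserialization, reconnect to the live state for this identifier
    // instead of restoring a stale copy.
    private Object readResolve() {
        return new SignalHandle(identifier);
    }

    // Helper for demonstration: a full serialize/deserialize round trip.
    static SignalHandle roundTrip(SignalHandle handle) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(handle);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return (SignalHandle) in.readObject();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

This sketch sidesteps the listener question entirely; as noted above, deciding how listeners are recreated (or deliberately not shared) is the open research item.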

Notes

Similar functionality has been available as a preview feature for Collaboration Kit without gaining traction. It's assumed that this is mainly a reflection of the positioning of Collaboration Kit in general, but there's also a risk that it indicates there just isn't a huge demand among customers for this kind of solution.

Requirements

TBD

  • Requirement 1

  • Requirement 2

  • Requirement 3

  • Documentation

  • License check

  • Feature flag (remove if not needed)

Nice-to-haves

TBD

  • Nice-to-have 1
  • Nice-to-have 2
  • Nice-to-have 3

Risks, limitations and breaking changes

Risks

  • Correctness is the main concern. We need to carefully review and test the implementation to avoid different types of race conditions that lead to missed or duplicated events.
  • Access control on the application level is the responsibility of the application developer, who chooses which users get access to which signal instances. Access control on the infrastructure level is handled by the 3rd party cluster backend.

Limitations

N/A

Breaking changes

This would be the first case where we properly exercise some aspects of the asynchronous APIs in the shared signals, and this might uncover a need for API adjustments. We might also need to make some changes to the internal-ish APIs related to listener registration so that a listener can be associated with a specific session to help manage serialization.

Out of scope

No response

Materials

No response

Metrics

No response

Pre-implementation checklist

  • Estimated (estimate entered into Estimate custom field)
  • Product Manager sign-off
  • Engineering Manager sign-off

Pre-release checklist

  • Documented (link to documentation provided in sub-issue or comment)
  • UX/DX tests conducted and blockers addressed
  • Approved for release by Product Manager

Security review

None
