✨ WIP: Cluster provider and cluster-aware controllers#3019
✨ WIP: Cluster provider and cluster-aware controllers#3019embik wants to merge 8 commits intokubernetes-sigs:mainfrom
Conversation
9584ab6 to
9b3f243
Compare
| // Name returns the name of the cluster. It identifies the cluster in the | ||
| // manager if that is attached to a cluster provider. The value is usually | ||
| // empty for the default cluster of a manager. | ||
| Name() string |
There was a problem hiding this comment.
We're internally working on a mechanism for dealing with multiple clusters, too. It's been helpful for the cluster identifier to be more flexible than a simple string. We've made it comparable.
Something like this:
type Cluster[ID comparable] interface {
Identifier() ID
...
}I understand this suggestion would cause a much larger change requiring type parameters in many places. Future work could be done to migrate.
Mostly, I wanted to raise this as something to consider and that calling the method Identifier would provide room to migrate in the future.
There was a problem hiding this comment.
I'm definitely open to that, I think renaming it to Identifier() should be easy enough! Maybe some maintainers can leave a note if they consider this a future endeavour and this would be a worthwhile change to prepare that.
There was a problem hiding this comment.
We were there before, had a logical cluster abstraction, and then went back to string as it just simplifies everything.
@sprsquish Am curious why a string as identifier and some lookup table on the cluster provider side does not work?
There was a problem hiding this comment.
Also, adding Name() string to the interface has no impact on the consumers of the Cluster type. But adding a type parameter forces them to pass e.g. string or whatever everywhere including the manager. I don't think that's a good idea. IMO, we should not do it.
There was a problem hiding this comment.
Am curious why a string as identifier and some lookup table on the cluster provider side does not work?
Strings work fine.. We like to work with more structure in our identifiers. For instance, working with multiple clusters across cloud providers it's nice to have a structure like this:
type ClusterID struct {
Provider string
Region string
Account string
Name string
}It can be marshaled into a string, of course, it's just nice not to have to.
I understand the need for a simpler approach and appreciate that this has been discussed already. I think there's a way to migrate over time that follows what's been happening with handlers by introducing TypedCluster[ID comparable] and an alias for the common use case type Cluster = TypedCluster[string] (Manager would need the same treatment).
The only change to the current proposal would be to use the more generic Identifier() rather than Name(). That leaves the door open for future discussions regarding generic identifiers without having to do all the work up front.
There was a problem hiding this comment.
If I understand your use-case correctly, this pattern should help you elegantly: https://goplay.space/#wRWyxv-GUrj
Or in other words: the type only matters in code that is ClusterID aware. Turning the string back into the rich format is one type conversion away.
9b3f243 to
f87db1f
Compare
f87db1f to
95a2c06
Compare
95a2c06 to
5c285f6
Compare
| type Provider interface { | ||
| // Get returns a cluster for the given identifying cluster name. Get | ||
| // returns an existing cluster if it has been created before. | ||
| Get(ctx context.Context, clusterName string) (Cluster, error) |
There was a problem hiding this comment.
is the default cluster a thing here already? I.e. can I always pass empty string?
There was a problem hiding this comment.
Should we specify what is returned for an unknown cluster?
There was a problem hiding this comment.
The default cluster is not a thing here, that is in Manager.GetCluster. The cluster provider doesn't have a concept of a default cluster.
Should we specify what is returned for an unknown cluster?
Probably. I can add a sentence to clarify this.
There was a problem hiding this comment.
Added a note on what should happen if the cluster is unknown.
| // Provider defines methods to retrieve clusters by name. The provider is | ||
| // responsible for discovering and managing the lifecycle of each cluster. | ||
| // |
There was a problem hiding this comment.
"defines methods" is redundant for an interface.
| // Provider defines methods to retrieve clusters by name. The provider is | |
| // responsible for discovering and managing the lifecycle of each cluster. | |
| // | |
| // Provider allows to retrieve clusters by name. The provider is | |
| // responsible for discovering and managing the lifecycle of each cluster. | |
| // |
It's kind of unclear what managing means in a getter interface.
There was a problem hiding this comment.
Maybe be concrete saying:
| // Provider defines methods to retrieve clusters by name. The provider is | |
| // responsible for discovering and managing the lifecycle of each cluster. | |
| // | |
| // Provider allows to retrieve clusters by name, e.g. to reconcilers. | |
| // The provider is responsible for discovering and managing the lifecycle | |
| // of each cluster, calling `Engage` and `Disengage` on the manager | |
| // it is run against, whenever a new cluster is discovered or a cluster | |
| // is unregistered. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: embik The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
83360c4 to
1dcba5e
Compare
b604946 to
a362992
Compare
On-behalf-of: SAP marvin.beckers@sap.com Co-authored-by: Vince Prignano <vincepri@redhat.com> Co-authored-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com> Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
On-behalf-of: SAP marvin.beckers@sap.com Co-authored-by: Vince Prignano <vincepri@redhat.com> Co-authored-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com> Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
On-behalf-of: SAP marvin.beckers@sap.com Co-authored-by: Vince Prignano <vincepri@redhat.com> Co-authored-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com> Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
On-behalf-of: SAP marvin.beckers@sap.com Co-authored-by: Vince Prignano <vincepri@redhat.com> Co-authored-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com> Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
On-behalf-of: SAP marvin.beckers@sap.com Co-authored-by: Vince Prignano <vincepri@redhat.com> Co-authored-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com> Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
On-behalf-of: SAP marvin.beckers@sap.com Co-authored-by: Vince Prignano <vincepri@redhat.com> Co-authored-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com> Co-authored-by: Iván Álvarez <ivanape@inditex.com> Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
On-behalf-of: SAP marvin.beckers@sap.com Co-authored-by: Vince Prignano <vincepri@redhat.com> Co-authored-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com> Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
On-behalf-of: SAP <marvin.beckers@sap.com> Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
a362992 to
e93979e
Compare
|
@embik: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
varshaprasad96
left a comment
There was a problem hiding this comment.
In the process of reviewing the PR in detail, for now most of my comments are either nits or questions.
This is a really helpful feature in controller-runtime, that can benefit other internal kubernetes-sigs projects.
cc: @sbueringer @joelanford Could we have some eyes on this and the proposal please. Thank you!
| return fmt.Errorf("For() can only be used with reconcile.Request, got %T", *new(request)) | ||
| var hdler handler.TypedEventHandler[client.Object, request] | ||
| switch reflect.TypeFor[request]() { | ||
| case reflect.TypeOf(reconcile.Request{}): |
There was a problem hiding this comment.
I'm not sure if it matters, I'll leave it up to you - the former is comparing parameter type at runtime and the other is checking for its type at during compile time. Isn't it better to be consistent?
| case reflect.TypeOf(reconcile.Request{}): | |
| case reflect.TypeFor[reconcile.Request](): |
There was a problem hiding this comment.
Good point, but I tried to just adapt the previous if condition. I'm not opposed to changing it, I just wanted to not fiddle with the condition checked.
| case reflect.TypeOf(reconcile.Request{}): | ||
| reflect.ValueOf(&hdler).Elem().Set(reflect.ValueOf(handler.WithLowPriorityWhenUnchanged(&handler.EnqueueRequestForObject{}))) | ||
| case reflect.TypeOf(reconcile.ClusterAwareRequest{}): | ||
| reflect.ValueOf(&hdler).Elem().Set(reflect.ValueOf(handler.WithLowPriorityWhenUnchanged(&handler.EnqueueClusterAwareRequestForObject{ClusterName: cl.Name()}))) |
There was a problem hiding this comment.
Just curious on why we need to use reflect and set the value instead of doing a direct assignment? If we are using reflect.TypeOf previously, aren't we sure of the type at runtime?
Just to make reading of the code easier.
There was a problem hiding this comment.
I've asked myself the same question when I stumbled over this part of the code (see below, it's how the code today sets the handler implementation as well). When I try to change it to something like:
case reflect.TypeOf(reconcile.ClusterAwareRequest{}):
hdler = handler.WithLowPriorityWhenUnchanged(&handler.EnqueueClusterAwareRequestForObject{ClusterName: cl.Name()})it fails with:
../../pkg/builder/controller.go:335:12: cannot use handler.WithLowPriorityWhenUnchanged(&handler.EnqueueClusterAwareRequestForObject{…}) (value of type handler.TypedEventHandler[client.Object, reconcile.ClusterAwareRequest]) as handler.TypedEventHandler[client.Object, request] value in assignment: handler.TypedEventHandler[client.Object, reconcile.ClusterAwareRequest] does not implement handler.TypedEventHandler[client.Object, request] (wrong type for method Create)
have Create(context.Context, event.TypedCreateEvent[client.Object], workqueue.TypedRateLimitingInterface[reconcile.ClusterAwareRequest])
want Create(context.Context, event.TypedCreateEvent[client.Object], workqueue.TypedRateLimitingInterface[request])My understanding of generics here is a bit limited, but I think this is because we expect TypedRateLimitingInterface to take whatever generic request we give into the function, but we are setting it to a concrete type. I suppose this only works with the reflect magic we currently have.
There was a problem hiding this comment.
No problem! This is at least how I understand this line. It's a bit unclear to me too.
|
|
||
| // engagedCluster is a map of engaged clusters. The can come and go as the manager is running. | ||
| engagedClustersLock sync.RWMutex | ||
| engagedClusters map[string]cluster.Cluster |
There was a problem hiding this comment.
It would be super helpful if we can expose the set of engagedClusters from the manager. In Kueue for a multi cluster implementation, we have a list of active and missing clusters based on available quota for admitting workloads.
There was a problem hiding this comment.
For my understanding, you'd be looking for a list of the keys for that map, is that correct (i.e. the cluster identifiers, not the cluster.Cluster objects directly)?
There was a problem hiding this comment.
Yes, a unique identifier for the cluster, so that we can fetch the relevant client and submit workloads in that cluster.
|
|
||
| var errs []error | ||
| for _, r := range cm.clusterAwareRunnables { | ||
| if err := r.Disengage(ctx, cl); err != nil { |
There was a problem hiding this comment.
Please correct me if I'm wrong - While calling Engage we are starting a SyncingSource from multicluster cache (
controller-runtime/pkg/builder/controller.go
Line 397 in e93979e
There was a problem hiding this comment.
The way this works (but I wonder if it's confusing and should be changed to be honest) is that the implementation expects the cluster provider to cancel the clusterCtx used for Engage (and subsequently, the ctxBoundedSyncingSource) when shutting down. Compare the fleet example here canceling the stored context when figuring out that a cluster has to be removed: https://github.com/embik/controller-runtime/blob/e93979ec5cdd656809f5fbb91b858665979cc040/examples/fleet/main.go#L280
// disengage manager
if mgr != nil {
if err := mgr.Disengage(ctx, k.clusters[name]); err != nil {
k.log.Error(err, "failed to disengage manager")
}
}
// stop and forget
k.lock.Lock()
k.cancelFns[name]() // <- HERE
delete(k.clusters, name)
delete(k.cancelFns, name)
k.lock.Unlock()The ctx passed to Disengage is a different context, one that is currently unused but could be used when disengaging should be cancelable.
I'm open to input here though. I was wondering if the manager should even have a Disengage (and instead clean up its internal cluster list when the respective context gets canceled), but then the manager itself wouldn't be a cluster.Aware.
| // This is an experimental feature and is subject to change. | ||
| EngageWithDefaultCluster *bool | ||
|
|
||
| // EngageWithProvidedClusters indicates whether the controller should engage |
There was a problem hiding this comment.
| // EngageWithProvidedClusters indicates whether the controller should engage | |
| // EngageWithProviderClusters indicates whether the controller should engage |
|
As per #2746 (comment), we're focussing on getting the design presented in this PR up and running in https://github.com/multicluster-runtime/multicluster-runtime. Feel free to follow along on the new project if you are interested. |
This is a continuation of #2726 and #2207, a prototype for a multi-cluster support in controller-runtime. A design proposal is at #2746 which will need updating depending on the feedback for this PR. I've reorganized code changes into different commits than the previous attempts, but tried to keep authorship of changes faithful to the commits this work is based on.
The main change from the previous PR (#2726) is the update to typed generics support and adjustment to have a BYO request type. The implications of that is a) that you will need to bring your own typed EventHandler and b) that a wrapper function is needed to inject information (mostly the cluster name) when establishing a handler.
Apart from that, the core design of this is the cluster provider that can be plugged into a manager. This unlocks different ways to discover the Kubernetes "fleet" that is being reconciled.