feat: auto tolerate daemonsets with MAP #117
pehlicd wants to merge 3 commits into kubernetes-sigs:main
Conversation
✅ Deploy Preview for node-readiness-controller canceled.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: pehlicd. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Signed-off-by: pehlicd <furkanpehlivan34@gmail.com>
pehlicd force-pushed from 9676992 to 24fde7b
Thank you so much for the PR @pehlicd. I have left a few in-line comments. In addition to those, how do you suggest handling RBAC for the controller to be able to update the taints-key data in the readiness-taints configmap? For my local testing, I manually created the required RBAC. (updated later) Adding logs from my local testing, if you need to reproduce:
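(For reference, a minimal sketch of what that manually created RBAC could look like, assuming the controller's service account is nrr-controller-manager in the nrr-system namespace, as seen in the error logs later in this thread; the Role/RoleBinding names are made up for illustration:)

```yaml
# Sketch only: grants the controller's service account permission to manage
# the readiness-taints configmap. Role/RoleBinding names are illustrative.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: readiness-taints-editor   # hypothetical name
  namespace: nrr-system
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: readiness-taints-editor   # hypothetical name
  namespace: nrr-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: readiness-taints-editor
subjects:
  - kind: ServiceAccount
    name: nrr-controller-manager
    namespace: nrr-system
```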
Also adding: I was able to test all the scenarios listed here (with the changes I suggested in the comments above). In addition, I also tested deleting a node-readiness-rule, which removes the corresponding taint key from the readiness-taints configmap while leaving the already-injected tolerations on the DaemonSets, which I believe is how it should be (we should not be removing any existing tolerations, only inject new ones - the MAP rule suggested in the PR achieves that).
/retest
Signed-off-by: pehlicd <furkanpehlivan34@gmail.com>
```go
// +kubebuilder:rbac:groups=readiness.node.x-k8s.io,resources=nodereadinessrules/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=readiness.node.x-k8s.io,resources=nodereadinessrules/finalizers,verbs=update
// +kubebuilder:rbac:groups="",resources=configmaps,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups="",resources=nodes,verbs=get;list;watch;create;update;patch;delete
```
Although we already had RBAC roles for nodes, they were missing from the inline kubebuilder markers, so I added them.
@pehlicd, I looked into it further.
Firstly, the node resource RBAC markers added above: I don't think we need them at all for this PR's purpose. (Correction: they're already added in the node controller, in node-readiness-controller/internal/controller/node_controller.go, lines 84 to 86 at 354b151.)
So, let's remove the redundant markers from here.
What I realised is that we don't need the above cluster-scoped configmap RBAC either, because we already have a namespace-scoped RBAC role for configmaps (leader-election-role), which is better IMO in terms of permission surface area and serves our purpose: node-readiness-controller/config/rbac/leader_election_role.yaml, lines 9 to 21 at 354b151.
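(For context, the standard kubebuilder scaffold's leader-election Role is namespace-scoped and grants roughly the following; this is a sketch of the common scaffold, not necessarily the exact contents of the linked file:)

```yaml
# Sketch of a typical kubebuilder leader-election Role (namespace-scoped).
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: leader-election-role
  namespace: nrr-system
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "patch"]
```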
The only problem is that currently the (controller-runtime informer) cache is trying to watch configmaps at the cluster scope (see the error below again), but our namespaced role only provides permissions for the nrr-system namespace, so I suggest we configure the cache to only watch configmaps in our required namespace.
```
❯ kubectl logs -n nrr-system deployment/nrr-controller-manager --tail=30 | grep -i configmap
2026-02-09T09:46:35Z ERROR controller-runtime.cache.UnhandledError Failed to watch {"reflector": "pkg/mod/k8s.io/client-go@v0.34.0/tools/cache/reflector.go:290", "type": "*v1.ConfigMap", "error": "failed to list *v1.ConfigMap: configmaps is forbidden: User \"system:serviceaccount:nrr-system:nrr-controller-manager\" cannot list resource \"configmaps\" in API group \"\" at the cluster scope"}
```
I tried the following and it worked:
```diff
diff --git a/cmd/main.go b/cmd/main.go
index 34652e9..1527845 100644
--- a/cmd/main.go
+++ b/cmd/main.go
@@ -28,11 +28,14 @@ import (
 	_ "k8s.io/client-go/plugin/pkg/client/auth"
 	"k8s.io/client-go/rest"
+	corev1 "k8s.io/api/core/v1"
 	"k8s.io/apimachinery/pkg/runtime"
 	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
 	"k8s.io/client-go/kubernetes"
 	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
 	ctrl "sigs.k8s.io/controller-runtime"
+	"sigs.k8s.io/controller-runtime/pkg/cache"
+	"sigs.k8s.io/controller-runtime/pkg/client"
 	"sigs.k8s.io/controller-runtime/pkg/healthz"
 	"sigs.k8s.io/controller-runtime/pkg/log/zap"
 	"sigs.k8s.io/controller-runtime/pkg/metrics/filters"
@@ -107,6 +110,15 @@ func main() {
 		HealthProbeBindAddress: probeAddr,
 		LeaderElection:         enableLeaderElection,
 		LeaderElectionID:       "ba65f13e.readiness.node.x-k8s.io",
+		Cache: cache.Options{
+			ByObject: map[client.Object]cache.ByObject{
+				&corev1.ConfigMap{}: {
+					Namespaces: map[string]cache.Config{
+						"nrr-system": {},
+					},
+				},
+			},
+		},
 	})
 	if err != nil {
 		setupLog.Error(err, "unable to start manager")
```

```go
// +kubebuilder:rbac:groups=readiness.node.x-k8s.io,resources=nodereadinessrules,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=readiness.node.x-k8s.io,resources=nodereadinessrules/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=readiness.node.x-k8s.io,resources=nodereadinessrules/finalizers,verbs=update
// +kubebuilder:rbac:groups="",resources=configmaps,verbs=get;list;watch;create;update;patch;delete
```
> In addition to those, how do you suggest handling RBAC for the controller to be able to update taints-key data in the readiness-taints configmap?

The PR currently misses that piece.
@Priyankasaggu11929, regarding your comment above, I added these inline kubebuilder markers and regenerated the manifests.
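(For reference, assuming controller-gen behaves as usual, the configmaps marker above should produce roughly this rule in the regenerated config/rbac/role.yaml; a sketch, not the exact generated output:)

```yaml
# Sketch of the rule controller-gen emits for the configmaps marker;
# verbs are sorted alphabetically in the generated ClusterRole.
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
```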
Thanks for the detailed review, @Priyankasaggu11929. I believe I have addressed all the feedback. Please let me know if anything else is needed.
/retest
@pehlicd: Cannot trigger testing until a trusted user reviews the PR and leaves an `/ok-to-test` message.

In response to this:

> /retest
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/retest
/ok-to-test
Signed-off-by: pehlicd <furkanpehlivan34@gmail.com>
/retest
```yaml
- name: taintKeys
  expression: |
    ("taint-keys" in params.data) && params.data["taint-keys"] != "" ?
      params.data["taint-keys"].split(",") : []
```
| params.data["taint-keys"].split(",") : [] | |
| params.data["taint-keys"].split(",").filter(key, key != "") : [] |
Currently, if someone intentionally adds an empty taint key to the configmap, our MAP policy will create a matching empty-key wildcard toleration. (An empty key with operator Exists matches all taints, so the DaemonSet would tolerate everything.) For example:
```
❯ kubectl patch configmap readiness-taints -n nrr-system --type=merge -p '{"data":{"taint-keys":"readiness.k8s.io/NetworkReady,,readiness.k8s.io/StorageReady"}}'
configmap/readiness-taints patched

// notice the two commas, which basically mean an empty taint key
❯ kubectl get configmap -n nrr-system readiness-taints -o jsonpath='{.data.taint-keys}'
readiness.k8s.io/NetworkReady,,readiness.k8s.io/StorageReady

// the above empty taint injected an empty-key wildcard toleration on the ds
❯ kubectl get ds test-ds -o jsonpath='{.spec.template.spec.tolerations[*]}' | jq
{
  "effect": "NoSchedule",
  "key": "readiness.k8s.io/NetworkReady",
  "operator": "Exists"
}
{
  "effect": "NoSchedule",
  "operator": "Exists"
}
{
  "effect": "NoSchedule",
  "key": "readiness.k8s.io/StorageReady",
  "operator": "Exists"
}
```

```yaml
paramRef:
  name: readiness-taints
  namespace: nrr-system
  parameterNotFoundAction: Deny
```
Does parameterNotFoundAction: Deny block daemonset creation in the absence of this configmap param?
Yes, in my local testing with the PR changes, it blocks the creation of new daemonsets. It works if a user creates node-readiness-rules first and then daemonsets, but the other way around blocks the creation of new daemonsets.
I tested the behaviour by disabling the empty configmap file in the kustomization.yaml, because I was wondering why we need an empty configmap file at all if the configmap creation is taken care of by the controller itself - https://github.com/kubernetes-sigs/node-readiness-controller/pull/117/changes#diff-b857080def3b2fecd3967ca35b1bdc29564dda0b064fa902fa24efcd55471899R721-R727
But the thing is, the controller logic only kicks in when a user creates a new node-readiness-rule. So if a user creates a daemonset before creating any node-readiness-rules, that daemonset creation is blocked. Shipping the empty configmap file alongside the MAP Policy/Policy-Binding fixes it, as the sketch below shows.
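(For reference, the empty param configmap shipped with the kustomization would presumably look something like this; a sketch, the actual manifest is in the PR diff, which I haven't copied here:)

```yaml
# Sketch: placeholder param configmap so the MAP binding's paramRef
# resolves even before any node-readiness-rule exists.
apiVersion: v1
kind: ConfigMap
metadata:
  name: readiness-taints
  namespace: nrr-system
data:
  taint-keys: ""
```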
```
❯ kubectl get node nrr-test-worker -o jsonpath={.spec.taints} | jq .
[
  {
    "effect": "NoSchedule",
    "key": "readiness.k8s.io/NetworkReady",
    "value": "pending"
  }
]

// applying MAP policy/policy-binding without the empty configmap
❯ kubectl apply -k config/admission-policy
mutatingadmissionpolicy.admissionregistration.k8s.io/inject-daemonset-readiness-tolerations created
mutatingadmissionpolicybinding.admissionregistration.k8s.io/inject-daemonset-readiness-tolerations-binding created

❯ kubectl get cm -n nrr-system
NAME               DATA   AGE
kube-root-ca.crt   1      41s

// daemonset creation is blocked
❯ kubectl apply -f test-daemonset.yaml
Error from server (Forbidden): error when creating "test-daemonset.yaml": policy 'inject-daemonset-readiness-tolerations' with binding 'inject-daemonset-readiness-tolerations-binding' denied request: failed to configure binding: no params found for policy binding with `Deny` parameterNotFoundAction

// creating a new nrr rule, which in turn triggers configmap creation/patch per the PR changes
❯ kubectl apply -f examples/cni-readiness/network-readiness-rule.yaml
nodereadinessrule.readiness.node.x-k8s.io/network-readiness-rule created

❯ kubectl get cm readiness-taints -n nrr-system -o jsonpath={.data}
{"taint-keys":"readiness.k8s.io/NetworkReady"}

// now daemonset creation works
❯ kubectl apply -f test-daemonset.yaml
daemonset.apps/test-ds created
```
It feels like using Deny is high-risk to me. If the controller is healthy and the configmap exists, the tolerations are injected. If something is wrong with the controller, the cluster should remain functional, and users can still manually add tolerations to their DaemonSets if needed.
Auto-toleration, being an optional feature in node-readiness-controller, shouldn't be fail-closed. It can be best-effort.
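A minimal sketch of the fail-open variant, using the resource names from this PR (the API version shown may need adjusting to the cluster's MutatingAdmissionPolicy feature level):

```yaml
apiVersion: admissionregistration.k8s.io/v1alpha1  # or v1beta1, depending on cluster version
kind: MutatingAdmissionPolicyBinding
metadata:
  name: inject-daemonset-readiness-tolerations-binding
spec:
  policyName: inject-daemonset-readiness-tolerations
  paramRef:
    name: readiness-taints
    namespace: nrr-system
    # Fail open: if the param configmap is missing, skip injection
    # instead of rejecting the DaemonSet.
    parameterNotFoundAction: Allow
```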
```yaml
  - create
  - delete
```
Not from your changes, but I don't think the manager needs create or delete for node objects.
### Option 2: Direct kubectl apply

```bash
# Install CRDs first
make install

# Deploy policy and binding
kubectl apply -f config/admission-policy/policy.yaml
kubectl apply -f config/admission-policy/binding.yaml
```
Hmm, this looks redundant. Also, don't we need to show updating the configmap, and how we maintain it once readiness-taints is introduced later?
```
  ↓
Fetches Tolerations ConfigMap, which contains the tolerations to be injected
  ↓
Injects tolerations (if applicable)
```
This doesn't say what counts as applicable. Could we also add details on the annotation requirement?
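For instance, the docs could show what an opted-in DaemonSet looks like; the annotation key below is purely hypothetical, just to illustrate the shape such documentation could take:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-ds
  annotations:
    # Hypothetical annotation key, for illustration only; the real key
    # should come from the policy's match condition.
    readiness.node.x-k8s.io/inject-tolerations: "true"
spec:
  selector:
    matchLabels: {app: example}
  template:
    metadata:
      labels: {app: example}
    spec:
      containers:
        - name: example
          image: registry.k8s.io/pause:3.9
```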
```yaml
metadata:
  name: inject-daemonset-readiness-tolerations
spec:
  failurePolicy: Fail
```
```diff
-  failurePolicy: Fail
+  failurePolicy: Ignore
```
Same as discussed above: we shouldn't risk the controller becoming a single point of failure for the entire cluster's DaemonSet management.
/assign @pehlicd
Description
This PR implements automatic DaemonSet toleration injection for readiness.k8s.io/* taints using MutatingAdmissionPolicy with a ConfigMap parameter resource and automated sync via the existing controller.

Key features:
Related Issue
Fixes #7
Type of Change
/kind feature
Testing
Tested in a local kind cluster.
Checklist

- `make test` passes
- `make lint` passes

Does this PR introduce a user-facing change?
Doc #7