Add OVN RBAC support with per-node ovn-controller certificates#565
Add OVN RBAC support with per-node ovn-controller certificates#565slawqo wants to merge 1 commit into
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: slawqo The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
7ec7d97 to
49c6367
Compare
There was a problem hiding this comment.
I don't think a dedicated rootca-ovn-rbac CA is needed here. OVN RBAC only checks that the client cert CN matches the chassis system-id — it does not care which CA signed the cert. The existing rootca-ovn CA can sign the RBAC certs just fine, which would remove the need for the new CA, the combined CA bundle in setup.sh, and the RbacCACertSecretName field. Or am I missing a point?
This would also make the EDPM side much simpler: the existing per-node EDPM OVN cert is already signed by rootca-ovn and has CN= with client auth usage, so it already works for RBAC as-is. The only thing needed on the EDPM side would be to set external-ids:system-id to the hostname so it matches the cert CN, instead of letting ovn-controller generate a random UUID. No new certs or CA changes needed for EDPM.
Also, I think using CN=nodeName instead of CN=UUID5(nodeName) would be simpler and easier to debug — ovn-sbctl list chassis would show actual node names.
What you think?
[update]
initially I was wondering if we could just do the rbac cert creation in the openstack-op for the ovn-controllers and kust provide the list of cert secrets. but then realized that the ovn-controllers use daemonset (was thinking its a statefulset or so), which makes it harder to predict which nodes it gets scheduled to from there, especially if you use nodeselector, or topologyRef. therefor its probably ok to issue the rbac certs for the ovn-controller k8s side in the operator.
| fmt.Sprintf("--certificate=%s", ovn_common.OVNDbCertPath), | ||
| fmt.Sprintf("--private-key=%s", ovn_common.OVNDbKeyPath), | ||
| fmt.Sprintf("--certificate=%s", ovn_common.OVNControllerCertPath), | ||
| fmt.Sprintf("--private-key=%s", ovn_common.OVNControllerKeyPath), |
There was a problem hiding this comment.
Can we expect RBAC is always enabled? Without RBAC configured (RbacIssuerName empty), no config job copies certs to these paths. This should probably be conditional:
if instance.Spec.RbacIssuerName != "" {
cmd = append(cmd, fmt.Sprintf("--certificate=%s", ovn_common.OVNControllerCertPath))
cmd = append(cmd, fmt.Sprintf("--private-key=%s", ovn_common.OVNControllerKeyPath))
} else {
cmd = append(cmd, fmt.Sprintf("--certificate=%s", ovn_common.OVNDbCertPath))
cmd = append(cmd, fmt.Sprintf("--private-key=%s", ovn_common.OVNDbKeyPath))
}
There was a problem hiding this comment.
My idea was to make it uncoditional and use it always when TLS is enabled in the environment. But we can add knob to enable/disable it if needed
| } | ||
|
|
||
| // Add full-access (non-RBAC) port for SB when TLS is enabled | ||
| if instance.Spec.DBType == ovnv1.SBDBType && instance.Spec.TLS.Enabled() { |
There was a problem hiding this comment.
maybe should also be conditional on RbacCACertSecretName != "", so only if rbac is enabled.
There was a problem hiding this comment.
same answer here - my idea is to make it always enabled when TLS is enabled
| ${CTLCMD} set-ssl {{.OVNDB_KEY_PATH}} {{.OVNDB_CERT_PATH}} ${SSL_CA_CERT} | ||
| if [[ "${DB_TYPE}" == "sb" ]]; then | ||
| # Use RBAC for the connections of the ovn-controller to SB | ||
| ${CTLCMD} set-connection role=ovn-controller ${DB_SCHEME}:${DB_PORT}:${DB_ADDR} |
There was a problem hiding this comment.
RBAC role is set on all SB databases when TLS is enabled, even if no RBAC certs exist. Combined with issue mentioned above if RBAC is not enabled (CA set), this means TLS-enabled deployments without RBAC will have the SB reject all ovn-controller connections.
Maybe we should do this?
{{- if index . "OVN_RBAC_CACERT_PATH" }}
${CTLCMD} set-connection role=ovn-controller ${DB_SCHEME}:${DB_PORT}:${DB_ADDR}
...
{{- else }}
${CTLCMD} set-connection ${DB_SCHEME}:${DB_PORT}:${DB_ADDR}
{{- end }}
There was a problem hiding this comment.
as I wrote above, my idea is to use RBAC always when TLS is enabled
| func ComputeSystemID(nodeName string) string { | ||
| return uuid.NewSHA1(uuid.NameSpaceDNS, []byte(nodeName)).String() | ||
| } |
There was a problem hiding this comment.
is it required to use UUID5 system-id ? wouldn't it be easier for debugging if the nodeNames are used/shown?
There was a problem hiding this comment.
the current edpm nodes ovn certs already have CN=hostname. if we'd use the same CA and nodename as CN it might make transition/migration easier?
Subject: O=rootca-ovn, CN=compute-n95n1war-0
There was a problem hiding this comment.
My initial idea was to use hostnames but the problem is that we would need to make changes in neutron for that as today it simply use what is in the system-id and use it as Neutron agent's ID. But more I think about it more I think that I should deeper check it in Neutron
|
|
||
| // EnsureRbacCert creates or updates a cert-manager Certificate CR for a node's | ||
| // RBAC client certificate, and waits for the resulting Secret to become ready. | ||
| func EnsureRbacCert( |
There was a problem hiding this comment.
I think we could use EnsureCert from https://github.com/openstack-k8s-operators/lib-common/blob/main/modules/certmanager/certificate.go#L172 ?
using the cert-duration from the issuer annotation https://github.com/openstack-k8s-operators/lib-common/blob/main/modules/certmanager/issuer.go#L169:
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
annotations:
cert-duration: 43800h0m0s
| Name: "rbac-ca-cert", | ||
| VolumeSource: corev1.VolumeSource{ | ||
| Secret: &corev1.SecretVolumeSource{ | ||
| SecretName: instance.Spec.RbacCACertSecretName, |
There was a problem hiding this comment.
iiuc, OVN RBAC checks CN vs system-id, not CA identity. Using the existing rootca-ovn to sign RBAC certs would eliminate: the RbacCACertSecretName field, the RBAC CA volume mount on SB pods, and the combined bundle construction in setup.sh.
There was a problem hiding this comment.
You are right, I am going to change it. Patches will be smaller then
| {{- if index . "OVN_RBAC_CACERT_PATH" }} | ||
| SSL_CA_CERT=/etc/ovn/combined-ca.pem | ||
| cat {{.OVNDB_CACERT_PATH}} {{ index . "OVN_RBAC_CACERT_PATH" }} > ${SSL_CA_CERT} | ||
| {{- else }} | ||
| SSL_CA_CERT={{.OVNDB_CACERT_PATH}} | ||
| {{- end }} |
There was a problem hiding this comment.
not required if we'd use the dedicated already existing ovn CA
| Namespace: instance.Namespace, | ||
| Labels: labels, | ||
| }, | ||
| Spec: certmgrv1.CertificateSpec{ |
There was a problem hiding this comment.
we are not setting a duration for the cert. maybe it should be longer then the default, like we do with other service certs.
| return ctrl.Result{}, err | ||
| } | ||
| Log.Info("Creating RBAC certificate", "name", certName, "cn", commonName) | ||
| err = c.Create(ctx, cert) |
There was a problem hiding this comment.
one thing which can be followed up is cleanup of RBAC certs for no longer existing nodes?
There was a problem hiding this comment.
I will add new task for this in the epic and will work on it once the initial patches will be done. Is that ok for you?
There was a problem hiding this comment.
I've just created task for that https://redhat.atlassian.net/browse/OSPRH-29979
3792a57 to
cedaa55
Compare
|
Build failed (check pipeline). Post ❌ openstack-k8s-operators-content-provider FAILURE in 11m 14s |
cedaa55 to
6c119ce
Compare
|
Build failed (check pipeline). Post ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 25m 49s |
6c119ce to
1ced423
Compare
|
This change depends on a change that failed to merge. Change openstack-k8s-operators/openstack-operator#1906 is needed. |
1ced423 to
b3fd5d2
Compare
|
This change depends on a change that failed to merge. Change openstack-k8s-operators/openstack-operator#1906 is needed. |
|
Build failed (check pipeline). Post ❌ openstack-k8s-operators-content-provider FAILURE in 15m 30s |
|
Build failed (check pipeline). Post ❌ openstack-k8s-operators-content-provider FAILURE in 14m 31s |
31029e4 to
9c6a40a
Compare
9c6a40a to
8e1a500
Compare
|
Merge Failed. This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset. |
1 similar comment
|
Merge Failed. This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset. |
Enable OVN role-based access control (RBAC) on the Southbound database so that ovn-controller nodes can only modify their own chassis rows. When the openstack-operator provides an RBAC issuer name (from a dedicated rootca-ovn-rbac CA, see patch [1]), this patch: * Creates per-node cert-manager Certificate CRs for each ovn-controller pod, with CN set to a deterministic UUID5 system-id derived from the node name (ComputeSystemID). This CN must match the chassis system-id for RBAC to authorize operations. * Copies the RBAC client cert/key into /etc/openvswitch/ on each node via the config job, and switches ovn-controller to use these dedicated paths instead of the shared OVN DB cert. * Mounts the RBAC CA certificate into ovsdbserver-sb pods and builds a combined CA bundle (regular CA + RBAC CA) so the SB database can verify ovn-controller client certificates. * Sets role=ovn-controller on the SB DB connection (port 6642) to enforce RBAC. * Creates a second SB DB listener on port 16642 with full (unrestricted) access, used by ovn-northd. * Updates inactivity probe handling in setup.sh and runtime-config.sh to iterate over all connections, since SB now has two listeners. * ovn-controller POD now waits for the Northd to be ready before start, it is done to avoid race condition when ovn-controller POD could be started before Northd would populate RBAC rules in the SB DB and that could cause issues with connection of the ovn-controller to the SB DB. * remove OVS DaemonSet readiness gate in the ovncontroller controller - it was there to make sure that local ovsdb is up so that config job would be able to store config values in it. But init scripts are already waiting actively for the ovsdb to become active before it anything else will be done. This check is also causing deadlock with deploying ovs and ovn-controller PODs now with RBAC enabled as ovn-controller needs to have certificates ready to start and create br-int brigde. That can't be done if the config job is not started and config job couldn't be started because ovncontroller controller was waiting for the OVS DaemonSet to be ready. * Remove OVS DaemonSet readiness gate from the ovncontroller controller - the gate ensured that the local ovsdb was running before the config job attempted to store configuration values. However, the init scripts already poll for ovsdb availability before doing anything else, making the gate redundant. With RBAC enabled, this gate also causes a deadlock: ovn-controller needs its RBAC certificates to start and create the br-int bridge, but those certificates are deployed by the config job, which cannot run until the OVS DaemonSet is ready — and the OVS DaemonSet cannot become ready without br-int. [1] openstack-k8s-operators/openstack-operator#1906 Related: #OSPRH-1921 Closes: #OSPRH-1922 Assisted-by: claude-opus-4.6 Signed-off-by: Slawek Kaplonski <skaplons@redhat.com>
8e1a500 to
825808e
Compare
| return ctrl.Result{}, fmt.Errorf("failed to look up OVNNorthd: %w", err) | ||
| } | ||
| if northd == nil || northd.Status.ReadyCount == 0 { | ||
| Log.Info("OVNNorthd is not ready yet, waiting before deploying OVNController...") |
There was a problem hiding this comment.
will check more tomorrow, but this got my attention and got curios on the impact on minor updates and adoption when tls is enabled?
like till now we allow updates to process in multiple maintenance windows. After this change will that stop work or /me missing something here? https://docs.redhat.com/en/documentation/red_hat_openstack_services_on_openshift/18.0/html/updating_your_environment_to_the_latest_maintenance_release/assembly_preparing-for-a-minor-update_update-rhoso#exit-the-update-process-if-an-issue-occurs_preparing-minor-update
Also this seems will impact even without minor update and just with operator updates. i.e with operator updates ovn operator reconcile RBAC changes will get applied on control plane side, and with that OVN controllers on dataplane will not be able to connect until minor update is done?
Also in case of OVN agent which also connects with NB DB, is there some issue on doing with NB? or that will be taken care separately?
|
Build failed (check pipeline). Post ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 11m 07s |
|
@slawqo: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Enable OVN role-based access control (RBAC) on the Southbound database so that ovn-controller nodes can only modify their own chassis rows.
When the openstack-operator provides an RBAC issuer name (from a dedicated rootca-ovn-rbac CA, see patch [1]), this patch:
Creates per-node cert-manager Certificate CRs for each ovn-controller pod, with CN set to a deterministic UUID5 system-id derived from the node name (ComputeSystemID). This CN must match the chassis system-id for RBAC to authorize operations.
Copies the RBAC client cert/key into /etc/openvswitch/ on each node via the config job, and switches ovn-controller to use these dedicated paths instead of the shared OVN DB cert.
Mounts the RBAC CA certificate into ovsdbserver-sb pods and builds a combined CA bundle (regular CA + RBAC CA) so the SB database can verify ovn-controller client certificates.
Sets role=ovn-controller on the SB DB connection (port 6642) to enforce RBAC.
Creates a second SB DB listener on port 16642 with full (unrestricted) access, used by ovn-northd.
Updates inactivity probe handling in setup.sh and runtime-config.sh to iterate over all connections, since SB now has two listeners.
[1] openstack-k8s-operators/openstack-operator#1906
Depends-On: openstack-k8s-operators/openstack-operator#1906
Related: #OSPRH-1921
Closes: #OSPRH-1922