Skip to content

Add OVN RBAC support with per-node ovn-controller certificates#565

Open
slawqo wants to merge 1 commit into
openstack-k8s-operators:mainfrom
slawqo:issue_osprh_1922_v2
Open

Add OVN RBAC support with per-node ovn-controller certificates#565
slawqo wants to merge 1 commit into
openstack-k8s-operators:mainfrom
slawqo:issue_osprh_1922_v2

Conversation

@slawqo
Copy link
Copy Markdown
Contributor

@slawqo slawqo commented Apr 29, 2026

Enable OVN role-based access control (RBAC) on the Southbound database so that ovn-controller nodes can only modify their own chassis rows.

When the openstack-operator provides an RBAC issuer name (from a dedicated rootca-ovn-rbac CA, see patch [1]), this patch:

  • Creates per-node cert-manager Certificate CRs for each ovn-controller pod, with CN set to a deterministic UUID5 system-id derived from the node name (ComputeSystemID). This CN must match the chassis system-id for RBAC to authorize operations.

  • Copies the RBAC client cert/key into /etc/openvswitch/ on each node via the config job, and switches ovn-controller to use these dedicated paths instead of the shared OVN DB cert.

  • Mounts the RBAC CA certificate into ovsdbserver-sb pods and builds a combined CA bundle (regular CA + RBAC CA) so the SB database can verify ovn-controller client certificates.

  • Sets role=ovn-controller on the SB DB connection (port 6642) to enforce RBAC.

  • Creates a second SB DB listener on port 16642 with full (unrestricted) access, used by ovn-northd.

  • Updates inactivity probe handling in setup.sh and runtime-config.sh to iterate over all connections, since SB now has two listeners.

[1] openstack-k8s-operators/openstack-operator#1906

Depends-On: openstack-k8s-operators/openstack-operator#1906

Related: #OSPRH-1921

Closes: #OSPRH-1922

@openshift-ci openshift-ci Bot requested review from dprince and stuggi April 29, 2026 14:46
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 29, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: slawqo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

@stuggi stuggi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a dedicated rootca-ovn-rbac CA is needed here. OVN RBAC only checks that the client cert CN matches the chassis system-id — it does not care which CA signed the cert. The existing rootca-ovn CA can sign the RBAC certs just fine, which would remove the need for the new CA, the combined CA bundle in setup.sh, and the RbacCACertSecretName field. Or am I missing a point?

This would also make the EDPM side much simpler: the existing per-node EDPM OVN cert is already signed by rootca-ovn and has CN= with client auth usage, so it already works for RBAC as-is. The only thing needed on the EDPM side would be to set external-ids:system-id to the hostname so it matches the cert CN, instead of letting ovn-controller generate a random UUID. No new certs or CA changes needed for EDPM.

Also, I think using CN=nodeName instead of CN=UUID5(nodeName) would be simpler and easier to debug — ovn-sbctl list chassis would show actual node names.

What you think?

[update]
initially I was wondering if we could just do the rbac cert creation in the openstack-op for the ovn-controllers and kust provide the list of cert secrets. but then realized that the ovn-controllers use daemonset (was thinking its a statefulset or so), which makes it harder to predict which nodes it gets scheduled to from there, especially if you use nodeselector, or topologyRef. therefor its probably ok to issue the rbac certs for the ovn-controller k8s side in the operator.

Comment on lines -63 to +69
fmt.Sprintf("--certificate=%s", ovn_common.OVNDbCertPath),
fmt.Sprintf("--private-key=%s", ovn_common.OVNDbKeyPath),
fmt.Sprintf("--certificate=%s", ovn_common.OVNControllerCertPath),
fmt.Sprintf("--private-key=%s", ovn_common.OVNControllerKeyPath),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we expect RBAC is always enabled? Without RBAC configured (RbacIssuerName empty), no config job copies certs to these paths. This should probably be conditional:

  if instance.Spec.RbacIssuerName != "" {
      cmd = append(cmd, fmt.Sprintf("--certificate=%s", ovn_common.OVNControllerCertPath))
      cmd = append(cmd, fmt.Sprintf("--private-key=%s", ovn_common.OVNControllerKeyPath))  
  } else {                                                                                    
      cmd = append(cmd, fmt.Sprintf("--certificate=%s", ovn_common.OVNDbCertPath))            
      cmd = append(cmd, fmt.Sprintf("--private-key=%s", ovn_common.OVNDbKeyPath))             
  } 

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was to make it uncoditional and use it always when TLS is enabled in the environment. But we can add knob to enable/disable it if needed

}

// Add full-access (non-RBAC) port for SB when TLS is enabled
if instance.Spec.DBType == ovnv1.SBDBType && instance.Spec.TLS.Enabled() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe should also be conditional on RbacCACertSecretName != "", so only if rbac is enabled.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same answer here - my idea is to make it always enabled when TLS is enabled

${CTLCMD} set-ssl {{.OVNDB_KEY_PATH}} {{.OVNDB_CERT_PATH}} ${SSL_CA_CERT}
if [[ "${DB_TYPE}" == "sb" ]]; then
# Use RBAC for the connections of the ovn-controller to SB
${CTLCMD} set-connection role=ovn-controller ${DB_SCHEME}:${DB_PORT}:${DB_ADDR}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RBAC role is set on all SB databases when TLS is enabled, even if no RBAC certs exist. Combined with issue mentioned above if RBAC is not enabled (CA set), this means TLS-enabled deployments without RBAC will have the SB reject all ovn-controller connections.
Maybe we should do this?

  {{- if index . "OVN_RBAC_CACERT_PATH" }}
      ${CTLCMD} set-connection role=ovn-controller ${DB_SCHEME}:${DB_PORT}:${DB_ADDR}                                                                                                        
      ...                                                                                                                                     
  {{- else }}                                                                                                                                                                                
      ${CTLCMD} set-connection ${DB_SCHEME}:${DB_PORT}:${DB_ADDR}
  {{- end }}  

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I wrote above, my idea is to use RBAC always when TLS is enabled

Comment on lines +30 to +32
func ComputeSystemID(nodeName string) string {
return uuid.NewSHA1(uuid.NameSpaceDNS, []byte(nodeName)).String()
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it required to use UUID5 system-id ? wouldn't it be easier for debugging if the nodeNames are used/shown?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the current edpm nodes ovn certs already have CN=hostname. if we'd use the same CA and nodename as CN it might make transition/migration easier?

        Subject: O=rootca-ovn, CN=compute-n95n1war-0

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial idea was to use hostnames but the problem is that we would need to make changes in neutron for that as today it simply use what is in the system-id and use it as Neutron agent's ID. But more I think about it more I think that I should deeper check it in Neutron

Comment thread internal/ovncontroller/cert.go Outdated

// EnsureRbacCert creates or updates a cert-manager Certificate CR for a node's
// RBAC client certificate, and waits for the resulting Secret to become ready.
func EnsureRbacCert(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could use EnsureCert from https://github.com/openstack-k8s-operators/lib-common/blob/main/modules/certmanager/certificate.go#L172 ?

using the cert-duration from the issuer annotation https://github.com/openstack-k8s-operators/lib-common/blob/main/modules/certmanager/issuer.go#L169:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  annotations:
    cert-duration: 43800h0m0s

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread internal/ovndbcluster/statefulset.go Outdated
Name: "rbac-ca-cert",
VolumeSource: corev1.VolumeSource{
Secret: &corev1.SecretVolumeSource{
SecretName: instance.Spec.RbacCACertSecretName,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iiuc, OVN RBAC checks CN vs system-id, not CA identity. Using the existing rootca-ovn to sign RBAC certs would eliminate: the RbacCACertSecretName field, the RBAC CA volume mount on SB pods, and the combined bundle construction in setup.sh.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I am going to change it. Patches will be smaller then

Comment thread templates/ovndbcluster/bin/setup.sh Outdated
Comment on lines +23 to +28
{{- if index . "OVN_RBAC_CACERT_PATH" }}
SSL_CA_CERT=/etc/ovn/combined-ca.pem
cat {{.OVNDB_CACERT_PATH}} {{ index . "OVN_RBAC_CACERT_PATH" }} > ${SSL_CA_CERT}
{{- else }}
SSL_CA_CERT={{.OVNDB_CACERT_PATH}}
{{- end }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not required if we'd use the dedicated already existing ovn CA

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will be changed

Comment thread internal/ovncontroller/cert.go Outdated
Namespace: instance.Namespace,
Labels: labels,
},
Spec: certmgrv1.CertificateSpec{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are not setting a duration for the cert. maybe it should be longer then the default, like we do with other service certs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is changed now

Comment thread internal/ovncontroller/cert.go Outdated
return ctrl.Result{}, err
}
Log.Info("Creating RBAC certificate", "name", certName, "cn", commonName)
err = c.Create(ctx, cert)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing which can be followed up is cleanup of RBAC certs for no longer existing nodes?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add new task for this in the epic and will work on it once the initial patches will be done. Is that ok for you?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just created task for that https://redhat.atlassian.net/browse/OSPRH-29979

@slawqo slawqo force-pushed the issue_osprh_1922_v2 branch 2 times, most recently from 3792a57 to cedaa55 Compare May 4, 2026 10:38
@centosinfra-prod-github-app
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdo/buildset/101b91be51ff46aea15d87c84719977e

openstack-k8s-operators-content-provider FAILURE in 11m 14s
⚠️ ovn-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@slawqo slawqo force-pushed the issue_osprh_1922_v2 branch from cedaa55 to 6c119ce Compare May 4, 2026 12:51
@centosinfra-prod-github-app
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdo/buildset/72eb8d2539cd44a2a21eeafe26578e09

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 25m 49s
ovn-operator-tempest-multinode FAILURE in 1h 06m 37s

@slawqo slawqo force-pushed the issue_osprh_1922_v2 branch from 6c119ce to 1ced423 Compare May 5, 2026 13:20
@centosinfra-prod-github-app
Copy link
Copy Markdown

This change depends on a change that failed to merge.

Change openstack-k8s-operators/openstack-operator#1906 is needed.

@slawqo slawqo force-pushed the issue_osprh_1922_v2 branch from 1ced423 to b3fd5d2 Compare May 6, 2026 08:33
@centosinfra-prod-github-app
Copy link
Copy Markdown

This change depends on a change that failed to merge.

Change openstack-k8s-operators/openstack-operator#1906 is needed.

@centosinfra-prod-github-app
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdoproject.org/buildset/c2280103785140d5ac2d679c1a1aa86f

openstack-k8s-operators-content-provider FAILURE in 15m 30s
⚠️ ovn-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@centosinfra-prod-github-app
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdoproject.org/buildset/113f94cc647b4164acf3fd8a06d3f421

openstack-k8s-operators-content-provider FAILURE in 14m 31s
⚠️ ovn-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@slawqo slawqo force-pushed the issue_osprh_1922_v2 branch from 31029e4 to 9c6a40a Compare May 11, 2026 10:44
@slawqo slawqo force-pushed the issue_osprh_1922_v2 branch from 9c6a40a to 8e1a500 Compare May 11, 2026 10:47
@centosinfra-prod-github-app
Copy link
Copy Markdown

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/openstack-operator for 1906,1fa6989c7a184fab80d53000a161f9d5eb0702d7

1 similar comment
@centosinfra-prod-github-app
Copy link
Copy Markdown

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/openstack-operator for 1906,1fa6989c7a184fab80d53000a161f9d5eb0702d7

Enable OVN role-based access control (RBAC) on the Southbound database
so that ovn-controller nodes can only modify their own chassis rows.

When the openstack-operator provides an RBAC issuer name (from a
dedicated rootca-ovn-rbac CA, see patch [1]), this patch:

* Creates per-node cert-manager Certificate CRs for each ovn-controller
  pod, with CN set to a deterministic UUID5 system-id derived from the
  node name (ComputeSystemID). This CN must match the chassis
  system-id for RBAC to authorize operations.

* Copies the RBAC client cert/key into /etc/openvswitch/ on each node
  via the config job, and switches ovn-controller to use these dedicated
  paths instead of the shared OVN DB cert.

* Mounts the RBAC CA certificate into ovsdbserver-sb pods and builds a
  combined CA bundle (regular CA + RBAC CA) so the SB database can
  verify ovn-controller client certificates.

* Sets role=ovn-controller on the SB DB connection (port 6642) to enforce RBAC.

* Creates a second SB DB listener on port 16642 with full (unrestricted)
  access, used by ovn-northd.

* Updates inactivity probe handling in setup.sh and runtime-config.sh to
  iterate over all connections, since SB now has two listeners.

* ovn-controller POD now waits for the Northd to be ready before start,
  it is done to avoid race condition when ovn-controller POD could be
  started before Northd would populate RBAC rules in the SB DB and that
  could cause issues with connection of the ovn-controller to the SB DB.

* remove OVS DaemonSet readiness gate in the ovncontroller controller -
  it was there to make sure that local ovsdb is up so that config job
  would be able to store config values in it. But init scripts are
  already waiting actively for the ovsdb to become active before it
  anything else will be done. This check is also causing deadlock with
  deploying ovs and ovn-controller PODs now with RBAC enabled as
  ovn-controller needs to have certificates ready to start and create
  br-int brigde. That can't be done if the config job is not started and
  config job couldn't be started because ovncontroller controller was
  waiting for the OVS DaemonSet to be ready.

* Remove OVS DaemonSet readiness gate from the ovncontroller controller -
  the gate ensured that the local ovsdb was running before the config job
  attempted to store configuration values. However, the init scripts
  already poll for ovsdb availability before doing anything else, making
  the gate redundant.
  With RBAC enabled, this gate also causes a deadlock: ovn-controller
  needs its RBAC certificates to start and create the br-int bridge, but
  those certificates are deployed by the config job, which cannot run
  until the OVS DaemonSet is ready — and the OVS DaemonSet cannot become
  ready without br-int.

[1] openstack-k8s-operators/openstack-operator#1906

Related: #OSPRH-1921
Closes: #OSPRH-1922

Assisted-by: claude-opus-4.6

Signed-off-by: Slawek Kaplonski <skaplons@redhat.com>
@slawqo slawqo force-pushed the issue_osprh_1922_v2 branch from 8e1a500 to 825808e Compare May 11, 2026 13:19
return ctrl.Result{}, fmt.Errorf("failed to look up OVNNorthd: %w", err)
}
if northd == nil || northd.Status.ReadyCount == 0 {
Log.Info("OVNNorthd is not ready yet, waiting before deploying OVNController...")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will check more tomorrow, but this got my attention and got curios on the impact on minor updates and adoption when tls is enabled?
like till now we allow updates to process in multiple maintenance windows. After this change will that stop work or /me missing something here? https://docs.redhat.com/en/documentation/red_hat_openstack_services_on_openshift/18.0/html/updating_your_environment_to_the_latest_maintenance_release/assembly_preparing-for-a-minor-update_update-rhoso#exit-the-update-process-if-an-issue-occurs_preparing-minor-update

Also this seems will impact even without minor update and just with operator updates. i.e with operator updates ovn operator reconcile RBAC changes will get applied on control plane side, and with that OVN controllers on dataplane will not be able to connect until minor update is done?

Also in case of OVN agent which also connects with NB DB, is there some issue on doing with NB? or that will be taken care separately?

@centosinfra-prod-github-app
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdoproject.org/buildset/9ab19fa585504d03b9e9f4c89910bcf4

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 11m 07s
ovn-operator-tempest-multinode FAILURE in 1h 51m 35s

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 11, 2026

@slawqo: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/ovn-operator-build-deploy-kuttl 825808e link true /test ovn-operator-build-deploy-kuttl

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants