Adds support to deploy cyborg controlplane services #1102

Open
amoralej wants to merge 7 commits into openstack-k8s-operators:main from amoralej:add-cyborg

@amoralej amoralej commented Apr 15, 2026

Add support for OpenStack Cyborg (accelerator lifecycle management service) in nova-operator, introducing three new CRDs and their controllers.

  • Define Cyborg, CyborgAPI and CyborgConductor CRDs with full spec/status types, defaulting and validation webhooks, and printer columns for oc get cyborg/cyborgapi/cyborgconductor.
  • Implement the Cyborg top-level controller: manages RBAC, MariaDB database and account, RabbitMQ TransportURL, Keystone service registration, DB sync job, and creates a sub-level secret that aggregates credentials (DB, transport URL, service password) for consumption by the child CRs.
  • Implement the CyborgAPI controller: renders WSGI/httpd configuration, creates a StatefulSet for cyborg-api pods with TLS support, and registers Keystone public/internal endpoints.
  • Implement the CyborgConductor controller: renders conductor configuration and creates a StatefulSet for cyborg-conductor.
  • All controllers are gated behind the ENABLE_CYBORG=true environment variable.
  • Add kuttl end-to-end tests validating CR conditions, sub-level secret content, config-data secrets, DB sync job completion, StatefulSet readiness, Services, and pod volume mounts.
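
The environment-variable gate described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the helper name `cyborgEnabled` is hypothetical, and parsing the value with `strconv.ParseBool` is an assumption (the description only states that `ENABLE_CYBORG=true` enables the controllers).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// cyborgEnabled reports whether a raw ENABLE_CYBORG value should turn the
// Cyborg controllers on. Hypothetical helper: the PR does not say how the
// value is parsed. strconv.ParseBool also accepts "1", "t" and "TRUE",
// and anything unparsable (including the empty string) counts as disabled.
func cyborgEnabled(value string) bool {
	enabled, err := strconv.ParseBool(value)
	return err == nil && enabled
}

func main() {
	if cyborgEnabled(os.Getenv("ENABLE_CYBORG")) {
		// In a real manager this is where the Cyborg, CyborgAPI and
		// CyborgConductor reconcilers and webhooks would be registered.
		fmt.Println("Cyborg controllers enabled")
		return
	}
	fmt.Println("Cyborg controllers disabled")
}
```

Treating an unset or malformed value as "disabled" keeps the default behaviour of the operator unchanged for deployments that do not opt in.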

Assisted-By: Claude

Jira: OSPRH-27674

Using operator-sdk command:

operator-sdk create api --group cyborg --version v1beta1 --kind Cyborg --resource --controller
operator-sdk create api --group cyborg --version v1beta1 --kind CyborgAPI --resource --controller
operator-sdk create api --group cyborg --version v1beta1 --kind CyborgConductor --resource --controller

Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
Define CRD specs for Cyborg, CyborgAPI and CyborgConductor resources:
- Add CyborgSpec with DB, RabbitMQ, Keystone and TLS configuration
- Add CyborgAPISpec and CyborgConductorSpec with configSecret,
  replicas, resources, nodeSelector and TLS fields
- Implement defaulting and validation webhooks for all three CRDs
- Register CRDs in the operator scheme
- Update CRD YAML manifests and CSV for OLM

Reconcile and configuration logic will be created in next commits.

Assisted-By: Claude
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
Add full reconcile logic for the Cyborg CR:
- Manage RBAC resources (ServiceAccount, Role, RoleBinding)
- Validate input password secret and RabbitMQ TransportURL secret
- Create MariaDB database and run DB sync job via a batch Job
- Register Cyborg service in Keystone
- Create a sub-level secret aggregating DB credentials, transport URL
  and service password to be consumed by CyborgAPI and CyborgConductor
- Track readiness via structured conditions on CyborgStatus
- Add functional tests covering the full reconcile flow

Assisted-By: Claude
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
Add full reconcile logic for the CyborgConductor CR:
- Validate input from the config secret created by the Cyborg controller
- Generate conductor config from templates (00-default.conf)
- Create a StatefulSet to run cyborg-conductor pods
- Track readiness (ReadyCount, conditions, hash, topology)
- Expose IsReady and topology helpers on CyborgConductor type
- Update CyborgConductorStatus with structured conditions and hash
- Extend Cyborg controller to propagate conductor and check readiness upwards
- Add functional tests for the conductor reconcile loop

Assisted-By: Claude
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
Add full reconcile logic for the CyborgAPI CR:
- Validate input from config secret provided by the Cyborg controller
- Render WSGI/httpd and cyborg-api configuration templates
- Create a StatefulSet for cyborg-api pods with TLS support
- Register Keystone endpoints (public and internal) for the API
- Track readiness (ReadyCount, conditions, hash, topology)
- Expose IsReady and topology helpers on CyborgAPI type
- Extend Cyborg controller to create CyborgAPI and check readiness upwards
- Add functional tests covering the full API reconcile flow

Assisted-By: Claude
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
Add an end-to-end kuttl test suite for the Cyborg operator:
- Cleanup step to delete any pre-existing Cyborg CR before the test
- Deploy step creating a full Cyborg CR (cyborg-kuttl)
- Assert step verifying all conditions are True on Cyborg, CyborgAPI,
  CyborgConductor and MariaDBDatabase CRs
- Error step covering missing-dependency failure scenarios
- Register cyborg container images (api, conductor, agent) as default
  RELATED_IMAGE env vars in the manager deployment
- Enable ENABLE_CYBORG=true in the CI webhook deploy script

Assisted-By: Claude
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
Similar to what is done for CyborgAPI and CyborgConductor and other OpenStack CRDs.

Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
@openshift-ci openshift-ci Bot requested review from gibizer and jamepark4 April 15, 2026 14:38

openshift-ci Bot commented Apr 15, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: amoralej
Once this PR has been reviewed and has the lgtm label, please assign stuggi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/3a20fb346aa44e9d80abc0ba1ff8cdf5

✔️ openstack-meta-content-provider SUCCESS in 2h 58m 29s
✔️ nova-operator-kuttl SUCCESS in 53m 19s
nova-operator-tempest-multinode RETRY_LIMIT in 3m 43s
✔️ nova-operator-tempest-multinode-ceph SUCCESS in 2h 41m 14s

@amoralej

check-rdo

  labels:
    app.kubernetes.io/name: nova-operator
    app.kubernetes.io/managed-by: kustomize
  name: cyborg-cyborg-admin-role

The existing roles for nova do not repeat the service name, e.g. nova_admin_role, novaconductor_admin_role. If possible, I think it would be good to use the same pattern for the cyborg roles.

// ensureTopology - when a Topology CR is referenced, remove the
// finalizer from a previous referenced Topology (if any), and retrieve the
// newly referenced topology object
func ensureTopology(

this function seems to be an exact duplicate of

. This might be a question more for the nova-operator maintainers, but is there a way to reuse code between the controllers of different services?

{{ end }}
auth_url = {{ .KeystoneAuthURL }}
interface = internal
{{ if (index . "Region") }}region_name = {{ index . "Region" }}{{ end }}

nit: for readability, I think it would be better to have this condition in three lines like the one below

}

// GetSampleTopologySpec - An opinionated Topology Spec sample used to
// test Nova components. It returns both the user input representation

s/Nova/Cyborg

# --- Sub-level secret (cyborg-kuttl) content ---
# transport_url must be a RabbitMQ URL created from the TransportURL CR
- script: |
    oc get secret cyborg-kuttl -n nova-kuttl-default \

it would be best to use the $NAMESPACE variable instead of hardcoding the namespace value
