
Add fdp_update_edpm role for EDPM node updates #1

Open: mnietoji wants to merge 70 commits into fdp_update_container_images from fdp_update_edpm

mnietoji commented Nov 7, 2025

Implement EDPM node update automation for FDP updates:

  • Role fdp_update_edpm: Updates EDPM nodes declaratively via Kubernetes CRs
    • Patches OpenStackDataPlaneNodeSet CRs with updated container images
    • Configures package updates via edpm_bootstrap_packages
    • Sets up registry authentication and CA certificates
    • Creates OpenStackDataPlaneDeployment to apply changes
    • Includes hypervisor firewall configuration for registry access
  • Integration in post-deployment.yml after control plane updates
  • Zuul CI configuration for automated testing

This role enables updating Fast Data Path components on EDPM (External Data Plane Management) nodes using a declarative approach. Updates are applied by modifying Kubernetes CRs and letting the OpenStack Data Plane Operator execute the changes via native edpm-ansible roles. Works in conjunction with fdp_update_container_images to provide a complete FDP update workflow across both control plane and data plane.

Assisted-By: Claude <noreply@anthropic.com>
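To make the declarative flow above concrete, here is a minimal Python sketch of the two objects the role manipulates: a merge patch for an OpenStackDataPlaneNodeSet and a new OpenStackDataPlaneDeployment. It assumes that image overrides and the `edpm_bootstrap_packages` list are passed through `spec.nodeTemplate.ansible.ansibleVars`; the image variable name, registry, package, and nodeset names in the example are hypothetical, and the role itself does this with Ansible tasks rather than Python.

```python
import json


def nodeset_patch(images: dict[str, str], packages: list[str]) -> dict:
    """Merge patch for an OpenStackDataPlaneNodeSet (field layout assumed)."""
    # Copy the image overrides so the caller's dict is not modified
    ansible_vars = dict(images)
    # Assumption: the package list is passed as an Ansible variable
    ansible_vars["edpm_bootstrap_packages"] = list(packages)
    return {"spec": {"nodeTemplate": {"ansible": {"ansibleVars": ansible_vars}}}}


def deployment(name: str, nodesets: list[str], namespace: str = "openstack") -> dict:
    """New OpenStackDataPlaneDeployment targeting the given nodesets."""
    return {
        "apiVersion": "dataplane.openstack.org/v1beta1",
        "kind": "OpenStackDataPlaneDeployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"nodeSets": list(nodesets)},
    }


if __name__ == "__main__":
    # Hypothetical variable, image, package, and nodeset names
    patch = nodeset_patch(
        {"edpm_ovn_controller_agent_image": "registry.example.com/ovn-controller:fdp"},
        ["ovn24.03"],
    )
    print(json.dumps(patch, indent=2))
    print(json.dumps(deployment("edpm-fdp-update", ["openstack-edpm"]), indent=2))
```

The patch body could be applied with `oc patch openstackdataplanenodeset <name> --type merge -p '<json>'` and the deployment created with `oc apply -f`; as the description notes, it is the new OpenStackDataPlaneDeployment that makes the operator apply the changes.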

mnietoji force-pushed the fdp_update_container_images branch from a861282 to 2f61a6a (November 7, 2025 16:16)
mnietoji force-pushed the fdp_update_edpm branch 5 times, most recently from 02be96f to c06f97a (November 7, 2025 19:55)
mnietoji force-pushed the fdp_update_container_images branch from 2f61a6a to 90f4165 (November 7, 2025 20:02)
mnietoji force-pushed the fdp_update_edpm branch 5 times, most recently from 2f1a170 to 8e50296 (November 11, 2025 08:53)
mnietoji force-pushed the fdp_update_container_images branch 11 times, most recently from 3f92c93 to 12005c4 (November 12, 2025 16:49)
danpawlik and others added 4 commits November 12, 2025 19:23
…f-By

This commit fixes the following issues:
- the git body character count "fit" the requirements when "Signed-Off-By"
  was set
- take the commit message from the commit SHA-1 instead of computing it in
  the script in GitHub workflows

Signed-off-by: Daniel Pawlik <dpawlik@redhat.com>
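A minimal Python sketch of the corrected length check, assuming the message is read from the commit itself (for example with `git log -1 --format=%B <sha>`) and that sign-off trailers should not count toward the body. The `minimum` threshold is a hypothetical parameter; the real workflow script may measure the body differently.

```python
import re

# Sign-off trailers must not count toward the body length. Case-insensitive,
# so "Signed-Off-By" is handled as well as "Signed-off-by".
SIGNOFF_RE = re.compile(r"^\s*signed-off-by:", re.IGNORECASE)


def body_length(message: str) -> int:
    """Characters in the commit body, excluding the subject and sign-offs."""
    lines = message.splitlines()[1:]  # drop the subject line
    kept = [line for line in lines if not SIGNOFF_RE.match(line)]
    return len("\n".join(kept).strip())


def body_is_long_enough(message: str, minimum: int) -> bool:
    # `minimum` is a hypothetical threshold for illustration
    return body_length(message) >= minimum
```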
- Rename _cifmw_kustomize_deploy_olm_osp_operator_sub to
  _cifmw_kustomize_deploy_olm_osp_operator_subscription for better
  readability and consistency.

- Add dedicated cifmw_kustomize_deploy_retries_subscription parameter
  (default: 90) to allow independent configuration of Subscription vs
  InstallPlan retry timeouts.

- Fix task name from 'Wait for InstallPlan to be created' to
  'Wait for Subscription to be created' to correctly reflect what
  the task is actually waiting for.

- Update role README.md to document the new retries_subscription
  parameter in the Timeouts section.

This change improves maintainability by using more descriptive variable
names and properly separating concerns between Subscription and
InstallPlan wait operations.

Resolves: https://issues.redhat.com/browse/OSPCIX-1100
Assisted-By: Claude Code/claude-sonnet-4.5
Signed-off-by: Sergii Golovatiuk <sgolovat@redhat.com>
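The separation between the two waits can be sketched in Python as one polling helper called with independent retry counts. This only illustrates the idea: the role uses Ansible's `until`/`retries`/`delay`, any `delay` value is hypothetical, and the InstallPlan retry variable is not named in this commit.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], retries: int, delay: float) -> bool:
    """Poll `check` up to `retries` times, sleeping `delay` seconds between attempts."""
    for attempt in range(retries):
        if check():
            return True
        # No sleep after the final failed attempt
        if attempt < retries - 1:
            time.sleep(delay)
    return False
```

For example, the Subscription wait would call `wait_until(check, retries=90, delay=...)` with the documented default of `cifmw_kustomize_deploy_retries_subscription`, while the InstallPlan wait passes its own count.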
We're moving crawl_n_mask to regexp-based logic that is able
to mask any type of file (json, yaml, txt, log).

The regexps search for exact words to avoid false positives (we don't
want to mask SecretName when searching for the Secret keyword). This is
the reason for the increase in PROTECT_KEYS, which has a positive impact on performance.

The new version is also capable of masking two secrets in the same log line.

Avoid masking Ansible headers such as Task and Play.
Added multiprocessing support for parallel file masking.

Added file-masking integration tests using temporary files.

AI-Assisted: This change was developed with assistance from Claude
(Anthropic's AI assistant) for code refactoring, test development,
and PEP 8 compliance.

Signed-off-by: Enrique Vallespi Gil <evallesp@redhat.com>
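A minimal Python sketch of the exact-word masking described above. `PROTECT_KEYS` here is a hypothetical three-entry subset, and the `key: value` / `key=value` pattern and mask string are assumptions rather than crawl_n_mask's actual expressions. Parallel masking across files could wrap a per-file version of this in `multiprocessing.Pool.map`, which is not shown.

```python
import re

# Hypothetical subset; the real PROTECT_KEYS list is longer.
PROTECT_KEYS = ["password", "secret", "token"]

# \b on both sides makes the key an exact word, so "SecretName" or
# "secret_name" does not match the "secret" key.
MASK_RE = re.compile(
    r"\b(?P<key>" + "|".join(map(re.escape, PROTECT_KEYS)) + r")\b"
    + r"(?P<sep>\s*[:=]\s*)(?P<value>[^\s,]+)",
    re.IGNORECASE,
)
# Ansible TASK/PLAY headers are left untouched.
HEADER_RE = re.compile(r"^\s*(?:TASK|PLAY)\s*\[")

MASK = "**********"


def mask_line(line: str) -> str:
    if HEADER_RE.match(line):
        return line
    # re.sub replaces every match, so two secrets on one line are both masked.
    return MASK_RE.sub(lambda m: m.group("key") + m.group("sep") + MASK, line)
```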
Implement container image rebuild automation for FDP updates:
- Role fdp_update_container_images: Rebuilds container images with updated packages
  * Jinja2 templates for Dockerfile and repo configuration
- Integration in post-deployment.yml with variable validation
- Zuul CI configuration for automated testing

This role enables updating Fast Data Path components in OpenStack
control plane containers by rebuilding images with updated RPM packages
from a specified repository. Part of the broader FDP update workflow,
focusing specifically on container image management.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Miguel Angel Nieto Jimenez <mnietoji@redhat.com>
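The following Python sketch shows the kind of repo file and Dockerfile such templates might render. It swaps the role's Jinja2 templates for the standard library's `string.Template` to stay dependency-free; the file names, repo id, `gpgcheck=0`, and the `USER` handling are hypothetical and would need to match the real images.

```python
from string import Template

# Stand-in for the role's Jinja2 repo template (hypothetical repo id and options).
REPO_TEMPLATE = Template(
    "[fdp-update]\n"
    "name=FDP update packages\n"
    "baseurl=$repo_url\n"
    "enabled=1\n"
    "gpgcheck=0\n"
)

# Stand-in for the Dockerfile template: layer updated RPMs on the original image.
DOCKERFILE_TEMPLATE = Template(
    "FROM $base_image\n"
    "USER root\n"
    "COPY fdp-update.repo /etc/yum.repos.d/fdp-update.repo\n"
    "RUN dnf -y update $packages && dnf clean all\n"
    "USER $image_user\n"
)


def render(base_image: str, repo_url: str, packages: list[str], image_user: str):
    """Return (repo_file_text, dockerfile_text)."""
    repo = REPO_TEMPLATE.substitute(repo_url=repo_url)
    dockerfile = DOCKERFILE_TEMPLATE.substitute(
        base_image=base_image, packages=" ".join(packages), image_user=image_user
    )
    return repo, dockerfile
```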
fmount and others added 2 commits November 13, 2025 17:04
rgw_frontend_ssl_certificate has been deprecated in ceph8 and no longer works
properly in ceph9. There's a new way of setting both the cert and key when SSL
is used, which is fully documented in [1].
This patch still preserves the old way of deploying RGW through a new variable
used to execute the old code. When "rgw_ssl_backward_compatibility" is set,
the old facts are set, populating the old variables; otherwise
the new method based on ssl_cert and ssl_key is applied.

[1] https://docs.ceph.com/en/latest/cephadm/services/rgw/

Signed-off-by: Francesco Pantano <fpantano@redhat.com>
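A Python sketch of the branch controlled by `rgw_ssl_backward_compatibility`, returning only the SSL-related keys of an RGW service spec. The field names come from this commit message; placing them directly under the service `spec`, and concatenating key and certificate into one PEM bundle for the old field, are assumptions to check against [1].

```python
def rgw_ssl_spec(cert_pem: str, key_pem: str, backward_compat: bool) -> dict:
    """SSL-related part of an RGW service spec (layout assumed, see [1])."""
    if backward_compat:
        # Old style: one deprecated field holding a combined PEM bundle
        return {"ssl": True, "rgw_frontend_ssl_certificate": key_pem + cert_pem}
    # New style: certificate and key supplied separately
    return {"ssl": True, "ssl_cert": cert_pem, "ssl_key": key_pem}
```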
openshift-metal3/dev-scripts writes an entry for virthost in
/etc/hosts, but the dnsmasq server created by libvirt-manager
does not read /etc/hosts.

Add virthost.<cluster>.<domain> to the utility DNS host record to
support devscripts local registry mirror when mirror_images is enabled.

Signed-off-by: Harald Jensås <hjensas@redhat.com>
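One way to express such a record, assuming the utility DNS is a dnsmasq instance configured with `host-record=<name>,<address>` lines (libvirt-manager may render it differently), is this small Python helper; the cluster, domain, and address in the usage are hypothetical.

```python
import ipaddress


def virthost_record(cluster: str, domain: str, ip: str) -> str:
    """dnsmasq host-record line for virthost.<cluster>.<domain>."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    return f"host-record=virthost.{cluster}.{domain},{addr}"
```

For example, `virthost_record("ocp", "example.com", "192.168.111.1")` returns `host-record=virthost.ocp.example.com,192.168.111.1`.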
mnietoji and others added 16 commits December 12, 2025 13:46
Introduce cifmw_nat64_appliance_image_url parameter to download
pre-built NAT64 images, with optional checksum verification support.

Signed-off-by: Miguel Angel Nieto Jimenez <mnietoji@redhat.com>
The kustomize lookup plugin runs on localhost (control node), but the
values.yaml file was being copied to the remote host. This caused the
generated values to be ignored when kustomize executed.

Changed to use slurp + copy with delegate_to localhost to ensure the
values.yaml file is available on the control node where kustomize runs.

Assisted-By: Claude Code/claude-sonnet-4.5
Signed-off-by: Eduardo Olivares <eolivare@redhat.com>
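`ansible.builtin.slurp` returns file content base64-encoded, so the copy delegated to localhost has to decode it (for example with the `b64decode` filter). The Python sketch below performs the equivalent decode-and-write step on a slurp-style result; the result dict and destination path are illustrative only.

```python
import base64
from pathlib import Path


def write_slurped(result: dict, dest: str) -> Path:
    """Write the content of a slurp-style result to a local file."""
    # slurp reports its encoding; anything else is unexpected here
    if result.get("encoding") != "base64":
        raise ValueError("unexpected slurp encoding")
    out = Path(dest)
    out.write_bytes(base64.b64decode(result["content"]))
    return out
```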
Added two new IPv6 deployment scenarios:
* OVS-DPDK SR-IOV with 1 nodeset (va-nfv-ovs-dpdk-sriov-ipv6.yml)
* OVS-DPDK SR-IOV with 2 nodesets (dt-nfv-ovs-dpdk-sriov-ipv6-2nodesets.yml)

[ci_gen_kustomize_values] Added IPv6 nodeset templates
  * Added edpm-nodeset-values template with IPv6 configuration
  * Added edpm-nodeset2-values template for dual nodeset deployments
  * Updated common network-values template for IPv6 support

[libvirt_manager] Fixed IPv6 address formatting
  * Updated generate_networking_data.yml to properly handle IPv6 addresses
  * Prevented bracket corruption when IPv6 addresses are embedded in YAML

Signed-off-by: Miguel Angel Nieto Jimenez <mnietoji@redhat.com>
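The usual rule behind such fixes is to keep addresses bare in YAML values and add exactly one pair of brackets only where an IPv6 address is combined with a port or URL. A Python sketch of that rule using the standard `ipaddress` module; it is not the Jinja logic in generate_networking_data.yml.

```python
import ipaddress


def host_for_url(addr: str) -> str:
    """Address ready for URLs or host:port strings.

    Existing brackets are stripped first, so repeated formatting cannot
    produce "[[fd00::1]]"; only IPv6 addresses get one pair of brackets.
    """
    ip = ipaddress.ip_address(addr.strip("[]"))
    return f"[{ip}]" if ip.version == 6 else str(ip)


def plain_address(addr: str) -> str:
    """Bare, normalized address for YAML values (never bracketed)."""
    return str(ipaddress.ip_address(addr.strip("[]")))
```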
Use correct variables for ipv6_address_mode and ipv6_ra_mode

Signed-off-by: Miguel Angel Nieto Jimenez <mnietoji@redhat.com>
Avoid IP conflicts when dnsmasq pods try to reach the dnsmasq service
running on the hypervisor by using a new NAD and LB range, ctlplane_ocp_nad

OSPRH-23100

Signed-off-by: Eduardo Olivares <eolivare@redhat.com>
… generic k8s labels

Some of the operators with older builds did not have the generic app.kubernetes.io/name label, so they need to be identified using the openstack.org/operator-name label.

Signed-off-by: Amartya Sinha <amsinha@redhat.com>
- The cifmw-client container is no longer used, and building it and pushing it to quay.io wastes resources
- Reverts: openstack-k8s-operators#2230

Signed-off-by: Amartya Sinha <amsinha@redhat.com>
With a recent passt release, a regression was introduced
where outbound TCP requests no longer work.
Until the fix [1] is available, temporarily use host networking.

Similar to openstack-k8s-operators/edpm-ansible#1088

[1] https://issues.redhat.com/browse/RHEL-136313

Related-Issue: #OSPCIX-1146
Signed-off-by: Yatin Karel <ykarel@redhat.com>
Fixing the UUID makes the value of that field predictable. This is
useful when working with GitOps principles and baremetal on virtual
machines, since the UUID is used by sushy-emulator to access the right
VM.

Signed-off-by: Cédric Jeanneret <cjeanner@redhat.com>
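The commit does not say how the UUID is fixed; one common way to get a stable value is a name-based UUIDv5, sketched here in Python. The namespace string is hypothetical and only needs to stay constant across runs.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
VM_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "vms.example.com")


def vm_uuid(vm_name: str) -> str:
    """The same VM name always yields the same UUID (name-based UUIDv5)."""
    return str(uuid.uuid5(VM_NAMESPACE, vm_name))
```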
Apply the same fix as _user_data_change: handle skipped tasks when
cifmw_config_drive_networkconfig is undefined on subsequent runs.

- Update the assert to check that _userdata/_netdata is not none; since these
  are set in defaults/main.yml, they will never be undefined.
- Update assertion to check: _net_data_change is skipped or is not changed
- Make network-config when condition consistent with user-data (add | length > 0)
- Add soft-clean test coverage to default molecule scenario

This prevents assertion failures when create-infra is run after a soft
clean where the ISO already exists but network-config vars are undefined.

Jira: OSPRH-22377

Assisted-By: Claude Code/claude-4.5-sonnet
Signed-off-by: Harald Jensås <hjensas@redhat.com>
Make the role more generic by allowing it to work outside the
ci-framework context. The `cifmw.general.ci_script` module is now
optional and can be disabled by setting:
  `cifmw_nat64_appliance_use_ci_script: false`

The role defaults to using `ci_script`.

Also refactored the image building logic into a separate
`build_image.yml` task file for better code organization.

Assisted-By: Claude Code/claude-4.5-sonnet
Signed-off-by: Harald Jensås <hjensas@redhat.com>
There is a networking issue affecting the communications between
vexxhost and ibm providers. In order to mitigate the issue, I am
creating some nodesets which will run in vexxhost only so that we can be
sure that provider and edpm or kuttl jobs run in the same infrastructure.

Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
Add retries, delay, and until parameters to the podman_image task to handle temporary network failures when pulling the sushy emulator container image from the registry.

Signed-off-by: Vito Castellano <vcastell@redhat.com>
mnietoji force-pushed the fdp_update_edpm branch 2 times, most recently from 39a340c to 93e41a8 (December 22, 2025 10:24)
amartyasinha and others added 8 commits December 22, 2025 16:36
Currently, changes to our GitHub workflows do not impact edpm jobs. Not triggering edpm jobs for such changes will save resources.

Signed-off-by: Amartya Sinha <amsinha@redhat.com>
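In Zuul this is typically expressed with a job's `irrelevant-files` attribute: the job is skipped when every changed file matches one of the listed regexes. A Python sketch of that matching rule, with a hypothetical pattern for `.github/workflows/`:

```python
import re

# Hypothetical irrelevant-files pattern for edpm jobs
IRRELEVANT = [re.compile(r"^\.github/workflows/.*$")]


def should_run(changed_files: list[str], irrelevant=IRRELEVANT) -> bool:
    """Run the job unless every changed file matches an irrelevant pattern."""
    return not all(any(p.match(f) for p in irrelevant) for f in changed_files)
```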
The previous default version is very old and should be updated.

Signed-off-by: Daniel Pawlik <dpawlik@redhat.com>
In order to run tobiko after update, a different
cifmw_test_operator_tobiko_name is needed.

OSPRH-23880

Signed-off-by: Eduardo Olivares <eolivare@redhat.com>
Any module in a task should be called through its fully-qualified collection name (FQCN) to reduce ambiguity.
Ref: https://docs.ansible.com/projects/lint/rules/fqcn/

Signed-off-by: Amartya Sinha <amsinha@redhat.com>
There are multiple variables related to the openstack namespace in OCP. This commit consolidates them into a single variable defined in group_vars. In some places, the namespace was being read from cifmw_install_yamls_defaults, so that value is used for the cifmw_openstack_namespace variable in group_vars, with a default value of openstack.

Signed-off-by: Amartya Sinha <amsinha@redhat.com>
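The resulting precedence can be sketched in Python as follows. The `NAMESPACE` key inside `cifmw_install_yamls_defaults` and the explicit-override step are assumptions; in group_vars the same idea would be a Jinja expression ending in the `default('openstack')` filter.

```python
def openstack_namespace(hostvars: dict) -> str:
    """Resolve the OpenStack namespace (precedence and NAMESPACE key assumed)."""
    # An explicitly set cifmw_openstack_namespace wins
    explicit = hostvars.get("cifmw_openstack_namespace")
    if explicit:
        return explicit
    # Otherwise reuse the install_yamls value, falling back to "openstack"
    defaults = hostvars.get("cifmw_install_yamls_defaults") or {}
    return defaults.get("NAMESPACE", "openstack")
```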
The krb_request module is used in some DS Zuul CI playbooks, and Zuul runs a restricted version of Ansible, so the FQCN call of the module does not get resolved.

Making the custom module work with FQCN there would require symlinking the entire cifmw collection, which does not make sense. It is better to have an exception for such usage.

Signed-off-by: Amartya Sinha <amsinha@redhat.com>
- In previous commit [1], job to build cifmw-client container using this role was removed as no job uses containerized cifmw-client.
- This commit continues the cleanup and removes the build_push_container role.
- This PR is a manual revert of original PR which added this role [2].

[1] openstack-k8s-operators#3563
[2] openstack-k8s-operators#2257

Signed-off-by: Amartya Sinha <amsinha@redhat.com>
Lines that don't match the expected format are now skipped:
TIMESTAMP | Details
So a log line like:
"  - PLAY [Manage and Provide ironic baremetal nodes]
  stdout: "[WARNING]: Found variable using reserved name: namespace\n\n"
is not parsed and no longer causes a runtime error.

Signed-off-by: Enrique Vallespi Gil <evallesp@redhat.com>
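A Python sketch of the skip-on-mismatch parsing. The timestamp pattern is an assumption modeled on Ansible's `log_path` output (`<date> <time>,<ms> p=<pid> u=<user> n=ansible | <message>`); the real parser's expression may differ.

```python
import re

# Assumed shape: "<date> <time>[,ms] <anything> | <details>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:,\d+)?)[^|]*\|\s?(?P<details>.*)$"
)


def parse_lines(lines):
    """Yield (timestamp, details) pairs, skipping lines in any other format."""
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # e.g. wrapped "  - PLAY [...]" or stdout continuation lines
        yield m.group("timestamp"), m.group("details")
```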
amartyasinha and others added 2 commits January 7, 2026 16:46
The way the ceph playbook from ci-framework is used in adoption jobs requires custom modules to be called without FQCN (i.e. through a relative path).

Making the custom module work with FQCN there would require symlinking the entire cifmw collection, which does not make sense. It is better to have an exception for such usage.

Signed-off-by: Amartya Sinha <amsinha@redhat.com>
[fdp_update_edpm] Implement EDPM node update automation for FDP updates:
- Role fdp_update_edpm: Updates EDPM nodes declaratively via Kubernetes CRs
  * Patches OpenStackDataPlaneNodeSet CRs with updated container images
  * Configures package updates via edpm_bootstrap_packages
  * Sets up registry authentication and CA certificates
  * Creates OpenStackDataPlaneDeployment to apply changes
  * Includes hypervisor firewall configuration for registry access
- Fix hypervisor firewall configuration
  * Add delegate_to to execute iptables on correct hypervisor host
  * Previously executed on localhost instead of hypervisor
- Integration in post-deployment.yml after control plane updates
- Zuul CI configuration for automated testing

[fdp_update_container_images] Fix to properly update OpenStackVersion CR
  * Add set_fact task to build customContainerImages dict correctly

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Miguel Angel Nieto Jimenez <mnietoji@redhat.com>
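Equivalent logic in Python: build the `customContainerImages` mapping first, then wrap it in a merge patch for the OpenStackVersion CR. The field name and image reference in the usage are hypothetical, and the role does this with `set_fact` rather than Python.

```python
import json


def openstackversion_patch(images: dict[str, str]) -> dict:
    """Merge patch setting spec.customContainerImages on an OpenStackVersion CR.

    `images` maps CR field names to rebuilt image references;
    entries with empty values are dropped.
    """
    custom = {field: ref for field, ref in images.items() if ref}
    return {"spec": {"customContainerImages": custom}}


if __name__ == "__main__":
    # Hypothetical field name and image reference
    patch = openstackversion_patch(
        {"ovnControllerImage": "registry.example.com/ovn-controller:fdp"}
    )
    print(json.dumps(patch))
```

The printed body could then be passed to `oc patch openstackversion <name> --type merge -p '<json>'`.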