Merged
73 commits
cdc944e
Merge pull request #4605 from abhishek-sa1/pub/q2_ansible
abhishek-sa1 May 30, 2026
ef32c83
Merge pull request #4609 from dell/pub/q2_upgrade
abhishek-sa1 May 30, 2026
48dc462
Merge pull request #4616 from dell/pub/q2_upgrade
abhishek-sa1 Jun 1, 2026
923c43b
Add catalog generator documentation.
Venu-p1 Jun 1, 2026
39bae76
Merge pull request #4626 from dell/pub/q2_upgrade
abhishek-sa1 Jun 1, 2026
25f8503
Create setup_doca_mpi_env.sh.j2 for environment setup of mpi
Nagachandan-P Jun 2, 2026
a2c8e97
Update ci-group-login_compiler_node_aarch64.yaml.j2
Nagachandan-P Jun 2, 2026
03abe15
Update ci-group-login_compiler_node_x86_64.yaml.j2
Nagachandan-P Jun 2, 2026
b168549
Update ci-group-slurm_node_aarch64.yaml.j2
Nagachandan-P Jun 2, 2026
3922561
Update ci-group-slurm_node_x86_64.yaml.j2
Nagachandan-P Jun 2, 2026
0b64c58
Merge branch 'dell:pub/q2_upgrade' into pub/q2_upgrade
Nagachandan-P Jun 2, 2026
50eef19
Update ci-group-login_compiler_node_aarch64.yaml.j2
Nagachandan-P Jun 2, 2026
4116243
Update main.yml (#4637)
Kratika-P Jun 2, 2026
1ee2201
Merge pull request #4636 from Nagachandan-P/pub/q2_upgrade
snarthan Jun 2, 2026
62a8753
Better logging visibility (#4639)
Rajeshkumar-s2 Jun 2, 2026
353fc35
rollback and upgrade formatting and skipped issues fixed
jagadeeshnv Jun 2, 2026
f06610c
Merge branch 'dell:pub/q2_upgrade' into pub/q2_upgrade
jagadeeshnv Jun 2, 2026
429b87b
Telemetry upgrade check updated based in service_k8s present (#4638)
Kratika-P Jun 2, 2026
6753c25
build image selinux fix for aarch64 and x86_64 (#4640)
abhishek-sa1 Jun 2, 2026
ab127a5
Merge pull request #4642 from jagadeeshnv/pub/q2_upgrade
snarthan Jun 2, 2026
d283752
Fix for bss and cloudinit update for service_kube_control_plane_x86_6…
Katakam-Rakesh Jun 2, 2026
7e300f3
Merge branch 'q2_ansible_sync' into q2_upgrade_sync
abhishek-sa1 Jun 2, 2026
8f29320
Merge pull request #72 from abhishek-sa1/q2_upgrade_sync
abhishek-sa1 Jun 2, 2026
990220d
ansible 2.20 fixes
abhishek-sa1 Jun 2, 2026
f9e0d8c
lint update
abhishek-sa1 Jun 2, 2026
d9c3474
Update OME Discovery completion message for BuildStream workflow
balajikumaran-c-s Jun 2, 2026
aa030ca
Merge pull request #4647 from balajikumaran-c-s/pub/q2_upgrade
abhishek-sa1 Jun 2, 2026
0047f42
Added skip logic for rollback_slurm
jagadeeshnv Jun 2, 2026
182486e
Update install_dcgm.sh.j2
sakshi-singla-1735 Jun 3, 2026
9202134
upgrade defect fixes (#4641)
priti-parate Jun 3, 2026
a465e6f
Update install_dcgm.sh.j2
sakshi-singla-1735 Jun 3, 2026
663da9e
Merge pull request #4650 from sakshi-singla-1735/pub/q2_upgrade
sakshi-singla-1735 Jun 3, 2026
d38fe7c
Merge pull request #4646 from abhishek-sa1/q2_ansible_sync
abhishek-sa1 Jun 3, 2026
b661faf
Delegate manifest to localhost
jagadeeshnv Jun 3, 2026
44444e4
Merge pull request #4649 from mithileshreddy04/q2_upgrade_rollback_fixes
mithileshreddy04 Jun 3, 2026
0d55c35
Merge pull request #4648 from jagadeeshnv/pub/q2_upgrade
jagadeeshnv Jun 3, 2026
5082514
Merge pull request #4654 from dell/pub/q2_upgrade
abhishek-sa1 Jun 3, 2026
bbafdc7
oim cleanup issue fixed for scratch dir
Nagachandan-P Jun 3, 2026
08d4554
lint issue fixed
Nagachandan-P Jun 3, 2026
bea4595
Validation check for pxe mapping file and cluster initialised check a…
Katakam-Rakesh Jun 3, 2026
89d4490
Update slurm_custom.json
Nagachandan-P Jun 3, 2026
62e2eb1
Update slurm_custom.json
Nagachandan-P Jun 3, 2026
7245299
Merge pull request #4656 from Nagachandan-P/pub/q2_upgrade
snarthan Jun 3, 2026
3523678
Upgrade backup moved from upgrade to provision
jagadeeshnv Jun 3, 2026
6cde878
Update cleaned images via OIM cleanup to 'CLEANED' status (#4657)
Rajeshkumar-s2 Jun 3, 2026
0b20c0c
ib ip port matching enhanced logic
Nagachandan-P Jun 3, 2026
ebe7517
slurm duplicate dir fix
jagadeeshnv Jun 3, 2026
a38f3c6
Merge branch 'dell:pub/q2_upgrade' into pub/q2_upgrade
jagadeeshnv Jun 3, 2026
bd60099
Update software_utils.py
abhishek-sa1 Jun 3, 2026
50e45c6
Upgrade and rollback defect fixes (#4661)
mithileshreddy04 Jun 3, 2026
d3a7f85
Slurm backup during upgrade_provision rather than upgrade_slurm & omn…
jagadeeshnv Jun 3, 2026
e0c2d97
Merge pull request #4662 from abhishek-sa1/pub/q2_ansible
snarthan Jun 3, 2026
a1df32a
Fix for issue oim_metadata file reading
jagadeeshnv Jun 3, 2026
5dfbe95
Merge branch 'dell:pub/q2_upgrade' into pub/q2_upgrade
jagadeeshnv Jun 3, 2026
9dea7ce
Fix for issue oim_metadata read error (#4664)
jagadeeshnv Jun 3, 2026
568a9ca
Replace the hardcoded user for Postgres user (#4665)
Rajeshkumar-s2 Jun 3, 2026
7978784
telemetry mysql condition update
abhishek-sa1 Jun 3, 2026
77cf805
telemetry mysql condition update (#4666)
abhishek-sa1 Jun 3, 2026
ae2bf7c
Slurm added check for non idle nodes and timeout of reboot increased to
jagadeeshnv Jun 3, 2026
be80852
Merge branch 'dell:pub/q2_upgrade' into pub/q2_upgrade
jagadeeshnv Jun 3, 2026
960274b
Output format improved for non idle nodes message
jagadeeshnv Jun 3, 2026
04a8b01
Merge pull request #4667 from jagadeeshnv/pub/q2_upgrade
jagadeeshnv Jun 4, 2026
9f9ebeb
Merge pull request #4669 from mithileshreddy04/q2_upgrade_new_fixes
mithileshreddy04 Jun 4, 2026
4b9070d
Restore SELinux policy fix task for aarch64 (#4668)
balajikumaran-c-s Jun 4, 2026
38ecac9
Merge pull request #4658 from dell/pub/q2_ansible
abhishek-sa1 Jun 4, 2026
40f2605
Merge pull request #4621 from Venu-p1/dev/catalog-gen-readme
abhishek-sa1 Jun 4, 2026
51f8e71
Update configure-ib-network.sh.j2
Nagachandan-P Jun 4, 2026
c22280f
Merge pull request #4659 from Nagachandan-P/pub/q2_upgrade
snarthan Jun 4, 2026
47b6149
Fix for upgrade is failing in localrepo when omnia_config_credential.…
pullan1 Jun 4, 2026
bff9ad2
Fixing ansible 2.20 issues in upgrade and rollback flow (#4673)
abhishek-sa1 Jun 4, 2026
7f4e422
Merge pull request #4674 from pullan1/pub/q2_upgrade
snarthan Jun 4, 2026
8b5bf18
Simplify ID format to use package name only (version stripped) instea…
Venu-p1 Jun 4, 2026
4b3405b
Fix the incorrect pipeline status in Gitlab (#4676)
Rajeshkumar-s2 Jun 4, 2026
1 change: 1 addition & 0 deletions .github/workflows/ansible-lint.yml
@@ -10,6 +10,7 @@ on:
- pub/q2_dev
- pub/telemetry
- pub/q2_upgrade
- pub/q2_ansible

jobs:
build:
1 change: 1 addition & 0 deletions .github/workflows/pylint.yml
@@ -10,6 +10,7 @@ on:
- pub/q2_dev
- pub/telemetry
- pub/q2_upgrade
- pub/q2_ansible

jobs:
build:
1 change: 1 addition & 0 deletions build_image_aarch64/ansible.cfg
@@ -5,6 +5,7 @@ host_key_checking = false
forks = 5
timeout = 180
executable = /bin/bash
interpreter_python = /usr/bin/python3
library = ../common/library/modules
module_utils = ../common/library/module_utils

12 changes: 11 additions & 1 deletion build_image_aarch64/build_image_aarch64.yml
@@ -24,7 +24,7 @@
- name: Set dynamic run tags including 'build_aarch_image'
when: not config_file_status | default(false) | bool
ansible.builtin.set_fact:
- omnia_run_tags: "{{ (ansible_run_tags | default([]) + ['build_aarch_image']) | unique }}"
+ omnia_run_tags: "{{ (ansible_run_tags | default([]) | list + ['build_aarch_image']) | unique }}"
cacheable: true

- name: Invoke validate_config.yml to perform L1 and L2 validations with build_image tag
@@ -137,6 +137,16 @@
  roles:
    - prepare_arm_node

- name: Pre-flight SELinux policy fix on aarch64 node
  hosts: admin_aarch64
  connection: ssh
  gather_facts: false
  tasks:
    - name: Install SELinux policy module for container runtime
      ansible.builtin.include_role:
        name: image_creation
        tasks_from: preflight_selinux_check.yml

- name: Fetch packages for aarch64
  hosts: localhost
  connection: local
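
Note on the `| list` change in the first hunk of this file: the diff does not say why the filter was added. One plausible reading, which is an assumption and not confirmed anywhere in this PR, is that `ansible_run_tags` may not always arrive as a plain list under newer ansible-core, in which case `+` with a list literal would fail. The Python sketch below only illustrates that general pitfall and the normalize-then-concatenate pattern; `merge_run_tags` is a hypothetical helper, not code from the repository.

```python
def merge_run_tags(run_tags, extra_tag):
    """Return run_tags plus extra_tag, de-duplicated, keeping first-seen order."""
    # list(...) plays the role of the `| list` filter: it normalizes a tuple
    # (or any other iterable) so that `+` with a list cannot raise TypeError.
    combined = list(run_tags or []) + [extra_tag]
    # dict.fromkeys preserves insertion order, similar in effect to `| unique`.
    return list(dict.fromkeys(combined))


# A tuple input would break a bare `run_tags + [extra_tag]`.
print(merge_run_tags(("all",), "build_aarch_image"))
# An already-present tag is not duplicated.
print(merge_run_tags(["build_aarch_image"], "build_aarch_image"))
```

In plain Python, `("all",) + ["build_aarch_image"]` raises `TypeError`, which is the failure mode the explicit conversion avoids.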
15 changes: 15 additions & 0 deletions build_image_aarch64/roles/image_creation/files/omnia-crun-bpf.te
@@ -0,0 +1,15 @@
module omnia-crun-bpf 1.0;

require {
    type init_t;
    type container_runtime_t;
    class bpf prog_run;
}

#============= init_t ==============
# Fix: container-selinux policy regression on RHEL 10.2 (kernel 6.12+, crun 1.27+).
# systemd (init_t) needs prog_run on container_runtime_t bpf programs to install
# eBPF device filters on container cgroups. Without this, Podman containers fail:
# "crun: systemd failed to install eBPF device filter on cgroup ..."
# Retire this module once an updated container-selinux ships the fix.
allow init_t container_runtime_t:bpf prog_run;
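
For anyone who wants to try this policy by hand, the compile, package, and load sequence that the PR's preflight tasks run can be sketched as a list of command lines. This is an illustrative sketch only: `policy_build_commands` is a hypothetical helper, the default directory and priority are copied from the values used elsewhere in this diff, and the code only builds the argument lists without executing anything.

```python
from pathlib import PurePosixPath


def policy_build_commands(policy_dir, module="omnia-crun-bpf", priority=300):
    """Build the three command lines that compile, package, and load a module."""
    base = PurePosixPath(policy_dir) / module
    te, mod, pp = f"{base}.te", f"{base}.mod", f"{base}.pp"
    return [
        # Compile the type-enforcement source into a binary policy module.
        ["checkmodule", "-M", "-m", "-o", mod, te],
        # Wrap the compiled module into an installable policy package.
        ["semodule_package", "-o", pp, "-m", mod],
        # Install the package at the given priority.
        ["semodule", "-X", str(priority), "-i", pp],
    ]


for argv in policy_build_commands("/etc/selinux/targeted/custom"):
    print(" ".join(argv))
```

Running the printed commands requires root and the `checkpolicy` and `policycoreutils` packages on the target node.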
Original file line number Diff line number Diff line change
@@ -65,3 +65,17 @@
{{ '-e AWS_REQUEST_CHECKSUM_CALCULATION=when_required
-e AWS_RESPONSE_CHECKSUM_VALIDATION=when_required'
if s3_configurations.provider == 'powerscale' else '' }}

- name: Verify Podman can run containers
  ansible.builtin.command:
    cmd: podman run --rm localhost/{{ aarch64_local_tag }} echo ok
  register: _podman_verify
  changed_when: false
  failed_when: false
  delegate_to: "{{ aarch64_build_host }}"
  connection: ssh

- name: Fail if Podman container runtime is broken
  ansible.builtin.fail:
    msg: "{{ podman_verify_fail_msg }} Error: {{ _podman_verify.stderr | default('unknown') }}"
  when: _podman_verify.rc != 0
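
The two tasks above follow a "probe, then fail with a readable message" pattern: the probe never fails on its own (`failed_when: false`), and a separate step turns a non-zero return code into an actionable error. A minimal Python sketch of the same pattern is shown below. `verify_container_runtime` and its injectable `run` parameter are hypothetical names chosen for illustration, and the message text is shortened from `podman_verify_fail_msg`.

```python
import subprocess


def verify_container_runtime(image, run=subprocess.run):
    """Probe the container runtime; return (ok, message) instead of raising."""
    # check=False mirrors `failed_when: false`: the probe itself never raises
    # on a non-zero exit status.
    result = run(
        ["podman", "run", "--rm", image, "echo", "ok"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        # Fall back to 'unknown' when the runtime printed nothing to stderr.
        stderr = (result.stderr or "").strip() or "unknown"
        return False, f"Podman cannot start containers on this node. Error: {stderr}"
    return True, "ok"
```

Calling `verify_container_runtime("localhost/<tag>")` on a node with Podman installed runs the real probe; passing a stub as `run` allows the failure path to be exercised without a container runtime.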
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
# Copyright 2026 Dell Inc. or its subsidiaries. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
---

# Pre-flight: Install minimal SELinux policy module to fix container-selinux
# regression on RHEL 10.2 (crun 1.27 + kernel 6.12). SELinux stays enforcing.
# Retire this when an updated container-selinux ships the bpf prog_run rule.

- name: Install omnia-crun-bpf SELinux policy module
  block:
    - name: Ensure SELinux policy build tools are present
      ansible.builtin.package:
        name:
          - policycoreutils
          - checkpolicy
        state: present

    - name: Check if omnia-crun-bpf SELinux module is loaded
      ansible.builtin.command: semodule -lfull
      register: _selinux_modules
      changed_when: false

    - name: Create SELinux custom policy directory
      ansible.builtin.file:
        path: "{{ selinux_policy_dir }}"
        state: directory
        mode: "0755"
      when: selinux_module_name not in _selinux_modules.stdout

    - name: Copy omnia-crun-bpf policy source
      ansible.builtin.copy:
        src: omnia-crun-bpf.te
        dest: "{{ selinux_policy_dir }}/omnia-crun-bpf.te"
        mode: "0600"
      when: selinux_module_name not in _selinux_modules.stdout

    - name: Compile SELinux policy module
      ansible.builtin.command:
        cmd: >-
          checkmodule -M -m
          -o {{ selinux_policy_dir }}/omnia-crun-bpf.mod
          {{ selinux_policy_dir }}/omnia-crun-bpf.te
      changed_when: true
      when: selinux_module_name not in _selinux_modules.stdout

    - name: Package SELinux policy module
      ansible.builtin.command:
        cmd: >-
          semodule_package
          -o {{ selinux_policy_dir }}/omnia-crun-bpf.pp
          -m {{ selinux_policy_dir }}/omnia-crun-bpf.mod
      changed_when: true
      when: selinux_module_name not in _selinux_modules.stdout

    - name: Load SELinux policy module
      ansible.builtin.command:
        cmd: semodule -X 300 -i {{ selinux_policy_dir }}/omnia-crun-bpf.pp
      changed_when: true
      when: selinux_module_name not in _selinux_modules.stdout

  rescue:
    - name: Warn that SELinux policy module installation failed
      ansible.builtin.debug:
        msg: "{{ selinux_install_warn_msg }}"
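
Review note on the idempotency guard: `selinux_module_name not in _selinux_modules.stdout` is a substring test on the whole `semodule -lfull` output, so any module whose name merely contains `omnia-crun-bpf` would also satisfy it. A stricter check could compare the name column exactly, as in the Python sketch below. This assumes each output line has the form `<priority> <name> <type>`, optionally followed by `disabled`; `is_module_loaded` and the module name `omnia-crun-bpf-extra` are hypothetical and used only for illustration.

```python
def is_module_loaded(semodule_output, module_name):
    """Return True if module_name appears as an exact name in `semodule -lfull` output."""
    for line in semodule_output.splitlines():
        fields = line.split()
        # Assumed layout per line: <priority> <name> <type> [disabled]
        if len(fields) >= 2 and fields[1] == module_name:
            return True
    return False


sample = """\
100 abrt pp
300 omnia-crun-bpf-extra pp
"""

# A substring test would report True here; the exact match does not.
print("omnia-crun-bpf" in sample)                   # True
print(is_module_loaded(sample, "omnia-crun-bpf"))   # False
```

Whether a module listed as `disabled` should count as loaded is a separate decision that this sketch does not make.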
13 changes: 13 additions & 0 deletions build_image_aarch64/roles/image_creation/vars/main.yml
@@ -54,3 +54,16 @@ compute_image_failure_msg: |
openchami_compute_image_config_template: "{{ role_path }}/templates/images/rhel-compute-config.yaml.j2"
storage_config_file_path: "{{ input_project_dir }}/storage_config.yml"
storage_config_syntax_fail_msg: "Failed to load storage_config.yml due to syntax error"

# preflight_selinux_check.yml
selinux_module_name: "omnia-crun-bpf"
selinux_policy_dir: "/etc/selinux/targeted/custom"
podman_verify_fail_msg: >-
  Podman cannot start containers on this node.
  Ensure the node was rebooted after the kernel update and that the
  omnia-crun-bpf SELinux module loaded successfully
  (semodule -lfull | grep omnia-crun-bpf).
selinux_install_warn_msg: >-
  WARNING: Failed to install omnia-crun-bpf SELinux policy module.
  Build will continue but may fail if the eBPF device filter issue is present.
  Check SELinux policy tools and audit log on this node.
1 change: 1 addition & 0 deletions build_image_x86_64/ansible.cfg
@@ -5,6 +5,7 @@ host_key_checking = false
forks = 5
timeout = 180
executable = /bin/bash
interpreter_python = /usr/bin/python3
library = ../common/library/modules
module_utils = ../common/library/module_utils

12 changes: 11 additions & 1 deletion build_image_x86_64/build_image_x86_64.yml
@@ -24,7 +24,7 @@
- name: Set dynamic run tags including 'build_image'
when: not config_file_status | default(false) | bool
ansible.builtin.set_fact:
- omnia_run_tags: "{{ (ansible_run_tags | default([]) + ['build_image']) | unique }}"
+ omnia_run_tags: "{{ (ansible_run_tags | default([]) | list + ['build_image']) | unique }}"
cacheable: true

- name: Invoke validate_config.yml to perform L1 and L2 validations with build_image tag
@@ -97,6 +97,16 @@
oim_group: true
tags: always

- name: Pre-flight SELinux policy fix on OIM node
  hosts: oim
  connection: ssh
  gather_facts: false
  tasks:
    - name: Install SELinux policy module for container runtime
      ansible.builtin.include_role:
        name: image_creation
        tasks_from: preflight_selinux_check.yml

- name: Configure auth for OpenCHAMI
  hosts: oim
  connection: ssh
15 changes: 15 additions & 0 deletions build_image_x86_64/roles/image_creation/files/omnia-crun-bpf.te
@@ -0,0 +1,15 @@
module omnia-crun-bpf 1.0;

require {
    type init_t;
    type container_runtime_t;
    class bpf prog_run;
}

#============= init_t ==============
# Fix: container-selinux policy regression on RHEL 10.2 (kernel 6.12+, crun 1.27+).
# systemd (init_t) needs prog_run on container_runtime_t bpf programs to install
# eBPF device filters on container cgroups. Without this, Podman containers fail:
# "crun: systemd failed to install eBPF device filter on cgroup ..."
# Retire this module once an updated container-selinux ships the fix.
allow init_t container_runtime_t:bpf prog_run;
Original file line number Diff line number Diff line change
@@ -64,3 +64,15 @@
{{ '-e AWS_REQUEST_CHECKSUM_CALCULATION=when_required
-e AWS_RESPONSE_CHECKSUM_VALIDATION=when_required'
if s3_configurations.provider == 'powerscale' else '' }}

- name: Verify Podman can run containers
  ansible.builtin.command:
    cmd: podman run --rm localhost/{{ x86_64_local_tag }} echo ok
  register: _podman_verify
  changed_when: false
  failed_when: false

- name: Fail if Podman container runtime is broken
  ansible.builtin.fail:
    msg: "{{ podman_verify_fail_msg }} Error: {{ _podman_verify.stderr | default('unknown') }}"
  when: _podman_verify.rc != 0
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
# Copyright 2026 Dell Inc. or its subsidiaries. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
---

# Pre-flight: Install minimal SELinux policy module to fix container-selinux
# regression on RHEL 10.2 (crun 1.27 + kernel 6.12). SELinux stays enforcing.
# Retire this when an updated container-selinux ships the bpf prog_run rule.

- name: Install omnia-crun-bpf SELinux policy module
  block:
    - name: Ensure SELinux policy build tools are present
      ansible.builtin.package:
        name:
          - policycoreutils
          - checkpolicy
        state: present

    - name: Check if omnia-crun-bpf SELinux module is loaded
      ansible.builtin.command: semodule -lfull
      register: _selinux_modules
      changed_when: false

    - name: Create SELinux custom policy directory
      ansible.builtin.file:
        path: "{{ selinux_policy_dir }}"
        state: directory
        mode: "0755"
      when: selinux_module_name not in _selinux_modules.stdout

    - name: Copy omnia-crun-bpf policy source
      ansible.builtin.copy:
        src: omnia-crun-bpf.te
        dest: "{{ selinux_policy_dir }}/omnia-crun-bpf.te"
        mode: "0600"
      when: selinux_module_name not in _selinux_modules.stdout

    - name: Compile SELinux policy module
      ansible.builtin.command:
        cmd: >-
          checkmodule -M -m
          -o {{ selinux_policy_dir }}/omnia-crun-bpf.mod
          {{ selinux_policy_dir }}/omnia-crun-bpf.te
      changed_when: true
      when: selinux_module_name not in _selinux_modules.stdout

    - name: Package SELinux policy module
      ansible.builtin.command:
        cmd: >-
          semodule_package
          -o {{ selinux_policy_dir }}/omnia-crun-bpf.pp
          -m {{ selinux_policy_dir }}/omnia-crun-bpf.mod
      changed_when: true
      when: selinux_module_name not in _selinux_modules.stdout

    - name: Load SELinux policy module
      ansible.builtin.command:
        cmd: semodule -X 300 -i {{ selinux_policy_dir }}/omnia-crun-bpf.pp
      changed_when: true
      when: selinux_module_name not in _selinux_modules.stdout

  rescue:
    - name: Warn that SELinux policy module installation failed
      ansible.builtin.debug:
        msg: "{{ selinux_install_warn_msg }}"
13 changes: 13 additions & 0 deletions build_image_x86_64/roles/image_creation/vars/main.yml
@@ -61,3 +61,16 @@ network_spec: "{{ input_project_dir }}/network_spec.yml"
network_spec_syntax_fail_msg: "Failed to load network_spec.yml due to syntax error"
storage_config_file_path: "{{ input_project_dir }}/storage_config.yml"
storage_config_syntax_fail_msg: "Failed to load storage_config.yml due to syntax error"

# preflight_selinux_check.yml
selinux_module_name: "omnia-crun-bpf"
selinux_policy_dir: "/etc/selinux/targeted/custom"
podman_verify_fail_msg: >-
  Podman cannot start containers on this node.
  Ensure the node was rebooted after the kernel update and that the
  omnia-crun-bpf SELinux module loaded successfully
  (semodule -lfull | grep omnia-crun-bpf).
selinux_install_warn_msg: >-
  WARNING: Failed to install omnia-crun-bpf SELinux policy module.
  Build will continue but may fail if the eBPF device filter issue is present.
  Check SELinux policy tools and audit log on this node.