Merged
77 commits
bc94810
modify templates for doca ofed
VrindaMarwah Dec 31, 2025
2766c97
doca ofed installation changes for k8s
VrindaMarwah Dec 31, 2025
a71819d
add ansible builtin
VrindaMarwah Jan 2, 2026
07dbf6b
Merge pull request #3826 from VrindaMarwah/pub/ib_support
jagadeeshnv Jan 5, 2026
30d5135
Update ansible-lint.yml
VrindaMarwah Jan 5, 2026
b20eb56
Update pylint.yml
VrindaMarwah Jan 5, 2026
46a8a92
Merge pull request #3827 from VrindaMarwah/pub/ib_support
jagadeeshnv Jan 5, 2026
8f7dec7
Update image-build to use docker.io/dellhpcomniaaisolution/image-buil…
balajikumaran-c-s Jan 6, 2026
94e7e5e
Remove rpmdb rebuild commands from base_image_commands
balajikumaran-c-s Jan 6, 2026
f120e16
Add retry logic for image pull with pull_image_retries and pull_image…
balajikumaran-c-s Jan 6, 2026
942f0ce
Merge pull request #3829 from balajikumaran-c-s/pub/ib_support
abhishek-sa1 Jan 6, 2026
50c87bd
doca changes to build image
VrindaMarwah Jan 7, 2026
4261525
slurm user uid set to 6001
jagadeeshnv Jan 7, 2026
9d97814
Merge pull request #3834 from jagadeeshnv/pub/ib_support
snarthan Jan 8, 2026
27974c5
add static ip for ib interface
VrindaMarwah Jan 8, 2026
c07acc6
Merge branch 'dell:pub/ib_support' into pub/ib_support
VrindaMarwah Jan 8, 2026
40f36b9
Update openchami_image_cmd.yml
VrindaMarwah Jan 8, 2026
5096861
Update slurm_custom.json
VrindaMarwah Jan 8, 2026
43827ef
Update slurm_custom.json
VrindaMarwah Jan 8, 2026
6df6515
Update service_k8s.json
VrindaMarwah Jan 8, 2026
6630ec7
Update local_repo_config.yml
VrindaMarwah Jan 8, 2026
f03b32c
remove unused vars main.yml
Jan 10, 2026
322ccd0
Updated image tag in main.yml
Jan 10, 2026
0407906
Update image tag in default_packages.json
Jan 10, 2026
6f17b12
Merge pull request #3838 from balajikumaran-c-s/pub/ib_support
abhishek-sa1 Jan 10, 2026
951a5e2
Merge branch 'dell:pub/ib_support' into pub/ib_support
VrindaMarwah Jan 10, 2026
9974216
add package mounts for doca installation
VrindaMarwah Jan 11, 2026
2a86f1c
updating comments in network_spec
VrindaMarwah Jan 11, 2026
3abb36c
passwordless_ssh changes
sakshi-singla-1735 Jan 12, 2026
216a06c
ansible lint fixes
sakshi-singla-1735 Jan 12, 2026
cd729f5
input validation for ib network
sakshi-singla-1735 Jan 12, 2026
d38cf10
Merge pull request #3841 from VrindaMarwah/pub/ib_support
snarthan Jan 12, 2026
2cca244
Merge branch 'pub/ib_support' into pub/input_validation_ib
sakshi-singla-1735 Jan 12, 2026
a12179e
Merge pull request #3844 from sakshi-singla-1735/pub/input_validation_ib
snarthan Jan 12, 2026
f015e98
removing duplicate code
sakshi-singla-1735 Jan 13, 2026
e0b1fe5
Merge branch 'pub/v2.1_rc1' into pub/passwordlessssh
sakshi-singla-1735 Jan 13, 2026
e770d86
variablize filenames
sakshi-singla-1735 Jan 13, 2026
d3ac541
Merge branch 'pub/passwordlessssh' of github.com:sakshi-singla-1735/o…
sakshi-singla-1735 Jan 13, 2026
a700dd3
Merge pull request #3843 from sakshi-singla-1735/pub/passwordlessssh
snarthan Jan 13, 2026
078997e
extract cuda in nfs
Nagachandan-P Jan 14, 2026
ddc00f8
making path changes
sakshi-singla-1735 Jan 14, 2026
64d4b28
Update ci-group-login_compiler_node_aarch64.yaml.j2
Nagachandan-P Jan 14, 2026
34aea37
Update ci-group-login_compiler_node_x86_64.yaml.j2
Nagachandan-P Jan 14, 2026
53290e6
Merge pull request #3857 from Nagachandan-P/pub/v2.1_rc1
jagadeeshnv Jan 14, 2026
e3dc75a
adding the repo for apptainer
sakshi-singla-1735 Jan 14, 2026
66661de
add set pipefail to doca-ofed script
VrindaMarwah Jan 14, 2026
6670061
Update ansible-lint.yml
VrindaMarwah Jan 14, 2026
05c1146
Update pylint.yml
VrindaMarwah Jan 14, 2026
72e5971
Merge pull request #3858 from VrindaMarwah/pub/v2.1_rc1
snarthan Jan 14, 2026
a7c3a62
Merge pull request #3856 from sakshi-singla-1735/pub/passwordlessssh
jagadeeshnv Jan 14, 2026
be91349
variablize the cuda version
Nagachandan-P Jan 16, 2026
63106ba
Merge branch 'pub/v2.1_rc1' of https://github.com/Nagachandan-P/omnia…
Nagachandan-P Jan 16, 2026
eeda08f
dynamic extraction of cuda version
Nagachandan-P Jan 19, 2026
e392595
lint issue fixed
Nagachandan-P Jan 19, 2026
7851138
Merge pull request #3862 from Nagachandan-P/pub/v2.1_rc1
snarthan Jan 19, 2026
83a5625
file path change
sakshi-singla-1735 Jan 20, 2026
503a295
Update image-builder version to 1.1
Jan 20, 2026
2d74de0
Update image-builder version to 1.1 in default_packages.json
Jan 20, 2026
fad0025
Merge pull request #3875 from balajikumaran-c-s/pub/v2.1_rc1
abhishek-sa1 Jan 20, 2026
3b770c0
Merge branch 'pub/v2.1_rc1' into main
balajikumaran-c-s Jan 21, 2026
072d557
Merge pull request #3878 from balajikumaran-c-s/main
abhishek-sa1 Jan 21, 2026
5dd6678
Merge pull request #3873 from sakshi-singla-1735/origin/pub/ssh
snarthan Jan 21, 2026
f5f4f57
Update configure-ib-network for fixing race condition
Katakam-Rakesh Jan 21, 2026
719da55
Merge pull request #3879 from Katakam-Rakesh/pub/v2.1_rc1
snarthan Jan 21, 2026
7640fa7
Added powervault input
jagadeeshnv Jan 22, 2026
112681f
added powervault packages
balajikumaran-c-s Jan 22, 2026
903157f
Merge branch 'dell:pub/v2.1_rc1' into pub/v2.1_rc1
Jan 22, 2026
ed551a7
Update storage_config.yml
jagadeeshnv Jan 22, 2026
0c28ab6
Commented powervault details
Jan 22, 2026
38a7bc2
powervault cloud-init changes
balajikumaran-c-s Jan 22, 2026
e892dc8
Merge pull request #3882 from balajikumaran-c-s/pub/v2.1_rc1
jagadeeshnv Jan 22, 2026
f8c35e2
Real memory value from iDRAC
Nagachandan-P Jan 27, 2026
7627894
Merge pull request #3889 from Nagachandan-P/pub/v2.1_rc1
snarthan Jan 27, 2026
9016116
Fix to Update the repository and distribution name based on the arch
pullan1 Jan 30, 2026
332c9e1
Merge pull request #3895 from pullan1/pub/v2.1_rc1
snarthan Jan 30, 2026
3a0a6e8
pulp respository name update
pullan1 Jan 30, 2026
9ce8e76
Merge pull request #3897 from pullan1/pub/v2.1_rc1
jagadeeshnv Jan 30, 2026
2 changes: 2 additions & 0 deletions .github/workflows/ansible-lint.yml
Original file line number Diff line number Diff line change
@@ -11,6 +11,8 @@ on:
 - pub/ochami
 - pub/ochami_aarch64
 - pub/k8s_telemetry
+- pub/ib_support
+- pub/v2.1_rc1

jobs:
build:
2 changes: 2 additions & 0 deletions .github/workflows/pylint.yml
@@ -10,6 +10,8 @@ on:
 - pub/ochami
 - pub/ochami_aarch64
 - pub/k8s_telemetry
+- pub/ib_support
+- pub/v2.1_rc1

jobs:
build:
2 changes: 1 addition & 1 deletion build_image_aarch64/roles/prepare_arm_node/tasks/main.yml
@@ -167,7 +167,7 @@

- name: Build full Podman image path
ansible.builtin.set_fact:
-pulp_aarch_image: "{{ hostvars['localhost']['oim_pxe_ip'] }}:2225/dellhpcomniaaisolution/image-build-aarch64:1.0"
+pulp_aarch_image: "{{ hostvars['localhost']['oim_pxe_ip'] }}:2225/dellhpcomniaaisolution/image-build-aarch64:1.1"

- name: Pull aarch64 image using Podman
ansible.builtin.command:
15 changes: 5 additions & 10 deletions build_image_x86_64/roles/image_creation/tasks/build_image_tag.yml
@@ -13,21 +13,16 @@
# limitations under the License.
---

-- name: Pull specific OpenCHAMI image by version tag
+- name: Pull image-build image
 ansible.builtin.command:
-cmd: "podman pull {{ openchami_image_sha }}"
+cmd: "podman pull {{ image_build_el10 }}"
 register: pull_result
+retries: "{{ pull_image_retries }}"
+delay: "{{ pull_image_delay }}"
+until: pull_result.rc == 0
 changed_when: "'Image is up to date' not in pull_result.stdout"

 - name: Fail if image not pulled successfully
 ansible.builtin.fail:
 msg: "{{ pull_result.stdout }}"
 when: pull_result.rc != 0

-- name: Tagging OpenCHAMI image with stable name
-ansible.builtin.command:
-cmd: "{{ ochami_stable_image_tag }}"
-args:
-creates: "{{ ochami_stable_image_path }}"
-register: tag_result
-changed_when: "'Tagged' in tag_result.stdout"
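
The `retries`, `delay`, and `until` keywords added to the pull task turn it into a bounded retry loop. As a rough illustration only, the same control flow can be sketched in Python. The names `pull_with_retries`, `max_attempts`, and `delay_seconds` are hypothetical and not part of this PR, and the sketch does not model how many attempts Ansible actually makes for a given `retries` value.

```python
import time
from typing import Callable


def pull_with_retries(
    pull: Callable[[], int],
    max_attempts: int = 3,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call pull() until it returns 0 or max_attempts is reached.

    Returns the return code of the last attempt.
    """
    rc = pull()
    attempts = 1
    while rc != 0 and attempts < max_attempts:
        sleep(delay_seconds)  # wait before trying again
        rc = pull()
        attempts += 1
    return rc


# Simulated pull that fails twice and then succeeds (a stand-in for `podman pull`).
results = iter([125, 125, 0])
final_rc = pull_with_retries(lambda: next(results), max_attempts=3, sleep=lambda _: None)
print(final_rc)
```

Passing the `sleep` function as a parameter keeps the sketch testable without real delays; a real caller would leave the default in place.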
10 changes: 4 additions & 6 deletions build_image_x86_64/roles/image_creation/vars/main.yml
@@ -12,7 +12,9 @@
# See the License for the specific language governing permissions and
# limitations under the License.
---
-openchami_image_sha: "ghcr.io/openchami/image-build@sha256:52dd9d546951ce4f2f6f9febd08a228cfcb5b9e8e204ca4f5ee232f6be65d3a4"
+image_build_el10: "docker.io/dellhpcomniaaisolution/image-build-el10:1.0"
+pull_image_retries: "3"
+pull_image_delay: "10"
input_project_dir: "{{ hostvars['localhost']['input_project_dir'] }}"
omnia_metadata_file: "/opt/omnia/.data/oim_metadata.yml"
dir_permissions_644: "0644"
@@ -33,7 +35,7 @@ ochami_compute_mounts:

ochami_x86_64_image:
 - --entrypoint /bin/bash
-- ghcr.io/openchami/image-build:stable
+- docker.io/dellhpcomniaaisolution/image-build-el10:1.0
ochami_base_command:
- -c 'update-ca-trust extract && image-build --config /home/builder/config.yaml --log-level DEBUG'

@@ -52,7 +54,3 @@ compute_image_failure_msg: |
# build_compute_image.yml
openchami_compute_image_vars_template: "{{ role_path }}/templates/compute_images_templates.j2"
openchami_compute_image_vars_path: "/opt/omnia/openchami/compute_images_template.yaml"

-# build_image_tag.yml
-ochami_stable_image_tag: "podman tag {{ openchami_image_sha }} ghcr.io/openchami/image-build:stable"
-ochami_stable_image_path: "/var/lib/containers/storage/overlay-images/{{ openchami_image_sha }}"
Original file line number Diff line number Diff line change
@@ -326,6 +326,12 @@ def json_file_mandatory(file_path):
"Please ensure the CSV file has the required headers."
)
NETWORK_SPEC_FILE_NOT_FOUND_MSG = "network_spec.yml file not found in input folder."
IB_NETMASK_BITS_MISMATCH_MSG = (
"netmask_bits configured for ib_network must match admin_network netmask_bits in network_spec.yml."
)
IB_SUBNET_IN_ADMIN_RANGE_MSG = (
"ib_network subnet must be outside the admin network range derived from primary_oim_admin_ip/netmask_bits in network_spec.yml."
)

# telemetry
MANDATORY_FIELD_FAIL_MSG = "must not be empty"
@@ -427,3 +433,4 @@ def get_logic_failed(input_file_path):
def get_logic_success(input_file_path):
"""Returns a formatted message indicating logic validation success for a file."""
return f"{'#' * 10} Logic validation successful for {input_file_path} {'#' * 10}"

Original file line number Diff line number Diff line change
@@ -100,9 +100,35 @@
}
},
"additionalProperties": false
},
{
"type": "object",
"required": ["ib_network"],
"properties": {
"ib_network": {
"type": "object",
"required": [
"subnet",
"netmask_bits"
],
"properties": {
"subnet": {
"type": "string",
"pattern": "^(?:(?:25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})\\.){3}(?:25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})$"
},
"netmask_bits": {
"type": "string",
"pattern": "^(1[0-9]|2[0-9]|[1-9])$|^3[0-2]$"
}
},
"additionalProperties": false
}
},
"additionalProperties": false
}
]
}
}
}
}
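
To see what the two `pattern` constraints in the `ib_network` schema accept, the following Python sketch applies them to a few sample values. The patterns are copied from the schema with the JSON backslash escaping undone. The `matches` helper and the sample values are assumptions made for illustration. Note that the schema types `netmask_bits` as a string, so the value is `"24"`, not `24`.

```python
import re

# Patterns copied from the ib_network schema (JSON escaping removed).
SUBNET_PATTERN = (
    r"^(?:(?:25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})$"
)
NETMASK_BITS_PATTERN = r"^(1[0-9]|2[0-9]|[1-9])$|^3[0-2]$"


def matches(pattern: str, value: str) -> bool:
    """Return True when the whole value satisfies the schema pattern."""
    return re.fullmatch(pattern, value) is not None


# Hypothetical sample values: dotted quads, an out-of-range octet, a short address.
for subnet in ("192.168.0.0", "10.11.12.0", "256.0.0.0", "192.168.0"):
    print(f"subnet {subnet!r}: {matches(SUBNET_PATTERN, subnet)}")

# Prefix lengths 1 through 32 are accepted; "0", "33", and padded forms are not.
for bits in ("24", "32", "0", "33", "024"):
    print(f"netmask_bits {bits!r}: {matches(NETMASK_BITS_PATTERN, bits)}")
```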

Original file line number Diff line number Diff line change
@@ -49,6 +49,36 @@
]
},
"minItems": 1
},
"powervault_config": {
"required": ["ip", "isci_initiators", "volume_id"],
"properties": {
"ip": {
"description": "List of target controller IP addresses",
"type": "array",
"minItems": 1,
"items": {
"type": "string",
"format": "ipv4"
},
"uniqueItems": true
},

"port": {
"description": "TCP port for iSCSI (default 3260)",
"type": "integer"
},

"isci_initiators": {
"description": "iSCSI initiator IQN",
"type": "string"
},

"volume_id": {
"description": "Volume identifier (hex string)",
"type": "string"
}
}
}
},
"required": [
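
The `powervault_config` block added in the hunk above requires `ip`, `isci_initiators`, and `volume_id`, and describes `ip` as a non-empty list of unique IPv4 strings. Depending on the validator, the `format: ipv4` keyword may not be enforced unless format checking is enabled, so a plain-Python sketch of the same checks may be useful for reasoning about the schema. The function name `check_powervault_config` and every sample value below are hypothetical. The key `isci_initiators` is spelled as it appears in the schema.

```python
import ipaddress

REQUIRED_KEYS = ("ip", "isci_initiators", "volume_id")  # spelled as in the schema


def check_powervault_config(cfg: dict) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in cfg]

    if "ip" in cfg:
        ips = cfg["ip"]
        if not isinstance(ips, list) or not ips:
            problems.append("ip must be a non-empty list")
        else:
            if len(set(map(str, ips))) != len(ips):
                problems.append("ip entries must be unique")
            for ip in ips:
                try:
                    if not isinstance(ip, str):
                        raise ValueError("not a string")
                    ipaddress.IPv4Address(ip)
                except ValueError:
                    problems.append(f"not an IPv4 address: {ip}")

    port = cfg.get("port")
    if port is not None and (isinstance(port, bool) or not isinstance(port, int)):
        problems.append("port must be an integer")
    return problems


# Hypothetical configuration used only to exercise the checks.
sample = {
    "ip": ["192.168.10.20", "192.168.10.21"],
    "isci_initiators": "iqn.2026-01.com.example:node1",
    "volume_id": "600c0ff000aabbcc",
    "port": 3260,
}
print(check_powervault_config(sample))
print(check_powervault_config({"ip": ["192.168.10.20", "192.168.10.20"]}))
```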
Original file line number Diff line number Diff line change
@@ -21,6 +21,7 @@
import itertools
import csv
import yaml
import ipaddress
from ansible.module_utils.input_validation.common_utils import validation_utils
from ansible.module_utils.input_validation.common_utils import config
from ansible.module_utils.input_validation.common_utils import en_us_validation_msg
@@ -744,6 +745,54 @@ def validate_network_spec(
)
return errors

# Extract admin and IB parameters for cross-validation
admin_netmask_bits = None
admin_primary_ip = None
ib_netmask_bits = None
ib_subnet = None
ib_present = False

for network in data["Networks"]:
if "admin_network" in network and isinstance(network["admin_network"], dict):
admin_net = network["admin_network"]
admin_netmask_bits = admin_net.get("netmask_bits", admin_netmask_bits)
admin_primary_ip = admin_net.get("primary_oim_admin_ip", admin_primary_ip)

if "ib_network" in network and isinstance(network["ib_network"], dict):
ib_net = network["ib_network"]
# Consider IB network present only when config is non-empty
if ib_net:
ib_present = True
ib_netmask_bits = ib_net.get("netmask_bits", ib_netmask_bits)
ib_subnet = ib_net.get("subnet", ib_subnet)

# If IB network is configured and both netmask bits are available, they must match
if ib_present and ib_netmask_bits and admin_netmask_bits and ib_netmask_bits != admin_netmask_bits:
errors.append(
create_error_msg(
"ib_network.netmask_bits",
ib_netmask_bits,
en_us_validation_msg.IB_NETMASK_BITS_MISMATCH_MSG,
)
)

# If IB subnet and admin primary IP are available, ensure IB subnet is not in admin range
if ib_present and ib_subnet and admin_primary_ip and admin_netmask_bits:
try:
admin_network = ipaddress.IPv4Network(f"{admin_primary_ip}/{admin_netmask_bits}", strict=False)
ib_ip = ipaddress.IPv4Address(ib_subnet)
if ib_ip in admin_network:
errors.append(
create_error_msg(
"ib_network.subnet",
ib_subnet,
en_us_validation_msg.IB_SUBNET_IN_ADMIN_RANGE_MSG,
)
)
except ValueError:
# If IPs/netmask are invalid, rely on existing validations to report issues
pass

for network in data["Networks"]:
errors.extend(_validate_admin_network(network))
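
The cross-validation added in this hunk can be tried in isolation. The sketch below repeats the two checks with the standard `ipaddress` module: `strict=False` lets `IPv4Network` accept a host address such as `primary_oim_admin_ip` and mask off the host bits, where the default `strict=True` would raise `ValueError`. The function name `cross_check_ib`, the returned strings, and the sample addresses are assumptions for illustration. The real code appends messages via `create_error_msg` and skips the subnet check when values are missing or invalid.

```python
import ipaddress


def cross_check_ib(admin_ip: str, admin_bits: str, ib_subnet: str, ib_bits: str) -> list:
    """Return the names of the IB/admin cross-checks that fail."""
    issues = []

    # Check 1: both networks must use the same prefix length.
    if ib_bits != admin_bits:
        issues.append("netmask_bits mismatch")

    # Check 2: the IB subnet address must not fall inside the admin network.
    admin_network = ipaddress.IPv4Network(f"{admin_ip}/{admin_bits}", strict=False)
    if ipaddress.IPv4Address(ib_subnet) in admin_network:
        issues.append("ib subnet inside admin range")

    return issues


# Hypothetical addresses.
print(cross_check_ib("172.16.107.254", "24", "192.168.0.0", "24"))
print(cross_check_ib("172.16.107.254", "24", "172.16.107.0", "24"))
print(cross_check_ib("172.16.107.254", "16", "172.17.0.0", "24"))
```

Only the IB subnet's base address is tested for membership. When both prefix lengths are equal and `ib_subnet` is a proper network address, two such networks are either identical or disjoint, so the single-address test appears sufficient under those conditions.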

@@ -941,3 +990,4 @@ def _validate_ip_ranges(dynamic_range, network_type, netmask_bits):
)

return errors

14 changes: 7 additions & 7 deletions common/library/module_utils/local_repo/download_common.py
@@ -477,7 +477,7 @@ def process_manifest(file,repo_store_path, status_file_path, cluster_os_type, cl
manifest_directory = os.path.join(repo_store_path, "offline_repo", "cluster",arc.lower(), cluster_os_type, cluster_os_version, "manifest", package_name)
# # Determine the manifest file path
file_path = os.path.join(manifest_directory, f"{package_name}.yaml")
-repository_name = "manifest" + package_name
+repository_name = arc.lower() + "_manifest" + package_name
output_file = package_name + ".yml"
relative_path = output_file
base_path = manifest_directory.strip("/")
@@ -531,7 +531,7 @@ def process_git(file,repo_store_path, status_file_path, cluster_os_type, cluster
clone_directory = os.path.join(git_modules_directory, package_name)
clone_directory = shlex.quote(clone_directory).strip("'\"")
tarball_path = os.path.join(git_modules_directory, f'{package_name}.tar.gz')
-repository_name = "git" + package_name
+repository_name = arc.lower() + "_git" + package_name
output_file = package_name + ".tar.gz"
relative_path = output_file
base_path = git_modules_directory.strip("/")
@@ -600,7 +600,7 @@ def process_shell(file,repo_store_path, status_file_path, cluster_os_type, clus
os.makedirs(sh_directory, exist_ok=True) # Ensure the directory exists

sh_path = os.path.join(sh_directory, f"{package_name}.sh")
-repository_name = "shell" + package_name
+repository_name = arc.lower() + "_shell" + package_name
output_file = package_name + ".sh"
relative_path = output_file
base_path = sh_directory.strip("/")
@@ -651,7 +651,7 @@ def process_ansible_galaxy_collection(file, repo_store_path, status_file_path, c
galaxy_collections_directory = shlex.quote(galaxy_collections_directory).strip("'\"")
os.makedirs(galaxy_collections_directory, exist_ok=True) # Ensure the directory exists
collections_tarball_path = os.path.join(galaxy_collections_directory, f'{package_name.replace(".", "-")}-{version}.tar.gz')
-repository_name = "ansible_galaxy_collection" + package_name
+repository_name = arc.lower() + "_ansible_galaxy_collection" + package_name
output_file = f"{file['package'].replace('.', '-')}-{file['version']}.tar.gz"
relative_path = output_file
base_path = galaxy_collections_directory.strip("/")
@@ -758,7 +758,7 @@ def process_tarball(package, repo_store_path, status_file_path, version_variable
tarball_path = os.path.join(tarball_directory, f"{package_name}.tar.gz")
tarball_path = shlex.quote(tarball_path).strip("'\"")

-repository_name = "tarball" + package_name
+repository_name = arc.lower() + "_tarball" + package_name
output_file = package_name + ".tar.gz"
relative_path = output_file
base_path = tarball_directory.strip("/")
@@ -844,7 +844,7 @@ def process_iso(package, repo_store_path, status_file_path,
url_support = True
package_name = package['package']
package_type = package['type']
-repository_name = "iso" + package_name + arc
+repository_name = arc.lower() + "_iso" + package_name

distribution_name = repository_name
if 'url' in package:
@@ -941,7 +941,7 @@ def process_pip(package, repo_store_path, status_file_path, cluster_os_type, cl
package_name = shlex.quote(package['package']).strip("'\"")
package_type = package['type']
version = package.get('version', None)
-pip_repo = "pip_module" + package_name
+pip_repo = arc.lower() + "_pip_module" + package_name
distribution_name = pip_repo

logger.info(f"Processing Pip Package: {package_name}, Version: {version}")
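
Every hunk in this file applies the same naming rule: the lowercased architecture comes first, then an underscore, the artifact kind, and the package name with no separator between the last two. A minimal sketch of that rule follows. The helper `build_repository_name` is hypothetical, since the PR inlines the concatenation in each `process_*` function, and the package names are made up.

```python
def build_repository_name(arc: str, kind: str, package_name: str) -> str:
    """Compose an architecture-prefixed repository name.

    Mirrors the pattern arc.lower() + "_<kind>" + package_name from the diff;
    note there is no separator between the kind and the package name.
    """
    return f"{arc.lower()}_{kind}{package_name}"


# The same package on two architectures now maps to two distinct names.
print(build_repository_name("x86_64", "git", "examplepkg"))
print(build_repository_name("AARCH64", "git", "examplepkg"))
print(build_repository_name("aarch64", "pip_module", "examplepkg"))
```

Before this change only the ISO name carried the architecture, appended as a suffix without lowercasing, so the other artifact kinds would have produced the same repository name on both architectures.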
2 changes: 0 additions & 2 deletions common/vars/openchami_image_cmd.yml
@@ -20,8 +20,6 @@ rhel_aarch64_base_image_name: "rhel-aarch64_base"
 base_image_commands:
 - "dracut --add 'dmsquash-live livenet network-manager' --install '/usr/lib/systemd/systemd-sysroot-fstab-check' --kver $(basename /lib/modules/*) -N -f --logfile /tmp/dracut.log 2>/dev/null" # noqa: yaml[line-length]
 - "echo DRACUT LOG:; cat /tmp/dracut.log"
-- "rm -f /var/lib/rpm/__db*"
-- "rpmdb --rebuilddb"

# x86_64 compute commands
default_x86_64_compute_commands:
17 changes: 17 additions & 0 deletions discovery/discovery.yml
@@ -75,6 +75,18 @@
name: discovery_validations
tasks_from: validate_oim_timezone.yml

- name: Build cluster host lists from PXE mapping
hosts: localhost
connection: local
roles:
- passwordless_ssh

- name: Configure OIM SSH from cluster host lists
hosts: oim
connection: ssh
roles:
- passwordless_ssh

- name: Validate discovery parameters
hosts: oim
connection: ssh
@@ -102,6 +114,11 @@
ansible.builtin.include_role:
name: configure_ochami
tasks_from: discover_mapping_nodes.yml

- name: Read nodes.yaml and derive Omnia node facts
ansible.builtin.include_role:
name: passwordless_ssh
tasks_from: read_nodes_yaml.yml
roles:
- nfs_client
- k8s_config
Original file line number Diff line number Diff line change
@@ -66,6 +66,12 @@
register: read_ssh_key
no_log: true

- name: Read the ssh private key
ansible.builtin.command: cat {{ ssh_private_key_path }}
changed_when: false
register: read_ssh_private_key
no_log: true

- name: Hash the password
ansible.builtin.command: openssl passwd -6 "{{ hostvars['localhost']['provision_password'] }}"
changed_when: false