Feature branch sync - pub/q2_upgrade to staging#4740
Merged
Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>

Revert "task failure ansible.cfg update"
This reverts commit 7b2a70b.

callback plugin update
Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>

Update omnia_default.py
Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>

Update omnia_default.py
Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>

Update omnia_default.py
Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>

Update omnia_default.py
Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>

Update omnia_default.py
Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
…ubnet (OMN01D-2534)

In multi-subnet deployments, service K8s control plane nodes may reside in an additional_subnet (e.g. 10.40.2.0/24) rather than the primary admin subnet (e.g. 10.40.1.0/24). The VIP for K8s HA must be in the same subnet as the control plane nodes, not the OIM admin NIC subnet.

The fix:
1. In validate_service_k8s_cluster_ha(), extract control plane node IPs from the PXE mapping (FUNCTIONAL_GROUP_NAME starts with service_kube_control_plane) and determine their subnet by checking the primary admin subnet and additional_subnets.
2. Pass the control plane subnet (kcp_subnet_ip, kcp_subnet_bits) to validate_vip_address().
3. In validate_vip_address(), validate the VIP against the control plane subnet if provided; otherwise fall back to the primary admin subnet for backward compatibility.

Fixes: OMN01D-2534
Signed-off-by: Sujit Jadhav <sujit.jadhav@dell.com>
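The subnet check described above can be illustrated with a minimal sketch using Python's standard ipaddress module. This is not the code from the PR: the helper names find_control_plane_subnet and vip_in_subnet, the row dictionary layout, and the use of an ADMIN_IP column are assumptions made for illustration only.

```python
import ipaddress


def find_control_plane_subnet(pxe_rows, primary_subnet, additional_subnets):
    """Return the network that contains the first control plane node found.

    pxe_rows is assumed to be a list of dicts with FUNCTIONAL_GROUP_NAME and
    ADMIN_IP keys (hypothetical layout). Returns None when no match is found.
    """
    candidates = [ipaddress.ip_network(primary_subnet)]
    candidates += [ipaddress.ip_network(s) for s in additional_subnets]
    for row in pxe_rows:
        if not row["FUNCTIONAL_GROUP_NAME"].startswith("service_kube_control_plane"):
            continue
        ip = ipaddress.ip_address(row["ADMIN_IP"])
        for net in candidates:
            if ip in net:
                return net
    return None


def vip_in_subnet(vip, primary_subnet, kcp_subnet=None):
    """Check the VIP against the control plane subnet when one is given,
    otherwise fall back to the primary admin subnet."""
    net = kcp_subnet if kcp_subnet is not None else ipaddress.ip_network(primary_subnet)
    return ipaddress.ip_address(vip) in net
```

With a control plane node at 10.40.2.11, a VIP of 10.40.2.100 passes and a VIP in 10.40.1.0/24 is rejected; when no control plane subnet is supplied, the primary subnet is used, which mirrors the backward-compatible fallback in step 3.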
Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
Add a wait for kube controller pod to be created and then check for pod running
feat: Add custom callback plugin to suppress duplicate error output in ansible-core 2.20
…3_secret_key Signed-off-by: venu <236371043+Venu-p1@users.noreply.github.com>
Fix/cleanup image
Signed-off-by: Jagadeesh N V <jagadeesh_n_v@dell.com>
…led is true

Two issues prevent nid hostname resolution on slurm and login nodes:

1. OIM firewall blocks port 53 (DNS) for external access
   CoreDNS on the OIM binds to admin_nic_ip:53, but firewalld only opens ports for DHCP/TFTP/HTTP/etc. Nodes querying 10.x.x.x:53 get their packets dropped. From the OIM itself, DNS works because podman interfaces are in the trusted zone (local traffic bypasses the firewall).
   Fix: Open port 53/tcp and 53/udp in the OIM firewall when dns_enabled is true.

2. NetworkManager overwrites /etc/resolv.conf after cloud-init
   set-ssh.sh runs nmcli con add/up, which triggers NetworkManager to overwrite /etc/resolv.conf with DHCP-provided DNS servers, removing the CoreDNS nameserver entry.
   Fix: After set-ssh.sh completes, restore /etc/resolv.conf and lock it with chattr +i. Matches existing K8s template protection.

Files changed:
- prepare_oim/.../openchami/tasks/configs/firewall.yml (port 53)
- ci-group-slurm_control_node_x86_64.yaml.j2
- ci-group-slurm_node_x86_64.yaml.j2
- ci-group-slurm_node_aarch64.yaml.j2
- ci-group-login_node_x86_64.yaml.j2
- ci-group-login_node_aarch64.yaml.j2
- ci-group-login_compiler_node_x86_64.yaml.j2
- ci-group-login_compiler_node_aarch64.yaml.j2

Only active when dns_enabled is true (no impact on non-DNS deployments).

Signed-off-by: Sujit Jadhav <sujit.jadhav@dell.com>
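The second fix restores the CoreDNS entry after NetworkManager has rewritten /etc/resolv.conf. The commit does this in the cloud-init templates and its exact commands are not shown here, so the following is only a sketch of the idea as a pure text transformation. The function name restore_nameserver and the choice to put the CoreDNS entry ahead of the DHCP-provided ones are assumptions.

```python
def restore_nameserver(resolv_conf_text, coredns_ip):
    """Return resolv.conf content with the CoreDNS server as the first nameserver.

    Hypothetical helper: existing DHCP-provided entries are kept after it,
    and a pre-existing copy of the CoreDNS entry is not duplicated.
    """
    wanted = f"nameserver {coredns_ip}"
    lines = [line for line in resolv_conf_text.splitlines() if line.strip() != wanted]
    # Insert before the first existing nameserver line, or append if there is none.
    index = next(
        (i for i, line in enumerate(lines) if line.strip().startswith("nameserver")),
        len(lines),
    )
    lines.insert(index, wanted)
    return "\n".join(lines) + "\n"

# After writing the result back, the commit locks the file with `chattr +i`
# so that later NetworkManager activity cannot overwrite it again.
```

The transformation is idempotent, so running it again after a later overwrite yields the same content.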
Signed-off-by: sakshi-singla-1735 <sakshi.s@dell.com>
fix(provision): fix DNS resolution on slurm/login nodes when dns_enabled is true
…plate (OMN01D-2533) (#4729)

The cloud-init template has two YAML literal block scalar levels:
1. Outer content: | (base indent 6sp) - strips 6 spaces
2. Inner runcmd - | (base indent 4sp after outer) - strips 4 spaces
Total: 10 spaces stripped from template lines.

Previous heredoc fix used 12sp indent with spaces embedded in the delimiter string (' PYEOF'). After YAML stripping, the terminator line became ' PYEOF' (2sp) but the shell expected ' PYEOF' (12sp literal), so the heredoc never terminated.

Fix: Place Python code and PYEOF terminator at 10sp in the template. After both YAML levels strip their indentation, these lines land at column 0 in the shell script. The simple delimiter 'PYEOF' matches the column-0 terminator exactly. Python receives column-0 code with correct relative indentation for with/if/else blocks.

All lines >= 10sp > 6sp, so the outer YAML content: | block stays intact (lines at < 6sp would prematurely terminate it).

Signed-off-by: Sujit Jadhav <sujit.jadhav@dell.com>
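The indentation arithmetic above can be checked with a small simulation. This is a simplified model, not a YAML parser or a shell: strip_indent removes a fixed number of leading spaces the way a literal block scalar with that base indent would, and heredoc_body treats a heredoc as terminated only by a line exactly equal to the delimiter. Both helper names and the sample template lines are made up for illustration.

```python
def strip_indent(lines, n):
    """Remove up to n leading spaces from each line (simplified block scalar)."""
    stripped = []
    for line in lines:
        lead = len(line) - len(line.lstrip(" "))
        stripped.append(line[min(n, lead):])
    return stripped


def heredoc_body(script_lines, delimiter):
    """Return the heredoc body, or None if no line equals the delimiter exactly."""
    body = []
    inside = False
    for line in script_lines:
        if not inside:
            inside = "<<" in line
            continue
        if line == delimiter:
            return body
        body.append(line)
    return None


def render(template_lines):
    """Apply the outer (6sp) and inner (4sp) stripping levels in sequence."""
    return strip_indent(strip_indent(template_lines, 6), 4)


# Fixed layout: Python code and terminator at 10 spaces in the template.
fixed = [
    " " * 10 + "python3 - <<'PYEOF'",
    " " * 10 + "if True:",
    " " * 14 + "print('ok')",
    " " * 10 + "PYEOF",
]

# Earlier layout: terminator at 12 spaces, delimiter carrying 12 literal spaces.
broken = [
    " " * 10 + "python3 - <<'" + " " * 12 + "PYEOF'",
    " " * 12 + "print('ok')",
    " " * 12 + "PYEOF",
]
```

In the fixed layout the terminator lands at column 0 and matches 'PYEOF', while relative indentation inside the Python body is preserved. In the earlier layout the terminator keeps 2 spaces after stripping and never equals the 12-space delimiter.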
…ction (OMN01D-2532) (#4724)

In multi-subnet deployments, service K8s control plane nodes may reside in an additional_subnet (e.g. 10.40.2.0/24) while the OIM admin NIC is in the primary subnet (e.g. 10.40.1.0/24). Calico's IP_AUTODETECTION_METHOD was hardcoded to admin_nic_cidr (the OIM subnet), causing Calico to fail IP auto-detection on nodes in different subnets with:

'Unable to auto-detect an IPv4 address using interface cidr [10.40.1.0/24]: no valid IPv4 addresses found'

The fix:
1. In create_k8s_config_nfs.yml, read the PXE mapping to find the first service_kube_control_plane node's ADMIN_IP and determine which subnet (primary or additional) it belongs to. Set calico_cidr to that subnet's CIDR.
2. Update the cloud-init template to use calico_cidr instead of admin_nic_cidr for Calico's IP_AUTODETECTION_METHOD.

The upgrade path is intentionally left unchanged (uses admin_nic_cidr) since multi-subnet is a fresh deployment feature and changing the upgrade flow could impact existing deployments.

Fixes: OMN01D-2532
Signed-off-by: Sujit Jadhav <sujit.jadhav@dell.com>
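To make the failure and the fix concrete, here is a rough sketch in Python. The PR implements this in an Ansible task file, so nothing below is the actual code: autodetect_ipv4 only imitates a CIDR-based address lookup, pick_calico_cidr is a hypothetical helper, and falling back to admin_nic_cidr when no control plane node matches is an assumption.

```python
import ipaddress


def autodetect_ipv4(node_ips, cidr):
    """Return the first node address inside cidr, or None if none matches."""
    net = ipaddress.ip_network(cidr)
    for ip in node_ips:
        if ipaddress.ip_address(ip) in net:
            return ip
    return None


def pick_calico_cidr(pxe_rows, admin_nic_cidr, additional_subnets):
    """Choose the CIDR containing the first control plane node's ADMIN_IP.

    Falls back to admin_nic_cidr (an assumption) when the mapping has no
    control plane node or its address is in none of the known subnets.
    """
    first = next(
        (row for row in pxe_rows
         if row["FUNCTIONAL_GROUP_NAME"].startswith("service_kube_control_plane")),
        None,
    )
    if first is None:
        return admin_nic_cidr
    for cidr in [admin_nic_cidr, *additional_subnets]:
        if autodetect_ipv4([first["ADMIN_IP"]], cidr) is not None:
            return cidr
    return admin_nic_cidr
```

For a control plane node at 10.40.2.11, a lookup against the hardcoded 10.40.1.0/24 finds nothing, which corresponds to the error quoted above, whereas the selected CIDR 10.40.2.0/24 matches the node.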
fix(validation): validate HA VIP against service_kube_control_plane subnet
Signed-off-by: Kratika_Patidar <Kratika.Patidar@dell.com>
Signed-off-by: Kratika_Patidar <Kratika.Patidar@dell.com>
defect fix for input validation and pxe mapping check
Push software_config.json from artifacts during deploy
csi version change from 2.16 to 2.17
* Update container tag for vulnerability
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update requirements.txt
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* tag update
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Buildstream upgrade validation
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update upgrade.yml
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update main.yml
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update catalog_rhel.json
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update requirements.txt
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update provision_config.yml
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update provision_config.j2
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
---------
Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
…ralized Python L2 validation (#4735)

* add cloud init
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update container tag for vulnerability
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* additional cloud init group
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update validate_additional_cloud_init.yml
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* logic update
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update requirements.txt
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update provision config
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update validate_additional_cloud_init_section.yml
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* cloud init update
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* tag update
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* minimal os group update
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Buildstream upgrade validation
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update upgrade.yml
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* moving packages as prohibited
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update main.yml
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update catalog_rhel.json
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update requirements.txt
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update provision_config.yml
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
* Update provision_config.j2
  Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
---------
Signed-off-by: Abhishek S A <abhishek.sa3@dell.com>
Set PXE boot: replace lc check module with POST call
…s during rollback (#4738)

* fix rollback
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* fix(rollback): remove 'skipped' from build_stream_terminal condition
  Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
* Rollback conditions for slurm and k8s
  Signed-off-by: Jagadeesh N V <jagadeesh_n_v@dell.com>
* Update rollback.yml
  Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
* Lint fixes
  Signed-off-by: Jagadeesh N V <jagadeesh_n_v@dell.com>
---------
Signed-off-by: Katakam-Rakesh <katakam.rakesh@dell.com>
Signed-off-by: Jagadeesh N V <jagadeesh_n_v@dell.com>
Signed-off-by: Katakam Rakesh Naga Sai <125246792+Katakam-Rakesh@users.noreply.github.com>
Co-authored-by: Jagadeesh N V <jagadeesh_n_v@dell.com>