Revert "OCPBUGS-86774: Pin azure-cli to 2.72.0 in e2e Dockerfile"#8652
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@bryan-cox: This pull request references Jira Issue OCPBUGS-86774, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test e2e-aws |
WalkthroughThis PR removes the version pin from the Azure CLI package in the E2E test Docker image, allowing the system package manager to install the latest available version of ChangesE2E Docker Image Updates
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 15✅ Passed checks (15 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@bryan-cox: This pull request references Jira Issue OCPBUGS-86774, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bryan-cox The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8652 +/- ##
=======================================
Coverage 41.26% 41.26%
=======================================
Files 755 755
Lines 93443 93443
=======================================
Hits 38563 38563
Misses 52148 52148
Partials 2732 2732
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
Dockerfile.e2e (1)
1-34: Container security guidelines: Multiple pre-existing violations in E2E Dockerfile.While not part of this change, this Dockerfile has several deviations from the container security guidelines that should be addressed in a future PR:
- No USER directive: Container runs as root (guideline: "USER non-root; never run as root")
- No HEALTHCHECK: No health check defined for the container
- COPY entire context (line 6): Copies all files instead of specific ones in builder stage
- Build tools in final image: Contains Go compiler (line 12) - acknowledged as necessary for tests but still a deviation
While these may be acceptable for an E2E test image, consider whether security hardening is needed even for test environments, especially if the image has access to cloud credentials during testing.
As per coding guidelines: "USER non-root; never run as root", "HEALTHCHECK defined", "COPY specific files, not entire context", "Multi-stage builds; no build tools in final image".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Dockerfile.e2e` around lines 1 - 34, The Dockerfile currently copies the entire build context (COPY . .), runs as root (no USER directive), lacks a HEALTHCHECK, and leaves build tooling in the final image (the builder stage runs RUN make ... and the final stage still uses a golang base and installs packages via RUN rpm/.../dnf). Update the builder stage to COPY only required sources and artifacts instead of COPY . ., keep compilation in the builder (identify the RUN make e2e hypershift ... step and the created binaries like /hypershift/bin/test-e2e and /hypershift/bin/hypershift), ensure the final image does not contain unnecessary build tools (move all build-time dependencies to the builder stage or use a slimmer runtime base if possible), add a non-root user and switch to it with a USER directive after creating/setting permissions for /hypershift, and add an appropriate HEALTHCHECK that exercises /hypershift/bin/test-e2e or another lightweight runtime check; adjust the azure-cli installation step (RUN rpm ... dnf install -y azure-cli) to preserve least privilege and only run as root for installation before switching to the non-root USER.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Dockerfile.e2e`:
- Line 33: The Dockerfile.e2e currently installs azure-cli without pinning and
removed the i686 exclusion, and also lacks container hardening (no USER, no
HEALTHCHECK) and uses COPY . . in the builder and reuses the builder as the
final image; update the dnf install line to pin azure-cli to the known-good
version (azure-cli-2.72.0) and restore the exclusion flag (--exclude='*.i686')
to avoid E4S/EUS python3.12 and i686 resolution failures, add a non-root USER
declaration and a HEALTHCHECK per our guidelines, and change the builder stage
to use explicit COPY of only needed artifacts (avoid COPY . .) and ensure the
final stage does not reuse the builder image but instead copies artifacts from
the builder stage into a minimal runtime image.
---
Nitpick comments:
In `@Dockerfile.e2e`:
- Around line 1-34: The Dockerfile currently copies the entire build context
(COPY . .), runs as root (no USER directive), lacks a HEALTHCHECK, and leaves
build tooling in the final image (the builder stage runs RUN make ... and the
final stage still uses a golang base and installs packages via RUN rpm/.../dnf).
Update the builder stage to COPY only required sources and artifacts instead of
COPY . ., keep compilation in the builder (identify the RUN make e2e hypershift
... step and the created binaries like /hypershift/bin/test-e2e and
/hypershift/bin/hypershift), ensure the final image does not contain unnecessary
build tools (move all build-time dependencies to the builder stage or use a
slimmer runtime base if possible), add a non-root user and switch to it with a
USER directive after creating/setting permissions for /hypershift, and add an
appropriate HEALTHCHECK that exercises /hypershift/bin/test-e2e or another
lightweight runtime check; adjust the azure-cli installation step (RUN rpm ...
dnf install -y azure-cli) to preserve least privilege and only run as root for
installation before switching to the non-root USER.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 246f7454-87c3-49f8-972d-d5c967b09bf5
📒 Files selected for processing (1)
Dockerfile.e2e
| dnf install -y https://packages.microsoft.com/config/rhel/9/packages-microsoft-prod.rpm && \ | ||
| mv /etc/yum.repos.d/microsoft-prod.repo /etc/yum.repos.art/ci/ && \ | ||
| dnf install -y azure-cli-2.72.0 && \ | ||
| dnf install -y azure-cli && \ |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check available azure-cli versions in the Microsoft repository
dnf list --showduplicates azure-cli 2>/dev/null | grep azure-cli | tail -10Repository: openshift/hypershift
Length of output: 46
🏁 Script executed:
#!/bin/bash
# Search for context about why 2.72.0 was originally pinned
gh pr view 8638 --json title,body,commentsRepository: openshift/hypershift
Length of output: 43580
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show Dockerfile.e2e around the cited line 33
if [ -f Dockerfile.e2e ]; then
nl -ba Dockerfile.e2e | sed -n '20,50p'
else
echo "Missing Dockerfile.e2e at repo root"
fi
# Also capture any azure-cli dnf install lines in the Dockerfile.e2e
if [ -f Dockerfile.e2e ]; then
rg -n "dnf .*install .*azure-cli" Dockerfile.e2e || true
rg -n "azure-cli" Dockerfile.e2e || true
fiRepository: openshift/hypershift
Length of output: 107
🏁 Script executed:
#!/bin/bash
set -euo pipefail
ls -la
# Locate and display Dockerfile.e2e (with line numbers) around line 33
if [ -f Dockerfile.e2e ]; then
echo "== Dockerfile.e2e exists at ./Dockerfile.e2e =="
awk 'NR>=20 && NR<=50 {printf "%d:%s\n", NR, $0}' Dockerfile.e2e
else
echo "== Dockerfile.e2e not found at repo root; searching by name =="
fd -t f -i "Dockerfile.e2e" .
exit 0
fi
echo "== azure-cli install lines =="
rg -n "dnf .*install .*azure-cli|azure-cli-[0-9]|azure-cli" Dockerfile.e2e || trueRepository: openshift/hypershift
Length of output: 6394
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== dockerfile.e2e azure-cli installs and i686 excludes =="
rg -n "dnf .*install .*azure-cli|azure-cli-|azure-cli|--exclude='\\*\\.i686'|i686" Dockerfile.e2e || trueRepository: openshift/hypershift
Length of output: 159
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Top of Dockerfile.e2e =="
awk 'NR>=1 && NR<=80 {printf "%d:%s\n", NR, $0}' Dockerfile.e2e
echo
echo "== USER / HEALTHCHECK =="
rg -n "^(USER|HEALTHCHECK)\b" Dockerfile.e2e || true
echo
echo "== COPY patterns (quick scan) =="
rg -n "\bCOPY\b" Dockerfile.e2e || true
echo
echo "== dnf/yum install usage =="
rg -n "\b(dnf|yum)\s+(install|update)" Dockerfile.e2e || trueRepository: openshift/hypershift
Length of output: 3246
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== azure-cli dnf exclude i686 =="
rg -n "exclude=.*i686|i686" Dockerfile.e2e || true
echo
echo "== USER / HEALTHCHECK directives =="
rg -n "^(USER|HEALTHCHECK)\b" Dockerfile.e2e || trueRepository: openshift/hypershift
Length of output: 134
Pin azure-cli (and restore the i686 exclusion) in Dockerfile.e2e to avoid reintroducing known e2e image build failures.
Removing the pin at dnf install -y azure-cli reintroduces the same class of issue fixed by PR #8638:
- E4S/EUS incompatibility: PR
#8638pinnedazure-cli-2.72.0becauseazure-cli >= 2.73.0requirespython3.12, which isn’t available in the E4S/EUS repos; unpinned installs can select a newer (uninstallable) version. - dnf resolution failures from i686 metadata: PR
#8638added--exclude='*.i686'to avoid unsatisfied i686 dependencies; this Dockerfile.e2e no longer excludes i686.
Additionally, this Dockerfile.e2e currently violates the provided container hardening guidelines: no USER, no HEALTHCHECK, and builder uses COPY . . (and the final stage reuses the builder image).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Dockerfile.e2e` at line 33, The Dockerfile.e2e currently installs azure-cli
without pinning and removed the i686 exclusion, and also lacks container
hardening (no USER, no HEALTHCHECK) and uses COPY . . in the builder and reuses
the builder as the final image; update the dnf install line to pin azure-cli to
the known-good version (azure-cli-2.72.0) and restore the exclusion flag
(--exclude='*.i686') to avoid E4S/EUS python3.12 and i686 resolution failures,
add a non-root USER declaration and a HEALTHCHECK per our guidelines, and change
the builder stage to use explicit COPY of only needed artifacts (avoid COPY . .)
and ensure the final stage does not reuse the builder image but instead copies
artifacts from the builder stage into a minimal runtime image.
Test Resultse2e-aws
e2e-aks
|
|
/area ci-tooling |
|
/pipeline required |
|
Scheduling tests matching the |
AI Test Failure AnalysisJob: Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6 |
|
/retest |
|
/test e2e-aws |
1 similar comment
|
/test e2e-aws |
|
Now I have all the evidence needed. The failure is clear — this is a CI infrastructure scheduling issue, not a test/code failure. Let me produce the final report. Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThis is a CI infrastructure failure, not a code or test failure. The job failed during the release image import phase — before any test code was executed — because two pods ( Root CauseThe CI cluster
The net result: 0 eligible nodes at every scheduling attempt over the entire hour. The pods The PR's code changes (reverting the azure-cli pin in the e2e Dockerfile) are completely uninvolved; the job never reached the point of executing any test step. Recommendations
Evidence
|
|
/lgtm |
|
Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage. |
|
/verified by e2e |
|
@bryan-cox: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@bryan-cox: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@bryan-cox: Jira Issue Verification Checks: Jira Issue OCPBUGS-86774 Jira Issue OCPBUGS-86774 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Reverts #8638
Summary by CodeRabbit