Skip to content

NO-JIRA: feat(skills): add validate-pr-override-images skill#8616

Merged
openshift-merge-bot[bot] merged 1 commit into
openshift:mainfrom
enxebre:enxebre/validate-pr-images-skill
Jun 3, 2026
Merged

NO-JIRA: feat(skills): add validate-pr-override-images skill#8616
openshift-merge-bot[bot] merged 1 commit into
openshift:mainfrom
enxebre:enxebre/validate-pr-images-skill

Conversation

@enxebre
Copy link
Copy Markdown
Member

@enxebre enxebre commented May 28, 2026

Summary

  • Add a Claude Code skill and GitHub Action to validate that CPO override images in a PR actually contain the fix PRs they claim to include
  • validate-overrides.sh — orchestrates the full validation: parses a structured contract from the PR description, extracts override images per branch from the diff, and validates each image contains the claimed PRs using git merge-base --is-ancestor
  • verify-pr-in-image.sh — inspects a container image via skopeo, reads its vcs-ref label, and checks whether a given PR merge commit is an ancestor of the image's build commit
  • SKILL.md — Claude Code skill wrapper that invokes validate-overrides.sh
  • GitHub Action workflow triggers on PRs modifying overrides.yaml and runs the validation automatically

Depends on #8627 to add skopeo and gh CLI to the ARC runner image.

PR description contract

PR authors modifying CPO overrides must include lines in the PR description like:

branch: 4.19 wants: https://github.com/openshift/hypershift/pull/1234
branch: 4.20 wants: https://github.com/openshift/hypershift/pull/1234, https://github.com/openshift/hypershift/pull/5678

The workflow fails if no such lines are found.

Usage

/validate-pr-override-images <PR-URL-or-number>

Or run the script directly:

.claude/skills/validate-pr-override-images/validate-overrides.sh <pr-number>

Test plan

  • verify-pr-in-image.sh returns PASS for a known-good image+PR combination
  • verify-pr-in-image.sh returns FAIL (exit 1) for a PR not in the image
  • validate-overrides.sh 8616 parses contract from this PR description and validates images
  • Skill appears in Claude Code available skills list
  • GitHub Action workflow passes with correct contract
  • GitHub Action workflow fails with incorrect contract

🤖 Generated with Claude Code

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@enxebre enxebre changed the title feat(skills): add validate-pr-override-images skill NO-JIRA: feat(skills): add validate-pr-override-images skill May 28, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 28, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@enxebre: This pull request explicitly references no jira issue.

Details

In response to this:

Summary

  • Add a Claude Code skill to validate that CPO override images in a PR actually contain the fix PRs they claim to include
  • Includes verify-pr-in-image.sh — a standalone script that pulls an image, reads the vcs-ref label, and checks git history for a given PR merge commit
  • Includes SKILL.md that orchestrates the full validation: extracting images and claimed PRs from a PR diff/description, then running the script for each combination

Usage

/validate-pr-override-images <PR-URL-or-number>

Or run the script directly:

.claude/skills/validate-pr-override-images/verify-pr-in-image.sh <image> <pr-number> [repo-path]

Tested with

Validated against #8610 — confirmed the 4.21 override image (control-plane-operator-4-21@sha256:1b3f1bf7...) contains PR #8565 (cherry-pick of #8552).

Test plan

  • verify-pr-in-image.sh returns PASS for a known-good image+PR combination
  • verify-pr-in-image.sh returns FAIL (exit 1) for a PR not in the image
  • Skill appears in Claude Code available skills list

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a Claude skill doc, two shell scripts, a GitHub Actions workflow, and updated override entries. The validator parses a PR body for branch->PR mappings, extracts added cpoImage digests from the PR diff, deduplicates images by digest, and calls verify-pr-in-image.sh for each image/PR. The verifier reads an image's vcs-ref via skopeo, ensures the commit exists (fetching if needed), searches git history for merge/cherry-pick markers for the PR, and returns PASS/FAIL. CI runs the validator on PRs that modify overrides.yaml and fails the job if any verification fails.

Sequence Diagram(s)

sequenceDiagram
  participant GitHubActions
  participant Runner
  participant Validator as validate-overrides.sh
  participant GH as gh
  participant Verifier as verify-pr-in-image.sh
  participant Skopeo as skopeo
  participant Registry as ContainerRegistry
  participant Repo as GitRepo

  GitHubActions->>Runner: trigger on PR modifying overrides.yaml
  Runner->>Validator: run validate-overrides.sh --pr <N> (with GH_TOKEN)
  Validator->>GH: gh pr view / gh pr diff (read body + diff)
  Validator->>Verifier: verify-pr-in-image.sh <image> <pr>
  Verifier->>Skopeo: skopeo inspect <image> (read vcs-ref)
  Skopeo->>Registry: fetch image metadata
  Verifier->>Repo: git cat-file -e <sha> (fetch --all if missing)
  Verifier->>Repo: git log --oneline (search for PR markers)
  Verifier->>Validator: PASS / FAIL
  Validator->>Runner: aggregate results, exit code
Loading

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

Check name Status Explanation Resolution
Stable And Deterministic Test Names ❌ Error The test file adds a dynamically generated test name using fmt.Sprintf("test-%d", i+1) which violates the requirement for stable, descriptive test names. Replace t.Run(fmt.Sprintf("test-%d", i+1), ...) with descriptive static names like t.Run(test.platform+" platform version "+test.version, ...) to clearly indicate what each test case validates.
✅ Passed checks (10 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the primary change: adding a new Claude Code skill for validating CPO override images.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Test Structure And Quality ✅ Passed This PR adds standard Go unit tests (overrides_test.go using testing.T + Gomega), not Ginkgo tests. The custom check for Ginkgo test code is not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed PR adds validation scripts and override configuration only; no deployment manifests, controllers, or scheduling constraints (affinity, topology spread, nodeSelector, PDB, replicas) introduced.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. The changes comprise shell scripts, documentation, workflow YAML, and a manifest update—not Go test files.
No-Weak-Crypto ✅ Passed No weak crypto, custom implementations, or non-constant-time secret comparisons found. Uses standard tools (git, skopeo) for validation; SHA256 references are image digests.
Container-Privileges ✅ Passed PR contains no privileged: true, hostPID, hostNetwork, hostIPC, SYS_ADMIN, or allowPrivilegeEscalation settings in any files or GitHub Actions workflow configuration.
No-Sensitive-Data-In-Logs ✅ Passed No sensitive data exposed in logs. Scripts extract only PR numbers, branches, and digests from API responses; full JSON outputs never echoed.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the area/ai Indicates the PR includes changes related to AI - Claude agents, Cursor rules, etc. label May 28, 2026
@openshift-ci openshift-ci Bot requested review from muraee and sdminonne May 28, 2026 07:53
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 28, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 28, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude/skills/validate-pr-override-images/SKILL.md:
- Around line 10-12: The markdown in
.claude/skills/validate-pr-override-images/SKILL.md uses unlabeled fenced code
blocks (for the example command "/validate-pr-override-images
<PR-URL-or-number>" and the example sections "Validate a PR by number" and
"Validate a PR by URL" including their outputs) which triggers MD040; update
each triple-backtick fence around those examples to include a language
identifier (use "text" for these CLI output blocks) so the fences read ```text
instead of ``` to satisfy linting. Ensure you change every occurrence around the
command and its example outputs (the blocks containing
"/validate-pr-override-images 8610", the example output lines starting with
"Image:" and "Overall: PASS", and the "/validate-pr-override-images
https://github.com/openshift/hypershift/pull/8610" block) so all unlabeled
fences are labeled consistently.

In @.claude/skills/validate-pr-override-images/verify-pr-in-image.sh:
- Around line 8-10: Add argument-count validation at the top of the script
before reading positional params: check $# and if fewer than 2 print a concise
usage message mentioning the script name and expected args (IMAGE PR [REPO]) and
exit non-zero; then safely assign IMAGE="$1", PR="$2", REPO="${3:-.}". This
prevents unbound-variable errors under set -u and gives a clear error when IMAGE
or PR are missing.
- Around line 15-20: COMMIT is set from podman inspect and Podman returns the
literal string "<no value>" when the label is missing, so update the validation
after the COMMIT assignment in verify-pr-in-image.sh to treat both empty string
and the literal "<no value>" as missing; change the condition that currently
checks [[ -z "$COMMIT" ]] to also check for "$COMMIT" == "<no value>" (or use a
pattern match) and exit with the error message if either case is true so later
logic does not attempt to use "<no value>" as a commit.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 88b5c93f-05e2-4694-b912-363bc695c51f

📥 Commits

Reviewing files that changed from the base of the PR and between 0aa20fc and 8d28c06.

📒 Files selected for processing (2)
  • .claude/skills/validate-pr-override-images/SKILL.md
  • .claude/skills/validate-pr-override-images/verify-pr-in-image.sh

Comment thread .claude/skills/validate-pr-override-images/SKILL.md Outdated
Comment thread .claude/skills/validate-pr-override-images/verify-pr-in-image.sh
Comment thread .claude/skills/validate-pr-override-images/verify-pr-in-image.sh Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 41.27%. Comparing base (c135e0c) to head (921e551).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8616   +/-   ##
=======================================
  Coverage   41.27%   41.27%           
=======================================
  Files         755      755           
  Lines       93446    93446           
=======================================
  Hits        38566    38566           
  Misses      52148    52148           
  Partials     2732     2732           
Flag Coverage Δ
cmd-support 34.86% <ø> (ø)
cpo-hostedcontrolplane 43.50% <ø> (ø)
cpo-other 42.79% <ø> (ø)
hypershift-operator 51.00% <ø> (ø)
other 31.64% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@enxebre enxebre force-pushed the enxebre/validate-pr-images-skill branch from 8a55aeb to 9b3a02e Compare May 28, 2026 08:04
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
.claude/skills/validate-pr-override-images/verify-pr-in-image.sh (1)

34-34: Be aware of performance on repos with deep history.

The git log --oneline "$COMMIT" command searches the entire commit history from $COMMIT back to the root. On repositories with thousands of commits, this can be slow. However, limiting the search (e.g., with --max-count) risks missing legitimate PR inclusions, so the current approach is the safest for correctness.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/skills/validate-pr-override-images/verify-pr-in-image.sh at line 34,
The git log call that sets FOUND (FOUND=$(git -C "$REPO" log --oneline "$COMMIT"
...)) can be very slow on repos with deep history; change it to a two-step
approach: first search a bounded recent window (e.g., use git -C "$REPO" log
--oneline --max-count="${PR_SEARCH_LIMIT:-10000}" "$COMMIT") and if not found
fall back to the full search only when necessary. Update the assignment to FOUND
(and any related logic) in verify-pr-in-image.sh to implement the configurable
--max-count fast path with a fallback to the original command so correctness is
preserved but typical runs are much faster.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude/skills/validate-pr-override-images/verify-pr-in-image.sh:
- Around line 29-32: After fetching, re-check the commit existence using the
same git command to ensure the commit was found; if git -C "$REPO" cat-file -e
"$COMMIT" still fails, print a clear error like "Commit $COMMIT not found in any
remote for repo $REPO" and exit with non-zero status. Update the block that
currently uses git -C "$REPO" cat-file -e "$COMMIT" (and the subsequent git -C
"$REPO" fetch --all --quiet) to perform this post-fetch verification and handle
the failure path explicitly.

---

Nitpick comments:
In @.claude/skills/validate-pr-override-images/verify-pr-in-image.sh:
- Line 34: The git log call that sets FOUND (FOUND=$(git -C "$REPO" log
--oneline "$COMMIT" ...)) can be very slow on repos with deep history; change it
to a two-step approach: first search a bounded recent window (e.g., use git -C
"$REPO" log --oneline --max-count="${PR_SEARCH_LIMIT:-10000}" "$COMMIT") and if
not found fall back to the full search only when necessary. Update the
assignment to FOUND (and any related logic) in verify-pr-in-image.sh to
implement the configurable --max-count fast path with a fallback to the original
command so correctness is preserved but typical runs are much faster.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 9b0a8b55-2c5a-4b6e-83c4-a3869f85f320

📥 Commits

Reviewing files that changed from the base of the PR and between 8a55aeb and 9b3a02e.

📒 Files selected for processing (2)
  • .claude/skills/validate-pr-override-images/SKILL.md
  • .claude/skills/validate-pr-override-images/verify-pr-in-image.sh
✅ Files skipped from review due to trivial changes (1)
  • .claude/skills/validate-pr-override-images/SKILL.md

Comment thread .claude/skills/validate-pr-override-images/verify-pr-in-image.sh
@jparrill
Copy link
Copy Markdown
Contributor

jparrill commented May 28, 2026

Hey @enxebre, thanks for putting this together — the idea of validating override images is solid and I've been thinking about this gap too.

My concern is that as a Claude skill this relies on someone remembering to run it. If we forget, we're back to square one. Also the script has some fragilities: it needs podman running locally, depends on merge commit message format (squash merges or different cherry-pick conventions would break it), and git fetch --all can be slow.

What if instead of a manual skill we make this a CI check that runs automatically on every PR touching overrides.yaml? We already have the pattern for this:

  1. Unit tests in overrides_test.go — validate structural invariants with zero network: no duplicate versions per platform, valid image ref format (@sha256: digest), sane testing config (at most one platform with runTests: true). These catch copy-paste errors at make test time.
  2. A verify tool + GHA workflow — following the verify-api-deps / cpo-container-sync pattern, add a verify-cpo-overrides target that checks image existence via registry API (HEAD on the manifest, no full pull needed). Wire it into a workflow triggered on paths: ['hypershift-operator/controlplaneoperator-overrides/**'].

This way every override PR gets validated automatically — no manual step, no podman dependency, and it catches the most common mistakes (wrong digest, image doesn't exist, duplicate versions).

WDYT?

@enxebre
Copy link
Copy Markdown
Member Author

enxebre commented May 28, 2026

target that checks image existence via registry API (HEAD on the manifest, no full pull needed). Wire it into a workflow triggered on paths: ['hypershift-operator/controlplaneoperator-overrides/**']

These are meant to be building blocks for CI automation.
The value of the script is validating that intended git backports ARE within an container image
The value of the skill is judgement to figure what the intended backports for each image override based on a PR desc.

You could possibly do the second step deterministically as part of the script without llm, agreeing on a contractual consistent PR desc for overrides changes.

@enxebre
Copy link
Copy Markdown
Member Author

enxebre commented May 28, 2026

You could possibly do the second step deterministically as part of the script without llm, agreeing on a contractual consistent PR desc for overrides changes.

I just pushed an absolutely untested commit so you can see the idea
05a351c

if doing that deterministically results too convoluted, we can still run the skill from GH actions, it's just a building block

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 28, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/validate-cpo-overrides.yaml (1)

14-27: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Add SAST/SCA coverage in this workflow path.

This pipeline introduces security-sensitive parsing/validation logic but has no SAST/SCA step (or reusable security workflow invocation).

As per coding guidelines, ".github/workflows/**/*: ... SAST/SCA steps in pipeline".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/validate-cpo-overrides.yaml around lines 14 - 27, The
workflow job validate-cpo-overrides lacks SAST/SCA scanning; add a security scan
step or call the organization's reusable SAST/SCA workflow in the
validate-cpo-overrides job so the validation script
(.claude/skills/validate-pr-override-images/validate-overrides.sh) runs under
security analysis; update the job to include either a reusable workflow
invocation (e.g., call the repo's standard SAST/SCA workflow) or insert the
approved scanner step(s) (SCA dependency scan and SAST/static-analysis) before
or after the existing run step, ensuring GH_TOKEN or required secrets are set in
env and the step uses the same runs-on and timeout settings.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude/skills/validate-pr-override-images/validate-overrides.sh:
- Around line 39-55: The script increments FOUND_LINES and records BRANCH_PRS
even when the extracted pr_numbers is empty, allowing a silent pass; modify the
logic in the block that processes wants (and the similar block at the other
location) to validate that at least one PR ID was extracted before incrementing
FOUND_LINES and assigning BRANCH_PRS["$branch"], and if no IDs are found either
emit an error and exit non-zero or skip/continue so the script fails fast;
reference the variables and arrays pr_numbers, FOUND_LINES, BRANCH_PRS, wants
and the branch handling code to locate and update the conditional flow.
- Around line 77-93: The loop only captures version when the diff line is an
added line, so cpoImage changes that appear under an unchanged or context
version line get missed; modify the version-detection branch inside the
while-read loop in validate-overrides.sh to match version: lines regardless of
the leading diff symbol (e.g., accept lines starting with +, -, or space) and
set current_version from those matches (trim as needed), so subsequent cpoImage
matches (the image extraction branch and BRANCH_IMAGES population) will
correctly associate images with the last-seen version; keep using the same
variables BRANCH_IMAGES and current_version so the rest of the logic remains
unchanged.

---

Outside diff comments:
In @.github/workflows/validate-cpo-overrides.yaml:
- Around line 14-27: The workflow job validate-cpo-overrides lacks SAST/SCA
scanning; add a security scan step or call the organization's reusable SAST/SCA
workflow in the validate-cpo-overrides job so the validation script
(.claude/skills/validate-pr-override-images/validate-overrides.sh) runs under
security analysis; update the job to include either a reusable workflow
invocation (e.g., call the repo's standard SAST/SCA workflow) or insert the
approved scanner step(s) (SCA dependency scan and SAST/static-analysis) before
or after the existing run step, ensuring GH_TOKEN or required secrets are set in
env and the step uses the same runs-on and timeout settings.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: ecfa5447-2ecb-422b-8d02-f7a853e2a9c5

📥 Commits

Reviewing files that changed from the base of the PR and between 9b3a02e and 05a351c.

📒 Files selected for processing (3)
  • .claude/skills/validate-pr-override-images/validate-overrides.sh
  • .claude/skills/validate-pr-override-images/verify-pr-in-image.sh
  • .github/workflows/validate-cpo-overrides.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • .claude/skills/validate-pr-override-images/verify-pr-in-image.sh

Comment thread .claude/skills/validate-pr-override-images/validate-overrides.sh
Comment thread .claude/skills/validate-pr-override-images/validate-overrides.sh
5. Reporting a pass/fail summary per image and per PR

Prerequisites:
- `podman` must be running (`podman machine start` if needed)
Copy link
Copy Markdown
Collaborator

@celebdor celebdor May 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it also be pointed to a valid pull secret? some overrides can be to actually released newer Z streams, for which skopeo would need the pull secret

Copy link
Copy Markdown
Contributor

@jparrill jparrill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped some more comments. Thanks!

```

The script:
- Pulls the image via `podman`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested skopeo command in local and it not perform a pull of the image (podman pull docker://xxxxx) to grab the details, so this needs to be adjusted.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the advantage of using skopeo is that it does NOT make the pull

5. Reporting a pass/fail summary per image and per PR

Prerequisites:
- `podman` must be running (`podman machine start` if needed)
Copy link
Copy Markdown
Contributor

@jparrill jparrill May 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that this makes sense in darwin environments, but in a regular Linux env this is not necessary, only the service running if applies.

@openshift-ci openshift-ci Bot added the area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release label May 28, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

@enxebre enxebre force-pushed the enxebre/validate-pr-images-skill branch from 6d472cc to da93754 Compare May 28, 2026 15:10
@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 28, 2026
@enxebre enxebre force-pushed the enxebre/validate-pr-images-skill branch 6 times, most recently from 07ec0e4 to d9efdad Compare May 28, 2026 15:46
@enxebre enxebre force-pushed the enxebre/validate-pr-images-skill branch from d9efdad to 93e2f60 Compare June 3, 2026 09:03
enxebre added a commit to enxebre/hypershift that referenced this pull request Jun 3, 2026
Empty commit to retrigger the validate-cpo-overrides workflow after
updating the PR description to claim PR openshift#8616 (this PR, which is
not in the CPO release images) to verify the check fails correctly.
@enxebre enxebre force-pushed the enxebre/validate-pr-images-skill branch 2 times, most recently from c7cfb5f to 2a9e3a3 Compare June 3, 2026 09:18
@enxebre
Copy link
Copy Markdown
Member Author

enxebre commented Jun 3, 2026

/hold cancel

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 3, 2026
…description

Adds a GitHub Actions workflow that triggers when CPO override images
are modified. It parses the PR description for a contract specifying
which PRs each branch override should contain, then validates the
images using skopeo and git history.

Also adds a Claude Code skill wrapping the same validation script
for local use.
@enxebre enxebre force-pushed the enxebre/validate-pr-images-skill branch from 2a9e3a3 to 921e551 Compare June 3, 2026 09:20
@hypershift-jira-solve-ci
Copy link
Copy Markdown

Now I have the complete root cause. Let me produce the final report.

Test Failure Analysis Complete

Job Information

  • Prow Job: Validate CPO Override Images (GitHub Actions workflow validate-cpo-overrides.yaml)
  • Build ID: 26875356379 / Job 79261031734
  • PR: NO-JIRA: feat(skills): add validate-pr-override-images skill #8616 (NO-JIRA: feat(skills): add validate-pr-override-images skill)
  • Branch: enxebre/validate-pr-images-skillmain
  • Duration: ~29 seconds (09:16:02 – 09:16:31 UTC)

Test Failure Analysis

Error

FAIL: PR #8616 has no merge commit (not merged yet?)
  PR #8616: FAIL

=== Summary ===
Passed: 0
Failed: 2

OVERALL: FAIL
##[error]Process completed with exit code 1.

Summary

The validation workflow fails because the PR description's branch: X.Y wants: contract references PR #8616 itself as the PR that should be included in the override images. Since PR #8616 is still open (not merged), verify-pr-in-image.sh calls gh pr view 8616 --json mergeCommit which returns null, causing the script to emit FAIL: PR #8616 has no merge commit (not merged yet?) and exit 1. This is a circular self-referencing test setup: the PR adds test override entries pointing to real Konflux images (for branches 4.20 and 4.21) and claims those images contain PR #8616, but PR #8616 cannot have a merge commit until it is merged, making the validation inherently impossible to pass while the PR is open.

Root Cause

The failure is caused by a circular self-reference in the test data. Here is the exact causal chain:

  1. PR NO-JIRA: feat(skills): add validate-pr-override-images skill #8616 modifies overrides.yaml — The PR adds test entries with real Konflux CPO images for versions 4.20.25 and 4.21.19 (comments say # Test entry for validate-pr-override-images CI validation).

  2. The PR description declares branch: 4.20 wants: PR 8616 — The validate-overrides.sh script parses this contract from the PR description (outside code blocks). At the time CI ran, the PR description contained lines like branch 4.20 wants PRs: 8616 and branch 4.21 wants PRs: 8616.

  3. Step 3 validation calls verify-pr-in-image.sh — For each (branch, image, PR) tuple, this script:

  4. Both image validations fail identically — The 4.20 and 4.21 images are real existing images that were built from commits predating PR NO-JIRA: feat(skills): add validate-pr-override-images skill #8616. Even if PR NO-JIRA: feat(skills): add validate-pr-override-images skill #8616 were merged, its merge commit would not be an ancestor of these pre-existing image commits, so the git merge-base --is-ancestor check would also fail.

In summary: The test data is self-referential and impossible to satisfy. The override images are real pre-existing builds that cannot contain PR #8616 (which introduces the validation tooling itself, not a CPO code fix). The PR claims these images contain PR #8616, but they don't and can't.

Recommendations
  1. Use a real, already-merged PR number in the test contract — Change the PR description contract to reference a PR that is already merged and known to be included in the test override images. For example, use the PR that originally added the override entries for 4.20/4.21 (the images at sha256:155e4eee... and sha256:1b3f1bf7... were built from specific commits — find a merged PR that is an ancestor of those commits).

  2. Remove the test override entries from overrides.yaml — The # Test entry for validate-pr-override-images CI validation entries at versions 4.20.25 and 4.21.19 duplicate existing images. If the workflow is only meant to run on real override PRs, remove these test entries and instead rely on a separate test PR or a mock test that doesn't modify production data.

  3. Add a --dry-run or --self-test mode to the script — For CI on the tooling PR itself, add a mode that validates the script's parsing logic without actually checking merge ancestry. This avoids the circular dependency of a PR validating that images contain itself.

  4. Scope the workflow trigger — The workflow currently triggers because the PR adds the workflow file itself. Consider adding a condition to skip validation when the PR doesn't substantively change override images (e.g., if: contains(github.event.pull_request.changed_files, 'overrides.yaml')), or exclude the workflow from running on its own initial PR.

Evidence
Evidence Detail
Error message FAIL: PR #8616 has no merge commit (not merged yet?) (appears twice, once per branch)
PR state OPEN (mergedAt: null, mergeCommit: null)
Failed step Step 3: "Validating images contain claimed PRs" in validate-overrides.sh
Script line causing failure verify-pr-in-image.sh line 49: gh pr view "$PR" --json mergeCommit --jq '.mergeCommit.oid // empty' returns empty
Branch 4.20 image quay.io/redhat-user-workloads/crt-redhat-acm-tenant/control-plane-operator-4-20@sha256:155e4eee... (commit: d271e089)
Branch 4.21 image quay.io/redhat-user-workloads/crt-redhat-acm-tenant/control-plane-operator-4-21@sha256:1b3f1bf7... (commit: d345a6af)
Test override entries overrides.yaml has # Test entry for validate-pr-override-images CI validation at versions 4.20.25 and 4.21.19
PR description contract branch 4.20 wants PRs: 8616 and branch 4.21 wants PRs: 8616 (parsed at runtime from PR body)
Root cause Circular self-reference — PR #8616 claims images contain itself, but it is not merged and its code is not in those pre-existing images

@enxebre
Copy link
Copy Markdown
Member Author

enxebre commented Jun 3, 2026

/hold cancel

@jparrill
Copy link
Copy Markdown
Contributor

jparrill commented Jun 3, 2026

/lgtm

@bryan-cox
Copy link
Copy Markdown
Member

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 3, 2026
@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 3, 2026
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification

No second-stage tests were triggered for this PR.

This can happen when:

  • The changed files don't match any pipeline_run_if_changed patterns
  • All files match pipeline_skip_if_only_changed patterns
  • No pipeline-controlled jobs are defined for the main branch

Use /test ? to see all available tests.

@bryan-cox
Copy link
Copy Markdown
Member

/hold cancel

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 3, 2026
@bryan-cox
Copy link
Copy Markdown
Member

/verified by @enxebre

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 3, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@bryan-cox: This PR has been marked as verified by @enxebre.

Details

In response to this:

/verified by @enxebre

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-bot openshift-merge-bot Bot merged commit 860b695 into openshift:main Jun 3, 2026
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/ai Indicates the PR includes changes related to AI - Claude agents, Cursor rules, etc. area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants