OCPBUGS-86471: baremetal: fix hostSelector for default stream#10571
OCPBUGS-86471: baremetal: fix hostSelector for default stream#10571honza wants to merge 1 commit into
Conversation
|
@honza: This pull request references Jira Issue OCPBUGS-86471, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Note
|
| Layer / File(s) | Summary |
|---|---|
Import and stream usage pkg/asset/machines/baremetal/machines.go |
Add k8s.io/apimachinery/pkg/selection import required by the new helper and preserve stream typing in provider. |
Provider wiring to helper pkg/asset/machines/baremetal/machines.go |
Replace inline CustomDeploy.HostSelector construction with hostSelectorForStream(osImageStream) in provider. |
hostSelectorForStream implementation pkg/asset/machines/baremetal/machines.go |
Add hostSelectorForStream(stream) that computes an exclusion list of types.OSImageStreamValues not equal to stream and returns a HostSelector with one MatchExpressions requirement using selection.NotIn and that exclusion list (keyed by streamLabelKey). |
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~10 minutes
Possibly related PRs
- openshift/installer#10502: Both PRs modify the baremetal machine provider and
HostSelectorlogic related tocoreos.openshift.io/stream.
Suggested labels
approved, lgtm
Suggested reviewers
- dtantsur
- iurygregory
🚥 Pre-merge checks | ✅ 14 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (14 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: fixing the hostSelector logic for default streams in baremetal machines, which is the core purpose of this PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PR modifies only production code in machines.go. No Ginkgo tests present or added; file has no test patterns or Ginkgo imports. Check not applicable. |
| Test Structure And Quality | ✅ Passed | No Ginkgo test files were added or modified in this PR. The PR only changes non-test code in machines.go. |
| Microshift Test Compatibility | ✅ Passed | No Ginkgo e2e tests are added in this PR. Changes are only to production code in pkg/asset/machines/baremetal/machines.go, which contains no test definitions. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR modifies bare metal machine provisioning code, not e2e tests. No Ginkgo test declarations found; check is not applicable. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR changes baremetal machine provisioning logic (HostSelector for BareMetalMachineProviderSpec), not pod scheduling constraints. No pod affinity, topology spread, PDBs, or nodeSelectors introduced. |
| Ote Binary Stdout Contract | ✅ Passed | File is production asset generation code with no process-level functions, no stdout writes, and no test binary communication code affected by changes. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No new Ginkgo e2e tests are added in this PR. Changes are to baremetal machine provisioning code (pkg/asset/machines/baremetal/machines.go and machinesets.go), not test code. Check does not apply. |
| No-Weak-Crypto | ✅ Passed | No weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret comparisons detected in the modified file. |
| Container-Privileges | ✅ Passed | PR modifies only Go provisioning code for baremetal HostSelector logic; no container security, privilege escalation, or K8s manifest configurations added. |
| No-Sensitive-Data-In-Logs | ✅ Passed | No logging of sensitive data found. The code only uses fmt for non-sensitive error messages and machine naming, with no logging frameworks or secret exposure. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
- Create PR with unit tests
Comment @coderabbitai help to get the list of available commands and usage tips.
|
/jira refresh |
|
@honza: This pull request references Jira Issue OCPBUGS-86471, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@honza: This pull request references Jira Issue OCPBUGS-86471, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/lgtm |
|
|
||
| // Only set HostSelector when a non-default stream is explicitly configured. | ||
| // Unlabeled BMHs (e.g. spare hosts in a CI pool) won't match a selector, | ||
| // so we leave it empty for the default stream to preserve compatibility. |
There was a problem hiding this comment.
The problem with this is that it will freely pick up e.g. BMHs that are explicitly labelled for rhel-10 and happily try to install rhel-9 on them.
Ideally the semantics we want are to match any BMHs either with the label value we want or without any label of that type. That doesn't seem to be possible with the Selector API, as all of the match conditions must be true. We would have to change the API to allow multiple selectors. Or k8s needs to add a NotOut operator to selectors 🙂.
What we can do is use the NotEquals or NotIn operator to exclude the non-default value (i.e. currently rhel-10) - that does still match in the case where the label is not set. Unfortunately this is not super future-proof, as it requires us to enumerate all of the values we don't want to match.
The alternative would be to not apply the label to any hosts, and explicitly match hosts without the label. So that the default pool would always use unlabelled hosts and for the non-default pools you would have to label the hosts and add a selector to the pool. I kind of hate this because as you upgrade into a version with new default streams, the correct labelling continues to depend on the cluster's born-in version.
It looks like past-me was at least dimly aware of the problem:
The hardest part will be handling the default case that there is no label. Worst case scenario, we could run a small controller that adds the default label value to any BMHs that don't have the label. It would be better not to have to, but at least we know that a viable solution exists.
So many options, all fairly bad.
There was a problem hiding this comment.
What we can do is use the NotEquals or NotIn operator to exclude the non-default value (i.e. currently rhel-10) - that does still match in the case where the label is not set. Unfortunately this is not super future-proof, as it requires us to enumerate all of the values we don't want to match.
By the time we get to RHEL 11, maybe it will become less of a problem :)
There was a problem hiding this comment.
By the time we get to RHEL 11, maybe it will become less of a problem :)
I guess that's true, in the sense that by the time we drop RHEL 9 support (which will be before RHEL 11), any MachineSets that exclude RHEL 10 hosts will no longer be useful and will need to be deleted.
It's less robust against us adding other kinds of streams, but we can document what to do when adding another MachineSet with a different selector.
There was a problem hiding this comment.
Updated with the NotIn idea
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/lgtm |
Use a NotIn MatchExpression to exclude other known streams rather than requiring an exact label match via MatchLabels. NotIn matches when the label is absent or has any value not in the exclusion set, so unlabeled BareMetalHosts (e.g. spare hosts in a CI pool) can still be claimed while hosts labeled for a different stream are rejected. This applies to all streams, not just the compile-time default, because the cluster's effective default stream is determined at install time, not by the installer binary's DefaultOSImageStream constant. The exclusion list is computed from types.OSImageStreamValues, so new streams are automatically excluded without additional code changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@honza: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/payload-job pull-ci-openshift-cluster-baremetal-operator-main-e2e-metal-ipi-serial-ipv4 |
|
@dtantsur: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command |
|
/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ipi-serial-ipv4 |
|
@dtantsur: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command |
|
/lgtm Okay, I give up, I'm not one of the 5 people in OpenShift who understand how to use |
|
/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ipi-ovn-serial-ipv4 |
|
@tthvo: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4c43a510-60fd-11f1-8ef2-87081cd6649a-0 |
Use a
NotInMatchExpression to exclude other known streams rather than requiring an exact label match via MatchLabels. NotIn matches when the label is absent or has any value not in the exclusion set, so unlabeled BareMetalHosts (e.g. spare hosts in a CI pool) can still be claimed while hosts labeled for a different stream are rejected.This applies to all streams, not just the compile-time default, because the cluster's effective default stream is determined at install time, not by the installer binary's DefaultOSImageStream constant. The exclusion list is computed from types.OSImageStreamValues, so new streams are automatically excluded without additional code changes..
Summary by CodeRabbit