Skip to content

Testing changes for MCS Firstboot Pivot Failure Reporting v2 #6124

Draft
CourtCourt521 wants to merge 1 commit into
openshift:mainfrom
CourtCourt521:FirstBootPivotErrorReportingv2
Draft

Testing changes for MCS Firstboot Pivot Failure Reporting v2 #6124
CourtCourt521 wants to merge 1 commit into
openshift:mainfrom
CourtCourt521:FirstBootPivotErrorReportingv2

Conversation

@CourtCourt521
Copy link
Copy Markdown
Contributor

@CourtCourt521 CourtCourt521 commented Jun 2, 2026

  • What I did

Testing not an actual pull request yet

  • How to verify it

Testing not an actual pull request

  • Description for the changelog yet

Testing not an actual pull request yet

Testing not an actual pull request yet

Summary by CodeRabbit

  • Enhancements

    • Node annotations now include the Machine Config Server URL for improved status reporting and cluster diagnostics.
    • Ignition server host discovery improved with infrastructure-aware logic for BareMetal, OpenStack, OVirt and vSphere, yielding more reliable config delivery.
  • Chores

    • RBAC updated to allow reading infrastructure objects for accurate host resolution.
  • Tests

    • Adjusted tests to validate annotation behavior with the API server URL.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 2, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 2, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

Walkthrough

The PR adds Machine Config Server (MCS) URL annotation to Ignition node configurations: a new annotation constant and exported GetIgnitionHost compute the host from Infrastructure; server appenders now accept and propagate mcsURL; ClusterServer fetches Infrastructure to compute mcsURL; BootstrapServer and tests updated; RBAC extended to read infrastructures.

Changes

Machine Config Server URL Annotation in Ignition Configs

Layer / File(s) Summary
Server: constant, host helper, and appender signatures
pkg/daemon/constants/constants.go, pkg/server/server.go
Adds MachineConfigServerURLAnnotationKey, exports GetIgnitionHost(infraStatus *configv1.InfrastructureStatus) (string, error), and updates WithNodeAnnotations, appendNodeAnnotations, and getNodeAnnotation to accept mcsURL and set the annotation when non-empty.
ClusterServer: Infrastructure fetch and mcsURL wiring
pkg/server/cluster_server.go
ClusterServer creates a configClient, retrieves the Infrastructure resource, calls GetIgnitionHost, formats mcsURL as https://<host>, stores it on the server struct, and passes it to WithNodeAnnotations during Ignition assembly.
Operator sync, BootstrapServer, tests, RBAC
pkg/operator/sync.go, pkg/server/bootstrap_server.go, pkg/server/server_test.go, manifests/machineconfigserver/clusterrole.yaml
Operator sync delegates to server.GetIgnitionHost (removes local helper and unused imports); BootstrapServer calls WithNodeAnnotations with matching signature (empty defaults); tests updated to new getNodeAnnotation signature; ClusterRole gains get,list,watch on infrastructures.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 13 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning PR title states 'Testing changes' but actual changeset implements functional infrastructure—adding an annotation constant, new exported GetIgnitionHost function, RBAC permissions, and integration logic across multiple files. Revise title to reflect the actual functional implementation: something like 'Add MCS URL annotation support and infrastructure integration for firstboot pivot failure reporting' would better describe the actual changes.
Docstring Coverage ⚠️ Warning Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (13 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names are static and deterministic. The repository uses Go testing (not Ginkgo), and no test names contain dynamic values like timestamps, UUIDs, or IP addresses.
Test Structure And Quality ✅ Passed No Ginkgo tests found in this PR. All tests use standard Go testing with testing.T, not Ginkgo's BDD framework. Custom check is not applicable.
Microshift Test Compatibility ✅ Passed No new Ginkgo e2e tests are added in this PR; only standard Go unit tests are modified, so MicroShift compatibility check does not apply.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No new Ginkgo e2e tests are added in this PR. The changes include production code and unit tests (standard Go testing), not e2e tests. The custom check does not apply.
Topology-Aware Scheduling Compatibility ✅ Passed PR does not introduce scheduling constraints; changes are operator/server code and RBAC updates that compute and store MCS URLs in node annotations only.
Ote Binary Stdout Contract ✅ Passed PR modifies only support code with no main/init/TestMain/suite functions. No stdout writes in modified files. Existing process-level code already redirects klog to stderr.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No new Ginkgo e2e tests were added in this PR. The check applies only to new Ginkgo e2e tests; existing unit tests (TestBootstrapServer, TestClusterServer) use standard Go testing package, not Ginkgo.
No-Weak-Crypto ✅ Passed No weak cryptography (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret comparisons found in modified files.
Container-Privileges ✅ Passed PR modifies Go code and one RBAC ClusterRole manifest. No container security context flags (privileged, hostPID, hostNetwork, hostIPC, allowPrivilegeEscalation, SYS_ADMIN) were changed.
No-Sensitive-Data-In-Logs ✅ Passed The PR does not log sensitive data. Internal MCS URLs, hostnames, or API server URLs are computed but never logged; error logs only capture error objects, not the sensitive values.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 2, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: CourtCourt521
Once this PR has been reviewed and has the lgtm label, please assign cheesesashimi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/server/server_test.go`:
- Line 188: The tests currently call getNodeAnnotation and build a local
mcIgnCfg but never assert that MachineConfigServerURLAnnotationKey was written
or configure the test servers to emit that MCS URL; update the tests to (1)
configure the test server(s) used in these cases to return/emit the expected MCS
URL, (2) set the expected URL value and call
getNodeAnnotation(mp.Status.Configuration.Name, "", "<expected-url>", mc) as you
already do, and (3) add explicit assertions that the returned annotation map
contains MachineConfigServerURLAnnotationKey with the expected URL value (use
the same unique symbols: getNodeAnnotation, MachineConfigServerURLAnnotationKey,
and mcIgnCfg to locate where to change behavior and where to add the assertion).
Ensure both occurrences (around lines shown) are updated consistently.

In `@pkg/server/server.go`:
- Around line 322-347: GetIgnitionHost currently uses fmt.Sprintf("%s:%s",
internalURLParsed.Hostname(), securePortStr) which breaks for IPv6 and assumes
infraStatus and API IP slices are non-nil/non-empty; change the fallback to use
net.JoinHostPort(internalURLParsed.Hostname(), securePortStr) and add an initial
nil check for infraStatus to return an error if nil, then in each platform
branch (BareMetal/APIServerInternalIPs, OpenStack/APIServerInternalIPs,
Ovirt/APIServerInternalIPs, VSphere.APIServerInternalIPs) check that the slice
exists and len(...)>0 before indexing [0] to avoid panics, keeping the
SecurePort and ignitionHost assignment logic otherwise the same.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0a4e5c0a-ff8b-478e-8e72-f71e85d2c927

📥 Commits

Reviewing files that changed from the base of the PR and between 3ffecb5 and 7556763.

📒 Files selected for processing (6)
  • pkg/daemon/constants/constants.go
  • pkg/operator/sync.go
  • pkg/server/bootstrap_server.go
  • pkg/server/cluster_server.go
  • pkg/server/server.go
  • pkg/server/server_test.go

Comment thread pkg/server/server_test.go
t.Fatalf("unexpected error while appending file to ignition: %v", err)
}
anno, err := getNodeAnnotation(mp.Status.Configuration.Name, "", mc)
anno, err := getNodeAnnotation(mp.Status.Configuration.Name, "", "https://api-int.test.example.com:22623", mc)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

These tests still don’t verify the new MCS URL annotation.

The hardcoded URL only goes into a local mcIgnCfg that never drives the final assertions, and the servers under test are not configured to emit that URL. These cases will keep passing even if MachineConfigServerURLAnnotationKey is never written.

Also applies to: 383-383

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/server/server_test.go` at line 188, The tests currently call
getNodeAnnotation and build a local mcIgnCfg but never assert that
MachineConfigServerURLAnnotationKey was written or configure the test servers to
emit that MCS URL; update the tests to (1) configure the test server(s) used in
these cases to return/emit the expected MCS URL, (2) set the expected URL value
and call getNodeAnnotation(mp.Status.Configuration.Name, "", "<expected-url>",
mc) as you already do, and (3) add explicit assertions that the returned
annotation map contains MachineConfigServerURLAnnotationKey with the expected
URL value (use the same unique symbols: getNodeAnnotation,
MachineConfigServerURLAnnotationKey, and mcIgnCfg to locate where to change
behavior and where to add the assertion). Ensure both occurrences (around lines
shown) are updated consistently.

Comment thread pkg/server/server.go
Comment on lines +322 to +347
func GetIgnitionHost(infraStatus *configv1.InfrastructureStatus) (string, error) {
internalURL := infraStatus.APIServerInternalURL
internalURLParsed, err := url.Parse(internalURL)
if err != nil {
return "", err
}
securePortStr := strconv.Itoa(SecurePort)
ignitionHost := fmt.Sprintf("%s:%s", internalURLParsed.Hostname(), securePortStr)
if infraStatus.PlatformStatus != nil {
switch infraStatus.PlatformStatus.Type {
case configv1.BareMetalPlatformType:
ignitionHost = net.JoinHostPort(infraStatus.PlatformStatus.BareMetal.APIServerInternalIPs[0], securePortStr)
case configv1.OpenStackPlatformType:
ignitionHost = net.JoinHostPort(infraStatus.PlatformStatus.OpenStack.APIServerInternalIPs[0], securePortStr)
case configv1.OvirtPlatformType:
ignitionHost = net.JoinHostPort(infraStatus.PlatformStatus.Ovirt.APIServerInternalIPs[0], securePortStr)
case configv1.VSpherePlatformType:
if infraStatus.PlatformStatus.VSphere != nil {
if len(infraStatus.PlatformStatus.VSphere.APIServerInternalIPs) > 0 {
ignitionHost = net.JoinHostPort(infraStatus.PlatformStatus.VSphere.APIServerInternalIPs[0], securePortStr)
}
}
}
}

return ignitionHost, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Harden GetIgnitionHost (IPv6 host:port + panic safety)

  • Replace the fallback fmt.Sprintf("%s:%s", internalURLParsed.Hostname(), securePortStr) with net.JoinHostPort(...) so IPv6 literals get valid brackets.
  • Add infraStatus == nil handling and check len(APIServerInternalIPs) > 0 before indexing [0] in the BareMetal/OpenStack/Ovirt/VSphere branches to avoid panics.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/server/server.go` around lines 322 - 347, GetIgnitionHost currently uses
fmt.Sprintf("%s:%s", internalURLParsed.Hostname(), securePortStr) which breaks
for IPv6 and assumes infraStatus and API IP slices are non-nil/non-empty; change
the fallback to use net.JoinHostPort(internalURLParsed.Hostname(),
securePortStr) and add an initial nil check for infraStatus to return an error
if nil, then in each platform branch (BareMetal/APIServerInternalIPs,
OpenStack/APIServerInternalIPs, Ovirt/APIServerInternalIPs,
VSphere.APIServerInternalIPs) check that the slice exists and len(...)>0 before
indexing [0] to avoid panics, keeping the SecurePort and ignitionHost assignment
logic otherwise the same.

@CourtCourt521 CourtCourt521 force-pushed the FirstBootPivotErrorReportingv2 branch from 7556763 to bbcbf77 Compare June 4, 2026 19:28
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@manifests/machineconfigserver/clusterrole.yaml`:
- Around line 28-30: The ClusterRole currently grants unnecessary verbs for the
config.openshift.io/infrastructures resource; remove "list" and "watch" and
retain only the "get" verb since pkg/server/cluster_server.go only calls
ConfigV1().Infrastructures().Get(...). Locate the rule that lists apiGroups:
["config.openshift.io"] and resources: ["infrastructures"] in
manifests/machineconfigserver/clusterrole.yaml and update the verbs array to
only include "get" so RBAC is tightened to the minimum required permission.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1bffdd77-4bc7-48c8-a0a0-697a22118952

📥 Commits

Reviewing files that changed from the base of the PR and between 7556763 and bbcbf77.

📒 Files selected for processing (7)
  • manifests/machineconfigserver/clusterrole.yaml
  • pkg/daemon/constants/constants.go
  • pkg/operator/sync.go
  • pkg/server/bootstrap_server.go
  • pkg/server/cluster_server.go
  • pkg/server/server.go
  • pkg/server/server_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • pkg/daemon/constants/constants.go
  • pkg/server/server_test.go
  • pkg/server/bootstrap_server.go
  • pkg/server/cluster_server.go
  • pkg/operator/sync.go
  • pkg/server/server.go

Comment on lines +28 to +30
- apiGroups: ["config.openshift.io"]
resources: ["infrastructures"]
verbs: ["get", "list", "watch"]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check for Infrastructure informer/lister usage that requires list+watch

# Search for Infrastructure informer/lister initialization
rg -nP --type=go -C3 '\bInfrastructure(Informer|Lister)\b'

# Search for watch operations on Infrastructure
rg -nP --type=go -C3 'Infrastructures\(\)\.Watch'

# Search for list operations on Infrastructure  
rg -nP --type=go -C3 'Infrastructures\(\)\.List'

Repository: openshift/machine-config-operator

Length of output: 18993


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Find where infraInformer is actually used (excluding vendor)
rg -n --type=go --glob '!vendor/**' '\binfraInformer\b' pkg

# 2) Look for starting/starting informers via factory or .Informer().Run/.HasSynced (excluding vendor)
rg -n --type=go --glob '!vendor/**' '\.Informer\(\)\.Run|\bRun\(.*infra|Start.*infra|WaitForCacheSync|HasSynced|informerFactory' pkg

# 3) Check if there are any direct API calls for Infrastructures().List/Watch in pkg (excluding vendor)
rg -n --type=go --glob '!vendor/**' 'Infrastructures\(\)\.(List|Watch)\b' pkg

Repository: openshift/machine-config-operator

Length of output: 23552


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Look for where the config informer factory is created/started (excluding vendor)
rg -n --type=go --glob '!vendor/**' '\bNewSharedInformerFactory\([^)]*configClient|\bStart\([^)]*\)' pkg/operator pkg/controller pkg/server pkg/daemon || true

# Specifically search for starting shared informer factories named/configure on configClient
rg -n --type=go --glob '!vendor/**' 'configinformers\.NewSharedInformerFactory|configinformers.*NewSharedInformerFactory' pkg/operator pkg/controller pkg/server pkg/daemon || true

# Find where infraInformer.Informer() comes from in operator.go to trace factory start
rg -n --type=go --glob '!vendor/**' 'infraInformer\s*[:=]' pkg/operator/operator.go || true
rg -n --type=go --glob '!vendor/**' 'Infrastructures\(\)\s*\.Informer\(\)' pkg/operator/operator.go || true

Repository: openshift/machine-config-operator

Length of output: 6087


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Trace config informer factory creation for infraInformer and whether it is started
rg -n --type=go --glob '!vendor/**' 'configinformers\.NewSharedInformerFactory\(' pkg/operator/operator.go pkg/server/cluster_server.go pkg/controller pkg/daemon pkg/osimagestream || true

# Find the sharedInformerFactory variable name used in pkg/server/cluster_server.go and see Start call scope
sed -n '1,220p' pkg/server/cluster_server.go | cat -n

# Show relevant section of pkg/operator/operator.go around where informer factories are started (look near Start())
rg -n --type=go --glob '!vendor/**' '\.Start\(stopCh\)|SharedInformerFactory\.Start' pkg/operator/operator.go
sed -n '430,560p' pkg/operator/operator.go | cat -n

Repository: openshift/machine-config-operator

Length of output: 16197


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search only in pkg/server for infra informer/lister usage and list/watch
rg -n --type=go --glob '!vendor/**' 'Infrastructures\(\)\.(Get|List|Watch)|infrastructure|infraInformer|infraLister' pkg/server || true

# Also quickly check for any shared informer factory involving config client in server
rg -n --type=go --glob '!vendor/**' 'configinformers|ConfigV1\(\)|ConfigClientOrDie\(' pkg/server || true

Repository: openshift/machine-config-operator

Length of output: 694


Tighten RBAC: drop list/watch for config.openshift.io/infrastructures

manifests/machineconfigserver/clusterrole.yaml should not grant list/watch: pkg/server/cluster_server.go only uses configClient.ConfigV1().Infrastructures().Get(ctx, "cluster", ...) to compute the MCS URL, and pkg/server contains no Infrastructures().List/Watch usage (or infra informer usage).

🔒 Tighten RBAC to minimum necessary verbs
 - apiGroups: ["config.openshift.io"]
   resources: ["infrastructures"]
-  verbs: ["get", "list", "watch"]
+  verbs: ["get"]
🧰 Tools
🪛 Checkov (3.2.530)

[medium] 1-30: Minimize wildcard use in Roles and ClusterRoles

(CKV_K8S_49)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@manifests/machineconfigserver/clusterrole.yaml` around lines 28 - 30, The
ClusterRole currently grants unnecessary verbs for the
config.openshift.io/infrastructures resource; remove "list" and "watch" and
retain only the "get" verb since pkg/server/cluster_server.go only calls
ConfigV1().Infrastructures().Get(...). Locate the rule that lists apiGroups:
["config.openshift.io"] and resources: ["infrastructures"] in
manifests/machineconfigserver/clusterrole.yaml and update the verbs array to
only include "get" so RBAC is tightened to the minimum required permission.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant