
CORS-4334: Konnectivity #10344

Open
patrickdillon wants to merge 3 commits into openshift:main from patrickdillon:konnectivity

Conversation

@patrickdillon
Contributor

@patrickdillon patrickdillon commented Mar 2, 2026

Continuation of #10280:

  • Refactored to reduce in-lining in bootkube.sh
  • Added some gating (needs port opening on some or all platforms)

Will break the API vendoring into a separate PR to get that merged sooner rather than later.

Not tested. Opening this now as a /WIP to continue discussion of #10280 with #9628
/cc @JoelSpeed @mdbooth

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) Mar 2, 2026
@openshift-ci openshift-ci bot requested a review from JoelSpeed March 2, 2026 04:08
@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 2, 2026

@patrickdillon: This pull request references CORS-4334 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Continuation of #10280:

  • Refactored to reduce in-lining in bootkube.sh
  • Added some gating (needs port opening on some or all platforms)

Will break the API changes into a separate PR.

Not tested. Opening this now as a /WIP to continue discussion of #10280 with #9628
/cc @JoelSpeed @mdbooth

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from mdbooth March 2, 2026 04:08
@patrickdillon
Contributor Author

patrickdillon commented Mar 2, 2026

/wip
/hold

@openshift-ci
Contributor

openshift-ci bot commented Mar 2, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tthvo for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 2, 2026

@patrickdillon: This pull request references CORS-4334 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Continuation of #10280:

  • Refactored to reduce in-lining in bootkube.sh
  • Added some gating (needs port opening on some or all platforms)

Will break the API vendoring into a separate PR to get that merged sooner rather than later.

Not tested. Opening this now as a /WIP to continue discussion of #10280 with #9628
/cc @JoelSpeed @mdbooth


@patrickdillon
Contributor Author

/test e2e-vsphere-ovn e2e-nutanix-ovn
/test ?

@patrickdillon
Contributor Author

/test e2e-metal-ipi-ovn
/test e2e-agent-compact-ipv4

@patrickdillon
Contributor Author

We probably want to clean up the konnectivity ports on bootstrap destroy as well.

@patrickdillon
Contributor Author

I have experimented with adding a feature gate to control this and it is possible.

@patrickdillon
Contributor Author

We need to avoid deploying this on a true single-node cluster.

@JoelSpeed
Contributor

Have read through the changes and the scripts all seem reasonable to me. I'll open a PR to CAPIO that switches us back to the Fail webhook policy to test this with.

@patrickdillon
Contributor Author

/retest-required

@patrickdillon
Contributor Author

/retest-required

@patrickdillon
Contributor Author

/retest-required

@tthvo
Member

tthvo commented Mar 12, 2026

/cc @sadasu @jhixson74

@openshift-ci openshift-ci bot requested review from jhixson74 and sadasu March 12, 2026 20:47
@tthvo
Member

tthvo commented Mar 12, 2026

/retest

Member

@tthvo tthvo left a comment


This is pretty cool 😎🔥! I just have a few questions and comments while learning/reading about this :D

@coderabbitai

coderabbitai bot commented Mar 24, 2026

Walkthrough

Adds Konnectivity bootstrap: new manifests (server, agent, namespace, egress config, secret, DaemonSet), bootstrap scripts and cert generator, bootkube integration, and cloud-provider changes to allow and clean up port 8091 ingress across providers; also updates service analysis logging for Konnectivity stages.

Changes

Cohort / File(s) Summary
Konnectivity Manifests
data/data/bootstrap/files/opt/openshift/egress-selector-config.yaml, data/data/bootstrap/files/opt/openshift/konnectivity-agent-certs-secret.yaml, data/data/bootstrap/files/opt/openshift/konnectivity-agent-daemonset.yaml, data/data/bootstrap/files/opt/openshift/konnectivity-config-override.yaml, data/data/bootstrap/files/opt/openshift/konnectivity-namespace.yaml, data/data/bootstrap/files/opt/openshift/konnectivity-server-pod.yaml
Adds EgressSelectorConfiguration, namespace, secret, konnectivity server pod, agent DaemonSet, and kube-apiserver config-override. Review TLS paths, hostPath mounts, UDS socket path, and egress/proxyProtocol settings.
Bootstrap scripts & templates
data/data/bootstrap/files/usr/local/bin/bootkube.sh.template, data/data/bootstrap/files/usr/local/bin/konnectivity.sh.template, data/data/bootstrap/files/usr/local/bin/konnectivity-certs.sh
Integrates konnectivity lifecycle into bootkube: new setup/manifests/cleanup functions, cert generation script, invocations added to bootkube flow, envsubst usage, and config override file copy. Verify idempotency markers, env var templating, and ordering relative to kube-apiserver bootstrap.
AWS manifests & cleanup
pkg/asset/manifests/aws/cluster.go, pkg/infrastructure/aws/clusterapi/aws.go
Adds BootstrapKonnectivityDescription and control-plane ingress rule for TCP/8091; updates bootstrap rule removal and polling to treat SSH+Konnectivity as bootstrap rules. Inspect rule matching/removal logic and description constants.
Azure manifests & cleanup
pkg/asset/manifests/azure/cluster.go, pkg/infrastructure/azure/azure.go
Adds per-subnet Konnectivity NSG rules (IPv4/IPv6) and post-destroy deletion of the konnectivity NSG rule. Check NSG rule naming/priority sequencing and error handling for deletion.
GCP firewall changes
pkg/infrastructure/gcp/clusterapi/firewallrules.go
Creates bootstrap firewall rule for TCP/8091 and updates removal to delete both SSH and Konnectivity rules. Validate SourceTags/TargetTags and deletion order/error propagation.
IBM Cloud, PowerVS, OpenStack security groups
pkg/asset/manifests/ibmcloud/securitygroups.go, pkg/asset/manifests/powervs/securitygroups.go, pkg/infrastructure/openstack/preprovision/securitygroups.go
Adds bootstrap ingress rules for TCP/8091: IBM Cloud uses cluster-wide SG remote reference, PowerVS adds Any remote rule, OpenStack adds service definition and per-CIDR bootstrap rules. Verify remote types and CIDR/remote group references.
Service analysis logging
pkg/gather/service/analyze.go
Updates bootkube service analysis to explicitly log errors for Konnectivity-related failing stages (konnectivity-certs, konnectivity-server-bootstrap, konnectivity-agent-manifest, konnectivity-cleanup) and removes an obsolete comment. Confirm logged stage names match marker files used by bootstrap scripts.
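The "idempotency markers" called out for the bootstrap scripts above follow bootkube's usual ".done"-file pattern. A minimal sketch of that pattern (the marker filename and commands here are illustrative, not the exact ones in this PR):

```shell
# Run a stage once, recording completion in a marker file so that a
# restarted bootkube run skips stages that already finished.
run_once() {
  marker="$1"
  shift
  if [ ! -f "${marker}" ]; then
    "$@" || return 1
    touch "${marker}"
  fi
}

run_once konnectivity-certs.done echo "generating konnectivity certs"
run_once konnectivity-certs.done echo "generating konnectivity certs"  # no-op on re-run
```

The review notes above warn about exactly the failure mode of this pattern: state exported inside a guarded stage (such as an environment variable) is not reconstructed when the marker already exists.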

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

@patrickdillon
Contributor Author

In a follow-up I will add gating for this, but I wanted to leave it ungated for the moment to do another round of tests.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
data/data/bootstrap/files/usr/local/bin/konnectivity-certs.sh (1)

17-22: Consider restricting private key file permissions.

The generated private keys (ca.key, server.key, agent.key) are created with default umask permissions. For defense in depth, consider explicitly setting restrictive permissions.

🔒 Proposed fix to restrict key permissions
+# Ensure restrictive permissions on key files
+umask 077
+
 # Generate self-signed Konnectivity CA
 openssl req -x509 -newkey rsa:2048 -nodes \

Or alternatively, after each key generation:

chmod 600 "${KONNECTIVITY_CERT_DIR}"/*.key
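The umask approach can be sanity-checked in isolation: with umask 077 in effect, any newly created file (including openssl's key output) comes out owner-only. A small sketch:

```shell
# Demonstrate that umask 077 yields owner-only (0600) permissions on new
# files, which is what generated *.key files would inherit.
tmpdir="$(mktemp -d)"
(
  umask 077
  touch "${tmpdir}/ca.key"
)
stat -c '%a' "${tmpdir}/ca.key"   # prints: 600
rm -rf "${tmpdir}"
```

Note the subshell: it keeps the restrictive umask scoped to key generation rather than leaking into the rest of the bootstrap script.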
pkg/infrastructure/aws/clusterapi/aws.go (1)

531-565: Consider renaming removeSSHRule to removeBootstrapRules for clarity.

The function now removes both SSH and Konnectivity rules, but the name and log message (line 561) still reference only SSH.

📝 Optional naming improvement
-// removeSSHRule removes the SSH rule for accessing the bootstrap node
-// by removing the rule from the cluster spec and updating the object.
-func removeSSHRule(ctx context.Context, cl k8sClient.Client, infraID string) error {
+// removeBootstrapRules removes bootstrap-only ingress rules (SSH and Konnectivity)
+// by removing them from the cluster spec and updating the object.
+func removeBootstrapRules(ctx context.Context, cl k8sClient.Client, infraID string) error {

And update the log message:

-		logrus.Debug("Updated AWSCluster to remove bootstrap SSH rule")
+		logrus.Debug("Updated AWSCluster to remove bootstrap ingress rules")
data/data/bootstrap/files/opt/openshift/konnectivity-agent-daemonset.yaml (1)

23-54: Consider adding explicit securityContext to harden the container.

While hostNetwork: true is required for konnectivity-agent functionality, adding explicit security controls can improve the pod's security posture. Even for bootstrap-only workloads, defining securityContext with allowPrivilegeEscalation: false is a reasonable hardening step if the application permits it.

🔒 Optional security hardening
       containers:
       - name: konnectivity-agent
         image: ${KONNECTIVITY_IMAGE}
+        securityContext:
+          allowPrivilegeEscalation: false
+          capabilities:
+            drop:
+              - ALL
         command:
         - /usr/bin/proxy-agent
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@data/data/bootstrap/files/opt/openshift/konnectivity-server-pod.yaml`:
- Line 25: The Unix socket path for konnectivity-server is inconsistent:
konnectivity-server-pod.yaml creates the socket at
/etc/kubernetes/bootstrap-configs/konnectivity-server.socket but
egress-selector-config.yaml points to
/etc/kubernetes/config/konnectivity-server.socket; update one to match the
other. Fix by either changing the socket path in egress-selector-config.yaml to
/etc/kubernetes/bootstrap-configs/konnectivity-server.socket or altering the
konnectivity-server launch/config in konnectivity-server-pod.yaml to create the
socket under /etc/kubernetes/config/konnectivity-server.socket so both
references to konnectivity-server.socket match exactly.

In `@data/data/bootstrap/files/usr/local/bin/konnectivity.sh.template`:
- Around line 42-61: konnectivity_manifests() currently exports
BOOTSTRAP_NODE_IP but assumes konnectivity_setup() populated it, which can lead
to an empty value in the daemonset manifest; update konnectivity_manifests() to
either re-detect and set BOOTSTRAP_NODE_IP using the same logic as
konnectivity_setup() (so envsubst produces a valid value) or validate that
BOOTSTRAP_NODE_IP is non-empty before calling envsubst and abort/record failure
with a clear error if it's unset; reference the konnectivity_manifests()
function, the BOOTSTRAP_NODE_IP variable, and the envsubst call that writes
manifests/konnectivity-agent-daemonset.yaml when making the change.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d7c2cba8-f144-4052-bf2f-59e7b3332489

📥 Commits

Reviewing files that changed from the base of the PR and between 883a6d8 and ce99dcf.

📒 Files selected for processing (18)
  • data/data/bootstrap/files/opt/openshift/egress-selector-config.yaml
  • data/data/bootstrap/files/opt/openshift/konnectivity-agent-certs-secret.yaml
  • data/data/bootstrap/files/opt/openshift/konnectivity-agent-daemonset.yaml
  • data/data/bootstrap/files/opt/openshift/konnectivity-config-override.yaml
  • data/data/bootstrap/files/opt/openshift/konnectivity-namespace.yaml
  • data/data/bootstrap/files/opt/openshift/konnectivity-server-pod.yaml
  • data/data/bootstrap/files/usr/local/bin/bootkube.sh.template
  • data/data/bootstrap/files/usr/local/bin/konnectivity-certs.sh
  • data/data/bootstrap/files/usr/local/bin/konnectivity.sh.template
  • pkg/asset/manifests/aws/cluster.go
  • pkg/asset/manifests/azure/cluster.go
  • pkg/asset/manifests/ibmcloud/securitygroups.go
  • pkg/asset/manifests/powervs/securitygroups.go
  • pkg/gather/service/analyze.go
  • pkg/infrastructure/aws/clusterapi/aws.go
  • pkg/infrastructure/azure/azure.go
  • pkg/infrastructure/gcp/clusterapi/firewallrules.go
  • pkg/infrastructure/openstack/preprovision/securitygroups.go

Enables the kube-apiserver running on the bootstrap node to access the pod network,
specifically webhooks running in the cluster.

Changes:

* Adds a new static Konnectivity server pod running on the bootstrap node
* Configures the bootstrap KAS to use its local Konnectivity server for
outbound cluster traffic
* Adds a daemonset deployed into the cluster to run the Konnectivity agent on every
cluster node
* Removes daemonset automatically in bootstrap teardown

Co-authored-by: Matthew Booth <mbooth@redhat.com>

Adds error handling to report konnectivity-specific failures
when running gather bootstrap or analyze.

Opens port 8091 for use by the konnectivity proxy during bootstrapping
and cleans up the port on bootstrap destroy. PowerVS does not appear to do
any bootstrap-destroy cleanup, but all other platforms are handled.
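For reference, the kube-apiserver side of the first commit uses the upstream EgressSelectorConfiguration API to route cluster-bound traffic through a local Konnectivity server over a Unix domain socket. A hedged sketch of what such a config looks like (field names follow the upstream apiserver.k8s.io API; the socket path and proxyProtocol are assumptions, not necessarily the exact values in this PR):

```shell
# Write an illustrative EgressSelectorConfiguration pointing the
# kube-apiserver at a local Konnectivity server via a UDS transport.
cat > egress-selector-config.yaml <<'EOF'
apiVersion: apiserver.k8s.io/v1beta1
kind: EgressSelectorConfiguration
egressSelections:
- name: cluster
  connection:
    proxyProtocol: GRPC
    transport:
      uds:
        udsName: /etc/kubernetes/config/konnectivity-server.socket
EOF
grep udsName egress-selector-config.yaml
```

The kube-apiserver consumes a file like this via its --egress-selector-config-file flag, which is why the udsName here must match the socket the Konnectivity server actually creates (the path question discussed in the review below).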
@patrickdillon
Contributor Author

@CodeRabbit review

@coderabbitai

coderabbitai bot commented Mar 24, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 797a1c9a-033d-47a1-96ac-144596eb5dc4

📥 Commits

Reviewing files that changed from the base of the PR and between ce99dcf and 80cb9f9.

📒 Files selected for processing (18)
  • data/data/bootstrap/files/opt/openshift/egress-selector-config.yaml
  • data/data/bootstrap/files/opt/openshift/konnectivity-agent-certs-secret.yaml
  • data/data/bootstrap/files/opt/openshift/konnectivity-agent-daemonset.yaml
  • data/data/bootstrap/files/opt/openshift/konnectivity-config-override.yaml
  • data/data/bootstrap/files/opt/openshift/konnectivity-namespace.yaml
  • data/data/bootstrap/files/opt/openshift/konnectivity-server-pod.yaml
  • data/data/bootstrap/files/usr/local/bin/bootkube.sh.template
  • data/data/bootstrap/files/usr/local/bin/konnectivity-certs.sh
  • data/data/bootstrap/files/usr/local/bin/konnectivity.sh.template
  • pkg/asset/manifests/aws/cluster.go
  • pkg/asset/manifests/azure/cluster.go
  • pkg/asset/manifests/ibmcloud/securitygroups.go
  • pkg/asset/manifests/powervs/securitygroups.go
  • pkg/gather/service/analyze.go
  • pkg/infrastructure/aws/clusterapi/aws.go
  • pkg/infrastructure/azure/azure.go
  • pkg/infrastructure/gcp/clusterapi/firewallrules.go
  • pkg/infrastructure/openstack/preprovision/securitygroups.go
✅ Files skipped from review due to trivial changes (6)
  • pkg/asset/manifests/ibmcloud/securitygroups.go
  • data/data/bootstrap/files/opt/openshift/konnectivity-namespace.yaml
  • data/data/bootstrap/files/opt/openshift/egress-selector-config.yaml
  • pkg/infrastructure/azure/azure.go
  • data/data/bootstrap/files/opt/openshift/konnectivity-agent-certs-secret.yaml
  • data/data/bootstrap/files/opt/openshift/konnectivity-config-override.yaml
🚧 Files skipped from review as they are similar to previous changes (7)
  • pkg/infrastructure/openstack/preprovision/securitygroups.go
  • pkg/asset/manifests/powervs/securitygroups.go
  • pkg/asset/manifests/azure/cluster.go
  • data/data/bootstrap/files/usr/local/bin/konnectivity-certs.sh
  • data/data/bootstrap/files/usr/local/bin/bootkube.sh.template
  • pkg/infrastructure/aws/clusterapi/aws.go
  • data/data/bootstrap/files/usr/local/bin/konnectivity.sh.template

- operator: Exists
containers:
- name: konnectivity-agent
  image: ${KONNECTIVITY_IMAGE}

@coderabbitai coderabbitai bot Mar 24, 2026


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that KONNECTIVITY_IMAGE is exported before konnectivity_manifests uses it
# Check the call flow in bootkube.sh.template

rg -n "konnectivity_setup|konnectivity_manifests|KONNECTIVITY_IMAGE" data/data/bootstrap/files/usr/local/bin/ -A2 -B2



🏁 Script executed:

# Get the full konnectivity_manifests() function to see what it does
sed -n '42,70p' data/data/bootstrap/files/usr/local/bin/konnectivity.sh.template



🏁 Script executed:

# Check what files konnectivity_manifests processes with envsubst
rg -A 20 "^konnectivity_manifests\(\)" data/data/bootstrap/files/usr/local/bin/konnectivity.sh.template



🏁 Script executed:

# List all template files in /opt/openshift that might be processed
fd "konnectivity" data/data/bootstrap/files/opt/openshift/ -type f



🏁 Script executed:

# Check if konnectivity.sh.template is sourced in bootkube.sh.template and at what line
rg -n "source|\\." data/data/bootstrap/files/usr/local/bin/bootkube.sh.template | grep -i konnectivity



🏁 Script executed:

# Get the lines around where konnectivity_setup and konnectivity_manifests are called
sed -n '240,260p' data/data/bootstrap/files/usr/local/bin/bootkube.sh.template
sed -n '570,580p' data/data/bootstrap/files/usr/local/bin/bootkube.sh.template



🏁 Script executed:

# Check the actual content around line 25 of the daemonset file
sed -n '20,30p' data/data/bootstrap/files/opt/openshift/konnectivity-agent-daemonset.yaml



🏁 Script executed:

# Check all variables referenced in the daemonset file
grep -o '\${[A-Z_]*}' data/data/bootstrap/files/opt/openshift/konnectivity-agent-daemonset.yaml | sort -u



🏁 Script executed:

# Get full konnectivity_setup() function to see all conditionals
sed -n '7,38p' data/data/bootstrap/files/usr/local/bin/konnectivity.sh.template



Export KONNECTIVITY_IMAGE in konnectivity_manifests() to ensure it's available for envsubst.

The KONNECTIVITY_IMAGE variable is exported in konnectivity_setup() (line 33 in konnectivity.sh.template), but only conditionally inside an if [ ! -f konnectivity-server-bootstrap.done ] block. When konnectivity_manifests() calls envsubst on konnectivity-agent-daemonset.yaml (which contains ${KONNECTIVITY_IMAGE} at line 25), the variable may be undefined if .done files get out of sync. Explicitly export KONNECTIVITY_IMAGE in konnectivity_manifests() before running envsubst.
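One defensive shape for this, sketched below. This is only an illustration: image_for is the release-image helper the bootkube scripts use (as noted in the discussion that follows), but the component name passed to it here is a guess, and the file paths are placeholders.

```shell
# Hypothetical hardening of konnectivity_manifests(): (re-)derive and export
# KONNECTIVITY_IMAGE before templating, instead of relying on
# konnectivity_setup() having exported it earlier in the run.
konnectivity_manifests() {
  if [ -z "${KONNECTIVITY_IMAGE:-}" ]; then
    # image_for is bootkube's helper; the component name is an assumption.
    KONNECTIVITY_IMAGE="$(image_for apiserver-network-proxy)"
  fi
  export KONNECTIVITY_IMAGE
  envsubst < konnectivity-agent-daemonset.yaml \
    > manifests/konnectivity-agent-daemonset.yaml
}
```

The `${VAR:-}` guard keeps the function safe under `set -u` and makes it independent of whether the setup stage's .done marker caused the earlier export to be skipped.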


Member


@patrickdillon this comment kinda makes sense tho? We can check if KONNECTIVITY_IMAGE is empty (for some reason), then re-call image_for if needed.



Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

- --cluster-cert=/etc/konnectivity/server.crt
- --cluster-key=/etc/konnectivity/server.key
- --cluster-ca-cert=/etc/konnectivity/ca.crt
- --uds-name=/etc/kubernetes/bootstrap-configs/konnectivity-server.socket

@coderabbitai coderabbitai bot Mar 24, 2026


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify socket path consistency between server pod and egress selector config
echo "=== Konnectivity server UDS path ==="
rg -n "uds-name" data/data/bootstrap/files/opt/openshift/konnectivity-server-pod.yaml

echo ""
echo "=== Egress selector config udsName ==="
cat data/data/bootstrap/files/opt/openshift/egress-selector-config.yaml | grep -A2 -B2 "udsName"



Socket path mismatch detected between server pod and egress selector config.

The konnectivity server pod creates its socket at /etc/kubernetes/bootstrap-configs/konnectivity-server.socket (line 21), but the egress selector config references /etc/kubernetes/config/konnectivity-server.socket. These paths must match, or the API server will fail to communicate with the konnectivity server. Either the pod's --uds-name flag or the egress config's udsName field must be corrected to use the same path.


Member


No, the path is correct. The host path /etc/kubernetes/bootstrap-configs is mounted as /etc/kubernetes/config in the bootstrap kube-apiserver static pod. See here.

So, no issue here.



@tthvo, thanks for the clarification! I missed that the host path /etc/kubernetes/bootstrap-configs is volume-mounted as /etc/kubernetes/config inside the bootstrap kube-apiserver static pod, making the socket paths consistent. My concern was invalid.



Comment on lines +121 to +130
switch a.failingStage {
case "konnectivity-certs":
logrus.Error("The bootstrap machine failed to generate konnectivity certificates")
case "konnectivity-server-bootstrap":
logrus.Error("The bootstrap machine failed to start the konnectivity server")
case "konnectivity-agent-manifest":
logrus.Error("The bootstrap machine failed to create konnectivity agent manifests")
case "konnectivity-cleanup":
logrus.Error("The bootstrap machine failed to clean up konnectivity resources")
}
@coderabbitai coderabbitai bot Mar 24, 2026
⚠️ Potential issue | 🟠 Major

Add a default bootkube failure summary for unknown stages.

The new switch logs only Konnectivity stage failures. If bootkube fails in any other stage, the concise summary log is lost, which makes incident triage harder. Please add a default case that logs the failing stage.

Suggested patch
 	switch a.failingStage {
 	case "konnectivity-certs":
 		logrus.Error("The bootstrap machine failed to generate konnectivity certificates")
 	case "konnectivity-server-bootstrap":
 		logrus.Error("The bootstrap machine failed to start the konnectivity server")
 	case "konnectivity-agent-manifest":
 		logrus.Error("The bootstrap machine failed to create konnectivity agent manifests")
 	case "konnectivity-cleanup":
 		logrus.Error("The bootstrap machine failed to clean up konnectivity resources")
+	default:
+		logrus.Errorf("The bootstrap machine failed during bootkube stage %q", a.failingStage)
 	}

Member
@patrickdillon This also makes sense :D

@coderabbitai
coderabbitai bot

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@patrickdillon
Contributor Author

/test e2e-vsphere-ovn e2e-nutanix-ovn e2e-metal-ipi-ovn e2e-agent-compact-ipv4 e2e-powervs-capi-ovn

@patrickdillon
Contributor Author

/test e2e-metal-single-node-live-iso

@patrickdillon
Contributor Author

Single-node live ISO is a bootstrap-in-place install, so I'd like to see it tested to check whether it is affected.

I will wait for this test to finish before pushing changes to feature-gate this.

@tthvo
Member

tthvo commented Mar 25, 2026

/test e2e-metal-single-node-live-iso

Can't see any logs at all in previous run... 👀

@openshift-ci
Contributor

openshift-ci bot commented Mar 26, 2026

@patrickdillon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-edge-zones 80cb9f9 link false /test e2e-aws-ovn-edge-zones
ci/prow/e2e-aws-ovn-shared-vpc-edge-zones 80cb9f9 link false /test e2e-aws-ovn-shared-vpc-edge-zones
ci/prow/e2e-aws-ovn-single-node 80cb9f9 link false /test e2e-aws-ovn-single-node
ci/prow/e2e-gcp-xpn-dedicated-dns-project 80cb9f9 link false /test e2e-gcp-xpn-dedicated-dns-project
ci/prow/e2e-openstack-proxy 80cb9f9 link false /test e2e-openstack-proxy
ci/prow/e2e-gcp-custom-endpoints 80cb9f9 link false /test e2e-gcp-custom-endpoints
ci/prow/e2e-azurestack 80cb9f9 link false /test e2e-azurestack
ci/prow/e2e-aws-ovn-heterogeneous 80cb9f9 link false /test e2e-aws-ovn-heterogeneous
ci/prow/e2e-openstack-ovn 80cb9f9 link true /test e2e-openstack-ovn
ci/prow/gcp-custom-endpoints-proxy-wif 80cb9f9 link false /test gcp-custom-endpoints-proxy-wif
ci/prow/e2e-gcp-ovn-byo-vpc 80cb9f9 link false /test e2e-gcp-ovn-byo-vpc
ci/prow/e2e-openstack-nfv-intel 80cb9f9 link false /test e2e-openstack-nfv-intel
ci/prow/e2e-ibmcloud-ovn 80cb9f9 link false /test e2e-ibmcloud-ovn
ci/prow/e2e-gcp-xpn-custom-dns 80cb9f9 link false /test e2e-gcp-xpn-custom-dns
ci/prow/e2e-metal-single-node-live-iso 80cb9f9 link false /test e2e-metal-single-node-live-iso
ci/prow/e2e-powervs-capi-ovn 80cb9f9 link false /test e2e-powervs-capi-ovn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@gpei
Contributor

gpei commented Mar 26, 2026

/payload-job periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-baremetal-sno-ipv4-etcd-encryption-rt-kernel-kdump-f7

@openshift-ci
Contributor

openshift-ci bot commented Mar 26, 2026

@gpei: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-baremetal-sno-ipv4-etcd-encryption-rt-kernel-kdump-f7

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/da240ba0-28d0-11f1-8876-d000e64a46c3-0

