Skip to content

TRT-2580: Revert "OCPBUGS-78557: Fix admin context race condition in parallel tests"#1258

Merged
openshift-merge-bot[bot] merged 1 commit intoopenshift:mainfrom
smg247:revert-pr-1256
Mar 17, 2026
Merged

TRT-2580: Revert "OCPBUGS-78557: Fix admin context race condition in parallel tests"#1258
openshift-merge-bot[bot] merged 1 commit intoopenshift:mainfrom
smg247:revert-pr-1256

Conversation

@smg247
Copy link
Member

@smg247 smg247 commented Mar 17, 2026

This reverts #1256 (merge commit c6dfbbd).

Why? This PR is causing blocking job failures (periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv4, periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6) in the nightly amd64 payload 4.22.0-0.nightly-2026-03-17-104634.

The olmv0-tests-ext binary emits a klog line Registered Plugin containerd to stdout during test listing, corrupting the JSON output stream. openshift-tests fails with invalid character I looking for beginning of value. The cluster installs successfully but zero tests execute.

/cc @kuiwang02


To unrevert: Please revert this revert PR and include a fix for the klog stdout corruption. Verify that the following jobs pass before re-landing:

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv4
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6

Override commands (for CI jobs that may not pass on the revert itself):

/override ci/prow/e2e-aws-olmv0-ext
/override ci/prow/e2e-aws-upgrade-ovn-single-node
/override ci/prow/e2e-gcp-ovn
/override ci/prow/e2e-upgrade
/override ci/prow/tests-extension

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 17, 2026
@openshift-ci-robot
Copy link

@smg247: This pull request references Jira Issue OCPBUGS-78557, which is invalid:

  • expected the bug to be in one of the following states: NEW, ASSIGNED, POST, but it is ON_QA instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

This reverts #1256 (merge commit c6dfbbd).

Why? This PR is causing blocking job failures (periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv4, periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6) in the nightly amd64 payload 4.22.0-0.nightly-2026-03-17-104634.

The olmv0-tests-ext binary emits a klog line Registered Plugin containerd to stdout during test listing, corrupting the JSON output stream. openshift-tests fails with invalid character I looking for beginning of value. The cluster installs successfully but zero tests execute.

/cc @kuiwang02


To unrevert: Please revert this revert PR and include a fix for the klog stdout corruption. Verify that the following jobs pass before re-landing:

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv4
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6

Override commands (for CI jobs that may not pass on the revert itself):

/override ci/prow/e2e-aws-olmv0-ext
/override ci/prow/e2e-aws-upgrade-ovn-single-node
/override ci/prow/e2e-gcp-ovn
/override ci/prow/e2e-upgrade
/override ci/prow/tests-extension

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from kuiwang02 March 17, 2026 18:33
@coderabbitai
Copy link

coderabbitai bot commented Mar 17, 2026

Walkthrough

This pull request removes global admin-context caching infrastructure from the CLI struct and refactors file duplication utilities to use Gomega assertions, consolidating error-returning helpers into assertion-based approach.

Changes

Cohort / File(s) Summary
Admin Context Simplification
tests-extension/test/qe/util/client.go
Removed fields tempAdmConfPath, admContextName, and isUseAdminCxt from CLI struct. Simplified AsAdmin and Run methods to directly use shared admin config path instead of temporary per-test paths. Eliminated functions and state for admin-context discovery and caching (getAdminContextName, getAdminContextNameCached, globalAdmContextName, globalAdmContextOnce) and removed associated temp file cleanup logic.
File Duplication Utilities Refactoring
tests-extension/test/qe/util/tools.go
Introduced new public function DuplicateFileToPath(srcPath, destPath string) with Gomega assertions for error handling. Removed private helpers duplicateFileToPathSafe, duplicateFileToTempSafe, and duplicateFileToTempOrEmpty. Refactored DuplicateFileToTemp to create a temporary file and delegate copying logic to DuplicateFileToPath.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can approve the review once all CodeRabbit's comments are resolved.

Enable the reviews.request_changes_workflow setting to automatically approve the review once all CodeRabbit's comments are resolved.

@smg247
Copy link
Member Author

smg247 commented Mar 17, 2026

/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv4
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 17, 2026

@smg247: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv4
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ed283ac0-222f-11f1-9282-2bfd5c4f2a15-0

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 17, 2026

@smg247: Overrode contexts on behalf of smg247: ci/prow/e2e-aws-olmv0-ext, ci/prow/e2e-aws-upgrade-ovn-single-node, ci/prow/e2e-gcp-ovn, ci/prow/e2e-upgrade, ci/prow/tests-extension

Details

In response to this:

This reverts #1256 (merge commit c6dfbbd).

Why? This PR is causing blocking job failures (periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv4, periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6) in the nightly amd64 payload 4.22.0-0.nightly-2026-03-17-104634.

The olmv0-tests-ext binary emits a klog line Registered Plugin containerd to stdout during test listing, corrupting the JSON output stream. openshift-tests fails with invalid character I looking for beginning of value. The cluster installs successfully but zero tests execute.

/cc @kuiwang02


To unrevert: Please revert this revert PR and include a fix for the klog stdout corruption. Verify that the following jobs pass before re-landing:

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv4
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6

Override commands (for CI jobs that may not pass on the revert itself):

/override ci/prow/e2e-aws-olmv0-ext
/override ci/prow/e2e-aws-upgrade-ovn-single-node
/override ci/prow/e2e-gcp-ovn
/override ci/prow/e2e-upgrade
/override ci/prow/tests-extension

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: smg247
Once this PR has been reviewed and has the lgtm label, please assign oceanc80 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests-extension/test/qe/util/client.go`:
- Line 176: The assignment nc.configPath = c.adminConfigPath reintroduces a
shared, mutable admin kubeconfig and must be changed to preserve per-test/admin
isolation; stop routing runtime client config to the shared c.adminConfigPath
and instead ensure each test gets its own immutable config (e.g., create and
assign a per-test temp kubeconfig or a separate field like nc.adminConfigPath)
or use the intended runtime config variable (not c.adminConfigPath) when
initializing nc.configPath; update the analogous occurrences referenced (around
the other spots noted) so no code assigns nc.configPath from the shared
c.adminConfigPath and instead uses isolated per-test copies or the proper
runtime config variable.

In `@tests-extension/test/qe/util/tools.go`:
- Around line 63-68: The code closes the temp file (destFile.Close) and then
reopens it by path via DuplicateFileToPath, creating a TOCTOU race; instead keep
the temp file open and copy into its descriptor to avoid the window. Modify the
call site in tests-extension/test/qe/util/tools.go to not close destFile before
copying and either (a) add a new helper
DuplicateFileToFile/DuplicateFileToWriter that accepts *os.File or io.Writer and
copies srcPath contents into destFile using io.Copy, or (b) change
DuplicateFileToPath to accept an io.Writer and use destFile directly; after the
copy, close destFile. Ensure you reference destFile, DuplicateFileToPath (or the
new DuplicateFileToFile helper), and remove the destFile.Close() before the
copy.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7b8f64bc-ec4b-4a71-a72c-93f97f8bc44e

📥 Commits

Reviewing files that changed from the base of the PR and between c6dfbbd and 75b5a4e.

📒 Files selected for processing (2)
  • tests-extension/test/qe/util/client.go
  • tests-extension/test/qe/util/tools.go


nc.isUseAdminCxt = true

nc.configPath = c.adminConfigPath
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

This reintroduces shared admin-kubeconfig race risk in parallel specs.

These lines route admin and run-time kubeconfig selection back to shared adminConfigPath/configPath semantics. That restores the original shared mutable kubeconfig surface (the race from OCPBUGS-78557), so context-mutating oc calls can interfere across parallel tests again.

Please keep the revert if needed for the current blocker, but preserve per-test/admin-isolated kubeconfig files (or equivalent immutable context handling) in the re-landing fix so both issues are addressed together.

Also applies to: 618-619, 674-674

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests-extension/test/qe/util/client.go` at line 176, The assignment
nc.configPath = c.adminConfigPath reintroduces a shared, mutable admin
kubeconfig and must be changed to preserve per-test/admin isolation; stop
routing runtime client config to the shared c.adminConfigPath and instead ensure
each test gets its own immutable config (e.g., create and assign a per-test temp
kubeconfig or a separate field like nc.adminConfigPath) or use the intended
runtime config variable (not c.adminConfigPath) when initializing nc.configPath;
update the analogous occurrences referenced (around the other spots noted) so no
code assigns nc.configPath from the shared c.adminConfigPath and instead uses
isolated per-test copies or the proper runtime config variable.

Comment on lines +63 to +68
destFile, err := os.CreateTemp(os.TempDir(), destPrefix)
o.Expect(err).NotTo(o.HaveOccurred(), "Failed to create temporary file")
o.Expect(destFile.Close()).NotTo(o.HaveOccurred(), "Failed to close temporary file")

destPath := destFile.Name()
DuplicateFileToPath(srcPath, destPath)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid close-then-reopen on temp files (TOCTOU window).

Line 65 closes the temp file, and Line 68 reopens by path via DuplicateFileToPath. That introduces a race window where the path can be swapped before copy. Copy into the already-open temp fd instead.

Suggested fix
 func DuplicateFileToTemp(srcPath string, destPrefix string) string {
-	destFile, err := os.CreateTemp(os.TempDir(), destPrefix)
+	destFile, err := os.CreateTemp(os.TempDir(), destPrefix)
 	o.Expect(err).NotTo(o.HaveOccurred(), "Failed to create temporary file")
-	o.Expect(destFile.Close()).NotTo(o.HaveOccurred(), "Failed to close temporary file")
-
-	destPath := destFile.Name()
-	DuplicateFileToPath(srcPath, destPath)
+	srcFile, err := os.Open(filepath.Clean(srcPath))
+	o.Expect(err).NotTo(o.HaveOccurred(), "Failed to open source file")
+	defer func() { o.Expect(srcFile.Close()).NotTo(o.HaveOccurred()) }()
+
+	_, err = io.Copy(destFile, srcFile)
+	o.Expect(err).NotTo(o.HaveOccurred(), "Failed to copy source file to temporary file")
+	o.Expect(destFile.Sync()).NotTo(o.HaveOccurred(), "Failed to sync temporary file")
+	o.Expect(destFile.Close()).NotTo(o.HaveOccurred(), "Failed to close temporary file")
+
+	destPath := destFile.Name()
 	return destPath
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests-extension/test/qe/util/tools.go` around lines 63 - 68, The code closes
the temp file (destFile.Close) and then reopens it by path via
DuplicateFileToPath, creating a TOCTOU race; instead keep the temp file open and
copy into its descriptor to avoid the window. Modify the call site in
tests-extension/test/qe/util/tools.go to not close destFile before copying and
either (a) add a new helper DuplicateFileToFile/DuplicateFileToWriter that
accepts *os.File or io.Writer and copies srcPath contents into destFile using
io.Copy, or (b) change DuplicateFileToPath to accept an io.Writer and use
destFile directly; after the copy, close destFile. Ensure you reference
destFile, DuplicateFileToPath (or the new DuplicateFileToFile helper), and
remove the destFile.Close() before the copy.

@smg247 smg247 changed the title Revert "OCPBUGS-78557: Fix admin context race condition in parallel tests" TRT-2580: Revert "OCPBUGS-78557: Fix admin context race condition in parallel tests" Mar 17, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 17, 2026

@smg247: This pull request references TRT-2580 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This reverts #1256 (merge commit c6dfbbd).

Why? This PR is causing blocking job failures (periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv4, periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6) in the nightly amd64 payload 4.22.0-0.nightly-2026-03-17-104634.

The olmv0-tests-ext binary emits a klog line Registered Plugin containerd to stdout during test listing, corrupting the JSON output stream. openshift-tests fails with invalid character I looking for beginning of value. The cluster installs successfully but zero tests execute.

/cc @kuiwang02


To unrevert: Please revert this revert PR and include a fix for the klog stdout corruption. Verify that the following jobs pass before re-landing:

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv4
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6

Override commands (for CI jobs that may not pass on the revert itself):

/override ci/prow/e2e-aws-olmv0-ext
/override ci/prow/e2e-aws-upgrade-ovn-single-node
/override ci/prow/e2e-gcp-ovn
/override ci/prow/e2e-upgrade
/override ci/prow/tests-extension

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Mar 17, 2026
@stbenjam
Copy link
Member

I think this is probably safe to just merge, it is repo's HEAD

/override ci/prow/e2e-aws-olmv0-ext
/override ci/prow/e2e-aws-upgrade-ovn-single-node
/override ci/prow/e2e-gcp-ovn
/override ci/prow/e2e-upgrade
/override ci/prow/tests-extension
/lgtm
/label approved

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 17, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 17, 2026

@stbenjam: Overrode contexts on behalf of stbenjam: ci/prow/e2e-aws-olmv0-ext, ci/prow/e2e-aws-upgrade-ovn-single-node, ci/prow/e2e-gcp-ovn, ci/prow/e2e-upgrade, ci/prow/tests-extension

Details

In response to this:

I think this is probably safe to just merge, it is repo's HEAD

/override ci/prow/e2e-aws-olmv0-ext
/override ci/prow/e2e-aws-upgrade-ovn-single-node
/override ci/prow/e2e-gcp-ovn
/override ci/prow/e2e-upgrade
/override ci/prow/tests-extension
/lgtm
/label approved

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 17, 2026
@stbenjam
Copy link
Member

/verified by stbenjam

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 17, 2026
@openshift-ci-robot
Copy link

@stbenjam: This PR has been marked as verified by stbenjam.

Details

In response to this:

/verified by stbenjam

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-bot openshift-merge-bot bot merged commit 64e80c7 into openshift:main Mar 17, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants