TRT-2580: Revert "OCPBUGS-78557: Fix admin context race condition in parallel tests"#1258
Conversation
|
@smg247: This pull request references Jira Issue OCPBUGS-78557, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughThis pull request removes global admin-context caching infrastructure from the CLI struct and refactors file duplication utilities to use Gomega assertions, consolidating error-returning helpers into assertion-based approach. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment Tip CodeRabbit can approve the review once all CodeRabbit's comments are resolved.Enable the |
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv4 |
|
@smg247: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ed283ac0-222f-11f1-9282-2bfd5c4f2a15-0 |
|
@smg247: Overrode contexts on behalf of smg247: ci/prow/e2e-aws-olmv0-ext, ci/prow/e2e-aws-upgrade-ovn-single-node, ci/prow/e2e-gcp-ovn, ci/prow/e2e-upgrade, ci/prow/tests-extension DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: smg247 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests-extension/test/qe/util/client.go`:
- Line 176: The assignment nc.configPath = c.adminConfigPath reintroduces a
shared, mutable admin kubeconfig and must be changed to preserve per-test/admin
isolation; stop routing runtime client config to the shared c.adminConfigPath
and instead ensure each test gets its own immutable config (e.g., create and
assign a per-test temp kubeconfig or a separate field like nc.adminConfigPath)
or use the intended runtime config variable (not c.adminConfigPath) when
initializing nc.configPath; update the analogous occurrences referenced (around
the other spots noted) so no code assigns nc.configPath from the shared
c.adminConfigPath and instead uses isolated per-test copies or the proper
runtime config variable.
In `@tests-extension/test/qe/util/tools.go`:
- Around line 63-68: The code closes the temp file (destFile.Close) and then
reopens it by path via DuplicateFileToPath, creating a TOCTOU race; instead keep
the temp file open and copy into its descriptor to avoid the window. Modify the
call site in tests-extension/test/qe/util/tools.go to not close destFile before
copying and either (a) add a new helper
DuplicateFileToFile/DuplicateFileToWriter that accepts *os.File or io.Writer and
copies srcPath contents into destFile using io.Copy, or (b) change
DuplicateFileToPath to accept an io.Writer and use destFile directly; after the
copy, close destFile. Ensure you reference destFile, DuplicateFileToPath (or the
new DuplicateFileToFile helper), and remove the destFile.Close() before the
copy.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7b8f64bc-ec4b-4a71-a72c-93f97f8bc44e
📒 Files selected for processing (2)
tests-extension/test/qe/util/client.gotests-extension/test/qe/util/tools.go
|
|
||
| nc.isUseAdminCxt = true | ||
|
|
||
| nc.configPath = c.adminConfigPath |
There was a problem hiding this comment.
This reintroduces shared admin-kubeconfig race risk in parallel specs.
These lines route admin and run-time kubeconfig selection back to shared adminConfigPath/configPath semantics. That restores the original shared mutable kubeconfig surface (the race from OCPBUGS-78557), so context-mutating oc calls can interfere across parallel tests again.
Please keep the revert if needed for the current blocker, but preserve per-test/admin-isolated kubeconfig files (or equivalent immutable context handling) in the re-landing fix so both issues are addressed together.
Also applies to: 618-619, 674-674
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests-extension/test/qe/util/client.go` at line 176, The assignment
nc.configPath = c.adminConfigPath reintroduces a shared, mutable admin
kubeconfig and must be changed to preserve per-test/admin isolation; stop
routing runtime client config to the shared c.adminConfigPath and instead ensure
each test gets its own immutable config (e.g., create and assign a per-test temp
kubeconfig or a separate field like nc.adminConfigPath) or use the intended
runtime config variable (not c.adminConfigPath) when initializing nc.configPath;
update the analogous occurrences referenced (around the other spots noted) so no
code assigns nc.configPath from the shared c.adminConfigPath and instead uses
isolated per-test copies or the proper runtime config variable.
| destFile, err := os.CreateTemp(os.TempDir(), destPrefix) | ||
| o.Expect(err).NotTo(o.HaveOccurred(), "Failed to create temporary file") | ||
| o.Expect(destFile.Close()).NotTo(o.HaveOccurred(), "Failed to close temporary file") | ||
|
|
||
| destPath := destFile.Name() | ||
| DuplicateFileToPath(srcPath, destPath) |
There was a problem hiding this comment.
Avoid close-then-reopen on temp files (TOCTOU window).
Line 65 closes the temp file, and Line 68 reopens by path via DuplicateFileToPath. That introduces a race window where the path can be swapped before copy. Copy into the already-open temp fd instead.
Suggested fix
func DuplicateFileToTemp(srcPath string, destPrefix string) string {
- destFile, err := os.CreateTemp(os.TempDir(), destPrefix)
+ destFile, err := os.CreateTemp(os.TempDir(), destPrefix)
o.Expect(err).NotTo(o.HaveOccurred(), "Failed to create temporary file")
- o.Expect(destFile.Close()).NotTo(o.HaveOccurred(), "Failed to close temporary file")
-
- destPath := destFile.Name()
- DuplicateFileToPath(srcPath, destPath)
+ srcFile, err := os.Open(filepath.Clean(srcPath))
+ o.Expect(err).NotTo(o.HaveOccurred(), "Failed to open source file")
+ defer func() { o.Expect(srcFile.Close()).NotTo(o.HaveOccurred()) }()
+
+ _, err = io.Copy(destFile, srcFile)
+ o.Expect(err).NotTo(o.HaveOccurred(), "Failed to copy source file to temporary file")
+ o.Expect(destFile.Sync()).NotTo(o.HaveOccurred(), "Failed to sync temporary file")
+ o.Expect(destFile.Close()).NotTo(o.HaveOccurred(), "Failed to close temporary file")
+
+ destPath := destFile.Name()
return destPath
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests-extension/test/qe/util/tools.go` around lines 63 - 68, The code closes
the temp file (destFile.Close) and then reopens it by path via
DuplicateFileToPath, creating a TOCTOU race; instead keep the temp file open and
copy into its descriptor to avoid the window. Modify the call site in
tests-extension/test/qe/util/tools.go to not close destFile before copying and
either (a) add a new helper DuplicateFileToFile/DuplicateFileToWriter that
accepts *os.File or io.Writer and copies srcPath contents into destFile using
io.Copy, or (b) change DuplicateFileToPath to accept an io.Writer and use
destFile directly; after the copy, close destFile. Ensure you reference
destFile, DuplicateFileToPath (or the new DuplicateFileToFile helper), and
remove the destFile.Close() before the copy.
|
@smg247: This pull request references TRT-2580 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
I think this is probably safe to just merge, it is repo's HEAD /override ci/prow/e2e-aws-olmv0-ext |
|
@stbenjam: Overrode contexts on behalf of stbenjam: ci/prow/e2e-aws-olmv0-ext, ci/prow/e2e-aws-upgrade-ovn-single-node, ci/prow/e2e-gcp-ovn, ci/prow/e2e-upgrade, ci/prow/tests-extension DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/verified by stbenjam |
|
@stbenjam: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
This reverts #1256 (merge commit c6dfbbd).
Why? This PR is causing blocking job failures (periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv4, periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6) in the nightly amd64 payload 4.22.0-0.nightly-2026-03-17-104634.
The olmv0-tests-ext binary emits a klog line
Registered Plugin containerdto stdout during test listing, corrupting the JSON output stream. openshift-tests fails withinvalid character I looking for beginning of value. The cluster installs successfully but zero tests execute./cc @kuiwang02
To unrevert: Please revert this revert PR and include a fix for the klog stdout corruption. Verify that the following jobs pass before re-landing:
Override commands (for CI jobs that may not pass on the revert itself):