OCPBUGS-78557: Re-Apply Fix admin context race condition in parallel tests#1259
Conversation
WalkthroughExtends CLI state with admin context caching and temporary kubeconfig path handling for concurrent test execution. Adds safe, error-returning file-operation helpers for initialization phases, enabling error propagation instead of Gomega assertions. Refactors existing file utilities to delegate to new safe variants. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment Tip You can customize the high-level summary generated by CodeRabbit.Configure the |
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv4 |
|
@kuiwang02: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/01e7d5f0-2268-11f1-8c88-608126d8fa68-0 |
bc0e988 to
5ef3f99
Compare
|
/retitle OCPBUGS-78557: Re-Apply Fix admin context race condition in parallel tests |
|
@kuiwang02: This pull request references Jira Issue OCPBUGS-78557, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@kuiwang02: This pull request references Jira Issue OCPBUGS-78557, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@openshift-ci-robot: GitHub didn't allow me to request PR reviews from the following users: kuiwang02. Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
5ef3f99 to
a3a9389
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests-extension/test/qe/util/client.go (1)
1230-1240: Global cache assumes uniform kubeconfig across tests.The
sync.Oncecaches the admin context name from the first kubeconfig file encountered. If tests use different kubeconfig files with different admin context names, only the first will be cached. This is likely acceptable for this test framework's usage patterns, but worth noting.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/qe/util/client.go` around lines 1230 - 1240, The current getAdminContextNameCached uses a single globalAdmContextOnce/globalAdmContextName so the first kubeConfigFile wins; change it to cache per-file instead: replace the single globalAdmContextOnce/globalAdmContextName with a map (or sync.Map) keyed by kubeConfigFile and protected by a mutex, then update getAdminContextNameCached to check the map for kubeConfigFile, call getAdminContextName(kubeConfigFile) only if missing, store the result in the map, and log using e2e.Logf; keep function name getAdminContextNameCached and use the new cache map (or sync.Map) and mutex symbols to locate and modify related code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests-extension/test/qe/util/client.go`:
- Around line 1230-1240: The current getAdminContextNameCached uses a single
globalAdmContextOnce/globalAdmContextName so the first kubeConfigFile wins;
change it to cache per-file instead: replace the single
globalAdmContextOnce/globalAdmContextName with a map (or sync.Map) keyed by
kubeConfigFile and protected by a mutex, then update getAdminContextNameCached
to check the map for kubeConfigFile, call getAdminContextName(kubeConfigFile)
only if missing, store the result in the map, and log using e2e.Logf; keep
function name getAdminContextNameCached and use the new cache map (or sync.Map)
and mutex symbols to locate and modify related code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dd631288-c980-4abe-bdf5-acc601bd7ba5
📒 Files selected for processing (2)
tests-extension/test/qe/util/client.gotests-extension/test/qe/util/tools.go
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv4 |
|
@kuiwang02: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7d274ce0-227d-11f1-8df9-129e606b316c-0 |
|
/payload-job periodic-ci-openshift-operator-framework-olm-release-4.22-periodics-e2e-aws-ovn-proxy-extended-f2 |
|
@kuiwang02: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/97721ad0-227d-11f1-870b-8fd11430f706-0 |
|
/payload-job periodic-ci-openshift-operator-framework-olm-release-4.22-periodics-e2e-aws-ovn-fips-techpreview-extended-f1 |
|
@kuiwang02: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/9dbd95e0-227d-11f1-8930-28b77953248d-0 |
|
/test e2e-gcp-ovn |
|
/approve |
|
Does this mean that #1257 should be updated as well? |
|
/verified by @kuiwang02 |
|
@kuiwang02: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jianzhangbjz, kuiwang02, tmshort The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@kuiwang02: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@kuiwang02: Jira Issue Verification Checks: Jira Issue OCPBUGS-78557 Jira Issue OCPBUGS-78557 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/cherry-pick release-4.21 |
|
@kuiwang02: new pull request created: #1261 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/cc @jianzhangbjz @bandrade @Xia-Zhao-rh
Fix admin context race condition in parallel tests
This PR fixes an intermittent test failure where
AsAdmin()occasionally uses the wrong service account instead of cluster-admin credentials, caused by concurrent tests sharing the same kubeconfig file.Background
PR #1256 introduced a fix using double insurance (per-test temp kubeconfig files + explicit
--context=adminflag), but was reverted by PR #1258 due to log pollution inimagescommand.Root cause of revert: The fix initialized admin context during test tree construction (in
NewCLI()), which caused logs to appear when runningimagescommands that only build the test tree without executing tests.Changes in This PR
This PR reintroduces the race condition fix with lazy-loading pattern to prevent log pollution:
admContextNameis initialized on firstAsAdmin()call (during test execution), not inNewCLI()(during tree construction)verbose boolparameter toduplicateFileToTempOrEmpty()to control loggingverbose=false(silent during tree construction)verbose=true(logs during actual test runs)Assisted-By: Claude Code