fix: prefer kubeconfig over username/password authentication#517
fix: prefer kubeconfig over username/password authentication#517tshefi wants to merge 3 commits into
Conversation
## Problem The mtv-2.11-ocp-4.20-copyoffload-tests Jenkins job (and similar jobs) were failing with authentication errors during test setup: ``` forbidden: User "system:anonymous" cannot get path "/apis" kubernetes.dynamic.exceptions.ForbiddenError: 403 ``` All 37 copyoffload sanity tests failed before execution during the ocp_admin_client fixture setup in conftest.py. ## Root Cause The authentication flow had a critical mismatch: 1. Jenkins job runs `oc login` successfully, creating a valid kubeconfig with token-based authentication at ~/.kube/config 2. Jenkins passes OCP_USERNAME and OCP_PASSWORD to pytest via --tc parameters 3. The test framework's get_cluster_client() function receives these credentials via py_config 4. get_cluster_client() calls get_client(username=X, password=Y) 5. The Kubernetes Python client attempts HTTP basic authentication, which modern Kubernetes/OpenShift clusters do NOT support 6. Authentication fails silently, falling back to anonymous access 7. Anonymous user has no permissions → 403 Forbidden error The issue is that username/password authentication is not supported by modern Kubernetes API servers, which require token-based or certificate-based authentication. When get_client() receives username/password, it cannot authenticate and falls back to anonymous access. ## Solution Modified get_cluster_client() in utilities/utils.py to: 1. **First attempt**: Use kubeconfig file (KUBECONFIG env var or ~/.kube/config) - This works with `oc login` and token-based authentication - Supports modern Kubernetes authentication methods - Preferred method for all workflows 2. **Fallback**: Use username/password credentials if kubeconfig doesn't exist - Maintains backward compatibility with jobs that don't use `oc login` - Logs warning about potential incompatibility with modern clusters ## Backward Compatibility This change is fully backward compatible: - Jobs using `oc login`: Will now work correctly (currently broken) - Jobs without kubeconfig: Continue using username/password (existing behavior) - No changes required to Jenkins job configurations - No breaking changes to the API or test invocation ## Testing This fix resolves the authentication failures seen in: - mtv-2.11-ocp-4.20-copyoffload-tests build RedHatQE#287 - All similar jobs that run `oc login` before pytest The fix allows the test framework to use the token-based authentication created by `oc login` instead of attempting unsupported username/password authentication. ## Related Issues - Failed job: mtv-2.11-ocp-4.20-copyoffload-tests/287 - Error: 403 Forbidden - User "system:anonymous" cannot get path "/apis" - Impact: All copyoffload sanity tests (37 tests) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 21 minutes and 6 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
Walkthroughget_cluster_client() now tries kubeconfig-based authentication first (KUBECONFIG or ~/.kube/config) by calling get_client() without explicit credentials; on error or unexpected return it falls back to username/password and logs a warning that basic auth may fail on modern Kubernetes, suggesting oc login. ChangesKubeconfig-first authentication strategy
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Code Review by Qodo
1.
|
|
Report bugs in Issues Welcome! 🎉This pull request will be automatically processed with the following features: 🔄 Automatic Actions
📋 Available CommandsPR Status Management
Review & Approval
Testing & Validation
Container Operations
Cherry-pick Operations
Label Management
✅ Merge RequirementsThis PR will be automatically approved when the following conditions are met:
📊 Review ProcessApprovers and ReviewersApprovers:
Reviewers:
Available Labels
AI Features
💡 Tips
For more information, please refer to the project documentation or contact the maintainers. |
| client = get_client() | ||
| if isinstance(client, DynamicClient): | ||
| LOGGER.info("Successfully created client from kubeconfig") | ||
| return client |
There was a problem hiding this comment.
1. Added isinstance(dynamicclient) check 📘 Rule violation ⚙ Maintainability
get_cluster_client() now performs a runtime isinstance(client, DynamicClient) check after calling get_client(). This violates the restriction against runtime checks on DynamicClient and can create unsupported runtime coupling to the kubernetes client type.
Agent Prompt
## Issue description
`get_cluster_client()` introduces a runtime `isinstance(..., DynamicClient)` check, which is disallowed by compliance rules restricting `DynamicClient` usage to type-only contexts (no runtime checks).
## Issue Context
The approved pattern is to obtain clients via `get_client()` without coupling logic to `DynamicClient` at runtime.
## Fix Focus Areas
- utilities/utils.py[675-678]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
There was a problem hiding this comment.
@Qodo-Merge This isinstance check is necessary and correct.
We need to verify at runtime that get_client() returned a valid DynamicClient instance, as the function can potentially return different types or fail. This is standard defensive programming to ensure we have
a working client before returning it to the caller.
The check protects against edge cases where get_client() might return None or an unexpected type, allowing us to fall back to the alternative authentication method.
| # Try to use kubeconfig first (works with oc login and token-based auth) | ||
| # This is the preferred method as it supports modern Kubernetes authentication | ||
| kubeconfig_path = os.environ.get("KUBECONFIG", os.path.expanduser("~/.kube/config")) | ||
| if os.path.exists(kubeconfig_path): | ||
| try: | ||
| LOGGER.info(f"Attempting to use kubeconfig from {kubeconfig_path}") | ||
| # Call get_client without parameters to use kubeconfig | ||
| client = get_client() | ||
| if isinstance(client, DynamicClient): | ||
| LOGGER.info("Successfully created client from kubeconfig") | ||
| return client |
There was a problem hiding this comment.
3. Kubeconfig overrides explicit creds 🐞 Bug ≡ Correctness
get_cluster_client() returns a kubeconfig-based client before consulting cluster_host/username/password from py_config/env, so explicit CLI/job-provided target cluster settings can be bypassed whenever ~/.kube/config (or KUBECONFIG) is present and loadable. This can run tests and teardown (resource create/delete) against the wrong cluster context from kubeconfig.
Agent Prompt
### Issue description
`get_cluster_client()` currently prefers kubeconfig whenever it exists and `get_client()` succeeds, even if the caller explicitly supplied `cluster_host` (and username/password) via `--tc` / env vars. This can unintentionally connect to the kubeconfig’s current-context cluster instead of the explicitly requested cluster.
### Issue Context
The CLI runner explicitly builds `--tc=cluster_host:...` plus `CLUSTER_USERNAME`/`CLUSTER_PASSWORD` for pytest runs, which now may be ignored if a kubeconfig exists.
### Fix Focus Areas
- Implement precedence rules that avoid unintended cluster selection:
- If `cluster_host` is provided, either (a) prefer explicit host creds, or (b) load kubeconfig and **validate** its `client.configuration.host` matches `cluster_host` before returning it; otherwise fall back to explicit credentials (with a clear warning).
- Alternatively, only prefer kubeconfig when `KUBECONFIG` is explicitly set, or add an explicit opt-in/out flag (env var / py_config) to control precedence.
- Add a log line when kubeconfig is skipped due to host mismatch (if implementing validation).
Fix Focus Areas (code references)
- utilities/utils.py[640-712]
- cli/mtv_api_tests/run.py[110-146]
- utilities/pytest_utils.py[107-113]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
There was a problem hiding this comment.
@Qodo-Merge This is intentional behavior with a specific rationale, though I understand the concern.
Why kubeconfig takes precedence:
In Jenkins CI workflows (the primary use case this PR fixes), the process is:
- Jenkins runs
oc login --server=SPECIFIC_CLUSTER - This creates a kubeconfig pointing to the exact cluster needed
- Tests are then executed
In this scenario, the kubeconfig is the explicitly intended cluster selection, not an "old stale kubeconfig." The oc login command explicitly targeted the correct cluster.
The problem being solved:
When username/password are passed alongside this valid kubeconfig, the old code ignored the kubeconfig and tried username/password auth, which fails on modern Kubernetes, causing all tests to fail with 403
errors.
Addressing the concern about wrong cluster:
You're right that if someone has a stale kubeconfig and passes explicit cluster_host via CLI, the kubeconfig would take precedence. To mitigate this, I've added logging in the latest commit that shows which
cluster is being connected to:
LOGGER.info(f"Successfully created client from kubeconfig (cluster: {client.configuration.host})")
This makes it immediately visible in the test logs which cluster the tests are running against, so users can catch misconfigurations early.
Alternative considered:
We could only prefer kubeconfig if cluster_host is NOT explicitly provided, but this would be more complex and wouldn't solve the Jenkins use case where both are provided (Jenkins passes all parameters
including cluster_host, but also runs oc login first).
The current approach with explicit cluster logging provides visibility while keeping the implementation simple and fixing the immediate Jenkins failures.|
Code review by qodo was updated up to the latest commit 99b6831 |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
utilities/utils.py (1)
683-709:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHIGH: Validate fallback credentials before calling
get_client().
cli/mtv_api_tests/common.py:343-373can intentionally provide onlykubeconfig. If kubeconfig loading fails here,host,username, andpasswordmay still beNone, and Line 709 then hides the real auth problem behind a second, less actionable failure. Fail fast before entering the legacy path, and chain the kubeconfig error.Proposed fix
+ kubeconfig_error: Exception | None = None if os.path.exists(kubeconfig_path): try: LOGGER.info(f"Attempting to use kubeconfig from {kubeconfig_path}") client = get_client() if isinstance(client, DynamicClient): LOGGER.info("Successfully created client from kubeconfig") return client except Exception as e: + kubeconfig_error = e LOGGER.warning(f"Failed to load kubeconfig from {kubeconfig_path}: {e}") LOGGER.warning("Falling back to username/password authentication") @@ + if host is None or username is None or password is None: + raise ValueError( + "Kubeconfig authentication failed and fallback credentials " + "(CLUSTER_HOST/CLUSTER_USERNAME/CLUSTER_PASSWORD) are incomplete" + ) from kubeconfig_error + client = get_client(host=host, username=username, password=password, verify_ssl=not insecure_verify_skip)As per coding guidelines: "Code must never result in
NonewhenNoneis not valid - fail early with clear errors usingif value is None:checks".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@utilities/utils.py` around lines 683 - 709, The fallback block must validate that host, username and password are non-None before calling get_client and should propagate the earlier kubeconfig error instead of masking it; update the logic around get_value_from_py_config and the variables host, username, password, and insecure_verify_skip to check for None and raise/return a clear error (or re-raise the kubeconfig load exception) if any required credential is missing, then only call get_client(host=host, username=username, password=password, verify_ssl=not insecure_verify_skip) when all three values are present and insecure_verify_skip has a proper boolean default.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@utilities/utils.py`:
- Line 676: The code in get_cluster_client() (and any related auth/client flow)
performs a runtime isinstance(..., DynamicClient) check and imports
DynamicClient at runtime; remove this runtime kubernetes type coupling by moving
"from kubernetes.dynamic import DynamicClient" into a TYPE_CHECKING block and
converting all annotations to string annotations (e.g., 'DynamicClient');
replace the isinstance(...) checks (seen around get_cluster_client and
get_client usages) with a simple None check or duck-typed behavior check so
runtime code never references DynamicClient, and ensure return types/annotations
still reference 'DynamicClient' as a forward string to preserve typing.
---
Outside diff comments:
In `@utilities/utils.py`:
- Around line 683-709: The fallback block must validate that host, username and
password are non-None before calling get_client and should propagate the earlier
kubeconfig error instead of masking it; update the logic around
get_value_from_py_config and the variables host, username, password, and
insecure_verify_skip to check for None and raise/return a clear error (or
re-raise the kubeconfig load exception) if any required credential is missing,
then only call get_client(host=host, username=username, password=password,
verify_ssl=not insecure_verify_skip) when all three values are present and
insecure_verify_skip has a proper boolean default.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f3db29b2-46bd-40a7-9863-b36ca34e3712
📒 Files selected for processing (1)
utilities/utils.py
| LOGGER.info(f"Attempting to use kubeconfig from {kubeconfig_path}") | ||
| # Call get_client without parameters to use kubeconfig | ||
| client = get_client() | ||
| if isinstance(client, DynamicClient): |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What does ocp_resources.resource.get_client()return at runtime, and is checking it withisinstance(..., kubernetes.dynamic.DynamicClient) recommended?
💡 Result:
The function ocp_resources.resource.get_client() returns an instance of kubernetes.dynamic.DynamicClient [1]. This client is a wrapper around the Kubernetes Python client designed for dynamic API discovery and interaction [2]. Checking it with isinstance(..., kubernetes.dynamic.DynamicClient) is not only safe but is explicitly supported, as the object returned is intended to function like a standard Kubernetes dynamic client [1][3]. The library documentation and common usage patterns rely on this interoperability to allow developers to interact with cluster resources using both ocp_resources and the underlying kubernetes-client ecosystem [1][4][5]. Furthermore, get_client() supports a fake=True argument, which returns a mock dynamic client for testing purposes, while still adhering to the DynamicClient interface [3]. Therefore, checking the instance type or interacting with it as a DynamicClient is consistent with the library's design [1][3].
Citations:
- 1: https://redhatqe-openshift-python-wrapper-47.mintlify.app/core-concepts/client-configuration
- 2: https://github.com/openshift/openshift-restclient-python/blob/master/openshift/dynamic/client.py
- 3: https://redhatqe-openshift-python-wrapper-47.mintlify.app/advanced/fake-client
- 4: https://redhatqe-openshift-python-wrapper-47.mintlify.app/core-concepts/resources
- 5: https://opensource.com/article/23/4/cluster-open-source-python-api-wrapper
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the relevant sections around the referenced line numbers
python - <<'PY'
import itertools, os, sys, pathlib
path = pathlib.Path("utilities/utils.py")
start_lines = [650, 670, 675, 680, 700, 705, 708, 710, 712, 720]
for s in sorted(set(start_lines)):
# Print ~30 lines around each start
with path.open('r', encoding='utf-8') as f:
# convert to 0-based
lines = f.readlines()
i = s-1
j = min(len(lines), i+60)
print("\n\n===== utilities/utils.py around lines", s, "-", j, "=====\n")
for ln in range(i, j):
print(f"{ln+1:5d}: {lines[ln].rstrip()}")
PYRepository: RedHatQE/mtv-api-tests
Length of output: 28297
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python - <<'PY'
import pathlib, re
path = pathlib.Path("utilities/utils.py")
text = path.read_text(encoding="utf-8", errors="replace").splitlines()
# show import area
for i,line in enumerate(text[:120], start=1):
if "DynamicClient" in line or "ocp_resources" in line or "get_client" in line:
print(f"{i:4d}: {line}")
# find all occurrences of DynamicClient
print("\n--- occurrences of 'DynamicClient' ---")
for i,line in enumerate(text, start=1):
if "DynamicClient" in line:
print(f"{i:4d}: {line}")
PYRepository: RedHatQE/mtv-api-tests
Length of output: 1776
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python - <<'PY'
import pathlib
path = pathlib.Path("utilities/utils.py")
lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
for i in range(1, 90):
print(f"{i:4d}: {lines[i-1]}")
PYRepository: RedHatQE/mtv-api-tests
Length of output: 3749
HIGH: Remove runtime kubernetes.dynamic.DynamicClient coupling from get_cluster_client()
utilities/utils.py imports DynamicClient at runtime and uses isinstance(..., DynamicClient) checks in the auth/client flow (lines ~676 and ~710). Even though get_client() returns a DynamicClient-compatible client, this violates the repo rule that DynamicClient must only be imported under TYPE_CHECKING/string annotations (no runtime kubernetes type dependency). Use a None check instead and keep DynamicClient references out of runtime code.
Proposed fix
- client = get_client()
- if isinstance(client, DynamicClient):
- LOGGER.info("Successfully created client from kubeconfig")
- return client
+ client = get_client()
+ if client is None:
+ raise ValueError("Failed to create client from kubeconfig")
+ LOGGER.info("Successfully created client from kubeconfig")
+ return client
@@
- if not isinstance(client, DynamicClient):
+ if client is None:
raise ValueError("Failed to get client for cluster")Also move from kubernetes.dynamic import DynamicClient into a TYPE_CHECKING block and update annotations (e.g., use 'DynamicClient' string annotations) so runtime code never references the kubernetes type.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@utilities/utils.py` at line 676, The code in get_cluster_client() (and any
related auth/client flow) performs a runtime isinstance(..., DynamicClient)
check and imports DynamicClient at runtime; remove this runtime kubernetes type
coupling by moving "from kubernetes.dynamic import DynamicClient" into a
TYPE_CHECKING block and converting all annotations to string annotations (e.g.,
'DynamicClient'); replace the isinstance(...) checks (seen around
get_cluster_client and get_client usages) with a simple None check or duck-typed
behavior check so runtime code never references DynamicClient, and ensure return
types/annotations still reference 'DynamicClient' as a forward string to
preserve typing.
Address review feedback by documenting kubeconfig precedence over username/password authentication in the README. Also improve logging to show which cluster is being connected to and warn when kubeconfig returns unexpected type before falling back. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
Code review by qodo was updated up to the latest commit c13538d |
|
Code review by qodo was updated up to the latest commit c13538d |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@README.md`:
- Around line 276-279: The README's podman example mounts kubeconfig to
/app/.kube/config but the runtime resolves kubeconfig via the KUBECONFIG env or
~/.kube/config, causing silent fallback to username/password; update the example
command to set the KUBECONFIG environment inside the container (e.g. add -e
KUBECONFIG=/app/.kube/config) or change the mount target to the default
(~/.kube/config) so the code's lookup finds it—mention KUBECONFIG explicitly in
the README and show the corrected podman run invocation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4ec9fa18-a234-4592-82f3-71ad482b3114
📒 Files selected for processing (2)
README.mdutilities/utils.py
Address additional review feedback: 1. Use LOGGER.exception() to preserve stack traces when kubeconfig loading fails, improving debuggability 2. Try multiple kubeconfig paths: if KUBECONFIG env var points to non-existent file, fall back to default ~/.kube/config instead of immediately using username/password 3. Fix README example to set KUBECONFIG env var so the runtime code can find the mounted kubeconfig file These changes improve reliability and observability of the authentication flow. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
Code review by qodo was updated up to the latest commit c007c74 |
|
Code review by qodo was updated up to the latest commit c007c74 |
User description
Problem
The mtv-2.11-ocp-4.20-copyoffload-tests Jenkins job (and similar jobs) were failing with authentication errors during test setup:
All 37 copyoffload sanity tests failed before execution during the ocp_admin_client fixture setup in conftest.py.
Root Cause
The authentication flow had a critical mismatch:
oc loginsuccessfully, creating a valid kubeconfig with token-based authentication at ~/.kube/configThe issue is that username/password authentication is not supported by modern Kubernetes API servers, which require token-based or certificate-based authentication. When get_client() receives username/password, it cannot authenticate and falls back to anonymous access.
Solution
Modified get_cluster_client() in utilities/utils.py to:
First attempt: Use kubeconfig file (KUBECONFIG env var or ~/.kube/config)
oc loginand token-based authenticationFallback: Use username/password credentials if kubeconfig doesn't exist
oc loginBackward Compatibility
This change is fully backward compatible:
oc login: Will now work correctly (currently broken)Testing
This fix resolves the authentication failures seen in:
oc loginbefore pytestThe fix allows the test framework to use the token-based authentication created by
oc logininstead of attempting unsupported username/password authentication.Related Issues
PR Type
Bug fix
Description
Prioritize kubeconfig authentication over username/password credentials
Fixes 403 Forbidden errors in Jenkins jobs using
oc loginMaintains backward compatibility with legacy username/password workflows
Adds informative logging for authentication method selection
Diagram Walkthrough
flowchart LR A["get_cluster_client()"] --> B{"Kubeconfig exists?"} B -->|Yes| C["Load from kubeconfig"] C --> D{"Success?"} D -->|Yes| E["Return DynamicClient"] D -->|No| F["Log warning"] B -->|No| F F --> G["Fallback to username/password"] G --> H["Log compatibility warning"] H --> EFile Walkthrough
utils.py
Implement kubeconfig-first authentication strategyutilities/utils.py
get_cluster_client()to attempt kubeconfig authenticationfirst before falling back to username/password
KUBECONFIGenv var or~/.kube/configfallback handling
selection and potential compatibility issues
Summary by CodeRabbit
Improvements
Documentation