fix: wrong kubernetes login method when running inside a pod#568
Conversation
Summary by CodeRabbit
WalkthroughEphemeralK8sClient now attempts kubeconfig-based auth first, falls back to in-cluster only if kubeconfig load raises ChangesKubernetes Client Authentication and Resource Resolution
Sequence DiagramsequenceDiagram
participant Client as EphemeralK8sClient
participant KubeCfg as kubeconfig loader
participant InCluster as in-cluster loader
participant DynAPI as dynamic.resources
participant Log as logger
Client->>KubeCfg: load_kube_config()
alt kubeconfig loads
KubeCfg-->>Client: success
else kubeconfig raises ConfigException
KubeCfg-->>Client: ConfigException
Client->>InCluster: _is_in_cluster()? then load_incluster_config()
alt in-cluster available
InCluster-->>Client: success
else not in-cluster
InCluster-->>Client: raise ConfigException
end
end
Client->>DynAPI: resources.get(kind)
alt resource found
DynAPI-->>Client: resource handle
else error
DynAPI-->>Client: raises exception
Client->>Log: error(kind, CRD_API_VERSION, ExceptionClass: message)
Log-->>Client: logged
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
bonfire_lib/k8s_client.py (1)
48-52:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate docstring to reflect new auth priority order.
The docstring states the priority order as:
- Explicit server + token
- In-cluster service account
- Kubeconfig file
However, the implementation now checks kubeconfig before in-cluster (lines 79-103), so the actual priority is:
- Explicit server + token
- Kubeconfig file
- In-cluster service account
📝 Proposed docstring fix
"""Kubernetes API client for ephemeral resource CRDs. - Supports three auth modes (auto-detected in priority order): - 1. Explicit server + token - 2. In-cluster service account - 3. Kubeconfig file + Supports three auth modes (auto-detected in priority order): + 1. Explicit server + token + 2. Kubeconfig file + 3. In-cluster service account """🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bonfire_lib/k8s_client.py` around lines 48 - 52, Update the module-level docstring in k8s_client.py to reflect the actual auth priority implemented: 1) explicit server + token, 2) kubeconfig file, 3) in-cluster service account; specifically change the listed order and wording so it matches the code path that attempts load_kube_config (or equivalent kubeconfig check) before config.load_incluster_config (or the in-cluster check) in the auth selection logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bonfire_lib/k8s_client.py`:
- Around line 115-122: The exception handler in _get_resource currently logs the
error and falls through, returning None which causes AttributeError in callers
that then call methods like resource.create; change the handler to either
re-raise the caught exception or raise a new, informative exception after
logging so callers receive the original failure (do not silently return None).
Specifically, in the _get_resource function replace the silent return with a
re-raise (raise) or raise RuntimeError(...) wrapping e to preserve traceback and
include the kind and CRD_API_VERSION in the message so callers of _get_resource
(e.g., places that call resource.create) see the real error instead of getting a
NoneType AttributeError.
- Around line 79-103: The try/except around config.load_kube_config should not
only catch config.ConfigException but also any other exceptions that can occur
(e.g., OSError, TypeError) so the in-cluster fallback still runs; update the
block that calls config.load_kube_config (and sets kubeconfig_loaded,
self._auth_mode, and self._api_client) to catch Exception (or explicitly list
known exceptions) and log or record the underlying exception, then proceed to
call self._is_in_cluster(), config.load_incluster_config(), and set
self._api_client/self._auth_mode as before when kubeconfig loading fails.
---
Outside diff comments:
In `@bonfire_lib/k8s_client.py`:
- Around line 48-52: Update the module-level docstring in k8s_client.py to
reflect the actual auth priority implemented: 1) explicit server + token, 2)
kubeconfig file, 3) in-cluster service account; specifically change the listed
order and wording so it matches the code path that attempts load_kube_config (or
equivalent kubeconfig check) before config.load_incluster_config (or the
in-cluster check) in the auth selection logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 563dfa85-9372-4208-beb6-9fca40fd7c91
📒 Files selected for processing (1)
bonfire_lib/k8s_client.py
| # Try kubeconfig first, then fall back to in-cluster auth. | ||
| # This ensures that explicit `oc login` credentials are used even when | ||
| # running inside a pod (e.g., GitLab CI runners). | ||
| kubeconfig_loaded = False | ||
| try: | ||
| config.load_kube_config( | ||
| config_file=kubeconfig_path, | ||
| context=context, | ||
| ) | ||
| kubeconfig_loaded = True | ||
| self._auth_mode = "kubeconfig" | ||
| self._api_client = client.ApiClient() | ||
| except config.ConfigException: | ||
| # No valid kubeconfig found | ||
| pass | ||
|
|
||
| if not kubeconfig_loaded: | ||
| if self._is_in_cluster(): | ||
| self._auth_mode = "in-cluster" | ||
| config.load_incluster_config() | ||
| self._api_client = client.ApiClient() | ||
| else: | ||
| raise config.ConfigException( | ||
| "Unable to load kubeconfig and not running in-cluster" | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Does kubernetes.config.load_kube_config raise exceptions other than ConfigException?
💡 Result:
Yes, kubernetes.config.load_kube_config can raise exceptions other than ConfigException. While it primarily raises ConfigException for kubeconfig parsing and validation errors [1][2][3][4], it can also raise standard Python exceptions from underlying operations. Examples include: - TypeError: Seen in older versions or edge cases like empty config files (before fixes to raise ConfigException instead) [5][6]. - OSError or file-related exceptions: From os.path operations, file reading (e.g., if KUBE_CONFIG_DEFAULT_LOCATION cannot be accessed due to permissions) [1]. - MemoryError, KeyboardInterrupt, and other standard Python exceptions that can occur at any time [1]. The function loads the config file using os.path and yaml.safe_load, then constructs a KubeConfigLoader which performs further parsing that explicitly raises ConfigException [3]. However, file I/O and other low-level operations can propagate other exceptions. In some cases, ConfigException may be caught and logged without re-raising (e.g., in exec plugin handling), potentially leading to further errors [4]. No official documentation exhaustively lists all possible exceptions; the source code shows ConfigException as the main custom one, but Python standard exceptions are possible [7][2][3].
Citations:
- 1: https://stackoverflow.com/questions/61126006/what-exceptions-can-be-raised-by-load-kube-config
- 2: https://github.com/kubernetes-client/python-base/blob/master/config/kube_config.py
- 3: https://github.com/kubernetes-client/python/blob/master/kubernetes/base/config/kube_config.py
- 4: ConfigExceptions aren't made visible kubernetes-client/python#2219
- 5: config.load_kube_config fails with Exception TypeError: "'NoneType' object is not callable" kubernetes-client/python#625
- 6: raise exception when an empty config file is passed to load_kube_config kubernetes-client/python-base#223
- 7: http://jashandeep-sohik8s-python.readthedocs.io/en/latest/kubernetes.config.html
Expand exception handling to catch all exceptions from kubeconfig loading.
The code currently catches only config.ConfigException when calling config.load_kube_config(), but the kubernetes client library can raise other exceptions including OSError (e.g., file permission errors), TypeError, and other standard Python exceptions. If any of these occur, they propagate and skip the in-cluster fallback, breaking the intended auth degradation path.
Catch a broader exception class (e.g., Exception) or explicitly handle known file/parsing exceptions to ensure all failures trigger the fallback to in-cluster auth.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@bonfire_lib/k8s_client.py` around lines 79 - 103, The try/except around
config.load_kube_config should not only catch config.ConfigException but also
any other exceptions that can occur (e.g., OSError, TypeError) so the in-cluster
fallback still runs; update the block that calls config.load_kube_config (and
sets kubeconfig_loaded, self._auth_mode, and self._api_client) to catch
Exception (or explicitly list known exceptions) and log or record the underlying
exception, then proceed to call self._is_in_cluster(),
config.load_incluster_config(), and set self._api_client/self._auth_mode as
before when kubeconfig loading fails.
| try: | ||
| return self._dynamic.resources.get(api_version=CRD_API_VERSION, kind=kind) | ||
| except Exception as e: | ||
| # Log available resources for debugging | ||
| log.error( | ||
| f"Failed to get resource {kind} from {CRD_API_VERSION}. " | ||
| f"Error: {e.__class__.__name__}: {e}" | ||
| ) |
There was a problem hiding this comment.
Missing return or re-raise in exception handler causes AttributeError in all callers.
The _get_resource method logs the exception but does not return a value or re-raise, causing it to implicitly return None. All callers immediately invoke methods on the returned resource object (e.g., line 128: resource.create(...)), which will raise AttributeError: 'NoneType' object has no attribute 'create' instead of the original error.
This obscures the root cause and makes debugging significantly harder.
🔧 Proposed fix to re-raise after logging
def _get_resource(self, kind: str):
"""Get a DynamicClient resource handle for a cloud.redhat.com/v1alpha1 CRD."""
try:
return self._dynamic.resources.get(api_version=CRD_API_VERSION, kind=kind)
except Exception as e:
# Log available resources for debugging
log.error(
f"Failed to get resource {kind} from {CRD_API_VERSION}. "
f"Error: {e.__class__.__name__}: {e}"
)
+ raise🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@bonfire_lib/k8s_client.py` around lines 115 - 122, The exception handler in
_get_resource currently logs the error and falls through, returning None which
causes AttributeError in callers that then call methods like resource.create;
change the handler to either re-raise the caught exception or raise a new,
informative exception after logging so callers receive the original failure (do
not silently return None). Specifically, in the _get_resource function replace
the silent return with a re-raise (raise) or raise RuntimeError(...) wrapping e
to preserve traceback and include the kind and CRD_API_VERSION in the message so
callers of _get_resource (e.g., places that call resource.create) see the real
error instead of getting a NoneType AttributeError.
Signed-off-by: JuanmaBM <juanmabareamartinez@gmail.com>
d80f59e to
0c00a64
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_bonfire_lib/test_k8s_client.py`:
- Around line 61-67: The test currently only asserts that load_incluster_config
was called after EphemeralK8sClient() but doesn't assert that load_kube_config
was attempted first; update the test in test_k8s_client.py to assert
mock_config.load_kube_config.assert_called_once() (or assert_called_once_with if
specific args are expected) before asserting
mock_config.load_incluster_config.assert_called_once(), referencing the
EphemeralK8sClient instantiation and the mock_config mock to ensure the
kubeconfig path was attempted prior to the in-cluster fallback.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 8774949b-9a94-4f8b-bd27-9f6614c92f37
📒 Files selected for processing (2)
bonfire_lib/k8s_client.pytests/test_bonfire_lib/test_k8s_client.py
🚧 Files skipped from review as they are similar to previous changes (1)
- bonfire_lib/k8s_client.py
| # Mock kubeconfig load to fail, so it falls back to in-cluster | ||
| mock_config.load_kube_config.side_effect = ConfigException("No kubeconfig") | ||
| mock_config.ConfigException = ConfigException | ||
|
|
||
| EphemeralK8sClient() | ||
| mock_config.load_incluster_config.assert_called_once() | ||
|
|
There was a problem hiding this comment.
Assert kubeconfig attempt in the fallback test.
Line 66 only proves fallback happened; it doesn’t prove kubeconfig was attempted first. Add an explicit kubeconfig call assertion so this test protects the regression you’re fixing.
Proposed test hardening
EphemeralK8sClient()
+ mock_config.load_kube_config.assert_called_once()
mock_config.load_incluster_config.assert_called_once()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Mock kubeconfig load to fail, so it falls back to in-cluster | |
| mock_config.load_kube_config.side_effect = ConfigException("No kubeconfig") | |
| mock_config.ConfigException = ConfigException | |
| EphemeralK8sClient() | |
| mock_config.load_incluster_config.assert_called_once() | |
| # Mock kubeconfig load to fail, so it falls back to in-cluster | |
| mock_config.load_kube_config.side_effect = ConfigException("No kubeconfig") | |
| mock_config.ConfigException = ConfigException | |
| EphemeralK8sClient() | |
| mock_config.load_kube_config.assert_called_once() | |
| mock_config.load_incluster_config.assert_called_once() |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_bonfire_lib/test_k8s_client.py` around lines 61 - 67, The test
currently only asserts that load_incluster_config was called after
EphemeralK8sClient() but doesn't assert that load_kube_config was attempted
first; update the test in test_k8s_client.py to assert
mock_config.load_kube_config.assert_called_once() (or assert_called_once_with if
specific args are expected) before asserting
mock_config.load_incluster_config.assert_called_once(), referencing the
EphemeralK8sClient instantiation and the mock_config mock to ensure the
kubeconfig path was attempted prior to the in-cluster fallback.
|
/retest |
Problem
bonfire 6.12.0 broke GitLab CI pipelines that use
oc loginbefore runningbonfire namespace reserve.The new
bonfire_lib/k8s_client.pyauto-detects when running inside a pod and prioritizes in-cluster service account auth over kubeconfig. This causes pipelines to use the GitLab runner's service account (which has no permissions) instead of the ephemeral-bot credentials fromoc login.Error:
Root Cause
Auth detection logic in
EphemeralK8sClient.__init__()checked_is_in_cluster()before attempting to load kubeconfig. GitLab CI pods always have/var/run/secrets/kubernetes.io/serviceaccount/token, so kubeconfig was ignored.Fix
Reverse the priority: try loading kubeconfig first, only fall back to in-cluster auth if kubeconfig is not available.
This matches the behavior of bonfire 6.10.1 (which used
ocviapy/occommands that always respect kubeconfig).