fix: wrong kubernetes login method when running inside a pod by JuanmaBM · Pull Request #568 · RedHatInsights/bonfire

JuanmaBM · 2026-05-12T09:11:13Z

Problem

bonfire 6.12.0 broke GitLab CI pipelines that use oc login before running bonfire namespace reserve.

The new bonfire_lib/k8s_client.py auto-detects when running inside a pod and prioritizes in-cluster service account auth over kubeconfig. This causes pipelines to use the GitLab runner's service account (which has no permissions) instead of the ephemeral-bot credentials from oc login.

Error:

ResourceNotFoundError: No matches found for {'api_version': 'cloud.redhat.com/v1alpha1', 'kind': 'NamespaceReservation'}

Root Cause

Auth detection logic in EphemeralK8sClient.__init__() checked _is_in_cluster() before attempting to load kubeconfig. GitLab CI pods always have /var/run/secrets/kubernetes.io/serviceaccount/token, so kubeconfig was ignored.

Fix

Reverse the priority: try loading kubeconfig first, only fall back to in-cluster auth if kubeconfig is not available.

This matches the behavior of bonfire 6.10.1 (which used ocviapy/oc commands that always respect kubeconfig).

coderabbitai · 2026-05-12T09:11:27Z

Summary by CodeRabbit

Bug Fixes
- Kubeconfig is now attempted first when no explicit server/token are provided, falling back to in-cluster auth only if loading kubeconfig fails.
- Improved error logging for resource lookup failures, providing clearer diagnostic information to aid troubleshooting.

Walkthrough

EphemeralK8sClient now attempts kubeconfig-based auth first, falls back to in-cluster only if kubeconfig load raises ConfigException, and raises if both fail. _get_resource now catches exceptions from dynamic CRD lookup, logs the error (including exception class/message), and returns no resource on failure.

Changes

Kubernetes Client Authentication and Resource Resolution

Layer / File(s)	Summary
Authentication fallback ordering `bonfire_lib/k8s_client.py`	`EphemeralK8sClient.__init__` calls `config.load_kube_config()` first and records success; if that raises `config.ConfigException` it checks `_is_in_cluster()` and either calls `config.load_incluster_config()` or re-raises `config.ConfigException`.
CRD resource lookup error handling `bonfire_lib/k8s_client.py`	`_get_resource(kind)` wraps `self._dynamic.resources.get(...)` in a broad `try/except`, logs `log.error` with `{kind}`, `{CRD_API_VERSION}`, and the exception class/message, and does not return a resource on exception (implicit `None`).
Test: simulate kubeconfig failure and in-cluster fallback `tests/test_bonfire_lib/test_k8s_client.py`	Tests import `ConfigException` from `kubernetes.config`, set `mock_config.load_kube_config.side_effect = ConfigException("No kubeconfig")`, and assign `mock_config.ConfigException = ConfigException` to exercise the in-cluster fallback path.

Sequence Diagram

sequenceDiagram
    participant Client as EphemeralK8sClient
    participant KubeCfg as kubeconfig loader
    participant InCluster as in-cluster loader
    participant DynAPI as dynamic.resources
    participant Log as logger

    Client->>KubeCfg: load_kube_config()
    alt kubeconfig loads
        KubeCfg-->>Client: success
    else kubeconfig raises ConfigException
        KubeCfg-->>Client: ConfigException
        Client->>InCluster: _is_in_cluster()? then load_incluster_config()
        alt in-cluster available
            InCluster-->>Client: success
        else not in-cluster
            InCluster-->>Client: raise ConfigException
        end
    end

    Client->>DynAPI: resources.get(kind)
    alt resource found
        DynAPI-->>Client: resource handle
    else error
        DynAPI-->>Client: raises exception
        Client->>Log: error(kind, CRD_API_VERSION, ExceptionClass: message)
        Log-->>Client: logged
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and specifically describes the main fix: reversing Kubernetes auth detection to prioritize kubeconfig over in-cluster service account authentication.
Description check	✅ Passed	The description provides clear context on the problem, root cause, and solution, all directly related to the code changes in the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

bonfire_lib/k8s_client.py (1)

48-52: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update docstring to reflect new auth priority order.

The docstring states the priority order as:

Explicit server + token
In-cluster service account
Kubeconfig file

However, the implementation now checks kubeconfig before in-cluster (lines 79-103), so the actual priority is:

Explicit server + token
Kubeconfig file
In-cluster service account

📝 Proposed docstring fix

     """Kubernetes API client for ephemeral resource CRDs.
 
-    Supports three auth modes (auto-detected in priority order):
-    1. Explicit server + token
-    2. In-cluster service account
-    3. Kubeconfig file
+    Supports three auth modes (auto-detected in priority order):
+    1. Explicit server + token
+    2. Kubeconfig file
+    3. In-cluster service account
     """

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bonfire_lib/k8s_client.py` around lines 48 - 52, Update the module-level
docstring in k8s_client.py to reflect the actual auth priority implemented: 1)
explicit server + token, 2) kubeconfig file, 3) in-cluster service account;
specifically change the listed order and wording so it matches the code path
that attempts load_kube_config (or equivalent kubeconfig check) before
config.load_incluster_config (or the in-cluster check) in the auth selection
logic.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bonfire_lib/k8s_client.py`:
- Around line 115-122: The exception handler in _get_resource currently logs the
error and falls through, returning None which causes AttributeError in callers
that then call methods like resource.create; change the handler to either
re-raise the caught exception or raise a new, informative exception after
logging so callers receive the original failure (do not silently return None).
Specifically, in the _get_resource function replace the silent return with a
re-raise (raise) or raise RuntimeError(...) wrapping e to preserve traceback and
include the kind and CRD_API_VERSION in the message so callers of _get_resource
(e.g., places that call resource.create) see the real error instead of getting a
NoneType AttributeError.
- Around line 79-103: The try/except around config.load_kube_config should not
only catch config.ConfigException but also any other exceptions that can occur
(e.g., OSError, TypeError) so the in-cluster fallback still runs; update the
block that calls config.load_kube_config (and sets kubeconfig_loaded,
self._auth_mode, and self._api_client) to catch Exception (or explicitly list
known exceptions) and log or record the underlying exception, then proceed to
call self._is_in_cluster(), config.load_incluster_config(), and set
self._api_client/self._auth_mode as before when kubeconfig loading fails.

---

Outside diff comments:
In `@bonfire_lib/k8s_client.py`:
- Around line 48-52: Update the module-level docstring in k8s_client.py to
reflect the actual auth priority implemented: 1) explicit server + token, 2)
kubeconfig file, 3) in-cluster service account; specifically change the listed
order and wording so it matches the code path that attempts load_kube_config (or
equivalent kubeconfig check) before config.load_incluster_config (or the
in-cluster check) in the auth selection logic.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 563dfa85-9372-4208-beb6-9fca40fd7c91

📥 Commits

Reviewing files that changed from the base of the PR and between dc26256 and d80f59e.

📒 Files selected for processing (1)

bonfire_lib/k8s_client.py

coderabbitai · 2026-05-12T09:13:58Z

+            # Try kubeconfig first, then fall back to in-cluster auth.
+            # This ensures that explicit `oc login` credentials are used even when
+            # running inside a pod (e.g., GitLab CI runners).
+            kubeconfig_loaded = False
+            try:
+                config.load_kube_config(
+                    config_file=kubeconfig_path,
+                    context=context,
+                )
+                kubeconfig_loaded = True
+                self._auth_mode = "kubeconfig"
+                self._api_client = client.ApiClient()
+            except config.ConfigException:
+                # No valid kubeconfig found
+                pass
+
+            if not kubeconfig_loaded:
+                if self._is_in_cluster():
+                    self._auth_mode = "in-cluster"
+                    config.load_incluster_config()
+                    self._api_client = client.ApiClient()
+                else:
+                    raise config.ConfigException(
+                        "Unable to load kubeconfig and not running in-cluster"
+                    )


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Does kubernetes.config.load_kube_config raise exceptions other than ConfigException?

💡 Result:

Yes, kubernetes.config.load_kube_config can raise exceptions other than ConfigException. While it primarily raises ConfigException for kubeconfig parsing and validation errors [1][2][3][4], it can also raise standard Python exceptions from underlying operations. Examples include: - TypeError: Seen in older versions or edge cases like empty config files (before fixes to raise ConfigException instead) [5][6]. - OSError or file-related exceptions: From os.path operations, file reading (e.g., if KUBE_CONFIG_DEFAULT_LOCATION cannot be accessed due to permissions) [1]. - MemoryError, KeyboardInterrupt, and other standard Python exceptions that can occur at any time [1]. The function loads the config file using os.path and yaml.safe_load, then constructs a KubeConfigLoader which performs further parsing that explicitly raises ConfigException [3]. However, file I/O and other low-level operations can propagate other exceptions. In some cases, ConfigException may be caught and logged without re-raising (e.g., in exec plugin handling), potentially leading to further errors [4]. No official documentation exhaustively lists all possible exceptions; the source code shows ConfigException as the main custom one, but Python standard exceptions are possible [7][2][3].

Citations:

1: https://stackoverflow.com/questions/61126006/what-exceptions-can-be-raised-by-load-kube-config

2: https://github.com/kubernetes-client/python-base/blob/master/config/kube_config.py

3: https://github.com/kubernetes-client/python/blob/master/kubernetes/base/config/kube_config.py

4: ConfigExceptions aren't made visible kubernetes-client/python#2219

5: config.load_kube_config fails with Exception TypeError: "'NoneType' object is not callable" kubernetes-client/python#625

6: raise exception when an empty config file is passed to load_kube_config kubernetes-client/python-base#223

7: http://jashandeep-sohik8s-python.readthedocs.io/en/latest/kubernetes.config.html

Expand exception handling to catch all exceptions from kubeconfig loading.

The code currently catches only config.ConfigException when calling config.load_kube_config(), but the kubernetes client library can raise other exceptions including OSError (e.g., file permission errors), TypeError, and other standard Python exceptions. If any of these occur, they propagate and skip the in-cluster fallback, breaking the intended auth degradation path.

Catch a broader exception class (e.g., Exception) or explicitly handle known file/parsing exceptions to ensure all failures trigger the fallback to in-cluster auth.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bonfire_lib/k8s_client.py` around lines 79 - 103, The try/except around config.load_kube_config should not only catch config.ConfigException but also any other exceptions that can occur (e.g., OSError, TypeError) so the in-cluster fallback still runs; update the block that calls config.load_kube_config (and sets kubeconfig_loaded, self._auth_mode, and self._api_client) to catch Exception (or explicitly list known exceptions) and log or record the underlying exception, then proceed to call self._is_in_cluster(), config.load_incluster_config(), and set self._api_client/self._auth_mode as before when kubeconfig loading fails.

coderabbitai · 2026-05-12T09:13:58Z

+        try:
+            return self._dynamic.resources.get(api_version=CRD_API_VERSION, kind=kind)
+        except Exception as e:
+            # Log available resources for debugging
+            log.error(
+                f"Failed to get resource {kind} from {CRD_API_VERSION}. "
+                f"Error: {e.__class__.__name__}: {e}"
+            )


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Missing return or re-raise in exception handler causes AttributeError in all callers.

The _get_resource method logs the exception but does not return a value or re-raise, causing it to implicitly return None. All callers immediately invoke methods on the returned resource object (e.g., line 128: resource.create(...)), which will raise AttributeError: 'NoneType' object has no attribute 'create' instead of the original error.

This obscures the root cause and makes debugging significantly harder.

🔧 Proposed fix to re-raise after logging

def _get_resource(self, kind: str): """Get a DynamicClient resource handle for a cloud.redhat.com/v1alpha1 CRD.""" try: return self._dynamic.resources.get(api_version=CRD_API_VERSION, kind=kind) except Exception as e: # Log available resources for debugging log.error( f"Failed to get resource {kind} from {CRD_API_VERSION}. " f"Error: {e.__class__.__name__}: {e}" ) + raise

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bonfire_lib/k8s_client.py` around lines 115 - 122, The exception handler in _get_resource currently logs the error and falls through, returning None which causes AttributeError in callers that then call methods like resource.create; change the handler to either re-raise the caught exception or raise a new, informative exception after logging so callers receive the original failure (do not silently return None). Specifically, in the _get_resource function replace the silent return with a re-raise (raise) or raise RuntimeError(...) wrapping e to preserve traceback and include the kind and CRD_API_VERSION in the message so callers of _get_resource (e.g., places that call resource.create) see the real error instead of getting a NoneType AttributeError.

Signed-off-by: JuanmaBM <juanmabareamartinez@gmail.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_bonfire_lib/test_k8s_client.py`:
- Around line 61-67: The test currently only asserts that load_incluster_config
was called after EphemeralK8sClient() but doesn't assert that load_kube_config
was attempted first; update the test in test_k8s_client.py to assert
mock_config.load_kube_config.assert_called_once() (or assert_called_once_with if
specific args are expected) before asserting
mock_config.load_incluster_config.assert_called_once(), referencing the
EphemeralK8sClient instantiation and the mock_config mock to ensure the
kubeconfig path was attempted prior to the in-cluster fallback.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 8774949b-9a94-4f8b-bd27-9f6614c92f37

📥 Commits

Reviewing files that changed from the base of the PR and between d80f59e and 0c00a64.

📒 Files selected for processing (2)

bonfire_lib/k8s_client.py
tests/test_bonfire_lib/test_k8s_client.py

🚧 Files skipped from review as they are similar to previous changes (1)

bonfire_lib/k8s_client.py

coderabbitai · 2026-05-12T14:33:03Z

+        # Mock kubeconfig load to fail, so it falls back to in-cluster
+        mock_config.load_kube_config.side_effect = ConfigException("No kubeconfig")
+        mock_config.ConfigException = ConfigException
+
        EphemeralK8sClient()
        mock_config.load_incluster_config.assert_called_once()



⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Assert kubeconfig attempt in the fallback test.

Line 66 only proves fallback happened; it doesn’t prove kubeconfig was attempted first. Add an explicit kubeconfig call assertion so this test protects the regression you’re fixing.

Proposed test hardening

EphemeralK8sClient() + mock_config.load_kube_config.assert_called_once() mock_config.load_incluster_config.assert_called_once()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Mock kubeconfig load to fail, so it falls back to in-cluster

mock_config.load_kube_config.side_effect = ConfigException("No kubeconfig")

mock_config.ConfigException = ConfigException

EphemeralK8sClient()

mock_config.load_incluster_config.assert_called_once()

# Mock kubeconfig load to fail, so it falls back to in-cluster

mock_config.load_kube_config.side_effect = ConfigException("No kubeconfig")

mock_config.ConfigException = ConfigException

EphemeralK8sClient()

mock_config.load_kube_config.assert_called_once()

mock_config.load_incluster_config.assert_called_once()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_bonfire_lib/test_k8s_client.py` around lines 61 - 67, The test currently only asserts that load_incluster_config was called after EphemeralK8sClient() but doesn't assert that load_kube_config was attempted first; update the test in test_k8s_client.py to assert mock_config.load_kube_config.assert_called_once() (or assert_called_once_with if specific args are expected) before asserting mock_config.load_incluster_config.assert_called_once(), referencing the EphemeralK8sClient instantiation and the mock_config mock to ensure the kubeconfig path was attempted prior to the in-cluster fallback.

bsquizz · 2026-05-12T15:38:18Z

/retest

coderabbitai Bot reviewed May 12, 2026

View reviewed changes

bsquizz previously approved these changes May 12, 2026

View reviewed changes

fix: wrong kubernetes login method when running inside a pod

0c00a64

Signed-off-by: JuanmaBM <juanmabareamartinez@gmail.com>

JuanmaBM dismissed bsquizz’s stale review via 0c00a64 May 12, 2026 14:30

JuanmaBM force-pushed the fix-in-cluster-auth-detection branch from d80f59e to 0c00a64 Compare May 12, 2026 14:30

coderabbitai Bot reviewed May 12, 2026

View reviewed changes

bsquizz approved these changes May 12, 2026

View reviewed changes

bsquizz merged commit cf08bac into RedHatInsights:master May 12, 2026
20 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: wrong kubernetes login method when running inside a pod#568

fix: wrong kubernetes login method when running inside a pod#568
bsquizz merged 1 commit into
RedHatInsights:masterfrom
JuanmaBM:fix-in-cluster-auth-detection

JuanmaBM commented May 12, 2026

Uh oh!

coderabbitai Bot commented May 12, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 12, 2026

Uh oh!

coderabbitai Bot May 12, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 12, 2026

Uh oh!

bsquizz commented May 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

JuanmaBM commented May 12, 2026

Problem

Root Cause

Fix

Uh oh!

coderabbitai Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

bsquizz commented May 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented May 12, 2026 •

edited

Loading