Skip to content

fix: wrong kubernetes login method when running inside a pod#568

Merged
bsquizz merged 1 commit into
RedHatInsights:masterfrom
JuanmaBM:fix-in-cluster-auth-detection
May 12, 2026
Merged

fix: wrong kubernetes login method when running inside a pod#568
bsquizz merged 1 commit into
RedHatInsights:masterfrom
JuanmaBM:fix-in-cluster-auth-detection

Conversation

@JuanmaBM
Copy link
Copy Markdown
Contributor

Problem

bonfire 6.12.0 broke GitLab CI pipelines that use oc login before running bonfire namespace reserve.

The new bonfire_lib/k8s_client.py auto-detects when running inside a pod and prioritizes in-cluster service account auth over kubeconfig. This causes pipelines to use the GitLab runner's service account (which has no permissions) instead of the ephemeral-bot credentials from oc login.

Error:

ResourceNotFoundError: No matches found for {'api_version': 'cloud.redhat.com/v1alpha1', 'kind': 'NamespaceReservation'}

Root Cause

Auth detection logic in EphemeralK8sClient.__init__() checked _is_in_cluster() before attempting to load kubeconfig. GitLab CI pods always have /var/run/secrets/kubernetes.io/serviceaccount/token, so kubeconfig was ignored.

Fix

Reverse the priority: try loading kubeconfig first, only fall back to in-cluster auth if kubeconfig is not available.

This matches the behavior of bonfire 6.10.1 (which used ocviapy/oc commands that always respect kubeconfig).

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 12, 2026

Summary by CodeRabbit

  • Bug Fixes
    • Kubeconfig is now attempted first when no explicit server/token are provided, falling back to in-cluster auth only if loading kubeconfig fails.
    • Improved error logging for resource lookup failures, providing clearer diagnostic information to aid troubleshooting.

Walkthrough

EphemeralK8sClient now attempts kubeconfig-based auth first, falls back to in-cluster only if kubeconfig load raises ConfigException, and raises if both fail. _get_resource now catches exceptions from dynamic CRD lookup, logs the error (including exception class/message), and returns no resource on failure.

Changes

Kubernetes Client Authentication and Resource Resolution

Layer / File(s) Summary
Authentication fallback ordering
bonfire_lib/k8s_client.py
EphemeralK8sClient.__init__ calls config.load_kube_config() first and records success; if that raises config.ConfigException it checks _is_in_cluster() and either calls config.load_incluster_config() or re-raises config.ConfigException.
CRD resource lookup error handling
bonfire_lib/k8s_client.py
_get_resource(kind) wraps self._dynamic.resources.get(...) in a broad try/except, logs log.error with {kind}, {CRD_API_VERSION}, and the exception class/message, and does not return a resource on exception (implicit None).
Test: simulate kubeconfig failure and in-cluster fallback
tests/test_bonfire_lib/test_k8s_client.py
Tests import ConfigException from kubernetes.config, set mock_config.load_kube_config.side_effect = ConfigException("No kubeconfig"), and assign mock_config.ConfigException = ConfigException to exercise the in-cluster fallback path.

Sequence Diagram

sequenceDiagram
    participant Client as EphemeralK8sClient
    participant KubeCfg as kubeconfig loader
    participant InCluster as in-cluster loader
    participant DynAPI as dynamic.resources
    participant Log as logger

    Client->>KubeCfg: load_kube_config()
    alt kubeconfig loads
        KubeCfg-->>Client: success
    else kubeconfig raises ConfigException
        KubeCfg-->>Client: ConfigException
        Client->>InCluster: _is_in_cluster()? then load_incluster_config()
        alt in-cluster available
            InCluster-->>Client: success
        else not in-cluster
            InCluster-->>Client: raise ConfigException
        end
    end

    Client->>DynAPI: resources.get(kind)
    alt resource found
        DynAPI-->>Client: resource handle
    else error
        DynAPI-->>Client: raises exception
        Client->>Log: error(kind, CRD_API_VERSION, ExceptionClass: message)
        Log-->>Client: logged
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and specifically describes the main fix: reversing Kubernetes auth detection to prioritize kubeconfig over in-cluster service account authentication.
Description check ✅ Passed The description provides clear context on the problem, root cause, and solution, all directly related to the code changes in the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
bonfire_lib/k8s_client.py (1)

48-52: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update docstring to reflect new auth priority order.

The docstring states the priority order as:

  1. Explicit server + token
  2. In-cluster service account
  3. Kubeconfig file

However, the implementation now checks kubeconfig before in-cluster (lines 79-103), so the actual priority is:

  1. Explicit server + token
  2. Kubeconfig file
  3. In-cluster service account
📝 Proposed docstring fix
     """Kubernetes API client for ephemeral resource CRDs.
 
-    Supports three auth modes (auto-detected in priority order):
-    1. Explicit server + token
-    2. In-cluster service account
-    3. Kubeconfig file
+    Supports three auth modes (auto-detected in priority order):
+    1. Explicit server + token
+    2. Kubeconfig file
+    3. In-cluster service account
     """
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bonfire_lib/k8s_client.py` around lines 48 - 52, Update the module-level
docstring in k8s_client.py to reflect the actual auth priority implemented: 1)
explicit server + token, 2) kubeconfig file, 3) in-cluster service account;
specifically change the listed order and wording so it matches the code path
that attempts load_kube_config (or equivalent kubeconfig check) before
config.load_incluster_config (or the in-cluster check) in the auth selection
logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bonfire_lib/k8s_client.py`:
- Around line 115-122: The exception handler in _get_resource currently logs the
error and falls through, returning None which causes AttributeError in callers
that then call methods like resource.create; change the handler to either
re-raise the caught exception or raise a new, informative exception after
logging so callers receive the original failure (do not silently return None).
Specifically, in the _get_resource function replace the silent return with a
re-raise (raise) or raise RuntimeError(...) wrapping e to preserve traceback and
include the kind and CRD_API_VERSION in the message so callers of _get_resource
(e.g., places that call resource.create) see the real error instead of getting a
NoneType AttributeError.
- Around line 79-103: The try/except around config.load_kube_config should not
only catch config.ConfigException but also any other exceptions that can occur
(e.g., OSError, TypeError) so the in-cluster fallback still runs; update the
block that calls config.load_kube_config (and sets kubeconfig_loaded,
self._auth_mode, and self._api_client) to catch Exception (or explicitly list
known exceptions) and log or record the underlying exception, then proceed to
call self._is_in_cluster(), config.load_incluster_config(), and set
self._api_client/self._auth_mode as before when kubeconfig loading fails.

---

Outside diff comments:
In `@bonfire_lib/k8s_client.py`:
- Around line 48-52: Update the module-level docstring in k8s_client.py to
reflect the actual auth priority implemented: 1) explicit server + token, 2)
kubeconfig file, 3) in-cluster service account; specifically change the listed
order and wording so it matches the code path that attempts load_kube_config (or
equivalent kubeconfig check) before config.load_incluster_config (or the
in-cluster check) in the auth selection logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 563dfa85-9372-4208-beb6-9fca40fd7c91

📥 Commits

Reviewing files that changed from the base of the PR and between dc26256 and d80f59e.

📒 Files selected for processing (1)
  • bonfire_lib/k8s_client.py

Comment thread bonfire_lib/k8s_client.py
Comment on lines +79 to +103
# Try kubeconfig first, then fall back to in-cluster auth.
# This ensures that explicit `oc login` credentials are used even when
# running inside a pod (e.g., GitLab CI runners).
kubeconfig_loaded = False
try:
config.load_kube_config(
config_file=kubeconfig_path,
context=context,
)
kubeconfig_loaded = True
self._auth_mode = "kubeconfig"
self._api_client = client.ApiClient()
except config.ConfigException:
# No valid kubeconfig found
pass

if not kubeconfig_loaded:
if self._is_in_cluster():
self._auth_mode = "in-cluster"
config.load_incluster_config()
self._api_client = client.ApiClient()
else:
raise config.ConfigException(
"Unable to load kubeconfig and not running in-cluster"
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Does kubernetes.config.load_kube_config raise exceptions other than ConfigException?

💡 Result:

Yes, kubernetes.config.load_kube_config can raise exceptions other than ConfigException. While it primarily raises ConfigException for kubeconfig parsing and validation errors [1][2][3][4], it can also raise standard Python exceptions from underlying operations. Examples include: - TypeError: Seen in older versions or edge cases like empty config files (before fixes to raise ConfigException instead) [5][6]. - OSError or file-related exceptions: From os.path operations, file reading (e.g., if KUBE_CONFIG_DEFAULT_LOCATION cannot be accessed due to permissions) [1]. - MemoryError, KeyboardInterrupt, and other standard Python exceptions that can occur at any time [1]. The function loads the config file using os.path and yaml.safe_load, then constructs a KubeConfigLoader which performs further parsing that explicitly raises ConfigException [3]. However, file I/O and other low-level operations can propagate other exceptions. In some cases, ConfigException may be caught and logged without re-raising (e.g., in exec plugin handling), potentially leading to further errors [4]. No official documentation exhaustively lists all possible exceptions; the source code shows ConfigException as the main custom one, but Python standard exceptions are possible [7][2][3].

Citations:


Expand exception handling to catch all exceptions from kubeconfig loading.

The code currently catches only config.ConfigException when calling config.load_kube_config(), but the kubernetes client library can raise other exceptions including OSError (e.g., file permission errors), TypeError, and other standard Python exceptions. If any of these occur, they propagate and skip the in-cluster fallback, breaking the intended auth degradation path.

Catch a broader exception class (e.g., Exception) or explicitly handle known file/parsing exceptions to ensure all failures trigger the fallback to in-cluster auth.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bonfire_lib/k8s_client.py` around lines 79 - 103, The try/except around
config.load_kube_config should not only catch config.ConfigException but also
any other exceptions that can occur (e.g., OSError, TypeError) so the in-cluster
fallback still runs; update the block that calls config.load_kube_config (and
sets kubeconfig_loaded, self._auth_mode, and self._api_client) to catch
Exception (or explicitly list known exceptions) and log or record the underlying
exception, then proceed to call self._is_in_cluster(),
config.load_incluster_config(), and set self._api_client/self._auth_mode as
before when kubeconfig loading fails.

Comment thread bonfire_lib/k8s_client.py
Comment on lines +115 to +122
try:
return self._dynamic.resources.get(api_version=CRD_API_VERSION, kind=kind)
except Exception as e:
# Log available resources for debugging
log.error(
f"Failed to get resource {kind} from {CRD_API_VERSION}. "
f"Error: {e.__class__.__name__}: {e}"
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Missing return or re-raise in exception handler causes AttributeError in all callers.

The _get_resource method logs the exception but does not return a value or re-raise, causing it to implicitly return None. All callers immediately invoke methods on the returned resource object (e.g., line 128: resource.create(...)), which will raise AttributeError: 'NoneType' object has no attribute 'create' instead of the original error.

This obscures the root cause and makes debugging significantly harder.

🔧 Proposed fix to re-raise after logging
     def _get_resource(self, kind: str):
         """Get a DynamicClient resource handle for a cloud.redhat.com/v1alpha1 CRD."""
         try:
             return self._dynamic.resources.get(api_version=CRD_API_VERSION, kind=kind)
         except Exception as e:
             # Log available resources for debugging
             log.error(
                 f"Failed to get resource {kind} from {CRD_API_VERSION}. "
                 f"Error: {e.__class__.__name__}: {e}"
             )
+            raise
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bonfire_lib/k8s_client.py` around lines 115 - 122, The exception handler in
_get_resource currently logs the error and falls through, returning None which
causes AttributeError in callers that then call methods like resource.create;
change the handler to either re-raise the caught exception or raise a new,
informative exception after logging so callers receive the original failure (do
not silently return None). Specifically, in the _get_resource function replace
the silent return with a re-raise (raise) or raise RuntimeError(...) wrapping e
to preserve traceback and include the kind and CRD_API_VERSION in the message so
callers of _get_resource (e.g., places that call resource.create) see the real
error instead of getting a NoneType AttributeError.

bsquizz
bsquizz previously approved these changes May 12, 2026
Signed-off-by: JuanmaBM <juanmabareamartinez@gmail.com>
@JuanmaBM JuanmaBM force-pushed the fix-in-cluster-auth-detection branch from d80f59e to 0c00a64 Compare May 12, 2026 14:30
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_bonfire_lib/test_k8s_client.py`:
- Around line 61-67: The test currently only asserts that load_incluster_config
was called after EphemeralK8sClient() but doesn't assert that load_kube_config
was attempted first; update the test in test_k8s_client.py to assert
mock_config.load_kube_config.assert_called_once() (or assert_called_once_with if
specific args are expected) before asserting
mock_config.load_incluster_config.assert_called_once(), referencing the
EphemeralK8sClient instantiation and the mock_config mock to ensure the
kubeconfig path was attempted prior to the in-cluster fallback.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 8774949b-9a94-4f8b-bd27-9f6614c92f37

📥 Commits

Reviewing files that changed from the base of the PR and between d80f59e and 0c00a64.

📒 Files selected for processing (2)
  • bonfire_lib/k8s_client.py
  • tests/test_bonfire_lib/test_k8s_client.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • bonfire_lib/k8s_client.py

Comment on lines +61 to 67
# Mock kubeconfig load to fail, so it falls back to in-cluster
mock_config.load_kube_config.side_effect = ConfigException("No kubeconfig")
mock_config.ConfigException = ConfigException

EphemeralK8sClient()
mock_config.load_incluster_config.assert_called_once()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Assert kubeconfig attempt in the fallback test.

Line 66 only proves fallback happened; it doesn’t prove kubeconfig was attempted first. Add an explicit kubeconfig call assertion so this test protects the regression you’re fixing.

Proposed test hardening
         EphemeralK8sClient()
+        mock_config.load_kube_config.assert_called_once()
         mock_config.load_incluster_config.assert_called_once()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Mock kubeconfig load to fail, so it falls back to in-cluster
mock_config.load_kube_config.side_effect = ConfigException("No kubeconfig")
mock_config.ConfigException = ConfigException
EphemeralK8sClient()
mock_config.load_incluster_config.assert_called_once()
# Mock kubeconfig load to fail, so it falls back to in-cluster
mock_config.load_kube_config.side_effect = ConfigException("No kubeconfig")
mock_config.ConfigException = ConfigException
EphemeralK8sClient()
mock_config.load_kube_config.assert_called_once()
mock_config.load_incluster_config.assert_called_once()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_bonfire_lib/test_k8s_client.py` around lines 61 - 67, The test
currently only asserts that load_incluster_config was called after
EphemeralK8sClient() but doesn't assert that load_kube_config was attempted
first; update the test in test_k8s_client.py to assert
mock_config.load_kube_config.assert_called_once() (or assert_called_once_with if
specific args are expected) before asserting
mock_config.load_incluster_config.assert_called_once(), referencing the
EphemeralK8sClient instantiation and the mock_config mock to ensure the
kubeconfig path was attempted prior to the in-cluster fallback.

@bsquizz
Copy link
Copy Markdown
Contributor

bsquizz commented May 12, 2026

/retest

@bsquizz bsquizz merged commit cf08bac into RedHatInsights:master May 12, 2026
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants