
[BUG] Operator v1.25.0 crashloops on missing cluster-wide ConfigMaps RBAC (same class of bug as #2791, but for ConfigMaps) #2886

@Paul-Weaver

Description


Pre-submission Checklist

  • I have searched existing issues and this is not a duplicate
  • This is a Datadog Operator issue (CRDs, reconciliation, etc.), not a Datadog Agent or Datadog service problem (dashboards, monitors, etc.)

Operator version

1.25.0

Operator Helm chart version

2.21.0

Bug Report

What happened:

After upgrading to v1.25.0 (which includes the fix for #2791), the operator no longer crashloops on missing Secrets RBAC — thanks for that fix. However, the same crashloop behavior now occurs due to ConfigMaps RBAC. The operator fails to sync the *v1.ConfigMap informer cache and exits, causing a CrashLoopBackOff:

Failed to watch *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:datadog-operator:datadog-operator" cannot list resource "configmaps" in API group "" at the cluster scope

This is the exact same pattern as #2791 — the controller-runtime cache sync times out on a resource the ServiceAccount can't list at cluster scope, which kills mgr.Start() and the pod exits. The fix in #2793/#2800 addressed this for Secrets but not for ConfigMaps.

We run rbac.create: false with a hand-managed ClusterRole and deliberately do not grant cluster-wide list/watch on ConfigMaps. Granting blanket cluster-scope ConfigMap access is not something we want to do — ConfigMaps can contain sensitive data across all namespaces, and we follow least-privilege RBAC.

What I expected:

The operator should degrade gracefully when ConfigMap list/watch is missing at cluster scope — same treatment as #2793 gave to Secrets. Specifically it should either:

  1. Scope its ConfigMap informer to specific namespaces it actually needs rather than defaulting to cluster-wide via controller-runtime's cache
  2. Use the uncached APIReader for one-off ConfigMap reads so the cluster-wide informer isn't triggered
  3. Log a warning and disable the dependent feature (e.g. Helm metadata collection), same as the operator did in v1.23.1 for Secrets
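To make options 1 and 2 concrete, here is a rough sketch of how they could look with controller-runtime (assuming a recent `cache.Options` API, roughly v0.16+). The namespace, ConfigMap name, and function names are illustrative assumptions, not the operator's actual code:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Option 1: scope the ConfigMap informer to the namespaces the operator
// actually needs instead of controller-runtime's cluster-wide default,
// so the cache never issues a cluster-scope LIST on configmaps.
func newManager() (ctrl.Manager, error) {
	return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{
			ByObject: map[client.Object]cache.ByObject{
				&corev1.ConfigMap{}: {
					Namespaces: map[string]cache.Config{
						"datadog-operator": {}, // illustrative namespace
					},
				},
			},
		},
	})
}

// Option 2: perform one-off ConfigMap reads through the uncached APIReader,
// which issues a direct GET against the API server and never starts a
// cluster-scoped informer.
func readConfigMap(ctx context.Context, mgr ctrl.Manager) error {
	var cm corev1.ConfigMap
	key := client.ObjectKey{Namespace: "datadog-operator", Name: "example-cm"} // illustrative
	return mgr.GetAPIReader().Get(ctx, key, &cm)
}
```

Either shape would let the pod stay up with a namespace-scoped RBAC grant; this is a manager-configuration sketch, not a tested patch.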

This was not an issue on 1.23.1. Several features between 1.23 and 1.25 appear to have introduced ConfigMap reads through the cached client (credential manager fallback, Helm metadata informer, PAR deployment), which triggers controller-runtime's default cluster-scoped informer.

Workaround: grant cluster-wide configmaps get/list/watch, but this defeats the purpose of least-privilege RBAC.

Related: #2791, #2793, #2800, #967

Steps to Reproduce

  1. Deploy chart 2.21.0 with rbac.create: false
  2. Create a ClusterRole that does NOT include configmaps list/watch at cluster scope
  3. Enable datadogMonitor and datadogDashboard controllers
  4. Observe the pod: it crashloops with Failed to watch *v1.ConfigMap, an informer cache sync timeout, and exit code 1
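For step 2, a hand-managed ClusterRole along these lines reproduces the failure. The rule set below is abbreviated and illustrative (the real grants for rbac.create: false are larger); the point is only the absence of a cluster-scope configmaps rule:

```yaml
# Illustrative least-privilege ClusterRole: grants the operator's CRDs
# but deliberately omits cluster-wide configmaps list/watch.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datadog-operator
rules:
  - apiGroups: ["datadoghq.com"]
    resources: ["datadogmonitors", "datadogdashboards"]
    verbs: ["get", "list", "watch", "update"]
  # note: no rule granting list/watch on configmaps (core API group)
  # at cluster scope
```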

Environment

  • Operator: 1.25.0
  • Chart: datadog-operator 2.21.0
  • Kubernetes: 1.29
  • Helm: 3.x
  • Cloud: GKE (managed)

Additional Context

Pod logs showing the repeated ConfigMap RBAC errors and eventual crashloop:
{"level":"ERROR","ts":"2026-04-09T20:12:41.112Z","logger":"controller-runtime.cache.UnhandledError","msg":"Failed to watch","type":"*v1.ConfigMap","error":"failed to list *v1.ConfigMap: configmaps is forbidden: User \"system:serviceaccount:datadog-operator:datadog-operator\" cannot list resource \"configmaps\" in API group \"\" at the cluster scope"}

The error repeats with exponential backoff until the informer cache sync times out and the pod exits.


Labels

bug (Something isn't working), pending