[BUG] DatadogDashboard controller does not detect external deletions #2854

@1oglop1

Pre-submission Checklist

  • I have searched existing issues and this is not a duplicate
  • This is a Datadog Operator issue (CRDs, reconciliation, etc.), not a Datadog Agent or Datadog service problem

Operator version

1.24.0

Operator Helm chart version

2.20.0

Bug Report

What happened:

When the dashboard managed by a DatadogDashboard CR is deleted directly in the Datadog UI, the CR continues to report syncStatus: OK with the stale dashboard ID indefinitely. The operator never detects the external deletion and never recreates the dashboard.

What was expected:

The operator should detect that the dashboard no longer exists in Datadog (e.g. via a periodic GET against the API) and either recreate it or update the CR status to reflect the drift.
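To make the expected behavior concrete, here is a minimal sketch of an existence check that recreates a missing dashboard. It is not the operator's code: `DashboardAPI`, `ErrNotFound`, `fakeAPI`, `Status`, and `checkDrift` are all hypothetical names invented for illustration, and the in-memory fake only stands in for the Datadog API so the example runs on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel standing in for a "404 Not Found"
// response from the Datadog API.
var ErrNotFound = errors.New("dashboard not found")

// DashboardAPI is a hypothetical interface, not the operator's real client.
type DashboardAPI interface {
	Get(id string) error
	Create(name string) (string, error)
}

// Status mirrors the two CR status fields quoted in this issue.
type Status struct {
	ID         string
	SyncStatus string
}

// fakeAPI is an in-memory stand-in for Datadog, used only for this example.
type fakeAPI struct {
	existing map[string]bool
	nextID   int
}

func (f *fakeAPI) Get(id string) error {
	if !f.existing[id] {
		return ErrNotFound
	}
	return nil
}

func (f *fakeAPI) Create(name string) (string, error) {
	f.nextID++
	id := fmt.Sprintf("new-%d", f.nextID)
	f.existing[id] = true
	return id, nil
}

// checkDrift verifies that the dashboard recorded in the status still exists
// and recreates it when the API reports it as missing.
func checkDrift(api DashboardAPI, name string, st Status) (Status, error) {
	err := api.Get(st.ID)
	switch {
	case err == nil:
		// Dashboard still exists: nothing to do.
		return st, nil
	case errors.Is(err, ErrNotFound):
		// Deleted outside the cluster: recreate and record the new ID.
		id, cerr := api.Create(name)
		if cerr != nil {
			return st, cerr
		}
		return Status{ID: id, SyncStatus: "OK"}, nil
	default:
		// Any other error (for example a transient failure): keep the
		// status unchanged and surface the error to the caller.
		return st, err
	}
}

func main() {
	api := &fakeAPI{existing: map[string]bool{}}
	st, err := checkDrift(api, "my-dashboard", Status{ID: "28v-nsr-sx8", SyncStatus: "OK"})
	fmt.Println(st.ID, st.SyncStatus, err) // prints "new-1 OK <nil>"
}
```

Treating only the not-found case as drift keeps transient API errors from triggering duplicate dashboards. Updating the CR status instead of recreating would be an equally valid reaction to the same signal.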

Evidence from logs:

The controller reconciles every 60 seconds, but the logs show no API call to verify that the dashboard still exists:

{"level":"INFO","ts":"...","logger":"controllers.DatadogDashboard","msg":"Reconciling Datadog Dashboard","datadogdashboard":{"name":"my-dashboard","namespace":"services"}}
{"level":"INFO","ts":"...","logger":"controllers.DatadogDashboard","msg":"Reconciling Datadog Dashboard","datadogdashboard":{"name":"my-dashboard","namespace":"services"}}
# repeats every 60s with no error, no API call, no state change

CR status after 20+ minutes:

syncStatus: OK
id: 28v-nsr-sx8       # dashboard no longer exists in DD
lastForceSyncTime: <original creation time, never updated>

Root cause (suspected):

The reconcile loop appears to short-circuit when the CR already has an ID and the spec hash hasn't changed. It compares currentHash against the hash of the current spec but never checks that the resource still exists in Datadog. lastForceSyncTime is never updated after initial creation, which suggests the force-sync path is not triggered for dashboards.

This contrasts with DatadogMonitor, which has a defaultRequeuePeriod and defaultForceSyncPeriod for periodic API validation (ref: #2682, #2816).

Steps to Reproduce

  1. Create a DatadogDashboard CR
  2. Wait for syncStatus: OK and note the dashboard ID
  3. Delete the dashboard directly in the Datadog UI
  4. Observe the CR — syncStatus remains OK with the stale ID
  5. Wait 20+ minutes — no change, no error, no recreation

Environment

Kubernetes version: 1.34 (EKS)
Cloud provider: AWS (eu-central-1)
CR count: 1 DatadogDashboard, 4 DatadogMonitors

Additional Context
