Pre-submission Checklist
Operator version
1.24.0
Operator Helm chart version
2.20.0
Bug Report
What happened:
When a DatadogDashboard is deleted directly in the Datadog UI, the DatadogDashboard CR continues to report syncStatus: OK with the stale dashboard ID indefinitely. The operator never detects the external deletion and never recreates the dashboard.
What was expected:
The operator should detect that the dashboard no longer exists in Datadog (e.g. via a periodic GET against the API) and either recreate it or update the CR status to reflect the drift.
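A minimal sketch of the expected existence check, to make the request concrete. It is not the operator's actual code: `DashboardGetter`, `ErrNotFound`, `CheckDrift`, and the in-memory `fakeAPI` are hypothetical names introduced only for illustration, and a real implementation would call the Datadog dashboards API instead.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound stands in for a "404 Not Found" answer from the dashboards API.
// Hypothetical sentinel, not an identifier from the operator.
var ErrNotFound = errors.New("dashboard not found")

// DashboardGetter is a hypothetical abstraction over "GET dashboard by ID".
type DashboardGetter interface {
	Get(id string) error
}

// DriftAction is what a reconcile pass should do after the existence check.
type DriftAction string

const (
	ActionNone     DriftAction = "none"     // dashboard exists, nothing to do
	ActionRecreate DriftAction = "recreate" // deleted externally, recreate or flag drift
	ActionRetry    DriftAction = "retry"    // transient API error, requeue
)

// CheckDrift asks the API whether the dashboard recorded in the CR status
// still exists and maps the answer to an action.
func CheckDrift(g DashboardGetter, id string) DriftAction {
	err := g.Get(id)
	switch {
	case err == nil:
		return ActionNone
	case errors.Is(err, ErrNotFound):
		return ActionRecreate
	default:
		return ActionRetry
	}
}

// fakeAPI is an in-memory stand-in for the Datadog API, keyed by dashboard ID.
type fakeAPI map[string]bool

func (f fakeAPI) Get(id string) error {
	if f[id] {
		return nil
	}
	return ErrNotFound
}

func main() {
	api := fakeAPI{"abc-123": true}
	fmt.Println(CheckDrift(api, "abc-123")) // none

	// Simulate deleting the dashboard in the Datadog UI.
	delete(api, "abc-123")
	fmt.Println(CheckDrift(api, "abc-123")) // recreate
}
```

Whether "recreate" means creating a new dashboard or only setting an error status on the CR is a design choice; either would resolve the stale syncStatus: OK described above.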
Evidence from logs:
The controller reconciles every 60 seconds but never makes an API call to verify the dashboard exists:
{"level":"INFO","ts":"...","logger":"controllers.DatadogDashboard","msg":"Reconciling Datadog Dashboard","datadogdashboard":{"name":"my-dashboard","namespace":"services"}}
{"level":"INFO","ts":"...","logger":"controllers.DatadogDashboard","msg":"Reconciling Datadog Dashboard","datadogdashboard":{"name":"my-dashboard","namespace":"services"}}
# repeats every 60s with no error, no API call, no state change
CR status after 20+ minutes:
syncStatus: OK
id: 28v-nsr-sx8 # dashboard no longer exists in DD
lastForceSyncTime: <original creation time, never updated>
Root cause (suspected):
The reconcile loop appears to short-circuit when the CR already has an ID and the spec hash hasn't changed. It compares currentHash against the spec but never validates the resource exists in Datadog. The lastForceSyncTime never updates after initial creation, suggesting the force-sync path is not triggered for dashboards.
This contrasts with DatadogMonitor, which has a defaultRequeuePeriod and defaultForceSyncPeriod for periodic API validation (ref: #2682, #2816).
Steps to Reproduce
- Create a DatadogDashboard CR
- Wait for syncStatus: OK and note the dashboard ID
- Delete the dashboard directly in the Datadog UI
- Observe the CR: syncStatus remains OK with the stale ID
- Wait 20+ minutes: no change, no error, no recreation
Environment
Kubernetes version: 1.34 (EKS)
Cloud provider: AWS (eu-central-1)
CR count: 1 DatadogDashboard, 4 DatadogMonitors
Additional Context
forceSyncPeriod for resources #2682 (configurable forceSyncPeriod)
Feature Request: Configurable MaxConcurrentReconciles and requeue period for DatadogMonitor at scale #2816 (reconcile queue saturation at scale)