TL;DR
When a manifest edit changes only fields that are absent from the resource's list-API response (e.g. `CronJob.message`, `CronJob.agentKey`, `Agent.toolsConfig`, `Provider.apiKey`), gcplane logs a `resource no-op may hide drift in unobservable fields` warning and skips the apply, even though the manifest hash changed and the controller correctly entered the reconcile loop.
Net effect for GitOps users: a `git push` of a CronJob message rewrite returns success on every layer (CI, git-sync, gcplane controller) but the new message never reaches GoClaw's DB. The cron continues firing with the previous prompt until something observable also drifts.
Reproduction
Real incident, May 21 2026, prod `everest` cluster, tenant `annhien`:
- Pushed `goclaw-config@2666f4a`: rewrote the `bao-cao-doanh-so-17h` cron's `message` field (~22 KB → ~1.1 KB). No other field touched.
- git-sync picked up the commit within 30s. gcplane controller fetched the new manifest, computed a new hash, entered reconcile.
- Reconciler observed all 3 CronJob resources, found observable surface (`schedule`, `enabled`, `deliver*`) matched DB → marked no-op + logged the warning:
```
level=WARN msg="resource no-op may hide drift in unobservable fields" tenant=annhien kind=CronJob name=bao-cao-doanh-so-17h unobservable_fields="[agentKey message]" hint="this field is not returned by the GoClaw list API; the reconciler cannot detect drift in it. Force re-apply by toggling another observable field (e.g. enabled) or by deleting and re-creating the resource."
```
- `reconcile complete tenant=annhien creates=0 updates=3 noops=33 applied=3 failed=0` — the 3 "applied" were unrelated MCPServer drift; the CronJobs landed in the 33 noops.
- Subsequent reconciles correctly observed `hash unchanged, skipping`. The new prompt was effectively un-deployable through GitOps.
- Manual recovery: forced observable drift by normalising `tz: Asia/Ho_Chi_Minh` → `tz: Asia/Saigon` (two IANA names for the same UTC+7 zone, so the schedule itself is unaffected). On the next reconcile the resources re-applied and the message body flowed through with them.
Why this is severe
- GitOps trust assumption violated. Users (correctly) expect that any manifest change reachable through normal git-sync deploys. A successful push + successful reconcile cycle that silently retains stale state is invisible without log-reading.
- The warning is unhelpful for the realistic case. The hint says "toggle another observable field" — but for CronJobs the only safe observable toggle requires editing a field that has no semantic meaning (we used a tz-alias change). For other resource kinds the only suggestion ("delete and re-create") implies downtime.
- Compounds with the pod-restart no-op bug. A pod restart (e.g. healthcheck failure → rollout) does NOT recover this state — the fresh pod also no-ops because the observable surface still matches DB. We verified this: after `kubectl rollout restart deployment/gcplane` the new pod's first reconcile still showed `CronJob ... no-op may hide drift`.
Source-level analysis
- `internal/controller/controller.go:174-184` — `if hash == c.lastHash` skip is correct (controller-level dedup).
- `internal/reconciler/engine.go:380-403` — emits the warning then returns without scheduling an Update. The decision to no-op is made purely on observable diff, with no consideration that the manifest itself just changed.
Proposed fix (one of)
Option A — smart default (preferred): When a resource's observable surface diff is empty but it has non-empty unobservable fields in the manifest, AND the controller-level hash transitioned this cycle, treat it as drift and emit an Update. The Update already sends the full spec including unobservable fields, so the fix is small and contained to `stepCompare` in `engine.go`.
Sketch:
```go
// in engine.go around the unobservable-field warning
if len(present) > 0 && opts.ManifestChangedThisCycle {
	// Manifest hash transitioned this cycle and we have unobservable fields
	// that could have changed. Treat as drift to avoid silent no-op.
	rc.action = ActionUpdate
	rc.reason = "manifest hash transitioned with unobservable fields present"
	return
}
```
The controller would need to pass a `ManifestChangedThisCycle bool` into `ReconcileOpts` (trivial — it already knows from the `hash != c.lastHash` branch).
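To make the intended behaviour of Option A easier to review, here is a minimal standalone sketch of the decision. It is not the real `stepCompare`: the `decide` function, the `Action` type and the slice parameters are hypothetical stand-ins, and only the `ManifestChangedThisCycle` flag comes from the proposal above.

```go
package main

import "fmt"

// Action is a hypothetical stand-in for the reconciler's per-resource decision.
type Action string

const (
	ActionNoop   Action = "noop"
	ActionUpdate Action = "update"
)

// ReconcileOpts models only the flag proposed in Option A.
type ReconcileOpts struct {
	ManifestChangedThisCycle bool
}

// decide is a simplified, hypothetical version of the comparison step.
// observableDiff lists observable fields that differ from the DB;
// unobservable lists manifest fields the list API does not return.
func decide(observableDiff, unobservable []string, opts ReconcileOpts) (Action, string) {
	if len(observableDiff) > 0 {
		return ActionUpdate, "observable drift"
	}
	// Nothing observable differs, but the manifest changed this cycle and
	// some of its fields cannot be compared: assume drift rather than no-op.
	if len(unobservable) > 0 && opts.ManifestChangedThisCycle {
		return ActionUpdate, "manifest hash transitioned with unobservable fields present"
	}
	return ActionNoop, "no drift detected"
}

func main() {
	unobs := []string{"agentKey", "message"}

	action, reason := decide(nil, unobs, ReconcileOpts{ManifestChangedThisCycle: true})
	fmt.Println(action, "-", reason)

	action, reason = decide(nil, unobs, ReconcileOpts{ManifestChangedThisCycle: false})
	fmt.Println(action, "-", reason)
}
```

If a freshly started pod treats its first cycle as a hash transition (an assumption about the controller, not something verified here), the same branch would also cover the pod-restart case described earlier.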
Option B — explicit opt-in: Add a `gcplane.io/always-reapply: "true"` annotation users can stamp on resources known to have significant unobservable fields. Backwards-compatible but pushes the burden to users.
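For Option B, the check itself could be as small as the sketch below. The `Resource` struct and `forceReapply` helper are hypothetical; only the annotation key is taken from the proposal, and where the reconciler would call this is left open.

```go
package main

import "fmt"

// alwaysReapplyAnnotation is the annotation key proposed in Option B.
const alwaysReapplyAnnotation = "gcplane.io/always-reapply"

// Resource is a hypothetical, minimal view of a manifest resource.
type Resource struct {
	Kind        string
	Name        string
	Annotations map[string]string
}

// forceReapply reports whether the resource opted out of the no-op shortcut.
// Indexing a nil map is safe in Go, so resources without annotations
// simply return false.
func forceReapply(r Resource) bool {
	return r.Annotations[alwaysReapplyAnnotation] == "true"
}

func main() {
	cron := Resource{
		Kind:        "CronJob",
		Name:        "bao-cao-doanh-so-17h",
		Annotations: map[string]string{alwaysReapplyAnnotation: "true"},
	}
	plain := Resource{Kind: "CronJob", Name: "no-annotations"}

	fmt.Println(cron.Name, forceReapply(cron))
	fmt.Println(plain.Name, forceReapply(plain))
}
```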
Option C — escalate the warning severity: Promote the warning to an ERROR + non-zero exit on `gcplane apply` when the user did NOT pass `--force`. Cheap to ship, makes the silent failure loud. Doesn't fix `serve` mode though.
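A rough sketch of the exit-code rule for Option C, assuming the reconciler can hand back the list of resources it no-op'd with unobservable fields present. `NoopWarning` and `exitCode` are hypothetical names; the real `gcplane apply` wiring may look different.

```go
package main

import "fmt"

// NoopWarning is a hypothetical record of one resource that was no-op'd
// while having unobservable fields in its manifest.
type NoopWarning struct {
	Kind   string
	Name   string
	Fields []string
}

// exitCode returns the process exit status for an apply run: non-zero when
// any such no-op occurred and the user did not pass --force.
func exitCode(warnings []NoopWarning, force bool) int {
	if len(warnings) > 0 && !force {
		return 1
	}
	return 0
}

func main() {
	warnings := []NoopWarning{
		{Kind: "CronJob", Name: "bao-cao-doanh-so-17h", Fields: []string{"agentKey", "message"}},
	}
	fmt.Println("without --force:", exitCode(warnings, false))
	fmt.Println("with --force:", exitCode(warnings, true))
	fmt.Println("no warnings:", exitCode(nil, false))
}
```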
Option D — push upstream fix to goclaw. Have the cron-list WS API return `message` and `agentKey` so they become observable. Cleanest long-term but blocked on goclaw repo (issues disabled, separate PR cycle).
I'd favour shipping A + D in parallel: A is a 20-line patch that prevents the entire class of silent-no-op bugs across all resource kinds; D closes the gap permanently for CronJob specifically.
Evidence bundle
Pod, reconcile timestamps, hash transitions, recovery commit — all available in the goclaw-config repo:
Related unobservable-field surfaces noted in the same logs
Worth auditing whether each of these has the same trap:
- `Provider.apiKey` (anthropic, openai, gemini, openrouter, dashscope, zai-coding) — rotating an API key in YAML would silently no-op.
- `Agent.contextFiles`, `Agent.toolsConfig` (van-anh, marketing-agent, sales-analyst, support, assistant) — editing tools config silently no-ops.
- `Channel.agentKey`, `Channel.config`, `Channel.credentials` — rebinding a channel to a different agent silently no-ops.
- `CronJob.agentKey`, `CronJob.message` — confirmed above.
Each of these is a latent footgun for normal GitOps workflows. Fix A above addresses the whole class.
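For the audit, a small helper along these lines could flag which manifest edits would currently no-op silently. The field table is copied from the list above; the `silentChanges` function, the flat `map[string]string` spec shape and the sample values are illustrative assumptions, not gcplane code.

```go
package main

import (
	"fmt"
	"sort"
)

// unobservable maps each resource kind to the fields the logs reported as
// missing from the list API (taken from the list above).
var unobservable = map[string][]string{
	"Provider": {"apiKey"},
	"Agent":    {"contextFiles", "toolsConfig"},
	"Channel":  {"agentKey", "config", "credentials"},
	"CronJob":  {"agentKey", "message"},
}

// silentChanges returns the unobservable fields of the given kind whose
// values differ between two versions of a manifest spec. Any non-empty
// result is an edit the reconciler cannot detect by observing the DB.
func silentChanges(kind string, prev, next map[string]string) []string {
	var changed []string
	for _, field := range unobservable[kind] {
		if prev[field] != next[field] {
			changed = append(changed, field)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	// Illustrative values only.
	prev := map[string]string{"enabled": "true", "message": "old prompt"}
	next := map[string]string{"enabled": "true", "message": "new prompt"}

	fmt.Println(silentChanges("CronJob", prev, next)) // prints [message]
}
```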