Problem
A rotated provider secret in Kubernetes did not reach GoClaw even though gcplane manages provider apiKey with writeOnlyHash.
Incident evidence from SHTP on 2026-05-29:
- Trace 019e726a-b40d-7189-8bb9-5639a9c76963 failed at first LLM call: HTTP 401: zai-coding: token expired or incorrect.
- GoClaw provider verify reproduced the failure for both glm-5.1 and glm-5-turbo.
- The Kubernetes secret GOCLAW_ZAI_CODING_API_KEY was present and non-empty.
- Live GoClaw provider zai-coding had write_only_hash=58d2....
- Desired hash from the current Kubernetes secret was 3c87....
- gcplane plan -f shtp --force with the current env correctly showed Provider/zai-coding writeOnlyHash drift.
- The running gcplane service logs repeatedly showed manifest unchanged, skipping for tenant shtp, so it never reconciled this secret-only drift.
Manual workaround used: update only the zai-coding provider via GoClaw API with the current secret and desired write_only_hash. Provider verify then passed for both models.
Root Cause
The controller skip is based on manifest source hash only. Secret/env values referenced by ${ENV_VAR} are not part of that hash. If a Kubernetes Secret changes while the mounted git/config content does not, gcplane treats the tenant as unchanged and skips reconciliation.
There is a second Kubernetes-specific caveat: the deployment injects secrets through envFrom, so running pods keep old env values until restarted. Even if the controller did not skip, a running pod may still hold stale secret values after K8s Secret rotation.
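The first cause can be illustrated with a minimal sketch. It is not gcplane's code: the function names, the SHA-256 choice, and the manifest layout are all assumptions made for illustration. It only shows why a skip check over the unresolved manifest text cannot see a rotated secret.

```python
import hashlib
import re

# Matches ${ENV_VAR} style references (hypothetical syntax handling).
ENV_REF = re.compile(r"\$\{([A-Z0-9_]+)\}")


def manifest_source_hash(manifest_text: str) -> str:
    """Hash the manifest as stored; ${ENV_VAR} references stay unresolved."""
    return hashlib.sha256(manifest_text.encode("utf-8")).hexdigest()


def should_skip(last_hash: str, manifest_text: str) -> bool:
    """Skip decision based on the manifest source hash only."""
    return last_hash == manifest_source_hash(manifest_text)


def resolve(manifest_text: str, env: dict) -> str:
    """Substitute each ${ENV_VAR} reference with its value from env."""
    return ENV_REF.sub(lambda m: env[m.group(1)], manifest_text)


manifest = "provider: zai-coding\napiKey: ${GOCLAW_ZAI_CODING_API_KEY}\n"
old_env = {"GOCLAW_ZAI_CODING_API_KEY": "old-key"}
new_env = {"GOCLAW_ZAI_CODING_API_KEY": "rotated-key"}

# Hash recorded at the last successful reconcile, before the rotation.
last_hash = manifest_source_hash(manifest)

# After rotation the manifest text is byte-identical, so the check says "skip"
# even though the resolved apiKey has changed.
skipped = should_skip(last_hash, manifest)
drifted = resolve(manifest, old_env) != resolve(manifest, new_env)
```

Here `skipped` and `drifted` are both true at the same time, which is the state the SHTP tenant was stuck in.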
Expected Behavior
Provider secret rotations should converge without requiring unrelated manifest edits or manual provider updates.
Suggested Design
Options, from conservative to stronger:
- Include a hash of resolved write-only fields in the tenant/source hash for skip decisions. For provider apiKey, the resolved env value should affect the hash without logging/exposing the value.
- Re-run reconciliation periodically for resources with write-only fields even when the manifest file hash is unchanged. This can be bounded, e.g. every N intervals or when verifyProviders fails.
- Do not skip verifyProviders on unchanged manifests. If provider verification fails and desired write-only hash differs, force a provider update.
- For K8s deployments, document or automate rollout restart on gcplane-secrets updates. Better: mount secrets as files and resolve file:// on each reconcile, because mounted Secret volumes update without restarting the pod.
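A rough sketch of how the first, third, and fourth options might fit together follows. Everything in it is hypothetical: `resolve_secret_ref`, `skip_hash`, and `needs_forced_update` are illustrative names, the `file://` and `${ENV_VAR}` handling is a guess at the reference syntax, and SHA-256 stands in for whatever digest gcplane actually uses.

```python
import hashlib
import os
from pathlib import Path
from typing import Mapping


def resolve_secret_ref(ref: str, env: Mapping[str, str] = os.environ) -> str:
    """Resolve a secret reference of the form file://<path> or ${ENV_VAR}."""
    if ref.startswith("file://"):
        # Read on every call so an updated Secret volume is seen without a restart.
        return Path(ref[len("file://"):]).read_text(encoding="utf-8").strip()
    if ref.startswith("${") and ref.endswith("}"):
        return env[ref[2:-1]]
    return ref


def skip_hash(manifest_text: str, write_only_refs: Mapping[str, str],
              env: Mapping[str, str] = os.environ) -> str:
    """Digest of the manifest source plus a digest of each resolved write-only field.

    Only digests are mixed in, so the resolved values are never stored or logged.
    """
    h = hashlib.sha256()
    h.update(hashlib.sha256(manifest_text.encode("utf-8")).digest())
    for field in sorted(write_only_refs):  # sorted for a stable result
        value = resolve_secret_ref(write_only_refs[field], env)
        h.update(field.encode("utf-8"))
        h.update(b"\0")
        h.update(hashlib.sha256(value.encode("utf-8")).digest())
    return h.hexdigest()


def needs_forced_update(verify_ok: bool, desired_hash: str, live_hash: str) -> bool:
    """Force a provider update only when verification fails and the hashes differ."""
    return (not verify_ok) and desired_hash != live_hash
```

With this shape, a rotated value changes `skip_hash` even when the manifest text is identical, so the existing skip logic would no longer suppress the reconcile. The `file://` branch only helps if the secret is mounted as a volume; values injected through envFrom would still be stale until the pod restarts.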
Safety Requirements
- Never log resolved secret values.
- Continue to log only hash prefixes or full write-only hashes.
- Avoid broad forced updates of all providers when only one provider key drifted.
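One way to meet these requirements together is sketched below, under stated assumptions: the write-only hash is taken to be a hex SHA-256 of the secret (gcplane's real algorithm may differ), and `drifted_providers` and `drift_log_line` are hypothetical helpers, not existing gcplane functions.

```python
import hashlib
from typing import Dict, List, Optional


def write_only_hash(secret: str) -> str:
    """Assumed digest: hex SHA-256 of the secret value."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


def drifted_providers(desired_secrets: Dict[str, str],
                      live_hashes: Dict[str, str]) -> List[str]:
    """Return only the providers whose desired hash differs from the live hash."""
    return sorted(
        name
        for name, secret in desired_secrets.items()
        if write_only_hash(secret) != live_hashes.get(name)
    )


def drift_log_line(name: str, desired_secret: str, live_hash: Optional[str]) -> str:
    """Build a log line that carries hash prefixes only, never the secret itself."""
    desired_prefix = write_only_hash(desired_secret)[:4]
    live_prefix = (live_hash or "none")[:4]
    return (
        f"Provider/{name} writeOnlyHash drift: "
        f"live={live_prefix}... desired={desired_prefix}..."
    )
```

Only the names returned by `drifted_providers` would be updated, so a single rotated key does not trigger a forced update of every provider.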