Context
writeOnlyHash drift-detection (gcplane reconciler.WriteOnlyHashField + goclaw migrations 000059 cron + 000060 providers, goclaw PR #3 #5 #6 #7 + gcplane PR #10) is shipped for CronJob + Provider as of goclaw v3.12.1 + gcplane v1.5.0.
The remaining 9 manifest kinds with write-only fields are still on the legacy blind-noop path. gcplane v1.5.0's kindsSupportingWriteOnlyHash allowlist gate prevents an infinite update loop on those kinds (good), but drift in their write-only fields is never detected (bad) — same failure mode the zai-coding 401 incident exposed for Provider.apiKey. Rotating any of these write-only fields silently does nothing until someone bumps an observable field.
Affected kinds (from internal/manifest/field_config.go::writeOnlyFields)
| Priority |
Kind |
Write-only fields |
Transport |
Pattern template |
| P1 |
Channel |
agentKey, credentials, config |
HTTP |
Provider (goclaw PR #5) |
| P1 |
MCPCredentials |
credentials |
HTTP |
Provider (goclaw PR #5) |
| P1 |
SecureCLI |
env (encrypted env vars) |
HTTP |
Provider (goclaw PR #5) |
| P2 |
Agent |
contextFiles, systemPrompt, 12+ JSONB configs |
HTTP |
Provider (goclaw PR #5) |
| P2 |
MCPServer |
grants |
HTTP |
Provider (goclaw PR #5) |
| P3 |
AgentTeam |
lead, members, displayName† |
WS-RPC |
CronJob (goclaw PR #3) |
| P3 |
Skill |
sourceDir, version ‡ |
HTTP |
Provider (goclaw PR #5) |
| P3 |
SecureCLIGrant |
agentKey, binaryName (manifest refs) |
HTTP |
Provider (goclaw PR #5) |
| P3 |
AgentLink |
sourceAgent, targetAgent, settings |
WS-RPC |
CronJob (goclaw PR #3) |
† AgentTeam.displayName being write-only is suspicious — investigate whether it's actually observable but mis-classified. ‡ Skill.sourceDir/version are local-only / server-managed; hash mostly catches "manifest changed but goclaw lost it" cases.
P1 = credential-rotation impacted (same failure mode as zai-coding 401). Channel rotations are most frequent in practice (telegram/slack/zalo bot tokens).
Per-kind goclaw checklist (~11 mechanical edits each)
Follow goclaw#5 for HTTP-transport kinds, goclaw#3 for WS-RPC kinds:
Then in this repo (gcplane), one-line per kind:
Per-kind gotchas to verify before changes
- Channel — has 3 write-only fields; ensure the hash captures all 3 in
HashWriteOnlyFields ordering (it sorts by field name, so should be fine, but verify the test confirms agentKey + credentials + config round-trip)
- MCPCredentials — single-field, simplest case
- SecureCLI —
env is a JSONB map of encrypted vars; ensure hash is stable for map ordering (HashWriteOnlyFields uses json.Marshal of a slice-of-pairs which preserves order — but map iteration in spec build must be sorted upstream)
- Agent — 12+ write-only fields; hash captures a lot. Watch for false-drift if reconciler injects defaults that the API doesn't echo
- AgentTeam — investigate
displayName mis-classification first; the actual write-only set may be just lead + members
- MCPServer —
grants is JSONB; same map-ordering concern as SecureCLI.env
- AgentLink/SecureCLIGrant — write-only fields are manifest references (slugs) resolved to UUIDs at create time; the hash captures the manifest's intent rather than DB state. Confirm this is the right semantic before adding.
- Skill —
sourceDir is local-fs path on the gcplane runner, never sent to goclaw. Verify the hash excludes it via ParseIgnoreFields annotation pattern or by removing from spec before hashing.
Suggested rollout order
- MCPCredentials — simplest (single field), lowest blast radius, validates the cron→provider→Nth-kind generalization
- SecureCLI — single field (env), high value (encrypted CLI secrets)
- Channel — biggest win operationally, but 3 fields means most edge cases
- MCPServer then Agent — broader scope
- AgentTeam + AgentLink + SecureCLIGrant — investigate semantics first
- Skill — lowest priority
Verification (per-kind, post-deploy)
# After deploying goclaw with the new migration + gcplane with the allowlist add:
kubectl logs deploy/gcplane | grep \"write-only hash mismatch\" | grep kind=KIND_NAME
# Expect: one mismatch+update per existing row, then steady-state silence.
kubectl logs deploy/gcplane | grep \"no-op may hide drift\" | grep kind=KIND_NAME
# Expect: empty (the WARN that exists today should disappear).
Open questions
- Should we generate the boilerplate (codegen for the 5-layer touch)? After 11 kinds × 11 edits, the manual pattern is tedious but each kind has subtle differences (different SQL helper, different scan struct shape). Codegen overhead probably not worth it for ≤11 kinds.
- Should
kindsSupportingWriteOnlyHash be a per-kind annotation on the Resource definition instead of a global map? Reduces drift between gcplane and goclaw releases (currently a manual sync).
- Worth adding a CI check that fails if a goclaw migration adds
write_only_hash to a new table but gcplane's allowlist isn't updated in the same release?
References
- Original incident report:
infra/plans/reports/debugger-260521-1113-zai-coding-401-provider-resync.md
- Provider rollout plan:
gcplane/plans/260521-1127-provider-writeonly-hash-resync/
- Memory note:
infra/memory/gcplane-provider-resync.md
- Shipped artifacts: goclaw #5 #6 #7, gcplane #10, tags goclaw
v3.12.1 + gcplane v1.5.0
Context
writeOnlyHashdrift-detection (gcplane reconciler.WriteOnlyHashField + goclaw migrations 000059 cron + 000060 providers, goclaw PR #3 #5 #6 #7 + gcplane PR #10) is shipped for CronJob + Provider as of goclaw v3.12.1 + gcplane v1.5.0.The remaining 9 manifest kinds with write-only fields are still on the legacy blind-noop path. gcplane v1.5.0's
kindsSupportingWriteOnlyHashallowlist gate prevents an infinite update loop on those kinds (good), but drift in their write-only fields is never detected (bad) — same failure mode the zai-coding 401 incident exposed for Provider.apiKey. Rotating any of these write-only fields silently does nothing until someone bumps an observable field.Affected kinds (from
internal/manifest/field_config.go::writeOnlyFields)ChannelagentKey,credentials,configMCPCredentialscredentialsSecureCLIenv(encrypted env vars)AgentcontextFiles,systemPrompt, 12+ JSONB configsMCPServergrantsAgentTeamlead,members,displayName†SkillsourceDir,version‡SecureCLIGrantagentKey,binaryName(manifest refs)AgentLinksourceAgent,targetAgent,settings†
AgentTeam.displayNamebeing write-only is suspicious — investigate whether it's actually observable but mis-classified. ‡Skill.sourceDir/versionare local-only / server-managed; hash mostly catches "manifest changed but goclaw lost it" cases.P1 = credential-rotation impacted (same failure mode as zai-coding 401). Channel rotations are most frequent in practice (telegram/slack/zalo bot tokens).
Per-kind goclaw checklist (~11 mechanical edits each)
Follow goclaw#5 for HTTP-transport kinds, goclaw#3 for WS-RPC kinds:
migrations/0000XX_KIND_write_only_hash.up.sql—ALTER TABLE <table> ADD COLUMN write_only_hash TEXT NOT NULL DEFAULT '';migrations/0000XX_KIND_write_only_hash.down.sql— DROP COLUMNinternal/upgrade/version.go— bumpRequiredSchemaVersionby 1 (the version_test.go guard fails CI if you forget)internal/store/KIND_store.go— addWriteOnlyHash stringwithjson:\"write_only_hash,omitempty\" db:\"write_only_hash\"(HTTP) orjson:\"writeOnlyHash\"(WS-RPC)internal/http/validate.go— add\"write_only_hash\": trueto the kind's allowlist (HTTP only)internal/gateway/methods/KIND.go— AddWriteOnlyHashto inline params struct + apply via patch (WS-RPC only — see cron.go:74-114)internal/store/pg/KIND.go— INSERT column + placeholder + UPSERTEXCLUDED.write_only_hash; add to all SELECT listsinternal/store/sqlitestore/KIND.go— same as pg; bumpSchemaVersionconst + add ALTER to incremental migrations map inschema.gointernal/store/sqlitestore/sqlx_scan_structs.go— add field to scan struct + mapper (if HTTP kind uses sqlx scan struct)internal/store/sqlitestore/schema.sql— add column to fresh-DB CREATE TABLEinternal/store/sqlitestore/schema_migration_test.go— DROP COLUMN fixup fortargetVersion < NEWso v11→current migration path still worksinternal/store/sqlitestore/KIND_write_only_hash_test.go— round-trip test (mirrorproviders_write_only_hash_test.goorcron_crud_test.go::TestSQLiteCronStore_WriteOnlyHashRoundtrip)Then in this repo (gcplane), one-line per kind:
internal/manifest/field_config.go::kindsSupportingWriteOnlyHash— add the kindPer-kind gotchas to verify before changes
HashWriteOnlyFieldsordering (it sorts by field name, so should be fine, but verify the test confirmsagentKey + credentials + configround-trip)envis a JSONB map of encrypted vars; ensure hash is stable for map ordering (HashWriteOnlyFields usesjson.Marshalof a slice-of-pairs which preserves order — but map iteration in spec build must be sorted upstream)displayNamemis-classification first; the actual write-only set may be justlead+membersgrantsis JSONB; same map-ordering concern as SecureCLI.envsourceDiris local-fs path on the gcplane runner, never sent to goclaw. Verify the hash excludes it viaParseIgnoreFieldsannotation pattern or by removing from spec before hashing.Suggested rollout order
Verification (per-kind, post-deploy)
Open questions
kindsSupportingWriteOnlyHashbe a per-kind annotation on the Resource definition instead of a global map? Reduces drift between gcplane and goclaw releases (currently a manual sync).write_only_hashto a new table but gcplane's allowlist isn't updated in the same release?References
infra/plans/reports/debugger-260521-1113-zai-coding-401-provider-resync.mdgcplane/plans/260521-1127-provider-writeonly-hash-resync/infra/memory/gcplane-provider-resync.mdv3.12.1+ gcplanev1.5.0