Surfaced by
Core-dump TC2 staging cutover (jon@langevin.me, 2026-05-05). After workflow#541 fix landed in v0.21.2 + the corresponding deploy.yml stopgap-removal (drop `STAGING_PG_PASSWORD` from Plan env block), `wfctl infra apply --plan plan.json` fails with:
```
error: plan stale: config hash mismatch (run wfctl infra plan again)
```
Root cause
`infraPreserveKeys` in `cmd/wfctl/infra.go` only preserves `env_vars`, `env_vars_secret`, `secret_env_vars` submap keys through plan-time serialization. Other config fields that legitimately contain `${VAR}` references — e.g. Droplet `user_data` (cloud-init script that needs the random_hex secret to provision Postgres) — are still substituted by `os.ExpandEnv` at plan-time.
When Plan runs WITHOUT the secret in env (post-#541, no need for the stopgap) and Apply runs WITH the secret in env (W-5 JIT for the value during apply-time substitution):
- Plan-time: `user_data` contains `POSTGRES_PASSWORD: '${STAGING_PG_PASSWORD}'` → `os.ExpandEnv` substitutes to `''` (empty).
- Apply-time: `user_data` contains the same template → `os.ExpandEnv` substitutes to the actual hex value.
`desiredStateHash(specs)` JSON-serializes the resolved specs and SHA-256s them. The two substitutions diverge → DesiredHash mismatch → `plan stale`.
Reproduction
Core-dump's infra.yaml has a Droplet whose `config.user_data` cloud-init script references `${STAGING_PG_PASSWORD}`:
```yaml
- name: coredump-staging-pg
type: infra.droplet
config:
user_data: |
#cloud-config
write_files:
- path: /opt/coredump-pg/docker-compose.yml
content: |
services:
postgres:
environment:
POSTGRES_PASSWORD: '${STAGING_PG_PASSWORD}'
```
Run with `STAGING_PG_PASSWORD` unset at `wfctl infra plan` (R-A4 in v0.21.2 sees the top-level `secrets.generate` declaration and skips the env-var-resolution check) and set at `wfctl infra apply`. The hash check fails.
Real example
core-dump deploy.yml run 25380846940 on commit 723a55a8 — Plan step succeeded with 2 updates planned (firewall + container_service), Apply step failed with the message above. v0.21.2 + DO plugin v0.10.1 in use.
Fix options
Option A (narrow): Add `user_data` to `infraPreserveKeys`. Pros: surgical. Cons: doesn't generalize — every new field that legitimately holds `${VAR}` needs to be added.
Option B (broad, correct): Use `TolerantEnvProvider`-style preservation sentinel for ALL string-valued config fields when the var is in `cfg.Secrets.Generate`. Plan emits the literal; Apply substitutes; hash uses the sentinel. Both Plan and Apply produce hash-identical specs. This is what `infraPreserveKeys` should have been: every config string referencing a declared secret is preserved through plan, substituted at apply-time-driver-dispatch.
Option C (workaround): Re-add `STAGING_PG_PASSWORD` (and any other top-level-secret env vars) to the Plan + Validate steps in deploy.yml. This is the pre-#541 stopgap; works but undoes the W-541 cleanup gain.
Recommended
Option B with a follow-up to deprecate `infraPreserveKeys` once `cfg.Secrets.Generate`-aware preservation is the default. Until then, downstream consumers (core-dump, BMW) should re-add the env stopgap on Plan + Validate as documented in the deploy.yml comment block.
References
Surfaced by
Core-dump TC2 staging cutover (jon@langevin.me, 2026-05-05). After workflow#541 fix landed in v0.21.2 + the corresponding deploy.yml stopgap-removal (drop `STAGING_PG_PASSWORD` from Plan env block), `wfctl infra apply --plan plan.json` fails with:
```
error: plan stale: config hash mismatch (run wfctl infra plan again)
```
Root cause
`infraPreserveKeys` in `cmd/wfctl/infra.go` only preserves `env_vars`, `env_vars_secret`, `secret_env_vars` submap keys through plan-time serialization. Other config fields that legitimately contain `${VAR}` references — e.g. Droplet `user_data` (cloud-init script that needs the random_hex secret to provision Postgres) — are still substituted by `os.ExpandEnv` at plan-time.
When Plan runs WITHOUT the secret in env (post-#541, no need for the stopgap) and Apply runs WITH the secret in env (W-5 JIT for the value during apply-time substitution):
`desiredStateHash(specs)` JSON-serializes the resolved specs and SHA-256s them. The two substitutions diverge → DesiredHash mismatch → `plan stale`.
Reproduction
Core-dump's infra.yaml has a Droplet whose `config.user_data` cloud-init script references `${STAGING_PG_PASSWORD}`:
```yaml
type: infra.droplet
config:
user_data: |
#cloud-config
write_files:
- path: /opt/coredump-pg/docker-compose.yml
content: |
services:
postgres:
environment:
POSTGRES_PASSWORD: '${STAGING_PG_PASSWORD}'
```
Run with `STAGING_PG_PASSWORD` unset at `wfctl infra plan` (R-A4 in v0.21.2 sees the top-level `secrets.generate` declaration and skips the env-var-resolution check) and set at `wfctl infra apply`. The hash check fails.
Real example
core-dump deploy.yml run 25380846940 on commit 723a55a8 — Plan step succeeded with 2 updates planned (firewall + container_service), Apply step failed with the message above. v0.21.2 + DO plugin v0.10.1 in use.
Fix options
Option A (narrow): Add `user_data` to `infraPreserveKeys`. Pros: surgical. Cons: doesn't generalize — every new field that legitimately holds `${VAR}` needs to be added.
Option B (broad, correct): Use `TolerantEnvProvider`-style preservation sentinel for ALL string-valued config fields when the var is in `cfg.Secrets.Generate`. Plan emits the literal; Apply substitutes; hash uses the sentinel. Both Plan and Apply produce hash-identical specs. This is what `infraPreserveKeys` should have been: every config string referencing a declared secret is preserved through plan, substituted at apply-time-driver-dispatch.
Option C (workaround): Re-add `STAGING_PG_PASSWORD` (and any other top-level-secret env vars) to the Plan + Validate steps in deploy.yml. This is the pre-#541 stopgap; works but undoes the W-541 cleanup gain.
Recommended
Option B with a follow-up to deprecate `infraPreserveKeys` once `cfg.Secrets.Generate`-aware preservation is the default. Until then, downstream consumers (core-dump, BMW) should re-add the env stopgap on Plan + Validate as documented in the deploy.yml comment block.
References