DesiredHash mismatch when secret refs appear outside env_vars (e.g. Droplet user_data) #558

@intel352

Description

Surfaced by

Core-dump TC2 staging cutover (jon@langevin.me, 2026-05-05). After the workflow#541 fix landed in v0.21.2 and the corresponding deploy.yml stopgap was removed (dropping `STAGING_PG_PASSWORD` from the Plan env block), `wfctl infra apply --plan plan.json` fails with:

```
error: plan stale: config hash mismatch (run wfctl infra plan again)
```

Root cause

`infraPreserveKeys` in `cmd/wfctl/infra.go` only preserves the `env_vars`, `env_vars_secret`, and `secret_env_vars` submap keys through plan-time serialization. Other config fields that legitimately contain `${VAR}` references, e.g. Droplet `user_data` (a cloud-init script that needs the random_hex secret to provision Postgres), are still substituted by `os.ExpandEnv` at plan time.
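For illustration, a minimal sketch of that gap, assuming plan-time resolution walks the config map and skips only the preserved submaps (the walk and names below are hypothetical, not the actual wfctl code):

```go
package infra

import "os"

// infraPreserveKeys mirrors the submaps the issue says are preserved; the
// resolveConfig walk below is a hypothetical sketch, not the real wfctl code.
var infraPreserveKeys = map[string]bool{
	"env_vars":        true,
	"env_vars_secret": true,
	"secret_env_vars": true,
}

func resolveConfig(cfg map[string]any) {
	for key, val := range cfg {
		if infraPreserveKeys[key] {
			continue // ${VAR} references in these submaps survive to apply time
		}
		if s, ok := val.(string); ok {
			cfg[key] = os.ExpandEnv(s) // user_data is substituted here at plan time
		}
	}
}
```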

When Plan runs WITHOUT the secret in env (post-#541, no need for the stopgap) and Apply runs WITH the secret in env (W-5 JIT for the value during apply-time substitution):

  • Plan-time: `user_data` contains `POSTGRES_PASSWORD: '${STAGING_PG_PASSWORD}'` → `os.ExpandEnv` substitutes to `''` (empty).
  • Apply-time: `user_data` contains the same template → `os.ExpandEnv` substitutes to the actual hex value.

`desiredStateHash(specs)` JSON-serializes the resolved specs and SHA-256s them. The two substitutions diverge → DesiredHash mismatch → `plan stale`.

Reproduction

Core-dump's infra.yaml has a Droplet whose `config.user_data` cloud-init script references `${STAGING_PG_PASSWORD}`:

```yaml
- name: coredump-staging-pg
  type: infra.droplet
  config:
    user_data: |
      #cloud-config
      write_files:
        - path: /opt/coredump-pg/docker-compose.yml
          content: |
            services:
              postgres:
                environment:
                  POSTGRES_PASSWORD: '${STAGING_PG_PASSWORD}'
```

Run `wfctl infra plan` with `STAGING_PG_PASSWORD` unset (R-A4 in v0.21.2 sees the top-level `secrets.generate` declaration and skips the env-var-resolution check), then run `wfctl infra apply` with it set. The hash check fails.

Real example

core-dump deploy.yml run 25380846940 on commit 723a55a8: the Plan step succeeded with 2 updates planned (firewall + container_service), and the Apply step failed with the message above. wfctl v0.21.2 and DO plugin v0.10.1 were in use.

Fix options

Option A (narrow): Add `user_data` to `infraPreserveKeys`. Pros: surgical. Cons: doesn't generalize — every new field that legitimately holds `${VAR}` needs to be added.

Option B (broad, correct): Use a `TolerantEnvProvider`-style preservation sentinel for ALL string-valued config fields whenever the referenced var is in `cfg.Secrets.Generate`. Plan emits the literal, Apply substitutes, and the hash uses the sentinel, so Plan and Apply produce hash-identical specs. This is what `infraPreserveKeys` should have been: every config string referencing a declared secret is preserved through plan and substituted at apply-time driver dispatch (a rough sketch follows the options below).

Option C (workaround): Re-add `STAGING_PG_PASSWORD` (and any other top-level-secret env vars) to the Plan + Validate steps in deploy.yml. This is the pre-#541 stopgap; works but undoes the W-541 cleanup gain.
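For Option B, a rough sketch of the preservation pass, assuming a helper applied to every string-valued config field at plan time; `preserveDeclaredSecrets` and its arguments are hypothetical names, not the existing wfctl API:

```go
package infra

import "os"

// preserveDeclaredSecrets expands env references in a config string, except
// for variables declared under secrets.generate, which are kept as literal
// ${VAR} sentinels. Plan serializes (and hashes) the sentinel form; Apply runs
// the real substitution at driver dispatch. Hypothetical helper, not current code.
func preserveDeclaredSecrets(s string, generated map[string]bool) string {
	return os.Expand(s, func(name string) string {
		if generated[name] {
			return "${" + name + "}" // preserve through plan; substitute at apply
		}
		return os.Getenv(name)
	})
}
```

With something like this in place, Plan and Apply both hash the sentinel form, so the secret only needs to be present in the environment for the apply-time substitution step.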

Recommended

Option B with a follow-up to deprecate `infraPreserveKeys` once `cfg.Secrets.Generate`-aware preservation is the default. Until then, downstream consumers (core-dump, BMW) should re-add the env stopgap on Plan + Validate as documented in the deploy.yml comment block.
