fix(examples/03-immich): wire db password secret into immich-server#35

Merged
outergod merged 1 commit into master from fix/immich-server-db-password on May 7, 2026

Conversation

@outergod (Owner) commented May 7, 2026

Summary

  • examples/03-immich/services/immich-server/quadlet/immich-server.container declared the DB hostname, username, and database, but neither mounted the immich-db-password Podman secret nor pointed the server at the password file via DB_PASSWORD_FILE. End-to-end on a clean host this produced PostgresError: password authentication failed for user "immich" (code 28P01), with immich-server.service cycling on Restart=always indefinitely, even after core-ops apply reported Outcome: converged.
  • The canonical Immich walkthrough (spec/018 FR-005) thus shipped a misleading all-green plan/apply/re-plan triple while the application was fundamentally non-functional, which directly undercuts spec/018's adoption-likelihood goal.
  • Fix: add Secret=immich-db-password,target=/run/secrets/immich-db-password and Environment=DB_PASSWORD_FILE=/run/secrets/immich-db-password to the unit. Immich supports the _FILE env-var convention natively, so the server reads the same Podman secret the database is initialised with; a single operator-provided secret feeds both ends.
  • Patch bump (unclassified_path_releasable_default for examples/); provenance-state fixture pinned in lock-step.

Discovered while exercising the canonical walkthrough for spec/018 session 3; the canonical example never ran end-to-end on a clean host since spec/017 shipped it. Same shape as PR #34 (postgres image-tag pin).

Test plan

  • cargo test — 472 passed
  • cargo clippy --all-targets -- -D warnings — clean
  • cargo run --bin core-ops-release -- validate --base-ref master — passed (patch)
  • End-to-end on core-ops-uat (Fedora CoreOS): sudo core-ops apply --source-repo examples/03-immich --host example → Outcome: converged; systemctl is-active immich-server → active; NRestarts=1; journal shows Immich Microservices is running [v2.7.5] [production].
  • Idempotent re-plan: core-ops plan ... → 10 unchanged.
  • CI green
  • Post-merge: spec/018 picks this up via rebase + re-records docs/onboarding.cast against the now-healthy stack.

🤖 Generated with Claude Code

`examples/03-immich/services/immich-server/quadlet/immich-server.container`
declared `DB_USERNAME=immich` and `DB_HOSTNAME=immich-database` but did
not mount the `immich-db-password` Podman secret nor point the server
at the password file. The server fell back to whatever default
Immich's image assumes, which does not match the random password the
sibling `immich-database.container` initialises Postgres with via:

  Secret=immich-db-password,target=/run/secrets/immich-db-password
  Environment=POSTGRES_PASSWORD_FILE=/run/secrets/immich-db-password

End-to-end on a clean host this manifested as:

  PostgresError: password authentication failed for user "immich"
    code: '28P01'   file: 'auth.c'   routine: 'auth_failed'

with `immich-server.service` cycling on `Restart=always` indefinitely
even after `core-ops apply` reported `Outcome: converged`. The
canonical Immich walkthrough (FR-005 of spec/018) thus presented an
all-green plan/apply/re-plan triple while the application was
fundamentally non-functional — directly undercutting spec/018's
"increase adoption likelihood" goal.

Add to `immich-server.container`:

  Environment=DB_PASSWORD_FILE=/run/secrets/immich-db-password
  Secret=immich-db-password,target=/run/secrets/immich-db-password

Immich supports the `_FILE` env-var convention for secrets natively
(per Immich documentation), so the server can read the same Podman
secret the database uses: a single operator-provided secret feeds
both ends.
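
The wiring described above can be checked mechanically. The following is a minimal sketch, not part of core-ops: the function name `secret_is_wired` is hypothetical, and it assumes each key sits on its own line with no trailing whitespace, exactly as written in this commit message.

```shell
#!/bin/sh
# Hypothetical lint; not part of core-ops. Succeeds when a quadlet
# .container file both mounts the named Podman secret and sets the given
# *_FILE variable to the same mount target.
#   $1 = path to .container file   $2 = secret name
#   $3 = env var name              $4 = mount target
secret_is_wired() {
    file=$1 secret=$2 var=$3 target=$4
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    grep -qxF "Secret=${secret},target=${target}" "$file" &&
        grep -qxF "Environment=${var}=${target}" "$file"
}

# Usage against the unit fixed in this PR:
#   secret_is_wired \
#       examples/03-immich/services/immich-server/quadlet/immich-server.container \
#       immich-db-password DB_PASSWORD_FILE /run/secrets/immich-db-password
```

Running the same check against `immich-database.container` with `POSTGRES_PASSWORD_FILE` would confirm that both ends point at one secret.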

Discovered while exercising the canonical walkthrough for spec/018
session 3; the canonical example never ran end-to-end on a clean
host since spec/017 shipped it. Same shape as PR #34 (postgres
image-tag pin).

Patch bump (`unclassified_path_releasable_default` fires for
`examples/`); provenance-state fixture pinned in lock-step.

Verification on `core-ops-uat` (Fedora CoreOS guest):

$ podman secret create immich-db-password <(openssl rand -hex 16)
$ sudo core-ops apply --source-repo examples/03-immich --host example
  Outcome: converged
$ systemctl is-active immich-server                              active
$ systemctl show immich-server -p NRestarts                      NRestarts=1
$ journalctl -u immich-server | grep "Microservices is running"
  Immich Microservices is running [v2.7.5] [production]
$ core-ops plan --source-repo examples/03-immich --host example
  Summary: 10 unchanged

$ cargo test                                                     472 passed
$ cargo clippy --all-targets -- -D warnings                      clean
$ cargo run --bin core-ops-release -- validate --base-ref master passed (patch)
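
The `systemctl` checks above are what exposed the gap between `Outcome: converged` and a crash-looping unit. They could be folded into a post-apply probe along these lines. This is a sketch under assumptions: the function names `restart_count` and `unit_looks_healthy` and the default threshold of 3 are hypothetical and not part of core-ops.

```shell
#!/bin/sh
# Hypothetical post-apply probe; names and threshold are illustrative only.

# Extract the integer from a "NRestarts=<n>" line, as printed by
# `systemctl show <unit> -p NRestarts`. Prints nothing if the line
# does not have that shape.
restart_count() {
    printf '%s\n' "$1" | sed -n 's/^NRestarts=\([0-9][0-9]*\)$/\1/p'
}

# Succeed only if the unit reports "active" and has restarted fewer than
# max_restarts times.
#   $1 = output of `systemctl is-active <unit>`
#   $2 = output of `systemctl show <unit> -p NRestarts`
#   $3 = restart threshold (defaults to 3)
unit_looks_healthy() {
    state=$1
    count=$(restart_count "$2")
    max_restarts=${3:-3}
    [ "$state" = "active" ] && [ -n "$count" ] && [ "$count" -lt "$max_restarts" ]
}

# Usage on a live host (assumes systemd and the unit name from this PR):
#   unit_looks_healthy "$(systemctl is-active immich-server)" \
#       "$(systemctl show immich-server -p NRestarts)" 3 \
#       || echo "immich-server is not healthy despite a converged apply"
```

Checking the restart counter as well as the active state matters because a unit on `Restart=always` may report `active` for a moment between crashes.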

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@outergod outergod merged commit 4aba6ae into master May 7, 2026
5 checks passed
@outergod outergod deleted the fix/immich-server-db-password branch May 7, 2026 07:01
outergod added a commit that referenced this pull request May 7, 2026


Rebases onto post-promote master at v2.2.3 (which now contains both
spec/018-blocking fixes: PR #35 wired the immich-db-password Podman
secret into immich-server.container, and PR #36 introduced
ApplyRunDisplayState::Stateless so stateless re-plans no longer
flag a healthy host as "(recovery from failed initial apply)").

Re-recording on `core-ops-uat` (Fedora CoreOS guest) against the
fixed binary + fixed example produces the truthful narrative the
spec/018 walkthrough is supposed to demonstrate:

  Beat 1 (plan):   Plan for host example @ (stateless) (first run)
  Beat 2 (apply):  Apply for host example @ (stateless) (first run)
                   Outcome: converged
  Beat 3 (replan): Plan for host example @ (stateless)
                   10 unchanged

immich-server reaches `active running` (NRestarts=1) and journal
shows "Immich Microservices is running [v2.7.5] [production]" —
no auth restart loop. The misleading
"(recovery from failed initial apply)" suffix is gone.

Cast post-processed to strip OSC 3008 sequences (pam_systemd
hostname/machineid leak under sudo) and replace nix-store-path
SHELL env value, per decision_018-recording-ssh-delegation. Cast
duration 4.80 s, ≤ 90 s SC-005a budget.
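
As an illustration of the OSC 3008 strip, here is a minimal sketch that works on a raw terminal byte stream. It is not the project's actual post-processing step, which this message does not show. It assumes each sequence starts with ESC ] 3008 ; and ends with either BEL or ESC \ (ST); the function name `strip_osc_3008` is hypothetical. A `.cast` file stores these bytes JSON-escaped, so a filter applied to the file itself would need to match the escaped form instead.

```shell
#!/bin/sh
# Sketch only: strips OSC 3008 sequences from a raw terminal byte stream.
esc=$(printf '\033')   # ESC, starts the sequence
bel=$(printf '\007')   # BEL, one of the two possible terminators

strip_osc_3008() {
    # First expression handles BEL-terminated sequences, the second
    # handles sequences terminated by ESC followed by a backslash (ST).
    sed -e "s/${esc}]3008;[^${esc}${bel}]*${bel}//g" \
        -e "s/${esc}]3008;[^${esc}${bel}]*${esc}\\\\//g"
}

# Usage:
#   some-command-under-sudo | strip_osc_3008
```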

T013 README walkthrough block updated to keep verbatim fidelity to
the new T012 capture: the second fenced block's header and underline
now match the corrected output (no "(recovery)" suffix; underline
length adjusted from 72 chars to 35 chars to match the shorter title).
Block count = 2 (FR-006), combined non-blank lines = 25 (SC-007b),
unique unit identifiers = 5 (SC-007).
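
The block-count and line-count figures can be reproduced with a small counter. The following is a sketch, not the project's checker: the function name `count_fenced` is hypothetical, it assumes fences are lines beginning with three backticks and are never nested, and it counts over whatever Markdown it is given, so the walkthrough section would have to be extracted first.

```shell
#!/bin/sh
# Sketch only: prints "<fenced blocks> <non-blank lines inside them>" for
# Markdown read from stdin.
fence=$(printf '\140\140\140')   # three backticks, spelled in octal

count_fenced() {
    awk -v fence="$fence" '
        # A line starting with the fence toggles the in-block state;
        # only opening fences increment the block counter.
        index($0, fence) == 1 { inside = !inside; if (inside) blocks++; next }
        # NF is zero for blank or whitespace-only lines.
        inside && NF          { lines++ }
        END                   { print blocks + 0, lines + 0 }
    '
}

# Usage:
#   count_fenced < walkthrough-section.md
```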

The rebase dropped the prior `chore(release): bump 2.2.1 -> 2.2.2`
commit (subsumed by master's PR #34/#35/#36 promotes); spec/018 now
bumps `2.2.3 -> 2.2.4` per `packaged_readme_surface` carve-out.

Verification:

$ cargo test                                                                  473 passed
$ cargo clippy --all-targets -- -D warnings                                   clean
$ cargo run --bin core-ops-release -- validate --base-ref master              passed (patch)
$ wc -l README.md                                                             297       (≤ 400)            PASS  SC-001
$ <walkthrough section>: 2 fenced blocks, 25 non-blank lines, 5 unit IDs       PASS  FR-006/SC-007/SC-007b
$ head -n 1 docs/onboarding.cast | jq '.version'                              2                            PASS  SC-005
$ head -n 1 docs/onboarding.cast | jq '.duration'                             4.80      (≤ 90)             PASS  SC-005a
$ grep -iE '(not\.one|ulthar|192\.168\.|10\.0\.|172\.16\.)' docs/onboarding.cast docs/onboarding-script.sh
                                                                              (no matches)                 PASS  SC-006a
$ grep -c '3008' docs/onboarding.cast                                         0                            PASS  FR-009a
$ asciinema play docs/onboarding.cast                                         plays end-to-end             PASS

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
outergod added a commit that referenced this pull request May 7, 2026
… playback

Operator reported staircase distortion in the rendered cast/GIF: each
successive line started indented by the length of the previous line,
the classic "LF without CR" pattern.

Root cause: the inner recording script piped SSH output through
`| sed 's/\r$//'`, which stripped every `\r` that the remote PTY's
ONLCR translation had added. The captured cast events therefore
contained bare `\n` (line feed only). asciinema 2.4.0 plays casts
through a raw-mode local TTY (so it can faithfully replay any
escape sequences without double-translation), which means LF stays
LF on output — the terminal treats each LF as "down one row" with
no column reset, hence the staircase.

The original sed filter was probably defensive — line-buffered tools
sometimes choke on trailing `\r` — but for asciinema-capture the CRs
are essential. Drop the filter and re-record.
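
The effect of the filter can be reproduced without SSH or asciinema. This is a minimal sketch: the helper names are hypothetical, and `sample` merely imitates two lines as a PTY with ONLCR would emit them.

```shell
#!/bin/sh
# Sketch only: shows how a trailing-CR filter turns CRLF output into bare LF.
cr=$(printf '\r')

# Count carriage returns / line feeds on stdin.
count_cr() { tr -cd '\r' | wc -c | tr -d ' '; }
count_lf() { tr -cd '\n' | wc -c | tr -d ' '; }

# Portable spelling of the filter; GNU sed accepts the shorter 's/\r$//'.
strip_trailing_cr() { sed "s/${cr}\$//"; }

# Two lines terminated by CR LF.
sample() { printf 'plan\r\napply\r\n'; }

echo "before: CR=$(sample | count_cr) LF=$(sample | count_lf)"
echo "after:  CR=$(sample | strip_trailing_cr | count_cr) LF=$(sample | strip_trailing_cr | count_lf)"
```

The first line reports CR=2 LF=2 and the second CR=0 LF=2: the line feeds survive but every carriage return is gone, which is the mismatch that produces the staircase on raw-mode playback.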

After re-recording the plan-output event has 124 CRs and 124 LFs
(matched), confirming `\r\n` line terminators throughout. `asciinema
cat docs/onboarding.cast` now renders left-aligned. The GIF rendered
from this cast (`docs/assets/core-ops-demo.gif`, 107 KB GIF89a)
displays the correct terminal layout inline on GitHub.

Cast re-recorded against the same fixed stack (PRs #34/#35/#36
merged):

  Beat 1 (plan):   Plan for host example @ (stateless) (first run)
  Beat 2 (apply):  Apply for host example @ (stateless) (first run)
                   Outcome: converged
  Beat 3 (replan): Plan for host example @ (stateless)
                   10 unchanged

Same OSC 3008 strip + SHELL sanitize post-processing applied (per
decision_018-recording-ssh-delegation). Duration 4.88 s, well under
SC-005a's 90 s budget.

Verification:

$ head -n 1 docs/onboarding.cast | jq '.version'                  2
$ head -n 1 docs/onboarding.cast | jq '.duration'                 4.88012  (≤ 90)
$ grep -c '3008' docs/onboarding.cast                             0
$ grep -iE '(not\.one|ulthar|192\.168\.|10\.0\.|172\.16\.)' docs/onboarding.cast docs/onboarding-script.sh
                                                                  (no matches)
$ asciinema cat docs/onboarding.cast | head -3                    left-aligned, no staircase
$ wc -c < docs/assets/core-ops-demo.gif                           106777  (≤ 1 MB)

The non-tracked recording driver `/tmp/onboarding-inner.sh` carried
the buggy sed pipe in this session; the canonical
`docs/onboarding-script.sh` does NOT have a sed filter (its inner
`demo()` runs commands directly via `eval`, not through SSH
delegation). The bug therefore lives only in the SSH-delegation
recording procedure documented in
decision_018-recording-ssh-delegation, not in the canonical
regeneration script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>