hotfix: pin docker-compose to GHCR :1.4.0 to restore production#150

Merged: MBombeck merged 1 commit into main from hotfix/v141-rollback-to-ghcr-v140-image on May 8, 2026

MBombeck (Owner) commented on May 8, 2026

Production is returning 503s. Pin the known-good v1.4.0 GHCR image and skip the Coolify-built v1.4.1 image, which never accepts HTTP. The site should come back up as soon as Coolify deploys this.

Production at healthlog.bombeck.io is returning 503 from Traefik
("no available server") since the v1.4.1 deploys started landing on
apps-01. The container boots — Next.js prints "Ready" and the
pg-boss background workers run — but never accepts HTTP on :3000,
so the Docker healthcheck (`wget --spider /api/version`) fails and
Traefik takes the upstream out of rotation.
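
For readers unfamiliar with the failure path, here is a minimal sketch of what a healthcheck of this kind looks like in a compose file. Only the `wget --spider /api/version` probe and port 3000 come from the description above; the service name `app`, the `-q` flag, the `localhost` host, and every timing value are assumptions for illustration, not the repository's actual settings.

```yaml
services:
  app:                      # assumed service name
    healthcheck:
      # Probe the version endpoint without downloading the body.
      # A container that never accepts HTTP on :3000 fails this check.
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000/api/version"]
      interval: 30s         # illustrative values, not taken from the repo
      timeout: 5s
      retries: 3
```

Once the check fails `retries` times in a row, Docker marks the container unhealthy, which is consistent with Traefik dropping the upstream and answering 503.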

Locally the v1.4.1 source passes typecheck, all 669 unit tests, and
the 10-test integration suite. The runtime regression only surfaces
in the Coolify-built image. Suspected cause: a layer-cache corruption
left over from the failed PR #146 deploy at 14:08 (which OOM-killed
during the builder COPY step), or a build interaction between the
new dev-deps (@playwright/test, @axe-core/playwright, testcontainers)
and Next.js standalone bundling. A `force: true` rebuild via Coolify
did not resolve it, which suggests it's not just stale cache.

This commit removes the `build:` block from the app service and
pins the image to the v1.4.0 GHCR tag — the last release verified
healthy on production. Coolify will pull the multi-arch image and
run it directly. Site comes back up immediately.
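
As a rough sketch of the change, the app service would end up looking something like the fragment below. The registry path is inferred from the `ghcr.io/mbombeck/healthlog:1.4.1` reference in the follow-up commit, and the service name `app` is taken from "the app service"; the rest of the service definition is omitted and should be treated as unknown.

```yaml
services:
  app:
    # The former `build:` block is removed, so nothing is built on the host.
    # Assumed image path; the description only states "GHCR :1.4.0".
    image: ghcr.io/mbombeck/healthlog:1.4.0
```

Pinning an exact version tag rather than a floating one keeps a later deploy from silently picking up a different image.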

The v1.4.1 fixes are NOT lost — the source still ships in main, the
GHCR :1.4.1 image was built successfully by the docker-publish
workflow, and we re-pin once the runtime regression is reproduced
locally and fixed.

Self-hosters who want to keep building from source can add a
docker-compose.override.yml with the `build:` block. The compose
override pattern is documented and stable.
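
A hedged example of such an override file follows. The build context `.` and the local tag `healthlog:local` are hypothetical; adjust them to match the repository layout.

```yaml
# docker-compose.override.yml
# Compose merges this file over docker-compose.yml automatically.
services:
  app:
    image: healthlog:local  # hypothetical local tag, avoids reusing the GHCR tag name
    build:
      context: .            # assumed: Dockerfile at the repository root
```

With this file next to `docker-compose.yml`, `docker compose up -d --build` builds from source instead of pulling the pinned image.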

No DB migration. No env-var change.

MBombeck merged commit e33be0d into main on May 8, 2026
MBombeck deleted the hotfix/v141-rollback-to-ghcr-v140-image branch on May 8, 2026 15:06

MBombeck added a commit that referenced this pull request on May 8, 2026:

Production at healthlog.bombeck.io has been 503-ing since the v1.4.1
deploys started landing on apps-01 (Coolify). The container boots —
Next.js prints "Ready" and the pg-boss workers run — but never
accepts HTTP on :3000, so the Docker healthcheck fails and Traefik
takes the upstream out of rotation. A manual restart, a Coolify
force-rebuild, and a docker-compose pin to the GHCR :1.4.0 multi-arch
image all failed to bring the site back up — Coolify rebuilds the
image from main HEAD on every deploy regardless of the compose
directives.

This commit resets the working tree to commit 21bd46d (v1.4.0
release). Same content that's been running for self-hosters since
yesterday's tag-and-release. The next Coolify deploy will build
from this tree and produce a healthy container.

The v1.4.1 work is NOT lost:
  - PRs #144, #145, #137, #146, #147, #148, #149, #150 remain in
    git history.
  - Their commits are still tagged (`v1.4.1`), still on the GHCR
    multi-arch image (`ghcr.io/mbombeck/healthlog:1.4.1`), still in
    the GitHub Release notes.
  - Self-hosters who have already pulled the v1.4.1 image keep it.
  - Local development continues from main HEAD with the v1.4.1
    code — the regression only surfaced under the Coolify build
    flow.

Re-applying v1.4.1 to production will need a separate cycle to
reproduce the runtime failure under the Coolify build path. That
work is tracked in docs/ops/v141-followup-issues.md (added back
when the tree is reapplied) and the deploy gating in
.github/workflows/e2e.yml will catch this class of bug going
forward.

No DB migration. No env-var change. No API contract change.

Co-Authored-By: Marc-André Bombeck <mbombeck@gmail.com>