fix: drop head_object precheck in uploads migration by dosaki · Pull Request #3 · TheKeyholdingCompany/tkc-library

dosaki · 2026-05-04T08:11:00Z

What broke

PR #1 deployed successfully through Terraform but the new ECS tasks crashed on startup with:

botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

This came from the head_object idempotency precheck in scripts/migrate_uploads_to_s3.py, which I added to skip already-uploaded files. The precheck needs s3:GetObject to behave correctly — without it, S3 returns 403 Forbidden (not 404) for missing objects, because S3 refuses to disclose existence to unauthorized callers. Our except ClientError branch only mapped 404/NoSuchKey/NotFound to "doesn't exist", so the 403 was re-raised and crashed the script. ECS retried, hit the same crash, and never reached steady state — aws ecs wait services-stable timed out at 10 minutes.

What this fixes

Drops the precheck entirely. PutObject is idempotent for the same body, so re-running on every container boot just overwrites with the same bytes — no functional difference vs. the precheck version, and no extra IAM permission needed. About 24 redundant PUTs per cold boot until cleanup PR #2 removes the migration script altogether; in-region S3 PUTs are fast and free.

The alternative would have been to grant s3:GetObject on the bucket prefix to the ECS task role — but that broadens permissions for a check whose only purpose is to avoid those redundant PUTs, and it would have required a Terraform round-trip to apply. This change is one Python file, ships in the same deploy.yml run, and gets us out of the failed-deploy state immediately.

Current prod state

Old tasks still serving (ECS keeps them running because deployment_minimum_healthy_percent = 50 blocks drain when new tasks fail health checks). App is degraded-but-functional.
New task definition exists with the new env vars; new image is in ECR; bucket and IAM are applied.
DB rows: still bare filenames (migration crashed before reaching the DB phase).
S3 bucket: empty (migration crashed before uploading anything).

After this merges and deploy.yml re-runs:

ECS rolls new tasks → first new task boots → flask db upgrade (no-op) → python scripts/migrate_uploads_to_s3.py PUTs all 24 files and prefixes the 24 DB rows → gunicorn starts → health check passes → ALB shifts traffic.
Old tasks drain. Brief rollout window where old tasks may 404 covers (DB rows now prefixed, old image doesn't have the legacy shim because the legacy shim is in the new code only).

Test plan

After merge, watch the deploy.yml log for == Migrating uploads to S3 [LIVE] == followed by Done. Uploaded 24 object(s); updated 24 row(s).
aws ecs wait services-stable succeeds.
Spot-check 2-3 book covers render in prod UI (now serving from S3).
Open PR chore: remove disk-upload remnants after S3 cutover #2 for review and merge it once verified.

🤖 Generated with Claude Code

The ECS task role is scoped to s3:PutObject only. HeadObject without GetObject returns 403 (S3 hides existence from unauthorized callers), which the existing handler doesn't translate to "doesn't exist", so the script crashed on container start and ECS never reached steady state. Drop the precheck — PutObject is idempotent for the same body, so re-runs just overwrite with the same bytes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Removes the S3 HeadObject idempotency precheck from the one-shot uploads migration script to prevent ECS startup crashes when the task role lacks s3:GetObject (where HeadObject yields 403 rather than a distinguishable “missing object” signal).

Changes:

Drops the _exists() helper and associated head_object logic so uploads always proceed via PutObject.
Removes the unused ClientError import.
Updates inline/documentation text to reflect the new overwrite-based idempotency approach.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

The "Runs once per environment" wording was true when the script was intended to run manually, but with entrypoint.sh invoking it on every container boot it just misleads. Note that PutObject re-runs are cheap in-region overwrites until the file is removed in the cleanup PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 4, 2026 08:11

Copilot started reviewing on behalf of dosaki May 4, 2026 08:11 View session

Copilot AI reviewed May 4, 2026

View reviewed changes

Comment thread scripts/migrate_uploads_to_s3.py Outdated

dosaki merged commit dd320f4 into main May 4, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: drop head_object precheck in uploads migration#3

fix: drop head_object precheck in uploads migration#3
dosaki merged 2 commits into
mainfrom
fix/s3-migration-permissions

dosaki commented May 4, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dosaki commented May 4, 2026

What broke

What this fixes

Current prod state

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants