fix: drop head_object precheck in uploads migration#3
Merged
Conversation
The ECS task role is scoped to s3:PutObject only. HeadObject without GetObject returns 403 (S3 hides existence from unauthorized callers), which the existing handler doesn't translate to "doesn't exist", so the script crashed on container start and ECS never reached steady state. Drop the precheck — PutObject is idempotent for the same body, so re-runs just overwrite with the same bytes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Removes the S3 HeadObject idempotency precheck from the one-shot uploads migration script to prevent ECS startup crashes when the task role lacks s3:GetObject (where HeadObject yields 403 rather than a distinguishable “missing object” signal).
Changes:
- Drops the
_exists()helper and associatedhead_objectlogic so uploads always proceed viaPutObject. - Removes the unused
ClientErrorimport. - Updates inline/documentation text to reflect the new overwrite-based idempotency approach.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The "Runs once per environment" wording was true when the script was intended to run manually, but with entrypoint.sh invoking it on every container boot it just misleads. Note that PutObject re-runs are cheap in-region overwrites until the file is removed in the cleanup PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What broke
PR #1 deployed successfully through Terraform but the new ECS tasks crashed on startup with:
This came from the
head_objectidempotency precheck inscripts/migrate_uploads_to_s3.py, which I added to skip already-uploaded files. The precheck needss3:GetObjectto behave correctly — without it, S3 returns 403 Forbidden (not 404) for missing objects, because S3 refuses to disclose existence to unauthorized callers. Ourexcept ClientErrorbranch only mapped404/NoSuchKey/NotFoundto "doesn't exist", so the 403 was re-raised and crashed the script. ECS retried, hit the same crash, and never reached steady state —aws ecs wait services-stabletimed out at 10 minutes.What this fixes
Drops the precheck entirely.
PutObjectis idempotent for the same body, so re-running on every container boot just overwrites with the same bytes — no functional difference vs. the precheck version, and no extra IAM permission needed. About 24 redundant PUTs per cold boot until cleanup PR #2 removes the migration script altogether; in-region S3 PUTs are fast and free.The alternative would have been to grant
s3:GetObjecton the bucket prefix to the ECS task role — but that broadens permissions for a check whose only purpose is to avoid those redundant PUTs, and it would have required a Terraform round-trip to apply. This change is one Python file, ships in the samedeploy.ymlrun, and gets us out of the failed-deploy state immediately.Current prod state
deployment_minimum_healthy_percent = 50blocks drain when new tasks fail health checks). App is degraded-but-functional.After this merges and
deploy.ymlre-runs:flask db upgrade(no-op) →python scripts/migrate_uploads_to_s3.pyPUTs all 24 files and prefixes the 24 DB rows → gunicorn starts → health check passes → ALB shifts traffic.Test plan
deploy.ymllog for== Migrating uploads to S3 [LIVE] ==followed byDone. Uploaded 24 object(s); updated 24 row(s).aws ecs wait services-stablesucceeds.🤖 Generated with Claude Code