Skip to content

fix: Tag platform images via ECR API to fix wrapper bug in private ECR#759

Open
wangzlei wants to merge 1 commit into
aws-observability:mainfrom
wangzlei:fix/ecr-api-retag-platform-images
Open

fix: Tag platform images via ECR API to fix wrapper bug in private ECR#759
wangzlei wants to merge 1 commit into
aws-observability:mainfrom
wangzlei:fix/ecr-api-retag-platform-images

Conversation

@wangzlei
Copy link
Copy Markdown
Contributor

Summary

  • Removes the two single-platform build/push steps that produced wrapper-index tags instead of real platform manifests
  • Keeps the single multi-arch docker/build-push-action push to both private and public ECR
  • Adds an ECR-API retag step that reads the multi-arch manifest index and applies v$VERSION-amd64 / v$VERSION-arm64 directly to the platform digests via aws ecr put-image
  • Tags attestation manifests (architecture=unknown) with their sha hex

This is the same fix applied in:

Test plan

  • Trigger a release build and verify v$VERSION-amd64 / v$VERSION-arm64 tags in private ECR point to platform manifests (not wrapper indexes)
  • Verify v$VERSION multi-arch tag still works correctly in both public and private ECR
  • Verify attestation manifests are tagged with their sha hex

🤖 Generated with Claude Code

The release workflow was producing $VERSION-amd64 and $VERSION-arm64 tags
that pointed at single-entry image indexes (wrappers) instead of the
platform image manifests, because docker/build-push-action's default
buildx behavior bundles each push with provenance attestations inside
an index.

This change:
- Removes the per-platform build/push steps (the source of the wrapper tags)
- Adds an ECR-API retag step that reads the multi-arch manifest index and
  applies $VERSION-amd64 / $VERSION-arm64 directly to the platform digests
  via aws ecr put-image
- Tags attestation manifests by their sha hex

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@wangzlei wangzlei requested a review from a team as a code owner May 29, 2026 22:02
@wangzlei wangzlei enabled auto-merge (squash) May 29, 2026 22:07
@wangzlei wangzlei added the skip changelog doesn't need a CHANGELOG entry label May 29, 2026
Copy link
Copy Markdown
Contributor

@viw-test1 viw-test1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good

echo "ERROR: failed to fetch ${arch} platform manifest at ${DIGEST}"
exit 1
fi
aws ecr put-image \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aws ecr put-image will fail with ImageAlreadyExistsException if this workflow is re-run for the same version (e.g., after a transient failure in a later step like publish-layer-prod). Consider adding || true or catching this specific error to make the step idempotent:

aws ecr put-image \
  --repository-name "${REPO_NAME}" \
  --image-tag "v${VERSION}-${arch}" \
  --image-manifest "$MANIFEST" \
  --region "${REGION}" > /dev/null 2>&1 || {
    echo "Tag v${VERSION}-${arch} may already exist (expected on re-runs)"
  }

The same applies to the attestation put-image call below.

${{ env.RELEASE_PUBLIC_REPOSITORY }}:v${{ env.VERSION }}
${{ env.RELEASE_PRIVATE_REPOSITORY }}:v${{ env.VERSION }}

- name: Tag platform images in private ECR
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The repository name is hardcoded here, duplicating the suffix of RELEASE_PRIVATE_REPOSITORY (020628701572.dkr.ecr.us-west-2.amazonaws.com/adot-autoinstrumentation-python). If the repository name in that env var ever changes, this would silently diverge.

Consider deriving it from the existing env var to keep a single source of truth:

REPO_NAME: ${{ env.RELEASE_PRIVATE_REPOSITORY && ... }}

Or extract it in the script:

REPO_NAME="${RELEASE_PRIVATE_REPOSITORY##*/}"

exit 1
fi

# Tag platform images (amd64, arm64) with v$VERSION-$arch
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This jq filter could return multiple lines if a manifest index contains more than one entry for the same architecture (e.g., image variants or platform features). If that happens, DIGEST would be a multi-line string and the subsequent batch-get-image call would fail with a confusing error.

Consider adding | first (jq 1.6+) or | limit(1; ...) to ensure a single value, or validating the result:

DIGEST=$(echo "$MANIFEST_INDEX" | jq -r --arg a "$arch" '[.manifests[] | select(.platform.architecture == $a)] | first | .digest')

Low risk for this project in practice, but worth hardening.

REPO_NAME: adot-autoinstrumentation-python
VERSION: ${{ env.VERSION }}
REGION: ${{ env.AWS_PRIVATE_ECR_REGION }}
run: |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AWS credentials were last configured for public ECR (region us-east-1, line 119-122) before this step runs. The batch-get-image and put-image calls target private ECR in us-west-2. While this step passes --region explicitly to each AWS CLI call (which handles the region correctly), the IAM role assumed at line 119 is for public ECR access.

The docker/build-push-action step likely succeeds because Docker auth tokens from both login steps are still cached. However, the ECR API calls here need IAM permissions on the private registry (020628701572.dkr.ecr.us-west-2).

Please verify that the role assumed at "Configure AWS credentials for public ECR" (line 119) has ecr:BatchGetImage and ecr:PutImage permissions on the private repository. If it does not, you will need to re-assume the private ECR role before this step or reorder the credential configuration steps so the private ECR credentials are active when this step runs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip changelog doesn't need a CHANGELOG entry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants