Skip to content

fix(aws-s3-bucket): replace enumerated action list with s3:* in bucket mgmt policy#7

Open
jcastiarena wants to merge 1 commit into
mainfrom
fix/s3-policy-include-bucket-read-actions
Open

fix(aws-s3-bucket): replace enumerated action list with s3:* in bucket mgmt policy#7
jcastiarena wants to merge 1 commit into
mainfrom
fix/s3-policy-include-bucket-read-actions

Conversation

@jcastiarena

Copy link
Copy Markdown

Summary

aws-s3-bucket/requirements/main.tf enumerates ~22 specific s3:* actions in nullplatform_s3_policy. The AWS Terraform provider refreshes aws_s3_bucket by reading a wide surface of bucket attributes (ACL, CORS, Logging, Lifecycle, Replication, OwnershipControls, Website, Notification, AccelerateConfiguration, RequestPayment, ObjectLock, ...). Each time the provider gains a new refreshed attribute, the enumerated list breaks with AccessDenied on the missing Get* action.

This replaces the enumerated list with Action = "s3:*". Resource stays "*" to preserve the current blast radius — no scope tightening, just the action-list fix.

Why this matters (two observed incidents)

1. Galicia POC (provider v6.x, April 2026)

First smoke-test of the service against a provider v6.x tenant aborted with:

Error: reading S3 Bucket (np-svc-e898cde8-…-smoketest01) ACL:
operation error S3: GetBucketAcl, https response error StatusCode: 403
User: arn:aws:sts::<acct>:assumed-role/<agent-role>/<session> is not
authorized to perform: s3:GetBucketAcl on resource: "…"

s3:GetBucketAcl is not in the current enumerated list. The tenant patched locally by replacing the enumerated list with s3:* and then resuming the create workflow via a notification resend — the same service instance recovered without losing the tfstate bucket that was already created pre-failure.

2. nullplatform-implementations/implementation-aws (prior incident, predates this PR)

The equivalent inline policy in that tenant carries a comment that documents the same class of error:

The AWS provider (v6+) refreshes aws_s3_bucket by reading a wide surface of bucket attributes (ACL, CORS, Logging, Lifecycle, Replication, etc.). Enumerating each s3:Get* action is brittle, so we grant s3:* scoped to the same wildcard resource already chosen. If a tighter scope is ever needed, narrow the Resource (e.g. arn:aws:s3:::np-*) rather than the Action list.

That tenant arrived at the same fix independently. This PR lifts the fix upstream so the next tenant that adopts services-s-3 on a current AWS provider doesn't have to rediscover it.

Why not also narrow Resource to np-*

An earlier draft of this PR narrowed Resource from "*" to arn:aws:s3:::np-* under the rationale that scripts/aws/build_context guarantees every service-created bucket starts with np- (INSTANCE_NAME="np-$(sanitize $SERVICE_NAME)", tfstate buckets np-service-$SERVICE_ID). That would be a strictly tighter blast radius.

Pulled out of this PR because it changes behavior for tenants who might be relying on the current "*" scope — e.g. manually setting service.attributes.bucket_name to an imported bucket with a non-np-* name (the attribute is editableOn: [] in the spec but the API doesn't enforce it). Upstream modules should be flexible by default and restrictive by tenant override, not the other way around. Tenants wanting the narrower scope can attach the policy via a wrapper that overrides Resource, or inline the statements (implementation-aws + galicia-banco both do this).

A separate follow-up PR could add a var.bucket_arn_patterns with default = ["*"] so tenants can narrow without forking, if there's appetite upstream.

Out of scope (intentionally)

  • nullplatform_s3_iam_policy is unaffected: it only manages IAM users/keys, no aws_s3_bucket refresh surface.
  • nullplatform_s3_tfstate_policy is unaffected: tfstate buckets are created via the aws s3api create-bucket CLI in build_context, and the OpenTofu s3 backend does Object-level I/O only. No aws_s3_bucket resource refresh happens on this surface, so the enumerated list remains sufficient.

Diff

+17 / -24 on a single file (aws-s3-bucket/requirements/main.tf). The enumerated action list (22 items + ListAllMyBuckets) collapses to "s3:*", and a comment block above the resource records the why + the observed failure for the next maintainer.

Test plan

  • Local verification on the Galicia tenant (nullplatform-implementations/galicia-banco): after applying the equivalent change locally, the previously-failed service instance recovered on a notification resend. Bucket state: versioning Enabled, SSEAlgorithm=AES256, PublicAccessBlock all true. Service status active. Attributes bucket_name / bucket_arn / bucket_region populated.
  • Fresh apply on a newly-provisioned tenant using this branch as the source — suggested for maintainer before merge if CI doesn't exercise this path.

🤖 Generated with Claude Code

…t mgmt policy

The agent IAM policy `nullplatform_s3_policy` enumerated ~22 specific s3
actions. The AWS Terraform provider refreshes `aws_s3_bucket` by reading a
wide surface of bucket attributes (ACL, CORS, Logging, Lifecycle,
Replication, OwnershipControls, Website, Notification,
AccelerateConfiguration, RequestPayment, ObjectLock, ...). Each time the
provider gains a new refreshed attribute, the enumerated list breaks with
AccessDenied on the missing Get* action.

Observed failure on provider v6.x:

  Error: reading S3 Bucket (...) ACL: operation error S3: GetBucketAcl,
  https response error StatusCode: 403, ...
  User: arn:aws:sts::<acct>:assumed-role/<agent-role>/<session> is not
  authorized to perform: s3:GetBucketAcl on resource: "..."

`s3:GetBucketAcl` is not in the current enumerated list, and neither are
several other attributes the provider reads during refresh.

Fix: replace the enumerated list with `s3:*`. Resource stays `"*"` to
preserve current behavior — tenants wanting a tighter blast radius can
attach this policy via a wrapper that overrides the Resource list to a
known prefix (the service's default naming is `np-<sanitized-name>-<suffix>`
for user buckets and `np-service-<SERVICE_ID>` for tfstate buckets — both
under `np-*`), which is the approach documented in the
`nullplatform-implementations/implementation-aws` tenant comment on the
equivalent inline policy.

`nullplatform_s3_iam_policy` and `nullplatform_s3_tfstate_policy` are
unaffected: IAM users operate through Object-level API only, and tfstate
buckets are created via the `aws s3api create-bucket` CLI plus the OpenTofu
s3 backend's Object-level I/O — neither triggers an `aws_s3_bucket`
resource refresh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant