feat(providers): AWS STS AssumeRole refresh strategy and aws-s3 profile #1782

Draft
russellb wants to merge 12 commits into NVIDIA:main from russellb:feat/1576-aws-sts-with-sigv4

russellb (Contributor) commented Jun 5, 2026

Summary

Add gateway-owned AWS STS AssumeRole as a credential refresh strategy, and ship an aws-s3 provider profile with SigV4 signing pre-configured.

Prerequisite: #1638 must merge first — this branch includes those commits as a base.

Related Issue

Refs #1576

Changes

STS credential refresh (openshell-server):

  • aws_sts_assume_role refresh strategy (v2 provider API only)
  • Gateway calls sts:AssumeRole and writes three coupled credentials (AccessKeyId, SecretAccessKey, SessionToken) atomically via an extended multi-key MintedCredential
  • ConfigureProviderRefresh rejects aws_sts_assume_role when providers_v2_enabled is false
  • Multi-key collision validation at both configure-time and mint-time

Provider profiles (openshell-providers):

  • aws.yaml — generic AWS profile (credentials only)
  • aws-s3.yaml — S3 profile with **.amazonaws.com host glob, tls: terminate, credential_signing: sigv4, signing_service: s3, and binary allowlist for common S3 clients
  • Wire credential_signing and signing_service through EndpointProfile serde and proto conversion

TUI (openshell-tui):

  • Handle AwsStsAssumeRole in refresh strategy label (rebase fix)

Documentation:

  • docs/sandboxes/manage-providers.mdx: chunked transfer limitation note
  • examples/aws-s3-sts.md: end-to-end manual test guide for S3 access with STS

Testing

  • mise run pre-commit passes (except pre-existing python:proto failure on main)
  • Unit tests for STS refresh, multi-key collision validation, profile round-trip
  • End-to-end manual test: Podman sandbox → boto3 → S3 PUT/GET/LIST with STS-minted credentials and proxy-side SigV4 re-signing (all passed)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)

jhjaggars and others added 11 commits June 5, 2026 12:06
…nnels

Add proxy-side AWS SigV4 re-signing so sandbox clients can reach AWS
services (Bedrock) through the CONNECT tunnel using placeholder
credentials. The proxy strips the invalid signature, resolves real
credentials from the SecretResolver, re-signs with the aws-sigv4 crate,
and forwards. Configuration is policy-driven via two new fields
(credential_signing, signing_service).
Policy YAML example:
    credential_signing: sigv4
    signing_service: bedrock
Implementation:
- sigv4.rs: strip_aws_headers removes old auth headers before the
  fail-closed placeholder scan; apply_sigv4_to_request re-signs using
  the aws-sigv4 SDK with PayloadChecksumKind::XAmzSha256 enabled.
  Returns Result instead of panicking. Non-signed headers (Accept,
  User-Agent, etc.) are preserved in the output.
- rest.rs: SigV4 path buffers body (capped at MAX_REWRITE_BODY_BYTES)
  for signing, then forwards the re-signed request upstream.
- Proto: credential_signing (field 19), signing_service (field 20)
  on NetworkEndpoint.
- Policy/OPA: plumbed through serde, proto conversion, and Rego data.
- Supports AWS session tokens (STS temporary credentials).
- Integration test against real Bedrock (ignored, requires AWS creds).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reject policies where credential_signing is set but signing_service is
empty during validate_sandbox_policy() instead of failing at connection
time. The runtime check in rest.rs is kept as defense-in-depth.
Extend the SigV4 proxy re-signing to auto-detect the correct payload
signing mode from the client SDK's x-amz-content-sha256 header:
- Hex hash → buffer body and include hash in signature (Bedrock)
- STREAMING-UNSIGNED-PAYLOAD-TRAILER → sign headers only, stream body
  through for aws-chunked uploads (S3 PutObject, upload_fileobj)
- UNSIGNED-PAYLOAD → sign headers only, no body buffering (S3 over HTTPS)
- Absent → fall back to Content-Length heuristic
This eliminates the need for body buffering on S3 uploads and adds
support for chunked transfer encoding that the previous implementation
could not handle.
New credential_signing policy values:
- sigv4       — auto-detect from client headers (recommended)
- sigv4:body  — always buffer and hash the body
- sigv4:no_body — always use UNSIGNED-PAYLOAD
Also adds Expect: 100-continue handling in the REST L7 relay so clients
like boto3's S3 PutObject receive the interim 100 response before
sending the body.
Validated end-to-end from inside a Podman sandbox against real AWS:
Bedrock InvokeModel, S3 PUT/GET/DELETE, and streaming upload_fileobj.
Critical:
- Scope Expect: 100-continue handling to SigV4 paths only. Previously
  it fired for all L7-proxied requests, violating RFC 7231 §5.1.1 and
  risking double 100 responses on non-SigV4 traffic.
Warnings:
- Reject unknown credential_signing values at policy validation time.
  A typo like "sigv4_typo" now produces a clear PolicyViolation instead
  of silently falling back to no signing.
- Support dualstack, FIPS, virtual-hosted, and China partition hostnames
  in extract_aws_region (e.g. s3.dualstack.us-west-2.amazonaws.com,
  s3.cn-north-1.amazonaws.com.cn).
- Emit OCSF NetworkActivity event for SigV4 re-signing decisions
  instead of debug! tracing, per AGENTS.md structured logging guidelines.
- Update architecture/sandbox.md to document all three signing modes
  (signed body, streaming unsigned trailer, unsigned payload) and the
  auto-detection mechanism.
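The validation rules described so far (three accepted `credential_signing` values, unknown values rejected, and a non-empty `signing_service` required whenever signing is enabled) can be sketched as below. This is a minimal illustration, not the code in this PR: the `CredentialSigning` variant names other than `None`, the `PolicyViolation` shape, the `validate_signing` function, and the treatment of an empty string as "unset" are all assumptions.

```rust
/// Signing modes named in the commit messages. Only `None` is a name taken
/// from the PR text; the other variant names are hypothetical.
#[derive(Debug, PartialEq)]
enum CredentialSigning {
    None,        // field unset: no re-signing (assumed to be the empty string)
    SigV4Auto,   // "sigv4": auto-detect from client headers
    SigV4Body,   // "sigv4:body": always buffer and hash the body
    SigV4NoBody, // "sigv4:no_body": always use UNSIGNED-PAYLOAD
}

/// Hypothetical violation type; the PR mentions an UnknownCredentialSigning
/// violation but does not show its definition.
#[derive(Debug, PartialEq)]
enum PolicyViolation {
    UnknownCredentialSigning(String),
    MissingSigningService,
}

/// Parse and validate the two policy fields together, failing at validation
/// time rather than at connection time.
fn validate_signing(
    credential_signing: &str,
    signing_service: &str,
) -> Result<CredentialSigning, PolicyViolation> {
    let mode = match credential_signing {
        "" => return Ok(CredentialSigning::None),
        "sigv4" => CredentialSigning::SigV4Auto,
        "sigv4:body" => CredentialSigning::SigV4Body,
        "sigv4:no_body" => CredentialSigning::SigV4NoBody,
        // A typo must not silently fall back to "no signing".
        other => {
            return Err(PolicyViolation::UnknownCredentialSigning(other.to_string()))
        }
    };
    // Any SigV4 mode needs a service name to sign for.
    if signing_service.trim().is_empty() {
        return Err(PolicyViolation::MissingSigningService);
    }
    Ok(mode)
}
```

With this shape, `validate_signing("sigv4_typo", "s3")` yields an `UnknownCredentialSigning` error instead of quietly disabling signing.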
…OCSF nit

- Fix extract_aws_region for FIPS+dualstack combo hostnames like
  s3-fips.dualstack.us-west-2.amazonaws.com (scans past all "dualstack"
  labels instead of just one).
- Add tests for FIPS+dualstack and GovCloud region extraction.
- Add unit test for UnknownCredentialSigning policy validation
  (e.g. "sigv4_typo" produces the expected violation).
- Use ActivityId::Traffic instead of ActivityId::Other for the SigV4
  OCSF event — more descriptive for a signing operation on existing
  traffic flow.
…TO, startup validation

Critical:
- Reject STREAMING-AWS4-HMAC-SHA256-PAYLOAD in detect_payload_mode()
  instead of silently treating it as SignBody (per-chunk signing is not
  supported). Returns a clear error directing the user to sigv4:no_body.
- Add defense-in-depth guard in the SignBody path: fail closed if the
  request uses chunked transfer encoding, preventing body-less forwards.
Warnings:
- Wire credential_signing and signing_service through EndpointProfile
  DTO in openshell-providers. Both endpoint_to_proto() and
  endpoint_from_proto() now preserve the fields during round-trip.
- Reject unknown credential_signing values at sandbox L7 config parse
  time (returns None, disabling L7 for the endpoint) instead of
  silently downgrading to CredentialSigning::None. Also reject SigV4
  modes with empty signing_service at startup rather than deferring
  the error to request time.
Add gateway-owned AWS STS AssumeRole as a new credential refresh
strategy. The gateway calls sts:AssumeRole and writes three coupled
credentials (AccessKeyId, SecretAccessKey, SessionToken) atomically
to the provider record via an extended multi-key MintedCredential.

Scoped to provider v2 only — ConfigureProviderRefresh rejects
aws_sts_assume_role when providers_v2_enabled is false.

Ships two built-in profiles: generic 'aws' (credentials only) and
'aws-s3' (with S3 endpoint policy rules). Multi-key collision
validation runs at both configure-time and mint-time.

Refs NVIDIA#1576
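To illustrate what "three coupled credentials written atomically" with collision validation could look like, here is a rough sketch. Everything in it is an assumption apart from the three key names: the `MintedCredential` field layout, the `mint_from_sts`, `colliding_keys`, and `apply_minted` helpers, and the idea that a "collision" means a minted key overlapping a key already reserved on the provider record are hypothetical and not taken from the PR.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical multi-key mint result: several values that belong together.
#[derive(Debug, Clone, PartialEq)]
struct MintedCredential {
    values: BTreeMap<String, String>,
}

/// Build the three coupled entries named in the PR description.
fn mint_from_sts(access_key_id: &str, secret_access_key: &str, session_token: &str) -> MintedCredential {
    let values = BTreeMap::from([
        ("AccessKeyId".to_string(), access_key_id.to_string()),
        ("SecretAccessKey".to_string(), secret_access_key.to_string()),
        ("SessionToken".to_string(), session_token.to_string()),
    ]);
    MintedCredential { values }
}

/// Assumed collision rule: a minted key may not reuse a key that is already
/// reserved (for example, owned by a static credential) on the record.
fn colliding_keys(minted_keys: &[&str], reserved: &BTreeSet<String>) -> Vec<String> {
    minted_keys
        .iter()
        .filter(|k| reserved.contains(**k))
        .map(|k| k.to_string())
        .collect()
}

/// Validate first, then write every key; on a collision nothing is written,
/// so the record never holds a partial credential set.
fn apply_minted(
    record: &mut BTreeMap<String, String>,
    reserved: &BTreeSet<String>,
    minted: &MintedCredential,
) -> Result<(), Vec<String>> {
    let keys: Vec<&str> = minted.values.keys().map(String::as_str).collect();
    let collisions = colliding_keys(&keys, reserved);
    if !collisions.is_empty() {
        return Err(collisions);
    }
    for (key, value) in &minted.values {
        record.insert(key.clone(), value.clone());
    }
    Ok(())
}
```

Running the same check at configure-time (against the declared key names) and again at mint-time (against the keys actually returned) would match the two validation points the commit message mentions.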
Wire credential_signing and signing_service through EndpointProfile
serde and proto conversion so provider profiles can enable SigV4
signing. Configure aws-s3 profile with **.amazonaws.com host glob,
tls: terminate, credential_signing: sigv4, signing_service: s3,
and a binary allowlist for common S3 clients.

Add chunked transfer limitation to manage-providers.mdx. Add
examples/aws-s3-sts.md manual E2E test guide.

Refs NVIDIA#1576
Boto3 connects to global S3 endpoints like
bucket.s3.amazonaws.com (no region in the hostname). The previous
extract_aws_region returned "s3" for this pattern because it took
the label at parts[len-3] without checking if it was actually a
region.

Add looks_like_region() which requires a hyphen followed by a digit
(e.g., us-east-1). Service names like "s3" or "bedrock-runtime" are
rejected, causing the fallback to us-east-1.

Refs NVIDIA#1576
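A small sketch of the region check described above, assuming a simplified host parser. The names `looks_like_region` and `extract_aws_region` come from the commit message, but the bodies here are illustrative guesses rather than the PR's implementation; in particular, stripping the `amazonaws.com` / `amazonaws.com.cn` suffix and inspecting the right-most remaining label is an assumption.

```rust
const DEFAULT_REGION: &str = "us-east-1";

/// A label looks like a region if it contains a hyphen immediately followed
/// by a digit ("us-east-1", "cn-north-1"). "s3" and "bedrock-runtime" do not.
fn looks_like_region(label: &str) -> bool {
    label
        .as_bytes()
        .windows(2)
        .any(|w| w[0] == b'-' && w[1].is_ascii_digit())
}

/// Simplified extraction: drop the AWS domain suffix, skip "dualstack"
/// labels from the right, and accept the next label only if it looks like a
/// region. Anything else falls back to us-east-1.
fn extract_aws_region(host: &str) -> String {
    let host = host.to_ascii_lowercase();
    let prefix = host
        .strip_suffix(".amazonaws.com.cn")
        .or_else(|| host.strip_suffix(".amazonaws.com"));
    prefix
        .and_then(|p| p.rsplit('.').find(|label| *label != "dualstack"))
        .filter(|label| looks_like_region(label))
        .map(str::to_string)
        .unwrap_or_else(|| DEFAULT_REGION.to_string())
}
```

For the global endpoint `bucket.s3.amazonaws.com`, the candidate label is `s3`, which fails the check, so the function falls back to `us-east-1` instead of returning `s3`.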
Boto3 put_object sends x-amz-content-sha256 with the value
STREAMING-AWS4-HMAC-SHA256-PAYLOAD, which was rejected by
detect_payload_mode() because per-chunk signing is not supported.

Treat all streaming- variants as StreamingUnsignedTrailer: re-sign
headers only and stream the body through. The proxy cannot reproduce
per-chunk signatures, but AWS accepts unsigned streaming payloads
over HTTPS.

Refs NVIDIA#1576
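The header-driven mode selection, as it stands after this change, can be sketched roughly as follows. The function name `detect_payload_mode` and the mode names `SignBody` and `StreamingUnsignedTrailer` appear in the commit messages; the `UnsignedPayload` variant name, the function signature, the exact Content-Length heuristic, and the error for unrecognized header values are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
enum PayloadMode {
    SignBody,                 // buffer the body and include its hash in the signature
    StreamingUnsignedTrailer, // sign headers only, stream the body through
    UnsignedPayload,          // sign headers only, no body buffering
}

/// A SHA-256 digest in hex is exactly 64 hex characters.
fn is_hex_sha256(value: &str) -> bool {
    value.len() == 64 && value.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Choose a signing mode from the client's x-amz-content-sha256 header value
/// and, when that header is absent, from Content-Length.
fn detect_payload_mode(
    content_sha256: Option<&str>,
    content_length: Option<u64>,
) -> Result<PayloadMode, String> {
    match content_sha256 {
        // All STREAMING-* variants, including STREAMING-AWS4-HMAC-SHA256-PAYLOAD,
        // are handled the same way: re-sign headers only.
        Some(v) if v.starts_with("STREAMING-") => Ok(PayloadMode::StreamingUnsignedTrailer),
        Some("UNSIGNED-PAYLOAD") => Ok(PayloadMode::UnsignedPayload),
        Some(v) if is_hex_sha256(v) => Ok(PayloadMode::SignBody),
        // Assumption: any other value is rejected rather than guessed at.
        Some(other) => Err(format!("unsupported x-amz-content-sha256 value: {other}")),
        // Assumed heuristic: a known length means the body can be buffered and
        // hashed; no length means it cannot.
        None => Ok(match content_length {
            Some(_) => PayloadMode::SignBody,
            None => PayloadMode::UnsignedPayload,
        }),
    }
}
```

The ordering matters: the `STREAMING-` prefix check runs before the hex check, so a boto3 `put_object` request carrying `STREAMING-AWS4-HMAC-SHA256-PAYLOAD` is streamed through rather than rejected.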
Handle the new ProviderCredentialRefreshStrategy::AwsStsAssumeRole
variant added by the SigV4 feature branch after rebase onto main
introduced the exhaustive match in openshell-tui.

copy-pr-bot Bot commented Jun 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Comment thread providers/aws-s3.yaml Outdated
…tighten aws-s3 endpoints

Allow `*` as an entire middle DNS label in host wildcard patterns (e.g.
`*.s3.*.amazonaws.com`) while rejecting partial middle-label wildcards
(`us-*`) and middle-label `**`. This enables S3-specific regional
endpoint shapes without the overly broad `**.amazonaws.com`.

- Add `InvalidHostWildcard` policy violation and `host_wildcard_shape_invalid`
  validator in openshell-policy
- Update L7 `validate_host_wildcard` to accept whole-label `*` in middle
  positions while still rejecting `**` and partial wildcards
- Add OPA runtime tests confirming `*` matches exactly one DNS label and
  does not cross label boundaries (dualstack, missing bucket prefix)
- Tighten aws-s3.yaml from `**.amazonaws.com` to four S3-specific shapes:
  `*.s3.*.amazonaws.com`, `s3.*.amazonaws.com`,
  `*.s3.dualstack.*.amazonaws.com`, `s3.dualstack.*.amazonaws.com`
- Pin profile test to assert the old broad endpoint is gone
- Update architecture/security-policy.md wildcard table

Signed-off-by: Russell Bryant <rbryant@nvidia.com>
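The middle-label wildcard rules in this commit can be illustrated with the sketch below. The name `validate_host_wildcard` comes from the commit message, but this body is a simplified stand-in: it models only the middle-label rule, leaves the leading label to whatever the existing rules are, and returns a plain `String` error instead of the `InvalidHostWildcard` violation. The `host_matches` helper is hypothetical, handles only single-label `*`, and stands in for the OPA runtime matching.

```rust
/// Check non-leading labels: a whole-label `*` is fine, while `**` and
/// partial wildcards such as `us-*` are rejected. The leading label is
/// deliberately not modeled here.
fn validate_host_wildcard(pattern: &str) -> Result<(), String> {
    let labels: Vec<&str> = pattern.split('.').collect();
    for label in labels.iter().skip(1) {
        if !label.contains('*') {
            continue;
        }
        match *label {
            "*" => {} // matches exactly one DNS label
            "**" => return Err(format!("`**` is not allowed in a middle label: {pattern}")),
            _ => return Err(format!("partial wildcard `{label}` is not allowed: {pattern}")),
        }
    }
    Ok(())
}

/// Label-by-label match where `*` consumes exactly one non-empty label and
/// never crosses a dot.
fn host_matches(pattern: &str, host: &str) -> bool {
    let p: Vec<&str> = pattern.split('.').collect();
    let h: Vec<&str> = host.split('.').collect();
    p.len() == h.len()
        && p.iter().zip(&h).all(|(pl, hl)| {
            (*pl == "*" && !hl.is_empty()) || pl.eq_ignore_ascii_case(hl)
        })
}
```

Because the label counts must be equal, `*.s3.*.amazonaws.com` matches `bucket.s3.us-west-2.amazonaws.com` but not the dualstack form or a host without the bucket prefix, which is why the profile lists four separate shapes.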
Comment thread providers/aws-s3.yaml

Question (non-blocking): What's the use-case for a dedicated S3 provider over just using the AWS provider?

Contributor Author

The pre-defined endpoints list, mainly. It's also just what I was focused on as a test case.
