diff --git a/README.md b/README.md
index 354c37c..d2d282c 100644
--- a/README.md
+++ b/README.md
@@ -288,6 +288,67 @@ LOG_RETENTION_DAYS: 30
 PARAMETERS_STRATEGY: "env" # or "secretsmanager"
 ```
+### Placeholder Image (Scope Bootstrap)
+
+When a scope is created, the Lambda function and its IAM role must exist **before**
+the first real deployment — otherwise aliases, networking, and IAM have nothing to
+attach to. To bootstrap this, `create-scope` provisions a throwaway **placeholder**
+function that the first deployment then overwrites with the real code.
+
+How the placeholder is sourced depends on the scope's **package type**:
+
+- **Zip** — fully self-contained. A minimal handler ships pre-built and
+  base64-encoded in the repo (`scope/placeholder/placeholder_lambda.zip.b64`) and is
+  used automatically. **No configuration needed.**
+- **Image** — the placeholder must be a container image, and this is where
+  `PLACEHOLDER_IMAGE_URI_DEFAULT` comes in.
+
+#### Why `PLACEHOLDER_IMAGE_URI_DEFAULT` is needed for Image scopes
+
+A Lambda function with `PackageType=Image` can only pull from a **private ECR
+repository in the same account and region** — Lambda rejects `public.ecr.aws`
+images at function-creation time. The built-in default in
+`scope/scripts/resolve_placeholder_image` points at a public image
+(`public.ecr.aws/nullplatform/aws-lambda/nullplatform-lambda-placeholder:latest`),
+which is fine to *validate* but cannot actually back a real Lambda function.
+
+So for Image-based scopes you **must** mirror a placeholder into your own private
+ECR and point the scope at it. The image must also be **single-arch matching the
+scope architecture** (`-amd64` for `x86_64`, `-arm64` for `arm64`) — Lambda does
+not accept multi-arch manifest lists.
+
+#### Resolution precedence
+
+The placeholder image URI is resolved in this order (first match wins):
+
+1. scope-configurations provider key `deployment.placeholder_image_uri` — per-scope,
+   managed without code
+2. `PLACEHOLDER_IMAGE_URI_DEFAULT` env var — the **account-wide** knob, set in
+   `values.yaml` or via the agent's `extra_envs` (Helm)
+3. the public default in `scope/scripts/resolve_placeholder_image` (validation-only
+   fallback; not usable for real Image functions)
+
+Because the URI is account-specific, `values.yaml` ships it commented out — set it
+once per installation and every Image scope in that account uses it, unless a
+specific scope overrides it via the provider key.
+
+#### Publishing a placeholder image
+
+Use the helper script to build and push the single-arch placeholders to your private
+ECR (it creates the repository if it does not exist):
+
+```bash
+export PLACEHOLDER_IMAGE_REPO=123456789012.dkr.ecr.us-east-1.amazonaws.com/aws-lambda/nullplatform-lambda-placeholder
+lambda/scope/placeholder/publish # pushes :latest-arm64 and :latest-amd64
+```
+
+Then set the URI (matching your scope architecture) in `values.yaml` or the agent's
+`extra_envs`:
+
+```yaml
+PLACEHOLDER_IMAGE_URI_DEFAULT: "123456789012.dkr.ecr.us-east-1.amazonaws.com/aws-lambda/nullplatform-lambda-placeholder:latest-arm64"
+```
+
 ### Resource Naming
 | Resource | Format | Example |
@@ -507,6 +568,7 @@ export TOFU_LOCK_TABLE=my-lock-table
 | Issue | Cause | Solution |
 |-------|-------|----------|
 | "Function name too long" | Name exceeds 64 chars | Shorten namespace/application/scope slugs |
+| "Placeholder image not found" | Image scope with no private placeholder published | Run `lambda/scope/placeholder/publish` and set `PLACEHOLDER_IMAGE_URI_DEFAULT` (see [Placeholder Image](#placeholder-image-scope-bootstrap)) |
 | "Provisioned concurrency timeout" | Warmup taking too long | Increase `PROVISIONED_CONCURRENCY_MAX_WAIT_SECONDS` |
 | "ALB listener rule capacity" | Too many rules on ALB | Increase `ALB_LISTENER_RULE_CAPACITY` in values.yaml |
 | "Module not composed" | `MODULES_TO_USE` not updated | Verify setup script appends to `MODULES_TO_USE` |
diff --git a/lambda/deployment/scripts/update_function_code b/lambda/deployment/scripts/update_function_code
index bd91556..27c68e8 100755
--- a/lambda/deployment/scripts/update_function_code
+++ b/lambda/deployment/scripts/update_function_code
@@ -34,6 +34,22 @@ if [ "$package_type" = "Image" ]; then
   fi
   log debug " ✅ image_uri=$IMAGE_URI"
+  # Ensure the image's ECR repo lets the Lambda service pull it. Container-image
+  # Lambdas require a repository policy granting lambda.amazonaws.com; without it
+  # update-function-code fails with "Lambda does not have permission to access
+  # the ECR image". Idempotent and best-effort (cross-account repos may not be
+  # writable from here — Lambda would then need the policy set on the source side).
+  if [[ "$IMAGE_URI" == *.dkr.ecr.*.amazonaws.com/* ]]; then
+    ecr_region=$(echo "${IMAGE_URI%%/*}" | cut -d. -f4)
+    ecr_repo="${IMAGE_URI#*/}"; ecr_repo="${ecr_repo%%:*}"; ecr_repo="${ecr_repo%%@*}"
+    lambda_pull_policy='{"Version":"2008-10-17","Statement":[{"Sid":"LambdaECRImageRetrievalPolicy","Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":["ecr:BatchGetImage","ecr:GetDownloadUrlForLayer"]}]}'
+    if aws ecr set-repository-policy --repository-name "$ecr_repo" --region "$ecr_region" --policy-text "$lambda_pull_policy" >/dev/null 2>&1; then
+      log debug " ✅ ensured Lambda pull policy on ECR repo $ecr_repo"
+    else
+      log warn " ⚠️ could not set Lambda pull policy on ECR repo $ecr_repo (continuing; pull may fail if not already allowed)"
+    fi
+  fi
+
   update_output=$(aws lambda update-function-code \
     --function-name "$LAMBDA_FUNCTION_NAME" \
     --image-uri "$IMAGE_URI" \
diff --git a/lambda/deployment/workflows/blue_green.yaml b/lambda/deployment/workflows/blue_green.yaml
index a73b128..c0913e4 100644
--- a/lambda/deployment/workflows/blue_green.yaml
+++ b/lambda/deployment/workflows/blue_green.yaml
@@ -3,6 +3,16 @@ include:
 configuration:
   DEPLOYMENT_STRATEGY: "blue_green"
 steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
   - name: build_context
     type: script
     file: "$SERVICE_PATH/deployment/build_context"
diff --git a/lambda/deployment/workflows/delete.yaml b/lambda/deployment/workflows/delete.yaml
index 548c749..01b0933 100644
--- a/lambda/deployment/workflows/delete.yaml
+++ b/lambda/deployment/workflows/delete.yaml
@@ -1,6 +1,16 @@
 include:
   - "$SERVICE_PATH/values.yaml"
 steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
   - name: build_context
     type: script
     file: "$SERVICE_PATH/deployment/build_context"
diff --git a/lambda/deployment/workflows/diagnose.yaml b/lambda/deployment/workflows/diagnose.yaml
new file mode 100644
index 0000000..10d6d39
--- /dev/null
+++ b/lambda/deployment/workflows/diagnose.yaml
@@ -0,0 +1,41 @@
+include:
+  - "$SERVICE_PATH/values.yaml"
+steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
+  - name: build_context
+    type: script
+    file: "$SERVICE_PATH/diagnose/build_context"
+    output:
+      - name: SCOPE_ID
+        type: environment
+      - name: SCOPE_NRN
+        type: environment
+      - name: LAMBDA_FUNCTION_NAME
+        type: environment
+      - name: LAMBDA_FUNCTION_ARN
+        type: environment
+      - name: LAMBDA_ROLE_ARN
+        type: environment
+      - name: SCOPE_DOMAIN
+        type: environment
+  - name: diagnose
+    type: executor
+    before_each:
+      name: notify_check_running
+      type: script
+      file: "$SERVICE_PATH/diagnose/notify_check_running"
+    after_each:
+      name: notify_check_results
+      type: script
+      file: "$SERVICE_PATH/diagnose/notify_results"
+    folders:
+      - "$SERVICE_PATH/diagnose/checks"
diff --git a/lambda/deployment/workflows/finalize.yaml b/lambda/deployment/workflows/finalize.yaml
index 2c49db9..98e32e4 100644
--- a/lambda/deployment/workflows/finalize.yaml
+++ b/lambda/deployment/workflows/finalize.yaml
@@ -1,6 +1,16 @@
 include:
   - "$SERVICE_PATH/values.yaml"
 steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
   - name: build_context
     type: script
     file: "$SERVICE_PATH/deployment/build_context"
diff --git a/lambda/deployment/workflows/initial.yaml b/lambda/deployment/workflows/initial.yaml
index 0b4723d..580642a 100644
--- a/lambda/deployment/workflows/initial.yaml
+++ b/lambda/deployment/workflows/initial.yaml
@@ -1,6 +1,16 @@
 include:
   - "$SERVICE_PATH/values.yaml"
 steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
   - name: build_context
     type: script
     file: "$SERVICE_PATH/deployment/build_context"
diff --git a/lambda/deployment/workflows/rollback.yaml b/lambda/deployment/workflows/rollback.yaml
index 49537a4..59c75db 100644
--- a/lambda/deployment/workflows/rollback.yaml
+++ b/lambda/deployment/workflows/rollback.yaml
@@ -1,6 +1,16 @@
 include:
   - "$SERVICE_PATH/values.yaml"
 steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
   - name: build_context
     type: script
     file: "$SERVICE_PATH/deployment/build_context"
diff --git a/lambda/deployment/workflows/switch_traffic.yaml b/lambda/deployment/workflows/switch_traffic.yaml
index b00f893..95023c5 100644
--- a/lambda/deployment/workflows/switch_traffic.yaml
+++ b/lambda/deployment/workflows/switch_traffic.yaml
@@ -1,6 +1,16 @@
 include:
   - "$SERVICE_PATH/values.yaml"
 steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
   - name: build_context
     type: script
     file: "$SERVICE_PATH/deployment/build_context"
diff --git a/lambda/diagnose/build_context b/lambda/diagnose/build_context
index 1ef14b6..0120d32 100755
--- a/lambda/diagnose/build_context
+++ b/lambda/diagnose/build_context
@@ -16,6 +16,10 @@ fi
 source "$SERVICE_PATH/utils/lambda_function_name"
+# NOTE: The IAM role is assumed by the dedicated `assume_role` step that runs
+# first in the workflow (see utils/assume_role_step); credentials are already in
+# the environment here.
+
 lambda_info=$(aws lambda get-function --function-name "$LAMBDA_FUNCTION_NAME" --output json 2>/dev/null || echo "{}")
 LAMBDA_FUNCTION_ARN=$(echo "$lambda_info" | jq -r '.Configuration.FunctionArn // ""')
 LAMBDA_ROLE_ARN=$(echo "$lambda_info" | jq -r '.Configuration.Role // ""')
diff --git a/lambda/installation.md b/lambda/installation.md
index d2d3a08..f34eb07 100644
--- a/lambda/installation.md
+++ b/lambda/installation.md
@@ -29,17 +29,24 @@ git clone https://github.com/nullplatform/tofu-modules /root/.np/nullplatform/to
 ### 2. Configure variables
 ```bash
-cd lambda/tofu
+cd lambda/specs/tofu
 cp terraform.tfvars.example terraform.tfvars
 ```
-Edit `terraform.tfvars` with your values:
+This module registers the scope type **and** provisions the IAM policies the
+agent needs to operate Lambda scopes (formerly the separate `requirements`
+module — now consolidated here). Edit `terraform.tfvars` with your values:
 | Variable | Required | Description |
 |---|---|---|
 | `nrn` | ✅ | Nullplatform Resource Name (`organization:account`) |
 | `np_api_key` | ✅ | Nullplatform API key |
 | `tags_selectors` | ✅ | Tags to select the agent (e.g. `{ environment = "production" }`) |
+| `name` | ✅ | Unique identifier for IAM policy naming (account-global, e.g. `prod-us-east-1`) |
+| `aws_region` | — | AWS provider region. IAM is global; leave unset to resolve from the environment |
+| `create_role` | — | `true` to create a new IAM role and attach the Lambda policies to it |
+| `trusted_arns` | — | Principal ARNs allowed to assume the created role (with `create_role = true`) |
+| `role_name` | — | Existing IAM role to attach the Lambda policies to (instead of `create_role`) |
 | `github_branch` | — | Branch to fetch specs from (default: `main`) |
 | `repo_path` | — | Path where scopes-lambda is cloned on the agent |
 | `overrides_enabled` | — | Set `true` to enable config overrides from scopes-networking |
diff --git a/lambda/instance/workflows/list.yaml b/lambda/instance/workflows/list.yaml
index ad29d14..0efef23 100644
--- a/lambda/instance/workflows/list.yaml
+++ b/lambda/instance/workflows/list.yaml
@@ -1,6 +1,16 @@
 include:
   - "$SERVICE_PATH/values.yaml"
 steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
   - name: build_context
     type: script
     file: "$SERVICE_PATH/instance/build_context"
diff --git a/lambda/log/workflows/log.yaml b/lambda/log/workflows/log.yaml
index 391733d..bb48230 100644
--- a/lambda/log/workflows/log.yaml
+++ b/lambda/log/workflows/log.yaml
@@ -1,6 +1,16 @@
 include:
   - "$SERVICE_PATH/values.yaml"
 steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
   - name: build_context
     type: script
     file: "$SERVICE_PATH/log/build_context"
diff --git a/lambda/metric/workflows/list.yaml b/lambda/metric/workflows/list.yaml
index ecdf27e..e8c2bf6 100644
--- a/lambda/metric/workflows/list.yaml
+++ b/lambda/metric/workflows/list.yaml
@@ -1,6 +1,16 @@
 include:
   - "$SERVICE_PATH/values.yaml"
 steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
   - name: list_metrics
     type: script
     file: "$SERVICE_PATH/metric/list_metrics"
diff --git a/lambda/metric/workflows/metric.yaml b/lambda/metric/workflows/metric.yaml
index a3b3e61..725b76e 100644
--- a/lambda/metric/workflows/metric.yaml
+++ b/lambda/metric/workflows/metric.yaml
@@ -1,6 +1,16 @@
 include:
   - "$SERVICE_PATH/values.yaml"
 steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
   - name: build_context
     type: script
     file: "$SERVICE_PATH/metric/build_context"
diff --git a/lambda/prerequisites.md b/lambda/prerequisites.md
index 9e216f4..c569754 100644
--- a/lambda/prerequisites.md
+++ b/lambda/prerequisites.md
@@ -229,7 +229,7 @@ Agents run in a Kubernetes pod and authenticate to AWS via a **Service Account**
 The IAM policies above let the agent CREATE Lambda functions and target
 groups, but the `create-scope` workflow ALSO depends on three runtime
 artifacts that must exist BEFORE the first scope is created. None are
-auto-created by the bundled `install/tofu/main.tf` today — the operator
+auto-created by the bundled `specs/tofu/main.tf` today — the operator
 must provision them.
 ### 1. Placeholder image (private ECR)
@@ -383,7 +383,7 @@ This applies to **every** ECR repository that ever stores a Lambda image:
 1. The placeholder ECR (created during installation, addressed by
-   `lambda/tofu/main.tf` if you use the bundled module — the policy is
+   `lambda/specs/tofu/main.tf` if you use the bundled module — the policy is
    already applied there).
 2. **The per-application ECR repositories** that `np asset push` creates
    dynamically when each app does its first build, named
diff --git a/lambda/scope/placeholder/publish b/lambda/scope/placeholder/publish
index db5f60d..98de7a7 100755
--- a/lambda/scope/placeholder/publish
+++ b/lambda/scope/placeholder/publish
@@ -51,7 +51,7 @@ if ! docker buildx version &>/dev/null; then
 fi
 # Extract registry host and region from IMAGE_REPO
-ECR_REGISTRY=$(echo "$IMAGE_REPO" | cut -d/ -f1) # 688720756067.dkr.ecr.us-east-1.amazonaws.com
+ECR_REGISTRY=$(echo "$IMAGE_REPO" | cut -d/ -f1) # 123456789012.dkr.ecr.us-east-1.amazonaws.com
 ECR_REGION=$(echo "$ECR_REGISTRY" | cut -d. -f4) # us-east-1
 ECR_REPO_NAME=$(echo "$IMAGE_REPO" | cut -d/ -f2-) # aws-lambda/nullplatform-lambda-placeholder
diff --git a/lambda/scope/scripts/resolve_placeholder_image b/lambda/scope/scripts/resolve_placeholder_image
index ca6a1a1..c196631 100755
--- a/lambda/scope/scripts/resolve_placeholder_image
+++ b/lambda/scope/scripts/resolve_placeholder_image
@@ -38,14 +38,15 @@ log info "🔍 Resolving placeholder image URI..."
 placeholder_image_base="${PLACEHOLDER_IMAGE_URI:-public.ecr.aws/nullplatform/aws-lambda/nullplatform-lambda-placeholder:latest}"
 architecture="${ARCHITECTURE:-arm64}"
-# Lambda uses "x86_64" but images are tagged with Docker convention "amd64"
-arch_tag="${architecture}"
-[ "$architecture" = "x86_64" ] && arch_tag="amd64"
+log debug " 📋 architecture=$architecture"
+# Use the image URI as-is. If PLACEHOLDER_IMAGE_URI is not set, the default
+# :latest tag is used without any architecture suffix — publish arch-specific
+# tags and set PLACEHOLDER_IMAGE_URI explicitly if needed.
 if [[ "$placeholder_image_base" == *":"* ]]; then
-  placeholder_image_uri="${placeholder_image_base}-${arch_tag}"
+  placeholder_image_uri="$placeholder_image_base"
 else
-  placeholder_image_uri="${placeholder_image_base}:latest-${arch_tag}"
+  placeholder_image_uri="${placeholder_image_base}:latest"
 fi
 log debug " 📋 architecture=$architecture"
diff --git a/lambda/scope/tests/scripts/assume_role_lib.bats b/lambda/scope/tests/scripts/assume_role_lib.bats
new file mode 100644
index 0000000..2428799
--- /dev/null
+++ b/lambda/scope/tests/scripts/assume_role_lib.bats
@@ -0,0 +1,172 @@
+#!/usr/bin/env bats
+# Unit tests for the pure resolution functions in utils/assume_role_lib.
+#
+# arn_for_selector_from_json is pure jq — exercised directly.
+# provider_arn_for_selector orchestrates `np provider list` -> `np provider read`;
+# we stub np() branching on its arguments (stateless, so it survives the
+# command-substitution subshells the function uses) instead of a sequential mock.
+
+setup() {
+  TEST_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")" && pwd)"
+  HELPERS_DIR="$TEST_DIR/helpers"
+  LAMBDA_DIR="$(cd "$TEST_DIR/../../.." && pwd)"
+
+  load "$HELPERS_DIR/test_helper.bash"
+
+  # Stub np branching on args; FAKE_NP_MODE tweaks the `provider list` result.
+  np() {
+    local args="$*"
+    case "$args" in
+      *"provider list"*)
+        if [ "${FAKE_NP_MODE:-}" = "no_provider" ]; then
+          echo '{"results":[]}'
+        else
+          echo '{"results":[{"id":"prov-123"}]}'
+        fi
+        ;;
+      *"provider read"*)
+        echo '{"attributes":{"iam_role_arns":{"arns":[{"selector":"my-scope","arn":"arn:aws:iam::123456789012:role/test-lambda-role"}]}}}'
+        ;;
+      *) echo '{}' ;;
+    esac
+  }
+  export -f np
+
+  source "$LAMBDA_DIR/utils/assume_role_lib"
+}
+
+# --- arn_for_selector_from_json (pure) -------------------------------------
+
+JSON='{"attributes":{"iam_role_arns":{"arns":[{"selector":"s3","arn":"arn:aws:iam::111:role/s3"},{"selector":"lambda","arn":"arn:aws:iam::111:role/lambda"}]}}}'
+
+@test "arn_for_selector_from_json: matching selector returns its arn" {
+  run arn_for_selector_from_json "$JSON" lambda
+  assert_success
+  [ "$output" = "arn:aws:iam::111:role/lambda" ]
+}
+
+@test "arn_for_selector_from_json: unknown selector returns empty" {
+  run arn_for_selector_from_json "$JSON" ecs
+  assert_success
+  [ -z "$output" ]
+}
+
+@test "arn_for_selector_from_json: missing arns key returns empty" {
+  run arn_for_selector_from_json '{"attributes":{}}' s3
+  assert_success
+  [ -z "$output" ]
+}
+
+@test "arn_for_selector_from_json: empty input returns empty" {
+  run arn_for_selector_from_json '' s3
+  assert_success
+  [ -z "$output" ]
+}
+
+@test "arn_for_selector_from_json: malformed json returns empty" {
+  run arn_for_selector_from_json 'not json' s3
+  assert_success
+  [ -z "$output" ]
+}
+
+@test "arn_for_selector_from_json: empty selector returns empty" {
+  run arn_for_selector_from_json "$JSON" ''
+  assert_success
+  [ -z "$output" ]
+}
+
+@test "arn_for_selector_from_json: duplicate selector takes first" {
+  local dup='{"attributes":{"iam_role_arns":{"arns":[{"selector":"s3","arn":"first"},{"selector":"s3","arn":"second"}]}}}'
+  run arn_for_selector_from_json "$dup" s3
+  assert_success
+  [ "$output" = "first" ]
+}
+
+# --- provider_arn_for_selector (np list -> read orchestration) -------------
+
+@test "provider_arn_for_selector: resolves arn for matching selector" {
+  run provider_arn_for_selector "organization=1:account=2" my-scope
+  assert_success
+  [ "$output" = "arn:aws:iam::123456789012:role/test-lambda-role" ]
+}
+
+@test "provider_arn_for_selector: no provider instance returns empty" {
+  export FAKE_NP_MODE=no_provider
+  run provider_arn_for_selector "organization=1:account=2" my-scope
+  assert_success
+  [ -z "$output" ]
+}
+
+@test "provider_arn_for_selector: selector not in provider returns empty" {
+  run provider_arn_for_selector "organization=1:account=2" does-not-exist
+  assert_success
+  [ -z "$output" ]
+}
+
+@test "provider_arn_for_selector: empty nrn returns empty" {
+  run provider_arn_for_selector "" my-scope
+  assert_success
+  [ -z "$output" ]
+}
+
+@test "provider_arn_for_selector: empty selector returns empty" {
+  run provider_arn_for_selector "organization=1:account=2" ""
+  assert_success
+  [ -z "$output" ]
+}
+
+# --- resolve_assume_role_arn (full precedence chain) -----------------------
+# Each test defines its own stateless np stub (branches on args) so it survives
+# the command-substitution subshells the resolver uses.
+
+@test "resolve_assume_role_arn: env override wins over everything" {
+  export ASSUME_ROLE_ARN="arn:env"
+  run resolve_assume_role_arn "organization=1:account=2" lambda
+  assert_success
+  [ "$output" = "arn:env" ]
+}
+
+@test "resolve_assume_role_arn: IAM provider when no env override" {
+  np() {
+    case "$*" in
+      *"--specification_slug aws-iam-configuration"*) echo '{"results":[{"id":"iam-1"}]}' ;;
+      *"provider read"*) echo '{"attributes":{"iam_role_arns":{"arns":[{"selector":"lambda","arn":"arn:provider:lambda"}]}}}' ;;
+      *) echo '{}' ;;
+    esac
+  }
+  export -f np
+  run resolve_assume_role_arn "organization=1:account=2" lambda
+  assert_success
+  [ "$output" = "arn:provider:lambda" ]
+}
+
+@test "resolve_assume_role_arn: scope-config fallback when provider misses" {
+  np() {
+    case "$*" in
+      *"--specification_slug aws-iam-configuration"*) echo '{"results":[]}' ;;
+      *"--categories scope-configurations"*) echo '{"results":[{"attributes":{"assume_role":{"arn":"arn:scopecfg:legacy"}}}]}' ;;
+      *) echo '{}' ;;
+    esac
+  }
+  export -f np
+  run resolve_assume_role_arn "organization=1:account=2" lambda
+  assert_success
+  [ "$output" = "arn:scopecfg:legacy" ]
+}
+
+@test "resolve_assume_role_arn: ASSUME_ROLE_ARN_DEFAULT when nothing else resolves" {
+  np() { echo '{"results":[]}'; }
+  export -f np
+  export ASSUME_ROLE_ARN_DEFAULT="arn:default"
+  run resolve_assume_role_arn "organization=1:account=2" lambda
+  assert_success
+  [ "$output" = "arn:default" ]
+}
+
+@test "resolve_assume_role_arn: empty (IRSA) when nothing resolves and no default" {
+  np() { echo '{"results":[]}'; }
+  export -f np
+  run resolve_assume_role_arn "organization=1:account=2" lambda
+  assert_success
+  [ -z "$output" ]
+}
diff --git a/lambda/scope/tofu/do_tofu b/lambda/scope/tofu/do_tofu
index 30954c3..e6d94e5 100755
--- a/lambda/scope/tofu/do_tofu
+++ b/lambda/scope/tofu/do_tofu
@@ -176,14 +176,13 @@ if [ "$TOFU_ACTION" = "apply" ]; then
 fi
-# Run tofu action
 log info "📝 Running tofu $TOFU_ACTION..."
 tofu_exit_code=0
-tofu -chdir="$TF_WORKING_DIR" "$TOFU_ACTION" -auto-approve -var-file="$TOFU_VAR_FILE" || tofu_exit_code=$?
+tofu -chdir="$TF_WORKING_DIR" "$TOFU_ACTION" -auto-approve -var-file="$TOFU_VAR_FILE" 2>&1 || tofu_exit_code=$?
 if [ $tofu_exit_code -ne 0 ]; then
   echo ""
-  echo "❌ Tofu $TOFU_ACTION failed with exit code $tofu_exit_code" >&2
-  echo "" >&2
+  echo "❌ Tofu $TOFU_ACTION failed with exit code $tofu_exit_code"
+  echo ""
   return 1
 fi
diff --git a/lambda/scope/tofu/iam/setup b/lambda/scope/tofu/iam/setup
index 803f4b9..a113d04 100755
--- a/lambda/scope/tofu/iam/setup
+++ b/lambda/scope/tofu/iam/setup
@@ -1,12 +1,30 @@
 #!/bin/bash
 source "$SERVICE_PATH/utils/log"
+source "$SERVICE_PATH/utils/get_config_value"
 log info "🔍 Configuring IAM role for deployment..."
-iam_role_name="${LAMBDA_FUNCTION_NAME}-role"
+# Execution-role name = <prefix><LAMBDA_FUNCTION_NAME>-role. The prefix is configurable
+# (scope-config provider > LAMBDA_EXECUTION_ROLE_PREFIX env > default), but it
+# MUST keep matching the iam:CreateRole/PassRole Resource constraint of the
+# assume role's policy in lambda/setup (arn:aws:iam::*:role/np-lambda-* or
+# .../nullplatform-*); otherwise CreateRole/PassRole are denied. The default
+# preserves the historical "np-lambda-".
+exec_role_prefix=$(get_config_value \
+  --provider '.providers["scope-configurations"].lambda.execution_role_prefix' \
+  --env LAMBDA_EXECUTION_ROLE_PREFIX \
+  --default 'np-lambda-')
+
+iam_role_name="${exec_role_prefix}${LAMBDA_FUNCTION_NAME}-role"
 iam_role_name="${iam_role_name:0:64}"
+# Warn (don't block) if the prefix falls outside the policy's allowed prefixes.
+case "$exec_role_prefix" in
+  np-lambda-*|nullplatform-*) : ;;
+  *) log warn " ⚠️ execution_role_prefix='$exec_role_prefix' is outside the assume role's IAM policy constraint (np-lambda-* / nullplatform-*); CreateRole/PassRole may be denied unless that policy is updated" ;;
+esac
+
 log debug " 📋 role_name=$iam_role_name"
 role_output=$(aws iam get-role --role-name "$iam_role_name" 2>&1)
diff --git a/lambda/scope/workflows/adjust_provisioned_concurrency.yaml b/lambda/scope/workflows/adjust_provisioned_concurrency.yaml
index 93b2a53..4728283 100644
--- a/lambda/scope/workflows/adjust_provisioned_concurrency.yaml
+++ b/lambda/scope/workflows/adjust_provisioned_concurrency.yaml
@@ -1,6 +1,16 @@
 include:
   - "$SERVICE_PATH/values.yaml"
 steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
   - name: build_context
     type: script
     file: "$SERVICE_PATH/scope/build_context"
diff --git a/lambda/scope/workflows/adjust_reserved_concurrency.yaml b/lambda/scope/workflows/adjust_reserved_concurrency.yaml
index 5fab200..3039133 100644
--- a/lambda/scope/workflows/adjust_reserved_concurrency.yaml
+++ b/lambda/scope/workflows/adjust_reserved_concurrency.yaml
@@ -1,6 +1,16 @@
 include:
   - "$SERVICE_PATH/values.yaml"
 steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
   - name: build_context
     type: script
     file: "$SERVICE_PATH/scope/build_context"
diff --git a/lambda/scope/workflows/create.yaml b/lambda/scope/workflows/create.yaml
index de08b9a..8ec86c3 100644
--- a/lambda/scope/workflows/create.yaml
+++ b/lambda/scope/workflows/create.yaml
@@ -3,6 +3,16 @@ include:
 configuration:
   TOFU_ACTION: "apply"
 steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
   - name: build_context
     type: script
     file: "$SERVICE_PATH/scope/build_context"
diff --git a/lambda/scope/workflows/delete.yaml b/lambda/scope/workflows/delete.yaml
index 2fb8faf..c5bdf87 100644
--- a/lambda/scope/workflows/delete.yaml
+++ b/lambda/scope/workflows/delete.yaml
@@ -3,6 +3,16 @@ include:
 configuration:
   TOFU_ACTION: "destroy"
 steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
   - name: build_context
     type: script
     file: "$SERVICE_PATH/scope/build_context"
diff --git a/lambda/scope/workflows/diagnose.yaml b/lambda/scope/workflows/diagnose.yaml
index dd990df..f084fa4 100644
--- a/lambda/scope/workflows/diagnose.yaml
+++ b/lambda/scope/workflows/diagnose.yaml
@@ -1,6 +1,16 @@
 include:
   - "$SERVICE_PATH/values.yaml"
 steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
   - name: build_context
     type: script
     file: "$SERVICE_PATH/scope/build_context"
diff --git a/lambda/scope/workflows/invoke.yaml b/lambda/scope/workflows/invoke.yaml
index 58af584..fb9a25a 100644
--- a/lambda/scope/workflows/invoke.yaml
+++ b/lambda/scope/workflows/invoke.yaml
@@ -1,6 +1,16 @@
 include:
   - "$SERVICE_PATH/values.yaml"
 steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
   - name: build_context
     type: script
     file: "$SERVICE_PATH/scope/build_context"
diff --git a/lambda/scope/workflows/update.yaml b/lambda/scope/workflows/update.yaml
index f9a6bc4..9969bc8 100644
--- a/lambda/scope/workflows/update.yaml
+++ b/lambda/scope/workflows/update.yaml
@@ -1,6 +1,16 @@
 include:
   - "$SERVICE_PATH/values.yaml"
 steps:
+  - name: assume_role
+    type: script
+    file: "$SERVICE_PATH/utils/assume_role_step"
+    output:
+      - name: AWS_ACCESS_KEY_ID
+        type: environment
+      - name: AWS_SECRET_ACCESS_KEY
+        type: environment
+      - name: AWS_SESSION_TOKEN
+        type: environment
   - name: build_context
     type: script
     file: "$SERVICE_PATH/scope/build_context"
diff --git a/lambda/specs/tofu/backend.tf b/lambda/specs/tofu/backend.tf
index 7330a13..cc6acba 100644
--- a/lambda/specs/tofu/backend.tf
+++ b/lambda/specs/tofu/backend.tf
@@ -1,5 +1,5 @@
 terraform {
   backend "s3" {
-    key = "lambda/install/terraform.tfstate"
+    key = "lambda/specs/tofu/terraform.tfstate"
   }
 }
diff --git a/lambda/specs/tofu/outputs.tf b/lambda/specs/tofu/outputs.tf
new file mode 100644
index 0000000..a3570a3
--- /dev/null
+++ b/lambda/specs/tofu/outputs.tf
@@ -0,0 +1,29 @@
+output "lambda_policy_arn" {
+  description = "ARN of the Lambda core management policy"
+  value = aws_iam_policy.nullplatform_lambda_policy.arn
+}
+
+output "lambda_iam_policy_arn" {
+  description = "ARN of the IAM execution role management policy"
+  value = aws_iam_policy.nullplatform_lambda_iam_policy.arn
+}
+
+output "lambda_networking_policy_arn" {
+  description = "ARN of the networking policy (API GW + ALB + Route53)"
+  value = aws_iam_policy.nullplatform_lambda_networking_policy.arn
+}
+
+output "lambda_storage_policy_arn" {
+  description = "ARN of the storage & observability policy (ECR + SM + CW + S3)"
+  value = aws_iam_policy.nullplatform_lambda_storage_policy.arn
+}
+
+output "role_arn" {
+  description = "ARN of the IAM role created by this module. Empty string when create_role is false."
+  value = var.create_role ? aws_iam_role.nullplatform_lambda_role[0].arn : ""
+}
+
+output "role_name" {
+  description = "Name of the IAM role created by this module. Empty string when create_role is false."
+  value = var.create_role ? aws_iam_role.nullplatform_lambda_role[0].name : ""
+}
diff --git a/lambda/specs/tofu/provider.tf b/lambda/specs/tofu/provider.tf
index ef613db..5ea18b8 100644
--- a/lambda/specs/tofu/provider.tf
+++ b/lambda/specs/tofu/provider.tf
@@ -2,7 +2,7 @@ terraform {
   required_providers {
     nullplatform = {
       source = "nullplatform/nullplatform"
-      version = "0.0.87, < 0.1.0"
+      version = ">= 0.0.90, < 0.1.0"
     }
     http = {
       source = "hashicorp/http"
@@ -16,9 +16,17 @@ terraform {
       source = "hashicorp/null"
       version = "~> 3.2"
     }
+    aws = {
+      source = "hashicorp/aws"
+      version = "~> 6.47.0"
+    }
   }
 }
 provider "nullplatform" {
   api_key = var.np_api_key
 }
+
+provider "aws" {
+  region = var.aws_region
+}
diff --git a/lambda/specs/tofu/requirements.tf b/lambda/specs/tofu/requirements.tf
new file mode 100644
index 0000000..ca10521
--- /dev/null
+++ b/lambda/specs/tofu/requirements.tf
@@ -0,0 +1,304 @@
+################################################################################
+# IAM role (only when create_role = true)
+################################################################################
+
+resource "aws_iam_role" "nullplatform_lambda_role" {
+  count = var.create_role ? 1 : 0
+  name = "nullplatform_${var.name}_lambda_role"
+
+  assume_role_policy = jsonencode({
+    Version = "2012-10-17"
+    Statement = [
+      {
+        Effect = "Allow"
+        Principal = { AWS = var.trusted_arns }
+        Action = "sts:AssumeRole"
+      }
+    ]
+  })
+}
+
+################################################################################
+# Policy attachments
+################################################################################
+
+locals {
+  effective_role_name = var.create_role ? aws_iam_role.nullplatform_lambda_role[0].name : var.role_name
+  attach_policies = var.create_role || var.role_name != null
+}
+
+resource "aws_iam_role_policy_attachment" "lambda" {
+  count = local.attach_policies ? 1 : 0
+  role = local.effective_role_name
+  policy_arn = aws_iam_policy.nullplatform_lambda_policy.arn
+}
+
+resource "aws_iam_role_policy_attachment" "lambda_iam" {
+  count = local.attach_policies ? 1 : 0
+  role = local.effective_role_name
+  policy_arn = aws_iam_policy.nullplatform_lambda_iam_policy.arn
+}
+
+resource "aws_iam_role_policy_attachment" "lambda_networking" {
+  count = local.attach_policies ? 1 : 0
+  role = local.effective_role_name
+  policy_arn = aws_iam_policy.nullplatform_lambda_networking_policy.arn
+}
+
+resource "aws_iam_role_policy_attachment" "lambda_storage" {
+  count = local.attach_policies ? 1 : 0
+  role = local.effective_role_name
+  policy_arn = aws_iam_policy.nullplatform_lambda_storage_policy.arn
+}
+
+################################################################################
+# Lambda core policy
+# Manages Lambda functions, versions, aliases, concurrency, and invocations.
+################################################################################ + +resource "aws_iam_policy" "nullplatform_lambda_policy" { + name = "nullplatform_${var.name}_lambda_policy" + description = "Policy for managing Lambda functions provisioned by the scopes-lambda provider" + + policy = jsonencode({ + Version = "2012-10-17" + Statement = [ + { + Effect = "Allow" + Action = [ + "lambda:CreateFunction", + "lambda:DeleteFunction", + "lambda:GetFunction", + "lambda:GetFunctionConfiguration", + "lambda:GetFunctionConcurrency", + "lambda:UpdateFunctionCode", + "lambda:UpdateFunctionConfiguration", + "lambda:PublishVersion", + "lambda:ListVersionsByFunction", + "lambda:GetAlias", + "lambda:ListAliases", + "lambda:CreateAlias", + "lambda:UpdateAlias", + "lambda:DeleteAlias", + "lambda:InvokeFunction", + "lambda:PutFunctionConcurrency", + "lambda:DeleteFunctionConcurrency", + "lambda:PutProvisionedConcurrencyConfig", + "lambda:DeleteProvisionedConcurrencyConfig", + "lambda:GetProvisionedConcurrencyConfig", + "lambda:GetAccountSettings", + "lambda:AddPermission", + "lambda:RemovePermission", + "lambda:TagResource", + "lambda:UntagResource", + "lambda:ListTags" + ] + Resource = "*" + } + ] + }) +} + +################################################################################ +# IAM management policy +# Creates and manages Lambda execution roles (scoped to nullplatform roles). 
+################################################################################ + +resource "aws_iam_policy" "nullplatform_lambda_iam_policy" { + name = "nullplatform_${var.name}_lambda_iam_policy" + description = "Policy for managing IAM execution roles for Lambda scopes" + + policy = jsonencode({ + Version = "2012-10-17" + Statement = [ + { + Effect = "Allow" + Action = [ + "iam:CreateRole", + "iam:GetRole", + "iam:DeleteRole", + "iam:PutRolePolicy", + "iam:GetRolePolicy", + "iam:DeleteRolePolicy", + "iam:ListRolePolicies", + "iam:AttachRolePolicy", + "iam:DetachRolePolicy", + "iam:ListAttachedRolePolicies", + "iam:TagRole", + "iam:UntagRole", + "iam:PassRole" + ] + Resource = [ + "arn:aws:iam::*:role/nullplatform-*", + "arn:aws:iam::*:role/np-lambda-*" + ] + }, + { + Effect = "Allow" + Action = ["sts:GetCallerIdentity"] + Resource = "*" + } + ] + }) +} + +################################################################################ +# Networking policy +# API Gateway (HTTP APIs), ALB (target groups + listener rules), Route53 DNS. 
+################################################################################ + +resource "aws_iam_policy" "nullplatform_lambda_networking_policy" { + name = "nullplatform_${var.name}_lambda_networking_policy" + description = "Policy for managing API Gateway, ALB, and Route53 for Lambda scopes" + + policy = jsonencode({ + Version = "2012-10-17" + Statement = [ + { + Effect = "Allow" + Action = [ + "apigateway:GET", + "apigateway:POST", + "apigateway:PUT", + "apigateway:PATCH", + "apigateway:DELETE", + "apigateway:TagResource", + "apigateway:UntagResource" + ] + Resource = "*" + }, + { + Effect = "Allow" + Action = [ + "elasticloadbalancing:CreateTargetGroup", + "elasticloadbalancing:DeleteTargetGroup", + "elasticloadbalancing:ModifyTargetGroup", + "elasticloadbalancing:ModifyTargetGroupAttributes", + "elasticloadbalancing:DescribeTargetGroups", + "elasticloadbalancing:DescribeTargetGroupAttributes", + "elasticloadbalancing:RegisterTargets", + "elasticloadbalancing:DeregisterTargets", + "elasticloadbalancing:DescribeTargetHealth", + "elasticloadbalancing:CreateListenerRule", + "elasticloadbalancing:DeleteListenerRule", + "elasticloadbalancing:ModifyListenerRule", + "elasticloadbalancing:DescribeRules", + "elasticloadbalancing:DescribeListeners", + "elasticloadbalancing:AddTags", + "elasticloadbalancing:RemoveTags" + ] + Resource = "*" + }, + { + Effect = "Allow" + Action = [ + "route53:ChangeResourceRecordSets", + "route53:GetHostedZone", + "route53:ListResourceRecordSets", + "route53:ListHostedZones" + ] + Resource = "*" + } + ] + }) +} + +################################################################################ +# Storage & Observability policy +# ECR (placeholder image), Secrets Manager (deployment parameters), +# CloudWatch Logs & Metrics, S3 (tfstate bucket). 
+################################################################################ + +resource "aws_iam_policy" "nullplatform_lambda_storage_policy" { + name = "nullplatform_${var.name}_lambda_storage_policy" + description = "Policy for ECR, Secrets Manager, CloudWatch, and S3 tfstate for Lambda scopes" + + policy = jsonencode({ + Version = "2012-10-17" + Statement = [ + { + Sid = "ECR" + Effect = "Allow" + Action = [ + "ecr:GetAuthorizationToken", + "ecr:CreateRepository", + "ecr:DescribeRepositories", + "ecr:DescribeImages", + "ecr:BatchGetImage", + "ecr:GetDownloadUrlForLayer", + "ecr:InitiateLayerUpload", + "ecr:UploadLayerPart", + "ecr:CompleteLayerUpload", + "ecr:PutImage", + "ecr:BatchCheckLayerAvailability", + "ecr:TagResource", + "ecr:GetRepositoryPolicy", + "ecr:SetRepositoryPolicy" + ] + Resource = "*" + }, + { + Sid = "SecretsManager" + Effect = "Allow" + Action = [ + "secretsmanager:CreateSecret", + "secretsmanager:PutSecretValue", + "secretsmanager:GetSecretValue", + "secretsmanager:DescribeSecret", + "secretsmanager:ListSecrets", + "secretsmanager:DeleteSecret", + "secretsmanager:TagResource" + ] + Resource = "arn:aws:secretsmanager:*:*:secret:nullplatform/*" + }, + { + Sid = "CloudWatchLogs" + Effect = "Allow" + Action = [ + "logs:CreateLogGroup", + "logs:DeleteLogGroup", + "logs:DescribeLogGroups", + "logs:CreateLogStream", + "logs:PutLogEvents", + "logs:FilterLogEvents", + "logs:GetLogEvents", + "logs:PutRetentionPolicy", + "logs:TagLogGroup", + "logs:ListTagsForResource", + "logs:TagResource", + "logs:UntagResource" + ] + Resource = "*" + }, + { + Sid = "CloudWatchMetrics" + Effect = "Allow" + Action = [ + "cloudwatch:GetMetricStatistics", + "cloudwatch:ListMetrics", + "cloudwatch:GetMetricData" + ] + Resource = "*" + }, + { + Sid = "S3Tfstate" + Effect = "Allow" + Action = [ + "s3:CreateBucket", + "s3:HeadBucket", + "s3:PutBucketVersioning", + "s3:ListBucket", + "s3:ListBucketVersions", + "s3:GetObject", + "s3:PutObject", + "s3:DeleteObject", + 
"s3:DeleteObjectVersion" + ] + Resource = [ + "arn:aws:s3:::nullplatform-lambda-tfstate-*", + "arn:aws:s3:::nullplatform-lambda-tfstate-*/*" + ] + } + ] + }) +} diff --git a/lambda/specs/tofu/terraform.tfvars.example b/lambda/specs/tofu/terraform.tfvars.example index b5951da..7861244 100644 --- a/lambda/specs/tofu/terraform.tfvars.example +++ b/lambda/specs/tofu/terraform.tfvars.example @@ -12,6 +12,23 @@ tags_selectors = { environment = "production" } +# Unique identifier for IAM policy naming (policy names are account-global). +name = "prod-us-east-1" + +################################################################################ +# IAM permissions (optional) +################################################################################ + +# AWS provider region (IAM is global; leave unset to resolve from the environment). +# aws_region = "us-east-1" + +# Attach the Lambda policies to a brand-new role (and trust the given principals)... +# create_role = true +# trusted_arns = ["arn:aws:iam::123456789012:role/my-agent-role"] + +# ...or attach them to an existing role instead: +# role_name = "my-existing-agent-role" + ################################################################################ # Repository (override if using a fork or private mirror) ################################################################################ diff --git a/lambda/specs/tofu/variables.tf b/lambda/specs/tofu/variables.tf index 0a1c9ba..52ad529 100644 --- a/lambda/specs/tofu/variables.tf +++ b/lambda/specs/tofu/variables.tf @@ -95,3 +95,38 @@ variable "overrides_service_path" { type = string default = null } + +################################################################################ +# IAM permissions (requirements) +# Policies the agent needs to operate Lambda scopes. IAM is global, but the AWS +# provider still needs a region to initialize. 
+################################################################################ + +variable "aws_region" { + description = "AWS region used to initialize the AWS provider. IAM resources are global; leave null to resolve from the environment (AWS_REGION / profile)." + type = string + default = null +} + +variable "name" { + description = "Unique identifier for policy naming. Must be unique per AWS account (IAM policy names are account-global). Example: \"prod-us-east-1\"." + type = string +} + +variable "create_role" { + description = "When true, creates a new IAM role and attaches all policies to it. The role will allow the ARNs in trusted_arns to assume it via sts:AssumeRole." + type = bool + default = false +} + +variable "role_name" { + description = "Existing IAM role name to attach the Lambda policies to. Ignored when create_role is true." + type = string + default = null +} + +variable "trusted_arns" { + description = "List of IAM principal ARNs allowed to assume the role. Only used when create_role is true." + type = list(string) + default = [] +} diff --git a/lambda/utils/assume_role b/lambda/utils/assume_role new file mode 100755 index 0000000..0c851de --- /dev/null +++ b/lambda/utils/assume_role @@ -0,0 +1,41 @@ +#!/bin/bash +# Sourceable helper — do NOT execute directly. +# Reads ASSUME_ROLE_ARN from environment. If set, calls sts:AssumeRole and exports +# temporary credentials so all subsequent AWS calls (CLI + Tofu) use that role. +# If empty, does nothing — pod's IRSA handles auth directly. +# +# Requires: aws CLI, jq +# Expects: ASSUME_ROLE_ARN (exported by fetch_scope_configuration or values.yaml) +# SCOPE_ID (optional, used for the session name) + +_ar_log() { + if declare -f log > /dev/null 2>&1; then + log "$1" "$2" + else + echo "$2" + fi +} + +if [ -n "${ASSUME_ROLE_ARN:-}" ]; then + _ar_log info " 🔑 Assuming role: $ASSUME_ROLE_ARN" + + _ar_sts_error=$(mktemp) + if ! 
ASSUMED_CREDS=$(aws sts assume-role \ + --role-arn "$ASSUME_ROLE_ARN" \ + --role-session-name "np-lambda-${SCOPE_ID:-workflow}" \ + --output json 2>"$_ar_sts_error"); then + _ar_log info "ERROR: sts:AssumeRole failed for $ASSUME_ROLE_ARN" + _ar_log info "$(cat "$_ar_sts_error")" + rm -f "$_ar_sts_error" + return 1 + fi + rm -f "$_ar_sts_error" + + export AWS_ACCESS_KEY_ID=$(echo "$ASSUMED_CREDS" | jq -r '.Credentials.AccessKeyId') + export AWS_SECRET_ACCESS_KEY=$(echo "$ASSUMED_CREDS" | jq -r '.Credentials.SecretAccessKey') + export AWS_SESSION_TOKEN=$(echo "$ASSUMED_CREDS" | jq -r '.Credentials.SessionToken') + + _ar_log info " ✅ Role assumed successfully" +else + _ar_log debug " ✅ assume_role=skipped (using pod credentials)" +fi diff --git a/lambda/utils/assume_role_lib b/lambda/utils/assume_role_lib new file mode 100644 index 0000000..7f2b503 --- /dev/null +++ b/lambda/utils/assume_role_lib @@ -0,0 +1,78 @@ +#!/bin/bash +# Sourceable library of PURE helpers for assume-role resolution. +# Defines functions only — NO side effects on source, so it can be unit-tested +# (see scope/tests/scripts/assume_role_lib.bats) and reused by the scripts that +# resolve the role to assume (fetch_scope_configuration, diagnose/build_context). + +# arn_for_selector_from_json +# Given the JSON returned by `np provider read --id <id> --format json` and a +# selector string, echoes the matching IAM role ARN, or empty string if there +# is no match / the input is missing or malformed. First match wins. +arn_for_selector_from_json() { + local json="$1" selector="$2" + [ -n "$json" ] || return 0 + [ -n "$selector" ] || return 0 + printf '%s' "$json" | jq -r --arg sel "$selector" ' + [ .attributes.iam_role_arns.arns[]? + | select(.selector == $sel) + | .arn ] + | first // ""' 2>/dev/null +} + +# provider_arn_for_selector +# Looks up the "AWS IAM" provider (specification aws-iam-configuration, category +# "Identity & Access Control") at <nrn>, reads it, and echoes the ARN matching +# <selector>.
Empty string if no provider / no match. Requires np + jq. +# NOTE: `np provider list` does NOT return deep attributes, so we list to get the +# provider id and then `np provider read --id` to obtain the arns (same two-step +# pattern used for account.region resolution in fetch_scope_configuration). +provider_arn_for_selector() { + local nrn="$1" selector="$2" + [ -n "$nrn" ] || return 0 + [ -n "$selector" ] || return 0 + + local pid data + pid=$(np provider list --nrn "$nrn" \ + --specification_slug aws-iam-configuration \ + --format json --limit 100 2>/dev/null \ + | jq -r '[ (.results // [])[] ] | first | .id // ""' 2>/dev/null) + [ -n "$pid" ] && [ "$pid" != "null" ] || return 0 + + data=$(np provider read --id "$pid" --format json 2>/dev/null) + arn_for_selector_from_json "$data" "$selector" +} + +# scope_config_assume_role_arn +# Back-compat fallback: reads the scope-configurations provider(s) at <nrn> and +# echoes the first .attributes.assume_role.arn found, or empty string. This is +# the legacy per-scope override that predates the IAM provider mechanism. +scope_config_assume_role_arn() { + local nrn="$1" + [ -n "$nrn" ] || return 0 + np provider list --nrn "$nrn" --categories scope-configurations \ + --format json --limit 100 2>/dev/null \ + | jq -r '[ (.results // [])[] | .attributes.assume_role.arn? // empty ] | first // ""' 2>/dev/null +} + +# resolve_assume_role_arn +# Full precedence chain, echoes the IAM role ARN to assume (empty = use IRSA): +# 1. $ASSUME_ROLE_ARN env var (explicit override) +# 2. "AWS IAM" provider (aws-iam-configuration) at <nrn> by <selector> +# 3. scope-configurations provider key assume_role.arn (back-compat) +# 4.
$ASSUME_ROLE_ARN_DEFAULT env var (per-account agent default) +resolve_assume_role_arn() { + local nrn="$1" selector="$2" arn="" + + arn="${ASSUME_ROLE_ARN:-}" + + if [ -z "$arn" ] && [ -n "$nrn" ] && [ -n "$selector" ]; then + arn=$(provider_arn_for_selector "$nrn" "$selector") + fi + + if [ -z "$arn" ] && [ -n "$nrn" ]; then + arn=$(scope_config_assume_role_arn "$nrn") + fi + + arn="${arn:-${ASSUME_ROLE_ARN_DEFAULT:-}}" + printf '%s' "$arn" +} diff --git a/lambda/utils/assume_role_step b/lambda/utils/assume_role_step new file mode 100755 index 0000000..ccc86cc --- /dev/null +++ b/lambda/utils/assume_role_step @@ -0,0 +1,38 @@ +#!/bin/bash +# Dedicated workflow step: resolve the target IAM role and assume it, exporting +# temporary credentials so every subsequent step in the workflow inherits them. +# +# Runs FIRST in each AWS-touching workflow. The workflow YAML must declare +# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN as +# output:environment so the nullplatform engine propagates them downstream. +# +# Resolution precedence (see resolve_assume_role_arn in assume_role_lib): +# $ASSUME_ROLE_ARN env -> IAM provider by selector -> scope-config -> DEFAULT -> IRSA +# +# Requires: aws CLI, jq, np. Expects: CONTEXT, SERVICE_PATH. + +# Optional pretty logging (utils/assume_role falls back to echo if `log` absent). +[ -f "$SERVICE_PATH/utils/log" ] && source "$SERVICE_PATH/utils/log" +# shellcheck source=assume_role_lib +source "$SERVICE_PATH/utils/assume_role_lib" + +# Account NRN from CONTEXT (scope / service / generic event), stripping the +# :namespace= segment and everything after it — the IAM provider is account-level. 
+NRN=$(echo "${CONTEXT:-}" | jq -r '.scope.nrn // .service.nrn // .entity_nrn // ""' 2>/dev/null) +ACCOUNT_NRN=$(echo "$NRN" | sed 's/:namespace=.*$//') +ASSUME_ROLE_SELECTOR="${ASSUME_ROLE_SELECTOR:-lambda}" + +ASSUME_ROLE_ARN=$(resolve_assume_role_arn "$ACCOUNT_NRN" "$ASSUME_ROLE_SELECTOR") +export ASSUME_ROLE_ARN + +# utils/assume_role performs sts:AssumeRole and exports AWS_* when an ARN is set, +# or no-ops (leaving pod IRSA in place) when empty. It returns non-zero only when +# sts:AssumeRole itself fails. +if ! source "$SERVICE_PATH/utils/assume_role"; then + echo "❌ assume_role step failed: could not assume $ASSUME_ROLE_ARN" >&2 + echo "💡 Possible causes:" >&2 + echo " - The agent's pod role is not allowed to sts:AssumeRole the target role" >&2 + echo " - The target role's trust policy does not trust the agent role" >&2 + echo " - The resolved ARN is wrong (check the IAM provider selector=$ASSUME_ROLE_SELECTOR)" >&2 + exit 1 +fi diff --git a/lambda/utils/fetch_scope_configuration b/lambda/utils/fetch_scope_configuration index 63429b7..5ab0cc5 100755 --- a/lambda/utils/fetch_scope_configuration +++ b/lambda/utils/fetch_scope_configuration @@ -67,15 +67,22 @@ HOSTED_PRIVATE_ZONE_ID=$(echo "$CLOUD_PROVIDER_CONFIG" | jq -r '.networking.host log debug " ✅ hosted_private_zone_id=$HOSTED_PRIVATE_ZONE_ID" # From scope-configurations category -TOFU_STATE_BUCKET=$(echo "$SCOPE_CONFIG" | jq -r '.state.tofu_state_bucket // empty') +TOFU_STATE_BUCKET=$(echo "$SCOPE_CONFIG" | jq -r '.state.tofu_state_bucket // .provider.aws_state_bucket // empty') log debug " ✅ tofu_state_bucket=$TOFU_STATE_BUCKET" PLACEHOLDER_IMAGE_URI=$(echo "$SCOPE_CONFIG" | jq -r '.deployment.placeholder_image_uri // empty') -log debug " ✅ placeholder_image_uri=$PLACEHOLDER_IMAGE_URI" +# Fallback to env var set in values.yaml when the provider does not supply it. 
+PLACEHOLDER_IMAGE_URI="${PLACEHOLDER_IMAGE_URI:-${PLACEHOLDER_IMAGE_URI_DEFAULT:-}}" +log debug " ✅ placeholder_image_uri=${PLACEHOLDER_IMAGE_URI:-(not set, using script default)}" NULL_AGENT_LAYER_ARN=$(echo "$SCOPE_CONFIG" | jq -r '.agent.null_agent_layer_arn // empty') log debug " ✅ null_agent_layer_arn=$NULL_AGENT_LAYER_ARN" +# NOTE: The IAM role is assumed by the dedicated `assume_role` step that runs +# first in each workflow (see utils/assume_role_step). By the time this script +# runs, AWS_ACCESS_KEY_ID/SECRET/SESSION_TOKEN are already in the environment, so +# no assume-role resolution happens here anymore. + export ALB_PUBLIC_LISTENER_ARN export ALB_PRIVATE_LISTENER_ARN export VPC_ID diff --git a/lambda/values.yaml b/lambda/values.yaml index f8891bd..179dd6b 100644 --- a/lambda/values.yaml +++ b/lambda/values.yaml @@ -35,6 +35,36 @@ configuration: # ── Null Agent ───────────────────────────────────────────────────────────── USE_NULL_AGENT: false + # ── Placeholder image ────────────────────────────────────────────────────── + # Container image used to bootstrap the Lambda function at scope creation, + # before the first real deployment. MUST live in a private ECR in the same + # account/region (Lambda rejects public.ecr.aws images), and be single-arch + # matching the scope architecture (publish :latest-amd64 / :latest-arm64). + # Resolution precedence (see utils/fetch_scope_configuration): + # scope-configurations provider key deployment.placeholder_image_uri + # > PLACEHOLDER_IMAGE_URI_DEFAULT env var (set per-account on the agent) + # > the public default in scope/scripts/resolve_placeholder_image + # Default placeholder image for this installation. Account-specific, but + # committed here intentionally. Can still be overridden per scope via the + # scope-config provider key deployment.placeholder_image_uri, or per agent + # via extra_envs in Helm. 
+ PLACEHOLDER_IMAGE_URI_DEFAULT: "235494813897.dkr.ecr.us-east-1.amazonaws.com/aws-lambda/nullplatform-lambda-placeholder:latest" + + # ── Assume Role ──────────────────────────────────────────────────────────── + # IAM role ARN to assume before any AWS operation. + # Resolution precedence (see resolve_assume_role_arn in utils/assume_role_lib): + # 1. ASSUME_ROLE_ARN env var (explicit override) + # 2. "AWS IAM" provider (Identity & Access Control, spec aws-iam-configuration) + # at the account NRN, matched by selector. The selector defaults to + # "lambda"; override it with ASSUME_ROLE_SELECTOR if the provider's arns + # list uses a different key. + # 3. scope-configurations provider key assume_role.arn (back-compat) + # 4. ASSUME_ROLE_ARN_DEFAULT env var (set per-account on the agent) + # All account-specific, so none are set here — provide them via the IAM + # provider, the scope-config, or the agent's extra_envs (Helm). If nothing + # resolves, the agent's own pod credentials (IRSA) are used. + # ASSUME_ROLE_SELECTOR: "lambda" + # ── IAM ──────────────────────────────────────────────────────────────────── IAM_PROPAGATION_WAIT_SECONDS: 20