Skip to content

fix: surface stale CDK construct synth errors and validate gateway name length#1652

Open
aidandaly24 wants to merge 1 commit into
mainfrom
fix/gateway-stale-cdk-construct-and-name-length
Open

fix: surface stale CDK construct synth errors and validate gateway name length#1652
aidandaly24 wants to merge 1 commit into
mainfrom
fix/gateway-stale-cdk-construct-and-name-length

Conversation

@aidandaly24

Copy link
Copy Markdown
Contributor

Description

Adding a gateway then deploying could fail two opaque ways. This PR turns both into clear, actionable errors raised before AWS/CDK ever sees them.

1. Stale bundled CDK construct → baffling synth error. A project created with an older CLI pins @aws/agentcore-cdk in agentcore/cdk/node_modules via a create-time lockfile. The deploy flow only runs npm run build (tsc) and never refreshes that dependency. Once the CLI is upgraded to a version that writes protocolType into the gateway config, the frozen older construct's strict schema rejects the key at synth with unknown keys (remove): "protocolType" — with no hint that the cause is a stale bundled construct.

synthesizeCdk now detects an unknown keys (remove) synth failure and rewrites it to a StaleCdkConstructError that names the rejected key(s), reports the installed construct version, and tells the user to run npm update @aws/agentcore-cdk in agentcore/cdk/. This is provable, not presumptuous: the CLI strict-validates agentcore.json against its own schema in preflight before synth, so a genuine bad/typo'd key fails earlier — any unknown keys failure reaching synth is necessarily version skew. The original synth error is preserved as { cause } (still visible via AGENTCORE_DEBUG=1), and unrelated synth errors pass through untouched.

2. Gateway name length only validated by AWS at deploy. The deployed gateway resource name is composed as ${projectName}-${gatewayName} and AWS caps it at 48 chars (([0-9a-zA-Z][-]?){1,48}). A too-long name passed add gateway and synth, then failed mid-deploy with an opaque CloudFormation CREATE_FAILED. The composed name is now validated against the limit at add gateway (fails the moment the command is run) and in deploy preflight via validateGatewayNames (backstop for imported/hand-edited configs), mirroring the existing validateRuntimeNames. Imported gateways carrying an explicit resourceName are skipped.

No changes to @aws/agentcore-cdk or src/assets/ are required — the latest construct already supports protocolType; the bug is purely that deploy never refreshes the frozen install.

Related Issue

Closes #1651

Documentation PR

N/A — no user-facing docs change; both are clearer error messages on existing flows.

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Other (please describe):

Testing

How have you tested the change?

  • I ran npm run test:unit and npm run test:integ
  • I ran npm run typecheck
  • I ran npm run lint
  • If I modified src/assets/, I ran npm run test:update-snapshots and committed the updated snapshots (N/A — no src/assets/ changes)

Added 12 tests:

  • extractUnknownKeys / rewriteIfStaleCdkConstruct — single/multiple keys, nested cause chain (the real propagation shape), unrelated errors pass through, non-Error normalization.
  • validateGatewayNames via validateProject — over-limit rejection (the reporter's exact 51-char name), exactly-at-limit acceptance, imported-gateway skip.
  • add gateway e2e — composed name over 48 chars is rejected with the expected message.

Also reproduced Fix B end-to-end with the reporter's exact inputs:

$ agentcore add gateway --name ZeroTrustTodayAgentCoreGateway ... (project ZeroTrustTodayACProj)
{"success":false,"error":"Gateway name too long: \"ZeroTrustTodayACProj-ZeroTrustTodayAgentCoreGateway\" (51 chars). AWS limits gateway names to 48 characters. ..."}

Note on Fix A test coverage: a pure downgrade-only live repro isn't possible from the registry — every published construct old enough to reject gateway protocolType (≤ alpha.36) is too old to compile against the current vended template (missing HarnessSpecSchema), and every version that compiles (alpha.38+) already accepts protocolType. Fix A is covered by unit tests feeding the exact construct error string from the reporter's log plus verification that it surfaces through the full deploy → display path; the original is preserved as the error cause.

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly (N/A)
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

…me length

Adding a gateway then deploying could fail two opaque ways:

1. CDK synth rejected `unknown keys (remove): "protocolType"` when the
   project's bundled @aws/agentcore-cdk (frozen in node_modules by a
   create-time lockfile) was older than the CLI, which now writes
   protocolType into the gateway config. The deploy flow only runs
   `npm run build`, never refreshing the dependency, so the error gave
   no hint that the bundled construct was stale.

2. Gateway names were only validated by AWS at deploy, failing mid-deploy
   with an opaque CloudFormation CREATE_FAILED when the composed
   `${projectName}-${gatewayName}` exceeded the 48-char service limit.

Fix A: in synthesizeCdk, detect an "unknown keys (remove)" synth failure
and rewrite it to a StaleCdkConstructError that names the rejected keys,
reports the installed construct version, and tells the user to run
`npm update @aws/agentcore-cdk`. This is provable rather than presumptuous:
the CLI strict-validates agentcore.json against its own schema in preflight
before synth, so a genuine bad key fails earlier — any unknown-keys failure
reaching synth is necessarily version skew. The original error is preserved
as the cause (visible via AGENTCORE_DEBUG). Unrelated errors pass through.

Fix B: validate the composed gateway name against the 48-char limit at
`add gateway` (fails immediately) and in deploy preflight via
validateGatewayNames (backstop for imported/hand-edited configs),
mirroring the existing validateRuntimeNames. Imported gateways with an
explicit resourceName are skipped.

Closes #1651
@aidandaly24 aidandaly24 requested a review from a team June 26, 2026 19:42
@github-actions github-actions Bot added the size/m PR size: M label Jun 26, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 26, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 26, 2026
@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 26, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Package Tarball

aws-agentcore-0.21.1.tgz

How to install

gh release download pr-1652-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.21.1.tgz

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Two small, focused fixes that turn previously-opaque deploy failures into clear, actionable errors.

Things I checked:

  • Version-skew rewrite is provably narrow. AgentCoreGatewaySchema is .strict() and is parsed by the CLI in preflight before synth, so any unknown keys (remove) reaching synth necessarily implies the bundled construct's parser is stricter (i.e. older) than the CLI's — the comment in rewriteIfStaleCdkConstruct accurately captures this.
  • Regex against the construct's actual output. UNKNOWN_KEYS_PATTERN matches the exact format produced by @aws/agentcore-cdk src/lib/errors/config.ts (${path}: unknown keys (remove): "a", "b"). The lazy .+? with \n|$ boundary correctly captures the full key list on one line and stops at the next line (handles the Caused by: wrapping from formatError).
  • Error preservation. Original synth error is kept as { cause } and remains visible with AGENTCORE_DEBUG=1 via formatError's cause chain. Non-Error throws are normalized.
  • Telemetry instrumentation. StaleCdkConstructError is added to the ErrorName enum in common-shapes.ts, so classifyError will emit it via cli.command_run rather than rolling up to UnknownError. ✅
  • Gateway length check placement. validateGatewayNames mirrors validateRuntimeNames, runs in deploy preflight as a backstop, and createGatewayFromWizard fails the moment the user runs add gateway. resourceName (imported gateways) is correctly skipped. The new add gateway failure path returns through add()'s toError(err) and surfaces as JSON success: false with the expected message — verified against the e2e test.
  • Tests follow established patterns. The new preflight tests use the same hoisted mocks the existing suite already established (mocks at the I/O boundary — ConfigIO, AWS credentials, LocalCdkProject — not at every fs call). extractUnknownKeys / rewriteIfStaleCdkConstruct are tested with pure functions and real Error instances (including the realistic nested cause shape). The add gateway e2e test uses a real temp project via runCLI. No excessive mocking.
  • MAX_RUNTIME_NAME_LENGTH was promoted from a file-local const to constants.ts alongside the new MAX_GATEWAY_NAME_LENGTH. Value/behavior preserved.

No blocking issues.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 26, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 37.66% 13875 / 36835
🔵 Statements 36.93% 14756 / 39950
🔵 Functions 32.22% 2375 / 7370
🔵 Branches 31.6% 9226 / 29187
Generated in workflow #3868 for commit 6f22995 by the Vitest Coverage Report Action

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/m PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Gateway add: stale bundled CDK construct synth failure + late gateway-name-length validation

2 participants