feat(ci): add hourly canary for smoke test by Hweinstock · Pull Request #1486 · aws/agentcore-cli

Hweinstock · 2026-06-08T23:25:07Z

Description

Add smoke tests that run once an hour and create an issue on failure. The issue notifies the team via the Slack integration.

Centralize issue creation logic into a script.
Add the script to the tsbuild and eslint ignore lists.
Run across two dimensions: prerelease (latest main push) and released, for both GA and preview.
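
For readers unfamiliar with the matrix, the two dimensions above give four canary cells. The sketch below enumerates them and builds the issue title in the `Canary Failure: <variant>/<build>` format quoted later in the review thread. The type names, the `ga`/`preview` and `prerelease`/`released` identifiers, and the mapping of "variant" to GA/preview are assumptions for illustration; the workflow's actual matrix keys are not shown in this PR page.

```typescript
// Hypothetical identifiers: the real workflow's matrix keys are not shown here.
type Variant = 'ga' | 'preview';
type Build = 'prerelease' | 'released';

interface CanaryCell {
  variant: Variant;
  build: Build;
}

const VARIANTS: Variant[] = ['ga', 'preview'];
const BUILDS: Build[] = ['prerelease', 'released'];

// Cross product of the two dimensions: one canary job per cell.
function canaryMatrix(): CanaryCell[] {
  const cells: CanaryCell[] = [];
  for (const variant of VARIANTS) {
    for (const build of BUILDS) {
      cells.push({ variant, build });
    }
  }
  return cells;
}

// Title format taken from the review thread: "Canary Failure: <variant>/<build>".
function issueTitle(cell: CanaryCell): string {
  return `Canary Failure: ${cell.variant}/${cell.build}`;
}

console.log(canaryMatrix().map(issueTitle));
```

Keeping the title a pure function of the cell means each cell maps to exactly one title, which is what would let an issue-creation script recognise an already-open issue for the same cell.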

Related Issue

#1485

Closes #

Documentation PR

Type of Change

Testing

How have you tested the change?

I ran npm run test:unit and npm run test:integ
I ran npm run typecheck
I ran npm run lint
If I modified src/assets/, I ran npm run test:update-snapshots and committed the updated snapshots

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the
terms of your choice.

agentcore-devx-automation · 2026-06-08T23:25:37Z

Claude Security Review: no high-confidence findings. (run)

github-actions · 2026-06-08T23:25:55Z

Package Tarball

aws-agentcore-0.18.0.tgz

How to install

gh release download pr-1486-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.18.0.tgz

agentcore-cli-automation

The CI plumbing looks solid overall — the script extraction is a nice cleanup, and the matrix covers the install vectors I'd expect. I have one concern about noise that's worth thinking through before merging; everything else is minor.

agentcore-cli-automation · 2026-06-08T23:30:30Z

+          OPENAI_API_KEY: ${{ env.E2E_OPENAI_API_KEY }}
+          GEMINI_API_KEY: ${{ env.E2E_GEMINI_API_KEY }}
+        # Smoke test for now, only runs strands-bedrock.test.ts
+        run: npx vitest run --project e2e e2e-tests/strands-bedrock.test.ts


Hourly canary + no retry will create noisy high-severity GitHub issues on transient failures.

This job runs an e2e flow (create → deploy → invoke → teardown against real AWS) hourly across 4 matrix cells. Real e2e tests are inherently flaky — AWS API throttles, transient CFN/Lambda hiccups, network blips, occasional model-provider 5xx, etc. — and the very first such flake will:

Open a high-severity issue with title Canary Failure: <variant>/<build>

Notify the team via the Slack integration

Stay open until someone manually closes it (which then re-arms the alert for the next flake)

There's no retry around npx vitest run … and vitest doesn't retry by default, so a single transient failure → a real page. At hourly cadence with 4 variants, that's almost certainly going to generate a steady stream of false-positive issues that erode signal.

A few options to consider (any one of these would help):

Add retry: 2 (or similar) at the vitest level for canary runs, e.g. via a vitest.canary.config.ts or an inline --retry flag, so the smoke test only fails after multiple consecutive attempts.

Wrap the smoke-test step in a small bash/actions/retry loop that re-runs the whole npm install -g … && vitest … flow on failure before falling through to the issue-creation step.

Require N consecutive failed scheduled runs before opening an issue (e.g., write a marker to repo state / artifacts and only open the issue on the 2nd or 3rd consecutive failure).

Drop the severity to something less alarming (e.g., a separate canary label without high-severity) until you've calibrated the noise level for a week or two.

Happy with any of these — just want to avoid waking people up for transient AWS hiccups before the canary has earned its alert budget.
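
The "N consecutive failed runs" option above can be sketched as a small pure function. This is only an illustration under stated assumptions: `RunResult`, `trailingFailures`, and `shouldOpenIssue` are hypothetical names, and how the run history would be persisted between scheduled runs (repo state, artifacts) is left open, as it is in the comment.

```typescript
// Outcome of one scheduled canary run; the history is ordered oldest to newest.
type RunResult = 'success' | 'failure';

// Length of the failing streak at the end of the history.
function trailingFailures(history: readonly RunResult[]): number {
  let streak = 0;
  for (let i = history.length - 1; i >= 0 && history[i] === 'failure'; i--) {
    streak++;
  }
  return streak;
}

// Open an issue only once the current streak reaches the threshold.
// `threshold` is an assumed tunable; the comment above suggests 2 or 3.
function shouldOpenIssue(history: readonly RunResult[], threshold: number): boolean {
  return trailingFailures(history) >= threshold;
}

// A single flake after a success does not alert; two failures in a row do.
console.log(shouldOpenIssue(['success', 'failure'], 2));
console.log(shouldOpenIssue(['success', 'failure', 'failure'], 2));
```

Because the check uses `>=`, it stays true while the streak continues, so it assumes the issue-creation step skips creating a duplicate when an issue for the same cell is already open.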

Good call, added 3 retries via CLI.

agentcore-devx-automation · 2026-06-09T00:13:18Z

Claude Security Review: no high-confidence findings. (run)

agentcore-devx-automation · 2026-06-09T13:31:18Z

Claude Security Review: no high-confidence findings. (run)

aidandaly24

One comment that can be a follow up + a question.

aidandaly24 · 2026-06-09T14:00:30Z

        description: 'AWS region for deployment'
        default: 'us-east-1'
-  schedule:
-    - cron: '0 14 * * 1' # Every Monday at 9 AM EST (14:00 UTC)


This removes the Monday cron from e2e-full-test which in my opinion shrinks our coverage pretty significantly. The new canary replaces it with only strands-bedrock.test.ts (1 of 30 e2e suites). For catching CDK schema drift this is fine. But a regression specific to, say, LangGraph+Gemini or container builds is now only caught on push-to-main, not on a schedule. Is this intentional?

Yes, intentional for a few reasons:

The scheduled Monday cron job didn't have a notification mechanism, so we didn't know when it failed.

Running the entire test suite means we need to reduce the frequency at which we run it. The change proposed here leans towards frequent smoke tests instead of full e2e tests, which is what makes a high frequency affordable.

Full e2e tests are already run on every commit to main, which IMO is where we're more likely to see regressions.

aidandaly24 · 2026-06-09T14:11:34Z

    beforeAll(async () => {
      if (!canRun) return;

      await cleanupStaleCredentialProviders();

      testDir = join(tmpdir(), `agentcore-e2e-${randomUUID()}`);
      await mkdir(testDir, { recursive: true });

-      agentName = `E2e${cfg.framework.slice(0, 4)}${cfg.modelProvider.slice(0, 4)}${String(Date.now()).slice(-8)}`;
+      agentName = `E2e${cfg.framework.slice(0, 4)}${cfg.modelProvider.slice(0, 4)}${randomUUID().replace(/-/g, '').slice(0, 8)}`;


beforeAll calls cleanupStaleCredentialProviders(), which sweeps orphaned E2e* credential providers from runs that died before afterAll teardown. But there's no equivalent janitor for the CloudFormation stack itself.

In the existing e2e flows this is low-risk — they run on push/PR, so a leaked stack is rare and gets noticed. The canary changes that calculus: at 4 matrix cells × hourly = ~96 deploy/teardown cycles/day, a teardown that crashes between deploy and afterAll (timeout, runner eviction, AWS flake mid-tear-down) leaks an AgentCore-E2e…-default stack with nothing to reclaim it. A single persistent teardown bug could pile up ~96 stacks/day, and each carries real resources (runtime, IAM roles, KMS key from bootstrap) — so this compounds into cost and CFN/runtime quota exhaustion before anyone sees the Slack issue.

Suggest adding a beforeAll stack sweep that mirrors the credential-provider janitor — delete AgentCore-E2e* stacks older than the same cutoff.

This can also be a follow up, should be non-blocking.
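
The selection half of such a stack sweep could look roughly like the sketch below. It only decides which stacks are stale; listing and deleting them would go through the CloudFormation API and is not shown. `StackInfo`, `staleStacks`, and the two-hour cutoff are assumptions for illustration, since the cutoff used by the credential-provider janitor does not appear in this thread.

```typescript
// Minimal stack description; these field names are illustrative,
// not the AWS SDK's types.
interface StackInfo {
  name: string;
  createdAt: Date;
}

// Prefix named in the review comment above.
const STACK_PREFIX = 'AgentCore-E2e';

// Return stacks that carry the e2e prefix and were created before the cutoff.
function staleStacks(stacks: readonly StackInfo[], now: Date, maxAgeMs: number): StackInfo[] {
  const cutoff = now.getTime() - maxAgeMs;
  return stacks.filter(
    (stack) => stack.name.indexOf(STACK_PREFIX) === 0 && stack.createdAt.getTime() < cutoff
  );
}

// Example with an assumed two-hour cutoff and made-up stack names.
const TWO_HOURS_MS = 2 * 60 * 60 * 1000;
const now = new Date('2026-06-09T14:00:00Z');
const candidates: StackInfo[] = [
  { name: 'AgentCore-E2eOld-default', createdAt: new Date('2026-06-09T10:00:00Z') },
  { name: 'AgentCore-E2eFresh-default', createdAt: new Date('2026-06-09T13:30:00Z') },
  { name: 'UnrelatedStack', createdAt: new Date('2026-06-01T00:00:00Z') },
];
console.log(staleStacks(candidates, now, TWO_HOURS_MS).map((stack) => stack.name));
```

Filtering on both prefix and age is what keeps the sweep from deleting a stack that a concurrently running matrix cell has just deployed.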

Good call, I think this is something we should do since I see 1000+ stacks sitting in the CI account.

added an issue for this: #1493

feat(ci): add hourly canary for smoke test

c647e74

github-actions Bot added the size/m PR size: M label Jun 8, 2026

Hweinstock had a problem deploying to e2e-testing June 8, 2026 23:25 — with GitHub Actions Error

agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 8, 2026

github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 8, 2026

agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 8, 2026

agentcore-cli-automation reviewed Jun 8, 2026

github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 8, 2026

fix: add retries to canaries

acb6eed

github-actions Bot added size/m PR size: M and removed size/m PR size: M labels Jun 9, 2026

Hweinstock had a problem deploying to e2e-testing June 9, 2026 00:12 — with GitHub Actions Error

agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 9, 2026

agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 9, 2026

chore(ci): swap arg to args

ec3d457

Hweinstock marked this pull request as ready for review June 9, 2026 13:24

Hweinstock requested a review from a team June 9, 2026 13:24

github-actions Bot added size/m PR size: M and removed size/m PR size: M labels Jun 9, 2026

Hweinstock had a problem deploying to e2e-testing June 9, 2026 13:30 — with GitHub Actions Failure

agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 9, 2026

agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 9, 2026

aidandaly24 reviewed Jun 9, 2026

aidandaly24 approved these changes Jun 9, 2026

Hweinstock merged commit 6c57e78 into aws:main Jun 10, 2026
30 of 31 checks passed

Hweinstock deleted the canary-smoke-tests branch June 10, 2026 13:30

3 participants