feat(ci): add hourly canary for smoke test #1486

Merged
Hweinstock merged 3 commits into aws:main from Hweinstock:canary-smoke-tests on Jun 10, 2026

@Hweinstock Hweinstock commented Jun 8, 2026

Contributor

Description

Run smoke tests once an hour and open an issue on failure. The issue notifies the team via the Slack integration.

  • centralize issue creation logic into a script.
  • add script to tsbuild ignore and eslint ignore.
  • run across two dimensions: prerelease (latest main push) and released for both GA and preview.
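
As a rough illustration of the two dimensions listed above, the sketch below enumerates the four matrix cells and builds an issue title per cell. The identifiers `ga`, `preview`, `prerelease`, and `released`, and the mapping of "variant" and "build" onto the two dimensions, are assumptions made for this example; only the `Canary Failure: <variant>/<build>` title pattern appears in this conversation, and the real workflow and script may be organized differently.

```typescript
// Hypothetical names: the PR text only mentions "prerelease / released"
// and "GA / preview", not the exact identifiers used in the workflow.
type Variant = 'ga' | 'preview';
type Build = 'prerelease' | 'released';

interface CanaryCell {
  variant: Variant;
  build: Build;
}

const VARIANTS: readonly Variant[] = ['ga', 'preview'];
const BUILDS: readonly Build[] = ['prerelease', 'released'];

// Cross product of the two dimensions: 2 variants x 2 builds = 4 cells.
function canaryMatrix(): CanaryCell[] {
  const matrix: CanaryCell[] = [];
  for (const variant of VARIANTS) {
    for (const build of BUILDS) {
      matrix.push({ variant, build });
    }
  }
  return matrix;
}

// Title pattern taken from the review discussion: "Canary Failure: <variant>/<build>".
function issueTitle(cell: CanaryCell): string {
  return `Canary Failure: ${cell.variant}/${cell.build}`;
}

// Usage: print the title each cell would use if its smoke test failed.
for (const cell of canaryMatrix()) {
  console.log(issueTitle(cell));
}
```

Keeping the title a pure function of the cell would let a centralized script look up an already open issue for the same cell instead of filing a duplicate, though whether the actual script does this is not stated here.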

Related Issue

#1485

Closes #

Documentation PR

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Other (please describe):

Testing

How have you tested the change?

  • I ran npm run test:unit and npm run test:integ
  • I ran npm run typecheck
  • I ran npm run lint
  • If I modified src/assets/, I ran npm run test:update-snapshots and committed the updated snapshots

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the
terms of your choice.

@github-actions github-actions Bot added the size/m PR size: M label Jun 8, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 8, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 8, 2026
@agentcore-devx-automation

Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 8, 2026
@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

Package Tarball

aws-agentcore-0.18.0.tgz

How to install

gh release download pr-1486-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.18.0.tgz

@agentcore-cli-automation agentcore-cli-automation left a comment

The CI plumbing looks solid overall — the script extraction is a nice cleanup, and the matrix covers the install vectors I'd expect. I have one concern about noise that's worth thinking through before merging; everything else is minor.

Comment thread .github/workflows/canary.yml Outdated
OPENAI_API_KEY: ${{ env.E2E_OPENAI_API_KEY }}
GEMINI_API_KEY: ${{ env.E2E_GEMINI_API_KEY }}
# Smoke test for now, only runs strands-bedrock.test.ts
run: npx vitest run --project e2e e2e-tests/strands-bedrock.test.ts

Hourly canary + no retry will create noisy high-severity GitHub issues on transient failures.

This job runs an e2e flow (create → deploy → invoke → teardown against real AWS) hourly across 4 matrix cells. Real e2e tests are inherently flaky — AWS API throttles, transient CFN/Lambda hiccups, network blips, occasional model-provider 5xx, etc. — and the very first such flake will:

  1. Open a high-severity issue with title Canary Failure: <variant>/<build>
  2. Notify the team via the Slack integration
  3. Stay open until someone manually closes it (which then re-arms the alert for the next flake)

There's no retry around npx vitest run … and vitest doesn't retry by default, so a single transient failure → a real page. At hourly cadence with 4 variants, that's almost certainly going to generate a steady stream of false-positive issues that erode signal.

A few options to consider (any one of these would help):

  • Add retry: 2 (or similar) at the vitest level for canary runs, e.g. via a vitest.canary.config.ts or an inline --retry flag, so the smoke test only fails after multiple consecutive attempts.
  • Wrap the smoke-test step in a small bash/actions/retry loop that re-runs the whole npm install -g … && vitest … flow on failure before falling through to the issue-creation step.
  • Require N consecutive failed scheduled runs before opening an issue (e.g., write a marker to repo state / artifacts and only open the issue on the 2nd or 3rd consecutive failure).
  • Drop the severity to something less alarming (e.g., a separate canary label without high-severity) until you've calibrated the noise level for a week or two.

Happy with any of these — just want to avoid waking people up for transient AWS hiccups before the canary has earned its alert budget.
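
Two of the options above can be sketched in a few lines. This is a minimal illustration, not the code that was merged: `withRetry`, `trailingFailures`, and `shouldOpenIssue` are hypothetical helper names, and the run history is assumed to be available as a list of pass/fail results (how it would be persisted between scheduled runs is left open).

```typescript
// Re-run an async step until it succeeds or the retry budget is spent.
// `retries` counts additional attempts after the first one, so retries = 2
// means at most 3 attempts in total.
async function withRetry<T>(
  step: () => Promise<T>,
  retries: number,
  delayMs: number = 0,
): Promise<T> {
  const budget = Math.max(0, retries);
  let lastError: unknown;
  for (let attempt = 0; attempt <= budget; attempt++) {
    try {
      return await step();
    } catch (err) {
      lastError = err;
      // Optional pause before the next attempt; skipped after the last one.
      if (attempt < budget && delayMs > 0) {
        await new Promise<void>(resolve => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Count failures at the end of the run history.
// `history` is ordered oldest first; true means the run passed.
function trailingFailures(history: readonly boolean[]): number {
  let count = 0;
  for (let i = history.length - 1; i >= 0 && !history[i]; i--) {
    count++;
  }
  return count;
}

// Only open an issue once `threshold` consecutive scheduled runs have failed.
function shouldOpenIssue(history: readonly boolean[], threshold: number): boolean {
  return trailingFailures(history) >= threshold;
}
```

The two approaches compose: retries absorb flakes within a single run, while the consecutive-failure gate absorbs flakes that outlast one run, at the cost of a later alert for real regressions.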

Contributor Author

Good call, added 3 retries via CLI.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 8, 2026
@github-actions github-actions Bot added size/m PR size: M and removed size/m PR size: M labels Jun 9, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 9, 2026
@agentcore-devx-automation

Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 9, 2026
@Hweinstock Hweinstock marked this pull request as ready for review June 9, 2026 13:24
@Hweinstock Hweinstock requested a review from a team June 9, 2026 13:24
@github-actions github-actions Bot added size/m PR size: M and removed size/m PR size: M labels Jun 9, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 9, 2026
@agentcore-devx-automation

Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 9, 2026

@aidandaly24 aidandaly24 left a comment

Contributor

One comment that can be a follow up + a question.

description: 'AWS region for deployment'
default: 'us-east-1'
schedule:
- cron: '0 14 * * 1' # Every Monday at 9 AM EST (14:00 UTC)

Contributor

This removes the Monday cron from e2e-full-test which in my opinion shrinks our coverage pretty significantly. The new canary replaces it with only strands-bedrock.test.ts (1 of 30 e2e suites). For catching CDK schema drift this is fine. But a regression specific to, say, LangGraph+Gemini or container builds is now only caught on push-to-main, not on a schedule. Is this intentional?

Contributor Author

Yes, intentional for a few reasons:

  • the scheduled monday cron job didn't have a notification mechanism, so we didn't know when it failed.
  • running the entire test suite forces us to run it less often. The change proposed here leans towards frequent smoke tests instead of full e2e tests, so that the high frequency is justified.
  • full e2e tests are already run on every commit to main, which IMO is where we're more likely to see regressions.

Comment thread e2e-tests/e2e-helper.ts
Comment on lines 71 to +79
beforeAll(async () => {
if (!canRun) return;

await cleanupStaleCredentialProviders();

testDir = join(tmpdir(), `agentcore-e2e-${randomUUID()}`);
await mkdir(testDir, { recursive: true });

- agentName = `E2e${cfg.framework.slice(0, 4)}${cfg.modelProvider.slice(0, 4)}${String(Date.now()).slice(-8)}`;
+ agentName = `E2e${cfg.framework.slice(0, 4)}${cfg.modelProvider.slice(0, 4)}${randomUUID().replace(/-/g, '').slice(0, 8)}`;

Contributor

beforeAll calls cleanupStaleCredentialProviders(), which sweeps orphaned E2e* credential providers from runs that died before afterAll teardown. But there's no equivalent janitor for the CloudFormation stack itself.

In the existing e2e flows this is low-risk — they run on push/PR, so a leaked stack is rare and gets noticed. The canary changes that calculus: at 4 matrix cells × hourly = ~96 deploy/teardown cycles/day, a teardown that crashes between deploy and afterAll (timeout, runner eviction, AWS flake mid-tear-down) leaks an AgentCore-E2e…-default stack with nothing to reclaim it. A single persistent teardown bug could pile up ~96 stacks/day, and each carries real resources (runtime, IAM roles, KMS key from bootstrap) — so this compounds into cost and CFN/runtime quota exhaustion before anyone sees the Slack issue.

Suggest adding a beforeAll stack sweep that mirrors the credential-provider janitor — delete AgentCore-E2e* stacks older than the same cutoff.

This can also be a follow up, should be non-blocking.
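
A minimal sketch of the selection half of such a sweep, assuming the stacks have already been listed: the `StackInfo` shape, the `selectStaleStacks` name, and the age threshold are hypothetical, and the actual listing and deletion calls against CloudFormation are deliberately left out. The cutoff used by the credential-provider janitor is not stated in this thread, so `maxAgeMs` is a caller-supplied parameter.

```typescript
// Hypothetical shape: only the two fields the sweep needs from a stack listing.
interface StackInfo {
  name: string;
  createdAt: Date;
}

// Prefix taken from the stack names mentioned in the review (AgentCore-E2e*).
const STACK_PREFIX = 'AgentCore-E2e';

// Return the e2e stacks created more than `maxAgeMs` before `now`.
// Younger stacks are skipped so a sweep never deletes a stack that
// belongs to a run that is still in progress.
function selectStaleStacks(
  stacks: readonly StackInfo[],
  maxAgeMs: number,
  now: Date = new Date(),
): StackInfo[] {
  const cutoff = now.getTime() - maxAgeMs;
  return stacks.filter(
    stack => stack.name.startsWith(STACK_PREFIX) && stack.createdAt.getTime() < cutoff,
  );
}
```

Filtering on both prefix and age matters at canary frequency: with four cells running every hour, a prefix-only sweep could tear down a stack that a concurrent cell has just deployed.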

Contributor Author

Good call, I think this is something we should do since I see 1000+ stacks sitting in the CI account.

Contributor Author

added an issue for this: #1493

@Hweinstock Hweinstock merged commit 6c57e78 into aws:main Jun 10, 2026
30 of 31 checks passed
@Hweinstock Hweinstock deleted the canary-smoke-tests branch June 10, 2026 13:30
Labels

size/m PR size: M

3 participants