awslabs · zxkane · Feb 15, 2026
@@ -11,7 +11,7 @@
   "plugins": [
     {
       "category": "deployment",
-      "description": "Deploy applications to AWS with architecture recommendations, cost estimates, and IaC deployment.",
+      "description": "Deploy applications to AWS with architecture recommendations, cost estimates, CDK best practices, monitoring setup, and IaC deployment.",
       "keywords": [
         "aws",
         "aws agent skills",
@@ -20,17 +20,20 @@
         "cdk",
         "cloudformation",
         "infrastructure",
-        "pricing"
+        "pricing",
+        "monitoring",
+        "cloudwatch"
       ],
       "name": "deploy-on-aws",
       "source": "./plugins/deploy-on-aws",
       "tags": [
         "aws",
         "deploy",
         "infrastructure",
-        "cdk"
+        "cdk",
+        "monitoring"
       ],
-      "version": "1.0.0"
+      "version": "1.1.0"
     }
   ]
 }
@@ -2,18 +2,20 @@
   "author": {
     "name": "Amazon Web Services"
   },
-  "description": "Deploy applications to AWS with architecture recommendations, cost estimates, and IaC deployment.",
+  "description": "Deploy applications to AWS with architecture recommendations, cost estimates, CDK best practices, monitoring setup, and IaC deployment.",
   "homepage": "https://github.com/awslabs/agent-plugins",
   "keywords": [
     "aws",
     "deploy",
     "infrastructure",
     "cdk",
     "cloudformation",
-    "pricing"
+    "pricing",
+    "monitoring",
+    "cloudwatch"
   ],
   "license": "Apache-2.0",
   "name": "deploy-on-aws",
   "repository": "https://github.com/awslabs/agent-plugins",
-  "version": "1.0.0"
+  "version": "1.1.0"
 }
@@ -17,14 +17,19 @@ straightforward services. Don't ask questions with obvious answers.
 1. **Analyze** - Scan codebase for framework, database, dependencies
 2. **Recommend** - Select AWS services, concisely explain rationale
 3. **Estimate** - Show monthly cost before proceeding
-4. **Generate** - Write IaC code with [security defaults](references/security.md) applied
-5. **Deploy** - Run security checks, then execute with user confirmation
+4. **Generate** - Write IaC code following [CDK best practices](references/cdk-best-practices.md)
+   with [security defaults](references/security.md) applied
+5. **Validate** - Run synthesis, security scans, and the
+   [validation script](scripts/validate-stack.sh)
+6. **Deploy** - Execute with user confirmation
+7. **Monitor** - Set up [monitoring](references/monitoring.md) for deployed resources
 
 ## Defaults
 
 See [defaults.md](references/defaults.md) for the complete service selection matrix.
 
-Core principle: Default to **dev-sized** (cost-conscious: small instance sizes, minimal redundancy, and non-HA/single-AZ defaults) unless user says "production-ready".
+Core principle: Default to **dev-sized** (cost-conscious: small instance sizes, minimal
+redundancy, and non-HA/single-AZ defaults) unless user says "production-ready".
 
 ## MCP Servers
 
@@ -48,17 +53,76 @@ for query patterns.
 Consult for IaC best practices. Use when writing CDK/CloudFormation/Terraform
 to ensure patterns follow AWS recommendations.
 
+## CDK Best Practices
+
+When generating IaC (default: CDK TypeScript), follow these rules:
+
+- **No explicit resource names** — let CDK generate unique names
+- **Use grant methods** for IAM — `table.grantReadWriteData(fn)` not raw policies
+- **Use language-specific Lambda constructs** — `NodejsFunction`, `PythonFunction`
+- **Prefer L2/L3 constructs** over L1 (`CfnXxx`)
+- **Add cdk-nag** for automated best-practice validation
+
+See [cdk-best-practices.md](references/cdk-best-practices.md) for patterns and examples.
+
+## Pre-Deployment Validation
+
+Before deploying, run these checks in order:
+
+1. Build — ensure compilation succeeds
+2. Tests — run existing test suite
+3. `cdk synth` — validate synthesis (with cdk-nag if configured)
+4. Security scan — `checkov` or `cfn-nag` on generated templates
+5. Secret detection — scan for hardcoded credentials
+
+Use [validate-stack.sh](scripts/validate-stack.sh) to automate synthesis validation
+and template analysis (step 3). Run `checkov` or `cfn-nag` separately for step 4.
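
A minimal sketch of the fail-fast ordering above. The `Check` type and the `shellCheck` and `runValidation` helpers are hypothetical names. Each step either passes or fails, execution stops at the first failure, and nothing after it (including the deploy) runs.

```typescript
import { execSync } from "node:child_process";

// Hypothetical shape: a named step that resolves to true (pass) or false (fail).
type Check = { name: string; run: () => Promise<boolean> };

type ValidationResult =
  | { ok: true; passed: string[] }
  | { ok: false; passed: string[]; failedAt: string };

// Wraps a shell command as a check; execSync throws on a non-zero exit code.
function shellCheck(name: string, command: string): Check {
  return {
    name,
    run: async () => {
      execSync(command, { stdio: "inherit" });
      return true;
    },
  };
}

// Runs checks strictly in order and stops at the first failure,
// so later steps and the deploy never run against a broken build.
async function runValidation(checks: Check[]): Promise<ValidationResult> {
  const passed: string[] = [];
  for (const check of checks) {
    let ok: boolean;
    try {
      ok = await check.run();
    } catch {
      ok = false; // a thrown error (e.g. a non-zero exit) counts as a failure
    }
    if (!ok) {
      return { ok: false, passed, failedAt: check.name };
    }
    passed.push(check.name);
  }
  return { ok: true, passed };
}
```

For a CDK TypeScript project, the list might start with `shellCheck("build", "npm run build")`, `shellCheck("tests", "npm test")`, and `shellCheck("cdk synth", "npx cdk synth")`, followed by whatever security scan and secret detection commands the project uses. These command strings are assumptions about the project's scripts.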
+
+## Error Handling
+
+### Validation Failures
+
+If `cdk synth` or the validation script fails:
+
+- Show the error output to the user
+- Identify and fix the issue in generated code
+- Re-run validation before proceeding to deploy
+- DO NOT deploy with failing validation
+
+### Deployment Failures
+
+If `cdk deploy` fails:
+
+- Show the CloudFormation error event
+- Suggest fix based on error type
+- Stack will auto-rollback — no manual cleanup needed
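
The events returned by CloudFormation's `DescribeStackEvents` API often include several `*_FAILED` entries, and the ones whose reason mentions a cancellation are side effects of the first real error. Below is a minimal sketch for picking the event to show the user. The `StackEvent` interface is a hand-written subset of the API's fields (an assumption, to keep the example free of the AWS SDK), and `rootCauseEvent` is a hypothetical helper.

```typescript
// Minimal, hand-written subset of a CloudFormation stack event (assumption).
interface StackEvent {
  Timestamp: Date;
  LogicalResourceId: string;
  ResourceType: string;
  ResourceStatus: string;
  ResourceStatusReason?: string;
}

// Cancellations of in-flight sibling resources are symptoms, not causes.
const CANCELLED = /cancelled/i;

// Returns the earliest *_FAILED event that is not a cancellation,
// which is usually the error worth showing to the user.
function rootCauseEvent(events: StackEvent[]): StackEvent | undefined {
  return [...events]
    .sort((a, b) => a.Timestamp.getTime() - b.Timestamp.getTime())
    .find(
      (e) =>
        e.ResourceStatus.endsWith("_FAILED") &&
        !CANCELLED.test(e.ResourceStatusReason ?? ""),
    );
}
```

With the AWS SDK for JavaScript v3, the input would come from the `StackEvents` array returned by `DescribeStackEventsCommand` in `@aws-sdk/client-cloudformation`, after mapping its optional fields onto this interface.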
+
+## Post-Deployment Monitoring
+
+After successful deployment, set up monitoring appropriate to the environment:
+
+- **Dev**: Basic error alerting (Lambda errors, Fargate task failures)
+- **Production**: Full observability (alarms, dashboards, structured logging)
+
+See [monitoring.md](references/monitoring.md) for CloudWatch alarm patterns by service.
+
 ## Principles
 
 - Concisely explain why each service was chosen
 - Always show cost estimate before generating code
-- Apply [security defaults](references/security.md) automatically (encryption, private subnets, least privilege)
+- Apply [security defaults](references/security.md) automatically (encryption,
+  private subnets, least privilege)
+- Follow [CDK best practices](references/cdk-best-practices.md) when generating IaC
 - Run IaC security scans (cfn-nag, checkov) before deployment
-- Don't ask "Lambda or Fargate?" - just pick the obvious one
+- Set up [monitoring](references/monitoring.md) after deployment
+- Don't ask "Lambda or Fargate?" — just pick the obvious one
 - If genuinely ambiguous, then ask
 
 ## References
 
 - [Service defaults](references/defaults.md)
 - [Security defaults](references/security.md)
 - [Cost estimation patterns](references/cost-estimation.md)
+- [CDK best practices](references/cdk-best-practices.md)
+- [Monitoring and observability](references/monitoring.md)
+- [Validation script](scripts/validate-stack.sh)
@@ -0,0 +1,63 @@
+# CDK Best Practices
+
+Patterns for generating CDK IaC in the deploy workflow.
+
+## Resource Naming
+
+**DO NOT** explicitly specify resource names. Let CDK generate unique names:
+
+```typescript
+// ✅ Let CDK generate: StackName-MyFunctionXXXXXX
+new lambda.Function(this, 'MyFunction', { /* no functionName */ });
+```
+
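+**Why**: Generated names enable reusable patterns, parallel deployments, and stack isolation. A fixed name collides when the same stack is deployed twice in one account and Region, and CloudFormation cannot perform an update that requires replacing a custom-named resource.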
+
+## Lambda Constructs
+
+Use language-specific constructs for automatic bundling:
+
+- **TypeScript**: `NodejsFunction` from `aws-cdk-lib/aws-lambda-nodejs`
+- **Python**: `PythonFunction` from `@aws-cdk/aws-lambda-python-alpha`
+
+Benefits: Automatic dependency resolution, transpilation, and packaging.
+
+## IAM Permissions
+
+Use grant methods instead of raw policies:
+
+```typescript
+table.grantReadWriteData(handler);  // ✅
+// NOT: handler.addToRolePolicy(new iam.PolicyStatement({ actions: ['dynamodb:*'], resources: ['*'] }))
+```
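
A fuller, self-contained version of the grant pattern, assuming `aws-cdk-lib` v2. The stack, table, and function IDs are hypothetical. The function uses inline code only so the sketch needs no bundling, and the generated table name reaches the function through the `environment` prop rather than being hardcoded.

```typescript
import { App, Stack } from "aws-cdk-lib";
import * as dynamodb from "aws-cdk-lib/aws-dynamodb";
import * as lambda from "aws-cdk-lib/aws-lambda";

const app = new App();
const stack = new Stack(app, "GrantExampleStack");

// No tableName: CDK generates a unique physical name.
const table = new dynamodb.Table(stack, "Items", {
  partitionKey: { name: "pk", type: dynamodb.AttributeType.STRING },
});

const handler = new lambda.Function(stack, "Handler", {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: "index.handler",
  // Inline code keeps the sketch bundling-free; real code would use NodejsFunction.
  code: lambda.Code.fromInline("exports.handler = async () => ({ statusCode: 200 });"),
  // Pass the generated name in instead of hardcoding it.
  environment: { TABLE_NAME: table.tableName },
});

// Adds a policy to the function's role scoped to this table's ARN
// (and its indexes), with only the read/write data actions.
table.grantReadWriteData(handler);
```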
+
+## Construct Levels
+
+Prefer L3 (`LambdaRestApi`) > L2 (`Function`) > L1 (`CfnFunction`).
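
A sketch of what the higher-level construct saves, assuming `aws-cdk-lib` v2 (stack and construct IDs are hypothetical). A single `LambdaRestApi` call produces the REST API, a proxy resource, a deployment, a stage, and the invoke permission that would otherwise be declared one by one as L1 resources. The `CfnOutput` surfaces the generated URL, which also addresses the "No stack outputs" anti-pattern.

```typescript
import { App, CfnOutput, Stack } from "aws-cdk-lib";
import * as apigateway from "aws-cdk-lib/aws-apigateway";
import * as lambda from "aws-cdk-lib/aws-lambda";

const app = new App();
const stack = new Stack(app, "ApiExampleStack");

const fn = new lambda.Function(stack, "ApiHandler", {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: "index.handler",
  code: lambda.Code.fromInline(
    "exports.handler = async () => ({ statusCode: 200, body: 'ok' });",
  ),
});

// One construct: REST API + {proxy+} resource + deployment + stage + invoke permission.
const api = new apigateway.LambdaRestApi(stack, "Api", { handler: fn });

// Expose the generated endpoint instead of making the user look it up.
new CfnOutput(stack, "ApiUrl", { value: api.url });
```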
+
+## Validation
+
+1. Add **cdk-nag** for automated best-practice checks during synthesis
+2. Run `cdk synth` to validate
+3. Suppress findings with documented reasons via `NagSuppressions`
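
A sketch of wiring cdk-nag into an app, assuming the `cdk-nag` package is installed alongside `aws-cdk-lib` v2. The bucket and the suppression reason are hypothetical; `AwsSolutions-S1` is the rule that flags disabled S3 server access logging. With the aspect applied, `cdk synth` reports every unsuppressed finding.

```typescript
import { App, Aspects, Stack } from "aws-cdk-lib";
import * as s3 from "aws-cdk-lib/aws-s3";
import { AwsSolutionsChecks, NagSuppressions } from "cdk-nag";

const app = new App();
const stack = new Stack(app, "NagExampleStack");

// enforceSSL satisfies the rule that requires TLS-only bucket access.
const bucket = new s3.Bucket(stack, "Assets", { enforceSSL: true });

// Apply the AWS Solutions rule pack to every construct in the app during synthesis.
Aspects.of(app).add(new AwsSolutionsChecks({ verbose: true }));

// Suppress one finding on one resource, always with a documented reason.
NagSuppressions.addResourceSuppressions(bucket, [
  {
    id: "AwsSolutions-S1",
    reason: "Hypothetical: dev-sized stack, server access logs not required",
  },
]);
```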
+
+## Testing
+
+- **Snapshot tests**: `expect(template.toJSON()).toMatchSnapshot()`
+- **Assertions**: `template.hasResourceProperties('AWS::Lambda::Function', { ... })`
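
A sketch of fine-grained assertions with `aws-cdk-lib/assertions`. The stack under test is hypothetical. It runs without a test framework because the `Template` methods throw on mismatch; in a Jest suite the same calls would go inside `test(...)`, next to `expect(template.toJSON()).toMatchSnapshot()`.

```typescript
import { App, Duration, Stack } from "aws-cdk-lib";
import { Match, Template } from "aws-cdk-lib/assertions";
import * as lambda from "aws-cdk-lib/aws-lambda";

// Hypothetical stack under test.
const app = new App();
const stack = new Stack(app, "TestedStack");
new lambda.Function(stack, "Fn", {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: "index.handler",
  code: lambda.Code.fromInline("exports.handler = async () => {};"),
  timeout: Duration.seconds(10),
});

// Synthesize the stack into a queryable CloudFormation template.
const template = Template.fromStack(stack);

// Throws unless some function has exactly these property values.
template.hasResourceProperties("AWS::Lambda::Function", {
  Runtime: "nodejs20.x",
  Timeout: 10,
});

// Guards the naming rule: throws unless a function exists without FunctionName.
template.hasResourceProperties("AWS::Lambda::Function", {
  FunctionName: Match.absent(),
});

// Count assertion: exactly one function in this stack.
template.resourceCountIs("AWS::Lambda::Function", 1);
```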
+
+## Stack Organization
+
+- Split at ~200 resources per stack (well below CloudFormation's 500-resource quota)
+- Separate stateful (DB, S3) from stateless (compute) resources
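+- Pass construct references between stacks in the same app (CDK generates the exports and imports); use `CfnOutput` with `exportName` only for values consumed outside the app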
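
A sketch of the stateful/stateless split, assuming `aws-cdk-lib` v2 and two stacks in the same app and environment. `DataStack`, `ComputeStack`, and `ComputeStackProps` are hypothetical names. Passing the table object between stacks lets CDK generate the `Export` / `Fn::ImportValue` pair and the deployment-order dependency automatically.

```typescript
import { App, Stack, StackProps } from "aws-cdk-lib";
import * as dynamodb from "aws-cdk-lib/aws-dynamodb";
import * as lambda from "aws-cdk-lib/aws-lambda";
import { Construct } from "constructs";

// Stateful stack: data that must survive compute redeployments.
class DataStack extends Stack {
  public readonly table: dynamodb.Table;

  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    this.table = new dynamodb.Table(this, "Items", {
      partitionKey: { name: "pk", type: dynamodb.AttributeType.STRING },
    });
  }
}

interface ComputeStackProps extends StackProps {
  table: dynamodb.ITable;
}

// Stateless stack: compute that can be torn down and redeployed freely.
class ComputeStack extends Stack {
  constructor(scope: Construct, id: string, props: ComputeStackProps) {
    super(scope, id, props);
    const fn = new lambda.Function(this, "Handler", {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: "index.handler",
      code: lambda.Code.fromInline("exports.handler = async () => {};"),
      environment: { TABLE_NAME: props.table.tableName },
    });
    props.table.grantReadData(fn);
  }
}

const app = new App();
const data = new DataStack(app, "DataStack");
// Passing the construct makes CDK add an Export to DataStack and an
// Fn::ImportValue to ComputeStack, and deploy DataStack first.
new ComputeStack(app, "ComputeStack", { table: data.table });
```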
+
+## Anti-Patterns
+
+| Anti-Pattern                  | Fix                                     |
+| ----------------------------- | --------------------------------------- |
+| Hardcoded resource names      | Let CDK generate names                  |
+| `actions: ['*']` in IAM       | Use grant methods                       |
+| Manual Lambda bundling        | Use `NodejsFunction` / `PythonFunction` |
+| Missing environment variables | Pass via `environment` prop             |
+| No stack outputs              | Add `CfnOutput` for API URLs, ARNs      |
@@ -0,0 +1,69 @@
+# Monitoring and Observability
+
+Post-deployment monitoring patterns. Set up after successful deployment.
+
+## When to Add Monitoring
+
+- **Always**: Error alerting for deployed compute (Fargate, Lambda)
+- **Production**: Full observability (alarms + dashboards + logs)
+- **Dev**: Basic error alerting only
+
+## Lambda Alarms
+
+| Metric          | Threshold      | Periods |
+| --------------- | -------------- | ------- |
+| Errors (Sum)    | 10 per 5 min   | 1       |
+| Duration (Max)  | 80% of timeout | 2       |
+| Throttles (Sum) | 5 per 5 min    | 1       |
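
The Lambda table translated into CDK, as a sketch assuming `aws-cdk-lib` v2 (stack and construct IDs and the 30-second timeout are hypothetical). `createAlarm` defaults to a greater-than-or-equal comparison. Lambda reports `Duration` in milliseconds, so the 80% threshold is derived from the function's timeout. Treating missing data as not breaching keeps an idle function's error and throttle alarms in `OK` rather than `INSUFFICIENT_DATA`.

```typescript
import { App, Duration, Stack } from "aws-cdk-lib";
import * as cloudwatch from "aws-cdk-lib/aws-cloudwatch";
import * as lambda from "aws-cdk-lib/aws-lambda";

const app = new App();
const stack = new Stack(app, "LambdaAlarmsStack");

const timeout = Duration.seconds(30);
const fn = new lambda.Function(stack, "Fn", {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: "index.handler",
  code: lambda.Code.fromInline("exports.handler = async () => {};"),
  timeout,
});

const period = Duration.minutes(5);
// No invocations means no datapoints; treat that as healthy for count alarms.
const notBreaching = cloudwatch.TreatMissingData.NOT_BREACHING;

// Errors (Sum) >= 10 within one 5-minute period.
fn.metricErrors({ period, statistic: "Sum" }).createAlarm(stack, "ErrorsAlarm", {
  threshold: 10,
  evaluationPeriods: 1,
  treatMissingData: notBreaching,
});

// Duration (Max) >= 80% of the timeout for two consecutive periods (milliseconds).
fn.metricDuration({ period, statistic: "Maximum" }).createAlarm(stack, "DurationAlarm", {
  threshold: Math.round(timeout.toMilliseconds() * 0.8),
  evaluationPeriods: 2,
});

// Throttles (Sum) >= 5 within one 5-minute period.
fn.metricThrottles({ period, statistic: "Sum" }).createAlarm(stack, "ThrottlesAlarm", {
  threshold: 5,
  evaluationPeriods: 1,
  treatMissingData: notBreaching,
});
```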
+
+## ECS/Fargate Alarms
+
+| Metric                                  | Threshold | Periods |
+| --------------------------------------- | --------- | ------- |
+| CPU Utilization                         | 80%       | 3       |
+| Memory Utilization                      | 85%       | 2       |
+| Running Task Count (Container Insights) | < 1       | 2       |
+
+## ALB Alarms
+
+| Metric               | Threshold    | Periods |
+| -------------------- | ------------ | ------- |
+| 5XX Error Count      | 10 per 5 min | 1       |
+| Unhealthy Host Count | 1            | 2       |
+| Response Time p99    | 1 second     | 2       |
+
+## RDS/Aurora Alarms
+
+| Metric               | Threshold  | Periods |
+| -------------------- | ---------- | ------- |
+| CPU Utilization      | 80%        | 3       |
+| Free Storage Space   | < 10 GB    | 1       |
+| Database Connections | 80% of max | 2       |
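
The RDS table uses units that CloudWatch does not report directly: `FreeStorageSpace` is in bytes and `DatabaseConnections` is a raw count. Below is a minimal sketch converting the rows into alarm thresholds. `rdsThresholds` is a hypothetical helper, "10 GB" is read as 10 GiB, and `maxConnections` must come from the instance's `max_connections` setting. For engines such as MySQL and PostgreSQL, RDS derives that setting from instance memory by default.

```typescript
// 1 GiB in bytes; the "10 GB" row is read as 10 GiB (an assumption).
const GIB = 1024 ** 3;

// Hypothetical thresholds expressed in the units CloudWatch reports for RDS.
interface RdsThresholds {
  cpuPercent: number; // CPUUtilization, alarm at or above
  freeStorageBytes: number; // FreeStorageSpace (bytes), alarm below
  connections: number; // DatabaseConnections (count), alarm at or above
}

function rdsThresholds(maxConnections: number, freeStorageGiB = 10): RdsThresholds {
  return {
    cpuPercent: 80,
    freeStorageBytes: freeStorageGiB * GIB,
    // 80% of max, using integer math to avoid floating-point drift.
    connections: Math.floor((maxConnections * 4) / 5),
  };
}
```

For example, `rdsThresholds(100)` yields a connections threshold of 80 and a storage threshold of 10737418240 bytes.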
+
+## Alarm Notification
+
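+Use an SNS topic with an email subscription as the alarm action. The recipient must confirm the
+subscription before notifications are delivered:
+
+```typescript
+import * as sns from 'aws-cdk-lib/aws-sns';
+import * as subscriptions from 'aws-cdk-lib/aws-sns-subscriptions';
+import * as actions from 'aws-cdk-lib/aws-cloudwatch-actions';
+
+const topic = new sns.Topic(this, 'AlarmTopic');
+topic.addSubscription(new subscriptions.EmailSubscription('ops@example.com'));
+alarm.addAlarmAction(new actions.SnsAction(topic)); // alarm: an existing cloudwatch.Alarm
+```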
+
+## Threshold Guidelines
+
+| Category    | Warning      | Critical    |
+| ----------- | ------------ | ----------- |
+| CPU/Memory  | 70-80%       | 80-90%      |
+| Error rate  | Based on SLA | 2× warning  |
+| Latency p99 | 80% of SLA   | 100% of SLA |
+| Storage     | 70% used     | 85% used    |
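
A minimal sketch turning the guideline rows into concrete warning/critical pairs. The `Tier` type, the helper names, and the SLA inputs are hypothetical. The CPU/memory row gives ranges rather than fixed values, so it is left as a per-service choice instead of being encoded.

```typescript
// Hypothetical warning/critical pair for one metric.
interface Tier {
  warning: number;
  critical: number;
}

// Latency p99: warn at 80% of the SLA target, go critical at 100% of it.
function latencyTier(slaP99Ms: number): Tier {
  return { warning: (slaP99Ms * 4) / 5, critical: slaP99Ms };
}

// Error rate: the warning level is derived from the SLA; critical is 2x warning.
function errorRateTier(warningPercent: number): Tier {
  return { warning: warningPercent, critical: warningPercent * 2 };
}

// Storage: percent of capacity used.
const storageTier: Tier = { warning: 70, critical: 85 };
```

With a 1-second p99 SLA, `latencyTier(1000)` gives a warning at 800 ms and a critical alarm at 1000 ms.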
+
+## Production Dashboard
+
+Include these widget groups:
+
+1. **Service Overview**: Request rate, error %, latency (p50/p95/p99)
+2. **Resource Utilization**: CPU, memory, network by service
+3. **Cost Metrics**: Daily spend, month-to-date
+4. **Errors**: Error counts by type, recent logs
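
A sketch of the Service Overview and Errors groups for a Lambda-backed service, assuming `aws-cdk-lib` v2. The stack, function, and widget titles are hypothetical. Error % is computed with CloudWatch metric math, and the log widget queries the function's default `/aws/lambda/<function-name>` log group. The Resource Utilization and Cost Metrics groups are omitted from this sketch.

```typescript
import { App, Duration, Stack } from "aws-cdk-lib";
import * as cloudwatch from "aws-cdk-lib/aws-cloudwatch";
import * as lambda from "aws-cdk-lib/aws-lambda";

const app = new App();
const stack = new Stack(app, "DashboardStack");
const fn = new lambda.Function(stack, "Fn", {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: "index.handler",
  code: lambda.Code.fromInline("exports.handler = async () => {};"),
});

const period = Duration.minutes(5);
const invocations = fn.metricInvocations({ period, statistic: "Sum" });
const errors = fn.metricErrors({ period, statistic: "Sum" });

// Error % via metric math over the two metrics above.
const errorPct = new cloudwatch.MathExpression({
  expression: "100 * errors / invocations",
  usingMetrics: { errors, invocations },
  label: "Error %",
  period,
});

const dashboard = new cloudwatch.Dashboard(stack, "ServiceDashboard");
dashboard.addWidgets(
  // Service Overview: request rate (left axis) and error % (right axis).
  new cloudwatch.GraphWidget({
    title: "Requests and error %",
    left: [invocations],
    right: [errorPct],
  }),
  // Service Overview, continued: latency percentiles in milliseconds.
  new cloudwatch.GraphWidget({
    title: "Latency",
    left: ["p50", "p95", "p99"].map((stat) =>
      fn.metricDuration({ period, statistic: stat, label: `Duration ${stat}` }),
    ),
  }),
  // Errors: most recent error log lines from the function's log group.
  new cloudwatch.LogQueryWidget({
    title: "Recent errors",
    logGroupNames: [`/aws/lambda/${fn.functionName}`],
    queryLines: [
      "fields @timestamp, @message",
      "filter @message like /ERROR/",
      "sort @timestamp desc",
      "limit 20",
    ],
  }),
);
```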