Project: AWS Lambda ARM vs x86 Benchmark
This document tracks all significant architectural and technical decisions that affect the codebase structure, implementation approach, and benchmark methodology.
- D001: Project Scope - Comprehensive ARM vs x86 benchmark
- D002: Infrastructure Tool - AWS CDK (TypeScript)
- D003: Workload Types - 3 single-threaded synthetic workloads
- D004: Testing Strategy - Forced cold starts + warm start testing
- D005: Data Storage - DynamoDB + CloudWatch
- D006: Analysis Tools - Python (pandas, matplotlib, scipy)
- D007: Security - CDK Nag for automated checks
- D008: Runtime Versions - Python 3.13/3.12/3.11, Node.js 22/20, Rust
- D009: [CRITICAL] Zero-Overhead Data Collection - CloudWatch REPORT parsing
- D010: Node.js Language - TypeScript with esbuild
- D011: [CRITICAL] AWS SDK Strategy - No SDK for CPU/Memory, runtime SDK for Light workload
- D012: Testing Strategy - No unit tests for benchmark code
- D015: [CRITICAL] Optimized Deployment Strategy - Dynamic memory configuration at runtime
- D016: [SUPERSEDED] Graduated Memory Allocation - Replaced by fixed 100 MB array (see D018)
- D017: Rust Runtime Support - Add Rust via cargo-lambda-cdk construct
- D018: [CRITICAL] Fixed Memory Workload - Memory-intensive uses constant 100 MB array
D001: Project Scope
Date: 2025-10-25 | Status: Approved | Updated: 2025-11-14
Decision: Benchmark AWS Lambda ARM (Graviton) vs x86 performance across multiple runtimes, architectures, memory configurations, and workload types.
Rationale: Refreshes the measurements from AWS's 2023 blog post with current runtime versions.
D002: Infrastructure Tool
Date: 2025-10-25 | Status: Approved
Decision: Use AWS CDK (TypeScript) for all infrastructure.
Rationale: CDK provides type safety, reusable constructs, and programmatic generation of resources.
D003: Workload Types
Date: 2025-10-25 | Status: Approved | Updated: 2025-11-07
Decision: Implement 3 single-threaded synthetic workloads:
- CPU-Intensive - SHA-256 hashing loop (pure compute, no I/O)
- Memory-Intensive - Large array generation and sorting (tests memory scaling)
- Light - DynamoDB batch write (5 items) + batch read (I/O-bound baseline)
Rationale: Single-threaded workloads provide clearest architecture comparison without multi-threading complexity. Covers key performance dimensions: CPU-bound, memory-bound, and I/O-bound operations.
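To illustrate the CPU-bound shape, here is a minimal Python sketch of the hashing loop. The `iterations` event key and its default are illustrative; the real handler lives in `lambdas/python/cpu-intensive/handler.py`.

```python
import hashlib

def handler(event, context):
    """CPU-intensive workload: tight SHA-256 hashing loop, no I/O, no SDK."""
    iterations = event.get("iterations", 1_000_000)  # illustrative default
    digest = b"benchmark-seed"
    for _ in range(iterations):
        digest = hashlib.sha256(digest).digest()
    # Return a fingerprint so the loop's result is observable and not elided.
    return {"iterations": iterations, "digest": digest.hex()}
```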
Related Files:
- `cdk/lib/config/lambda-config.ts` - Workload definitions and memory configs
- `lambdas/python/cpu-intensive/handler.py` - CPU workload (NO SDK)
- `lambdas/python/memory-intensive/handler.py` - Memory workload (NO SDK)
- `lambdas/python/light/handler.py` - I/O workload (uses boto3)
- `lambdas/nodejs/{workload}/handler.ts` - Node.js equivalents
D004: Testing Strategy - Forced Cold Starts + Warm Start Testing
Date: 2025-10-25 | Status: Approved | Updated: 2025-11-14
Decision: Use forced cold starts via UpdateFunctionConfiguration API instead of waiting 15+ minutes between invocations.
Implementation:
- Forced Cold Start: Update function config with env var → wait for update → invoke
- Warm Start: Sequential invocations (function stays warm)
- Time Savings: Reduces full benchmark from 60-120 hours to ~2-2.5 hours
Rationale: Waiting 15+ minutes per cold start is impractical. Configuration update forces Lambda to reinitialize the execution environment.
Credit: Technique from AJ Stuyvenberg's cold-start-benchmarker
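A minimal boto3 sketch of the forced cold-start flow, with illustrative function and variable names; the real logic lives in `scripts/benchmark_orchestrator.py`:

```python
import time
import boto3

lambda_client = boto3.client("lambda")

def invoke_cold(function_name: str) -> dict:
    """Force a fresh execution environment, then invoke (illustrative sketch)."""
    # Any config change invalidates warm sandboxes; a nonce env var is the
    # cheapest one. Note: this call REPLACES the function's Environment map,
    # so a real orchestrator would merge in the existing variables first.
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Environment={"Variables": {"COLD_START_NONCE": str(time.time_ns())}},
    )
    # Block until the update has propagated; invoking earlier can race.
    lambda_client.get_waiter("function_updated_v2").wait(FunctionName=function_name)
    # This invocation is now guaranteed to initialize a new environment.
    return lambda_client.invoke(FunctionName=function_name, LogType="Tail")
```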
Related Files:
- `scripts/benchmark_orchestrator.py` - Implements forced cold start logic
D005: Data Storage
Date: 2025-10-25 | Status: Approved | Updated: 2025-11-14
Decision:
- DynamoDB for results storage (three entity types: `result`, `aggregate`, `test-run`)
- CloudWatch Logs with 3-day retention
- No S3 backup
Rationale: DynamoDB pay-per-request is cost-effective for sporadic usage. Pre-calculated aggregates avoid expensive scans of raw results.
Related Files:
- `cdk/lib/constructs/results-table.ts` - Table definition
- `docs/dynamodb-schema.md` - Schema
- `scripts/benchmark_orchestrator.py` - Writes results and aggregates
D006: Analysis Tools
Date: 2025-10-25 | Status: Approved
Decision: Python data science stack (pandas, matplotlib/seaborn, plotly, scipy)
Rationale: Standard tooling for data manipulation and visualization.
D007: Security
Date: 2025-10-25 | Status: Approved
Decision:
- CDK Nag for automated security checks
- IAM least privilege
- Encryption at rest
- No hardcoded secrets
- No VPC
Rationale: Standard AWS security practices.
D008: Runtime Versions
Date: 2025-10-25 | Status: Approved | Updated: 2025-11-15
Decision:
- Python: 3.13, 3.12, 3.11
- Node.js: 22.x, 20.x
- Rust: `provided.al2023` (added via D017)
Research Source: AWS Documentation via MCP aws-docs server
Rationale:
- Python 3.13: Latest runtime, AL2023, deprecation June 2029
- Python 3.12: LTS, AL2023, deprecation October 2028
- Python 3.11: LTS, AL2, deprecation June 2026 (baseline comparison)
- Node.js 22: Latest LTS, AL2023, deprecation April 2027
- Node.js 20: Active LTS, AL2023, deprecation April 2026
Excluded:
- Python 3.10: Redundant with 3.11 (both deprecate June 2026)
- Python 3.9: Deprecating December 2025 (too soon)
- Node.js 18: Already deprecated
Key Finding: All selected runtimes support both ARM64 and x86_64
D009: [CRITICAL] Zero-Overhead Data Collection
Date: 2025-10-25 | Status: Approved | Updated: 2025-11-14
Decision: Parse CloudWatch REPORT logs for performance metrics instead of in-function instrumentation.
Alternatives Considered:
- In-function DynamoDB writes: Adds network latency overhead
- Custom timing logic: Adds CPU overhead
- AWS X-Ray: Adds overhead and complexity
- Lambda Telemetry API: Requires extension (adds init overhead)
- CloudWatch REPORT parsing: Selected (no overhead)
Implementation:
- Lambda functions run only the workload (no timing code)
- Orchestrator extracts metrics using the `LogType='Tail'` invoke parameter
- REPORT line parsed with a regex to extract: duration, billed duration, memory used, init duration
Metrics Collected:
- `durationMs` - Execution time
- `billedDurationMs` - Rounded for billing
- `maxMemoryUsedMB` - Peak memory usage
- `initDurationMs` - Cold start init time (cold starts only)
- `lambdaRequestId` - Request ID
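A sketch of the extraction step in Python, assuming the standard REPORT line format; the exact regex in the orchestrator may differ:

```python
import base64
import re

# Shape of the REPORT line Lambda appends to each invocation's log tail.
REPORT_RE = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+"
    r"Duration: (?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed_ms>\d+) ms\s+"
    r"Memory Size: (?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<max_mem_mb>\d+) MB"
    r"(?:\s+Init Duration: (?P<init_ms>[\d.]+) ms)?"
)

def parse_report(invoke_response: dict) -> dict:
    """Pull benchmark metrics out of an Invoke response made with LogType='Tail'."""
    tail = base64.b64decode(invoke_response["LogResult"]).decode("utf-8")
    match = REPORT_RE.search(tail)
    if match is None:
        raise ValueError("no REPORT line in log tail")
    fields = match.groupdict()
    return {
        "lambdaRequestId": fields["request_id"],
        "durationMs": float(fields["duration_ms"]),
        "billedDurationMs": int(fields["billed_ms"]),
        "maxMemoryUsedMB": int(fields["max_mem_mb"]),
        # Present only on cold starts.
        "initDurationMs": float(fields["init_ms"]) if fields["init_ms"] else None,
    }
```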
Related Files:
- `scripts/benchmark_orchestrator.py` - Implements LogType='Tail' extraction
- `lambdas/python/cpu-intensive/handler.py` - NO SDK imports (pure compute)
- `lambdas/python/memory-intensive/handler.py` - NO SDK imports (pure compute)
- `docs/metrics-collection-implementation.md` - Detailed REPORT parsing documentation
D010: Node.js Language
Date: 2025-10-25 | Status: Approved
Decision: TypeScript for Node.js Lambda functions, bundled with esbuild.
Configuration:
- Target: ES2022
- Bundler: esbuild
- Exclude AWS SDK v3 from bundle (use runtime SDK)
D011: [CRITICAL] AWS SDK Strategy
Date: 2025-10-25 | Status: Approved
Decision: No AWS SDK for CPU/Memory workloads. Runtime-provided SDK for Light workload.
Implementation:
- CPU-Intensive: No SDK imports (pure computation)
- Memory-Intensive: No SDK imports (pure memory operations)
- Light: Use runtime-provided SDK (boto3 for Python, exclude @aws-sdk from Node.js bundle)
Rationale: SDK initialization adds overhead that would contaminate CPU/Memory benchmarks.
Reference: SDK overhead data from Aaron Stuyvenberg: https://aaronstuyvenberg.com/posts/aws-sdk-comparison
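For the Light workload, a hedged Python sketch using only the runtime-provided boto3; the table name env var and the pk/sk key schema are assumptions, not the project's actual schema:

```python
import os
import boto3

# Client setup at module scope: SDK init cost is part of what this workload measures.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ["BENCHMARK_TABLE_NAME"])  # hypothetical env var

def handler(event, context):
    """Light workload: DynamoDB batch write (5 items) + batch read."""
    keys = [
        {"pk": f"light#{context.aws_request_id}", "sk": f"item#{i}"}  # assumed schema
        for i in range(5)
    ]
    # batch_writer wraps BatchWriteItem and handles batching/retries.
    with table.batch_writer() as batch:
        for key in keys:
            batch.put_item(Item={**key, "payload": "x" * 64})
    # Read the same items back to exercise the I/O path in both directions.
    response = dynamodb.batch_get_item(RequestItems={table.name: {"Keys": keys}})
    return {"written": len(keys), "read": len(response["Responses"][table.name])}
```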
Related Files:
- `lambdas/python/cpu-intensive/handler.py` - No SDK
- `lambdas/python/memory-intensive/handler.py` - No SDK
- `lambdas/python/light/handler.py` - boto3 only
- `lambdas/nodejs/light/handler.ts` - @aws-sdk/client-dynamodb only
D012: Testing Strategy - No Unit Tests
Date: 2025-11-14 | Status: Approved
Decision: No unit tests for benchmark code.
Rationale: Simple, single-use measurement tool. Correctness verified by running actual benchmarks.
Validation: Run test mode benchmarks to verify infrastructure works correctly.
D015: [CRITICAL] Optimized Deployment Strategy
Date: 2025-10-25 | Status: Approved | Updated: 2025-11-15
Decision: Deploy 36 base Lambda functions and use UpdateFunctionConfiguration API to change memory dynamically during testing.
Approach:
- Deploy base functions (one per runtime/architecture/workload combination)
- Test multiple memory sizes (6-12 per workload) by updating configuration between tests
- Parallelize across all functions
Benefits:
- 36 deployed functions instead of ~1,000+
- Fast deployment (~5-10 minutes)
- Easy to add/remove memory configurations
Trade-off: ~5-10 seconds overhead per memory configuration update
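A minimal sketch of the per-round memory update; the helper name is illustrative, and `scripts/benchmark_orchestrator.py` holds the real logic:

```python
import boto3

lambda_client = boto3.client("lambda")

def set_memory_size(function_name: str, memory_mb: int) -> None:
    """Re-point a deployed base function at a new memory size between rounds."""
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        MemorySize=memory_mb,
    )
    # The ~5-10 s per-configuration overhead noted above is spent in this wait.
    lambda_client.get_waiter("function_updated_v2").wait(FunctionName=function_name)
```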
Related Files:
- `cdk/lib/config/lambda-config.ts` - Defines base function configurations
- `cdk/lib/cdk-stack.ts` - Creates functions from config
- `scripts/benchmark_orchestrator.py` - Implements dynamic memory updates
D016: [SUPERSEDED] Graduated Memory Allocation
Date: 2025-11-14 | Status: SUPERSEDED by D018
Note: This decision has been replaced. See D018 for current approach (fixed 100 MB array).
Decision: Memory-intensive workload uses graduated allocation ratios based on Lambda memory configuration instead of a fixed percentage.
Problem: Initial implementation used a flat 40% ratio for all memory sizes, causing severe performance issues at small Lambda configurations (128-512 MB). Tests at 128 MB and 512 MB were taking 20+ minutes per invocation due to excessive garbage collection and memory swapping.
Implementation:
Graduated allocation strategy in `get_memory_intensive_size_mb()` (reconstructed in the sketch after this list):
- 128-256 MB: 15% of Lambda memory (conservative to avoid GC thrashing)
- 512 MB: 20% of Lambda memory (still conservative)
- 1024 MB: 30% of Lambda memory (moderate stress)
- 1769-2048 MB: 40% of Lambda memory (significant stress)
- 4096 MB+: 60% of Lambda memory (aggressive stress testing)
- Safety cap: 70% maximum to prevent OOM errors
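A Python reconstruction of the superseded tiering, consistent with the percentages above; the original's tier boundaries and rounding may have differed slightly:

```python
def get_memory_intensive_size_mb(lambda_memory_mb: int) -> int:
    """Superseded D016 logic: array size as a graduated ratio of Lambda memory."""
    if lambda_memory_mb <= 256:
        ratio = 0.15  # conservative: avoid GC thrashing on small configs
    elif lambda_memory_mb <= 512:
        ratio = 0.20
    elif lambda_memory_mb <= 1024:
        ratio = 0.30
    elif lambda_memory_mb <= 2048:
        ratio = 0.40
    else:
        ratio = 0.60  # aggressive stress at 4096 MB and above
    # Safety cap: never exceed 70% of the sandbox's memory.
    return int(lambda_memory_mb * min(ratio, 0.70))
```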
Rationale:
- Small Lambda configs need conservative ratios to complete in reasonable time
- Large Lambda configs benefit from aggressive ratios (60-70%) to properly stress memory subsystem
- Graduated approach better represents real-world usage patterns
- Percentage-based safety cap (70%) automatically scales with Lambda max memory
Impact:
- 128 MB: Array size reduced from 51 MB → 19 MB (63% reduction)
- 512 MB: Array size reduced from 205 MB → 102 MB (50% reduction)
- 4096 MB: Array size increased from 1638 MB → 2458 MB (50% increase)
- 8192 MB: Array size increased from 3277 MB → 4915 MB (50% increase)
Related Files:
- `scripts/benchmark_orchestrator.py` - Implements graduated allocation logic
- `docs/benchmark-design.md` - Documents workload allocation strategy
- `docs/handler-api-spec.md` - Documents payload size calculations
D017: Rust Runtime Support
Date: 2025-11-15 | Status: Approved
Decision: Add Rust as a sixth runtime to the benchmark suite using the cargo-lambda-cdk construct library.
Context: AWS officially announced Rust Lambda support on November 14, 2025, providing native Rust runtime support via the provided.al2023 runtime. The cargo-lambda-cdk library provides CDK constructs that automatically compile Rust code using cargo-lambda during synthesis.
Implementation:
- Runtime: `provided.al2023` with a bootstrap binary
- Build tool: cargo-lambda (invoked automatically by cargo-lambda-cdk)
- CDK integration: `RustFunction` construct from cargo-lambda-cdk
- Workspace structure: Rust workspace with 3 binary crates (cpu-intensive, memory-intensive, light)
- Total functions: 36 (6 runtimes × 2 architectures × 3 workloads)
Workload implementations:
- CPU-intensive: SHA-256 hashing loop using the `sha2` crate (NO AWS SDK)
- Memory-intensive: `Vec<i64>` array generation with `StdRng::from_entropy()` (non-deterministic) and `sort_unstable()` (NO AWS SDK)
- Light: DynamoDB batch write + batch read using the `aws-sdk-dynamodb` crate (compiled into the binary)
Rationale:
- Rust is increasingly used for Lambda functions due to performance and memory efficiency
- Official AWS support makes Rust a first-class Lambda runtime
- Adds systems programming language perspective to benchmark
- cargo-lambda-cdk provides seamless CDK integration with automatic compilation
References:
Related Files:
- `cdk/lib/config/lambda-config.ts` - Added RUST_RUNTIMES configuration
- `cdk/lib/constructs/benchmark-function.ts` - Added RustFunction handling
- `cdk/package.json` - Added cargo-lambda-cdk dependency
- `lambdas/rust/Cargo.toml` - Rust workspace configuration
- `lambdas/rust/{workload}/src/main.rs` - Rust handler implementations
- `docs/benchmark-design.md` - Updated with Rust implementation details
D018: [CRITICAL] Fixed Memory Workload
Date: 2025-11-16 | Status: Approved | Stakeholders: Benchmark design team
Context:
The original D016 design used graduated memory allocation ratios (15%-60% of Lambda memory) for the memory-intensive workload. This approach had a critical flaw: it conflated two variables (workload size AND resource size), making results difficult to interpret. As Lambda memory increased, the workload grew superlinearly (both the memory base and the allocation ratio increased), causing:
- Unclear results: Did performance improve due to more CPU/memory, or degrade due to larger workload?
- Extreme execution times: Python at 8192 MB allocated 4.9 GB arrays, taking 10+ minutes per invocation
- No clear plateau: Impossible to visualize when adding more resources stops helping
Decision:
Use a fixed 100 MB array for the memory-intensive workload across ALL Lambda memory configurations (128 MB to 10240 MB).
Implementation:
- Python: `FIXED_ARRAY_SIZE_MB = 100` (hardcoded in handler)
- Node.js: `FIXED_ARRAY_SIZE_MB = 100` (hardcoded in handler)
- Rust: `FIXED_ARRAY_SIZE_MB = 100` (hardcoded in handler)
- Orchestrator: no longer calculates `sizeMB`; passes an empty payload `{}`
- Remove `get_memory_intensive_size_mb()` entirely
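A minimal Python sketch of the fixed-size handler; note that CPython's per-object overhead makes the resident set larger than the nominal 100 MB, and the real handlers are listed under Code Changes below:

```python
import random
import time

FIXED_ARRAY_SIZE_MB = 100  # constant across all Lambda memory configs (D018)
NUM_ELEMENTS = FIXED_ARRAY_SIZE_MB * 1024 * 1024 // 8  # nominal 8 bytes/element

def handler(event, context):
    """Memory-intensive workload: build and sort a fixed-size array, no SDK."""
    start = time.perf_counter()
    # 63-bit random values; CPython int objects add overhead beyond 8 bytes each.
    data = [random.getrandbits(63) for _ in range(NUM_ELEMENTS)]
    data.sort()
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {"arraySizeMB": FIXED_ARRAY_SIZE_MB, "sortMs": round(elapsed_ms, 2)}
```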
Rationale:
- Separates variables: Constant workload + variable resources = pure scaling measurement
- Clear plateau visualization: Shows exactly where adding memory beyond a full vCPU (1769 MB) stops improving performance
- Faster benchmarks: No more 5GB arrays; all configs complete in reasonable time
- Apples-to-apples comparison: Same work across all memory configs reveals resource efficiency
Benefits:
- ~10x faster benchmark execution for high-memory configs
- Clearer performance plateau graphs
- Easier to answer: "What's the optimal Lambda memory for sorting 100 MB?"
- Simpler code (no graduated ratio calculations)
Trade-offs:
- No longer tests "can this runtime handle massive arrays at high memory?"
- Fixed 100 MB may not stress 10240 MB Lambda's full capabilities
- Accepted: This benchmark focuses on resource scaling, not workload scaling
Impact:
Code Changes:
- `lambdas/python/memory-intensive/handler.py` - Use `FIXED_ARRAY_SIZE_MB`, remove validation
- `lambdas/nodejs/memory-intensive/handler.ts` - Use `FIXED_ARRAY_SIZE_MB`, remove validation
- `lambdas/rust/memory-intensive/src/main.rs` - Use `FIXED_ARRAY_SIZE_MB`, remove validation
- `scripts/benchmark_orchestrator.py` - Remove `get_memory_intensive_size_mb()`
- `scripts/benchmark_utils.py` - Replace `MEMORY_INTENSIVE_MAX_RATIO` with `MEMORY_INTENSIVE_ARRAY_SIZE_MB`
Documentation Updates:
- `docs/benchmark-design.md` - Replace graduated allocation section with fixed 100 MB rationale
- `docs/handler-api-spec.md` - Update memory-intensive payload spec (now `{}`)
- `CLAUDE.md` - Update project overview and invariants
Migration:
- New test results NOT comparable with D016 results (different workload sizes)
- Start fresh test runs after deploying D018 changes
- Archive D016 results separately if comparing approaches
References:
- Inspired by observation that benchmark slowed down as memory increased (counter-intuitive)
- Standard practice: fixed workload for performance scaling analysis
Related Decisions:
- Supersedes D016 (Graduated Memory Allocation)
- Complements D009 (Zero-Overhead Data Collection)
- Complements D015 (Dynamic Memory Configuration)
Related Files:
- All `lambdas/*/memory-intensive/handler` files
- `scripts/benchmark_orchestrator.py`
- `scripts/benchmark_utils.py`
- `docs/benchmark-design.md`
- `docs/handler-api-spec.md`
End of Decision Log
Last updated: 2025-11-16
For non-architectural decisions (budget, publication, etc.), see PROJECT_STATUS.md or README.md.