This document explains the DynamoDB schema used to store test results and analyze test runs.
The BenchmarkResults DynamoDB table stores results for the Lambda ARM vs x86 benchmark suite.
This is a single-table, multi-entity database that is optimized for:
- Fast, cheap queries over aggregated statistics (avoids excessive reads).
- Occasional deep dives into raw invocation results.
- Tracking and comparing multiple benchmark runs over time.
Because this table is for test result data and isn't mission-critical or production data, we don't have backups enabled for this table.
The table is designed around these primary access patterns:
-
AP1 – Get test run summary + all aggregates for a run
“Show me metadata and stats for test run X.” -
AP2 – Get a single aggregate for a specific config in a run
“Give me cold-start stats forpython3.13-arm64-cpu-512in run X.” -
AP3 – Get all raw results for a test run
“Fetch all samples for run X so I can re-aggregate / debug.” -
AP4 – Get all runs for a specific configuration
“Show howpython3.13-arm64-cpu-512has changed over time.” -
AP5 – List all test runs
“Show me all benchmark runs with their status, notes, and timestamps.” -
AP6 (optional / advanced) – Detect missing aggregates for a run
“Given the test matrix for run X, which aggregates are missing?”
The schema and GSIs exist to satisfy these patterns directly.
All entities share one table:
result– individual invocation dataaggregate– pre-calculated statistics per configurationtest-run– metadata and test matrix for a run
Every item includes:
itemType {"result","aggregate","test-run"}– required entity type declarationtestRunId– UUID that ties all data for a run together
The base table uses a generic primary key:
- Partition key:
pk - Sort key:
sk
Item key patterns:
-
Test run items
pk = "TESTRUN#{testRunId}"sk = "TESTRUN#{testRunId}"
-
Aggregate items
pk = "TESTRUN#{testRunId}"sk = "AGGREGATE#{configId}#{invocationType}"
-
Result items
pk = "{testRunId}#{configId}"sk = "{invocationType}#{invocationNumber}"
Note
Aggregates and the test-run item for a run share the samepk.
This allows a single query onpk = "TESTRUN#{testRunId}"to return:
- One
test-runitem (metadata)- Many
aggregateitems (stats)
Key pattern
pk = "TESTRUN#{testRunId}"sk = "TESTRUN#{testRunId}"
Purpose
- One item per benchmark run.
- Stores metadata and the test matrix.
Typical attributes
-
Identity & timestamps:
testRunIdtimestamp(start time)startTime,endTime(optional)
-
Status:
status–"in_progress","completed", or"failed"failedInvocations(optional)errorSummary(optional)
-
Run configuration:
mode–"test"or"production"totalConfigurationstotalInvocationscoldStartsPerConfigwarmStartsPerConfig
-
Documentation:
notes– human-readable description of what changed / why
-
Test matrix (recommended):
- High-level lists:
runtimesarchitecturesworkloadTypes
- Detailed:
configurations[]with:runtimearchitectureworkloadTypememorySizes[]
- High-level lists:
The test matrix enables reconstruction of expected (configId, invocationType) aggregates and detection of missing data, and it supports partial runs (for example, filtered by runtime or memory).
Key pattern
pk = "TESTRUN#{testRunId}"sk = "AGGREGATE#{configId}#{invocationType}"
Purpose
- One item per
(testRunId, configId, invocationType). - Stores pre-calculated statistics for fast, low-cost analysis.
Typical attributes
-
Identity:
testRunIdconfigIdinvocationType–"cold" | "warm"timestamp(when the aggregate was created)
-
Configuration dimensions:
runtimearchitectureworkloadTypememorySizeMB
-
Sample metadata:
sampleCountfailedCountallSuccessful
-
Statistics objects (names can evolve):
durationMsStatsbilledDurationMsStatsmemoryMBStatsinitDurationMsStats(cold only)
Each stats object typically includes a combination of: mean, median, min, max, percentiles (p90, p95, p99), sampleCount, and optional flags such as outliersRemoved.
Key pattern
pk = "{testRunId}#{configId}"sk = "{invocationType}#{invocationNumber}"
Purpose
- Raw per-invocation data used for deep debugging or re-computing stats.
Typical attributes
-
Identity:
testRunIdconfigIdinvocationType–"cold" | "warm"invocationNumber– integertimestamp
-
Metrics:
- Duration metrics (for example,
durationMs,billedDurationMs) - Memory usage (for example,
maxMemoryUsedMB) - Optional cold-start metric:
initDurationMs - Any additional metrics captured (for example, client latency)
- Duration metrics (for example,
-
Lambda metadata:
functionNamelambdaRequestId(or equivalent request ID)
Schema
- Partition key:
configId - Sort key:
timestamp
Supports
- AP4 – Get all runs for a specific configuration
Typical questions
- “Show all runs where
python3.13-arm64-cpu-512was tested.” - “Get the most recent N runs for this configuration.”
Schema
- Partition key:
testRunId - Sort key:
timestamp
Supports
- AP3 – Get all raw results for a test run
Typical questions
- “Fetch all samples for run
abc-123.” - “Recompute stats for run
abc-123using custom logic.”
The table below summarizes which part of the model to use for each access pattern.
| Access pattern | Target | Key condition / usage |
|---|---|---|
| AP1 – Test run summary + all aggregates for a run | Base table | pk = "TESTRUN#{testRunId}". Filter itemType = test-run / aggregate. |
AP2 – Single aggregate for (testRunId, configId, type) |
Base table (GetItem) | pk = "TESTRUN#{testRunId}", sk = "AGGREGATE#{configId}#{invocationType}". |
| AP3 – All raw results for a test run | TestRunIndex (GSI2) | testRunId = {testRunId}; use timestamp sort key as needed. |
| AP4 – All runs for a given configuration | ConfigIndex (GSI1) | configId = {configId}; sort / filter by timestamp. |
| AP5 – List all test runs | Base table (Scan) | Filter on itemType = "test-run", then sort by timestamp. |
| AP6 – Detect missing aggregates for a run | Base table + primary-key gets | 1) Read test-run item; 2) for each expected (configId, type) perform GetItem on aggregate. |