Design Spec: azure-cosmosdb-utilization-health
Parent: #102
Target: rw-cli-codecollection
Spec
codebundle_name: "azure-cosmosdb-utilization-health"
target_collection: "rw-cli-codecollection"
display_name: "Azure Cosmos DB Utilization and Sizing Health"
author: "rw-codebundle-agent"
purpose: |
Evaluates historical and point-in-time utilization for Azure Cosmos DB: request units (RU),
throttling, latency, and storage growth to support right-sizing and throughput planning.
tasks:
- name: "Analyze Cosmos DB Normalized RU Consumption Trends for Account `${COSMOSDB_ACCOUNT_NAME}` in Resource Group `${AZURE_RESOURCE_GROUP}`"
description: "Pulls Azure Monitor time series for normalized RU consumption to detect sustained hot partitions or headroom loss."
script_name: "cosmosdb-normalized-ru-trends.sh"
expected_issue_severity: [3, 4]
access_level: "read-only"
data_type: "metrics"
- name: "Analyze Cosmos DB Total Request Units Consumed for Account `${COSMOSDB_ACCOUNT_NAME}` in Resource Group `${AZURE_RESOURCE_GROUP}`"
description: "Aggregates Total Request Units over the lookback window for workload growth and chargeback-style signals."
script_name: "cosmosdb-total-ru-consumed.sh"
expected_issue_severity: [3, 4]
access_level: "read-only"
data_type: "metrics"
- name: "Check Cosmos DB Throttling and HTTP 429 Rate for Account `${COSMOSDB_ACCOUNT_NAME}` in Resource Group `${AZURE_RESOURCE_GROUP}`"
description: "Correlates throttled requests / 429 indicators with provisioned throughput to flag undersizing."
script_name: "cosmosdb-throttling-429.sh"
expected_issue_severity: [3, 4]
access_level: "read-only"
data_type: "metrics"
- name: "Analyze Cosmos DB Server-side Latency for Account `${COSMOSDB_ACCOUNT_NAME}` in Resource Group `${AZURE_RESOURCE_GROUP}`"
description: "Reviews server-side latency metrics for regressions that often precede saturation or hot keys."
script_name: "cosmosdb-server-latency.sh"
expected_issue_severity: [3, 4]
access_level: "read-only"
data_type: "metrics"
- name: "Analyze Cosmos DB Data and Index Storage Utilization for Account `${COSMOSDB_ACCOUNT_NAME}` in Resource Group `${AZURE_RESOURCE_GROUP}`"
description: "Tracks data and index storage growth; flags rapid expansion that may drive partition count and cost."
script_name: "cosmosdb-storage-utilization.sh"
expected_issue_severity: [3, 4]
access_level: "read-only"
data_type: "metrics"
- name: "Analyze Cosmos DB Provisioned Throughput vs Consumed Load for Account `${COSMOSDB_ACCOUNT_NAME}` in Resource Group `${AZURE_RESOURCE_GROUP}`"
description: "Compares autoscale or manual provisioned RU/s against consumed RU patterns for oversizing and undersizing recommendations."
script_name: "cosmosdb-throughput-sizing.sh"
expected_issue_severity: [3, 4]
access_level: "read-only"
data_type: "metrics"
scope:
level: "ResourceGroup"
qualifiers:
- AZ_SUBSCRIPTION
- AZURE_RESOURCE_GROUP
iteration_pattern: |
Same as azure-cosmosdb-config-health: filter by `${COSMOSDB_ACCOUNT_NAME}`, or discover all accounts in the resource group when it is set to All.
resource_types:
- "azure_cosmosdb_database_account"
generation_strategy: |
Pair with azure-cosmosdb-config-health generation: same resource match and SLX qualifiers so
operators can run configuration and utilization task sets side by side per account.
env_vars:
- name: AZ_SUBSCRIPTION
description: "Azure subscription ID (UUID)"
required: true
- name: AZURE_RESOURCE_GROUP
description: "Resource group containing the Cosmos DB account(s)"
required: true
- name: COSMOSDB_ACCOUNT_NAME
description: "Cosmos DB account name, or All for every account in the resource group"
required: false
default: "All"
- name: METRICS_LOOKBACK_DAYS
description: "Days of historical metrics to analyze for trends"
required: false
default: "14"
- name: NORMALIZED_RU_THRESHOLD_PCT
description: "Percentage of normalized RU consumption above which to raise sizing issues"
required: false
default: "80"
- name: THROTTLE_EVENTS_THRESHOLD
description: "Minimum throttled request count (or rate) in the window to flag undersizing"
required: false
default: "1"
secrets:
- name: azure_credentials
description: "Service principal used by Azure CLI"
format: |
JSON object with: CLIENT_ID, TENANT_ID, CLIENT_SECRET, SUBSCRIPTION_ID
platform:
name: "azure"
cli_tools:
- "az"
- "jq"
auth_methods:
- "Service Principal (azure_credentials)"
api_docs: "https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/microsoft-documentdb-databaseaccounts-metrics"
related_bundles:
- name: "azure-cosmosdb-config-health"
relationship: "complements"
notes: "Configuration and DR/network checks; this bundle focuses on utilization and sizing signals."
- name: "azure-appservice-plan-health"
relationship: "complements"
notes: "Similar weekly utilization and capacity pattern for Azure PaaS; reuse metric query idioms."
- name: "azure-db-health"
relationship: "complements"
notes: "Cloud Custodian bundle in another collection lists broad DB CPU/memory patterns; Cosmos-specific RU metrics are not covered there."
test_scenarios:
- name: "right_sized_account"
description: "Normalized RU below threshold, no throttling, stable latency"
expected_issues: 0
- name: "undersized_account"
description: "High normalized RU with throttling events in the window"
expected_issues: 2
expected_severities: [3, 4]
notes: |
The intake wording referenced historical CPU and memory. For Cosmos DB, implementers should map those
intents to service-native metrics (normalized RU, total RU consumed, server-side latency, storage)
rather than host CPU/RAM, which the managed service does not expose to customers the way IaaS hosts do.
Use `az monitor metrics list` with the appropriate metric names and dimensions (DatabaseName,
CollectionName, Region) where the API allows. Keep the combined task runtime under ~60s where possible by
limiting the data points requested per query (for example, a coarser time grain) and parallelizing account loops.
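As an illustration of the metric query described in the notes, the following is a minimal sketch, not the bundle's actual script. The helper names (`cosmos_resource_id`, `count_over_threshold`, `normalized_ru_samples`) are hypothetical. The metric name `NormalizedRUConsumption`, the `Maximum` aggregation, and the `PT1H` grain are assumptions that should be checked against the Azure Monitor reference linked in `api_docs`. It assumes `az` is already logged in with the service principal and that `jq` is installed.

```shell
# Sketch only: helper names are hypothetical and not defined by this spec.

# Build the ARM resource ID for a Cosmos DB account.
cosmos_resource_id() {
  local sub="$1" rg="$2" account="$3"
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.DocumentDB/databaseAccounts/%s\n' \
    "$sub" "$rg" "$account"
}

# Count numeric samples on stdin (one per line) that exceed the threshold in $1.
count_over_threshold() {
  awk -v t="$1" '$1 != "" && $1 + 0 > t + 0 { n++ } END { print n + 0 }'
}

# Print hourly maximum normalized RU samples for the lookback window, one per line.
# Assumption: metric name, aggregation, and grain match the Azure Monitor reference.
normalized_ru_samples() {
  local resource_id="$1" lookback_days="$2"
  az monitor metrics list \
    --resource "$resource_id" \
    --metric NormalizedRUConsumption \
    --aggregation Maximum \
    --interval PT1H \
    --offset "${lookback_days}d" \
    --output json |
    jq -r '.value[].timeseries[].data[].maximum // empty'
}

# Example wiring with the env vars from this spec (not executed here):
#   id=$(cosmos_resource_id "$AZ_SUBSCRIPTION" "$AZURE_RESOURCE_GROUP" "$COSMOSDB_ACCOUNT_NAME")
#   hot=$(normalized_ru_samples "$id" "${METRICS_LOOKBACK_DAYS:-14}" |
#         count_over_threshold "${NORMALIZED_RU_THRESHOLD_PCT:-80}")
#   echo "Hours above threshold: $hot"
```

Counting hours above `NORMALIZED_RU_THRESHOLD_PCT` is one possible way to separate sustained pressure from a single spike; the spec itself does not prescribe how many samples should trigger an issue. When `COSMOSDB_ACCOUNT_NAME` is All, the account names could be discovered first with `az cosmosdb list --resource-group "$AZURE_RESOURCE_GROUP" --query '[].name' -o tsv` and the helpers run per account.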