Design Spec: mongodb-atlas-cluster-health
Parent: #107
Target: rw-cli-codecollection
Spec
codebundle_name: "mongodb-atlas-cluster-health"
target_collection: "rw-cli-codecollection"
display_name: "MongoDB Atlas Cluster Health"
author: "rw-codebundle-agent"
purpose: |
Read-only health checks for MongoDB Atlas clusters within a project: inventory,
deployment/replica availability signals, and key workload metrics (connections,
replication lag, resource pressure) via the Atlas Admin API. Helps operators
catch instability before customer-facing incidents.
tasks:
- name: "Gather MongoDB Atlas Cluster Inventory for Project `${ATLAS_PROJECT_ID}`"
description: "Lists clusters and captures edition, MongoDB version, region(s), instance sizes, and paused/maintenance flags for baseline triage context."
script_name: "gather-atlas-cluster-inventory.sh"
expected_issue_severity: [1, 2]
access_level: "read-only"
data_type: "config"
- name: "Check MongoDB Atlas Cluster State for Project `${ATLAS_PROJECT_ID}`"
description: "Evaluates cluster/replica set operational state from Atlas APIs (e.g. transitional states, unhealthy members, upgrade operations) and raises issues when availability is degraded."
script_name: "check-atlas-cluster-state.sh"
expected_issue_severity: [3, 4]
access_level: "read-only"
data_type: "config"
- name: "Analyze MongoDB Atlas Cluster Metrics for Project `${ATLAS_PROJECT_ID}`"
description: "Pulls recent Atlas process/hardware measurements (connections, CPU/disk pressure, replication lag where exposed) and compares them against configurable thresholds."
script_name: "analyze-atlas-cluster-metrics.sh"
expected_issue_severity: [2, 4]
access_level: "read-only"
data_type: "metrics"
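The state check in the second task could be sketched roughly as follows. This is a minimal sketch, not the bundle's implementation: the function name `state_severity` is hypothetical, the state names are the values the Atlas cluster resource is understood to report in its `stateName` field (verify them against the API docs), and the severity mapping is an assumption chosen to stay inside the task's `[3, 4]` range.

```shell
# Sketch: map an Atlas cluster stateName to an issue severity.
# Output 0 means "no issue"; 3 and 4 mirror the task's expected_issue_severity.
# The mapping itself is an assumption, not something the spec prescribes.
state_severity() {
  case "$1" in
    IDLE)               echo 0 ;;  # steady state, nothing to report
    CREATING|UPDATING)  echo 4 ;;  # transitional, usually a planned change
    REPAIRING|DELETING) echo 3 ;;  # availability may be degraded
    *)                  echo 3 ;;  # unknown state: surface it rather than hide it
  esac
}
```

Treating unknown states as an issue is a deliberate choice here, so that a state added by Atlas later is reported instead of silently passing.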
scope:
level: "Project"
qualifiers:
- ATLAS_ORG_ID
- ATLAS_PROJECT_ID
iteration_pattern: |
One SLX per Atlas project (ATLAS_PROJECT_ID). Optional CLUSTER_FILTER env var
narrows checks to named cluster(s); default evaluates all clusters in the project.
resource_types:
- "mongodb_atlas_cluster"
generation_strategy: |
Match discovered mongodb_atlas_cluster resources in RunWhen; the qualifier chain includes
organization/project/cluster identifiers. Alternatively, scope the SLX to project-level
discovery when cluster-level assets are not indexed; tasks then iterate clusters via the Atlas API.
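The CLUSTER_FILTER behavior described under iteration_pattern might look like the sketch below. The helper name `cluster_selected` is hypothetical, and the sketch assumes cluster names contain no whitespace or commas.

```shell
# Sketch: decide whether a cluster name passes CLUSTER_FILTER.
# An empty filter selects every cluster, matching the spec's default.
cluster_selected() {
  local name="$1" filter="${2:-}"
  [ -z "$filter" ] && return 0
  # Drop whitespace so "a, b" and "a,b" behave the same.
  filter="$(printf '%s' "$filter" | tr -d '[:space:]')"
  # Wrap both sides in commas so "prod" does not match "prod-a".
  case ",$filter," in
    *",$name,"*) return 0 ;;
    *)           return 1 ;;
  esac
}
```

A task script could then call `cluster_selected "$cluster_name" "$CLUSTER_FILTER" || continue` inside its loop over the clusters returned by the Atlas API.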
env_vars:
- name: ATLAS_PROJECT_ID
description: "MongoDB Atlas project ID (24-char hex)"
required: true
- name: ATLAS_ORG_ID
description: "MongoDB Atlas organization ID for audit context and multi-project setups"
required: false
- name: CLUSTER_FILTER
description: "Comma-separated cluster names to limit scope; empty means all clusters in project"
required: false
default: ""
- name: CONNECTION_THRESHOLD
description: "Issue when normalized connection utilization exceeds this percentage"
required: false
default: "85"
- name: DISK_UTIL_THRESHOLD
description: "Issue when disk utilization exceeds this percentage"
required: false
default: "85"
- name: REPLICATION_LAG_MS_THRESHOLD
description: "Issue when replica lag exceeds this many milliseconds (where measurable)"
required: false
default: "5000"
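One way the scripts could apply these defaults and compare measurements against them is sketched below. The helper names `exceeds` and `valid_project_id` are hypothetical, and strict greater-than is an assumption drawn from the word "exceeds" in the descriptions.

```shell
# Sketch: apply the spec's defaults when the variables are unset or empty.
: "${CONNECTION_THRESHOLD:=85}"
: "${DISK_UTIL_THRESHOLD:=85}"
: "${REPLICATION_LAG_MS_THRESHOLD:=5000}"

# exceeds VALUE THRESHOLD: succeeds when VALUE is strictly greater.
# awk is used because measurements may be decimals (e.g. 91.4),
# which the shell's integer-only [ -gt ] cannot compare.
exceeds() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}

# valid_project_id ID: succeeds when ID is a 24-character hex string.
valid_project_id() {
  printf '%s' "$1" | grep -Eq '^[0-9a-fA-F]{24}$'
}
```

For example, `exceeds "$disk_pct" "$DISK_UTIL_THRESHOLD" && echo "disk pressure"` would report only when utilization is above the configured percentage.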
secrets:
- name: atlas_api_key_credentials
description: "MongoDB Atlas programmatic API key pair"
format: |
JSON or env mapping with ATLAS_PUBLIC_API_KEY and ATLAS_PRIVATE_API_KEY (Atlas Admin API)
platform:
name: "mongodb_atlas"
cli_tools:
- "curl"
- "jq"
- "mongocli"
auth_methods:
- "Atlas Admin API key digest auth (public + private key)"
api_docs: "https://www.mongodb.com/docs/atlas/reference/api-resources-spec/v2/"
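A request using the digest auth method listed above could be sketched like this. The function names are hypothetical; the base URL, endpoint paths, versioned `Accept` header value, and the `granularity`/`period`/`m` query parameters reflect the Atlas Admin API v2 as commonly documented and should be checked against the api_docs link before use.

```shell
# Sketch: URL builders plus a digest-authenticated GET for the Atlas Admin API.
ATLAS_BASE_URL="https://cloud.mongodb.com/api/atlas/v2"

# atlas_clusters_url PROJECT_ID
atlas_clusters_url() {
  printf '%s/groups/%s/clusters' "$ATLAS_BASE_URL" "$1"
}

# atlas_measurements_url PROJECT_ID PROCESS_ID PERIOD METRIC
# PROCESS_ID is host:port; PERIOD is an ISO 8601 duration such as PT15M.
atlas_measurements_url() {
  printf '%s/groups/%s/processes/%s/measurements?granularity=PT1M&period=%s&m=%s' \
    "$ATLAS_BASE_URL" "$1" "$2" "$3" "$4"
}

# atlas_get URL: reads the key pair from the environment (see secrets).
# The version date in the Accept header is an example value.
atlas_get() {
  curl --silent --show-error --fail \
    --digest --user "${ATLAS_PUBLIC_API_KEY}:${ATLAS_PRIVATE_API_KEY}" \
    --header "Accept: application/vnd.atlas.2023-02-01+json" \
    "$1"
}
```

Usage would be along the lines of `atlas_get "$(atlas_clusters_url "$ATLAS_PROJECT_ID")" | jq '.results[].name'`, assuming the response wraps clusters in a `results` array.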
related_bundles:
- name: "mongodb-health-gcp-promql"
relationship: "complements"
notes: "GCP Prometheus/GMP bundle observes MongoDB workloads on GKE; this bundle targets Atlas-hosted clusters via Atlas APIs."
- name: "mongodb-atlas-operations-health"
relationship: "complements"
notes: "Companion bundle covers alerts, backups, and network access while this bundle focuses on runtime cluster health."
test_scenarios:
- name: "healthy_project"
description: "All clusters in steady state (Atlas stateName IDLE), metrics within thresholds"
expected_issues: 0
- name: "degraded_replica"
description: "Atlas reports elevated replication lag or transitional cluster state"
expected_issues: 2
expected_severities: [3, 4]
notes: |
Prefer Atlas Admin API v2 over scraping UI. Respect Atlas rate limits; batch metric
windows conservatively (e.g. last 15–60 minutes). Implementation should emit structured
JSON issues compatible with RW.Core.Add Issue patterns used in sibling azure-* bundles.
Before merge, run `python -m scorer.score <bundle_path>` from codebundle-farm (threshold 70).
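The structured JSON issues mentioned in the notes could be emitted with `jq`, which the spec already lists under cli_tools. This is a sketch only: the helper name `emit_issue` and the field names (`severity`, `title`, `details`, `next_steps`) are assumptions, and the real shape should follow whatever the sibling bundles pass to RW.Core.Add Issue.

```shell
# Sketch: emit one issue as a compact JSON object on stdout.
# jq handles quoting and escaping, so cluster names or API messages
# containing quotes or newlines cannot break the output.
# emit_issue SEVERITY TITLE DETAILS NEXT_STEPS
emit_issue() {
  jq -cn \
    --argjson severity "$1" \
    --arg title "$2" \
    --arg details "$3" \
    --arg next_steps "$4" \
    '{severity: $severity, title: $title, details: $details, next_steps: $next_steps}'
}
```

A script could append each object to a file and combine them at the end with `jq -s '.' issues.jsonl`, which produces a single JSON array.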