@@ -0,0 +1,22 @@
apiVersion: runwhen.com/v1
kind: GenerationRules
spec:
  platform: mongodb_atlas
  generationRules:
    - resourceTypes:
        - mongodb_atlas_cluster
      matchRules:
        - type: pattern
          pattern: ".+"
          properties: [name]
          mode: substring
      slxs:
        - baseName: mongo-atlas-cluster-hlth
          qualifiers: [organization, project, cluster]
          baseTemplateName: mongodb-atlas-cluster-health
          levelOfDetail: basic
          outputItems:
            - type: slx
            - type: sli
            - type: runbook
              templateName: mongodb-atlas-cluster-health-taskset.yaml
@@ -0,0 +1,50 @@
apiVersion: runwhen.com/v1
kind: ServiceLevelIndicator
metadata:
  name: {{slx_name}}
  labels:
    {% include "common-labels.yaml" %}
  annotations:
    {% include "common-annotations.yaml" %}
spec:
  displayUnitsLong: OK
  displayUnitsShort: ok
  locations:
    - {{default_location}}
  description: Samples Atlas Admin API responsiveness, cluster IDLE state, and lightweight measurements from PRIMARY processes for alerting.
  codeBundle:
    {% if repo_url %}
    repoUrl: {{repo_url}}
    {% else %}
    repoUrl: https://github.com/runwhen-contrib/rw-cli-codecollection.git
    {% endif %}
    {% if ref %}
    ref: {{ref}}
    {% else %}
    ref: main
    {% endif %}
    pathToRobot: codebundles/mongodb-atlas-cluster-health/sli.robot
  intervalStrategy: intermezzo
  intervalSeconds: 300
  configProvided:
    - name: ATLAS_PROJECT_ID
      value: "{{ match_resource.resource.atlas_project_id | default(match_resource.resource.project_id) }}"
    - name: CLUSTER_FILTER
      value: "{{ match_resource.resource.name | default('') }}"
    - name: CONNECTION_THRESHOLD
      value: "{{custom.atlas_connections_threshold | default('85')}}"
    - name: CPU_UTIL_THRESHOLD
      value: "{{custom.atlas_cpu_util_threshold | default('92')}}"
    - name: SLI_MAX_MEASUREMENT_PROCESSES
      value: "{{custom.atlas_sli_max_measurement_processes | default('8')}}"
  secretsProvided:
  {% if wb_version %}
    {% include "mongodb_atlas-auth.yaml" ignore missing %}
  {% else %}
    - name: atlas_api_key_credentials
      workspaceKey: AUTH DETAILS NOT FOUND
  {% endif %}
  alertConfig:
    tasks:
      persona: eager-edgar
      sessionTTL: 10m
@@ -0,0 +1,31 @@
apiVersion: runwhen.com/v1
kind: ServiceLevelX
metadata:
  name: {{slx_name}}
  labels:
    {% include "common-labels.yaml" %}
  annotations:
    {% include "common-annotations.yaml" %}
spec:
  imageURL: https://storage.googleapis.com/runwhen-nonprod-shared-images/icons/postgresql.svg
  alias: MongoDB Atlas Health for {{ match_resource.resource.name | default(match_resource.alias) }}
  asMeasuredBy: Digest-authenticated Atlas Admin API probes covering cluster inventory responsiveness, IDLE state of unpaused clusters, and workload samples from PRIMARY processes.
  configProvided:
    - name: ATLAS_PROJECT_DISPLAY
      value: "{{match_resource.resource.name}}"
  owners:
    - {{workspace.owner_email}}
  statement: MongoDB Atlas clusters in the project should stay in a healthy automation state with capacity headroom.
  additionalContext:
    {% include "mongodb_atlas-hierarchy.yaml" ignore missing %}
    qualified_name: "{{match_resource.qualified_name}}"
  tags:
    {% include "mongodb_atlas-tags.yaml" ignore missing %}
    - name: cloud
      value: atlas
    - name: datastore
      value: mongodb
    - name: scope
      value: project
    - name: access
      value: read-only
@@ -0,0 +1,45 @@
apiVersion: runwhen.com/v1
kind: Runbook
metadata:
  name: {{slx_name}}
  labels:
    {% include "common-labels.yaml" %}
  annotations:
    {% include "common-annotations.yaml" %}
spec:
  location: {{default_location}}
  description: Lists MongoDB Atlas cluster inventory, verifies operational state, and samples cluster metrics within the scoped project.
  codeBundle:
    {% if repo_url %}
    repoUrl: {{repo_url}}
    {% else %}
    repoUrl: https://github.com/runwhen-contrib/rw-cli-codecollection.git
    {% endif %}
    {% if ref %}
    ref: {{ref}}
    {% else %}
    ref: main
    {% endif %}
    pathToRobot: codebundles/mongodb-atlas-cluster-health/runbook.robot
  configProvided:
    - name: ATLAS_PROJECT_ID
      value: "{{ match_resource.resource.atlas_project_id | default(match_resource.resource.project_id) }}"
    - name: ATLAS_ORG_ID
      value: "{{ match_resource.resource.organization_id | default(match_resource.labels.atlas_org_id | default('')) }}"
    - name: CLUSTER_FILTER
      value: "{{ match_resource.resource.name | default('') }}"
    - name: CONNECTION_THRESHOLD
      value: "{{custom.atlas_connections_threshold | default('85')}}"
    - name: DISK_UTIL_THRESHOLD
      value: "{{custom.atlas_disk_util_threshold | default('85')}}"
    - name: REPLICATION_LAG_MS_THRESHOLD
      value: "{{custom.atlas_replication_lag_ms | default('5000')}}"
    - name: CPU_UTIL_THRESHOLD
      value: "{{custom.atlas_cpu_util_threshold | default('92')}}"
  secretsProvided:
  {% if wb_version %}
    {% include "mongodb_atlas-auth.yaml" ignore missing %}
  {% else %}
    - name: atlas_api_key_credentials
      workspaceKey: AUTH DETAILS NOT FOUND
  {% endif %}
17 changes: 17 additions & 0 deletions codebundles/mongodb-atlas-cluster-health/.test/Taskfile.yaml
@@ -0,0 +1,17 @@
version: "3"

tasks:
  default:
    desc: "Validate Atlas CodeBundle structure locally"
    cmds:
      - task: validate-structure

  validate-structure:
    desc: "Run static checks without live Atlas credentials"
    cmds:
      - ./validate-atlas-bundle-structure.sh

  clean:
    desc: "Remove local scaffold outputs"
    cmds:
      - rm -rf output workspaceInfo.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
#!/usr/bin/env bash
# Structure validation for mongodb-atlas-cluster-health (no live Atlas project required).
set -euo pipefail
ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
test -f "$ROOT/runbook.robot"
test -f "$ROOT/sli.robot"
test -f "$ROOT/README.md"
test -f "$ROOT/.runwhen/generation-rules/mongodb-atlas-cluster-health.yaml"
test -f "$ROOT/.runwhen/templates/mongodb-atlas-cluster-health-slx.yaml"
test -f "$ROOT/.runwhen/templates/mongodb-atlas-cluster-health-taskset.yaml"
test -f "$ROOT/.runwhen/templates/mongodb-atlas-cluster-health-sli.yaml"

for f in \
  gather-atlas-cluster-inventory.sh \
  check-atlas-cluster-state.sh \
  analyze-atlas-cluster-metrics.sh \
  sli-mongodb-atlas-quick-check.sh \
  atlas-api-common.inc.sh
do
  test -x "$ROOT/$f"
done

echo "mongodb-atlas-cluster-health bundle structure OK"
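Under `set -e`, each `test` above stops the script at the first missing file without saying which one. A minimal sketch of a hypothetical alternative, `check_bundle_structure` (not part of this bundle), collects every missing or non-executable path and fails once with a full report:

```bash
# Hypothetical helper: report every missing or non-executable bundle path,
# then return non-zero once, instead of stopping at the first failure.
check_bundle_structure() {
  local root="$1"
  local missing=()
  local f
  # Files that must exist.
  for f in runbook.robot sli.robot README.md \
    .runwhen/generation-rules/mongodb-atlas-cluster-health.yaml \
    .runwhen/templates/mongodb-atlas-cluster-health-slx.yaml \
    .runwhen/templates/mongodb-atlas-cluster-health-taskset.yaml \
    .runwhen/templates/mongodb-atlas-cluster-health-sli.yaml; do
    [[ -f "$root/$f" ]] || missing+=("missing file: $f")
  done
  # Scripts that must be executable.
  for f in gather-atlas-cluster-inventory.sh check-atlas-cluster-state.sh \
    analyze-atlas-cluster-metrics.sh sli-mongodb-atlas-quick-check.sh \
    atlas-api-common.inc.sh; do
    [[ -x "$root/$f" ]] || missing+=("not executable: $f")
  done
  if (( ${#missing[@]} > 0 )); then
    printf '%s\n' "${missing[@]}" >&2
    return 1
  fi
  echo "mongodb-atlas-cluster-health bundle structure OK"
}
```

The existing script could call it as `check_bundle_structure "$ROOT"` in place of the individual `test` lines.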
49 changes: 49 additions & 0 deletions codebundles/mongodb-atlas-cluster-health/README.md
@@ -0,0 +1,49 @@
# MongoDB Atlas Cluster Health

This bundle monitors MongoDB Atlas clusters through digest-authenticated HTTPS calls to the Atlas Admin API v2. Its output focuses on actionable cluster inventory, clusters in transitional automation states, replica process health signals reported by Atlas, and short-window workload metrics, in line with the escalation expectations set in parent issue #107.

## Overview

- **Baseline inventory**: Prints the cloud provider, instance tier, MongoDB major version, disk size, paused flag, and current `stateName` for every Atlas cluster that matches the optional name filter, as a starting point for deeper debugging.
- **Operational posture**: Correlates transitional automation states with the MongoDB replica `healthStatus` hints Atlas publishes (when Atlas returns them) to separate planned maintenance from regressions that affect availability.
- **Workload metrics**: Runs compact measurement queries across replica processes and compares `CONNECTIVITY_PERCENT`, `NORMALIZED_SYSTEM_CPU_USER`, disk partition data usage relative to `diskSizeGB`, and replication lag (using `OPLOG_SLAVE_LAG_MASTER_TIME` as a proxy) against configurable thresholds.
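Comparing a disk sample with `diskSizeGB` requires a unit conversion. A minimal sketch, assuming the `DISK_PARTITION_SPACE_USED_DATA` sample is reported in bytes and treating `diskSizeGB` as GiB (both assumptions to confirm against the Atlas documentation); `disk_util_percent` is a hypothetical helper, not one of the bundle's scripts:

```bash
# Hypothetical helper: disk_util_percent USED_BYTES DISK_SIZE_GB
# Prints utilization with one decimal place. ASSUMES the sample is in bytes
# and that diskSizeGB means GiB (1024^3 bytes). Fails on a non-positive size.
disk_util_percent() {
  awk -v used="$1" -v size_gb="$2" 'BEGIN {
    if (size_gb <= 0) { exit 1 }
    printf "%.1f\n", used / (size_gb * 1024 * 1024 * 1024) * 100
  }'
}
```

For example, `disk_util_percent 5368709120 10` (5 GiB used on a 10 GB disk) prints `50.0`, which can then be compared against `DISK_UTIL_THRESHOLD`.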

The discovery templates assume that discovered `mongodb_atlas_cluster` resources expose `match_resource.resource.atlas_project_id` (falling back to `project_id`), an optional `organization_id`, and the canonical cluster name used for `CLUSTER_FILTER`. Adjust the template paths if your workspace metadata differs.

## Configuration

### Required Variables

- `ATLAS_PROJECT_ID`: The 24-character hexadecimal Atlas project (group) ID used in every REST path.
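A cheap pre-flight check can reject a malformed ID before any API call is made. A sketch with a hypothetical `is_valid_atlas_project_id` helper that only checks the documented shape (24 hexadecimal characters), not whether the project exists:

```bash
# Hypothetical helper: succeed only when $1 is exactly 24 hexadecimal characters.
is_valid_atlas_project_id() {
  [[ "$1" =~ ^[[:xdigit:]]{24}$ ]]
}
```

A caller could fail fast with `is_valid_atlas_project_id "$ATLAS_PROJECT_ID" || { echo "ATLAS_PROJECT_ID must be 24 hex characters" >&2; exit 1; }`.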

### Optional Variables

- `ATLAS_ORG_ID`: Atlas organization ID, shown in inventory output for auditors (informational only).
- `CLUSTER_FILTER`: Comma-separated Atlas cluster names; leave empty or unset to check every cluster in the project.
- `CONNECTION_THRESHOLD`: Percent ceiling applied per process when `CONNECTIVITY_PERCENT` samples exist (defaults to `85`).
- `DISK_UTIL_THRESHOLD`: Percent ceiling for disk occupancy, computed from the maximum `DISK_PARTITION_SPACE_USED_DATA` sample relative to the configured `diskSizeGB` (defaults to `85`).
- `REPLICATION_LAG_MS_THRESHOLD`: Maximum tolerated `OPLOG_SLAVE_LAG_MASTER_TIME` spike, in milliseconds (defaults to `5000`).
- `CPU_UTIL_THRESHOLD`: Percent ceiling for `NORMALIZED_SYSTEM_CPU_USER` bursts, shared by the runbook metrics task and the SLI (defaults to `92`).
- `SLI_MAX_MEASUREMENT_PROCESSES`: Maximum number of PRIMARY processes the SLI samples per run, to stay within Atlas API rate limits (defaults to `8`).
- `ATLAS_API_BASE`: Base URL override for sovereign or private Atlas endpoints (defaults to `https://cloud.mongodb.com/api/atlas/v2`).
- `ATLAS_ACCEPT_HEADER`: Versioned `Accept` header that selects the Atlas API contract (defaults to `application/vnd.atlas.2025-02-19+json`; update it when Atlas documents a successor version).
- `ATLAS_METRICS_MEASUREMENT_DELAY_MS`: Delay in milliseconds between sequential measurement requests, useful for large fleets (defaults to `200`; set `0` to disable).
- `ATLAS_PUBLIC_API_KEY` and `ATLAS_PRIVATE_API_KEY`: Can replace the bundled secret when RunWhen injects the two key halves as separate values instead of JSON.
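One way to honor `CLUSTER_FILTER` is sketched below, assuming exact name matching and tolerating spaces after commas. `cluster_selected` is a hypothetical name; the bundle's own scripts may parse the filter differently.

```bash
# Hypothetical helper: cluster_selected NAME
# Succeeds when NAME should be checked. An empty or unset CLUSTER_FILTER
# selects every cluster; otherwise NAME must exactly match one entry.
cluster_selected() {
  local name="$1" entry
  local -a entries
  [[ -z "${CLUSTER_FILTER:-}" ]] && return 0
  IFS=',' read -r -a entries <<< "$CLUSTER_FILTER"
  for entry in "${entries[@]}"; do
    # Trim leading and trailing whitespace so "a, b" matches "b".
    entry="${entry#"${entry%%[![:space:]]*}"}"
    entry="${entry%"${entry##*[![:space:]]}"}"
    [[ "$entry" == "$name" ]] && return 0
  done
  return 1
}
```

With `CLUSTER_FILTER="prod-a, prod-b"`, `cluster_selected prod-b` succeeds and `cluster_selected prod-c` fails.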

### Secrets

- `atlas_api_key_credentials`: JSON object holding the Atlas programmatic API key pair as `ATLAS_PUBLIC_API_KEY` / `ATLAS_PRIVATE_API_KEY` (or `publicKey` / `privateKey`), used for digest authentication. The key needs at least the **Project Read Only** role.
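Atlas programmatic API keys authenticate with HTTP digest auth, which curl supports through `--digest`. A minimal sketch of a hypothetical `atlas_get` helper, assuming the key halves are already exported as `ATLAS_PUBLIC_API_KEY` and `ATLAS_PRIVATE_API_KEY` and reusing the `ATLAS_API_BASE` and `ATLAS_ACCEPT_HEADER` defaults listed under Optional Variables:

```bash
# Hypothetical helper: atlas_get PATH
# Issues a digest-authenticated GET against the Atlas Admin API v2.
# ASSUMES ATLAS_PUBLIC_API_KEY / ATLAS_PRIVATE_API_KEY are already set.
atlas_get() {
  local path="$1"
  local base="${ATLAS_API_BASE:-https://cloud.mongodb.com/api/atlas/v2}"
  local accept="${ATLAS_ACCEPT_HEADER:-application/vnd.atlas.2025-02-19+json}"
  curl --silent --show-error --fail \
    --digest --user "${ATLAS_PUBLIC_API_KEY}:${ATLAS_PRIVATE_API_KEY}" \
    --header "Accept: ${accept}" \
    "${base}${path}"
}
```

For example, `atlas_get "/groups/${ATLAS_PROJECT_ID}/clusters"` lists the project's clusters. Passing the key with `--user` exposes it in the process list on shared hosts; curl's `--netrc-file` option is one way to keep it off the command line.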

## Tasks & Features

### Gather MongoDB Atlas Cluster Inventory for Project `${ATLAS_PROJECT_ID}`

Lists the project's clusters, calling out paused clusters and unpaused clusters whose `stateName` is anything other than `IDLE`.
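The classification follows a simple precedence: paused first, then any non-`IDLE` state. A sketch with a hypothetical `classify_cluster` helper, assuming `paused` and `stateName` have already been extracted from each cluster object as plain strings; the output labels are illustrative:

```bash
# Hypothetical helper: classify_cluster PAUSED STATE_NAME
# PAUSED is the cluster's "paused" field rendered as "true" or "false".
classify_cluster() {
  local paused="$1" state="$2"
  if [[ "$paused" == "true" ]]; then
    echo "PAUSED"
  elif [[ "$state" != "IDLE" ]]; then
    # Unpaused but not IDLE, e.g. CREATING, UPDATING, REPAIRING.
    echo "TRANSITIONAL:${state}"
  else
    echo "OK"
  fi
}
```

For example, `classify_cluster false UPDATING` prints `TRANSITIONAL:UPDATING`.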

### Check MongoDB Atlas Cluster State for Project `${ATLAS_PROJECT_ID}`

Flags paused clusters separately from clusters in automation transitions, and checks MongoDB replica `healthStatus` for mismatches whenever Atlas returns that field.

### Analyze MongoDB Atlas Cluster Metrics for Project `${ATLAS_PROJECT_ID}`

Aggregates short measurement windows and compares them against the configured thresholds. When `CONNECTIVITY_PERCENT` is unavailable, the task reports raw connection counts without percent semantics; `CONNECTION_THRESHOLD` is only evaluated when percent samples exist.
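The connection fallback rule can be sketched as follows. `over_threshold` and `evaluate_connections` are hypothetical helpers and the output wording is illustrative; `awk` handles the comparison because samples may be fractional.

```bash
# Hypothetical helper: over_threshold VALUE THRESHOLD
# Succeeds when VALUE is strictly greater than THRESHOLD (decimals allowed).
over_threshold() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}

# Hypothetical helper: evaluate_connections PERCENT RAW_COUNT
# PERCENT is empty when no CONNECTIVITY_PERCENT sample was returned, in which
# case only the raw count is reported and no threshold is applied.
evaluate_connections() {
  local percent="$1" raw="$2"
  local threshold="${CONNECTION_THRESHOLD:-85}"
  if [[ -z "$percent" ]]; then
    echo "INFO: ${raw} connections (no percent sample; threshold not evaluated)"
  elif over_threshold "$percent" "$threshold"; then
    echo "ISSUE: connections at ${percent}% exceed ${threshold}%"
  else
    echo "OK: connections at ${percent}%"
  fi
}
```

The same comparison applies to the other metrics, for example `over_threshold "$lag_ms" "${REPLICATION_LAG_MS_THRESHOLD:-5000}"`.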