feat: add maintenance window MCP tools for proactive deploy suppression #21

Merged: caballeto merged 2 commits into main from feat/maintenance-window-tools on May 6, 2026

@caballeto
Member

Summary

Adds five maintenance-window MCP tools so AI coding assistants (Cursor, Claude Desktop, Windsurf, etc.) can proactively suppress alerts during risky operations and clear them once the work succeeds.

This closes one of the highest-value gaps in the launch-story tool surface: an agent that runs a deploy script today either pages on-call when monitors briefly fail mid-rollout (noisy) or leaves the user to remember to silence things by hand (error-prone). With these tools the agent can do it itself.

The tools

| Tool | Purpose |
| --- | --- |
| list_maintenance_windows | Inspect active or upcoming windows; supports monitor_id and status (active / upcoming) filters |
| get_maintenance_window | Fetch a single window by ID with full details |
| create_maintenance_window | Schedule a window; call BEFORE running a deploy / migration / scheduled task |
| update_maintenance_window | Push the endsAt back when a deploy runs long (full PUT) |
| cancel_maintenance_window | Clear the window so alerts resume; call AFTER a deploy succeeds |

Time fields use ISO 8601 / RFC 3339 with explicit timezone (UTC preferred, e.g. 2026-05-15T14:00:00Z). Setting monitorId=null on create produces an org-wide window that suppresses every monitor.
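As a minimal sketch of the time format described above, the helper below renders timezone-aware datetimes as RFC 3339 UTC strings and assembles a create body. The function names and the exact body shape are assumptions for illustration; only the field names `startsAt`, `endsAt`, `monitorId` and `reason` come from this description.

```python
from datetime import datetime, timedelta, timezone


def rfc3339_utc(moment: datetime) -> str:
    """Render an aware datetime as RFC 3339 in UTC with a trailing Z."""
    if moment.tzinfo is None:
        # The description asks for an explicit timezone, so reject naive values.
        raise ValueError("naive datetime: timezone must be explicit")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def window_body(start: datetime, minutes: int, reason: str, monitor_id=None) -> dict:
    """Hypothetical create body; monitor_id=None stands for an org-wide window."""
    return {
        "startsAt": rfc3339_utc(start),
        "endsAt": rfc3339_utc(start + timedelta(minutes=minutes)),
        "monitorId": monitor_id,
        "reason": reason,
    }
```

Calling `window_body(datetime(2026, 5, 15, 14, 0, tzinfo=timezone.utc), 30, "v0.7.3 deploy")` yields `startsAt` of `2026-05-15T14:00:00Z` and `endsAt` of `2026-05-15T14:30:00Z`, with `monitorId` left as `None`.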

Sample agent workflow

The docstrings spell this out for the LLM, but the canonical flow is:

  1. Agent receives a "deploy v0.7.3 to prod" instruction.
  2. Agent calls create_maintenance_window with startsAt = now, endsAt = now + 30min, reason = "v0.7.3 deploy".
  3. Agent runs the deploy script. Monitors briefly fail. No pages.
  4. On success, agent calls cancel_maintenance_window(window_id) so any post-deploy regression pages on-call immediately.
  5. If the deploy runs long, agent calls update_maintenance_window with a later endsAt rather than letting the window expire and pages flood the channel.
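The flow above can be sketched roughly as follows. `InMemoryWindows` is a hypothetical in-memory stand-in for the create / update / cancel tools, and `guarded_deploy`, its `budget` and `margin` parameters, and the `mw-N` IDs are all assumptions made for illustration, not the server's actual code. What to do when the deploy fails is not covered above, so this sketch lets the exception propagate and leaves the window in place.

```python
from datetime import datetime, timedelta, timezone


class InMemoryWindows:
    """Hypothetical stand-in for the create / update / cancel tools."""

    def __init__(self):
        self.calls = []    # names of the tools invoked, in order
        self.windows = {}  # window id -> window dict

    def create(self, starts_at, ends_at, reason):
        window_id = f"mw-{len(self.windows) + 1}"
        window = {"id": window_id, "startsAt": starts_at,
                  "endsAt": ends_at, "reason": reason}
        self.windows[window_id] = window
        self.calls.append("create")
        return window

    def update(self, window_id, ends_at):
        # The real tool is a full PUT; only endsAt changes in this sketch.
        self.windows[window_id]["endsAt"] = ends_at
        self.calls.append("update")
        return self.windows[window_id]

    def cancel(self, window_id):
        del self.windows[window_id]
        self.calls.append("cancel")


def guarded_deploy(tools, steps, clock, reason,
                   budget=timedelta(minutes=30), margin=timedelta(minutes=5)):
    """Open a window, run each deploy step, extend if needed, cancel on success."""
    start = clock()
    window = tools.create(start, start + budget, reason)
    for step in steps:
        # Extend before the window lapses so pages stay suppressed mid-deploy.
        if clock() >= window["endsAt"] - margin:
            window = tools.update(window["id"], clock() + budget)
        step()  # an exception here skips the cancel below
    # Success: clear the window so post-deploy regressions page immediately.
    tools.cancel(window["id"])
```

With three steps that each advance a fake clock by 20 minutes, the recorded calls would be create, update, cancel; a deploy that finishes well inside the budget records only create and cancel.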

Implementation notes

  • SDK coupling: a parallel SDK PR will add client.maintenance_windows. Until that ships, the tools call /api/v1/maintenance-windows through the SDK's existing low-level helpers (api_get / api_post / api_put / api_delete) plus the generated CreateMaintenanceWindowRequest / MaintenanceWindowDto / UpdateMaintenanceWindowRequest Pydantic models. Once the SDK ships the resource, every tool body collapses to a one-liner; the public tool surface stays unchanged.
  • SDK lock bump: bumps the locked devhelm to 0.6.3 (latest on PyPI) to pick up the generated maintenance-window models. The pyproject pin (devhelm>=0.6.0) is unchanged — this is a lock-only bump.
  • No managedBy on this resource: unlike Monitor, the maintenance-window API doesn't carry a managedBy column. Surface attribution still happens via X-DevHelm-Surface: mcp on every request, so dashboard filters by surface continue to work; we just don't have a per-row attribution channel here. The schema-hygiene test pins the absence so a future SDK regen that bolts the field on can't silently expose it to the LLM.
  • api_token hidden from the LLM: every tool keeps the api_token kwarg for path-style /{api_key}/mcp clients but the field is stripped from the inputSchema by the existing _strip_internal_schema_fields lifespan hook (P2.Bug7). The tests pin this for all five new tools.
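To illustrate the idea behind hiding api_token from the LLM, here is a minimal sketch of stripping a field from a JSON Schema style inputSchema. It is not the body of `_strip_internal_schema_fields`, whose implementation is not shown here; `strip_fields` and the sample schema are hypothetical.

```python
import copy


def strip_fields(input_schema: dict, hidden=("api_token",)) -> dict:
    """Return a copy of the schema with the hidden fields removed."""
    cleaned = copy.deepcopy(input_schema)  # leave the caller's schema untouched
    properties = cleaned.get("properties", {})
    for name in hidden:
        properties.pop(name, None)
    if "required" in cleaned:
        # A hidden field must not stay listed as required either.
        cleaned["required"] = [n for n in cleaned["required"] if n not in hidden]
    return cleaned


# Hypothetical schema for illustration only.
sample_schema = {
    "type": "object",
    "properties": {
        "window_id": {"type": "string"},
        "api_token": {"type": "string"},
    },
    "required": ["window_id", "api_token"],
}
llm_facing = strip_fields(sample_schema)
```

The function keeps accepting the kwarg at call time; only the advertised schema changes, which is the property the schema-hygiene tests pin.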

Test plan

  • uv sync — pulls devhelm 0.6.3 from PyPI
  • uv run ruff check src/ tests/ — clean
  • uv run ruff format --check src/ tests/ — clean
  • uv run mypy src/ — clean (Python 3.11 and 3.13)
  • uv run pytest tests/ -x — 135 passed (110 baseline + 25 new) on Python 3.11 and 3.13
  • Tool registration: every tool surfaces with non-empty description
  • HTTP wire contract: every tool builds the right path / method / body, including camelCase aliases (startsAt, endsAt, monitorId) and filter / monitorId query keys
  • Schema hygiene: api_token not in any inputSchema; managedBy not on create / update body schemas
  • Error surfacing: upstream DevhelmApiError propagates as isError=True with the formatted ApiError envelope (P1.Bug3)
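A wire-contract check of the kind listed above might look roughly like this. `list_windows` is a hypothetical tool body, the `api_get(path, params=...)` call shape is an assumption rather than the SDK's documented signature, and mapping the `status` argument onto the `filter` query key is inferred from the tool table and the query keys named above.

```python
from unittest.mock import Mock

BASE_PATH = "/api/v1/maintenance-windows"


def list_windows(api_get, monitor_id=None, status=None):
    """Hypothetical tool body: maps snake_case tool args to wire query keys."""
    params = {}
    if monitor_id is not None:
        params["monitorId"] = monitor_id  # camelCase on the wire
    if status is not None:
        params["filter"] = status         # assumed mapping: status -> filter
    return api_get(BASE_PATH, params=params)


def test_list_builds_query():
    # Patch the low-level helper so no HTTP request is made.
    api_get = Mock(return_value={"data": []})
    list_windows(api_get, monitor_id="mon-1", status="active")
    api_get.assert_called_once_with(
        BASE_PATH, params={"monitorId": "mon-1", "filter": "active"}
    )
```

Injecting the helper as an argument keeps the sketch self-contained; the actual tests are described as patching `api_get` / `api_post` / `api_put` / `api_delete` instead.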

Out of scope

  • pyproject.toml / serverInfo.version bumps — release engineering owns those.
  • Switching to client.maintenance_windows.create(...) — that's a follow-up for v0.7.3 polish once the SDK PR merges and a new devhelm release ships.

Made with Cursor

caballeto and others added 2 commits May 6, 2026 12:16
Five new tools so AI coding assistants can suppress alerts around risky
operations:

- list_maintenance_windows  — inspect active / upcoming windows
- get_maintenance_window    — fetch one window with full details
- create_maintenance_window — schedule downtime BEFORE a deploy
- update_maintenance_window — extend ``endsAt`` when a deploy runs long
- cancel_maintenance_window — clear suppression after success

The tools wrap ``/api/v1/maintenance-windows`` directly through the
SDK's low-level helpers because the parallel ``client.maintenance_windows``
SDK PR hasn't shipped yet. Once it does, every tool body collapses to a
one-liner against the resource without changing the public tool surface.

Bumps the locked ``devhelm`` SDK to 0.6.3 to pick up the generated
``CreateMaintenanceWindowRequest`` / ``MaintenanceWindowDto`` /
``UpdateMaintenanceWindowRequest`` Pydantic models. The pyproject pin
(``devhelm>=0.6.0``) is unchanged; this is a lock-only bump.

Co-authored-by: Cursor <cursoragent@cursor.com>

Adds 25 tests across five concerns:

- registration: every new tool surfaces in ``mcp.list_tools`` with a
  non-empty description (LLM docs)
- HTTP contract: each tool builds the right path / method / body via
  patched ``api_get`` / ``api_post`` / ``api_put`` / ``api_delete``,
  including the camelCase aliases on the wire (``startsAt``,
  ``endsAt``, ``monitorId``) and the ``filter`` / ``monitorId``
  query keys for the list endpoint
- schema hygiene: regression that ``api_token`` never leaks into the
  LLM-facing ``inputSchema`` (P2.Bug7 contract) and that the
  ``CreateMaintenanceWindowRequest`` / ``UpdateMaintenanceWindowRequest``
  body schemas don't expose ``managedBy`` — pinning now means a future
  SDK regen that bolts the field on can't silently surface it
- error surfacing: upstream ``DevhelmApiError`` propagates to the
  client as ``isError=True`` with the formatted ApiError envelope
  (P1.Bug3 contract)
- expected-tools list: keeps ``test_tools.py`` in sync so the count
  assertion reflects the new five tools

Total suite is now 135 tests (110 baseline + 25 new), all green on
Python 3.11 and 3.13.

Co-authored-by: Cursor <cursoragent@cursor.com>
@caballeto caballeto merged commit b7624bf into main May 6, 2026
4 checks passed
@caballeto caballeto deleted the feat/maintenance-window-tools branch May 6, 2026 10:44