feat(monitoring): wire HealthCollector state-change events to workflow trigger + notification pipeline #3415
…w trigger + notification pipeline (#3404) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code review
Found 2 issues.
In AutoBot-AI/autobot-slm-backend/slm/agent/health_collector.py Lines 154 to 165 in b92a9f6
Every other enum member has a name that matches its value root:
AutoBot-AI/autobot-backend/services/notification_service.py Lines 73 to 74 in b92a9f6
Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎.
…ist; fix SERVICE_FAILED enum value

- Early-return on TimeoutExpired/FileNotFoundError/Exception before calling _detect_and_publish_state_changes to prevent false transitions on truncated lists
- SERVICE_FAILED value corrected from "service_failure" to "service_failed" to match name/value convention (WORKFLOW_FAILED="workflow_failed", STEP_FAILED="step_failed")
- Updated test assertion and workflow template event key to match

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
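
The two fixes in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `list_services` and `collect` are hypothetical names, the real collector's command and parsing are not shown in this thread, and only the three enum members quoted in the commit message are reproduced.

```python
import subprocess
from enum import Enum


def list_services(cmd, timeout=5.0):
    """Return the command's output lines, or None if the listing failed.

    Hypothetical helper: None signals "do not trust this listing".
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return None
    except Exception:
        # Any other failure is treated the same way: no usable listing.
        return None
    return result.stdout.splitlines()


def collect(cmd, detect):
    """Run detection only when a complete listing was obtained."""
    services = list_services(cmd)
    if services is None:
        # Early return: a failed or truncated listing must not be diffed
        # against the last-known state, or services would appear to change.
        return False
    detect(services)
    return True


class NotificationEvent(Enum):
    # Only the members named in the commit message are shown here.
    WORKFLOW_FAILED = "workflow_failed"
    STEP_FAILED = "step_failed"
    SERVICE_FAILED = "service_failed"  # was "service_failure" before the fix


def mismatched_members(enum_cls):
    """Return names of members whose value is not the lowercased name."""
    return [m.name for m in enum_cls if m.value != m.name.lower()]
```

A check like `mismatched_members(NotificationEvent) == []` could live in a unit test so that the name/value convention is enforced rather than spotted in review.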
✅ SSOT Configuration Compliance: Passing
🎉 No hardcoded values detected that have SSOT config equivalents!
Code review
Found 1 issue (both issues from the prior review are already fixed in the latest commit).
Fixed since the prior review:
Remaining issue —
# health_collector.py — inside _publish_state_change
try:
    from autobot_shared.redis_client import get_redis_client
    client = get_redis_client(database="main")
Fix: move the import to module level (preferred, consistent with AutoBot-AI/autobot-slm-backend/slm/agent/health_collector.py Lines 317 to 319 in dc02d70)
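
One way the module-level import could look is sketched below. The `ImportError` guard, the `client_factory` parameter, and the function name `publish_state_change` are assumptions added for illustration so the snippet runs without the shared package installed; the review only asks for the import to move out of the function body.

```python
import logging

logger = logging.getLogger(__name__)

# Module-level import. The ImportError guard is an assumption, not part of
# the review: it keeps this sketch importable without autobot_shared.
try:
    from autobot_shared.redis_client import get_redis_client
except ImportError:
    get_redis_client = None


def publish_state_change(channel, payload, client_factory=None):
    """Publish payload on channel; return True on success, False otherwise.

    client_factory is a hypothetical injection point used for testing.
    """
    factory = client_factory or get_redis_client
    if factory is None:
        logger.warning("Redis client unavailable; skipping publish")
        return False
    try:
        client = factory(database="main")
        client.publish(channel, payload)
    except Exception:
        # Redis failures are logged and swallowed, never propagated.
        logger.exception("Failed to publish state change to %s", channel)
        return False
    return True
```

With the import at module level, a missing or broken dependency surfaces once at load time instead of on every publish attempt.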
Summary
- `health_collector.py`: tracks per-service last-known state; publishes `{"service", "prev_state", "new_state", "error_context"}` to `autobot:services:{name}:state_change` on real state transitions only (first-observation and no-change are suppressed); Redis failures are caught and logged, never propagated
- `notification_service.py`: added `SERVICE_FAILED = "service_failure"` to `NotificationEvent` enum with default template
- `workflow_templates/service_health_monitor.yaml`: new template with a REDIS_PUBSUB trigger on `autobot:services:*:state_change`; filters on `failed`/`crash-loop`, sends notification
- `docs/examples/service_failure_monitoring.py`: runnable standalone demo using `redis.pubsub().psubscribe()` + `NotificationService`
- `docs/user/guides/workflows.md`: added "Monitor a Linux Service" section

Closes #3404
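
The transition rules and the trigger filter described in the summary can be sketched as below. This is an illustration under stated assumptions, not the PR's implementation: `StateTracker`, `should_notify`, and the injected `publish` callable are hypothetical names, and `fnmatch` stands in for Redis pattern matching so the example runs without a Redis server.

```python
import fnmatch
import json

CHANNEL_PATTERN = "autobot:services:*:state_change"
ALERT_STATES = {"failed", "crash-loop"}


class StateTracker:
    """Tracks per-service last-known state and publishes real transitions."""

    def __init__(self, publish):
        self._last = {}          # service name -> last observed state
        self._publish = publish  # callable(channel, payload), injected

    def observe(self, service, new_state, error_context=None):
        """Record a state; return True only if a message was published."""
        prev = self._last.get(service)
        self._last[service] = new_state
        if prev is None or prev == new_state:
            # First observation and no-change are both suppressed.
            return False
        channel = f"autobot:services:{service}:state_change"
        payload = json.dumps({
            "service": service,
            "prev_state": prev,
            "new_state": new_state,
            "error_context": error_context,
        })
        try:
            self._publish(channel, payload)
        except Exception:
            # Publisher failures are swallowed here; real code would log.
            return False
        return True


def should_notify(channel, payload):
    """Mimic the template: match the channel, then filter on new_state."""
    if not fnmatch.fnmatchcase(channel, CHANNEL_PATTERN):
        return False
    return json.loads(payload).get("new_state") in ALERT_STATES
```

Injecting `publish` keeps the transition logic testable in isolation; in the demo script the same role would be played by a Redis client's `publish` method on the producer side and `psubscribe` on the consumer side.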