feat(monitoring): wire HealthCollector state-change events to workflow trigger + notification pipeline #3404

@mrveiss

Context7 Score: 73/100

Test query: "Implement a real-time monitoring task that triggers a notification when a specific Linux service enters a failed state."

Audit Findings

What EXISTS

  • autobot-slm-backend/slm/agent/health_collector.py — polls systemd via systemctl, maps states (active/failed/crash-loop/inactive)
  • autobot-backend/services/notification_service.py — 4 channels (email, Slack, webhook, in-app)
  • autobot-backend/services/trigger_service.py — REDIS_PUBSUB trigger type ready
  • docs/guides/realtime-monitoring-notifications.md — 1300-line guide with working example

What is MISSING

  1. HealthCollector doesn't publish to Redis: it detects state changes but never emits them to the autobot:services:{name}:state_change pub/sub channel
  2. No SERVICE_FAILED event type in NotificationEvent enum — only workflow lifecycle events exist
  3. No service monitoring workflow template — users must manually wire everything
  4. No end-to-end example connecting HealthCollector → REDIS_PUBSUB trigger → notification
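
To make gap 1 concrete, here is a minimal sketch of how a state-transition publisher could look. It is not the existing HealthCollector code: the `StateChangePublisher` class, its `observe` method, and the choice to stay silent on the first sighting of a service are all assumptions made for illustration. The client only needs a `publish(channel, message)` method, which is the shape redis-py's `Redis.publish` has.

```python
import json
from typing import Optional, Protocol


class Publisher(Protocol):
    """Anything exposing a redis-py style publish(channel, message) method."""

    def publish(self, channel: str, message: str) -> int: ...


class StateChangePublisher:
    """Hypothetical helper: remembers the last state per service and emits transitions."""

    def __init__(self, client: Publisher) -> None:
        self._client = client
        self._last_state: dict[str, str] = {}

    def observe(
        self, service: str, state: str, error_context: Optional[str] = None
    ) -> bool:
        """Record a polled state; publish only when it differs from the previous one."""
        prev = self._last_state.get(service)
        self._last_state[service] = state
        if prev is None or prev == state:
            # Assumption: the first sighting is a baseline, not a transition.
            return False
        payload = {
            "service": service,
            "prev_state": prev,
            "new_state": state,
            "error_context": error_context,
        }
        channel = f"autobot:services:{service}:state_change"
        self._client.publish(channel, json.dumps(payload))
        return True
```

In a real deployment the `client` argument would presumably be a `redis.Redis` instance; keeping it behind a small protocol lets the transition logic be tested without a running Redis server.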

Acceptance Criteria

  • HealthCollector publishes {"service": name, "prev_state": ..., "new_state": ..., "error_context": ...} to autobot:services:{name}:state_change on every state transition
  • NotificationEvent.SERVICE_FAILED added with default template: "Service {service} entered {state} state"
  • Workflow template: autobot-backend/workflow_templates/service_health_monitor.yaml — trigger: REDIS_PUBSUB on autobot:services:*:state_change, step: send notification
  • Example: docs/examples/service_failure_monitoring.py — complete runnable demo
  • docs/user/guides/workflows.md updated with "Monitor a Linux Service" section
  • Tests: state transitions, Redis pub/sub publish, notification dispatch
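
The notification side of these criteria can be sketched as follows. This is an illustration under stated assumptions, not the contents of notification_service.py: the enum value `"service_failed"`, the `DEFAULT_TEMPLATES` mapping, and the `render_service_notification` helper are hypothetical names, and the existing workflow lifecycle members of `NotificationEvent` are omitted. Channel matching uses Python's `fnmatchcase` as a stand-in for the glob-style pattern subscription implied by autobot:services:*:state_change.

```python
import json
from enum import Enum
from fnmatch import fnmatchcase
from typing import Optional

CHANNEL_PATTERN = "autobot:services:*:state_change"


class NotificationEvent(Enum):
    # Only the proposed member is shown; existing lifecycle members are omitted.
    SERVICE_FAILED = "service_failed"  # value is an assumption


# Default template text taken from the acceptance criteria.
DEFAULT_TEMPLATES = {
    NotificationEvent.SERVICE_FAILED: "Service {service} entered {state} state",
}


def render_service_notification(channel: str, raw_message: str) -> Optional[str]:
    """Return the notification text for a state-change message, or None to skip it."""
    if not fnmatchcase(channel, CHANNEL_PATTERN):
        return None
    event = json.loads(raw_message)
    # Assumption: only transitions into "failed" produce a notification.
    if event["new_state"] != "failed":
        return None
    template = DEFAULT_TEMPLATES[NotificationEvent.SERVICE_FAILED]
    return template.format(service=event["service"], state=event["new_state"])
```

Whether other states such as crash-loop should also raise SERVICE_FAILED is left open by the issue; the filter above is the narrowest reading.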

Files to Touch

  • autobot-slm-backend/slm/agent/health_collector.py
  • autobot-backend/services/notification_service.py
  • autobot-backend/workflow_templates/service_health_monitor.yaml (new)
  • docs/examples/service_failure_monitoring.py (new)
  • docs/user/guides/workflows.md
