feat: inclusion calendar and relative SLA for irregular pipelines #65

Merged
dwsmith1983 merged 4 commits into main from feat/irregular-relative-sla
Mar 11, 2026
@dwsmith1983 dwsmith1983 commented Mar 10, 2026

Summary

  • Inclusion calendar (schedule.include.dates) for pipelines on known irregular dates (monthly close, quarterly filing). Watchdog detects missed inclusion dates and publishes IRREGULAR_SCHEDULE_MISSED with grace period support via Schedule.Time and timezone awareness.
  • Relative SLA (sla.maxDuration) for ad-hoc pipelines with no predictable schedule. Clock starts at first sensor arrival and covers the entire lifecycle through completion. Warning at 75% of maxDuration (or breach minus expectedDuration). New events: RELATIVE_SLA_WARNING, RELATIVE_SLA_BREACH.
  • Edge case hardening: timezone-aware grace periods, cross-day T+1 arrival key lookup, multi-date inclusion lookback (capped at 3), resolveTimezone deduplication.

New types and events

  • InclusionConfig struct, ScheduleConfig.Include field
  • SLAConfig.MaxDuration field
  • SLAMonitorInput.MaxDuration and SensorArrivalAt fields
  • 3 event types: IRREGULAR_SCHEDULE_MISSED, RELATIVE_SLA_WARNING, RELATIVE_SLA_BREACH

Config validation

  • Cron and Include mutually exclusive
  • Inclusion dates must be valid YYYY-MM-DD
  • MaxDuration must be valid Go duration, capped at 24h, requires trigger
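
A rough sketch of how these three rules might be enforced follows. The struct shapes, the `validate` function, and the `hasTrigger` flag are all hypothetical stand-ins for whatever pkg/types actually defines; rejecting non-positive durations is an extra assumption not stated above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical, trimmed-down config shapes; the real types may differ.
type scheduleConfig struct {
	Cron         string
	IncludeDates []string // stands in for schedule.include.dates
}

type slaConfig struct {
	MaxDuration string // stands in for sla.maxDuration, e.g. "4h"
}

const maxRelativeSLA = 24 * time.Hour

// validate applies the rules listed above. hasTrigger is an assumed flag that
// says whether the pipeline declares a trigger.
func validate(s scheduleConfig, sla slaConfig, hasTrigger bool) error {
	if s.Cron != "" && len(s.IncludeDates) > 0 {
		return errors.New("schedule.cron and schedule.include are mutually exclusive")
	}
	for _, d := range s.IncludeDates {
		// The layout "2006-01-02" is Go's way of spelling YYYY-MM-DD.
		if _, err := time.Parse("2006-01-02", d); err != nil {
			return fmt.Errorf("invalid inclusion date %q: want YYYY-MM-DD", d)
		}
	}
	if sla.MaxDuration == "" {
		return nil
	}
	d, err := time.ParseDuration(sla.MaxDuration)
	if err != nil {
		return fmt.Errorf("invalid sla.maxDuration %q: %w", sla.MaxDuration, err)
	}
	if d <= 0 || d > maxRelativeSLA {
		return fmt.Errorf("sla.maxDuration %s must be positive and at most %s", d, maxRelativeSLA)
	}
	if !hasTrigger {
		return errors.New("sla.maxDuration requires a trigger")
	}
	return nil
}
```

Note that `time.Parse` also rejects impossible calendar dates such as 2026-02-30, so the date check covers more than the string shape.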

Adds two capabilities for pipelines that don't fit regular cron schedules:

Inclusion calendar (schedule.include.dates) for known irregular dates —
monthly close, quarterly filing, specific business dates. Watchdog detects
missed inclusion dates and publishes IRREGULAR_SCHEDULE_MISSED.

Relative SLA (sla.maxDuration) for ad-hoc pipelines with no predictable
schedule. Clock starts at first sensor arrival and covers the entire
lifecycle through completion. Warning at 75% of maxDuration, or at breach
minus expectedDuration.

New types, 3 event types, config validation, store methods, stream-router
first-sensor-arrival tracking, sla-monitor relative path, watchdog
defense-in-depth, and ASL parameter passthrough.

Inclusion calendar grace period: detectMissedInclusionSchedules now
respects Schedule.Time before alerting, matching the existing cron path.
It resolves "today" in the pipeline's timezone, so the check stays correct
when the UTC date and the local date differ.

Cross-day relative SLA: detectRelativeSLABreaches checks both today and
yesterday for first-sensor-arrival keys, covering T+1 sensor pipelines.
Extracted checkRelativeSLAForDate helper.

Multi-date inclusion lookback: PastInclusionDates returns all past dates
(capped at 3 internally). Watchdog now checks each past date instead of
only the most recent.
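
A capped lookback of this kind could be sketched as follows. The lowercase `pastInclusionDates` here is illustrative only; it assumes "past" means strictly before today and that results come back most recent first, neither of which is confirmed by this PR.

```go
package main

import "sort"

// maxInclusionLookback mirrors the cap of 3 mentioned above.
const maxInclusionLookback = 3

// pastInclusionDates returns the configured dates strictly before today, most
// recent first, keeping at most maxInclusionLookback entries.
func pastInclusionDates(dates []string, today string) []string {
	var past []string
	for _, d := range dates {
		// YYYY-MM-DD strings compare in chronological order.
		if d < today {
			past = append(past, d)
		}
	}
	sort.Sort(sort.Reverse(sort.StringSlice(past)))
	if len(past) > maxInclusionLookback {
		past = past[:maxInclusionLookback]
	}
	return past
}
```

Keeping the cap inside the function means every caller gets the same bound without repeating it.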

Cleanup: replaced 4 inline timezone resolution patterns with
resolveTimezone() calls. Moved maxInclusionLookback cap into
PastInclusionDates.
github-actions bot added the tests (Test changes), lambda (Lambda handlers), deploy (Deployment and ASL), docs (Documentation), and types (Public types, pkg/types) labels Mar 10, 2026
@dwsmith1983 dwsmith1983 self-assigned this Mar 10, 2026
Fix gofmt formatting in sla_monitor_test.go. Invert if-condition pattern
to reduce nesting in watchdog_test.go event assertion loops (gocritic
nestingReduce).

Glue can report SUCCEEDED when Spark actually failed (driver exits 0
despite exception). Add Check 2 to verifyGlueRCA: scan driver output
stream for log4j ERROR/FATAL severity markers. Automatically catches
any error the application logs without maintaining a list of specific
failure patterns. Driver stdout (not stderr) avoids JVM noise false
positives.
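
The severity scan described in this commit might be approximated as below. This is a sketch, not the actual Check 2 in verifyGlueRCA: `firstSevereLine` is a hypothetical helper, the output is taken as an in-memory string rather than a log stream, and the regular expression is an assumed way to spot log4j level markers.

```go
package main

import (
	"bufio"
	"regexp"
	"strings"
)

// severityMarker matches ERROR or FATAL as a standalone upper-case word, which
// is how log4j pattern layouts commonly print the level. Assumed pattern.
var severityMarker = regexp.MustCompile(`\b(ERROR|FATAL)\b`)

// firstSevereLine scans driver output line by line and returns the first line
// that carries an ERROR or FATAL marker.
func firstSevereLine(output string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		if severityMarker.MatchString(sc.Text()) {
			return sc.Text(), true
		}
	}
	return "", false
}
```

Because the match is case-sensitive and word-bounded, a class name such as `ErrorReporter` on an INFO line does not trigger it.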
github-actions bot added the triggers (Trigger types) label Mar 11, 2026
@dwsmith1983 dwsmith1983 merged commit 4924e86 into main Mar 11, 2026
6 checks passed
@dwsmith1983 dwsmith1983 deleted the feat/irregular-relative-sla branch March 11, 2026 11:57