feat: inclusion calendar and relative SLA for irregular pipelines #65
Merged
dwsmith1983 merged 4 commits into main on Mar 11, 2026
Adds two capabilities for pipelines that don't fit regular cron schedules:

- Inclusion calendar (schedule.include.dates) for known irregular dates: monthly close, quarterly filing, specific business dates. Watchdog detects missed inclusion dates and publishes IRREGULAR_SCHEDULE_MISSED.
- Relative SLA (sla.maxDuration) for ad-hoc pipelines with no predictable schedule. Clock starts at first sensor arrival and covers the entire lifecycle through completion. Warning at 75% of maxDuration (or breach minus expectedDuration).

Includes new types, 3 event types, config validation, store methods, stream-router first-sensor-arrival tracking, sla-monitor relative path, watchdog defense-in-depth, and ASL parameter passthrough.
- Inclusion calendar grace period: detectMissedInclusionSchedules now respects Schedule.Time before alerting, matching the existing cron path. Resolves today in the pipeline's timezone to handle UTC date != local date boundaries.
- Cross-day relative SLA: detectRelativeSLABreaches checks both today and yesterday for first-sensor-arrival keys, covering T+1 sensor pipelines. Extracted checkRelativeSLAForDate helper.
- Multi-date inclusion lookback: PastInclusionDates returns all past dates (capped at 3 internally). Watchdog now checks each past date instead of only the most recent.
- Cleanup: replaced 4 inline timezone resolution patterns with resolveTimezone() calls. Moved maxInclusionLookback cap into PastInclusionDates.
Fix gofmt formatting in sla_monitor_test.go. Invert if-condition pattern to reduce nesting in watchdog_test.go event assertion loops (gocritic nestingReduce).
Glue can report SUCCEEDED when Spark actually failed (driver exits 0 despite exception). Add Check 2 to verifyGlueRCA: scan driver output stream for log4j ERROR/FATAL severity markers. Automatically catches any error the application logs without maintaining a list of specific failure patterns. Driver stdout (not stderr) avoids JVM noise false positives.
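The severity-marker scan could look something like the sketch below. It is an illustration only, not the verifyGlueRCA code: `firstSevereLine` and the regular expression are hypothetical, and the example assumes the driver log uses a layout where the level appears as a whitespace-delimited token (for example `26/03/11 10:15:02 ERROR Executor: ...`).

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// severityMarker matches ERROR or FATAL as a standalone level token.
// Assumption: the log layout separates the level with whitespace, so a
// word such as "ERRORS" inside a message does not match.
var severityMarker = regexp.MustCompile(`(^|\s)(ERROR|FATAL)\s`)

// firstSevereLine is a hypothetical helper. It returns the first line of the
// driver output that carries an ERROR or FATAL marker, if any.
func firstSevereLine(driverOutput string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(driverOutput))
	for sc.Scan() {
		line := sc.Text()
		if severityMarker.MatchString(line) {
			return line, true
		}
	}
	return "", false
}

func main() {
	out := "26/03/11 10:15:01 INFO SparkContext: Running Spark\n" +
		"26/03/11 10:15:02 ERROR Executor: Exception in task 0.0\n"

	if line, found := firstSevereLine(out); found {
		fmt.Println("job reported SUCCEEDED but driver logged:", line)
	}
}
```

Matching on the severity level rather than on specific exception names is what lets the check catch errors without a maintained pattern list.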
Summary
- Inclusion calendar (schedule.include.dates) for pipelines on known irregular dates (monthly close, quarterly filing). Watchdog detects missed inclusion dates and publishes IRREGULAR_SCHEDULE_MISSED, with grace period support via Schedule.Time and timezone awareness.
- Relative SLA (sla.maxDuration) for ad-hoc pipelines with no predictable schedule. Clock starts at first sensor arrival and covers the entire lifecycle through completion. Warning at 75% of maxDuration (or breach minus expectedDuration). New events: RELATIVE_SLA_WARNING, RELATIVE_SLA_BREACH.

New types and events
- InclusionConfig struct, ScheduleConfig.Include field
- SLAConfig.MaxDuration field
- SLAMonitorInput.MaxDuration and SensorArrivalAt fields
- IRREGULAR_SCHEDULE_MISSED, RELATIVE_SLA_WARNING, RELATIVE_SLA_BREACH

Config validation