Fix generating custom cutoffs for quarters #185
Pull request overview
This PR fixes a bug in quarterly data handling where end-of-quarter dates caused incorrect custom cutoff generation. The issue occurred because subtracting months doesn't account for varying month lengths (e.g., September 30th minus 6 months equals March 30th, not March 31st). The fix switches from a subtraction-based to an addition-based approach for calculating cutoffs.
Changes:
- Modified `generate_custom_cutoffs` to use addition instead of subtraction when calculating maximum cutoff
- Updated `cross_validation` validation logic to check if max cutoff plus horizon exceeds end date
- Added test cases for quarterly end-of-month scenarios
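The month-length asymmetry described in the overview can be reproduced with plain pandas date arithmetic. This is a minimal sketch, not code from the PR: the variable names are made up for illustration, and it assumes a six-month (two-quarter) step expressed as `pd.DateOffset(months=6)`.

```python
import pandas as pd

# Illustrative dates only; these names do not appear in the PR.
end_of_q3 = pd.Timestamp("2020-09-30")
end_of_q1 = pd.Timestamp("2020-03-31")
two_quarters = pd.DateOffset(months=6)

# Stepping back keeps the day-of-month (30), so the result lands
# one day short of the real end of Q1.
stepped_back = end_of_q3 - two_quarters

# Stepping forward from March 31st overshoots September's length,
# so pandas clamps the result to the last day of September.
stepped_forward = end_of_q1 + two_quarters

print(stepped_back)     # 2020-03-30 00:00:00
print(stepped_forward)  # 2020-09-30 00:00:00
```

Subtraction misses the end-of-quarter date by one day, while addition from the earlier date lands exactly on the later end-of-quarter date.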
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| runtime/databricks/automl_runtime/forecast/utils.py | Changed max_cutoff calculation from subtraction to addition-based approach |
| runtime/databricks/automl_runtime/forecast/prophet/diagnostics.py | Updated validation logic to use addition instead of subtraction for consistency |
| runtime/tests/automl_runtime/forecast/utils_test.py | Added test case for quarterly data with end-of-quarter dates |
| runtime/tests/automl_runtime/forecast/prophet/diagnostics_test.py | Added test case for cross-validation with month-end cutoffs |
        cutoffs = generate_custom_cutoffs(df, horizon=7, frequency_unit="QS", split_cutoff=pd.Timestamp('2020-07-12 00:00:00'))
        self.assertEqual([pd.Timestamp('2020-07-12 00:00:00'), pd.Timestamp('2020-10-12 00:00:00')], cutoffs)

    def test_generate_custom_cutoffs_success_quaterly_end(self):
Corrected spelling of 'quaterly' to 'quarterly'.
The typo is in other places too. Will fix in a follow-up PR.
@@ -219,8 +219,8 @@ def generate_custom_cutoffs(df: pd.DataFrame, horizon: int, frequency_unit: str,
    # First cutoff is the cutoff bewteen splits
Corrected spelling of 'bewteen' to 'between'.
Will fix in a follow-up PR.
Summary:
Currently we have a bug with quarterly data based on end-of-quarter dates, caused by months not having the same number of days: September 30th minus 6 months is March 30th, which falls before March 31st and breaks the custom cutoff generation logic. To combat this, we adopt an addition-based approach instead of a subtraction-based one, since a date that lands past the last day of a month is rounded down to the last day of that month.
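To make the difference between the two approaches concrete, here is a rough sketch of a cutoff validity check written both ways. The helper names `fits_by_subtraction` and `fits_by_addition` are hypothetical and are not the functions changed in this PR; the sketch only assumes that the horizon can be expressed as a `pd.DateOffset`.

```python
import pandas as pd

def fits_by_subtraction(cutoff, end, horizon):
    # Hypothetical helper: derive the latest allowed cutoff by stepping
    # back from the end date, then compare the candidate against it.
    return cutoff <= end - horizon

def fits_by_addition(cutoff, end, horizon):
    # Hypothetical helper: step forward from the candidate cutoff and
    # check that the forecast window still ends on or before the end date.
    return cutoff + horizon <= end

end = pd.Timestamp("2020-09-30")     # last end-of-quarter date in the data
cutoff = pd.Timestamp("2020-03-31")  # candidate end-of-quarter cutoff
horizon = pd.DateOffset(months=6)    # two quarters ahead

subtraction_result = fits_by_subtraction(cutoff, end, horizon)
addition_result = fits_by_addition(cutoff, end, horizon)

print(subtraction_result)  # False: end - horizon is March 30th
print(addition_result)     # True: cutoff + horizon is September 30th
```

With the subtraction form, the March 31st cutoff is rejected even though two full quarters of data follow it; the addition form accepts it.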
Test plan:
New tests in diagnostics_test and utils_test.