Reset retention timer when schedule is unpaused #9862
gurudatta-patil wants to merge 13 commits into temporalio:main from
Thank you for your contribution! Could you update the
Co-authored-by: Alex Stanfield <13949480+chaptersix@users.noreply.github.com>
Updated the chasm/lib/scheduler path
run
Everything else LGTM.
Also,
Also noticed that last change.
Sorry, I thought I deleted it; I might have accidentally typed an incomplete rm command.
The script to generate history doesn't exercise the exit-on-idle functionality, so this history won't test anything useful. You'd have to add some logic for that to the script.
Good point. I can modify generate_history.sh to include a long pause followed by an unpause, so that the retention timer reset is actually exercised. Or, if it doesn't add any value, we could delete it.
 NextTimeCacheV2Size: 14, // see note below
 SpecFieldLengthLimit: 10,
-Version: TriggerImmediatelyTimestamp,
+Version: ResetRetentionOnUnpause,
In general, you can't add a new version and switch to it in the same change; that could cause non-determinism errors on server rollback. There has to be at least one release in between the new logic and changing the default. (That release could be a backport of the change to the previous minor release.)
In this case, it might be possible to argue that the only workflows that would hit a non-determinism error on rollback have exited due to retention. I'm not totally sure that works, though.
That's a fair concern. Any workflow that would hit a non-determinism error on rollback (i.e., one that was unpaused after a long pause and ran the ResetRetentionOnUnpause logic) would already have exited via the retention path, so there would be nothing to roll back into. If you'd prefer to follow the standard two-release pattern to be safe, I'm happy to split this: introduce ResetRetentionOnUnpause = 13 in this PR but keep Version: TriggerImmediatelyTimestamp as the default, then bump the version in a follow-up.
What should we move forward with?
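The two-release split proposed above can be checked with a small model. This is a hypothetical Go sketch: the `release` type, its fields, and the release labels are invented for illustration, and `TriggerImmediatelyTimestamp = 12` is again an assumed value. It treats a rollback as safe when the version the newer release assigns by default is one whose logic the older binary already contains.

```go
package main

import "fmt"

// SchedulerVersion is a hypothetical stand-in for the scheduler's logic version.
type SchedulerVersion int

const (
	TriggerImmediatelyTimestamp SchedulerVersion = 12 // assumed value
	ResetRetentionOnUnpause     SchedulerVersion = 13 // value from the PR thread
)

// release is a hypothetical model of one server release.
type release struct {
	name       string
	maxKnown   SchedulerVersion // newest version whose logic the binary contains
	defaultVer SchedulerVersion // version the release assigns by default
}

// rollbackSafe reports whether rolling back from `from` to `to` leaves
// default-version workflows on logic that `to` can replay.
func rollbackSafe(from, to release) bool {
	return from.defaultVer <= to.maxKnown
}

func main() {
	before := release{"before this PR", TriggerImmediatelyTimestamp, TriggerImmediatelyTimestamp}
	combined := release{"add and switch in one change", ResetRetentionOnUnpause, ResetRetentionOnUnpause}
	step1 := release{"split: add version, keep default", ResetRetentionOnUnpause, TriggerImmediatelyTimestamp}
	step2 := release{"split: follow-up bumps default", ResetRetentionOnUnpause, ResetRetentionOnUnpause}

	fmt.Println(combined.name, "->", before.name, rollbackSafe(combined, before)) // false
	fmt.Println(step1.name, "->", before.name, rollbackSafe(step1, before))       // true
	fmt.Println(step2.name, "->", step1.name, rollbackSafe(step2, step1))         // true
}
```

A backport of the new logic to the previous minor release, as suggested in the review, plays the role of `step1` here: it raises `maxKnown` without changing the default.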
What changed?
Fixed schedule retention on unpause.
If a schedule stayed paused for longer than its retention period, unpausing it could close the scheduler workflow right away.
Unpausing now resets the retention timer, so the retention period starts from the unpause time.
Also added a regression unit test for this case.
Why?
Paused time should not cause the schedule to age out immediately after unpause.
This matches expected schedule behavior.
How did you test it?
Ran the test before and after the change. Since this is a minor change, I built the code and added a unit test.
Potential risks
Small behavior change around retention timing after unpause.
Fixes #9752