Reset retention timer when schedule is unpaused #9862

Open
gurudatta-patil wants to merge 13 commits into temporalio:main from
gurudatta-patil:fix/schedule-retention-on-unpause

Conversation

@gurudatta-patil

@gurudatta-patil gurudatta-patil commented Apr 8, 2026

What changed?

Fixed schedule retention on unpause.

If a schedule stayed paused longer than retention, unpausing could close the scheduler workflow right away.
Now unpause resets the retention timer so retention starts from unpause time.

Also added a regression unit test for this case.

Why?

Paused time should not cause the schedule to age out immediately after unpause.
This matches expected schedule behavior.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

I ran the test before and after the change. Since the change is minor, I built it and added a unit test.

Potential risks

Small behavior change around retention timing after unpause.

Fixes #9752

@CLAassistant

CLAassistant commented Apr 8, 2026

CLA assistant check
All committers have signed the CLA.

Comment thread service/worker/scheduler/workflow_test.go Outdated
Comment thread service/worker/scheduler/workflow.go Outdated
@chaptersix
Contributor

chaptersix commented Apr 8, 2026

Thank you for your contribution! Could you update the chasm/lib/scheduler path as well?

gurudatta-patil and others added 3 commits April 8, 2026 17:38
Co-authored-by: Alex Stanfield <13949480+chaptersix@users.noreply.github.com>
Co-authored-by: Alex Stanfield <13949480+chaptersix@users.noreply.github.com>
@gurudatta-patil
Author

gurudatta-patil commented Apr 8, 2026

Updated the chasm/lib/scheduler path

Comment thread chasm/lib/scheduler/scheduler.go Outdated
Comment thread service/worker/scheduler/workflow.go
@chaptersix
Contributor

Run make lint locally to view and fix lint errors.

gurudatta-patil and others added 2 commits April 13, 2026 18:38
Co-authored-by: Alex Stanfield <13949480+chaptersix@users.noreply.github.com>
@chaptersix
Contributor

Everything else LGTM.
@lina-temporal could you take a look as well?

@chaptersix
Contributor

Also, testdata/replay_1776127911.json.gz is empty (valid gzip wrapping zero bytes — likely a failed temporal workflow show capture). It should be deleted; TestReplays will fail trying to parse it.
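
A quick way to spot such a file is to measure its decompressed size. The sketch below uses only the Go standard library; the helper names `gzipPayloadSize` and `gzipBytes` are made up for illustration, and the samples are built in memory rather than read from testdata.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipPayloadSize returns the number of decompressed bytes in a gzip stream.
// A result of 0 with a nil error means "valid gzip wrapping zero bytes".
func gzipPayloadSize(compressed []byte) (int64, error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return 0, err
	}
	defer zr.Close()
	return io.Copy(io.Discard, zr)
}

// gzipBytes compresses data in memory. It exists only to build sample inputs;
// writes to a bytes.Buffer do not fail, so errors are ignored here.
func gzipBytes(data []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	_, _ = zw.Write(data)
	_ = zw.Close()
	return buf.Bytes()
}

func main() {
	empty := gzipBytes(nil)
	nonEmpty := gzipBytes([]byte(`{"events":[]}`))

	n, err := gzipPayloadSize(empty)
	fmt.Println("empty capture:", n, err)

	n, err = gzipPayloadSize(nonEmpty)
	fmt.Println("non-empty capture:", n, err)
}
```

To check a real file, the same helper could be fed the bytes returned by `os.ReadFile`.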

@chaptersix
Contributor

chaptersix commented Apr 16, 2026

Also noticed that testdata/replay_1776135924.json.gz is still in the tree — it has the same content as the newly added replay_with_reset_retention_on_unpause.json.gz (same blob hash). The old timestamp-named file can be deleted.
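
For context on the "same blob hash" check: in a SHA-1 repository, Git derives a blob ID by hashing the header `blob <size>\0` followed by the file bytes, so two files with identical bytes always share a blob ID. A minimal sketch of that computation follows; `gitBlobHash` is an illustrative helper, not part of this PR.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// gitBlobHash computes the object ID Git assigns to a blob with the given
// content in a SHA-1 repository: sha1("blob <len>\x00" + content).
func gitBlobHash(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	a := []byte("same bytes")
	b := []byte("same bytes")
	c := []byte("different bytes")

	fmt.Println(gitBlobHash(a) == gitBlobHash(b)) // true: identical content
	fmt.Println(gitBlobHash(a) == gitBlobHash(c)) // false: content differs
}
```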

last change.

@gurudatta-patil
Author

Sorry, I thought I had deleted it; I might have accidentally typed an incomplete rm command.

Contributor

The script to generate history doesn't exercise the exit-on-idle functionality so this history won't test anything useful. You'd have to add some logic for that to the script.

Author

Good point. I can modify generate_history.sh to include a long pause followed by an unpause so that the retention timer reset is actually exercised. Or, if it doesn't add any value, we could delete it.

  NextTimeCacheV2Size: 14, // see note below
  SpecFieldLengthLimit: 10,
- Version: TriggerImmediatelyTimestamp,
+ Version: ResetRetentionOnUnpause,
Contributor

In general, you can't add a new version and switch to it in the same change; that could cause non-determinism errors on server rollback. There has to be at least one release between introducing the new logic and changing the default. (It could be done by backporting the change to the previous minor release.)

In this case, it might be possible to argue that the only workflows that would hit a non-determinism error on rollback have already exited due to retention. I'm not totally sure that works, though.

Author

That's a fair concern. Any workflow that would hit a non-determinism error on rollback (i.e., one that unpaused after a long pause and had ResetRetentionOnUnpause execute) would have already exited via the retention path, so there'd be nothing to roll back into. If you'd prefer to follow the standard two-release pattern to be safe, I'm happy to split this: introduce ResetRetentionOnUnpause = 13 in this PR but keep Version: TriggerImmediatelyTimestamp as the default, then bump the version in a follow-up.
What should we move forward with?
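
The two-release pattern discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the scheduler's real code: the `tweakables` struct and both helper functions are hypothetical, and the value 12 for TriggerImmediatelyTimestamp is assumed (only ResetRetentionOnUnpause = 13 appears in the thread).

```go
package main

import "fmt"

// SchedulerWorkflowVersion models a monotonically increasing logic version.
type SchedulerWorkflowVersion int

const (
	TriggerImmediatelyTimestamp SchedulerWorkflowVersion = 12 // assumed value
	ResetRetentionOnUnpause     SchedulerWorkflowVersion = 13 // value from the thread
)

// tweakables is a hypothetical config holder. Version is the version that
// runs adopt by default in a given server release.
type tweakables struct {
	Version SchedulerWorkflowVersion
}

// hasMinVersion gates new logic on the version a run is using.
func hasMinVersion(running, minVersion SchedulerWorkflowVersion) bool {
	return running >= minVersion
}

// shouldResetRetentionOnUnpause is the gated code path: it can ship in one
// release while staying inactive until the default version is bumped.
func shouldResetRetentionOnUnpause(running SchedulerWorkflowVersion) bool {
	return hasMinVersion(running, ResetRetentionOnUnpause)
}

func main() {
	// First release: the new logic exists, but the default stays on the old
	// version, so a rollback target already understands version 13.
	first := tweakables{Version: TriggerImmediatelyTimestamp}
	// Follow-up release: only the default changes.
	followUp := tweakables{Version: ResetRetentionOnUnpause}

	fmt.Println(shouldResetRetentionOnUnpause(first.Version))    // false
	fmt.Println(shouldResetRetentionOnUnpause(followUp.Version)) // true
}
```

Splitting the change this way means a server rolled back by one release still contains the gated branch, which is the property the reviewer is asking for.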

Development

Successfully merging this pull request may close these issues.

Unpaused schedules should not be immediately deleted

4 participants