add 1-hour and 2-hour retries for all tasks except pp-starter #243
…ter. Also, set runahead limit lower due to cylc error.
@singhd789 thank you for holding your nose on the lack of tests for these rapid fixes. True to form, that last PR was buggy. 99999 was too high but `cylc validate` didn't catch it. I saw it when I tested the retries.
    [[pp-starter]]
        inherit = PP-STARTER
        # Note: pp-starter is a fake task and should be rephrased as a proper trigger.
Wait, how is it a "fake" task?
It does zero productive work. It's merely a trigger for the other tasks. Cylc has proper mechanisms for triggers, and we just haven't gotten there yet. You can see the modern triggers in Alex's workflows.
Ah I see. I guess I saw this as a task because it does work to resolve the target file. Either way, it's just a comment I suppose
[runtime]
    [[root]]
        # retry all tasks twice- once after an hour, second after two hours
        execution retry delays = PT1H, PT2H
Hmm, does this mean that if a task stalls/fails, the workflow will be hanging for an hour or 2 for retries?
Indeed. This is not desired behavior for the testing pipelines, is it?
It miiiight be ok for the test_cloud_runner actually because we have this: https://github.com/NOAA-GFDL/fre-workflows/blob/main/for_gh_runner/runscript.sh#L121
I was just thinking that might not be desired by users or maybe for the local test, but if we need it, we can work with it
Hm wait, the default behavior might already be an hour
There's no default retry. I do think that this would be a bad feature for the test pipelines, probably wasting cloud resources.
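For reference, a rough sketch of the arithmetic behind the question in this thread. With `execution retry delays = PT1H, PT2H`, a failing task is retried after each listed delay in turn, so it gets three attempts and can add up to three hours of waiting before it finally fails. The helper names `parse_duration` and `retry_budget` are hypothetical, and the parser only handles simple time-only ISO 8601 durations such as `PT1H` or `PT30M`, not everything Cylc accepts.

```python
import re
from datetime import timedelta

# Assumption: only simple time-only ISO 8601 durations (PT1H, PT30M, PT1H30M).
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_duration(text: str) -> timedelta:
    """Convert a duration such as 'PT1H' into a timedelta."""
    match = _DURATION.match(text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def retry_budget(setting: str) -> tuple[int, timedelta]:
    """Return (total attempts, total time spent waiting between attempts)."""
    delays = [parse_duration(item) for item in setting.split(",") if item.strip()]
    # One initial attempt plus one retry per listed delay.
    return len(delays) + 1, sum(delays, timedelta())


attempts, waiting = retry_budget("PT1H, PT2H")
print(f"{attempts} attempts, up to {waiting} spent waiting between them")
# prints "3 attempts, up to 3:00:00 spent waiting between them"
```

The waiting time excludes the run time of each attempt, so the real wall-clock cost of a persistently failing task is higher than this figure.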
Describe your changes
Also, set runahead limit lower due to cylc error. (In Chris's defense, `cylc validate` did not flag the 99999 as too high.)
Issue ticket number and link (if applicable)
#242 #241
Checklist before requesting a review
Manual Pipeline Run Details
Was the manual pipeline (`test_cloud_runner`) triggered for this PR?
Result of manual pipeline run:
(Paste relevant logs, output, or a link to the workflow run here)
How to trigger the manual pipeline:
The `test_cloud_runner` pipeline is not automatically associated as a required check with the PR; it must be triggered to test changes in a full post-processing run.
To trigger the manual pipeline:
1. Follow the link to the `test_cloud_runner` actions tab here
2. Click the dropdown "Run workflow":
   a. If trying to merge from a branch on fre-workflows: choose the branch from the first dropdown, leave the next 2 inputs blank, and choose the fre-cli branch to test
   b. If trying to merge from a fre-workflows fork: skip the first branch selection, input the fork name (ex: [user]/fre-workflows), input the fork's branch name, and choose the fre-cli branch to test
3. Click "Run workflow"
Note: you may need to reload the page to see your running workflow.