At least while we do not have support for tasks, we need an update to the flux-operator prompt that ensures that it at least has some criteria for assessing the completeness of a job. It ideally is a completed or failed state. I just hit one agent run (and note this is rare) that seems to be almost impatient, creating multiple jobs with different problem sizes instead of deciding to do another sleep with increased time. Every other time I've run this (and last year) the agent knew to retry.
At least while we do not have support for tasks, we need an update to the flux-operator prompt that ensures that it at least has some criteria for assessing the completeness of a job. It ideally is a completed or failed state. I just hit one agent run (and note this is rare) that seems to be almost impatient, creating multiple jobs with different problem sizes instead of deciding to do another sleep with increased time. Every other time I've run this (and last year) the agent knew to retry.