When `output_dir` is passed to `fit()` and that directory already contains a checkpoint file, training automatically resumes from that checkpoint: it loads the model weights, restores the optimizer state, and starts from the saved epoch. There is no way to opt out; if the directory has a checkpoint, it resumes.
In my opinion, calling `fit()` should start fresh training by default. With the current behavior, re-running a script with the same `output_dir` produces different results than running it with a clean directory, without any explicit user opt-in.
My suggestion would be to remove the resume behavior entirely: needing it is fairly uncommon when finetuning, and anyone who does need it can simply provide the correct `model_path` manually.
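To make the proposed behavior concrete, here is a minimal sketch of how an explicit opt-in could look. The function name `fit()` comes from the report above; the `resume_from` parameter and the `checkpoint.pt` filename are purely illustrative assumptions, not the library's actual API:

```python
import os

def fit(output_dir, resume_from=None):
    """Sketch of fit() that never resumes implicitly.

    `resume_from` is a hypothetical parameter: resuming happens
    only when the caller passes a checkpoint path explicitly.
    """
    if resume_from is not None:
        # Explicit opt-in: the user names the checkpoint to resume from.
        return f"resumed from {resume_from}"
    # Default: start fresh, even if output_dir already holds a checkpoint.
    return f"fresh training in {output_dir}"

# A stale checkpoint in output_dir no longer changes the run silently:
out = "runs/exp1"
os.makedirs(out, exist_ok=True)
open(os.path.join(out, "checkpoint.pt"), "w").close()
print(fit(out))                                              # fresh training
print(fit(out, resume_from=os.path.join(out, "checkpoint.pt")))  # resumes
```

This keeps re-runs reproducible by default while still supporting deliberate resumption.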