Finetuning silently resumes from checkpoint when output_dir contains a previous run #814

@psinger-prior

Description

When output_dir is passed to fit() and that directory already contains a checkpoint file, training automatically resumes from that checkpoint: it loads the model weights, restores the optimizer state, and continues from the saved epoch. There is no way to opt out; if the directory has a checkpoint, it resumes.
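A minimal sketch of the behavior described above. The function name fit(), the checkpoint filename, and the return values are assumptions for illustration, not the library's actual code:

```python
import os

# Hypothetical checkpoint filename; the real library may use a different name.
CHECKPOINT_NAME = "checkpoint.pt"

def fit(output_dir: str) -> str:
    """Sketch of the reported behavior: resume is implicit and unconditional."""
    ckpt = os.path.join(output_dir, CHECKPOINT_NAME)
    if os.path.exists(ckpt):
        # Silently resumes: weights, optimizer state, and epoch are restored,
        # with no flag for the caller to force a fresh run.
        return f"resumed from {ckpt}"
    return "fresh training"
```

Re-running the same script thus behaves differently depending on whether output_dir is clean, which is the crux of the issue.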

In my opinion, calling fit() should start fresh training by default. With the current behavior, re-running a script with the same output_dir produces different results than running it against a clean directory, without any explicit opt-in from the user.

My suggestion would be to remove the resume behavior entirely. Resuming is rarely needed when finetuning, and anyone who does need it can simply pass the correct model_path manually.
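The suggested alternative could look like the sketch below, where resumption only happens when the caller passes a path explicitly. Again, fit() and the model_path parameter are assumed names for illustration:

```python
from typing import Optional

def fit(output_dir: str, model_path: Optional[str] = None) -> str:
    """Sketch of the proposed API: fresh training unless the user opts in."""
    if model_path is not None:
        # Explicit opt-in: the user names the checkpoint to resume from.
        return f"resumed from {model_path}"
    # Default: ignore any checkpoints already sitting in output_dir.
    return "fresh training"
```

This keeps the common case (fresh run) deterministic regardless of what output_dir already contains.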
