Finetuning silently resumes from checkpoint when output_dir contains a previous run #814

@psinger-prior

Description

When output_dir is passed to fit() and that directory already contains a checkpoint file, training automatically resumes from that checkpoint: it loads the model weights, restores the optimizer state, and continues from the saved epoch. There is no way to opt out; if the directory has a checkpoint, it resumes.
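A minimal sketch of the behavior described above. The function name fit(), the checkpoint filename, and the return values are assumptions for illustration, not the library's actual code:

```python
import os

# Hypothetical checkpoint filename; the real library may use a different name.
CHECKPOINT_NAME = "checkpoint.pt"

def fit(output_dir: str) -> str:
    """Sketch of the reported behavior: resume is implicit and unconditional."""
    ckpt = os.path.join(output_dir, CHECKPOINT_NAME)
    if os.path.exists(ckpt):
        # Silently resumes: weights, optimizer state, and epoch are restored,
        # with no flag for the caller to force a fresh run.
        return f"resumed from {ckpt}"
    return "fresh training"
```

Re-running the same script thus behaves differently depending on whether output_dir is clean, which is the crux of the issue.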

In my opinion, calling fit() should start fresh training by default. With the current behavior, re-running a script with the same output_dir produces different results than running it against a clean directory, without any explicit opt-in from the user.

My suggestion would be to remove the resume behavior entirely. Resuming is rarely needed when finetuning, and anyone who does need it can simply pass the correct model_path manually.
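The suggested alternative could look like the sketch below, where resumption only happens when the caller passes a path explicitly. Again, fit() and the model_path parameter are assumed names for illustration:

```python
from typing import Optional

def fit(output_dir: str, model_path: Optional[str] = None) -> str:
    """Sketch of the proposed API: fresh training unless the user opts in."""
    if model_path is not None:
        # Explicit opt-in: the user names the checkpoint to resume from.
        return f"resumed from {model_path}"
    # Default: ignore any checkpoints already sitting in output_dir.
    return "fresh training"
```

This keeps the common case (fresh run) deterministic regardless of what output_dir already contains.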
