
Some questions #1

@scarbain

Hi,

Thanks for sharing the code.

Have you tried fine-tuning ELLA on SD1.5?

Also, shouldn't each training step use a single sampled timestep rather than a full generation? The paper also mentions a weight decay of 0.01.
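To make the single-timestep point concrete, here is a minimal sketch in plain Python of how a DDPM-style training example is usually built: sample one random timestep, noise the clean latent to that timestep, and use the noise as the regression target. The linear beta schedule and its values, the function names, and the toy 3-element "latent" are all assumptions for illustration; this is not the repository's code nor SD1.5's exact scheduler config.

```python
import math
import random

NUM_TRAIN_TIMESTEPS = 1000  # assumption: a typical DDPM-style schedule length


def make_alphas_cumprod(num_steps=NUM_TRAIN_TIMESTEPS, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (illustrative values)."""
    alphas_cumprod = []
    running = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    return alphas_cumprod


def add_noise(x0, noise, t, alphas_cumprod):
    """Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    a_bar = alphas_cumprod[t]
    return [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * n for x, n in zip(x0, noise)]


def single_timestep_example(x0, alphas_cumprod, rng):
    """Build one training example: a random timestep, the noisy input, and the noise target."""
    t = rng.randrange(len(alphas_cumprod))  # one timestep per sample, no sampling loop
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = add_noise(x0, noise, t, alphas_cumprod)
    return t, x_t, noise  # a model would be trained to predict `noise` from (x_t, t, text)


def mse(pred, target):
    """Mean squared error between a prediction and the noise target."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)
```

Each optimizer step then costs one forward and backward pass of the denoiser instead of a full multi-step generation, with different samples covering different timesteps over the course of training.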

It might also be better to base the training on one of the diffusers training scripts, which would allow xformers, different dtypes, batch size and gradient accumulation, Adam weight decay, etc.
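As a small illustration of why gradient accumulation matters here, the sketch below (plain Python, toy one-parameter model, hypothetical function names and data) checks that averaging gradients over equally sized micro-batches gives the same gradient as one larger batch. It only shows the arithmetic; it does not reproduce how diffusers or any training script implements it.

```python
def grad_mse(w, batch):
    """Gradient of mean((w * x - y)^2) with respect to w over one batch of (x, y) pairs."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size):
    """Average the gradients of equally sized micro-batches, as gradient accumulation does."""
    micro_batches = [data[i:i + micro_batch_size] for i in range(0, len(data), micro_batch_size)]
    accum_steps = len(micro_batches)
    total = 0.0
    for mb in micro_batches:
        # Scale each micro-batch contribution by 1 / accum_steps before summing.
        total += grad_mse(w, mb) / accum_steps
    return total, accum_steps


# Toy data: four (x, y) pairs, processed either as one batch or as two micro-batches.
data = [(1.0, 2.0), (2.0, 3.5), (3.0, 6.5), (4.0, 8.0)]
full = grad_mse(0.5, data)
accum, steps = accumulated_grad(0.5, data, micro_batch_size=2)
```

With a micro-batch size of 2 and 2 accumulation steps, the effective batch size is 4, so a larger effective batch can be reached without holding it in memory at once.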

I'm currently running a fine-tune of the existing SD1.5 weights as a test (LR 1e-5, xformers + fp16 for the pipeline, and fp16 for the T5 encoder). I'll let it run for a few hours.
