Hi,
Thanks for sharing the code.
Have you tried finetuning ELLA on SD1.5?
Also, shouldn't training sample a single random timestep per batch instead of running a full generation? The paper also mentions a weight decay of 0.01.
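For reference, here is a minimal sketch of what I mean by single-timestep training (standard diffusion objective, epsilon prediction); the function and argument names are illustrative, not from this repo:

```python
import torch

def training_step(unet, latents, cond_emb, alphas_cumprod, num_timesteps=1000):
    """One training step: draw ONE random timestep per sample, add noise,
    and regress the noise -- no full denoising loop is ever run."""
    b = latents.shape[0]
    t = torch.randint(0, num_timesteps, (b,), device=latents.device)
    noise = torch.randn_like(latents)
    a = alphas_cumprod[t].view(b, 1, 1, 1)
    noisy = a.sqrt() * latents + (1 - a).sqrt() * noise
    pred = unet(noisy, t, cond_emb)
    return torch.nn.functional.mse_loss(pred, noise)

# The paper's optimizer setting would then be something like:
# torch.optim.AdamW(adapter.parameters(), lr=1e-4, weight_decay=0.01)
```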
Also, basing this on one of the diffusers training scripts might work better: it would give you xformers, different dtypes, batch size & gradient accumulation, AdamW weight decay, etc.
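For example, something along these lines with diffusers' `train_text_to_image.py` (flag names may differ slightly between versions, so treat this as a sketch):

```shell
accelerate launch train_text_to_image.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --mixed_precision="fp16" \
  --enable_xformers_memory_efficient_attention \
  --train_batch_size=4 \
  --gradient_accumulation_steps=4 \
  --adam_weight_decay=0.01 \
  --learning_rate=1e-5
```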
I'm currently running a finetune of the existing SD1.5 weights as a test (LR 1e-5, xformers + fp16 for the pipeline, fp16 for the T5 encoder); I'll let it run for a few hours.