I am fine-tuning the minDALL-E model on a custom dataset, but my tokenized text prompts are sometimes longer than 64 tokens. What would be the best technique to increase the number of text positional embeddings, e.g. to 128? I was thinking of keeping the original 64 embeddings and appending 64 more, which would have to be trained from scratch. However, I worry this might interfere with the fine-tuning, since the embeddings sit in the very first layer.
Are there better options/techniques to accomplish this?
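
For concreteness, the "keep 64, append 64" idea would look roughly like the sketch below. It assumes the positional table is a plain `torch.nn.Embedding`; the name `pos_emb`, the helper `extend_positional_embedding`, and the sizes are placeholders for illustration, not minDALL-E's actual attributes. Initialising the new rows from the statistics of the pretrained rows is just one guess at a reasonable starting point.

```python
import torch
import torch.nn as nn


def extend_positional_embedding(old: nn.Embedding, new_len: int) -> nn.Embedding:
    """Return a longer embedding whose first rows are copied from `old`."""
    old_len, dim = old.weight.shape
    if new_len < old_len:
        raise ValueError("new_len must be >= the current number of positions")

    new = nn.Embedding(
        new_len, dim, device=old.weight.device, dtype=old.weight.dtype
    )
    with torch.no_grad():
        # Keep the pretrained rows for positions 0..old_len-1 unchanged.
        new.weight[:old_len].copy_(old.weight)
        # Assumption: draw the appended rows from a normal distribution with
        # the mean/std of the pretrained rows so they start on a similar scale.
        mean = old.weight.mean().item()
        std = old.weight.std().item()
        new.weight[old_len:].normal_(mean=mean, std=std)
    return new


# Hypothetical stand-in for the model's text positional embedding (64 x 32).
pos_emb = nn.Embedding(64, 32)
longer = extend_positional_embedding(pos_emb, 128)
```

The returned module would then replace the original one on the model, and any hard-coded maximum text length in the config or attention mask would need to be raised to match.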