Request for inference code for Emu2, Seed-Llama, and MiniGPT-5

In your paper, I saw that you tested these three models on the interleaved-format generation task. However, when I try to use these models, they cannot do interleaved generation (in particular, Emu2 officially supports only text output, and Seed-Llama's SFT model doesn't support interleaved output). So I'm curious how you did that. Are there any tricks, or should I fine-tune the models?