Thanks to the authors for the great work!
I have a question regarding the Sekai-Real-HQ dataset used in Yume 1.0 and 1.5: aside from version 1.5 using InternVL3-78B to generate event-aware descriptions, are there any other differences between the two versions? If possible, could you also share the specific prompts used?
Thanks in advance!