Hi, thanks for your great work! For the grounding task, is the whole video sequence fed into the LLM, or just a single image?