Hello, thank you for your open-source work!
I have a question about the part of your paper where the OFA model is used to generate frame-level captions. Were these captions ultimately used as training data? The GPT-refined captions already seem to cover the video content, and they are of higher quality and more comprehensive. In that case, are the OFA captions still necessary?
Thanks!