Thanks for the great work. I have a question about the fair comparison with Conceptual ∪ COCO.
In the experiments on dataset source, you compared against a model trained on the Conceptual ∪ COCO datasets. To make the comparison fair, you state:
"for a fair comparison, we train for the same number of steps as 5 epochs on our dataset."
However, 5 epochs means the model has seen all 180M segment-transcript pairs. As you mention in the paper, a dataset of that size leads to fewer overfitting issues.
I think a fairer comparison would be to train your model on 3M segment-transcript pairs (or 3M videos).