Official repository of the NeurIPS 2024 Datasets and Benchmarks Track paper, "ROVD: An Open-Ended Video Detection Dataset Toward Real-World Perception".
We unveil the Real-world Open-ended Video Detection (ROVD) dataset, which comprises 100 videos at 4K resolution, each 10 minutes long, captured across 5 urban scenes. It contains 1.8M frames in total, 30K of which are finely annotated. The collection captures many real-world characteristics: it offers an egocentric viewpoint and covers a range of spatial and temporal dynamics. It also includes diverse scene representations, complex environmental conditions, and rich visual content, making it a valuable resource for advancing research in realistic AI perception.
The source data and annotations are hosted on Google Drive. ROVD can be accessed via this link.

