The Lecture Video Visual Objects (LVVO) dataset is a benchmark designed for object detection in lecture video frames. It provides high-quality annotations of visual content such as tables, charts, images, and illustrations in real university lecture recordings.
📄 Our arXiv paper can be found here: Lecture Video Visual Objects (LVVO) Dataset: A Benchmark for Visual Object Detection in Educational Videos
📥 The dataset can be downloaded here: LVVO Dataset Download
- Total Images: 4,000 unique frames extracted from lecture videos
- Manually Annotated Subset (LVVO 1k): 1,000 frames
- Automatically Labeled Subset (LVVO 3k): 3,000 frames
- Source: Lecture recordings from videopoints.org, covering 8 instructors across 13 courses and 3 domains (Biology, Computer Science, Geosciences)
- Annotation Tool: VoTT by Microsoft

Each visual object in the images is labeled with one of the following categories:
| Category ID | Name |
|---|---|
| 1 | Table |
| 2 | Chart-Graph |
| 3 | Photographic-image |
| 4 | Visual-illustration |
To help users quickly explore the dataset structure and format, we provide a sample version containing 10 annotated images.
The complete dataset is hosted on Google Drive and includes three files:
| File Name | Description |
|---|---|
| LVVO 1k withCategories.zip | 1,000 manually annotated images with categories |
| LVVO 1k.zip | Same images, single-class annotations |
| LVVO 3k.zip | 3,000 images with automatic bounding boxes |

Each version follows this internal structure:
LVVO_x/
├── images/ # Contains all .jpg images
├── labels/ # Contains corresponding annotation files (.json)
└── dataset_info.json # Metadata: category names, image ID mappings
Each file in the labels/ folder is a JSON annotation corresponding to an image in images/.
It contains:
- asset: Image metadata
  - name: Image file name (e.g., i116_c425_v7624_i_0146.jpg)
  - image_id: Unique integer identifier
  - size: Dictionary with image width and height in pixels
- objects: A list of annotated visual elements, each containing:
  - class: Category ID (1 = Table, 2 = Chart-Graph, 3 = Photographic-image, 4 = Visual-illustration)
  - boundingBox: Bounding box details, including:
    - xmin, ymin: Top-left corner
    - xmax, ymax: Bottom-right corner
    - width, height: Box dimensions in pixels (optional but included)

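The fields listed above can be read with the standard library alone. The following is a minimal sketch, assuming the keys are nested exactly as described (asset.name, objects[].class, objects[].boundingBox.xmin, and so on). The function names `parse_annotation` and `load_annotation` are hypothetical helpers, not part of the dataset tooling.

```python
import json

# Category IDs as listed in the category table of this README.
CATEGORY_NAMES = {
    1: "Table",
    2: "Chart-Graph",
    3: "Photographic-image",
    4: "Visual-illustration",
}


def parse_annotation(annotation):
    """Flatten one label dictionary into a list of box records.

    Assumes the key layout described in this README; adjust the key
    names if the actual files differ.
    """
    asset = annotation["asset"]
    boxes = []
    for obj in annotation["objects"]:
        bbox = obj["boundingBox"]
        category_id = int(obj["class"])  # tolerate "2" as well as 2
        boxes.append({
            "image": asset["name"],
            "category_id": category_id,
            "category": CATEGORY_NAMES.get(category_id, "unknown"),
            "xmin": bbox["xmin"],
            "ymin": bbox["ymin"],
            "xmax": bbox["xmax"],
            "ymax": bbox["ymax"],
        })
    return boxes


def load_annotation(path):
    """Read one file from labels/ and return its box records."""
    with open(path, encoding="utf-8") as handle:
        return parse_annotation(json.load(handle))
```

For example, `load_annotation("LVVO_x/labels/<file>.json")` would return one dictionary per annotated object, where `<file>` stands for any label file in the folder.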
Note: For the LVVO_3k Automatically Labeled Subset, category information is not available. All objects are labeled with class = 1, where 1 simply denotes “object” as a general category.
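Because class = 1 means "Table" in the categorized subset but a generic "object" in the single-class data, code that summarizes labels needs to know which subset it is reading. The sketch below counts objects per class name across a labels/ folder. It assumes each label file has a top-level objects list with a class field, as described above; `count_classes` and its `has_categories` flag are hypothetical names introduced for illustration.

```python
import json
from collections import Counter
from pathlib import Path

FOUR_CLASS_NAMES = {
    1: "Table",
    2: "Chart-Graph",
    3: "Photographic-image",
    4: "Visual-illustration",
}
SINGLE_CLASS_NAMES = {1: "object"}


def count_classes(labels_dir, has_categories=True):
    """Count annotated objects per class name in a labels/ folder.

    Pass has_categories=False for single-class data such as LVVO_3k,
    where class 1 denotes a generic object rather than a table.
    """
    names = FOUR_CLASS_NAMES if has_categories else SINGLE_CLASS_NAMES
    counts = Counter()
    for label_path in sorted(Path(labels_dir).glob("*.json")):
        with open(label_path, encoding="utf-8") as handle:
            annotation = json.load(handle)
        for obj in annotation.get("objects", []):
            class_id = int(obj["class"])
            counts[names.get(class_id, f"unknown-{class_id}")] += 1
    return counts
```

Calling `count_classes("LVVO_x/labels", has_categories=False)` on the automatically labeled subset should then report every box under "object" instead of mislabeling it as "Table".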
If you use this dataset in your research, please cite:
@article{biswas2025lvvo,
title={Lecture Video Visual Objects (LVVO) Dataset: A Benchmark for Visual Object Detection in Educational Videos},
author={Dipayan Biswas and Shishir Shah and Jaspal Subhlok},
journal={arXiv preprint arXiv:2406.00123},
year={2025}
}
This repository includes metadata derived from two external datasets (LDD and LPM), adapted using a consistent filtering pipeline for compatibility with the LVVO dataset.
For details on how these external datasets were processed and which files were retained or excluded, refer to:
📄 meta-files/README.metadata.md
This repository includes multiple datasets and metadata files, each under a separate license:
- LVVO Dataset: licensed under CC BY 4.0.
- LDD-derived metadata (meta-files/ldd_filename_selected.csv):
  - Derived from the Lecture Design Dataset, licensed under CC BY 4.0.
- LPM-derived metadata (meta-files/lpm_filename_selected.csv):
  - Derived from the LPM Dataset, licensed under CC BY-NC-SA 4.0.

See LICENSE.txt for full details.
For questions or additional information, please contact the author at dipayan1109033@gmail.com.