Institute of Artificial Intelligence (TeleAI), China Telecom
This is the official repository for the paper "UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding". [paper] [UWBench]
Please give the project a STAR ⭐ if it helps you!
This is an ongoing project; we will keep improving it.
- 📦 Tutorials for deploying all models coming soon! 🚀
- 📄 Training & inference results for all models will be published! 🚀
- Feb-27-2026: Underwater Understanding Dataset UWBench is released. [huggingface] 🔥🔥
- Oct-10-2025: Paper is released. 🔥🔥
UWBench is a comprehensive benchmark specifically designed for underwater vision-language understanding. It comprises 15K high-resolution underwater images captured across diverse aquatic environments, encompassing oceans, coral reefs, and deep-sea habitats. The images are enriched with human-verified annotations, including 15,281 object referring expressions that precisely describe marine organisms and underwater structures, and 124,983 question-answer pairs covering diverse reasoning capabilities, from object recognition to ecological relationship understanding. The dataset captures rich variations in visibility, lighting conditions, and water turbidity, providing a realistic testbed for model evaluation.
The annotation pipeline begins with multi-source underwater image acquisition via web mining, public datasets, and in-situ photography. Attribute extraction then systematically categorizes environmental, taxonomic, and morphological features. Prompt engineering directs GPT-5 to synthesize comprehensive captions, referring expressions, and visual QA pairs. Finally, a rigorous three-stage validation protocol ensures annotation fidelity, yielding a robust, ecologically representative underwater vision-language dataset.
We have released V1, which only reports test results. Our work is still ongoing, and the next version, including training details, is coming soon.
The download link for UWBench is here! 🚀
Download link: https://huggingface.co/datasets/da1018/UWBench
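Once released on the Hub, the dataset can also be pulled programmatically with the Hugging Face `datasets` library. A minimal sketch, assuming the repo id from the link above; the `"test"` split name is an assumption (V1 reports test results only), so check the dataset card for the actual split and field names:

```python
def load_uwbench(split: str = "test"):
    """Fetch UWBench from the Hugging Face Hub.

    Repo id "da1018/UWBench" comes from the download link above;
    the split name "test" is an assumption, not confirmed by the card.
    """
    from datasets import load_dataset  # requires: pip install datasets
    return load_dataset("da1018/UWBench", split=split)
```

Calling `load_uwbench()` downloads and caches the data on first use; subsequent calls reuse the local cache.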
1. Image Captioning
2. Object Referring
3. Visual Question Answering
@article{zhang2025uwbench,
title={UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding},
author={Zhang, Da and Rong, Chenggang and Li, Bingyu and Wang, Feiyu and Zhao, Zhiyuan and Gao, Junyu and Li, Xuelong},
journal={arXiv preprint arXiv:2510.18262},
year={2025}
}

We are thankful to VRSBench and CLAIR for releasing their models and code as open-source contributions.
If you have any questions about this project, please feel free to contact zhangda1018@126.com.