UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding

Authors: Xuelong Li, Da Zhang, Chenggang Rong, Zhiyuan Zhao, Junyu Gao

Institute of Artificial Intelligence (TeleAI), China Telecom

This is the official repository for the paper "UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding". [paper] [UWBench]

Please leave a STAR ⭐ if this project helps you~

📢 Latest Updates

This is an ongoing project; we will keep improving it.

  • 📦 Tutorials for deploying all models are coming soon! 🚀
  • 📄 Training & inference results for all models will be published! 🚀
  • Feb-27-2026: The underwater understanding dataset UWBench is released. [huggingface] 🔥🔥
  • Oct-10-2025: The paper is released. 🔥🔥

UWBench Description

UWBench is a comprehensive benchmark specifically designed for underwater vision-language understanding. It comprises 15K high-resolution underwater images captured across diverse aquatic environments, encompassing oceans, coral reefs, and deep-sea habitats. Each image is enriched with human-verified annotations including 15,281 object referring expressions that precisely describe marine organisms and underwater structures, and 124,983 question-answer pairs covering diverse reasoning capabilities from object recognition to ecological relationship understanding. The dataset captures rich variations in visibility, lighting conditions, and water turbidity, providing a realistic testbed for model evaluation.

🔨 UWBench Construction

The construction pipeline begins with multi-source underwater image acquisition via web mining, public datasets, and in-situ photography. Attribute extraction then systematically categorizes environmental, taxonomic, and morphological features. Next, prompt engineering directs GPT-5 to synthesize comprehensive captions, referring expressions, and visual QA pairs. Finally, a rigorous three-stage validation protocol ensures annotation fidelity, yielding a robust, ecologically representative underwater vision-language dataset.

🚀 Training and Inference

We have released V1, which reports only the test results. Our work is still ongoing, and the next version, including training details, is coming soon.

🌋 UWBench Download

The download link for UWBench is below! 🚀

Download link: https://huggingface.co/datasets/da1018/UWBench
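If the dataset is hosted as a standard Hugging Face `datasets` repository, it can be loaded programmatically. A minimal sketch follows; the split name `"test"` and the `datasets`-library packaging are assumptions, not confirmed by this README:

```python
# Hedged sketch: load UWBench from the Hugging Face Hub.
# The repo id comes from the download link above; the split name and
# hosting format are assumptions.
REPO_ID = "da1018/UWBench"

def load_uwbench(split: str = "test"):
    """Download (or reuse a cached copy of) one UWBench split.

    Requires `pip install datasets`; the import is kept lazy so this
    module can be inspected without the dependency installed.
    """
    from datasets import load_dataset  # third-party: huggingface `datasets`
    return load_dataset(REPO_ID, split=split)

if __name__ == "__main__":
    ds = load_uwbench("test")  # assumed split name
    print(ds)
```

Downloads are cached locally by the `datasets` library, so repeated calls reuse the same files.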

📦 Performance

1. Image Captioning

2. Object Referring

3. Visual Question Answering

👁️ Visualization

1. Image Captioning

2. Object Referring

3. Visual Question Answering

📜 Citation

@article{zhang2025uwbench,
  title={UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding},
  author={Zhang, Da and Rong, Chenggang and Li, Bingyu and Wang, Feiyu and Zhao, Zhiyuan and Gao, Junyu and Li, Xuelong},
  journal={arXiv preprint arXiv:2510.18262},
  year={2025}
}

🙏 Acknowledgement

We are thankful to VRSBench and CLAIR for releasing their models and code as open-source contributions.

🤖 Contact

If you have any questions about this project, please feel free to contact zhangda1018@126.com.
