✨X2SAM✨

Any Segmentation in Images and Videos

Hao Wang1,2, Limeng Qiao3, Chi Zhang3, Guanglu Wan3, Lin Ma3, Xiangyuan Lan2📧, Xiaodan Liang1📧

1 Sun Yat-sen University, 2 Peng Cheng Laboratory, 3 Meituan Inc.

📧 Corresponding author

👀 Notice

Note: X2SAM is under active development, and we will continue to update the code and documentation. Please check the TODO section for our development schedule.

We strongly recommend using English in issues, so that developers from around the world can discuss, share experiences, and answer questions together.

If you have any questions or would like to collaborate, please feel free to open an issue or reach out to me at wanghao9610@gmail.com.

💥 Updates

  • 2026-03-20: We created the X2SAM repository.

🚀 Highlights

This repository provides the official PyTorch implementation, pre-trained models, training, evaluation, visualization, and demo code of X2SAM:

  • TODO

  • TODO

  • TODO

📖 Table of Contents

🔖 Abstract

This is the abstract of X2SAM.

🔍 Framework

Figure 1: The overview of X2SAM.

📊 Benchmarks

🏁 Quickstart

1. Structure

We provide a detailed project structure for X2SAM. Please organize the project according to this structure.

📁 Project

2. Environment


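The environment requirements have not yet been published, so here is a minimal sketch of a typical PyTorch project setup. The environment name `x2sam`, the Python version, and the `requirements.txt` file are all assumptions; adjust them once the official instructions are released.

```shell
# Hypothetical setup sketch -- env name, Python version, and
# requirements.txt are assumptions, not the official instructions.
conda create -n x2sam python=3.10 -y
conda activate x2sam

# Install PyTorch; pick the build (CUDA/CPU) matching your system.
pip install torch torchvision

# Clone the repository and install its dependencies from the root.
git clone https://github.com/wanghao9610/X2SAM.git
cd X2SAM
pip install -r requirements.txt
```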
3. Dataset

Please refer to datasets.md for detailed instructions on data preparation.

4. Model

Download our pre-trained models from HuggingFace and place them in the inits directory.
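One way to do this is with the `huggingface-cli download` command, sketched below. The HuggingFace repository id has not been published yet, so `<org>/<model>` is a placeholder that you must replace with the actual id.

```shell
# Hypothetical download sketch -- <org>/<model> is a placeholder for
# the (not yet published) HuggingFace repository id.
pip install -U "huggingface_hub[cli]"
mkdir -p inits
huggingface-cli download <org>/<model> --local-dir inits/<model>
```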

5. Training

⏳ Coming soon...

6. Evaluation

Image and Video Segmentation Benchmarks Evaluation

⏳ Coming soon...

Image and Video Chat Benchmarks Evaluation

⏳ Coming soon...

💻 Demo

Local Demo

🏞️ / 🎥 Inference

Web Demo

🛠️ Deployment

✅ TODO

  • Release the paper on arXiv.
  • Release the pre-trained models.
  • Release the demo website.
  • Release the demo instructions.
  • Release the evaluation code.
  • Release the training code.

😊 Acknowledgements

This project references several excellent open-source repositories (xtuner, VLMEvalKit, X-SAM). Thanks for their wonderful work and contributions to the community!

📌 Citation

If you find X2SAM or X-SAM helpful for your research or applications, please consider giving us a star 🌟 and citing the papers with the following BibTeX entries.

@article{wang2026x2sam,
  title={X2SAM: Any Segmentation in Images and Videos},
  author={Wang, Hao and Qiao, Limeng and Zhang, Chi and Wan, Guanglu and Ma, Lin and Lan, Xiangyuan and Liang, Xiaodan},
  journal={arXiv preprint arXiv:2603.00000},
  year={2026}
}

@article{wang2025xsam,
  title={X-SAM: From Segment Anything to Any Segmentation},
  author={Wang, Hao and Qiao, Limeng and Jie, Zequn and Huang, Zhijian and Feng, Chengjian and Zheng, Qingfang and Ma, Lin and Lan, Xiangyuan and Liang, Xiaodan},
  journal={arXiv preprint arXiv:2508.04655},
  year={2025}
}
