This repository collects vision-language models for remote sensing, covering advanced methods and commonly used datasets for applications such as image-text retrieval, visual question answering, and pretraining.
If you find a relevant paper that is not included here, please feel free to open a pull request at any time.
| Paper | Published in | Code/Project |
|---|---|---|
| Vision-Language Models in Remote Sensing: Current Progress and Future Trends | arxiv 2023 | - |
| The Potential of Visual ChatGPT For Remote Sensing | arxiv 2023 | - |
| Brain-inspired Remote Sensing Foundation Models and Open Problems: A Comprehensive Survey | J-STARS 2023 | - |
| Paper | Published in | Code/Project |
|---|---|---|
| RSGPT: A Remote Sensing Vision Language Model and Benchmark | arxiv 2023 | code |
| RemoteGLM | 2023 | code |
| Tree-GPT: Modular Large Language Model Expert System for Forest Remote Sensing Image Understanding and Interactive Analysis | arxiv 2023 | - |
| Towards Automatic Satellite Images Captions Generation Using Large Language Models | arxiv 2023 | - |
| GeoChat: Grounded Large Vision-Language Model for Remote Sensing | arxiv 2023 | code |
| SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing | AAAI 2024 | code |
| Paper | Published in | Code/Project |
|---|---|---|
| S-CLIP: Semi-supervised Vision-Language Pre-training using Few Specialist Captions | arxiv 2023 | code |
| RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | arxiv 2023 | code |
| RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model | arxiv 2023 | Project |
| Paper | Published in | Code/Project |
|---|---|---|
| Retro-Remote Sensing: Generating Images From Ancient Texts | J-STARS 2019 | - |
| Remote sensing image augmentation based on text description for waterside change detection | Remote Sensing 2021 | - |
| Text-to-remote-sensing-image generation with structured generative adversarial networks | GRSL 2021 | - |
| Txt2img-MHN: Remote sensing image generation from text using modern hopfield network | arxiv 2022 | code |
| Paper | Published in | Code/Project |
|---|---|---|
| Visual Grounding in Remote Sensing Images | ACMMM 2022 | data |
| RSVG: Exploring data and models for visual grounding on remote sensing data | TGRS 2023 | code |
| Paper | Published in | Code/Project |
|---|---|---|
| Text semantic fusion relation graph reasoning for few-shot object detection on remote sensing images | Remote Sensing 2023 | - |
| Few-shot object detection in aerial imagery guided by text-modal knowledge | TGRS 2023 | - |
| Dataset | Home/Github | Download link |
|---|---|---|
| RSICD | Github | [BaiduYun] [Google Drive] |
| Sydney-Captions | Github | [BaiduYun] |
| UCM-Captions | Github | [BaiduYun] |
| NWPU-RESISC45 | Github | [BaiduYun] [OneDrive] |
| DIOR-Captions | - | - |
| RS5M | Github | [HuggingFace] |
| LEVIR-CC | Github | [Google Drive] |
| SkyScript | Github | - |
| Dataset | Home/Project | Download link |
|---|---|---|
| RSITMD | Github | [BaiduYun] [Google Drive] |
| Dataset | Home/Project | Download link |
|---|---|---|
| RSVQA | Home | [data] |
| RSVQA×BEN | [Github] [Home] | - |
| RSIVQA | Github | - |
| CDVQA | Github | - |
| Dataset | Home/Project | Download link |
|---|---|---|
| DIOR-RSVG | Github | [Google Drive] |
| Dataset | Home/Project | Download link |
|---|---|---|
| NWPU-RESISC45 | Home | [OneDrive] [BaiduYun] |
| AID | Home | [OneDrive] [BaiduYun] |
| UC Merced Land-Use (UCM) | Home | - |
| SATIN | Home | [HuggingFace] |
| Dataset | Home/Project | Download link |
|---|---|---|
| NWPU VHR-10 | Home | [OneDrive] [BaiduYun] |
| DIOR | Home | [Google Drive] [BaiduYun] |
| FAIR1M | - | [BaiduYun] |
| Dataset | Home/Project | Download link |
|---|---|---|
| Vaihingen | Home | [BaiduYun] |
| Potsdam | Home | [BaiduYun] |
| Toronto | Home | - |
| GID | Home | [BaiduYun code:GID5] [OneDrive] |