SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery
The official repo for [CVPR'24] SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery.
Xin Guo*, Jiangwei Lao*, Bo Dang*, Yingying Zhang, Lei Yu, Lixiang Ru, Liheng Zhong, Ziyuan Huang, Kang Wu, Dingxiang Hu, Huimei He, Jian Wang, Jingdong Chen, Ming Yang †, Yongjun Zhang, Yansheng Li †
Updates | Abstract | Method | Installation | Usage | License | Citation
- 2025.08.04: Our latest work, SkySense++, has been accepted by Nature Machine Intelligence. The pretrained weights and code are available at this repository.
- 2024.06.17: SkySense has been accepted to CVPR 2024. The pretrained weights are available at this repository.
- 2023.12.01: A collection of papers, datasets, benchmarks, code, and pretrained weights for Remote Sensing Foundation Models (RSFMs) is available here.
SkySense, a multi-modal remote sensing foundation model (MM-RSFM), features a modular design capable of handling diverse tasks ranging from single- to multi-modal, static to temporal, and classification to localization. The design incorporates three novel technical components: a) a factorized multi-modal spatiotemporal encoder to effectively process multi-modal temporal remote sensing imagery; b) Multi-Granularity Contrastive Learning, which learns features at various levels of granularity to facilitate different tasks; and c) Geo-Context Prototype Learning, which extracts region-aware geo-context clues to enable implicit geo-knowledge integration. Extensive comparisons with 18 recently published RSFMs show that SkySense achieves state-of-the-art performance.
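For readers unfamiliar with the factorized design, the snippet below is a purely illustrative sketch of the general idea (per-frame spatial encoding followed by temporal fusion). It is not the SkySense implementation, and the class and parameter names are hypothetical.

# Illustrative only: a generic factorized spatiotemporal encoder, NOT SkySense's actual code.
import torch
import torch.nn as nn

class FactorizedSpatiotemporalEncoder(nn.Module):
    def __init__(self, spatial_encoder: nn.Module, embed_dim: int, num_heads: int = 8):
        super().__init__()
        self.spatial_encoder = spatial_encoder            # any backbone mapping (N, C, H, W) -> (N, embed_dim)
        self.temporal_fusion = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)  # fuses per-frame features across time

    def forward(self, x):                                 # x: (batch, time, bands, height, width)
        b, t = x.shape[:2]
        spatial = self.spatial_encoder(x.flatten(0, 1))   # encode each frame independently
        spatial = spatial.view(b, t, -1)                  # regroup by time: (batch, time, embed_dim)
        return self.temporal_fusion(spatial).mean(dim=1)  # temporally fused representation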
😊 We hope the release of pre-trained weights will contribute to the Remote Sensing community and facilitate future research. 🚀🚀🚀
Step 1. Create a conda environment and activate it. Install GDAL 3.4.0.
conda create --name skysense python=3.8
conda activate skysense
conda install -c conda-forge gdal=3.4.0
Step 2. Install PyTorch following the official instructions. PyTorch 1.13.1 is recommended.
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 torchtext==0.14.1 pytorch-cuda=11.6 -c pytorch -c nvidia
Step 3. Install MMCV 1.7.1, MMDetection 2.28.2, MMCls 0.25.0 and MMSegmentation 0.30.0 using MIM.
pip install -U openmim
mim install mmcv-full==1.7.1
pip install mmcls==0.25.0 mmsegmentation==0.30.0 mmdet==2.28.2 yapf==0.40.1 timm==0.6.13 rasterio==1.2.10 scikit-learn==1.2.2
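As an optional sanity check (not part of the official setup), the snippet below prints the installed versions so they can be compared against the pinned ones above.

# Optional: verify that the installed versions match the pinned ones.
import torch, mmcv, mmdet, mmcls, mmseg
from osgeo import gdal

print('gdal: ', gdal.__version__)    # expect 3.4.0
print('torch:', torch.__version__, '| cuda available:', torch.cuda.is_available())
print('mmcv: ', mmcv.__version__)    # expect 1.7.1
print('mmdet:', mmdet.__version__)   # expect 2.28.2
print('mmcls:', mmcls.__version__)   # expect 0.25.0
print('mmseg:', mmseg.__version__)   # expect 0.30.0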
The following shows how to use SkySense's pluggable components with high-resolution RGB imagery, Sentinel-2 multispectral imagery, Sentinel-1 SAR imagery, and other modalities.
# For high-resolution RGB imagery (band order: R, G, B) or RGBNIR imagery (band order: R, G, B, NIR).
# Model architecture: Swin Transformer V2 - Huge
import torch
from models.swin_transformer_v2 import SwinTransformerV2
checkpoint = torch.load('skysense_model_backbone_hr.pth')
checkpoint = {k.replace('backbone.', ''): v for k, v in checkpoint.items() if k.startswith('backbone.')}
swinv2_model = SwinTransformerV2()
msg = swinv2_model.load_state_dict(checkpoint, strict=False)
# missing_keys=['stages.0.blocks.0.attn.w_msa.relative_coords_table', ...], unexpected_keys=['mask_token']
swinv2_model = swinv2_model.cuda()
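# Hypothetical feature-extraction sketch (an illustration, not official usage):
# run a dummy RGB tile through the loaded backbone. We assume the backbone
# follows the common mmcls/mmseg convention of returning a tuple of multi-scale
# feature maps; the actual default input size is defined by the repo's config.
swinv2_model.eval()
dummy_rgb = torch.randn(1, 3, 256, 256).cuda()  # (batch, bands, height, width)
with torch.no_grad():
    hr_features = swinv2_model(dummy_rgb)
print([tuple(f.shape) for f in hr_features])    # one entry per feature stage (assumed)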
# For Sentinel-2 imagery (band order: B2, B3, B4, B5, B6, B7, B8, B8A, B11, B12) or Sentinel-1 imagery (band order: VV, VH).
# Model architecture: Vision Transformer - Large
import torch
from models.vision_transformer import VisionTransformer
checkpoint = torch.load('skysense_model_backbone_s2.pth')
vit_model = VisionTransformer()
msg = vit_model.load_state_dict(checkpoint, strict=False)
# missing_keys=[], unexpected_keys=['ctpe']
vit_model = vit_model.cuda()
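# Hypothetical sketch (an illustration, not official usage): embed a 10-band
# Sentinel-2 patch with the loaded ViT-Large backbone. The band count follows
# the order listed above (B2...B12); the spatial size is illustrative only,
# and the output format depends on the repo's VisionTransformer definition.
vit_model.eval()
dummy_s2 = torch.randn(1, 10, 96, 96).cuda()  # (batch, bands, height, width)
with torch.no_grad():
    s2_features = vit_model(dummy_s2)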
Note: All results were obtained using NVIDIA A100 GPUs (80GB).
Semantic segmentation:

Dataset | Metric | Performance (%) | Config/Code
---|---|---|---
iSAID | mIoU | 70.91 | Config

Object detection:

Dataset | Metric | Performance (%) | Config/Code
---|---|---|---
DIOR | mAP50 | 78.73 | Config
The pre-trained model weights are available for non-commercial research only. For any commercial use or cooperation, please contact Yansheng Li at Wuhan University (e-mail: [email protected]).
If you find our repo useful, please consider giving it a star and citing our work:
@InProceedings{Guo_2024_CVPR,
author = {Guo, Xin and Lao, Jiangwei and Dang, Bo and Zhang, Yingying and Yu, Lei and Ru, Lixiang and Zhong, Liheng and Huang, Ziyuan and Wu, Kang and Hu, Dingxiang and He, Huimei and Wang, Jian and Chen, Jingdong and Yang, Ming and Zhang, Yongjun and Li, Yansheng},
title = {SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {27672-27683}
}
@article{wu2025semantic,
author = {Wu, Kang and Zhang, Yingying and Ru, Lixiang and Dang, Bo and Lao, Jiangwei and Yu, Lei and Luo, Junwei and Zhu, Zifan and Sun, Yue and Zhang, Jiahao and Zhu, Qi and Wang, Jian and Yang, Ming and Chen, Jingdong and Zhang, Yongjun and Li, Yansheng},
title = {A semantic-enhanced multi-modal remote sensing foundation model for Earth observation},
journal = {Nature Machine Intelligence},
year = {2025},
doi = {10.1038/s42256-025-01078-8},
url = {https://doi.org/10.1038/s42256-025-01078-8}
}
@inproceedings{zhu2025skysense,
title={SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling},
author={Zhu, Qi and Lao, Jiangwei and Ji, Deyi and Luo, Junwei and Wu, Kang and Zhang, Yingying and Ru, Lixiang and Wang, Jian and Chen, Jingdong and Yang, Ming and others},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
pages={14733--14744},
year={2025}
}
@article{luo2024skysensegpt,
title={SkySenseGPT: A fine-grained instruction tuning dataset and model for remote sensing vision-language understanding},
author={Luo, Junwei and Pang, Zhen and Zhang, Yongjun and Wang, Tingzhu and Wang, Linlin and Dang, Bo and Lao, Jiangwei and Wang, Jian and Chen, Jingdong and Tan, Yihua and others},
journal={arXiv preprint arXiv:2406.10100},
year={2024}
}
For any other questions, please contact [email protected].