Awesome Remote Sensing Foundation Models

🌟A collection of papers, datasets, benchmarks, code, and pre-trained weights for Remote Sensing Foundation Models (RSFMs).

📢 Latest Updates

🔥🔥🔥 Last Updated on 2025.08.07 🔥🔥🔥

  • 2025.08.04: Our recent work, SkySense++, a follow-up to our SkySense model, has been accepted by Nature Machine Intelligence. The code and pretrained weights are available in this repository.

Table of Contents

  • Remote Sensing Vision Foundation Models
  • Remote Sensing Vision-Language Foundation Models
  • Remote Sensing Generative Foundation Models
  • Remote Sensing Vision-Location Foundation Models
  • Remote Sensing Vision-Audio Foundation Models
  • Remote Sensing Task-specific Foundation Models
  • Remote Sensing Agents
  • Benchmarks for RSFMs
  • (Large-scale) Pre-training Datasets
  • Relevant Projects
  • Survey/Commentary Papers
  • Citation

Remote Sensing Vision Foundation Models

Abbreviation Title Publication Paper Code & Weights
GeoKR Geographical Knowledge-Driven Representation Learning for Remote Sensing Images TGRS2021 GeoKR link
- Self-Supervised Learning of Remote Sensing Scene Representations Using Contrastive Multiview Coding CVPRW2021 Paper link
GASSL Geography-Aware Self-Supervised Learning ICCV2021 GASSL link
SeCo Seasonal Contrast: Unsupervised Pre-Training From Uncurated Remote Sensing Data ICCV2021 SeCo link
DINO-MM Self-supervised Vision Transformers for Joint SAR-optical Representation Learning IGARSS2022 DINO-MM link
SatMAE SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery NeurIPS2022 SatMAE link
RS-BYOL Self-Supervised Learning for Invariant Representations From Multi-Spectral and SAR Images JSTARS2022 RS-BYOL null
GeCo Geographical Supervision Correction for Remote Sensing Representation Learning TGRS2022 GeCo null
RingMo RingMo: A remote sensing foundation model with masked image modeling TGRS2022 RingMo Code
RVSA Advancing plain vision transformer toward remote sensing foundation model TGRS2022 RVSA link
RSP An Empirical Study of Remote Sensing Pretraining TGRS2022 RSP link
MATTER Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks CVPR2022 MATTER null
CSPT Consecutive Pre-Training: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain RS2022 CSPT link
- Self-supervised Vision Transformers for Land-cover Segmentation and Classification CVPRW2022 Paper link
BFM A billion-scale foundation model for remote sensing images Arxiv2023 BFM null
TOV TOV: The original vision model for optical remote sensing image understanding via self-supervised learning JSTARS2023 TOV link
CMID CMID: A Unified Self-Supervised Learning Framework for Remote Sensing Image Understanding TGRS2023 CMID link
RingMo-Sense RingMo-Sense: Remote Sensing Foundation Model for Spatiotemporal Prediction via Spatiotemporal Evolution Disentangling TGRS2023 RingMo-Sense null
IaI-SimCLR Multi-Modal Multi-Objective Contrastive Learning for Sentinel-1/2 Imagery CVPRW2023 IaI-SimCLR null
CACo Change-Aware Sampling and Contrastive Learning for Satellite Images CVPR2023 CACo link
SatLas SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding ICCV2023 SatLas link
GFM Towards Geospatial Foundation Models via Continual Pretraining ICCV2023 GFM link
Scale-MAE Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning ICCV2023 Scale-MAE link
DINO-MC DINO-MC: Self-supervised Contrastive Learning for Remote Sensing Imagery with Multi-sized Local Crops Arxiv2023 DINO-MC link
CROMA CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders NeurIPS2023 CROMA link
Cross-Scale MAE Cross-Scale MAE: A Tale of Multiscale Exploitation in Remote Sensing NeurIPS2023 Cross-Scale MAE link
DeCUR DeCUR: decoupling common & unique representations for multimodal self-supervision ECCV2024 DeCUR link
Presto Lightweight, Pre-trained Transformers for Remote Sensing Timeseries Arxiv2023 Presto link
CtxMIM CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding Arxiv2023 CtxMIM null
FG-MAE Feature Guided Masked Autoencoder for Self-supervised Learning in Remote Sensing Arxiv2023 FG-MAE link
Prithvi Foundation Models for Generalist Geospatial Artificial Intelligence Arxiv2023 Prithvi link
RingMo-lite RingMo-lite: A Remote Sensing Multi-task Lightweight Network with CNN-Transformer Hybrid Framework Arxiv2023 RingMo-lite null
- A Self-Supervised Cross-Modal Remote Sensing Foundation Model with Multi-Domain Representation and Cross-Domain Fusion IGARSS2023 Paper null
EarthPT EarthPT: a foundation model for Earth Observation NeurIPS2023 CCAI workshop EarthPT link
USat USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery Arxiv2023 USat link
AIEarth Analytical Insight of Earth: A Cloud-Platform of Intelligent Computing for Geospatial Big Data Arxiv2023 AIEarth link
- Self-Supervised Learning for SAR ATR with a Knowledge-Guided Predictive Architecture Arxiv2023 Paper link
Clay Clay Foundation Model - null link
Hydro Hydro--A Foundation Model for Water in Satellite Imagery - null link
U-BARN Self-Supervised Spatio-Temporal Representation Learning of Satellite Image Time Series JSTARS2024 Paper link
GeRSP Generic Knowledge Boosted Pre-training For Remote Sensing Images Arxiv2024 GeRSP GeRSP
SwiMDiff SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image Arxiv2024 SwiMDiff null
OFA-Net One for All: Toward Unified Foundation Models for Earth Vision IGARSS2024 OFA-Net null
SMLFR Generative ConvNet Foundation Model With Sparse Modeling and Low-Frequency Reconstruction for Remote Sensing Image Interpretation TGRS2024 SMLFR link
SpectralGPT SpectralGPT: Spectral Foundation Model TPAMI2024 SpectralGPT link
S2MAE S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data CVPR2024 S2MAE null
SatMAE++ Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery CVPR2024 SatMAE++ link
msGFM Bridging Remote Sensors with Multisensor Geospatial Foundation Models CVPR2024 msGFM link
SkySense SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery CVPR2024 SkySense link
MTP MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining IEEE JSTARS2024 MTP link
DOFA Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities Arxiv2024 DOFA link
MMEarth MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning ECCV2024 MMEarth link
LeMeViT LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation IJCAI2024 LeMeViT link
SoftCon Multi-Label Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining TGRS2024 SoftCon link
RS-DFM RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks Arxiv2024 RS-DFM null
A2-MAE A2-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder Arxiv2024 A2-MAE null
OmniSat OmniSat: Self-Supervised Modality Fusion for Earth Observation ECCV2024 OmniSat link
MM-VSF Towards a Knowledge guided Multimodal Foundation Model for Spatio-Temporal Remote Sensing Applications Arxiv2024 MM-VSF null
MA3E Masked Angle-Aware Autoencoder for Remote Sensing Images ECCV2024 MA3E link
SpectralEarth SpectralEarth: Training Hyperspectral Foundation Models at Scale Arxiv2024 SpectralEarth null
SenPa-MAE SenPa-MAE: Sensor Parameter Aware Masked Autoencoder for Multi-Satellite Self-Supervised Pretraining Arxiv2024 SenPa-MAE link
RingMo-Aerial RingMo-Aerial: An Aerial Remote Sensing Foundation Model With A Affine Transformation Contrastive Learning Arxiv2024 RingMo-Aerial null
SAR-JEPA Predicting Gradient is Better: Exploring Self-Supervised Learning for SAR ATR with a Joint-Embedding Predictive Architecture ISPRS JPRS2024 SAR-JEPA link
PIS Pretrain a Remote Sensing Foundation Model by Promoting Intra-instance Similarity TGRS2024 PIS link
OReole-FM OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery SIGSPATIAL2024 OReole-FM null
PIEViT Pattern Integration and Enhancement Vision Transformer for Self-supervised Learning in Remote Sensing Arxiv2024 PIEViT null
SatVision-TOA SatVision-TOA: A Geospatial Foundation Model for Coarse-Resolution All-Sky Remote Sensing Imagery Arxiv2024 SatVision-TOA link
Prithvi-EO-2.0 Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications Arxiv2024 Prithvi-EO-2.0 link
WildSAT WildSAT: Learning Satellite Image Representations from Wildlife Observations Arxiv2024 WildSAT link
SeaMo SeaMo: A Multi-Seasonal and Multimodal Remote Sensing Foundation Model Information Fusion2025 SeaMo null
HyperSIGMA HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model IEEE TPAMI2025 HyperSIGMA link
FoMo FoMo: Multi-Modal, Multi-Scale and Multi-Task Remote Sensing Foundation Models for Forest Monitoring AAAI2025 FoMo link
SatMamba SatMamba: Development of Foundation Models for Remote Sensing Imagery Using State Space Models Arxiv2025 SatMamba link
Galileo Galileo: Learning Global and Local Features in Pretrained Remote Sensing Models ICML2025 Galileo link
SatDiFuser Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models? Arxiv2025 SatDiFuser null
RoMA RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing Arxiv2025 RoMA link
Panopticon Panopticon: Advancing Any-Sensor Foundation Models for Earth Observation CVPR2025 Panopticon link
HyperFree HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery CVPR2025 HyperFree link
AnySat AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities CVPR2025 AnySat link
HyperSL HyperSL: A Spectral Foundation Model for Hyperspectral Image Interpretation IEEE TGRS2025 HyperSL link
DynamicVis DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding Arxiv2025 DynamicVis link
FlexiMo FlexiMo: A Flexible Remote Sensing Foundation Model Arxiv2025 FlexiMo null
TiMo TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series Arxiv2025 TiMo link
RingMoE RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation Arxiv2025 RingMoE null
- A Complex-valued SAR Foundation Model Based on Physically Inspired Representation Learning Arxiv2025 Paper null
TerraFM TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation Arxiv2025 TerraFM link
TESSERA TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis Arxiv2025 TESSERA null
MoSAiC MoSAiC: Multi-Modal Multi-Label Supervision-Aware Contrastive Learning for Remote Sensing Arxiv2025 MoSAiC null
CGEarthEye CGEarthEye: A High-Resolution Remote Sensing Vision Foundation Model Based on the Jilin-1 Satellite Constellation Arxiv2025 CGEarthEye null
MAPEX MAPEX: Modality-Aware Pruning of Experts for Remote Sensing Foundation Models Arxiv2025 MAPEX link
FedSense Towards Privacy-preserved Pre-training of Remote Sensing Foundation Models with Federated Mutual-guidance Learning ICCV2025 FedSense null
RS-vHeat RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model ICCV2025 RS-vHeat null
Copernicus-FM Towards a Unified Copernicus Foundation Model for Earth Vision ICCV2025 Copernicus-FM link
SelectiveMAE Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset ICCV2025 SelectiveMAE link
SMARTIES SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images ICCV2025 SMARTIES link
TerraMind TerraMind: Large-Scale Generative Multimodality for Earth Observation ICCV2025 TerraMind link
SkySense V2 SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing ICCV2025 SkySense V2 null
AlphaEarth AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data Arxiv2025 AlphaEarth null
SkySense++ A semantic-enhanced multi-modal remote sensing foundation model for Earth observation Nature Machine Intelligence 2025 SkySense++ link
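
Most of the vision models above release plain PyTorch checkpoints for ViT or ConvNet backbones. The snippet below is a minimal sketch of the usual fine-tuning setup, assuming a hypothetical checkpoint path, key prefix, and backbone choice; the actual weight format differs per model, so follow each repository's own loading instructions.

import torch
import timm

# Hypothetical checkpoint path; each RSFM releases weights in its own format.
CKPT_PATH = "rsfm_backbone.pth"

# Create a standard ViT-B/16 backbone without ImageNet weights. Many listed
# models build on ViT variants, but patch size, channel count, and input
# resolution differ per model, so this choice is only illustrative.
model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=10)

state = torch.load(CKPT_PATH, map_location="cpu")
state = state.get("model", state)  # some releases nest the weights under a key
state = {k.replace("encoder.", ""): v for k, v in state.items()}  # strip a hypothetical prefix

# strict=False tolerates missing/unexpected keys, e.g. an MAE decoder that is
# discarded at fine-tuning time or the freshly initialized classification head.
missing, unexpected = model.load_state_dict(state, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")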

Remote Sensing Vision-Language Foundation Models

Abbreviation Title Publication Paper Code & Weights
RSGPT RSGPT: A Remote Sensing Vision Language Model and Benchmark Arxiv2023 RSGPT link
RemoteCLIP RemoteCLIP: A Vision Language Foundation Model for Remote Sensing IEEE TGRS2024 RemoteCLIP link
GeoRSCLIP RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model IEEE TGRS2024 GeoRSCLIP link
GRAFT Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment ICLR2024 GRAFT null
- Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs Arxiv2023 Paper link
- Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models Arxiv2024 Paper link
EarthGPT EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain Arxiv2024 EarthGPT null
SkyCLIP SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing AAAI2024 SkyCLIP link
GeoChat GeoChat: Grounded Large Vision-Language Model for Remote Sensing CVPR2024 GeoChat link
LHRS-Bot LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model ECCV2024 LHRS-Bot link
RS-LLaVA RS-LLaVA: Large Vision Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery RS2024 RS-LLaVA link
SkySenseGPT SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding Arxiv2024 SkySenseGPT link
EarthMarker EarthMarker: Visual Prompt Learning for Region-level and Point-level Remote Sensing Imagery Comprehension IEEE TGRS2024 EarthMarker link
GeoText Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching ECCV2024 GeoText link
Aquila Aquila: A Hierarchically Aligned Visual-Language Model for Enhanced Remote Sensing Image Comprehension Arxiv2024 Aquila null
LHRS-Bot-Nova LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation Arxiv2024 LHRS-Bot-Nova link
RSCLIP Pushing the Limits of Vision-Language Models in Remote Sensing without Human Annotations Arxiv2024 RSCLIP null
GeoGround GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding Arxiv2024 GeoGround link
RingMoGPT RingMoGPT: A Unified Remote Sensing Foundation Model for Vision, Language, and grounded tasks TGRS2024 RingMoGPT null
RSUniVLM RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts Arxiv2024 RSUniVLM link
UniRS UniRS: Unifying Multi-temporal Remote Sensing Tasks through Vision Language Models Arxiv2024 UniRS null
REO-VLM REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation Arxiv2024 REO-VLM null
SkyEyeGPT SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model ISPRS JPRS2025 SkyEyeGPT link
VHM VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis AAAI2025 VHM link
TEOChat TEOChat: Large Language and Vision Assistant for Temporal Earth Observation Data ICLR2025 TEOChat link
EarthDial EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues CVPR2025 EarthDial link
SkySense-O SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling CVPR2025 SkySense-O link
XLRS-Bench XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery? CVPR2025 XLRS-Bench link
GeoPix GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing IEEE GRSM2025 GeoPix link
GeoPixel GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing ICML2025 GeoPixel link
- Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models Arxiv2025 Paper null
DOFA-CLIP DOFA-CLIP: Multimodal Vision–Language Foundation Models for Earth Observation Arxiv2025 DOFA-CLIP link
Falcon Falcon: A Remote Sensing Vision-Language Foundation Model Arxiv2025 Falcon link
LRS-VQA When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning ICCV2025 LRS-VQA link
UrbanLLaVA UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding ICCV2025 UrbanLLaVA link
OmniGeo OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence Arxiv2025 OmniGeo null
EagleVision EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing Arxiv2025 EagleVision link
SegEarth-R1 SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model Arxiv2025 SegEarth-R1 link
RemoteSAM RemoteSAM: Towards Segment Anything for Earth Observation ACMMM2025 RemoteSAM link
DynamicVL DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding Arxiv2025 DynamicVL null
LISAt LISAt: Language- Instructed Segmentation Assistant for Satellite Imagery Arxiv2025 LISAt link
EarthMind EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models Arxiv2025 EarthMind link
- Remote Sensing Large Vision-Language Model: Semantic-augmented Multi-level Alignment and Semantic-aware Expert Modeling Arxiv2025 Paper null
RingMo-Agent RingMo-Agent: A Unified Remote Sensing Foundation Model for Multi-Platform and Multi-Modal Reasoning Arxiv2025 RingMo-Agent null
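
Several of the models above (e.g. RemoteCLIP, GeoRSCLIP, SkyCLIP, DOFA-CLIP) follow the CLIP recipe, so zero-shot scene classification reduces to comparing image and text embeddings. The sketch below uses the generic open_clip API with a stock OpenAI checkpoint as a stand-in; loading a specific RSFM's weights instead depends on that project's release format, and the prompt wording here is only an illustrative assumption.

import torch
import open_clip
from PIL import Image

# Stand-in backbone/weights; swap in an RSFM checkpoint per its own instructions.
model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

classes = ["airport", "forest", "harbor", "farmland", "residential area"]
prompts = tokenizer([f"a satellite image of a {c}" for c in classes])
image = preprocess(Image.open("scene.png")).unsqueeze(0)  # any RGB remote sensing tile

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(prompts)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

print(dict(zip(classes, probs.squeeze(0).tolist())))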

Remote Sensing Generative Foundation Models

Abbreviation Title Publication Paper Code & Weights
Seg2Sat Seg2Sat - Segmentation to aerial view using pretrained diffuser models Github null link
- Generate Your Own Scotland: Satellite Image Generation Conditioned on Maps NeurIPSW2023 Paper link
GeoRSSD RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model Arxiv2023 Paper link
DiffusionSat DiffusionSat: A Generative Foundation Model for Satellite Imagery ICLR2024 DiffusionSat link
MetaEarth MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation Arxiv2024 Paper link
CRS-Diff CRS-Diff: Controllable Generative Remote Sensing Foundation Model Arxiv2024 Paper link
HSIGene HSIGene: A Foundation Model For Hyperspectral Image Generation Arxiv2024 Paper link
Text2Earth Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model Arxiv2025 Paper link

Remote Sensing Vision-Location Foundation Models

Abbreviation Title Publication Paper Code & Weights
CSP CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations ICML2023 CSP link
GeoCLIP GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization NeurIPS2023 GeoCLIP link
SatCLIP SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery Arxiv2023 SatCLIP link
RANGE RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings CVPR2025 RANGE null
GAIR GAIR: Improving Multimodal Geo-Foundation Model with Geo-Aligned Implicit Representations Arxiv2025 GAIR null

Remote Sensing Vision-Audio Foundation Models

Abbreviation Title Publication Paper Code & Weights
- Self-supervised audiovisual representation learning for remote sensing data JAG2022 Paper link

Remote Sensing Task-specific Foundation Models

Abbreviation Title Publication Paper Code & Weights Task
SS-MAE SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification TGRS2023 Paper link Image Classification
- A Decoupling Paradigm With Prompt Learning for Remote Sensing Image Change Captioning TGRS2023 Paper link Remote Sensing Image Change Captioning
TTP Time Travelling Pixels: Bitemporal Features Integration with Foundation Model for Remote Sensing Image Change Detection Arxiv2023 Paper link Change Detection
CSMAE Exploring Masked Autoencoders for Sensor-Agnostic Image Retrieval in Remote Sensing Arxiv2024 Paper link Image Retrieval
RSPrompter RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model TGRS2024 Paper link Instance Segmentation
BAN A New Learning Paradigm for Foundation Model-based Remote Sensing Change Detection TGRS2024 Paper link Change Detection
- Change Detection Between Optical Remote Sensing Imagery and Map Data via Segment Anything Model (SAM) Arxiv2024 Paper null Change Detection (Optical & OSM data)
AnyChange Segment Any Change Arxiv2024 Paper null Zero-shot Change Detection
RS-CapRet Large Language Models for Captioning and Retrieving Remote Sensing Images Arxiv2024 Paper null Image Caption & Text-image Retrieval
- Task Specific Pretraining with Noisy Labels for Remote sensing Image Segmentation Arxiv2024 Paper null Image Segmentation (Noisy labels)
RSBuilding RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model Arxiv2024 Paper link Building Extraction and Change Detection
SAM-Road Segment Anything Model for Road Network Graph Extraction Arxiv2024 Paper link Road Extraction
CrossEarth CrossEarth: Geospatial Vision Foundation Model for Domain Generalizable Remote Sensing Semantic Segmentation Arxiv2024 Paper link Domain Generalizable Remote Sensing Semantic Segmentation
GeoGround GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding Arxiv2024 Paper link Remote Sensing Visual Grounding
SARATR-X SARATR-X: Toward Building a Foundation Model for SAR Target Recognition IEEE TIP2025 SARATR-X link SAR Target Recognition
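
Several task-specific entries (e.g. RSPrompter, SAM-Road, AnyChange, RemoteSAM) build on the Segment Anything Model. For reference, a plain point-prompted SAM call on a remote sensing tile looks like the sketch below, using the original segment-anything package; the checkpoint path and prompt coordinates are placeholders, and the listed papers each add their own prompting or adaptation logic on top.

import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

# Placeholder checkpoint path for the ViT-H SAM weights.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.array(Image.open("tile.png").convert("RGB"))
predictor.set_image(image)

# A single foreground point prompt (x, y); real pipelines derive prompts
# automatically, e.g. from a detector, text query, or change mask.
point_coords = np.array([[256, 256]])
point_labels = np.array([1])

masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,
)
print(masks.shape, scores)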

Remote Sensing Agents

Abbreviation Title Publication Paper Code & Weights
GeoLLM-QA Evaluating Tool-Augmented Agents in Remote Sensing Platforms ICLR 2024 ML4RS Workshop Paper null
RS-Agent RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents Arxiv2024 Paper null
Change-Agent Change-Agent: Toward Interactive Comprehensive Remote Sensing Change Interpretation and Analysis TGRS2024 Paper link
GeoLLM-Engine GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots. CVPRW2024 Paper null
PEACE PEACE: Empowering Geologic Map Holistic Understanding with MLLMs CVPR2025 Paper link
- Towards LLM Agents for Earth Observation: The UnivEARTH Dataset Arxiv2025 Paper null
Geo-OLM Geo-OLM: Enabling Sustainable Earth Observation Studies with Cost-Efficient Open Language Models & State-Driven Workflows COMPASS'2025 Paper link
ThinkGeo ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks Arxiv2025 Paper link

Benchmarks for RSFMs

Abbreviation Title Publication Paper Link Downstream Tasks
- Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters Arxiv2023 Paper link Classification
GEO-Bench GEO-Bench: Toward Foundation Models for Earth Monitoring Arxiv2023 Paper link Classification & Segmentation
FoMo-Bench FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models Arxiv2023 FoMo-Bench Coming soon Classification & Segmentation & Detection for forest monitoring
PhilEO PhilEO Bench: Evaluating Geo-Spatial Foundation Models Arxiv2024 Paper link Segmentation & Regression estimation
SkySense SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery CVPR2024 SkySense Targeted open-source Classification & Segmentation & Detection & Change detection & Multi-Modal Segmentation: Time-insensitive LandCover Mapping & Multi-Modal Segmentation: Time-sensitive Crop Mapping & Multi-Modal Scene Classification
VLEO-Bench Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data Arxiv2024 VLEO-bench link Location Recognition & Captioning & Scene Classification & Counting & Detection & Change detection
VRSBench VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding NeurIPS2024 VRSBench link Image Captioning & Object Referring & Visual Question Answering
UrBench UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios Arxiv2024 UrBench link Object Referring & Visual Question Answering & Counting & Scene Classification & Location Recognition & Geolocalization
PANGAEA PANGAEA: A Global and Inclusive Benchmark for Geospatial Foundation Models Arxiv2024 PANGAEA link Segmentation & Change detection & Regression
COREval COREval: A Comprehensive and Objective Benchmark for Evaluating the Remote Sensing Capabilities of Large Vision-Language Models Arxiv2024 COREval null Perception & Reasoning
GEOBench-VLM GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks Arxiv2024 GEOBench-VLM link Scene Understanding & Counting & Object Classification & Event Detection & Spatial Relations
Copernicus-Bench Towards a Unified Copernicus Foundation Model for Earth Vision Arxiv2025 Copernicus-Bench link Segmentation & Classification & Change detection & Regression
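
Many of these benchmarks evaluate frozen RSFM features with lightweight heads (linear probing or k-NN) before any full fine-tuning. The sketch below shows a generic linear-probe loop with a frozen torch backbone and scikit-learn; the backbone (assumed to output pooled feature vectors), the dataloaders, and the probe settings are illustrative assumptions rather than the exact protocol of any benchmark listed above.

import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def extract_features(backbone, loader, device="cpu"):
    """Run a frozen backbone over a dataloader and collect (features, labels)."""
    backbone.eval().to(device)
    feats, labels = [], []
    for images, targets in loader:
        feats.append(backbone(images.to(device)).cpu().numpy())
        labels.append(targets.numpy())
    return np.concatenate(feats), np.concatenate(labels)

def linear_probe(backbone, train_loader, test_loader):
    """Fit a logistic-regression probe on frozen features and report test accuracy."""
    x_tr, y_tr = extract_features(backbone, train_loader)
    x_te, y_te = extract_features(backbone, test_loader)
    probe = LogisticRegression(max_iter=1000)
    probe.fit(x_tr, y_tr)
    return probe.score(x_te, y_te)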

(Large-scale) Pre-training Datasets

Abbreviation Title Publication Paper Attribute Link
fMoW Functional Map of the World CVPR2018 fMoW Vision link
SEN12MS SEN12MS -- A Curated Dataset of Georeferenced Multi-Spectral Sentinel-1/2 Imagery for Deep Learning and Data Fusion - SEN12MS Vision link
BEN-MM BigEarthNet-MM: A Large Scale Multi-Modal Multi-Label Benchmark Archive for Remote Sensing Image Classification and Retrieval GRSM2021 BEN-MM Vision link
MillionAID On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews, Guidances, and Million-AID JSTARS2021 MillionAID Vision link
SeCo Seasonal Contrast: Unsupervised Pre-Training From Uncurated Remote Sensing Data ICCV2021 SeCo Vision link
fMoW-S2 SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery NeurIPS2022 fMoW-S2 Vision link
TOV-RS-Balanced TOV: The original vision model for optical remote sensing image understanding via self-supervised learning JSTARS2023 TOV Vision link
SSL4EO-S12 SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation GRSM2023 SSL4EO-S12 Vision link
SSL4EO-L SSL4EO-L: Datasets and Foundation Models for Landsat Imagery Arxiv2023 SSL4EO-L Vision link
SatlasPretrain SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding ICCV2023 SatlasPretrain Vision (Supervised) link
CACo Change-Aware Sampling and Contrastive Learning for Satellite Images CVPR2023 CACo Vision Coming soon
SAMRS SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model NeurIPS2023 SAMRS Vision link
RSVG RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data TGRS2023 RSVG Vision-Language link
RS5M RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model Arxiv2023 RS5M Vision-Language link
GEO-Bench GEO-Bench: Toward Foundation Models for Earth Monitoring Arxiv2023 GEO-Bench Vision (Evaluation) link
RSICap & RSIEval RSGPT: A Remote Sensing Vision Language Model and Benchmark Arxiv2023 RSGPT Vision-Language Coming soon
Clay Clay Foundation Model - null Vision link
SATIN SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models ICCVW2023 SATIN Vision-Language link
SkyScript SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing AAAI2024 SkyScript Vision-Language link
ChatEarthNet ChatEarthNet: A Global-Scale, High-Quality Image-Text Dataset for Remote Sensing Arxiv2024 ChatEarthNet Vision-Language link
LuoJiaHOG LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrieval Arxiv2024 LuoJiaHOG Vision-Language null
MMEarth MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning Arxiv2024 MMEarth Vision link
SeeFar SeeFar: Satellite Agnostic Multi-Resolution Dataset for Geospatial Foundation Models Arxiv2024 SeeFar Vision link
FIT-RS SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding Arxiv2024 Paper Vision-Language link
RS-GPT4V RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding Arxiv2024 Paper Vision-Language link
RS-4M Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset Arxiv2024 RS-4M Vision link
Major TOM Major TOM: Expandable Datasets for Earth Observation Arxiv2024 Major TOM Vision link
VRSBench VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding Arxiv2024 VRSBench Vision-Language link
MMM-RS MMM-RS: A Multi-modal, Multi-GSD, Multi-scene Remote Sensing Dataset and Benchmark for Text-to-Image Generation Arxiv2024 MMM-RS Vision-Language link
DDFAV DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark Arxiv2024 DDFAV Vision-Language link
M3LEO A Multi-Modal, Multi-Label Earth Observation Dataset Integrating Interferometric SAR and Multispectral Data NeurIPS2024 M3LEO Vision link
Copernicus-Pretrain Towards a Unified Copernicus Foundation Model for Earth Vision Arxiv2025 Copernicus-Pretrain Vision link

Relevant Projects

(TODO: This section is dedicated to recommending other relevant and impactful projects, in the hope of promoting the development of the RS community. 😄 🚀)

Title Link Brief Introduction
RSFMs (Remote Sensing Foundation Models) Playground link An open-source playground to streamline the evaluation and fine-tuning of RSFMs on various datasets.
PANGAEA link A Global and Inclusive Benchmark for Geospatial Foundation Models.
GeoFM link Evaluation of Foundation Models for Earth Observation.
EOUncertaintyGeneralization link On the Generalization of Representation Uncertainty in Earth Observation.

Survey/Commentary Papers

Title Publication Paper Attribute
Self-Supervised Remote Sensing Feature Learning: Learning Paradigms, Challenges, and Future Works TGRS2023 Paper Vision & Vision-Language
The Potential of Visual ChatGPT For Remote Sensing Arxiv2023 Paper Vision-Language
Large Remote Sensing Models: Progress and Prospects (in Chinese) Geomatics and Information Science of Wuhan University 2023 Paper Vision & Vision-Language
Geospatial Artificial Intelligence Samples: Models, Quality, and Services (in Chinese) Geomatics and Information Science of Wuhan University 2023 Paper -
Brain-Inspired Remote Sensing Foundation Models and Open Problems: A Comprehensive Survey JSTARS2023 Paper Vision & Vision-Language
Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters Arxiv2023 Paper Vision
An Agenda for Multimodal Foundation Models for Earth Observation IGARSS2023 Paper Vision
Transfer learning in environmental remote sensing RSE2024 Paper Transfer learning
A Review of the Development of Remote Sensing Foundation Models and Future Prospects (in Chinese) National Remote Sensing Bulletin 2023 Paper -
On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications Arxiv2023 Paper Vision-Language
Vision-Language Models in Remote Sensing: Current Progress and Future Trends IEEE GRSM2024 Paper Vision-Language
On the Foundations of Earth and Climate Foundation Models Arxiv2024 Paper Vision & Vision-Language
Towards Vision-Language Geo-Foundation Model: A Survey Arxiv2024 Paper Vision-Language
AI Foundation Models in Remote Sensing: A Survey Arxiv2024 Paper Vision
Foundation model for generalist remote sensing intelligence: Potentials and prospects Science Bulletin2024 Paper -
Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques Arxiv2024 Paper Vision-Language
Foundation Models for Remote Sensing and Earth Observation: A Survey Arxiv2024 Paper Vision & Vision-Language
Multimodal Remote Sensing Foundation Models: Research Status and Future Prospects (in Chinese) Acta Geodaetica et Cartographica Sinica 2024 Paper Vision & Vision-Language & Generative & Vision-Location
When Geoscience Meets Foundation Models: Toward a general geoscience artificial intelligence system IEEE GRSM2024 Paper Vision & Vision-Language
Towards the next generation of Geospatial Artificial Intelligence JAG2025 Paper -
Vision Foundation Models in Remote Sensing: A survey IEEE GRSM2025 Paper Vision
Unleashing the potential of remote sensing foundation models via bridging data and computility islands The Innovation2025 Paper -
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality Arxiv2025 Paper -

Citation

If you find this repository useful, please consider giving it a star ⭐ and a citation:

@inproceedings{guo2024skysense,
  title={Skysense: A multi-modal remote sensing foundation model towards universal interpretation for earth observation imagery},
  author={Guo, Xin and Lao, Jiangwei and Dang, Bo and Zhang, Yingying and Yu, Lei and Ru, Lixiang and Zhong, Liheng and Huang, Ziyuan and Wu, Kang and Hu, Dingxiang and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={27672--27683},
  year={2024}
}

@article{li2025unleashing,
  title={Unleashing the potential of remote sensing foundation models via bridging data and computility islands},
  author={Li, Yansheng and Tan, Jieyi and Dang, Bo and Ye, Mang and Bartalev, Sergey A and Shinkarenko, Stanislav and Wang, Linlin and Zhang, Yingying and Ru, Lixiang and Guo, Xin and others},
  journal={The Innovation},
  year={2025},
  publisher={Elsevier}
}

@article{wu2025semantic,
  author = {Wu, Kang and Zhang, Yingying and Ru, Lixiang and Dang, Bo and Lao, Jiangwei and Yu, Lei and Luo, Junwei and Zhu, Zifan and Sun, Yue and Zhang, Jiahao and Zhu, Qi and Wang, Jian and Yang, Ming and Chen, Jingdong and Zhang, Yongjun and Li, Yansheng},
  title = {A semantic-enhanced multi-modal remote sensing foundation model for Earth observation},
  journal = {Nature Machine Intelligence},
  year = {2025},
  doi = {10.1038/s42256-025-01078-8},
  url = {https://doi.org/10.1038/s42256-025-01078-8}
}

@inproceedings{zhu2025skysense,
  title={Skysense-o: Towards open-world remote sensing interpretation with vision-centric visual-language modeling},
  author={Zhu, Qi and Lao, Jiangwei and Ji, Deyi and Luo, Junwei and Wu, Kang and Zhang, Yingying and Ru, Lixiang and Wang, Jian and Chen, Jingdong and Yang, Ming and others},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={14733--14744},
  year={2025}
}

@article{luo2024skysensegpt,
  title={Skysensegpt: A fine-grained instruction tuning dataset and model for remote sensing vision-language understanding},
  author={Luo, Junwei and Pang, Zhen and Zhang, Yongjun and Wang, Tingzhu and Wang, Linlin and Dang, Bo and Lao, Jiangwei and Wang, Jian and Chen, Jingdong and Tan, Yihua and others},
  journal={arXiv preprint arXiv:2406.10100},
  year={2024}
}
