🌟 A collection of papers, datasets, benchmarks, code, and pre-trained weights for Remote Sensing Foundation Models (RSFMs).
🔥🔥🔥 Last Updated on 2025.08.07 🔥🔥🔥
- 2025.08.04: Our recent work, SkySense++, a follow-up to SkySense, has been accepted by Nature Machine Intelligence. The code and pretrained weights are released in this repository.
- Models
- Datasets & Benchmarks
- Others
## Models

### Remote Sensing Vision Foundation Models

Abbreviation | Title | Publication | Paper | Code & Weights |
---|---|---|---|---|
GeoKR | Geographical Knowledge-Driven Representation Learning for Remote Sensing Images | TGRS2021 | GeoKR | link |
- | Self-Supervised Learning of Remote Sensing Scene Representations Using Contrastive Multiview Coding | CVPRW2021 | Paper | link |
GASSL | Geography-Aware Self-Supervised Learning | ICCV2021 | GASSL | link |
SeCo | Seasonal Contrast: Unsupervised Pre-Training From Uncurated Remote Sensing Data | ICCV2021 | SeCo | link |
DINO-MM | Self-supervised Vision Transformers for Joint SAR-optical Representation Learning | IGARSS2022 | DINO-MM | link |
SatMAE | SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery | NeurIPS2022 | SatMAE | link |
RS-BYOL | Self-Supervised Learning for Invariant Representations From Multi-Spectral and SAR Images | JSTARS2022 | RS-BYOL | null |
GeCo | Geographical Supervision Correction for Remote Sensing Representation Learning | TGRS2022 | GeCo | null |
RingMo | RingMo: A remote sensing foundation model with masked image modeling | TGRS2022 | RingMo | link |
RVSA | Advancing plain vision transformer toward remote sensing foundation model | TGRS2022 | RVSA | link |
RSP | An Empirical Study of Remote Sensing Pretraining | TGRS2022 | RSP | link |
MATTER | Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks | CVPR2022 | MATTER | null |
CSPT | Consecutive Pre-Training: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain | RS2022 | CSPT | link |
- | Self-supervised Vision Transformers for Land-cover Segmentation and Classification | CVPRW2022 | Paper | link |
BFM | A billion-scale foundation model for remote sensing images | Arxiv2023 | BFM | null |
TOV | TOV: The original vision model for optical remote sensing image understanding via self-supervised learning | JSTARS2023 | TOV | link |
CMID | CMID: A Unified Self-Supervised Learning Framework for Remote Sensing Image Understanding | TGRS2023 | CMID | link |
RingMo-Sense | RingMo-Sense: Remote Sensing Foundation Model for Spatiotemporal Prediction via Spatiotemporal Evolution Disentangling | TGRS2023 | RingMo-Sense | null |
IaI-SimCLR | Multi-Modal Multi-Objective Contrastive Learning for Sentinel-1/2 Imagery | CVPRW2023 | IaI-SimCLR | null |
CACo | Change-Aware Sampling and Contrastive Learning for Satellite Images | CVPR2023 | CACo | link |
SatLas | SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding | ICCV2023 | SatLas | link |
GFM | Towards Geospatial Foundation Models via Continual Pretraining | ICCV2023 | GFM | link |
Scale-MAE | Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning | ICCV2023 | Scale-MAE | link |
DINO-MC | DINO-MC: Self-supervised Contrastive Learning for Remote Sensing Imagery with Multi-sized Local Crops | Arxiv2023 | DINO-MC | link |
CROMA | CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders | NeurIPS2023 | CROMA | link |
Cross-Scale MAE | Cross-Scale MAE: A Tale of Multiscale Exploitation in Remote Sensing | NeurIPS2023 | Cross-Scale MAE | link |
DeCUR | DeCUR: decoupling common & unique representations for multimodal self-supervision | ECCV2024 | DeCUR | link |
Presto | Lightweight, Pre-trained Transformers for Remote Sensing Timeseries | Arxiv2023 | Presto | link |
CtxMIM | CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding | Arxiv2023 | CtxMIM | null |
FG-MAE | Feature Guided Masked Autoencoder for Self-supervised Learning in Remote Sensing | Arxiv2023 | FG-MAE | link |
Prithvi | Foundation Models for Generalist Geospatial Artificial Intelligence | Arxiv2023 | Prithvi | link |
RingMo-lite | RingMo-lite: A Remote Sensing Multi-task Lightweight Network with CNN-Transformer Hybrid Framework | Arxiv2023 | RingMo-lite | null |
- | A Self-Supervised Cross-Modal Remote Sensing Foundation Model with Multi-Domain Representation and Cross-Domain Fusion | IGARSS2023 | Paper | null |
EarthPT | EarthPT: a foundation model for Earth Observation | NeurIPS2023 CCAI workshop | EarthPT | link |
USat | USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery | Arxiv2023 | USat | link |
AIEarth | Analytical Insight of Earth: A Cloud-Platform of Intelligent Computing for Geospatial Big Data | Arxiv2023 | AIEarth | link |
- | Self-Supervised Learning for SAR ATR with a Knowledge-Guided Predictive Architecture | Arxiv2023 | Paper | link |
Clay | Clay Foundation Model | - | null | link |
Hydro | Hydro--A Foundation Model for Water in Satellite Imagery | - | null | link |
U-BARN | Self-Supervised Spatio-Temporal Representation Learning of Satellite Image Time Series | JSTARS2024 | Paper | link |
GeRSP | Generic Knowledge Boosted Pre-training For Remote Sensing Images | Arxiv2024 | GeRSP | GeRSP |
SwiMDiff | SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image | Arxiv2024 | SwiMDiff | null |
OFA-Net | One for All: Toward Unified Foundation Models for Earth Vision | IGARSS2024 | OFA-Net | null |
SMLFR | Generative ConvNet Foundation Model With Sparse Modeling and Low-Frequency Reconstruction for Remote Sensing Image Interpretation | TGRS2024 | SMLFR | link |
SpectralGPT | SpectralGPT: Spectral Foundation Model | TPAMI2024 | SpectralGPT | link |
S2MAE | S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data | CVPR2024 | S2MAE | null |
SatMAE++ | Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery | CVPR2024 | SatMAE++ | link |
msGFM | Bridging Remote Sensors with Multisensor Geospatial Foundation Models | CVPR2024 | msGFM | link |
SkySense | SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery | CVPR2024 | SkySense | link |
MTP | MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | IEEE JSTARS2024 | MTP | link |
DOFA | Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities | Arxiv2024 | DOFA | link |
MMEarth | MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning | ECCV2024 | MMEarth | link |
LeMeViT | LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation | IJCAI2024 | LeMeViT | link |
SoftCon | Multi-Label Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining | TGRS2024 | SoftCon | link |
RS-DFM | RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks | Arxiv2024 | RS-DFM | null |
A2-MAE | A2-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder | Arxiv2024 | A2-MAE | null |
OmniSat | OmniSat: Self-Supervised Modality Fusion for Earth Observation | ECCV2024 | OmniSat | link |
MM-VSF | Towards a Knowledge guided Multimodal Foundation Model for Spatio-Temporal Remote Sensing Applications | Arxiv2024 | MM-VSF | null |
MA3E | Masked Angle-Aware Autoencoder for Remote Sensing Images | ECCV2024 | MA3E | link |
SpectralEarth | SpectralEarth: Training Hyperspectral Foundation Models at Scale | Arxiv2024 | SpectralEarth | null |
SenPa-MAE | SenPa-MAE: Sensor Parameter Aware Masked Autoencoder for Multi-Satellite Self-Supervised Pretraining | Arxiv2024 | SenPa-MAE | link |
RingMo-Aerial | RingMo-Aerial: An Aerial Remote Sensing Foundation Model With A Affine Transformation Contrastive Learning | Arxiv2024 | RingMo-Aerial | null |
SAR-JEPA | Predicting Gradient is Better: Exploring Self-Supervised Learning for SAR ATR with a Joint-Embedding Predictive Architecture | ISPRS JPRS2024 | SAR-JEPA | link |
PIS | Pretrain a Remote Sensing Foundation Model by Promoting Intra-instance Similarity | TGRS2024 | PIS | link |
OReole-FM | OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery | SIGSPATIAL2024 | OReole-FM | null |
PIEViT | Pattern Integration and Enhancement Vision Transformer for Self-supervised Learning in Remote Sensing | Arxiv2024 | PIEViT | null |
SatVision-TOA | SatVision-TOA: A Geospatial Foundation Model for Coarse-Resolution All-Sky Remote Sensing Imagery | Arxiv2024 | SatVision-TOA | link |
Prithvi-EO-2.0 | Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications | Arxiv2024 | Prithvi-EO-2.0 | link |
WildSAT | WildSAT: Learning Satellite Image Representations from Wildlife Observations | Arxiv2024 | WildSAT | link |
SeaMo | SeaMo: A Multi-Seasonal and Multimodal Remote Sensing Foundation Model | Information Fusion2025 | SeaMo | null |
HyperSIGMA | HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model | IEEE TPAMI2025 | HyperSIGMA | link |
FoMo | FoMo: Multi-Modal, Multi-Scale and Multi-Task Remote Sensing Foundation Models for Forest Monitoring | AAAI2025 | FoMo | link |
SatMamba | SatMamba: Development of Foundation Models for Remote Sensing Imagery Using State Space Models | Arxiv2025 | SatMamba | link |
Galileo | Galileo: Learning Global and Local Features in Pretrained Remote Sensing Models | ICML2025 | Galileo | link |
SatDiFuser | Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models? | Arxiv2025 | SatDiFuser | null |
RoMA | RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing | Arxiv2025 | RoMA | link |
Panopticon | Panopticon: Advancing Any-Sensor Foundation Models for Earth Observation | CVPR2025 | Panopticon | link |
HyperFree | HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery | CVPR2025 | HyperFree | link |
AnySat | AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities | CVPR2025 | AnySat | link |
HyperSL | HyperSL: A Spectral Foundation Model for Hyperspectral Image Interpretation | IEEE TGRS2025 | HyperSL | link |
DynamicVis | DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding | Arxiv2025 | DynamicVis | link |
FlexiMo | FlexiMo: A Flexible Remote Sensing Foundation Model | Arxiv2025 | FlexiMo | null |
TiMo | TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series | Arxiv2025 | TiMo | link |
RingMoE | RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation | Arxiv2025 | RingMoE | null |
- | A Complex-valued SAR Foundation Model Based on Physically Inspired Representation Learning | Arxiv2025 | Paper | null |
TerraFM | TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation | Arxiv2025 | TerraFM | link |
TESSERA | TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis | Arxiv2025 | TESSERA | null |
MoSAiC | MoSAiC: Multi-Modal Multi-Label Supervision-Aware Contrastive Learning for Remote Sensing | Arxiv2025 | MoSAiC | null |
CGEarthEye | CGEarthEye: A High-Resolution Remote Sensing Vision Foundation Model Based on the Jilin-1 Satellite Constellation | Arxiv2025 | CGEarthEye | null |
MAPEX | MAPEX: Modality-Aware Pruning of Experts for Remote Sensing Foundation Models | Arxiv2025 | MAPEX | link |
FedSense | Towards Privacy-preserved Pre-training of Remote Sensing Foundation Models with Federated Mutual-guidance Learning | ICCV2025 | FedSense | null |
RS-vHeat | RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model | ICCV2025 | RS-vHeat | null |
Copernicus-FM | Towards a Unified Copernicus Foundation Model for Earth Vision | ICCV2025 | Copernicus-FM | link |
SelectiveMAE | Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset | ICCV2025 | SelectiveMAE | link |
SMARTIES | SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images | ICCV2025 | SMARTIES | link |
TerraMind | TerraMind: Large-Scale Generative Multimodality for Earth Observation | ICCV2025 | TerraMind | link |
SkySense V2 | SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing | ICCV2025 | SkySense V2 | null |
AlphaEarth | AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data | Arxiv2025 | AlphaEarth | null |
SkySense++ | A semantic-enhanced multi-modal remote sensing foundation model for Earth observation | Nature Machine Intelligence 2025 | SkySense++ | link |
### Remote Sensing Vision-Language Models

Abbreviation | Title | Publication | Paper | Code & Weights |
---|---|---|---|---|
RSGPT | RSGPT: A Remote Sensing Vision Language Model and Benchmark | Arxiv2023 | RSGPT | link |
RemoteCLIP | RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | IEEE TGRS2024 | RemoteCLIP | link |
GeoRSCLIP | RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model | IEEE TGRS2024 | GeoRSCLIP | link |
GRAFT | Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment | ICLR2024 | GRAFT | null |
- | Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs | Arxiv2023 | Paper | link |
- | Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | Arxiv2024 | Paper | link |
EarthGPT | EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain | Arxiv2024 | EarthGPT | null |
SkyCLIP | SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing | AAAI2024 | SkyCLIP | link |
GeoChat | GeoChat: Grounded Large Vision-Language Model for Remote Sensing | CVPR2024 | GeoChat | link |
LHRS-Bot | LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model | ECCV2024 | LHRS-Bot | link |
RS-LLaVA | RS-LLaVA: Large Vision Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery | RS2024 | RS-LLaVA | link |
SkySenseGPT | SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding | Arxiv2024 | SkySenseGPT | link |
EarthMarker | EarthMarker: Visual Prompt Learning for Region-level and Point-level Remote Sensing Imagery Comprehension | IEEE TGRS2024 | EarthMarker | link |
GeoText | Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching | ECCV2024 | GeoText | link |
Aquila | Aquila: A Hierarchically Aligned Visual-Language Model for Enhanced Remote Sensing Image Comprehension | Arxiv2024 | Aquila | null |
LHRS-Bot-Nova | LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation | Arxiv2024 | LHRS-Bot-Nova | link |
RSCLIP | Pushing the Limits of Vision-Language Models in Remote Sensing without Human Annotations | Arxiv2024 | RSCLIP | null |
GeoGround | GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding | Arxiv2024 | GeoGround | link |
RingMoGPT | RingMoGPT: A Unified Remote Sensing Foundation Model for Vision, Language, and grounded tasks | TGRS2024 | RingMoGPT | null |
RSUniVLM | RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts | Arxiv2024 | RSUniVLM | link |
UniRS | UniRS: Unifying Multi-temporal Remote Sensing Tasks through Vision Language Models | Arxiv2024 | UniRS | null |
REO-VLM | REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation | Arxiv2024 | REO-VLM | null |
SkyEyeGPT | SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model | ISPRS JPRS2025 | SkyEyeGPT | link |
VHM | VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis | AAAI2025 | VHM | link |
TEOChat | TEOChat: Large Language and Vision Assistant for Temporal Earth Observation Data | ICLR2025 | TEOChat | link |
EarthDial | EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues | CVPR2025 | EarthDial | link |
SkySense-O | SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling | CVPR2025 | SkySense-O | link |
XLRS-Bench | XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery? | CVPR2025 | XLRS-Bench | link |
GeoPix | GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing | IEEE GRSM2025 | GeoPix | link |
GeoPixel | GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing | ICML2025 | GeoPixel | link |
- | Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models | Arxiv2025 | Paper | null |
DOFA-CLIP | DOFA-CLIP: Multimodal Vision–Language Foundation Models for Earth Observation | Arxiv2025 | DOFA-CLIP | link |
Falcon | Falcon: A Remote Sensing Vision-Language Foundation Model | Arxiv2025 | Falcon | link |
LRS-VQA | When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning | ICCV2025 | LRS-VQA | link |
UrbanLLaVA | UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding | ICCV2025 | UrbanLLaVA | link |
OmniGeo | OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence | Arxiv2025 | OmniGeo | null |
EagleVision | EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing | Arxiv2025 | EagleVision | link |
SegEarth-R1 | SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model | Arxiv2025 | SegEarth-R1 | link |
RemoteSAM | RemoteSAM: Towards Segment Anything for Earth Observation | ACMMM2025 | RemoteSAM | link |
DynamicVL | DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding | Arxiv2025 | DynamicVL | null |
LISAt | LISAt: Language- Instructed Segmentation Assistant for Satellite Imagery | Arxiv2025 | LISAt | link |
EarthMind | EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models | Arxiv2025 | EarthMind | link |
- | Remote Sensing Large Vision-Language Model: Semantic-augmented Multi-level Alignment and Semantic-aware Expert Modeling | Arxiv2025 | Paper | null |
RingMo-Agent | RingMo-Agent: A Unified Remote Sensing Foundation Model for Multi-Platform and Multi-Modal Reasoning | Arxiv2025 | RingMo-Agent | null |
### Remote Sensing Generative Foundation Models

Abbreviation | Title | Publication | Paper | Code & Weights |
---|---|---|---|---|
Seg2Sat | Seg2Sat - Segmentation to aerial view using pretrained diffuser models | Github | null | link |
- | Generate Your Own Scotland: Satellite Image Generation Conditioned on Maps | NeurIPSW2023 | Paper | link |
GeoRSSD | RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model | Arxiv2023 | Paper | link |
DiffusionSat | DiffusionSat: A Generative Foundation Model for Satellite Imagery | ICLR2024 | DiffusionSat | link |
MetaEarth | MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation | Arxiv2024 | Paper | link |
CRS-Diff | CRS-Diff: Controllable Generative Remote Sensing Foundation Model | Arxiv2024 | Paper | link |
HSIGene | HSIGene: A Foundation Model For Hyperspectral Image Generation | Arxiv2024 | Paper | link |
Text2Earth | Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model | Arxiv2025 | Paper | link |
### Remote Sensing Vision-Location Models

Abbreviation | Title | Publication | Paper | Code & Weights |
---|---|---|---|---|
CSP | CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations | ICML2023 | CSP | link |
GeoCLIP | GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization | NeurIPS2023 | GeoCLIP | link |
SatCLIP | SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery | Arxiv2023 | SatCLIP | link |
RANGE | RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings | CVPR2025 | RANGE | null |
GAIR | GAIR: Improving Multimodal Geo-Foundation Model with Geo-Aligned Implicit Representations | Arxiv2025 | GAIR | null |
### Remote Sensing Vision-Audio Models

Abbreviation | Title | Publication | Paper | Code & Weights |
---|---|---|---|---|
- | Self-supervised audiovisual representation learning for remote sensing data | JAG2022 | Paper | link |
### Task-Specific Foundation Models

Abbreviation | Title | Publication | Paper | Code & Weights | Task |
---|---|---|---|---|---|
SS-MAE | SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification | TGRS2023 | Paper | link | Image Classification
- | A Decoupling Paradigm With Prompt Learning for Remote Sensing Image Change Captioning | TGRS2023 | Paper | link | Remote Sensing Image Change Captioning |
TTP | Time Travelling Pixels: Bitemporal Features Integration with Foundation Model for Remote Sensing Image Change Detection | Arxiv2023 | Paper | link | Change Detection |
CSMAE | Exploring Masked Autoencoders for Sensor-Agnostic Image Retrieval in Remote Sensing | Arxiv2024 | Paper | link | Image Retrieval |
RSPrompter | RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model | TGRS2024 | Paper | link | Instance Segmentation |
BAN | A New Learning Paradigm for Foundation Model-based Remote Sensing Change Detection | TGRS2024 | Paper | link | Change Detection |
- | Change Detection Between Optical Remote Sensing Imagery and Map Data via Segment Anything Model (SAM) | Arxiv2024 | Paper | null | Change Detection (Optical & OSM data) |
AnyChange | Segment Any Change | Arxiv2024 | Paper | null | Zero-shot Change Detection |
RS-CapRet | Large Language Models for Captioning and Retrieving Remote Sensing Images | Arxiv2024 | Paper | null | Image Caption & Text-image Retrieval |
- | Task Specific Pretraining with Noisy Labels for Remote sensing Image Segmentation | Arxiv2024 | Paper | null | Image Segmentation (Noisy labels) |
RSBuilding | RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model | Arxiv2024 | Paper | link | Building Extraction and Change Detection |
SAM-Road | Segment Anything Model for Road Network Graph Extraction | Arxiv2024 | Paper | link | Road Extraction |
CrossEarth | CrossEarth: Geospatial Vision Foundation Model for Domain Generalizable Remote Sensing Semantic Segmentation | Arxiv2024 | Paper | link | Domain Generalizable Remote Sensing Semantic Segmentation |
GeoGround | GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding | Arxiv2024 | Paper | link | Remote Sensing Visual Grounding |
SARATR-X | SARATR-X: Toward Building a Foundation Model for SAR Target Recognition | IEEE TIP2025 | SARATR-X | link | SAR Target Recognition |
### Remote Sensing Agents

Abbreviation | Title | Publication | Paper | Code & Weights |
---|---|---|---|---|
GeoLLM-QA | Evaluating Tool-Augmented Agents in Remote Sensing Platforms | ICLR 2024 ML4RS Workshop | Paper | null |
RS-Agent | RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents | Arxiv2024 | Paper | null |
Change-Agent | Change-Agent: Toward Interactive Comprehensive Remote Sensing Change Interpretation and Analysis | TGRS2024 | Paper | link |
GeoLLM-Engine | GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots. | CVPRW2024 | Paper | null |
PEACE | PEACE: Empowering Geologic Map Holistic Understanding with MLLMs | CVPR2025 | Paper | link |
- | Towards LLM Agents for Earth Observation: The UnivEARTH Dataset | Arxiv2025 | Paper | null |
Geo-OLM | Geo-OLM: Enabling Sustainable Earth Observation Studies with Cost-Efficient Open Language Models & State-Driven Workflows | COMPASS'2025 | Paper | link |
ThinkGeo | ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks | Arxiv2025 | Paper | link |
## Datasets & Benchmarks

### Benchmarks

Abbreviation | Title | Publication | Paper | Link | Downstream Tasks |
---|---|---|---|---|---|
- | Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters | Arxiv2023 | Paper | link | Classification |
GEO-Bench | GEO-Bench: Toward Foundation Models for Earth Monitoring | Arxiv2023 | Paper | link | Classification & Segmentation |
FoMo-Bench | FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models | Arxiv2023 | FoMo-Bench | Coming soon | Classification & Segmentation & Detection for forest monitoring
PhilEO | PhilEO Bench: Evaluating Geo-Spatial Foundation Models | Arxiv2024 | Paper | link | Segmentation & Regression estimation |
SkySense | SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery | CVPR2024 | SkySense | Targeted open-source | Classification & Segmentation & Detection & Change detection & Multi-Modal Segmentation: Time-insensitive LandCover Mapping & Multi-Modal Segmentation: Time-sensitive Crop Mapping & Multi-Modal Scene Classification |
VLEO-Bench | Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data | Arxiv2024 | VLEO-Bench | link | Location Recognition & Captioning & Scene Classification & Counting & Detection & Change detection
VRSBench | VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding | NeurIPS2024 | VRSBench | link | Image Captioning & Object Referring & Visual Question Answering |
UrBench | UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios | Arxiv2024 | UrBench | link | Object Referring & Visual Question Answering & Counting & Scene Classification & Location Recognition & Geolocalization |
PANGAEA | PANGAEA: A Global and Inclusive Benchmark for Geospatial Foundation Models | Arxiv2024 | PANGAEA | link | Segmentation & Change detection & Regression |
COREval | COREval: A Comprehensive and Objective Benchmark for Evaluating the Remote Sensing Capabilities of Large Vision-Language Models | Arxiv2024 | COREval | null | Perception & Reasoning |
GEOBench-VLM | GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks | Arxiv2024 | GEOBench-VLM | link | Scene Understanding & Counting & Object Classification & Event Detection & Spatial Relations |
Copernicus-Bench | Towards a Unified Copernicus Foundation Model for Earth Vision | Arxiv2025 | Copernicus-Bench | link | Segmentation & Classification & Change detection & Regression |
### Datasets

Abbreviation | Title | Publication | Paper | Attribute | Link |
---|---|---|---|---|---|
fMoW | Functional Map of the World | CVPR2018 | fMoW | Vision | link |
SEN12MS | SEN12MS -- A Curated Dataset of Georeferenced Multi-Spectral Sentinel-1/2 Imagery for Deep Learning and Data Fusion | - | SEN12MS | Vision | link |
BEN-MM | BigEarthNet-MM: A Large Scale Multi-Modal Multi-Label Benchmark Archive for Remote Sensing Image Classification and Retrieval | GRSM2021 | BEN-MM | Vision | link |
MillionAID | On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews, Guidances, and Million-AID | JSTARS2021 | MillionAID | Vision | link |
SeCo | Seasonal Contrast: Unsupervised Pre-Training From Uncurated Remote Sensing Data | ICCV2021 | SeCo | Vision | link |
fMoW-S2 | SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery | NeurIPS2022 | fMoW-S2 | Vision | link |
TOV-RS-Balanced | TOV: The original vision model for optical remote sensing image understanding via self-supervised learning | JSTARS2023 | TOV | Vision | link |
SSL4EO-S12 | SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation | GRSM2023 | SSL4EO-S12 | Vision | link |
SSL4EO-L | SSL4EO-L: Datasets and Foundation Models for Landsat Imagery | Arxiv2023 | SSL4EO-L | Vision | link |
SatlasPretrain | SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding | ICCV2023 | SatlasPretrain | Vision (Supervised) | link |
CACo | Change-Aware Sampling and Contrastive Learning for Satellite Images | CVPR2023 | CACo | Vision | Coming soon
SAMRS | SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model | NeurIPS2023 | SAMRS | Vision | link |
RSVG | RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data | TGRS2023 | RSVG | Vision-Language | link |
RS5M | RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model | Arxiv2023 | RS5M | Vision-Language | link |
GEO-Bench | GEO-Bench: Toward Foundation Models for Earth Monitoring | Arxiv2023 | GEO-Bench | Vision (Evaluation) | link |
RSICap & RSIEval | RSGPT: A Remote Sensing Vision Language Model and Benchmark | Arxiv2023 | RSGPT | Vision-Language | Coming soon
Clay | Clay Foundation Model | - | null | Vision | link |
SATIN | SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models | ICCVW2023 | SATIN | Vision-Language | link |
SkyScript | SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing | AAAI2024 | SkyScript | Vision-Language | link |
ChatEarthNet | ChatEarthNet: A Global-Scale, High-Quality Image-Text Dataset for Remote Sensing | Arxiv2024 | ChatEarthNet | Vision-Language | link |
LuoJiaHOG | LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrieval | Arxiv2024 | LuoJiaHOG | Vision-Language | null |
MMEarth | MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning | Arxiv2024 | MMEarth | Vision | link |
SeeFar | SeeFar: Satellite Agnostic Multi-Resolution Dataset for Geospatial Foundation Models | Arxiv2024 | SeeFar | Vision | link |
FIT-RS | SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding | Arxiv2024 | Paper | Vision-Language | link |
RS-GPT4V | RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding | Arxiv2024 | Paper | Vision-Language | link |
RS-4M | Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset | Arxiv2024 | RS-4M | Vision | link |
Major TOM | Major TOM: Expandable Datasets for Earth Observation | Arxiv2024 | Major TOM | Vision | link |
VRSBench | VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding | Arxiv2024 | VRSBench | Vision-Language | link |
MMM-RS | MMM-RS: A Multi-modal, Multi-GSD, Multi-scene Remote Sensing Dataset and Benchmark for Text-to-Image Generation | Arxiv2024 | MMM-RS | Vision-Language | link |
DDFAV | DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark | Arxiv2024 | DDFAV | Vision-Language | link |
M3LEO | A Multi-Modal, Multi-Label Earth Observation Dataset Integrating Interferometric SAR and Multispectral Data | NeurIPS2024 | M3LEO | Vision | link |
Copernicus-Pretrain | Towards a Unified Copernicus Foundation Model for Earth Vision | Arxiv2025 | Copernicus-Pretrain | Vision | link |
## Others

### Related Projects

(TODO: This section recommends further relevant and impactful projects, in the hope of promoting the development of the RS community. 😄 🚀)
Title | Link | Brief Introduction |
---|---|---|
RSFMs (Remote Sensing Foundation Models) Playground | link | An open-source playground to streamline the evaluation and fine-tuning of RSFMs on various datasets. |
PANGAEA | link | A Global and Inclusive Benchmark for Geospatial Foundation Models. |
GeoFM | link | Evaluation of Foundation Models for Earth Observation. |
EOUncertaintyGeneralization | link | On the Generalization of Representation Uncertainty in Earth Observation. |
Title | Publication | Paper | Attribute |
---|---|---|---|
Self-Supervised Remote Sensing Feature Learning: Learning Paradigms, Challenges, and Future Works | TGRS2023 | Paper | Vision & Vision-Language |
The Potential of Visual ChatGPT For Remote Sensing | Arxiv2023 | Paper | Vision-Language |
Large Remote Sensing Models: Progress and Prospects (遥感大模型:进展与前瞻) | Geomatics and Information Science of Wuhan University 2023 | Paper | Vision & Vision-Language |
GeoAI Samples: Models, Quality, and Services (地理人工智能样本:模型、质量与服务) | Geomatics and Information Science of Wuhan University 2023 | Paper | - |
Brain-Inspired Remote Sensing Foundation Models and Open Problems: A Comprehensive Survey | JSTARS2023 | Paper | Vision & Vision-Language |
Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters | Arxiv2023 | Paper | Vision |
An Agenda for Multimodal Foundation Models for Earth Observation | IGARSS2023 | Paper | Vision |
Transfer learning in environmental remote sensing | RSE2024 | Paper | Transfer learning |
A Review and Future Outlook of the Development of Remote Sensing Foundation Models (遥感基础模型发展综述与未来设想) | National Remote Sensing Bulletin 2023 | Paper | - |
On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications | Arxiv2023 | Paper | Vision-Language |
Vision-Language Models in Remote Sensing: Current Progress and Future Trends | IEEE GRSM2024 | Paper | Vision-Language |
On the Foundations of Earth and Climate Foundation Models | Arxiv2024 | Paper | Vision & Vision-Language |
Towards Vision-Language Geo-Foundation Model: A Survey | Arxiv2024 | Paper | Vision-Language |
AI Foundation Models in Remote Sensing: A Survey | Arxiv2024 | Paper | Vision |
Foundation model for generalist remote sensing intelligence: Potentials and prospects | Science Bulletin2024 | Paper | - |
Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques | Arxiv2024 | Paper | Vision-Language |
Foundation Models for Remote Sensing and Earth Observation: A Survey | Arxiv2024 | Paper | Vision & Vision-Language |
Multimodal Remote Sensing Foundation Models: Current Status and Future Prospects (多模态遥感基础大模型:研究现状与未来展望) | Acta Geodaetica et Cartographica Sinica 2024 | Paper | Vision & Vision-Language & Generative & Vision-Location |
When Geoscience Meets Foundation Models: Toward a general geoscience artificial intelligence system | IEEE GRSM2024 | Paper | Vision & Vision-Language |
Towards the next generation of Geospatial Artificial Intelligence | JAG2025 | Paper | - |
Vision Foundation Models in Remote Sensing: A survey | IEEE GRSM2025 | Paper | Vision |
Unleashing the potential of remote sensing foundation models via bridging data and computility islands | The Innovation2025 | Paper | - |
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality | Arxiv2025 | Paper | - |
If you find this repository useful, please consider giving it a star ⭐ and citing:
@inproceedings{guo2024skysense,
title={SkySense: A multi-modal remote sensing foundation model towards universal interpretation for Earth observation imagery},
author={Guo, Xin and Lao, Jiangwei and Dang, Bo and Zhang, Yingying and Yu, Lei and Ru, Lixiang and Zhong, Liheng and Huang, Ziyuan and Wu, Kang and Hu, Dingxiang and others},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={27672--27683},
year={2024}
}
@article{li2025unleashing,
title={Unleashing the potential of remote sensing foundation models via bridging data and computility islands},
author={Li, Yansheng and Tan, Jieyi and Dang, Bo and Ye, Mang and Bartalev, Sergey A and Shinkarenko, Stanislav and Wang, Linlin and Zhang, Yingying and Ru, Lixiang and Guo, Xin and others},
journal={The Innovation},
year={2025},
publisher={Elsevier}
}
@article{wu2025semantic,
title={A semantic-enhanced multi-modal remote sensing foundation model for Earth observation},
author={Wu, Kang and Zhang, Yingying and Ru, Lixiang and Dang, Bo and Lao, Jiangwei and Yu, Lei and Luo, Junwei and Zhu, Zifan and Sun, Yue and Zhang, Jiahao and Zhu, Qi and Wang, Jian and Yang, Ming and Chen, Jingdong and Zhang, Yongjun and Li, Yansheng},
journal={Nature Machine Intelligence},
year={2025},
doi={10.1038/s42256-025-01078-8},
url={https://doi.org/10.1038/s42256-025-01078-8}
}
@inproceedings{zhu2025skysense,
title={SkySense-O: Towards open-world remote sensing interpretation with vision-centric visual-language modeling},
author={Zhu, Qi and Lao, Jiangwei and Ji, Deyi and Luo, Junwei and Wu, Kang and Zhang, Yingying and Ru, Lixiang and Wang, Jian and Chen, Jingdong and Yang, Ming and others},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={14733--14744},
year={2025}
}
@article{luo2024skysensegpt,
title={SkySenseGPT: A fine-grained instruction tuning dataset and model for remote sensing vision-language understanding},
author={Luo, Junwei and Pang, Zhen and Zhang, Yongjun and Wang, Tingzhu and Wang, Linlin and Dang, Bo and Lao, Jiangwei and Wang, Jian and Chen, Jingdong and Tan, Yihua and others},
journal={arXiv preprint arXiv:2406.10100},
year={2024}
}