Authors:Xiangyu Zhao, Bo Liu, Yuehan Zhang, Zelin Song, Wanghan Xu, Feng Liu, Fengxiang Wang, Ben Fei, Fenghua Ling, Wangxu Wei, Wenlong Zhang, Xiao-Ming Wu
Abstract:Large foundation models (FMs) are transforming Earth science by integrating heterogeneous multimodal data, such as multi-platform imagery, gridded reanalysis data, diverse geophysical and geochemical observations, and domain-specific text, to support tasks ranging from basic perception to advanced scientific discovery. This paper provides a unified review of Earth science foundation models (Earth FMs) through two complementary dimensions: depth, which traces the evolution of model capabilities from perception to multimodal reasoning and agentic scientific workflows, and breadth, which summarizes their expanding applications across the atmosphere, hydrosphere, lithosphere, biosphere, anthroposphere, and cryosphere, as well as coupled Earth system processes. Using this framework, we review representative multimodal Earth foundation models and compile more than 200 datasets and benchmarks spanning diverse Earth science tasks and modalities. We further discuss key challenges in multimodal data heterogeneity, scientific reliability and continual updating, scalability and sustainability, and the transition from foundation models to agentic and embodied Earth intelligence, and outline future directions toward more integrated, trustworthy, and actionable AI Earth scientists. Overall, this paper offers a structured roadmap for understanding the development of Earth foundation models from both capability depth and application breadth.
| Subjects: | Instrumentation and Methods for Astrophysics (astro-ph.IM); Earth and Planetary Astrophysics (astro-ph.EP); Machine Learning (cs.LG) |
| Cite as: | arXiv:2605.12542 [astro-ph.IM] |
| (or arXiv:2605.12542v1 [astro-ph.IM] for this version) | |
| https://doi.org/10.48550/arXiv.2605.12542 arXiv-issued DOI via DataCite |
Submission history
From: Xiangyu Zhao [view email]
[v1]
Sat, 9 May 2026 08:34:30 UTC (8,202 KB)
