ITS1.8/CL0.2 | Machine Learning for Climate Science
Convener: Katharina HafnerECSECS | Co-conveners: Duncan Watson-ParrisECSECS, Tom BeuclerECSECS, Blanka BaloghECSECS, Gustau Camps-Valls
Orals | Mon, 04 May, 14:00–17:55 (CEST) | Room C
Posters on site | Attendance Mon, 04 May, 10:45–12:30 (CEST) | Display Mon, 04 May, 08:30–12:30 | Hall X5
Machine learning (ML) is currently transforming data analysis and modelling of the Earth system. While statistical and data-driven models have been used for a long time, recent advances in machine learning now allow non-linear, spatio-temporal relationships to be encoded robustly without sacrificing interpretability. This has the potential to accelerate climate science by providing new physics-based modelling approaches; improving our understanding of the underlying processes; reducing and better quantifying climate signals, variability, and uncertainty; and even making predictions directly from observations across different spatio-temporal scales. The limitations of machine learning methods also need to be considered, such as the general need for rather large training datasets, susceptibility to data leakage, and poor generalisation, so that methods are applied where they are fit for purpose and add value.

This session aims to provide a venue to present the latest progress in the use of ML applied to all aspects of climate science, and we welcome abstracts focussed on, but not limited to:
- Causal discovery and inference: causal impact assessment, interventions, counterfactual analysis
- Learning (causal) processes, equations, and feature representations in observations or across models and observations
- Hybrid models (physically informed ML, emulation, data-model integration)
- Novel detection and attribution approaches, including for extreme events
- Probabilistic modelling and uncertainty quantification
- ML-based super-resolution and bias-correction for climate downscaling
- Explainable AI applications to climate data science and climate modelling
- Distributional robustness, transfer learning and/or out-of-distribution generalisation tasks in climate science

Orals: Mon, 4 May, 14:00–17:55 | Room C

The oral presentations are given in a hybrid format supported by a Zoom meeting featuring on-site and virtual presentations. The button to access the Zoom meeting appears just before the time block starts.
Chairpersons: Blanka Balogh, Gustau Camps-Valls, Duncan Watson-Parris
14:00–14:10
|
EGU26-3829
|
solicited
|
On-site presentation
Niklas Boers

Earth system models (ESMs) are key tools in projecting the response of the Earth's climate and ecosystems to anthropogenic forcing in terms of increasing greenhouse gas concentrations and resulting temperature increases, as well as land use change. However, ESMs continue to suffer from pronounced biases when compared to observations, and exhibit limited horizontal resolution due to computational constraints, making reliable impact assessment challenging. Generative machine learning methods, such as Generative Adversarial Networks or diffusion models, have shown great success in bias-correcting and downscaling Earth system model output [1,2]. However, so far these approaches have been applied only as a postprocessing step. After summarizing advances in this context, I will present recent work addressing conceptual and technical challenges in incorporating (generative) machine learning inside the architectures of process-based ESMs. These include the need for automatic differentiability of all ESM components [3], as well as physical constraints to ensure that the dynamics learned by machine learning components fulfill, for example, physical conservation laws [4].
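
Hard physical constraints of this kind can take several forms; as a minimal illustrative sketch (not the method of the cited papers), one simple option is an additive projection layer that restores a prescribed conserved total, here standing in for a global mass budget. The function name and toy values are hypothetical:

```python
import numpy as np

def enforce_conservation(raw_output, conserved_total, cell_weights):
    """Project a raw network output onto the subspace where the weighted
    sum (e.g. a global mass budget) equals a prescribed total.

    raw_output:      1D array of grid-cell values produced by the network
    conserved_total: scalar the weighted sum must equal
    cell_weights:    1D array of positive cell areas/weights
    """
    current = np.sum(raw_output * cell_weights)
    # distribute the deficit uniformly per unit weight
    correction = (conserved_total - current) / np.sum(cell_weights)
    return raw_output + correction

# toy check: after correction the weighted sum matches exactly
y = np.array([1.0, 2.0, 3.0])
w = np.array([0.2, 0.3, 0.5])
fixed = enforce_conservation(y, conserved_total=2.0, cell_weights=w)
```

Because the correction is a differentiable function of the raw output, such a layer can sit inside an end-to-end trained model.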

[1] P. Hess, M. Drüke, F. Strnad, S. Petri, N. Boers: Physically constrained generative adversarial networks for improving precipitation fields from Earth system models, Nature Machine Intelligence 4, 828-839 (2022)

[2] P. Hess, M. Aich, B. Pan, N. Boers: Fast, scale-adaptive, and uncertainty-aware downscaling of Earth system model fields with generative machine learning, Nature Machine Intelligence 7, 363–373 (2025)

[3] M. Gelbrecht, A. White, S. Bathiany, N. Boers: Differentiable Programming for Earth System Modelling, Geoscientific Model Development 16, 3123–3135 (2023)

[4] A. White, N. Kilbertus, M. Gelbrecht, N. Boers: Stabilized Neural Differential Equations, NeurIPS (2023)

How to cite: Boers, N.: Machine learning for hybrid Earth system modelling, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-3829, https://doi.org/10.5194/egusphere-egu26-3829, 2026.

14:10–14:20
|
EGU26-4158
|
ECS
|
On-site presentation
Simon Michel, Kristian Strommen, and Hannah Christensen

Uncertainty in projections of future regional climate change remains large, driven by structural differences among Earth System Models and the influence of internal climate variability. Existing uncertainty-reduction approaches, including emergent constraints and Bayesian variants, primarily focus on forced climate responses derived from simple aggregate metrics, thereby requiring strong assumptions and exploiting only low-dimensional climate information. Here we propose a data-driven deep-learning framework that directly forecasts spatially and monthly resolved decadal mean climatologies of surface temperature anomalies from the 2030s to the 2090s, using only recent monthly trajectories spanning 1980-2025. The training ensemble contains 265 historical+SSP2-4.5 simulations, distributed across 40 ESMs from 25 different families (i.e., modelling centers) over which the cross validation is performed. The architecture couples pluri-annual to multi-decadal temporal convolutions with a spatial U-Net encoder-decoder and is evaluated on CMIP6 simulations using a leave-one-model-family-out cross-validation (LOMFO-CV) design to ensure generalisation across separately developed ESMs. Predictive uncertainty is quantified via LOMFO-CV errors, yielding conservative and reliable ranges that incorporate irreducible internal variability and systematic model shifts.
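
The leave-one-model-family-out splitting described above can be sketched generically (this is an illustration of the grouping logic, not the authors' code; the function and family names are hypothetical):

```python
import numpy as np

def leave_one_family_out_splits(model_families):
    """Yield (train_idx, test_idx) pairs in which each test fold holds all
    simulations from exactly one model family (modelling centre), so no
    family ever appears in both the training and the test set."""
    families = np.asarray(model_families)
    for fam in np.unique(families):
        test = np.where(families == fam)[0]
        train = np.where(families != fam)[0]
        yield train, test

# toy ensemble: 6 simulations from 3 families
fams = ["CESM", "CESM", "MPI", "MPI", "UKESM", "UKESM"]
splits = list(leave_one_family_out_splits(fams))
```

Because closely related ESMs share code and biases, grouping by family rather than by individual model gives a stricter test of generalisation.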

To further evaluate the predictive capacity beyond the CMIP6 distribution, we evaluated the network on historical+SSP2-4.5 simulations from a recent HadGEM3-GC5 model hierarchy developed within the European Eddy-Rich ESMs (EERIE) project, the European contribution to HighResMIP2 for CMIP7. In particular, the eddy-rich GC5-HH configuration explicitly simulates mesoscale ocean dynamics that are absent in CMIP6-type models, providing a rigorous test of generalisation to richer and more realistic physical representations. Despite these substantial differences, the network successfully reproduces warming trajectories and future climate patterns for all three model configurations (GC5-LL, GC5-MM, GC5-HH), with forecast errors largely contained within empirically calibrated uncertainty bounds from the LOMFO-CV, both globally and locally. These results, notably for GC5-HH and its more realistic physics, strengthen confidence in the applicability of the framework to real-world data.

When applied to observations, the extracted end-of-century global-mean surface temperature and its uncertainty range are consistent with prior estimates from Bayesian frameworks. At local scales, the network reduces uncertainty by 40% (2030s) to 30% (2090s) on average, and by up to 75% in some regions for all future decades. Importantly, these uncertainty estimates account not only for uncertainty in the forced response (as emergent constraint methods do), but also for errors associated with predicting different realisations of internal variability, providing a physically meaningful reduction of local and global climate uncertainty.

 

How to cite: Michel, S., Strommen, K., and Christensen, H.: Short- to long-range climate forecasts with deep learning, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-4158, https://doi.org/10.5194/egusphere-egu26-4158, 2026.

14:20–14:30
|
EGU26-20966
|
ECS
|
On-site presentation
Jeff Clark, Elena Fillola, Nawid Keshtmand, Raul Santos-Rodriguez, and Matt Rigby

Surface methane emissions can be estimated from atmospheric observations using inverse modelling systems, which often rely on Lagrangian Particle Dispersion Models (LPDMs) to simulate how the gas is transported through the atmosphere using meteorological fields. However, LPDM-based techniques struggle to scale to the size of modern satellite datasets, as one LPDM run is needed for each observation, taking on the order of 10 CPU-minutes to complete. Previously, we introduced the Machine Learning model GATES (Graph-Neural-Network Atmospheric Transport Emulation System), which can replicate LPDM outputs 1000x faster than the physics-based model, and demonstrated its application to infer emissions over South America. Training GATES over other world regions and comparing cross-regional performance shows that the learnt transport is domain-specific, consistent with the strong heterogeneity in wind patterns and topography across continents. In this presentation, we discuss transfer learning techniques and characterisation of regional differences in wind patterns, topography, data availability and the shape and magnitude of LPDM outputs, to increase transfer learning performance. This work builds capabilities towards efficient global methane emissions emulation.

How to cite: Clark, J., Fillola, E., Keshtmand, N., Santos-Rodriguez, R., and Rigby, M.: From regional to global emulation: characterising regional differences to increase transfer learning performance, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-20966, https://doi.org/10.5194/egusphere-egu26-20966, 2026.

14:30–14:40
|
EGU26-21859
|
On-site presentation
Sophie Abramian, Pauluis Olivier, and Gentine Pierre

Deep convection exhibits substantial variability even under fixed large-scale forcing, challenging deterministic descriptions of convective organization. Using idealized radiative–convective equilibrium simulations with imposed low-level shear, we quantify this intrinsic variability through a reduced-order stochastic framework. Convective transport is characterized by isentropic mass flux and embedded in a low-dimensional latent space using a variational autoencoder. The temporal evolution of convection in this space is modeled as a Markov chain, yielding a data-driven representation of convective states and their transition probabilities.
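
The transition-probability estimate at the core of such a framework can be sketched with a first-order count-based estimator on discretized latent states (illustrative only; the function and toy sequence are not the authors' implementation):

```python
import numpy as np

def transition_matrix(state_sequence, n_states):
    """Estimate first-order Markov transition probabilities from a
    sequence of discrete (e.g. clustered latent) convective states."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(state_sequence[:-1], state_sequence[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # avoid division by zero for states never visited as a source
    row_sums[row_sums == 0] = 1.0
    return counts / row_sums

# toy sequence of latent convective states
seq = [0, 0, 1, 2, 1, 0, 1, 1, 2]
P = transition_matrix(seq, n_states=3)
```

Departures from Markovianity, as the abstract notes, would show up as a dependence of these transition counts on earlier history, not just the current state.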

This framework demonstrates that internal feedbacks alone generate a broad ensemble of admissible convective trajectories within a single environment, which we interpret as the system’s intrinsic stochasticity. The leading latent dimensions correspond to the convective life cycle and degree of organization, while state transitions identify the constrained pathways through which organized convection emerges and evolves. Comparison of individual storm trajectories in latent space exposes systematic differences in dynamical behavior that are difficult to diagnose in physical space. However, departures from strictly Markovian behavior indicate that the instantaneous state representation does not fully capture slow memory effects associated with convective organization, which likely condition transition probabilities.

These results show that organized convection is best understood as one realization drawn from a constrained distribution of possible trajectories and establish a general machine-learning-enabled framework for quantifying variability and limits of predictability in multiscale atmospheric systems.

How to cite: Abramian, S., Olivier, P., and Pierre, G.: How Organized Convection Evolves in Latent Space, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21859, https://doi.org/10.5194/egusphere-egu26-21859, 2026.

14:40–14:50
|
EGU26-594
|
ECS
|
On-site presentation
Mostafa Kiani Shahvandi, Blaž Gasparini, and Aiko Voigt

Terrestrial Water Storage (TWS) represents all forms of water on land, including the cryosphere (polar ice sheets and mountain glaciers), the biosphere (canopies), soil and subsurface water (groundwater), and other inland water bodies (reservoirs, rivers, lakes, and wetlands). Modelling TWS remains a challenge because of difficulties in representing the water cycle on land. Furthermore, TWS is the source of mass-driven sea level change, an increasingly important contributor to sea level variation across the globe in the 20th and 21st centuries, with significant implications for coastal areas.

Here, we leverage the potential of machine learning and propose a Physics-Informed Neural Networks (PINNs) framework for modeling and predicting TWS and its associated sea level impacts. Because TWS varies in space and time, we build our framework based on convLSTM, an architecture that is suitable for serially-correlated “two-dimensional images” of data. The physical constraint for our PINNs is provided by the physics of continental-ocean mass redistribution, i.e., the sea level component, as described by the gravitationally self-consistent methodology of the sea level equation. The sea level equation connects TWS and sea level change by considering the gravitational, rotational, and deformational feedbacks caused by TWS components, particularly the cryosphere.  
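
A PINN of this kind is typically trained on a composite objective: a data misfit plus a penalty on the residual of the physical constraint. The following is a schematic stand-in, with a trivially simplified land-ocean mass-balance residual in place of the full gravitationally self-consistent sea level equation:

```python
import numpy as np

def pinn_loss(pred, target, physics_residual, lam=1.0):
    """Composite PINN-style loss: data misfit plus a penalty on the
    residual of a physical balance (here a crude stand-in for the sea
    level equation linking land water storage and ocean mass)."""
    data_term = np.mean((pred - target) ** 2)
    physics_term = np.mean(physics_residual ** 2)
    return data_term + lam * physics_term

# toy example: residual = land storage change + ocean mass change,
# which is zero when total mass is exactly conserved
d_tws = np.array([0.5, -0.2])
d_ocean = np.array([-0.5, 0.25])
residual = d_tws + d_ocean
loss = pinn_loss(np.array([1.0, 2.0]), np.array([1.1, 1.9]), residual)
```

The weight `lam` controls how strongly the physics term competes with the data fit during training.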

We train and test our PINNs based on global TWS data from 1900 up to the end of 2018 (1900-2001 for training; 2002-2018 for testing). The data have a temporal resolution of 1 year and a spatial resolution of , and were derived from an assimilation of models and satellite gravimetry observations (in the time period 2002-2018). We perform various tests and discuss the advantages and shortcomings of our PINNs framework for modeling and predicting TWS. First, we show that TWS and sea level rise can be predicted reasonably well up to 10 years ahead (relative error of less than 30% on a global scale). This might prove useful for studies of sea level rise in coastal areas. Second, we compare our predictions with those of the Ice Sheet Model Intercomparison Project (ISMIP) in CMIP6 climate models, and satellite observations of Gravity Recovery and Climate Experiment (GRACE; in the range 2015-2024). We demonstrate that our predictions are closer to GRACE observations and, therefore, more accurate (up to 40% for the lead horizon of 10 years) compared to ISMIP projections under high and low emission pathways. Finally, we discuss how the predictions could be further improved by using probabilistic deep learning approaches, particularly  so-called deep ensembles. Our results show that once trained, PINNs can provide predictions orders of magnitude faster than climate models and with better accuracy.

How to cite: Kiani Shahvandi, M., Gasparini, B., and Voigt, A.: Physics-informed neural networks predict changes in terrestrial water storage and sea level, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-594, https://doi.org/10.5194/egusphere-egu26-594, 2026.

14:50–15:00
|
EGU26-4512
|
ECS
|
On-site presentation
Maren Höver, Milan Klöwer, Christian Schroeder de Witt, and Hannah M. Christensen

Machine learning-based weather prediction is revolutionizing weather forecasting by learning from present-day climate. However, generalization to other climates remains a major challenge. With melting sea ice, land-use change and increasing ocean temperatures, boundary conditions are changing. Therefore, generalization in time will likely only be possible if generalization in space is also achieved. The physics of the atmosphere is invariant in space, and a model should exhibit the same invariance to accurately represent the real world.

Here, we present three test cases to evaluate whether machine learning-based weather and climate models generalize spatially and apply them to multiple AI weather models. The tests consist of reversing the entirety of the input data and boundary conditions in latitude (Test 1), reversing them in longitude (Test 2), as well as rotating them by 180˚ in longitude (Test 3), while keeping all aspects of the simulation physically consistent. For a deterministic model that generalizes in space, each of these test cases yields the same predictions as the baseline case, only subject to a rounding error. With these test cases, we investigate whether data-driven models hardcode representations of spatial relationships in the training data into their latent space. We show that currently, both fully data-driven and hybrid general circulation models do not pass these tests, instead performing poorly with unphysical results. This implies that they have likely not learned underlying atmospheric physics principles, but instead local spatial relationships statistically dependent on geographical location. This calls into question the ability of such models to simulate a changing regional climate. As such, we propose that machine learning-based climate models be evaluated using our spatial tests during model development to reduce overfitting on present-day regional climate.
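
Test 1 (latitude reversal) can be sketched for a deterministic model as a flip-predict-flip-back comparison; the helper below illustrates the test logic only, not the paper's evaluation harness. A purely local, orientation-free toy model passes exactly:

```python
import numpy as np

def latitude_reversal_gap(model, field):
    """Reverse the input field in latitude (axis 0), run the model,
    reverse the prediction back, and measure the disagreement with the
    baseline prediction. A spatially generalizing deterministic model
    should give ~0, up to rounding error."""
    baseline = model(field)
    flipped = np.flip(model(np.flip(field, axis=0)), axis=0)
    return np.max(np.abs(baseline - flipped))

# an elementwise "model" commutes with flips, so the gap is zero
local_model = lambda x: x ** 2 + 1.0
gap = latitude_reversal_gap(local_model, np.random.rand(8, 16))
```

A model that has hardcoded geographical relationships into its weights would instead show a large gap here.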

How to cite: Höver, M., Klöwer, M., Schroeder de Witt, C., and Christensen, H. M.: Spatial Generalization Tests for Machine Learning-based Weather Models as a Requirement for Climate Predictions, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-4512, https://doi.org/10.5194/egusphere-egu26-4512, 2026.

15:00–15:10
|
EGU26-9940
|
ECS
|
On-site presentation
Kai-Hendrik Cohrs, Maria Gonzalez-Calabuig, Vishal Nedungadi, Zuzanna Osika, Ruben Cartuyvels, Steffen Knoblauch, Joppe Massant, Shruti Nath, Patrick Ebel, and Vasileios Sitokonstantinou

Following recent advances of foundation models in natural language processing and computer vision, there is growing interest in leveraging geospatial foundation models (GFMs) for Earth system monitoring and climate-relevant applications. In particular, GFMs promise to support large-scale observation of climate-driven extreme events such as wildfires, floods and landslides. However, despite strong benchmark results, recent studies indicate that GFMs for land-cover modelling and hazard mapping can behave unreliably under real-world conditions. Pretraining datasets often underrepresent rare or extreme environmental regimes, leading to degraded model performance precisely in situations where robust predictions are most critical for climate risk assessment and disaster response. Furthermore, GFMs are often surpassed by simple supervised baselines, highlighting the need for systematic reliability analysis, including out-of-distribution (OOD) detection and uncertainty quantification.

We present SHRUG-FM (systematic handling of real-world uncertainty in geospatial foundation models), a reliability-aware prediction framework that integrates three complementary signals: (1) OOD detection in the input space, (2) OOD detection in the embedding space and (3) task-specific predictive uncertainty obtained from decoder ensembles. We evaluate SHRUG-FM on climate-relevant extreme-event applications, including burn-scar, flood and landslide segmentation. Our results show that elevated OOD scores consistently co-locate with degraded model performance, while uncertainty-based indicators successfully capture many low-confidence and erroneous predictions. By linking these reliability signals to hydro-environmental descriptors from HydroATLAS, we further demonstrate that model failures cluster in distinct geographic and hydroclimatic regimes, revealing interpretable gaps in the pretraining distribution and guiding future dataset design.
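
Signal (2), embedding-space OOD detection, is often implemented with distance-based scores; a common k-nearest-neighbour variant is sketched below (illustrative only; the abstract does not specify which score SHRUG-FM uses):

```python
import numpy as np

def knn_ood_score(train_embeddings, query_embedding, k=5):
    """Embedding-space OOD signal: mean distance from a query embedding
    to its k nearest training embeddings. Larger scores suggest the
    input lies outside the pretraining distribution."""
    d = np.linalg.norm(train_embeddings - query_embedding, axis=1)
    return np.mean(np.sort(d)[:k])

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 8))                 # in-distribution cloud
in_dist = knn_ood_score(train, rng.normal(size=8))
far_out = knn_ood_score(train, np.full(8, 10.0))  # far from the cloud
```

Thresholding such a score gives the selective-prediction and rejection behaviour described below.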

SHRUG-FM delivers practical, operationally relevant diagnostics for Earth system monitoring and prediction. It enables selective prediction, rejection strategies, and reliability-aware quality control. These capabilities are essential for integrating GFMs into real-world workflows for climate impact assessment, hazard monitoring and early warning systems. Future work will extend the framework to additional foundation models and climate-driven hazards.

How to cite: Cohrs, K.-H., Gonzalez-Calabuig, M., Nedungadi, V., Osika, Z., Cartuyvels, R., Knoblauch, S., Massant, J., Nath, S., Ebel, P., and Sitokonstantinou, V.: SHRUG-FM: Reliability-Aware Foundation Models for Earth Observation, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9940, https://doi.org/10.5194/egusphere-egu26-9940, 2026.

15:10–15:20
|
EGU26-8710
|
ECS
|
Virtual presentation
Ashish Bhattarai and Youtong Zheng

Accurate initiation of deep convection remains a persistent challenge in weather and climate models. Most general circulation models (GCMs) operate at coarse resolution and therefore cannot explicitly resolve convective events; instead, they rely on convective parameterizations in which triggering is diagnosed from environmental thresholds, commonly based on convective available potential energy (CAPE). Convection-permitting models (CPMs) alleviate some of these structural limitations by resolving the grid-scale convective spectrum while leaving sub-grid-scale events unresolved. Meanwhile, machine learning (ML)-based convection trigger functions have emerged, but they still carry uncertainty whose causes are rarely examined. Here, we diagnose the atmospheric states associated with “blind spots” in ML predictors of deep convection initiation, leveraging the Department of Energy Atmospheric Radiation Measurement constrained variational analysis (VARANAL) product and the CPM-based CONUS404 hydroclimate dataset over the Southern Great Plains (SGP). We train a conventional artificial neural network (ANN) and a controlled abstention network (CAN), evaluate their skill in identifying deep convection, and use CAN to quantitatively isolate low-confidence samples while understanding the associated physical conditions in which the models are least reliable. ANN and CAN show comparable baseline performance, and for both models, skill increases when low-confidence samples are excluded, indicating that abstention identifies systematically difficult conditions rather than random noise. Across both VARANAL and CONUS404 datasets, low-confidence samples preferentially occur under weak-to-moderately negative mid-level vertical velocity (−10 to −5 hPa hr⁻¹) and dynamic generation rate of CAPE (dCAPE; 0–200 J kg⁻¹ hr⁻¹). Additionally, these cases are dominated by short convective episodes that persist for only a few hours, occurring predominantly during the afternoon.
These abstention samples also exhibit locally forced, non-equilibrium environments characterized by larger convective adjustment time (τ), consistent with reduced predictability relative to regimes controlled by broader synoptic forcing with smaller τ. Collectively, our results quantitatively identify the regimes and associated physical mechanisms in which ML-based convection predictors are least robust, providing actionable guidance for operational forecasters to treat predictions with greater caution when these low-confidence conditions are present.
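
A controlled abstention network learns when to abstain during training; as a simpler stand-in that conveys the evaluation idea, one can abstain post hoc on low-confidence samples and measure skill on the retained subset (all values below are toy numbers, not results from the abstract):

```python
import numpy as np

def skill_with_abstention(probs, labels, threshold=0.7):
    """Abstain on low-confidence samples (max class probability below
    the threshold) and report accuracy on the retained subset together
    with the abstention rate."""
    conf = probs.max(axis=1)
    keep = conf >= threshold
    preds = probs.argmax(axis=1)
    acc = np.mean(preds[keep] == labels[keep]) if keep.any() else np.nan
    return acc, 1.0 - keep.mean()

probs = np.array([[0.95, 0.05],   # confident, correct
                  [0.55, 0.45],   # low confidence -> abstain
                  [0.20, 0.80],   # confident, correct
                  [0.60, 0.40]])  # low confidence -> abstain
labels = np.array([0, 1, 1, 1])
acc, abstain_rate = skill_with_abstention(probs, labels)
```

Skill rising as the abstained fraction grows is the signature, described above, that the low-confidence set captures systematically difficult conditions.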

How to cite: Bhattarai, A. and Zheng, Y.: Uncertainty-Aware Machine Learning for Deep Convection Initiation: Insights from ARM Observations and Kilometer-Scale Hydroclimate Reanalysis, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8710, https://doi.org/10.5194/egusphere-egu26-8710, 2026.

15:20–15:30
|
EGU26-14352
|
On-site presentation
Jean-Marc Delouis, Tina Odaka, and Sébastien Tétaud

Many climate variables are naturally defined on the sphere and exhibit strong anisotropy and directionality (e.g., fronts, jets, boundary currents). Yet most deep-learning forecasting models still rely on planar projections and Euclidean convolutions, which introduce geometric distortions and artificial discontinuities. Graph-based spherical models alleviate some of these issues, but typically remain isotropic and do not explicitly represent local orientation, a key ingredient to model directional transport-like patterns.

Here we introduce and evaluate a gauge-equivariant spherical U-Net implemented directly on the HEALPix grid, designed to encode local orientation consistently across the sphere. Our approach leverages gauge-equivariant convolutions that transform predictably under changes of local reference frame, allowing the network to learn directional filters while preserving spherical geometry. This provides a principled alternative to both planar U-Nets (with longitude-periodic padding) and graph U-Nets, and addresses a core limitation of most spherical models: the lack of explicit orientation handling. We benchmark this model against two strong baselines: a planar U-Net with longitude-periodic padding and a spherical graph U-Net defined on the same HEALPix discretization.

We apply this architecture to multi-horizon forecasting of global sea-surface temperature (SST) anomalies at NSIDE=32, using a controlled experimental design with matched training protocols and comparable parameter budgets, with emphasis on low-capacity regimes relevant to data-limited climate settings (≈30–40 years of monthly observations). We report quantitative metrics across horizons and analyze qualitative error modes, showing how gauge-equivariant spherical convolutions mitigate projection artefacts while enabling orientation-aware feature extraction on the sphere. Our results highlight when and why encoding orientation through gauge equivariance provides added value beyond “spherical-but-isotropic” baselines, and offer practical guidance for deploying spherical equivariant models in climate forecasting pipelines.

How to cite: Delouis, J.-M., Odaka, T., and Tétaud, S.: Gauge-Equivariant Spherical U-Nets on HEALPix for Global SST Forecasting: Encoding local orientation on the sphere, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14352, https://doi.org/10.5194/egusphere-egu26-14352, 2026.

15:30–15:40
|
EGU26-10825
|
ECS
|
Virtual presentation
Shahine Bouabid, Christopher Womack, Glenn Flierl, Noelle Selin, Raffaele Ferrari, Andre Souza, Paolo Giani, and Björn Lutjens

Policy targets evolve faster than the Coupled Model Intercomparison Project (CMIP) cycles, complicating adaptation and mitigation planning that must often contend with outdated projections. Climate model emulators address this gap by offering inexpensive surrogates that can rapidly explore alternative futures while staying close to Earth System Model (ESM) behavior. Here we present recent advances in probabilistic climate emulation aimed at providing inputs for impact models. We show that a generative emulator can reproduce key climate variables at a small fraction of the computational cost of ESMs, while retaining skill in reproducing probability distributions, cross-variable dependencies, time of emergence, and tail behavior. The emulator is informative even for scenarios with aggressive emissions reductions to meet Paris targets. We further show how generative emulators can extend beyond traditional ESMs by directly integrating bias-correction strategies, thereby avoiding separate post-processing steps commonly used in impact assessment pipelines. Finally, we present a framework to design emission scenarios optimized for emulator training, which yields emulators with comparable or improved skill while reducing the volume of ESM simulations needed for training. We suggest that modeling centers allocate dedicated resources to such "emulator-training" experiments, enabling the rapid generation of large, impact-relevant ensembles across Shared Socioeconomic Pathways while freeing computational capacity for other scientific applications of full-scale Earth system models.

How to cite: Bouabid, S., Womack, C., Flierl, G., Selin, N., Ferrari, R., Souza, A., Giani, P., and Lutjens, B.: Advances in generative climate emulation to support impact-assessment, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-10825, https://doi.org/10.5194/egusphere-egu26-10825, 2026.

Coffee break
Chairpersons: Duncan Watson-Parris, Tom Beucler, Katharina Hafner
16:15–16:25
|
EGU26-19912
|
On-site presentation
Elena Tomasi, Gabriele Franch, Giacomo Tomezzoli, Sandro Calmanti, and Marco Cristoforetti

Global Climate Models (GCMs) provide critical insights into future climate variability, yet their coarse spatial resolution limits their utility for regional and local-scale impact assessments. AI-driven downscaling techniques have emerged in the last few years as a cost-effective and viable alternative to traditional methods to enhance the spatial resolution of climate projections. Nevertheless, establishing their reliability in unseen climate states remains a priority. This study applies and evaluates a deep generative Latent Diffusion Model, leveraging a residual approach (LDM_res, Tomasi et al., 2025) to downscale GCM outputs (~1°) to high-resolution (~4 km) 6-hourly precipitation and 2-m minimum and maximum temperature fields.
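
The residual approach mentioned above (LDM_res) can be caricatured as follows: upsample the coarse field and let the network model only the fine-scale correction. This sketch uses nearest-neighbour replication and a dummy "network"; it illustrates the residual idea only, not the latent diffusion model itself:

```python
import numpy as np

def residual_downscale(lowres, predict_residual, factor=4):
    """Residual downscaling sketch: upsample the coarse field (here by
    simple nearest-neighbour replication) and add a learned correction,
    so the network only has to model the fine-scale residual."""
    upsampled = np.kron(lowres, np.ones((factor, factor)))
    return upsampled + predict_residual(upsampled)

# stand-in "network": a fixed small perturbation field
coarse = np.array([[280.0, 285.0],
                   [290.0, 295.0]])
fine = residual_downscale(coarse, lambda x: 0.1 * np.ones_like(x))
```

Predicting the residual rather than the full field keeps the large-scale signal from the driving model intact by construction.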

The LDM is developed as an emulator of the COSMO-CLM dynamical model, trained on VHR-REA_IT data (Raffa et al., 2021 - a dynamical downscaling of ERA5). By using aggregated ERA5 data as low-resolution predictors (along with high-resolution static data), the LDM_res model is required to learn to mimic the computationally expensive physics of dynamical downscaling. The model, trained over the past 40 years, is subsequently applied to generate high-resolution climate projections based on the input from four selected CMIP6 GCMs across four different emission scenarios. This modeling chain establishes a hybrid ML-Physics-based system to provide impact assessors with cost-effective, high-resolution climate information.

A central challenge addressed in this work is the evaluation of the model's out-of-distribution generalization—specifically its ability to perform in unseen future climate states and under predictor configurations characteristic of CMIP6 projections. We evaluate the emulator's reliability by comparing its outputs against VHR-PRO_IT, a "twin" dataset of VHR-REA_IT produced using COSMO_CLM to dynamically downscale projections (Raffa et al., 2023), providing a rigorous test of the ML system’s reliability in out-of-domain scenarios.

Furthermore, we compare the LDM_res against traditional statistical (e.g., quantile mapping) and dynamical approaches. Comparative results over the Italian peninsula indicate that while the LDM preserves large-scale seasonal signals from CMIP6 models, it significantly enhances spatial realism and local variability in topographically complex areas. Unlike purely statistical methods, the hybrid ML approach demonstrates superior ability to represent fine-scale heterogeneity in mountainous and coastal regions while maintaining consistency with the original signal.

How to cite: Tomasi, E., Franch, G., Tomezzoli, G., Calmanti, S., and Cristoforetti, M.: Deep learning for high-resolution climate projections: a Latent Diffusion Model emulating dynamical downscaling over Italy, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19912, https://doi.org/10.5194/egusphere-egu26-19912, 2026.

16:25–16:35
|
EGU26-12050
|
ECS
|
On-site presentation
Mikhail Ivanov, Ramón Fuentes Franco, and Torben Koenigk

Providing high-resolution climate information by downscaling future climate projections from the Coupled Model Intercomparison Project (CMIP6) remains a central challenge for the regional climate modeling community. CMIP6 includes a wide range of global climate model (GCM) simulations across multiple Shared Socioeconomic Pathways (SSPs), resulting in substantial computational demand for dynamical downscaling if each member is to be fully regionalized. To address this challenge, we propose a computationally efficient statistical downscaling framework based on a U-Net architecture trained over Europe. The model learns high-resolution spatial mappings directly from reanalysis data, offering a low-cost complement to regional climate models (RCMs) for large-ensemble downscaling.

We demonstrate that the climate downscaling U-Net achieves performance comparable to the HCLIM RCM when applied to unbiased EC-Earth3-Veg simulations for both the historical period and the low-emission SSP1-2.6 scenario up to 2100. The model captures spatial temperature patterns, seasonal variability, and the amplitude of warming remarkably well in these cases, providing confidence in its ability to translate GCM-scale information to finer regional scales.

When the U-Net is trained exclusively on reanalysis data, its extrapolation behavior under stronger forcing scenarios becomes an important aspect to evaluate. In the high-emission SSP3-7.0 scenario, after the regional climate warms by approximately +2.0 °C beyond the conditions represented in the training data, typically during 2060–2080, the model begins to diverge modestly from the warming magnitude simulated by both the driving GCM and the HCLIM downscaling. This divergence is most pronounced during summer months, while winter temperature trends remain in close agreement. These deviations are not presented as shortcomings of the method, but rather as a clear illustration of the limits of extrapolation when statistical models are trained solely on historical climate states. Highlighting these limits is essential for understanding the robustness of statistical downscaling within and beyond the training domain, particularly for applications involving strong climate-change signals.
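The extrapolation failure mode described above can be illustrated with a deliberately simple stand-in: a polynomial emulator fit to a synthetic nonlinear response on a bounded "historical" range, then evaluated beyond it. All functions and ranges here are illustrative, not the authors' U-Net or data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "climate": target is a mildly nonlinear function of a temperature-like predictor.
def true_response(t):
    return 1.5 * t + 0.4 * np.sin(2.0 * t)

# Train only on a "historical" range of the predictor (analogous to reanalysis-only training).
t_train = rng.uniform(-1.0, 1.0, 500)
y_train = true_response(t_train)

# Fit a cubic polynomial emulator on the historical range.
coeffs = np.polyfit(t_train, y_train, deg=3)

def emulate(t):
    return np.polyval(coeffs, t)

# Evaluate inside vs. well beyond the training range (the "+2 degrees" analogue).
t_in = np.linspace(-1.0, 1.0, 200)
t_out = np.linspace(1.5, 3.0, 200)
err_in = np.mean(np.abs(emulate(t_in) - true_response(t_in)))
err_out = np.mean(np.abs(emulate(t_out) - true_response(t_out)))
print(err_in, err_out)  # out-of-range error is much larger
```

The same qualitative behavior (accurate interpolation, divergence under unseen forcing) is what the abstract reports for the reanalysis-only U-Net.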

Finally, we investigate how the model’s capabilities evolve when future regional climate information is included in the training set. Incorporating a subset of future data markedly improves the extrapolation performance, enabling the U-Net to recover long-term warming trends and seasonal patterns consistent with HCLIM even under strong forcing. This demonstrates that the U-Net architecture can effectively learn and generalize high-resolution climate transformations when provided with an extended training domain. Overall, our findings underscore the potential of deep-learning-based downscaling for scalable, ensemble-wide applications while also clarifying the conditions under which historical-only statistical training remains reliable.

How to cite: Ivanov, M., Fuentes Franco, R., and Koenigk, T.: Can ML-based statistical downscaling models reliably extrapolate into the future?, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12050, https://doi.org/10.5194/egusphere-egu26-12050, 2026.

16:35–16:45
|
EGU26-11688
|
On-site presentation
Philipp Hess, Sebastian Bathiany, and Niklas Boers

Numerical Earth system model (ESM) simulations require bias correction and downscaling to assess regional climate impacts due to their coarse resolution (50–100 km) and systematic errors. Recent generative machine learning-based downscaling methods show promise in capturing small-scale spatial patterns, as well as multivariate and temporal dependencies [1,2,3]. However, making these approaches efficient and scalable to high resolutions globally remains challenging.

Here, we present a generative machine learning method for multivariate and temporally consistent downscaling of global climate fields at daily and 0.25° spatial resolution. An autoregressive consistency model [5] is trained using Patch Diffusion [4] as an efficient probabilistic emulator of the ERA5 reanalysis and applied to downscale 8 key climate impact variables, including precipitation, temperature, wind speed, and radiation.
We downscale five 100-year simulations per ESM, including pre-industrial control, historical, and 2K warming scenarios with and without tipping of the Atlantic meridional overturning circulation and the Amazon rainforest, from three CMIP6-class ESMs (MPI-ESM1-2-HR, HadGEM3-GC31-MM, and CESM1-CAM5).

The approach accurately reproduces small-scale variability and extremes, outperforms statistical baselines, substantially reduces biases, and preserves the large-scale response of the tipping dynamics in the ESMs.

[1] Mardani, M., Brenowitz, N., Cohen, Y., Pathak, J., Chen, C. Y., Liu, C. C., ... & Pritchard, M., Residual corrective diffusion modeling for km-scale atmospheric downscaling, Communications Earth & Environment, 6(1), 124, 2025.
[2] Schmidt, J., Schmidt, L., Strnad, F. M., Ludwig, N., & Hennig, P., A generative framework for probabilistic, spatiotemporally coherent downscaling of climate simulation, npj Climate and Atmospheric Science, 8(1), 270, 2025.
[3] Hess, P., Aich, M., Pan, B., & Boers, N., Fast, scale-adaptive and uncertainty-aware downscaling of Earth system model fields with generative machine learning, Nature Machine Intelligence, 1-11, 2025.
[4] Wang, Z., Jiang, Y., Zheng, H., Wang, P., He, P., Wang, Z., ... & Zhou, M., Patch diffusion: Faster and more data-efficient training of diffusion models, Advances in Neural Information Processing Systems, 36, 72137-72154, 2023.
[5] Song, Y., & Dhariwal, P., Improved techniques for training consistency models, The Twelfth International Conference on Learning Representations, 2024.

How to cite: Hess, P., Bathiany, S., and Boers, N.: Generative Machine Learning for Dynamically Consistent Multivariate Downscaling of Tipping Point Simulations from Global Earth System Models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11688, https://doi.org/10.5194/egusphere-egu26-11688, 2026.

16:45–16:55
|
EGU26-12008
|
ECS
|
On-site presentation
Kevin Monsalvez-Pozo, Francisco Granell-Haro, Marcos Martinez-Roig, Víctor Galván Fraile, Nuria P. Plaza-Martín, Martin Otto Paul Ramacher, Johannes Bieser, Johannes Flemming, Miha Razinger, Paula Harder, César Azorin-Molina, and Gustau Camps-Valls

Air pollution, particularly fine particulate matter (PM2.5), poses a significant risk to public health, necessitating accurate high-resolution monitoring. While global Chemical Transport Models (CTMs) like the Copernicus Atmosphere Monitoring Service (CAMS) provide continuous worldwide coverage, their coarse spatial resolution (~40 km) limits their utility for assessing local exposure relative to regional models (~10 km) that are restricted to specific domains, such as Europe. To bridge this gap, we present a novel deep learning approach for global downscaling of pollutant concentrations based on the Adaptive Fourier Neural Operator (AFNO), benchmarking its performance against a standard U-Net baseline.

We adapted the Modulated AFNO architecture for spatial super-resolution, using low-resolution CAMS Global PM2.5 and dynamic meteorological fields (wind, temperature, dew point, boundary layer height). A key innovation is integrating these inputs with high-resolution static data: orography and population density. We demonstrate that directly inputting static features into the network backbone outperforms separate spatial conditioning, effectively leveraging the Fast Fourier Transform to capture long-range dependencies while respecting local physical constraints.

The model was developed using daily forecasts from 2020 to mid-2025. Training used a sequential split: the model was trained on 2021–2024, with 2020 (marked by COVID-19 anomalies) and 2025 reserved as held-out test sets. The model effectively reconstructed fine-scale details and corrected global model biases. Verification against European Environment Agency observations (2020) confirmed performance comparable to high-resolution CAMS Europe regional forecasts. Crucially, the AFNO model consistently outperformed the U-Net baseline and traditional linear interpolation in spatial correlations and error rates. Finally, transferability tests in North America (AirNow data) confirmed that the model generalizes effectively to unseen regions, maintaining lower errors than both the original global forecast and the baseline.
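The sequential split described above can be sketched as follows. The year boundaries come from the abstract; the exact "mid-2025" cut-off (here June 30) is an assumption:

```python
from datetime import date, timedelta

# Daily dates spanning the data record: 2020-01-01 to "mid-2025" (June 30 assumed).
start, end = date(2020, 1, 1), date(2025, 6, 30)
days = [start + timedelta(d) for d in range((end - start).days + 1)]

# Sequential split: train on 2021-2024; hold out 2020 (COVID-19 anomalies) and 2025.
train = [d for d in days if 2021 <= d.year <= 2024]
test = [d for d in days if d.year in (2020, 2025)]

print(len(train), len(test))  # every day lands in exactly one of the two sets
```

Holding out a full anomalous year (2020) plus the most recent data (2025) tests both distribution shift and forward-in-time generalization.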

How to cite: Monsalvez-Pozo, K., Granell-Haro, F., Martinez-Roig, M., Galván Fraile, V., Plaza-Martín, N. P., Paul Ramacher, M. O., Bieser, J., Flemming, J., Razinger, M., Harder, P., Azorin-Molina, C., and Camps-Valls, G.: AFNO-based downscaling of global air pollution fields, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12008, https://doi.org/10.5194/egusphere-egu26-12008, 2026.

16:55–17:05
|
EGU26-7473
|
ECS
|
On-site presentation
Maximilian Witte, Johannes Meuer, Étienne Plésiat, and Christopher Kadow

We introduce Field-Space Attention, a novel, scalable, interpretable, and flexible attention module designed for Earth system machine learning models. The key concept involves computing attention directly within physical space on the HEALPix sphere. This approach ensures that all intermediate states remain as globally defined geophysical fields rather than as abstract latent tokens. This field-centric design maintains the physical meaning of internal representations, renders layer-wise updates interpretable, and offers a simple interface for integrating scientific constraints and prior knowledge throughout the network (see Figure). Field-Space Attention is based on a fixed, non-learned, multiscale, spherical decomposition. It learns structure-preserving deformations that coherently couple information across coarse and fine scales. This enables global context without sacrificing local detail.

We demonstrate the module's effectiveness in representative Earth system learning experiments on spherical grids. We focus on global near-surface temperature super-resolution on a HEALPix grid using ERA5 reanalysis data and benchmark it against widely used Vision Transformer and U-Net–style baselines. Our Field-Space Transformer model trains more stably, converges faster, achieves strong accuracy with substantially fewer parameters, and yields physically interpretable intermediate fields.

By keeping computation in field space and explicitly separating scales, Field-Space Attention is particularly well-suited for high-resolution Earth system modeling. It supports scale-aware inductive biases, principled cross-scale consistency, and the efficient coupling of large-scale dynamics with fine-scale variability. These properties position Field-Space Attention as a compact building block for next-generation, high-resolution Earth system prediction and generative modeling. This includes downscaling, spatiotemporal forecasting, infilling, and data assimilation under stronger physical constraints.

How to cite: Witte, M., Meuer, J., Plésiat, É., and Kadow, C.: Field-Space Attention for Structure-Preserving Earth System Transformers, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7473, https://doi.org/10.5194/egusphere-egu26-7473, 2026.

17:05–17:15
|
EGU26-7552
|
ECS
|
On-site presentation
Vytautas Jancauskas, Samuel Garske, and Daniela Espinoza Molina

The impact of droughts on vegetation is commonly assessed through correlational analysis of satellite-derived variables, such as NDVI, precipitation anomalies, soil moisture, and more (Hao & Singh 2015, Park et al. 2016, Joiner et al. 2018). However, these correlation-based approaches cannot disentangle the true causal drivers from their confounded associations (Zhang et al. 2022). This limits our ability to understand and attribute the scale of vegetation stress to specific drought mechanisms (e.g. soil moisture deficits versus irrigation resilience), and our ability to design effective interventions that address the primary drivers.

As such, we propose a novel causal inference framework to estimate the impact of drought on vegetation health using satellite time-series data, and demonstrate its application to the Iberian Peninsula. We first define a graphical causal model based on established eco-hydrological pathways, and then integrate multi-sensor remote sensing data (MODIS NDVI, SPEI, etc.) and climate reanalysis (ERA5). By extending traditional causal inference methods to georeferenced time-series raster data and controlling for well-established confounding variables (temperature, solar radiation, precipitation, soil moisture, land cover, and irrigation), we isolate the effect of drought severity on vegetation. We also implement novel visualisation methods to display these causal influence estimates.
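The confounding problem this framework addresses can be illustrated with a minimal synthetic backdoor-adjustment example. The variable names, linear model, and coefficients below are illustrative stand-ins, not the authors' graphical model:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000

# Hypothetical confounder (e.g. a temperature anomaly) driving both drought and vegetation.
z = rng.normal(size=n)
drought = -0.6 * z + rng.normal(scale=0.5, size=n)        # drought severity index
ndvi = 0.8 * drought + 0.5 * z + rng.normal(scale=0.3, size=n)  # true effect = 0.8

# Naive (confounded) estimate: regress NDVI on drought alone.
naive = np.polyfit(drought, ndvi, 1)[0]

# Backdoor adjustment: include the confounder as a covariate in the regression.
X = np.column_stack([drought, z, np.ones(n)])
adjusted = np.linalg.lstsq(X, ndvi, rcond=None)[0][0]

print(round(naive, 2), round(adjusted, 2))  # naive is biased; adjusted recovers ~0.8
```

This is the simplest linear instance of what the abstract does at scale, with a full set of eco-hydrological confounders and raster time series.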

While causal inference allows us to move beyond correlation and understand the impact on vegetation from each of these key variables, counterfactual intervention is also essential to understand how varying conditions would otherwise change the outcome (Schölkopf et al. 2021), i.e. the severity of the drought impact. Therefore, by leveraging these interventions, our results go from descriptive analytics to actionable insights on drought severity under the changing climate. This enables more effective drought impact assessment for scientists, policymakers, and industry experts.

References:
1. Hao, Z. and Singh, V.P., 2015. Drought characterization from a multivariate perspective: A review. Journal of Hydrology, 527, pp.668-678.
2. Park, S., Im, J., Jang, E. and Rhee, J., 2016. Drought assessment and monitoring through blending of multi-sensor indices using machine learning approaches for different climate regions. Agricultural and forest meteorology, 216, pp.157-169.
3. Joiner, J., Yoshida, Y., Anderson, M., Holmes, T., Hain, C., Reichle, R., Koster, R., Middleton, E. and Zeng, F.W., 2018. Global relationships among traditional reflectance vegetation indices (NDVI and NDII), evapotranspiration (ET), and soil moisture variability on weekly timescales. Remote Sensing of Environment, 219, pp.339-352.
4. Zhang, X., Hao, Z., Singh, V.P., Zhang, Y., Feng, S., Xu, Y. and Hao, F., 2022. Drought propagation under global warming: Characteristics, approaches, processes, and controlling factors. Science of the Total Environment, 838, p.156021.
5. Schölkopf, B., Locatello, F., Bauer, S., Ke, N.R., Kalchbrenner, N., Goyal, A. and Bengio, Y., 2021. Toward causal representation learning. Proceedings of the IEEE, 109(5), pp.612-634.

How to cite: Jancauskas, V., Garske, S., and Espinoza Molina, D.: A Causal Inference Framework for Analysing Drought Drivers, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7552, https://doi.org/10.5194/egusphere-egu26-7552, 2026.

17:15–17:25
|
EGU26-7971
|
ECS
|
On-site presentation
Tim Reichelt and Philip Stier

Understanding the driving forces behind mesoscale cloud organization is fundamental to reducing uncertainties in cloud climate feedbacks. Traditional climate models cannot explicitly resolve mesoscale cloud structures due to their limited resolution, leading to large uncertainties in cloud climate feedback estimates. Storm-resolving models that simulate the atmosphere at kilometre resolution have the potential to reduce these uncertainties. Yet, these models are still biased in their organizational structure when compared to satellite observations. Approaches constraining cloud feedbacks directly from the satellite records are promising but often rely on manually chosen cloud controlling factors (CCFs) that do not necessarily capture all the information necessary to explain mesoscale organizational structures and generally only utilise linear models to predict cloud radiative properties from CCFs.

We present CloudDiff, a probabilistic machine learning model that generates mesoscale cloud structures at kilometre resolution conditioned on environmental conditions in the atmosphere, namely the temperature and humidity profiles as well as vertical and horizontal winds. The model is trained on MODIS Level 1 satellite data and environmental conditions from ECMWF ERA5 reanalysis data. CloudDiff is able to reconstruct realistic MODIS observations from matching ERA5 environmental conditions and achieves a lower reconstruction error compared to generating MODIS observations solely from pre-defined CCFs. In CloudDiff's generation stage, the environmental conditions are compressed into a latent representation using an attention mechanism. This latent representation can be interpreted as a set of CCFs that have been learned purely from data. We will discuss the properties of the learned CCFs, including how they relate to existing CCFs, their geographical distribution, and their predictive power for the radiative properties of cloud fields.

How to cite: Reichelt, T. and Stier, P.: CloudDiff: A Conditional Diffusion Model to Generate Mesoscale Cloud Structures, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7971, https://doi.org/10.5194/egusphere-egu26-7971, 2026.

17:25–17:35
|
EGU26-13676
|
ECS
|
On-site presentation
Paolo Bonetti, Matteo Giuliani, Teo Bucci, Veronica Cardigliano, Alberto Maria Metelli, Marcello Restelli, and Andrea Castelletti

Drought is a slowly developing natural hazard that can affect all climatic zones and is commonly defined as a temporary but significant decrease in water availability. In Europe alone, drought impacts over the last decades have generated very large economic losses, and recent summer events have been exceptional in a long-term historical perspective. Despite extensive research on drought monitoring and management, accurately characterizing how drought drivers evolve into impacts is still a key unresolved challenge, especially when impacts result from the cumulative and interacting effects of multiple hydroclimatic anomalies rather than a single precursor.

In this work, we introduce a machine learning procedure named DRIER (Drought Detection via Regression-based Interpretable Extraction and Causal Relationships) to develop interpretable, impact-based drought indices. Unlike traditional indices that primarily look at meteorological anomalies (e.g., precipitation deficits), DRIER is designed to capture the compound nature of drought impacts, such as prolonged dry periods occurring alongside high temperatures and reduced snowpack. DRIER is a fully data-driven and automated framework that integrates: (i) non-linear feature aggregation for dimensionality reduction, preserving an interpretable, lower-dimensional representation of candidate hydro-meteorological predictors; (ii) conditional mutual information-based feature selection to identify the most informative drought drivers; (iii) multi-task linear regression to upscale learning across multiple sub-regions, leveraging shared drought processes while preserving local heterogeneity; (iv) causal validation using the Transfer Entropy Feature Selection algorithm to confirm that the relationships identified between hydroclimatic variables and drought impacts are not merely correlative but grounded in robust causal mechanisms.
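As a simplified stand-in for step (ii), the sketch below ranks candidate drivers by plain (unconditional) mutual information estimated from histograms. DRIER itself uses conditional mutual information; the variables here are synthetic and the binning choice is illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def mutual_info(x, y, bins=16):
    """Histogram-based mutual information estimate (in nats)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0  # only cells with mass contribute
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

n = 5000
snow = rng.normal(size=n)   # informative driver (hypothetical snowpack anomaly)
wind = rng.normal(size=n)   # irrelevant candidate predictor
vhi = np.tanh(snow) + 0.2 * rng.normal(size=n)  # impact proxy depends on snow only

scores = {"snow": mutual_info(snow, vhi), "wind": mutual_info(wind, vhi)}
best = max(scores, key=scores.get)
print(best, scores)
```

Because mutual information is model-free, it captures the non-linear snow–impact link that a linear correlation screen could understate; conditioning (as in DRIER) additionally removes information already carried by previously selected drivers.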

We demonstrate DRIER in the Po River Basin (Italy) by considering 10 sub-basins and using vegetation stress quantified through the Vegetation Health Index (VHI) as an impact proxy. The application shows that DRIER can capture spatially heterogeneous drought–impact relationships across sub-regions while benefiting from multi-task learning to share information where responses are correlated. Importantly, because the framework is interpretable end-to-end, the resulting impact-based index is not a black-box score: each step produces transparent, auditable outputs that identify the key hydroclimatic drivers, how they are aggregated into the index, and how they contribute (in sign and magnitude) to vegetation stress. The integrated causal discovery component further strengthens confidence in real-world use by privileging predictors consistent with robust physical mechanisms, reducing the influence of spurious correlations and supporting transferability across space and time.

How to cite: Bonetti, P., Giuliani, M., Bucci, T., Cardigliano, V., Metelli, A. M., Restelli, M., and Castelletti, A.: Impact-based drought detection via Interpretable Machine Learning and Causal Discovery, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13676, https://doi.org/10.5194/egusphere-egu26-13676, 2026.

17:35–17:45
|
EGU26-15130
|
ECS
|
On-site presentation
Guido Ascenso, Enrico Scoccimarro, and Andrea Castelletti

Tropical cyclones (TCs) are among the most destructive natural hazards worldwide. While several decades of satellite and reanalysis products now provide relatively large observational datasets of TCs, these datasets remain small by modern deep-learning standards and, crucially, are extremely imbalanced and do not sufficiently cover the tails of the distribution, with Category 5 cyclones being several orders of magnitude rarer than tropical storms. This severe data scarcity and imbalance poses fundamental limitations for supervised learning approaches to tasks such as intensity estimation, rapid intensification forecasting, or impact modeling, where performance on extremes is often the primary objective.

In this context, generative artificial intelligence offers a promising alternative. Diffusion models, in particular, have recently demonstrated state-of-the-art performance in modeling complex, high-dimensional data distributions. By learning the full probability distribution of TC-related fields rather than a single conditional mapping, diffusion models have the potential to generate physically plausible samples across the entire intensity spectrum, including rare but high-impact extremes. However, most existing applications of diffusion models—both within and outside the geosciences—are evaluated using perceptual or distributional metrics originally developed for natural images, such as visual inspection or feature-space distances. These metrics are poorly aligned with the physical constraints and scientific objectives that govern atmospheric phenomena, and may obscure important deficiencies in dynamical or thermodynamical realism.

Here, we present a diffusion-based generative framework for tropical cyclone spatial fields and propose a comprehensive evaluation strategy grounded in physically meaningful diagnostics. Rather than relying on perception-oriented scores, we assess generated samples using a suite of metrics designed to capture key aspects of TC structure and behavior, including radial symmetry, intensity–structure relationships, spatial gradients, and consistency with known climatological distributions across intensity classes. This allows us to directly interrogate whether the model reproduces physically coherent storm morphologies, particularly in the poorly sampled tails of the distribution. Beyond evaluation, we also explore multiple strategies for embedding physical realism directly into the model design. Together, these results highlight both the opportunities and the limitations of diffusion models as scientific tools for tropical cyclone research, and provide a framework for using generative AI not merely as a data-augmentation device, but as a principled instrument for studying rare and extreme atmospheric phenomena.
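A diagnostic in the spirit of the radial-symmetry metric mentioned above could be sketched as follows. This is an illustrative proxy, not the authors' exact metric, and the "storm" fields are synthetic:

```python
import numpy as np

def radial_asymmetry(field):
    """Fraction of field variance not explained by the azimuthal-mean radial profile."""
    ny, nx = field.shape
    y, x = np.indices((ny, nx))
    r = np.hypot(y - ny // 2, x - nx // 2).astype(int)
    # Azimuthal mean at each integer radius from the storm centre.
    prof = np.bincount(r.ravel(), weights=field.ravel()) / np.bincount(r.ravel())
    residual = field - prof[r]
    return float(residual.var() / field.var())

yy, xx = np.mgrid[-32:32, -32:32]
rr = np.hypot(yy, xx)
symmetric = np.exp(-rr / 10.0)  # idealized axisymmetric vortex
rng = np.random.default_rng(0)
asymmetric = symmetric + 0.3 * rng.normal(size=symmetric.shape)

print(radial_asymmetry(symmetric), radial_asymmetry(asymmetric))
```

A generated sample scoring near zero is close to axisymmetric; large values flag morphologies inconsistent with a coherent vortex, which is exactly the kind of physically grounded check perceptual image metrics miss.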

How to cite: Ascenso, G., Scoccimarro, E., and Castelletti, A.: Assessing physical realism in diffusion models for tropical cyclones, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-15130, https://doi.org/10.5194/egusphere-egu26-15130, 2026.

17:45–17:55
|
EGU26-13744
|
ECS
|
Virtual presentation
Tirtha Pani, Prathamesh Dinesh Joshi, Raj Abhijit Dandekar, Rajat Dandekar, and Sreedath Panat

Rapid climate scenario exploration remains constrained by a fundamental tension: General Circulation Models and Earth System Models provide comprehensive representations of atmosphere-ocean-carbon interactions but impose computational demands prohibitive for iterative policy evaluation, while Energy Balance Models offer tractability at significant cost to predictive fidelity. Conventional machine learning approaches, though computationally efficient, exhibit excessive data dependence and lack the mechanistic transparency essential for regulatory compliance and evidence-based climate policy. This methodological gap motivates our development of a scientific machine learning framework that augments coupled climate-carbon dynamics through Universal Differential Equations (UDEs), achieving simultaneous forecasting accuracy and interpretability for rapid scenario assessment.

We formulate a three-state coupled dynamical system governing surface temperature anomaly, deep ocean temperature anomaly, and atmospheric CO₂ concentration, incorporating radiative forcing, ocean-atmosphere heat exchange, and temperature-dependent carbon uptake feedback mechanisms. Our investigation proceeds through systematic experimental evaluation. First, we assess Neural Ordinary Differential Equations (Neural ODEs) as black-box dynamical system learners across three random initializations under 1% observational noise. Neural ODEs exhibit substantial forecasting errors—12.45% for surface temperature, 64.08% for ocean temperature, and 5.17% for CO₂ concentration at t=50 years—with progressive error amplification throughout the forecast horizon, demonstrating fundamental limitations in capturing climate dynamics without physical constraints.
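One plausible concrete form of the three-state system above can be written as follows. The abstract does not give the exact equations; the signs, coefficients, and logarithmic forcing below are standard energy-balance choices, not necessarily the authors':

```latex
% Illustrative three-state coupled climate-carbon system (notation assumed):
%   T   : surface temperature anomaly      T_D : deep-ocean temperature anomaly
%   C   : atmospheric CO2 concentration    E(t): emissions forcing
\begin{aligned}
c\,\frac{dT}{dt}     &= F(C) - \lambda T - \gamma\,(T - T_D),
   \qquad F(C) = F_{2\times}\,\frac{\ln(C/C_0)}{\ln 2},\\
c_D\,\frac{dT_D}{dt} &= \gamma\,(T - T_D),\\
\frac{dC}{dt}        &= E(t) - u(C) + \beta\,T\,C,
\end{aligned}
```

Here βTC is the temperature-dependent carbon-uptake feedback that the UDE later replaces with a neural network, while the remaining terms stay as fixed, known physics.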

Subsequently, we construct a UDE architecture that preserves known energy balance and carbon cycle physics while replacing the temperature-dependent carbon uptake term (βTC) with a neural network component. This hybrid formulation achieves forecasting errors below 0.2% across all climate variables for three distinct initializations, representing order-of-magnitude improvement over Neural ODEs while requiring 57.5% fewer training iterations. Comprehensive robustness analysis across six noise levels (1–25%) demonstrates exceptional stability, with percentage errors remaining below 0.74% up to 20% observational noise, degrading catastrophically only at the 25% threshold.

To ensure mechanistic transparency—critical for climate policy applications—we employ Sparse Identification of Nonlinear Dynamics (SINDy) for symbolic regression on learned neural network outputs. SINDy successfully recovers the correct functional form β·T·C across all noise regimes up to 20%, achieving 100% functional form recovery rate with average relative error of 25.22% at 1% noise. Performance metrics degrade systematically with increasing noise: R² decreases from 0.9985 (1% noise) to 0.7812 (20% noise), with complete interpretability breakdown at 25% noise (R²=0.4028). This characterizes operational bounds for symbolic recovery under realistic measurement uncertainty.
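The sequentially thresholded least squares at the core of SINDy can be sketched on synthetic data mimicking recovery of the β·T·C term. The value β = 0.3, the candidate library, and the threshold are illustrative choices, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic states; the "true" carbon-uptake term is beta*T*C (beta illustrative).
T = rng.uniform(0.0, 2.0, n)               # temperature anomaly
C = rng.uniform(280.0, 560.0, n) / 280.0   # normalized CO2
beta = 0.3
target = beta * T * C + 0.01 * rng.normal(size=n)  # stands in for the learned NN output

# Candidate library of terms, as in SINDy.
library = np.column_stack([np.ones(n), T, C, T**2, C**2, T * C])
names = ["1", "T", "C", "T^2", "C^2", "T*C"]

# Sequentially thresholded least squares (STLSQ).
xi = np.linalg.lstsq(library, target, rcond=None)[0]
for _ in range(10):
    small = np.abs(xi) < 0.05      # prune weak terms for sparsity
    xi[small] = 0.0
    active = ~small
    if active.any():
        xi[active] = np.linalg.lstsq(library[:, active], target, rcond=None)[0]

recovered = {nm: round(float(c), 3) for nm, c in zip(names, xi) if c != 0.0}
print(recovered)
```

The pruning-and-refit loop is what turns a dense regression over the library into a single symbolic term, which is the sense in which SINDy "recovers the correct functional form" in the abstract.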

Comparative benchmarking against statistical baselines—Vector Autoregression (VAR) and Autoregressive Integrated Moving Average (ARIMA)—confirms UDE superiority in data-scarce regimes with known physical constraints. While VAR and ARIMA exhibit computational parsimony (21 and 10 parameters respectively versus 8,577 for UDE), they incur prediction errors exceeding 19% for temperature variables, rendering them unsuitable for high-fidelity forecasting. The UDE framework uniquely achieves the accuracy-efficiency-interpretability tradeoff essential for climate scenario exploration, enabling policymakers to evaluate interventions through mechanistically transparent simulations satisfying quantitative risk assessment requirements.

Our results establish that physics-informed machine learning enables accurate climate trajectory prediction while symbolic regression maintains interpretability, yielding a computationally efficient framework for rapid exploration of emission scenarios, carbon taxation policies, and adaptation strategies with explicit uncertainty quantification.

How to cite: Pani, T., Dinesh Joshi, P., Abhijit Dandekar, R., Dandekar, R., and Panat, S.: CLIMASIM — Climate Simulation with Scientific Machine Learning, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13744, https://doi.org/10.5194/egusphere-egu26-13744, 2026.

Posters on site: Mon, 4 May, 10:45–12:30 | Hall X5

The posters scheduled for on-site presentation are only visible in the poster hall in Vienna. If authors uploaded their presentation files, these files are linked from the abstracts below.
Display time: Mon, 4 May, 08:30–12:30
Chairpersons: Blanka Balogh, Katharina Hafner, Tom Beucler
X5.79
|
EGU26-13632
Gustau Camps-Valls, Roger Guimerà, Gherardo Varando, Emiliano Diaz, Kai-Hendrik Cohrs, and Marta Sales-Pardo

Reliable causal inference is a central challenge in Earth and climate sciences: observational records are limited, interventions are rare or impossible, and process representations in models rely on parametrizations that can introduce strong asymmetries between variables and their causal mechanisms [1,2]. Leveraging these asymmetries, rather than treating them as nuisances, can offer a principled route to causal discovery that is directly aligned with scientific modeling practice [2].

We address bivariate causal discovery from the standpoint of equation discovery using the Bayesian Machine Scientist (BMS) framework [3]. Our key contribution is to formalize the theoretical link between Symbolic Regression (SR) and Algorithmic Information Theory (AIT) via the Minimum Description Length (MDL) principle: the more plausible causal direction is the one that admits a shorter joint description in terms of a mechanism plus independent inputs [4]. Building on this connection, we characterize the mathematical properties of the resulting causal criterion, including identifiability and asymptotic consistency, and we analyze the role of core assumptions—most notably the Principle of Independent Causal Mechanisms (ICM)—in the context of geophysical data and climate-model parametrizations [5].
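In standard two-part-code notation, the MDL comparison underlying this criterion can be written as follows (the precise formulation in the cited works may differ):

```latex
% Two-part (MDL) code-length comparison for bivariate causal direction:
X \rightarrow Y \ \text{is preferred} \iff
\underbrace{L(f) + L(p_X) + L(Y \mid f(X))}_{\text{cost of } X \to Y}
\;<\;
\underbrace{L(g) + L(p_Y) + L(X \mid g(Y))}_{\text{cost of } Y \to X}
```

where L(·) denotes description length: the bits needed to encode the symbolic mechanism found by the BMS, the distribution of the putative cause, and the residuals given the mechanism. Under the ICM assumption, the factorization in the true causal direction admits the shorter joint code.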

We demonstrate the approach on simulated benchmarks and on real Earth-system examples covering both i.i.d. settings and time-series climate data. The results illustrate when and why asymmetric parametrizations help disambiguate causal direction, and they provide a practical pathway to turn discovered governing equations into testable causal hypotheses for Earth and climate science.

References

[1] Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Elements of causal inference: foundations and learning algorithms. The MIT Press, 2017.

[2] Gustau Camps-Valls, Andreas Gerhardus, Urmi Ninad, Gherardo Varando, Georg Martius, Emili Balaguer-Ballester, Ricardo Vinuesa, Emiliano Diaz, Laure Zanna, and Jakob Runge. Discovering causal relations and equations from data. Physics Reports, 1044:1–68, 2023.

[3] Roger Guimera, Ignasi Reichardt, Antoni Aguilar-Mogas, Francesco A. Massucci, Manuel Miranda, Jordi Pallares and Marta Sales-Pardo. A Bayesian machine scientist to aid in the solution of challenging scientific problems. Science Advances, 6(5):eaav6971, 2020.

[4] Dominik Janzing, Joris Mooij, Kun Zhang, Jan Lemeire, Jakob Zscheischler, Povilas Daniūsis, Bastian Steudel and Bernhard Schölkopf. Information-geometric approach to inferring causal directions. Artificial Intelligence, 182:1–31, 2012.

[5] Sascha Xu, Sarah Mameche, and Jilles Vreeken. Information-theoretic causal discovery in topological order. In The 28th International Conference on Artificial Intelligence and Statistics, 2025.

How to cite: Camps-Valls, G., Guimerà, R., Varando, G., Diaz, E., Cohrs, K.-H., and Sales-Pardo, M.: Causal discovery from equation discovery, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13632, https://doi.org/10.5194/egusphere-egu26-13632, 2026.

X5.80
|
EGU26-4964
|
ECS
Shangshang Yang, Congyi Nai, Niklas Boers, Huiling Yuan, and Baoxiang Pan

Machine learning models have shown great success in predicting weather up to two weeks ahead, outperforming process-based benchmarks. However, existing approaches mostly focus on the prediction task, and do not incorporate the necessary data assimilation. Moreover, these models often suffer from long-term error accumulation, limiting their applicability to seasonal predictions and climate projections. Here, we introduce Generative Assimilation and Prediction (GAP), a unified deep generative framework for assimilation and prediction of both weather and climate. By learning to quantify the probabilistic distribution of atmospheric states under observational, predictive, and external forcing constraints, GAP excels in a broad range of weather-climate related tasks, including data assimilation, seamless prediction, and climate simulation. In particular, GAP delivers probabilistic weather forecasts competitive with state-of-the-art forecasting systems, while using its own assimilated initial states from a small fraction of observations. It also provides seasonal predictions with skill comparable to leading operational systems. Finally, GAP produces stable millennial-scale climate simulations that capture variability from daily weather fluctuations to decadal oscillations.

How to cite: Yang, S., Nai, C., Boers, N., Yuan, H., and Pan, B.: GAP: a unified deep generative framework for emulating weather and climate, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-4964, https://doi.org/10.5194/egusphere-egu26-4964, 2026.

X5.81
|
EGU26-5928
|
ECS
Valerie Tsao, Marta Zaniolo, and Manolis Veveakis

A pressing problem exacerbated by climate change is the inability to prepare for extreme climate and weather events due to the limited historical record of observed extremes. While crucial for risk assessment and informed policy-making, a better representation of the distribution of "feasible" outcomes remains largely uncertain, with predictions ranging at variously defined confidence levels that remain sensitive to the choice of metrics and physical assumptions. This question naturally leads to asking how we can generate plausible realizations of extreme events, and thereby enable mitigation efforts before communities are forced to confront destructive realities. We present a time-conditioned generative framework based on a computer-vision-aided diffusion model trained on 1 km × 1 km precipitation fields and their trajectories over time. The model outputs n potential realizations of storm events that may unfold over the San Jacinto river basin on the south coast of Texas.

Beyond unconditional sampling, we introduce control variables that make generation decision-relevant: the model is trained to be conditional on a (duration, intensity) pair, enabling users to request ensembles spanning targeted severity regimes (e.g., short–extreme vs. long–moderate) while preserving realistic spatiotemporal structure. This yields a family of distributions over storm trajectories indexed by interpretable controls, allowing systematic stress testing of infrastructure and emergency-response plans under plausible but high-impact scenarios. 

We separate the evaluation of our approach into two complementary perspectives: (i) distribution matching for in-sample generations, and (ii) physics-based alignment with storm-based properties for out-of-sample generations. The spatiotemporal structure of storms is also benchmarked against strong baselines such as the analog ensemble method, quantifying our model's ability to realistically capture intense rainfall. To extract evolving storm geometries, we employ a kNN-based (k-nearest neighbors) computer-vision algorithm that dynamically identifies storm shapes across time steps. Due to the probabilistic nature of diffusion models, more comprehensive envelopes of storm intensity and trajectory can be obtained for uncertainty quantification purposes.

Finally, we introduce a metric that jointly measures physical plausibility through features like intensity–duration structure and scaling, as well as novelty relative to the raw training data. This metric works by penalizing overfitting patterns while rewarding those that respect feasible dynamics, allowing us to define a principled way to compare generative models for extremes. Therefore, we can determine not only how realistic our generated storms are, but also how much physical diversity they contribute beyond the observed data. We present an open evaluation suite for controllable storm generation, including storm-tracking, intensity–duration diagnostics, and physical-novelty scoring.

How to cite: Tsao, V., Zaniolo, M., and Veveakis, M.: Synthetic Physics-Aware Storm Generation via Diffusion Models for Risk Analysis of Catastrophic Events, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5928, https://doi.org/10.5194/egusphere-egu26-5928, 2026.

X5.82
|
EGU26-6357
|
ECS
Yi-Yun Lu and Yuan-Chien Lin

Various pollutants pose significant threats to river ecosystems. This issue is particularly critical in Taiwan, where the unique geography of short, rapid rivers makes water retention difficult, necessitating rigorous water quality monitoring. Given the complex, non-linear correlations between water quality and meteorological parameters, this study investigates the impact of different feature selection techniques and predictive models on water quality forecasting for eight rivers in Taoyuan. We utilized 14 meteorological and water quality inputs to predict six key indicators, including COD, DO, EC, NH3-N, ORP, and SS. The methodology compared four feature selection strategies—Pearson Correlation, Entropy Weight Method (EWM), Combined Weights, and Mutual Information—alongside four forecasting models: Seq2Seq LSTM, ANFIS, MLP, and Transformer. The feature selection results reveal that the Entropy Weight Method yielded the highest precision (R² = 0.9336), surpassing the Pearson method (R² = 0.9161). This indicates that prioritizing features based on information entropy effectively minimizes information loss during screening. Regarding predictive modeling, the Transformer model demonstrated superior stability and accuracy. While other models fluctuated, the Transformer consistently achieved the best performance with an MSE of approximately 14.86 (RMSE = 3.855) and an accuracy of 82.52%, significantly outperforming the MLP and ANFIS models. This study concludes that integrating entropy-based feature selection with the Transformer model establishes a superior and highly accurate framework for water quality forecasting in Taoyuan's rivers.
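The Entropy Weight Method mentioned above assigns larger weights to predictors whose values are more dispersed (lower information entropy). As a rough illustration only, with a toy matrix in place of real monitoring data and not reflecting the authors' implementation, a minimal sketch:

```python
import math

def entropy_weights(X):
    """Entropy Weight Method: predictors whose (non-negative, normalised)
    values are more dispersed carry lower entropy and get larger weights."""
    n, m = len(X), len(X[0])
    k = 1.0 / math.log(n)
    divergence = []
    for j in range(m):
        col = [row[j] for row in X]
        total = sum(col)
        p = [v / total for v in col]
        e = -k * sum(pi * math.log(pi) for pi in p if pi > 0)
        divergence.append(1.0 - e)  # degree of divergence of predictor j
    s = sum(divergence)
    return [d / s for d in divergence]

# toy matrix: 4 samples of 2 candidate predictors (assumed already
# min-max normalised); the second predictor is constant, hence uninformative
w = entropy_weights([[0.1, 0.5], [0.9, 0.5], [0.2, 0.5], [0.8, 0.5]])
```

The constant column receives (numerically) zero weight, so a screening step would discard it.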

How to cite: Lu, Y.-Y. and Lin, Y.-C.: Integrated Analysis of Feature Selection and Deep Learning Models for Water Quality Forecasting Based on Meteorological Parameters, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-6357, https://doi.org/10.5194/egusphere-egu26-6357, 2026.

X5.83
|
EGU26-6363
|
ECS
Yen Chuang and Yuan-Chien Lin

Air pollution has emerged as one of the most critical environmental health hazards globally. According to statistics from the World Health Organization (WHO) and the Global Burden of Disease Study, approximately 7 million premature deaths occur annually due to air pollution. Fine particulate matter (PM2.5), capable of penetrating deep into the lungs and entering the bloodstream, has been confirmed to be highly correlated with ischemic heart disease, stroke, chronic obstructive pulmonary disease (COPD), and lung cancer. Given its serious threat to public health, establishing high-precision PM2.5 prediction models is critical for early warning systems and health protection.

Addressing the common issue of missing values in environmental monitoring data, this study proposes a data preprocessing framework that combines Principal Component Analysis (PCA) for feature dimensionality reduction with the GRU-D model for time-series imputation. Testing confirms that this method effectively reconstructs data features without causing excessive smoothing. In terms of predictive modeling, this study incorporates East Asian-scale atmospheric pressure field data as a key environmental variable to capture the impact of large-scale weather systems on local air pollution. The performance of three advanced deep learning models—LSTM+CNN, PatchTST, and iTransformer—is evaluated and compared.

The results indicate that, when considering multivariate factors and long- and short-term dependencies, the iTransformer model demonstrates superior predictive performance with an R² of 0.91, exhibiting exceptional non-linear feature extraction capabilities. In comparison, both the LSTM+CNN and PatchTST models achieved an R² of approximately 0.86. Based on the iTransformer's advantages in handling large-scale meteorological features and high-dimensional time-series data, this study employs it as the core model to further extend PM2.5 concentration predictions across Taiwan, aiming to provide a valuable scientific reference for regional air quality management.

How to cite: Chuang, Y. and Lin, Y.-C.: Spatiotemporal Prediction of PM2.5 in Taiwan Using iTransformer and Large-Scale Atmospheric Pressure Features, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-6363, https://doi.org/10.5194/egusphere-egu26-6363, 2026.

X5.84
|
EGU26-6364
|
ECS
Shih-Han Huang and Yuan-Chien Lin

In recent years, climate change has led to a clear increase in both the frequency and intensity of extreme weather events. Taiwan lies along major typhoon tracks in the western North Pacific, where typhoons represent one of the most significant natural hazards. The strong winds and heavy rainfall associated with typhoons frequently cause flooding, agricultural losses, and damage to critical infrastructure. In practice, however, the severity of typhoon-related disasters does not always correspond to traditional typhoon intensity classifications based primarily on central pressure and wind speed, indicating that wind-based classifications alone may not adequately represent actual disaster impacts.

This study utilizes hourly meteorological station observations to investigate the wind and rainfall characteristics of historical typhoon events in Taiwan. Multiple machine learning and regression models are applied, together with residual analysis, to quantify typhoon characteristics and construct a Typhoon Type Index (TTI). Based on the relative behavior of wind and rainfall during individual events, different typhoon types are further examined to identify their occurrence patterns and characteristic differences across historical cases.

The results indicate that the TTI derived from machine learning–based classification models can effectively improve upon previous TTI formulations established using regression models alone. Moreover, typhoons with different wind–rainfall characteristics are associated with distinct patterns of disaster impacts, and in some cases, rainfall intensity better reflects disaster severity than wind speed. By offering an alternative perspective to conventional intensity-based classifications, this study contributes to improved typhoon disaster risk assessment and provides useful insights for future disaster mitigation and preparedness strategies.

How to cite: Huang, S.-H. and Lin, Y.-C.: Improving the Typhoon Type Index by Integrating Strong Wind and Heavy Rainfall Using Machine Learning, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-6364, https://doi.org/10.5194/egusphere-egu26-6364, 2026.

X5.85
|
EGU26-8307
|
ECS
Maura Dewey, Laura Wilcox, Bjørn Samset, and Annica Ekman

We present a deep-kernel Gaussian process emulator (Deep-AeroGP) for predicting the climate response of surface temperature and precipitation to aerosol emission changes at high spatial and temporal resolution. Aerosols play a critical role in the climate system at both global and regional scales. Anthropogenic aerosol forcing has masked approximately 0.4 °C of global warming since the beginning of the industrial era [1], and recent reductions in aerosol emissions have been linked to an acceleration of global mean temperature increase [2]. Because aerosol emissions are spatially heterogeneous and short-lived, changes in their magnitude and geographical distribution can drive pronounced regional and rapid climate responses, including shifts in precipitation patterns and monsoon intensity and timing [3,4]. Modelling these regional responses is critical for evaluating the climate consequences of air quality and environmental policy decisions; however, exploring a wide range of regional aerosol emission scenarios is computationally prohibitive with fully coupled Earth system models (ESMs). Machine-learning emulators enable the rapid exploration of large ensembles of emission scenarios, facilitating scenario development and impact assessment. Deep-AeroGP, which builds on the recently published AeroGP [5], combines the flexibility of deep neural networks with the probabilistic framework of Gaussian processes, using a neural network as a feature extractor such that the kernel is learned from the data rather than fixed a priori. This approach allows the emulator to capture both large-scale and regional patterns of aerosol-driven climate variability while providing uncertainty estimates. We demonstrate the accuracy and usefulness of Deep-AeroGP in policy-relevant studies by investigating the nonlinearity of the climate response to multiple regional aerosol emission perturbations.

 

1. Forster, P. & Storelvmo, T. The Earth’s energy budget, climate feedbacks, and climate sensitivity. In Working Group 1 contribution to the IPCC 6th Assessment Report (eds Masson-Delmotte, V. et al.) Ch. 7 (Cambridge University Press, 2021). 

2. Samset, B. H., Wilcox, L. J., Allen, R. J., et al.: East Asian aerosol cleanup has likely contributed to the recent acceleration in global warming. Commun. Earth Environ., 6, 543 (2025). https://doi.org/10.1038/s43247-025-02527-3

3. López-Romero, J. M., Montávez, J. P., Jerez, S., Lorente-Plazas, R., Palacios-Peña, L., and Jiménez-Guerrero, P.: Precipitation response to aerosol–radiation and aerosol–cloud interactions in regional climate simulations over Europe, Atmos. Chem. Phys., 21, 415–430, https://doi.org/10.5194/acp-21-415-2021, 2021. 

4. Wilcox, L. J., Liu, Z., Samset, B. H., Hawkins, E., Lund, M. T., Nordling, K., Undorf, S., Bollasina, M., Ekman, A. M. L., Krishnan, S., Merikanto, J., and Turner, A. G.: Accelerated increases in global and Asian summer monsoon precipitation from future aerosol reductions, Atmos. Chem. Phys., 20, 11955–11977, https://doi.org/10.5194/acp-20-11955-2020, 2020. 

5. Dewey, M., Hansson, H.-C., Watson-Parris, D., Samset, B. H., Wilcox, L. J., Lewinschal, A., et al. (2025). AeroGP: Machine learning how aerosols impact regional climate. Journal of Geophysical Research: Machine Learning and Computation, 2, e2025JH000741. https://doi.org/10.1029/2025JH000741 

How to cite: Dewey, M., Wilcox, L., Samset, B., and Ekman, A.: Deep-AeroGP: deep kernel learning for projecting the regional climate response to anthropogenic aerosol emission changes, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8307, https://doi.org/10.5194/egusphere-egu26-8307, 2026.

X5.86
|
EGU26-9427
|
ECS
Maximilian Meindl, Miriam Kornblueh, Lukas Brunner, and Aiko Voigt

The emergence of global km-scale climate models challenges traditional model evaluation approaches, which typically rely on long climatological averages. The substantial computational costs and enormous data volumes associated with km-scale simulations often constrain simulation length, limiting the availability of long-term averages. As a result, conventional analysis methods become less practical and less informative when assessing short, high-frequency model output that is potentially dominated by internal variability. At the same time, recent advances in machine learning (ML), particularly in deep neural networks, offer new and innovative ways to efficiently extract information from large climate datasets. Building on this progress, we present an ML-based framework for evaluating climate models on a regional scale over short periods, focusing on daily near-surface air temperature fields over Europe.

We train a convolutional neural network (CNN) to distinguish spatial temperature fields from a large set of climate models. We employ 28 regional simulations from EURO-CORDEX and two global km-scale models from nextGEMS and Destination Earth. Beyond the classification based on climate model simulations, the pre-trained CNN is applied to observation-based test datasets. This setup allows us to build towards an evaluation metric: the model to which the observation-based datasets are most frequently assigned can be considered most similar to the observed climate. Despite the regional focus of EURO-CORDEX, observation-based samples are most frequently classified as the global km-scale model IFS-FESOM. This suggests that this global km-scale model may capture regional temperature patterns more accurately than regional climate model simulations. Although our results are consistent with traditional metrics in identifying IFS-FESOM as the best-performing model, they also indicate that CNN-based evaluation provides additional information about the similarity between models and observations.

To better understand which spatial features influence the CNN’s classification for observation-based samples, we apply explainable artificial intelligence (XAI) methods, specifically layerwise relevance propagation (LRP), to the classification outcomes. The resulting relevance patterns indicate that static features such as orography and coastlines, as well as relevance hotspots potentially linked to regions of dynamic variability, play a dominant role in the classification. This highlights that the CNN is sensitive to physically meaningful structures that define model-specific spatial fingerprints.

Using our ML-based framework, we show that a CNN can robustly distinguish between climate models on regional and short time scales as well as identify the model closest to observations. More broadly, we demonstrate that ML, combined with XAI, offers a scalable and physically interpretable approach for evaluating high-resolution climate models, thereby complementing established evaluation frameworks.

How to cite: Meindl, M., Kornblueh, M., Brunner, L., and Voigt, A.: Using Explainable AI to uncover physically meaningful features in km-scale climate models on a regional scale, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9427, https://doi.org/10.5194/egusphere-egu26-9427, 2026.

X5.87
|
EGU26-11185
|
ECS
Hugo Rougier, Bertrand Decharme, and Marc Mallet

Africa and South America together account for more than 70 % of the global burned area representing nearly 65 % of global fire-related carbon emissions (van der Werf et al., 2017). Beyond carbon release, wildfires emit large amounts of dust and aerosols that influence regional climate through radiative processes. More generally, wildfires strongly modify land surface properties, including vegetation composition, soil carbon stocks, or surface albedo, with far-reaching consequences for regional carbon, water, and energy cycles.

In the ISBA land surface model (Delire et al., 2020), burned area is currently parameterized using grid-cell surface characteristics, a fire-resistance coefficient, soil moisture, and available biomass. While computationally efficient, this simplified formulation may contribute to persistent regional biases in simulated fire activity. To overcome these limitations, we develop a data-driven fire modeling framework based on two artificial neural network architectures: one addressing a regression task and the other a classification task. The models use meteorological conditions, vegetation states, and anthropogenic factors to estimate the daily burned area fraction.

The proposed framework reproduces the spatiotemporal variability of burned areas with reasonable fidelity, especially in key fire regions such as Africa, South America, and Australia. These results highlight the potential of deep learning approaches to enhance wildfire representation and prediction in Earth system models; extending this capability is the focus of our ongoing research project.

How to cite: Rougier, H., Decharme, B., and Mallet, M.: Modeling of burned areas on a global scale using statistical learning methods, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11185, https://doi.org/10.5194/egusphere-egu26-11185, 2026.

X5.88
|
EGU26-11509
|
ECS
Hendrik Jansen, Muriel Racky, and Kira Rehfeld

The application of deep learning has proved useful in many scientific domains and has gained increasing interest as a tool for weather and climate modeling in recent years. Deep learning weather models have already demonstrated prediction performance competitive with state-of-the-art methods, while hybrid models and emulators have shown promise for climate simulation. However, the realism of the simulated climate variability and climate modes of pure deep learning models trained only on observational or reanalysis data has not received as much attention.
As one example of these models, we investigate DLESyM, an autoregressive deep learning model based on the U-Net architecture and originally trained on ERA5 reanalysis data from 1981 to 2017 (REF1). Unlike many weather-generating deep learning models, DLESyM does not draw on sea-surface temperatures as boundary conditions, but learns to generate ocean surface patterns. Its applications could, therefore, extend to free-running simulations. The original authors showed its ability to generate stable climate simulations for time spans up to three millennia, with the absence of spurious drifts and unphysical smoothing in the annual cycle. Here we test how realistic the simulated climate variability of DLESyM is, focusing on interannual to centennial spatio-temporal modes of internal climate variability. We seek to identify whether it is able to generalize to the underlying physical processes of the climate system, or whether it is only capable of reproducing spatio-temporal statistical patterns of its training data. We compare the unforced variability of the deep learning model to that in equilibrium simulations of General Circulation Models from the Coupled Model Intercomparison Project phase 6 (CMIP6 GCMs), and to palaeoclimate reconstructions (REF2). We focus on regional and global power spectra of surface temperatures, and on gradients between land and ocean, tropics and extratropics, as well as the high latitudes. To assess the model's ability to generalize outside the distribution of the training data, we perform simulations from varying initial conditions and compare them with the output of CMIP6 GCMs. Based on this, we discuss potentials and limitations of such a purely data-driven model for climate simulations and future climate risk assessment, where characteristics beyond the mean state and slow changes become relevant.

 

REF1 Cresswell-Clay, N., Liu, B., Durran, D. R., Liu, Z., Espinosa, Z. I., Moreno, R. A., & Karlbauer, M. (2025). A deep learning Earth system model for efficient simulation of the observed climate. AGU Advances, 6, e2025AV001706. https://doi.org/10.1029/2025AV001706

REF2 Laepple, T., Ziegler, E., Weitzel, N. et al. (2023) Regional but not global temperature variability underestimated by climate models at supradecadal timescales. Nat. Geosci., 16, 958–966. https://doi.org/10.1038/s41561-023-01299-9

How to cite: Jansen, H., Racky, M., and Rehfeld, K.: Testing the realism of interannual to centennial climate variability in a generative coupled atmosphere-ocean deep learning model, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11509, https://doi.org/10.5194/egusphere-egu26-11509, 2026.

X5.89
|
EGU26-12404
Mirta Rodriguez Pinilla, Marc Benitez Benavides, Eleftheria Exarchou, Tomas Margalef, and Javier Panadero

Wildfires pose a growing threat to populated areas of the Mediterranean basin. The hot and dry conditions caused by climate change have exacerbated the risk, extent, and severity of wildfires. The Barcelona Metropolitan Area, a large metropolis with an extended wildland-urban interface (WUI), is particularly vulnerable. 

Assessment of the impact of climate change on heat and droughts, and the cascading effects on future wildfire risk in WUI areas under different climate scenarios requires future projections of temperature and precipitation data. Current spatial resolution in standard climate projections is approximately 100km, insufficient to properly assess the spatial and temporal variability in heatwaves and drought conditions. Climate information at a much finer spatial scale is required to properly assess future climate risk at a metropolitan scale. 

To obtain km-scale future climate data, we train a U-Net using two inputs: ERA5 and an elevation map (Copernicus DEM GLO-90), using the CHELSA global reanalysis (https://www.chelsa-climate.org/) as the target dataset. The U-Net learns the relationship between coarser-resolution predictors (from ERA5 at 0.25 deg, ~25 km) and the high-resolution predicted variables (from CHELSA at 30", ~0.8 km) over the training domain. The trained U-Net is then used to infer the high-resolution surface variables (maximum and minimum daily air temperature and daily precipitation at 30") from the coarser-resolution CMIP6 future climate projections, bias-corrected and statistically downscaled to 0.25 deg (obtained from the Global Downscaled Projections for Climate Impacts Research dataset).
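The predictor assembly for such a setup pairs each upsampled coarse field with a static elevation channel. As an illustration only, with nearest-neighbour upsampling as a crude stand-in for real interpolation and all names and shapes hypothetical rather than the authors' code, a minimal sketch:

```python
def upsample_nearest(field, factor):
    """Nearest-neighbour upsampling of a 2-D grid: a crude stand-in for
    the interpolation a real pipeline would apply to coarse predictors."""
    out = []
    for row in field:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def stack_inputs(coarse, dem, factor):
    """One plausible two-channel input layout: the upsampled coarse
    field alongside the high-resolution elevation map."""
    up = upsample_nearest(coarse, factor)
    assert len(up) == len(dem) and len(up[0]) == len(dem[0])
    return [[[up[i][j], dem[i][j]] for j in range(len(dem[0]))]
            for i in range(len(dem))]

coarse = [[1.0, 2.0], [3.0, 4.0]]    # hypothetical 2x2 coarse field
dem = [[0.0] * 4 for _ in range(4)]  # hypothetical 4x4 elevation map
inputs = stack_inputs(coarse, dem, factor=2)
```

Each cell of `inputs` then holds a `[field, elevation]` pair ready to be fed to a convolutional model as channels.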

We validate our results against meteorological stations in Catalonia during the historical period and find that biases and RMSE are smaller than those of the coarser-resolution climate data. Furthermore, the temporal trends of the downscaled climate data are preserved, matching the original climate model trends.

Our results demonstrate that the proposed methodology is robust and provides high-resolution heat and drought indicators.

How to cite: Rodriguez Pinilla, M., Benitez Benavides, M., Exarchou, E., Margalef, T., and Panadero, J.: Deep Learning Downscaling of Precipitation and Temperature Climate Data for Future Wildfire Risk Assessment, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12404, https://doi.org/10.5194/egusphere-egu26-12404, 2026.

X5.90
|
EGU26-12407
|
ECS
Kevin Debeire, Veronika Eyring, and Niels Thuerey

Climate models typically operate at coarse spatial resolution (~100 km) due to computational constraints, yet many climate-change impact assessments require fine-scale information (<10 km). In this study, we systematically benchmark three state-of-the-art machine-learning approaches for statistical downscaling, using the storm-resolving ICON NextGEMS dataset as reference. All methods take coarse-resolution climate fields as input and generate physically plausible high-resolution predictions. We compare: (1) UNet, a deterministic encoder–decoder architecture; (2) CorrDiff, which augments the UNet backbone with a diffusion model to produce probabilistic ensembles; and (3) CorrDiff++, which replaces diffusion with flow-matching to improve sampling efficiency. We perform 10× downscaling (0.56° to 0.056°) over central Europe for six surface variables, including temperature, wind, and precipitation. The models are evaluated along multiple dimensions: deterministic accuracy (bias, correlation), probabilistic skill (ensemble reliability and sharpness), and physical realism (energy spectra, temporal coherence, representation of extremes). Our results highlight fundamental trade-offs between computational cost, physical consistency, and uncertainty quantification. These insights provide guidance on when the additional complexity of generative models is justified for climate science applications.

How to cite: Debeire, K., Eyring, V., and Thuerey, N.: Benchmarking Deterministic and Generative Machine Learning Models for Statistical Climate Downscaling over Europe, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12407, https://doi.org/10.5194/egusphere-egu26-12407, 2026.

X5.91
|
EGU26-12525
Tian Tian, Benjamin Richards, and David Docquier

The timing of the first ice-free Arctic summer is a key indicator of climate change, yet projections remain highly uncertain due to inter-model spread, internal variability, and systematic model biases. We develop a prototype framework that combines machine-learning-based methods with causal diagnostics to assess how different bias-correction and emulation approaches influence projections of the first year of ice-free Arctic conditions. Linear scaling is used as a statistical baseline to provide a transparent reference for evaluating more complex machine-learning-based approaches.

Building on recent analyses of the drivers of summer Arctic sea-ice extent at the interannual time scale, we analyse CMIP6 multi-model large ensembles to quantify relationships between September Arctic sea-ice extent and its dominant drivers, including preceding winter sea-ice volume, Arctic near-surface air temperature, and ocean heat transport. Machine-learning-based regression and emulation models are applied to refine model output, while causal diagnostics based on information flow are used to evaluate the physical consistency of inferred driver–response relationships.

We focus on two CMIP6 large ensembles with contrasting historical Arctic temperature biases over 1980–2014. Ensemble uncertainty is explored by partitioning ensemble members into bias-based subsets to assess the sensitivity of projected ice-free timing and inferred driver relationships. Results show that linear scaling shifts projected timing without altering causal structure, whereas machine-learning-based methods can modify ice-free year distributions and induce state-dependent changes in inferred causal relationships. These findings highlight the value of causal diagnostics for interpreting machine-learning-based climate projections and underscore the need for physically interpretable frameworks when applying data-driven methods to critical Arctic climate transitions.
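Linear scaling, used above as the transparent statistical baseline, can be illustrated with its additive variant: shift the projected values by the model's mean historical bias against observations. A minimal sketch with made-up sea-ice extent numbers (not the authors' implementation):

```python
def linear_scaling(model_hist, obs_hist, model_future):
    """Additive linear scaling: remove the mean historical bias of the
    model against observations from the projected values."""
    bias = sum(model_hist) / len(model_hist) - sum(obs_hist) / len(obs_hist)
    return [v - bias for v in model_future]

# hypothetical September sea-ice extents (million km^2): the model runs
# about 1.0 too high historically, so the projection is shifted down
corrected = linear_scaling(model_hist=[6.0, 6.4], obs_hist=[5.0, 5.4],
                           model_future=[3.0, 2.0])
```

Because the correction is a constant offset, it shifts the projected timing of thresholds (such as an ice-free year) without changing the variability or causal structure of the series, consistent with the result reported above.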

How to cite: Tian, T., Richards, B., and Docquier, D.: Toward more reliable projections of an ice-free Arctic: Integrating machine learning and causal diagnostics in CMIP6 ensembles, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12525, https://doi.org/10.5194/egusphere-egu26-12525, 2026.

X5.92
|
EGU26-12592
|
ECS
Johannes Meuer, Maximilian Witte, Étiénne Plésiat, and Christopher Kadow

Probabilistic risk assessment requires large ensembles of high-resolution climate scenarios, yet generating such data is often computationally intractable. This study introduces a scalable generative framework designed to overcome the scarcity of high-fidelity climate data. We introduce the Field-Space Autoencoder, a geometric compression model that preserves the causal structure of atmospheric fields without forcing them onto regular lat-lon grids. Unlike standard deep learning approaches fixed to a single resolution, our method utilizes a multi-scale decomposition that stores a resolution-invariant latent representation. This flexibility unlocks a novel hybrid training strategy for generative diffusion: we combine the statistical robustness of multi-century, low-resolution simulations with the structural precision of limited high-resolution datasets. The resulting Compressed Field Diffusion model is capable of synthesizing atmospheric states that inherit the internal variability of the large ensemble and the spectral sharpness of the high-resolution ground truth. By bridging these data sources, we present a pathway to democratizing access to exascale-quality climate data through efficient, physically consistent emulation.

How to cite: Meuer, J., Witte, M., Plésiat, É., and Kadow, C.: Generative Emulation on the Sphere: Bridging the Resolution Gap with Field-Space Diffusion, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12592, https://doi.org/10.5194/egusphere-egu26-12592, 2026.

X5.93
|
EGU26-14198
Luca Schmidt, Pierre-Louis Lemaire, Nicole Ludwig, Alex Hernandez-Garcia, and David Rolnick

As climate change amplifies precipitation extremes and their societal and economic impacts, downscaling precipitation provides valuable local-scale information for risk assessment and adaptation planning.
However, deep-learning-based statistical downscaling methods typically rely on high-resolution training data (e.g., radar observations), which are scarce and unevenly distributed globally, making geographic generalization a central challenge. Prior work shows large performance drops of deep-learning-based downscaling models under geographic distribution shifts, effects that persist even when the training data are considerably increased [1].
We view the geographic distribution shift as a form of subpopulation shift, where training and target samples are drawn from the same set of geographic domains but differ in their sampling frequencies. Consequently, the shift is driven primarily by changes in the prevalence of climatic regimes, rather than by changes in the conditional relationship between predictors and targets.
To improve robustness under cross-region transfer, we inject additional geographic context through Earth embeddings from geospatial foundation models (e.g., SatCLIP [2]). Potential strategies for integrating these embeddings into diffusion-based downscaling models include attention-based conditioning, feature modulation, and auxiliary conditioning networks.
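One of the integration strategies mentioned above, feature modulation, can be sketched as FiLM-style conditioning, where a location embedding produces per-channel scale and shift applied to the model's feature maps. This is a minimal illustration with random stand-in weights and a random vector in place of an actual SatCLIP embedding, not the authors' architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def film_modulate(features, embedding, W_gamma, W_beta):
    """FiLM-style conditioning: the embedding produces a per-channel
    scale (gamma) and shift (beta) applied to the feature maps."""
    gamma = embedding @ W_gamma              # (C,)
    beta = embedding @ W_beta                # (C,)
    # broadcast over the spatial dimensions of (C, H, W) features
    return features * (1.0 + gamma)[:, None, None] + beta[:, None, None]

C, H, W, E = 8, 16, 16, 32                   # channels, height, width, embed dim
features = rng.normal(size=(C, H, W))        # hidden features of a downscaling net
embedding = rng.normal(size=E)               # stand-in for a location embedding
W_gamma = rng.normal(size=(E, C)) * 0.1      # hypothetical learned projections
W_beta = rng.normal(size=(E, C)) * 0.1

out = film_modulate(features, embedding, W_gamma, W_beta)
print(out.shape)  # (8, 16, 16)
```

With a zero embedding the modulation reduces to the identity, which makes this form of conditioning easy to initialize conservatively.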

[1] Harder, P., Schmidt, L., Pelletier, F., Ludwig, N., Chantry, M., Lessig, C., Hernandez-Garcia, A., and Rolnick, D. (2025). RainShift: A benchmark for precipitation downscaling across geographies. arXiv preprint arXiv:2507.04930.

[2] Klemmer, K., Rolf, E., Robinson, C., Mackey, L., and Rußwurm, M. (2025). SatCLIP: Global, general-purpose location embeddings with satellite imagery. Proceedings of the AAAI Conference on Artificial Intelligence, 39, 4347–4355.

How to cite: Schmidt, L., Lemaire, P.-L., Ludwig, N., Hernandez-Garcia, A., and Rolnick, D.: Leveraging Earth Embeddings for Generalizable Precipitation Downscaling Across Geographies, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14198, https://doi.org/10.5194/egusphere-egu26-14198, 2026.

X5.94
|
EGU26-15554
Bouchra Zellou, Fatiha Agdoud, and Hamza Ouatiki

Accurate forecasting of precipitation remains a central challenge in climate science, primarily due to the strong temporal and spatial variability of rainfall, a difficulty that is further intensified by the ongoing impacts of climate change. Recent developments in machine learning have facilitated the design of more accurate and robust predictive frameworks. In this context, the present study implements and evaluates three deep learning architectures, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Extended Long Short-Term Memory (xLSTM), to forecast monthly precipitation at 27 meteorological stations distributed across Morocco, for lead times ranging from 1 to 4 months. The models are trained using a heterogeneous set of large-scale climatic predictors, including sea surface temperature (SST) over the Atlantic Ocean and the Mediterranean Sea, the East Atlantic pattern (EA), the Madden–Julian Oscillation (MJO), the El Niño–Southern Oscillation (ENSO), the Mediterranean Oscillation (MO), the North Atlantic Oscillation (NAO), and the Western Mediterranean Oscillation (WeMO). To identify the most influential predictors at each station, a principal component analysis (PCA)-based feature selection procedure is implemented. The results indicate that precipitation variability across the study area is predominantly controlled by the MO, NAO, and WeMO indices. Probabilistic forecasts are then generated using Monte Carlo dropout, enabling the networks to approximate Bayesian inference and thereby quantify predictive uncertainty and associated confidence intervals. Relative to conventional LSTM and GRU configurations, the xLSTM architecture exhibits superior predictive performance across all stations and lead times, with notably reduced uncertainty, particularly in the representation of extreme precipitation events.
Overall, the models demonstrate robust skill in northern Morocco, with coefficients of determination (R²) ranging from 0.82 to 0.96 for a 1‑month lead time. However, predictive skill degrades toward the southern region, characterized by arid to semi-arid climatic conditions, where R² values decrease to 0.36–0.86. These results indicate that xLSTM effectively captures long-range temporal dependencies and low-frequency, high-intensity rainfall events, thereby representing a promising framework for improving probabilistic monthly precipitation forecasts in climatically heterogeneous regions such as Morocco.
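The Monte Carlo dropout idea used above for probabilistic forecasts can be sketched in a few lines: keep dropout active at inference, run many stochastic forward passes, and read the predictive mean and interval off the resulting sample. This toy example uses a random two-layer network and made-up predictors, not the authors' trained models:

```python
import numpy as np

rng = np.random.default_rng(42)

def forward(x, W1, W2, p_drop, rng):
    """One stochastic forward pass with dropout kept active at inference."""
    h = np.tanh(x @ W1)
    mask = rng.random(h.shape) > p_drop      # Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)            # inverted-dropout scaling
    return h @ W2

D, H = 6, 32                                  # toy "trained" network
W1 = rng.normal(size=(D, H)) / np.sqrt(D)
W2 = rng.normal(size=(H, 1)) / np.sqrt(H)
x = rng.normal(size=D)                        # stand-in lagged climate indices

# T stochastic passes approximate the predictive distribution
samples = np.array([forward(x, W1, W2, 0.2, rng).item() for _ in range(500)])
mean = samples.mean()
lo, hi = np.percentile(samples, [2.5, 97.5])  # 95% predictive interval
print(f"forecast {mean:.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```

The spread of the samples is what yields the confidence intervals reported per station and lead time.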

How to cite: Zellou, B., Agdoud, F., and Ouatiki, H.: Probabilistic Monthly Precipitation Forecasting over Morocco Using xLSTM and Large-Scale Climate Predictors, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-15554, https://doi.org/10.5194/egusphere-egu26-15554, 2026.

X5.95
|
EGU26-16552
|
ECS
Hauke Schulz, Joel Oskarsson, and Leif Denby

Machine learning–based weather prediction models have recently surpassed traditional numerical weather prediction systems on many skill metrics at regional and global scales, yet there is limited progress towards models operating on hectometric-scale resolutions. This setting is challenging both due to the cost of generating high-quality training data and the complex dynamics of important small-scale processes.

We introduce a graph neural network with Large-Eddy Simulation (LES) capabilities, designed to operate at hectometer horizontal resolution and sub-hourly time steps. Using 42 days of high-resolution realistic model output for the trade-wind regime over the western Atlantic, we train and evaluate the network on its ability to reproduce key mesoscale processes, with particular emphasis on cold-pool dynamics and convective triggering.

Cold pools are a crucial driver of low-level thermodynamic variability and cloudiness, and thus provide a stringent physical consistency test for models targeting hectometer scales, as they require accurate coupling between the cloud layer and the surface. Through a targeted ablation study, we quantify the relative importance of different input variables for reproducing surface temperature perturbations associated with cold pools, offering guidance for future parameterization and data selection strategies.

Finally, we show that the model can deterministically predict the evolution of cold pools over multiple successive generations, indicating that graph-based LES emulators can robustly capture the nonlinear feedbacks governing mesoscale organization in shallow convective regimes.

How to cite: Schulz, H., Oskarsson, J., and Denby, L.: ML-LES: Modeling cold-pool dynamics with graph-based neural network at hecto-meter grid-spacings, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16552, https://doi.org/10.5194/egusphere-egu26-16552, 2026.

X5.96
|
EGU26-18374
|
ECS
Paul Borne--Pons, Alistair Francis, Mikolaj Czerkawski, Jacqueline Campbell, and Barbara Bertozzi

The majority of supervised machine learning pipelines, particularly in the popular domains of natural language processing and computer vision, rely on manually annotated data. In geoscience applications, however, reference data are not necessarily derived from human annotation but could come as the output of explicit physical models or algorithms. These algorithms typically rely on simplifying hypotheses about the underlying physical processes and may be computationally expensive or applicable only to a limited subset of observations. In such circumstances, machine learning can be used to emulate explicit algorithms, with the objective of reproducing their outputs while potentially exploiting wider information pathways present in the data.

Beyond computational considerations, this hypothesis-light, data-driven framework allows for counterfactual testing by selectively removing input information and evaluating the model’s ability to recover similar predictions. For instance, in computer vision, color information can be removed by averaging RGB channels, while semantic or contextual information can be limited by progressively reducing the input patch size or by exploiting the inductive biases of different neural network architectures. In this way, one can identify additional cues in the input data linked to the physical property of interest, but also assess whether the model reproduces biases inherent to the reference algorithm. 

We explore this approach for high-resolution cloud-top height (CTH) estimation within the Clouds Decoded project, which uses Sentinel-2 (S2) multispectral observations (originally intended for land monitoring) to retrieve cloud properties. CTH can be estimated from S2 imagery using a stereo-based method that leverages the instrument’s geometry and inter-band delays. While effective, this approach is computationally demanding and relies on assumptions that restrict its applicability across diverse cloud scenes. We assess whether a neural network can learn to approximate this stereo-based CTH retrieval and analyse which textural, spectral, high-level semantic, or even geolocation-related cues the model might use to infer cloud height.

How to cite: Borne--Pons, P., Francis, A., Czerkawski, M., Campbell, J., and Bertozzi, B.: Machine learning emulation of stereo-based cloud-top height retrieval from Sentinel-2, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18374, https://doi.org/10.5194/egusphere-egu26-18374, 2026.

X5.97
|
EGU26-18613
|
ECS
Pascal Thiele, Katharina Baier, Kristofer Hasel, Theresa Schellander-Gorgas, Sebastian Lehner, Raphael Spiekermann, Jasmin Lampert, Annemarie Lexner, and Irene Schicker

The infrastructure for renewable energy production, and the electrical grid itself, is affected by weather and climate conditions and is vulnerable to high-impact weather and cascading events. A reliable representation of the meteorological conditions leading to such events, including their uncertainty, is therefore needed on both weather and climate time scales. Individual numerical weather and climate models exhibit systematic strengths and weaknesses across scales and geographic regions, despite differences in model physics and parametrizations. One way to tackle this, while avoiding single-model ensemble climate simulations, are multi-model or poor man's ensembles, which combine a set of different climate or weather models. Ensemble mixing offers a way to mitigate these weaknesses while providing uncertainty quantification, but simple ensemble averaging can dilute forecast and climate signals and penalize outliers and rare extremes. Different approaches have been proposed to tackle this problem by assigning non-uniform weights to individual model fields and parameters; however, these methods often rely on domain knowledge such as model dependencies [1,2].


Here, we propose a machine learning-based multi-model ensemble-mixing framework that is domain-agnostic and assigns spatially and temporally dynamic weights, in addition to an error metric. The domain of interest is the Alps, which exhibit challenging terrain and localized extreme events, e.g. precipitation extremes that are difficult to capture in conventional climate models. The CERRA reanalysis at ~5.5 km resolution serves as the target grid. We build a multi-model ensemble by combining dynamically downscaled simulations of 2 m air temperature, precipitation, and wind speed from COSMO-CLM (6 km) and WRF (10 km). Each regional model is driven by two CMIP6 global climate models (MPI-ESM and EC-Earth) under two scenarios (SSP1-2.6 and SSP5-8.5), with an additional historical period used for training. Static information such as orography, together with seasonal dependencies, is also considered. We evaluate the ensemble's performance on selected extreme events (e.g., heavy precipitation, windstorms, heatwaves) that can (and did) harm energy infrastructure, such as the 2022 European derecho.
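The weighted mixing step at the heart of such a framework can be sketched as follows: a model (here replaced by random stand-in logits) predicts spatially varying per-member weights, a softmax guarantees they sum to one at every grid cell, and the members are blended cell by cell. This is an illustrative sketch, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z, axis=0):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# 4 ensemble members on a small (lat, lon) grid, with different biases
members = rng.normal(size=(4, 10, 10)) + np.arange(4.0)[:, None, None]

# stand-in for ML-predicted, spatially varying weight logits
logits = rng.normal(size=(4, 10, 10))
weights = softmax(logits, axis=0)        # per-cell weights sum to 1

blend = (weights * members).sum(axis=0)  # dynamically weighted multi-model field
print(blend.shape)  # (10, 10)
```

Because the weights are a convex combination at each cell, the blended field always stays within the envelope spanned by the ensemble members there.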


[1] Christensen, J. H., Kjellström, E., Giorgi, F., Lenderink, G., and Rummukainen, M. (2010). Weight assignment in regional climate models. Climate Research, 44(2–3), 179–194. https://doi.org/10.3354/cr00916

[2] Merrifield, A. L., Brunner, L., Lorenz, R., Medhaug, I., and Knutti, R. (2020). An investigation of weighting schemes suitable for incorporating large ensembles into multi-model ensembles. Earth System Dynamics, 11(3), 807–834. https://doi.org/10.5194/esd-11-807-2020

How to cite: Thiele, P., Baier, K., Hasel, K., Schellander-Gorgas, T., Lehner, S., Spiekermann, R., Lampert, J., Lexner, A., and Schicker, I.: Machine learning based dynamic numerical climate multi-model ensemble weighting for high impact weather affecting the energy infrastructure, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18613, https://doi.org/10.5194/egusphere-egu26-18613, 2026.

X5.98
|
EGU26-19617
|
ECS
Cristina Radin, Moritz Mathis, Hongmei Li, and Tatiana Ilyina

Ocean physical and biogeochemical extremes, such as marine heatwaves (MHWs), deoxygenation, and acidification events, have significant impacts on the marine environment, ecosystems, and economic livelihoods. In recent decades, the frequency, intensity, and spatial extent of these extremes have been amplified (Capotondi et al., 2024; Shu et al., 2025; Gruber et al., 2021). Hence, a deeper understanding of the processes and precursors leading to extreme events remains crucial for improving risk assessment and forecasting.

In this study, we apply interpretable machine learning approaches to investigate which oceanic and atmospheric variables, as well as their lag effects, are most relevant for extreme events in the North Atlantic, a region where such events have been prominent in recent decades (England et al., 2025). Our framework combines high-resolution ocean model simulations with explainable artificial intelligence (XAI) techniques (He et al., 2024; Camps-Valls et al., 2025), allowing us to examine where, when, and which model variables are most important for identifying extreme events.

Rather than focusing on predictive skill, the emphasis of this study lies on identifying the underlying physics of precursor patterns leading to ocean extremes across different spatial and temporal scales. By integrating XAI into the analysis, this approach provides a more transparent and interpretable perspective on the decision-making processes of machine learning models, offering insights into the key variables and structures associated with the occurrence of ocean extremes. The outcomes of this study improve the interpretable assessment of potential precursors of MHWs, ocean deoxygenation and acidification extremes.
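One of the simplest XAI techniques of the kind described, occlusion sensitivity, can be sketched directly: mask one region of the input field at a time and record how much the detector's output changes. The toy "MHW detector" and the 2 K anomaly below are hypothetical stand-ins for illustration only:

```python
import numpy as np

def occlusion_map(model, field, patch=4, baseline=0.0):
    """Occlusion sensitivity: mask each patch of the input field and record
    how much the model output changes; large changes mark important regions."""
    ref = model(field)
    H, W = field.shape
    heat = np.zeros((H // patch, W // patch))
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            occluded = field.copy()
            occluded[i:i+patch, j:j+patch] = baseline
            heat[i // patch, j // patch] = abs(model(occluded) - ref)
    return heat

# toy detector: responds only to the warm anomaly in the top-left quadrant
def model(sst_anom):
    return sst_anom[:8, :8].mean()

field = np.zeros((16, 16))
field[:8, :8] = 2.0                      # hypothetical 2 K warm anomaly
heat = occlusion_map(model, field)
print(heat)                              # nonzero only over the anomaly region
```

The resulting heat map is one way to localize, in space, which parts of a field a learned detector relies on, which is the kind of precursor evidence the study targets.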


Camps-Valls, G., Fernández-Torres, M. Á., Cohrs, K. H., et al. (2025). Artificial intelligence for modeling and understanding extreme weather and climate events. Nature Communications, 16, 1919. https://doi.org/10.1038/s41467-025-56573-8

Capotondi, A., Rodrigues, R. R., Sen Gupta, A., et al. (2024). A global overview of marine heatwaves in a changing climate. Communications Earth & Environment, 5, 701. https://doi.org/10.1038/s43247-024-01806-9

England, M. H., Li, Z., Huguenin, M. F., et al. (2025). Drivers of the extreme North Atlantic marine heatwave during 2023. Nature, 642, 636–643. https://doi.org/10.1038/s41586-025-08903-5

Gruber, N., Boyd, P. W., Frölicher, T. L., et al. (2021). Biogeochemical extremes and compound events in the ocean. Nature, 600, 395–407. https://doi.org/10.1038/s41586-021-03981-7

He, Q., Zhu, Z., Zhao, D., Song, W., & Huang, D. (2024). An interpretable deep learning approach for detecting marine heatwave patterns. Applied Sciences, 14(2), 601. https://doi.org/10.3390/app14020601

Shu, R., Wu, H., Gao, Y., et al. (2025). Advanced forecasts of global extreme marine heatwaves through a physics-guided data-driven approach. Environmental Research Letters, 20(4). https://doi.org/10.1088/1748-9326/adbddd

How to cite: Radin, C., Mathis, M., Li, H., and Ilyina, T.: Explainable AI for Identifying Precursors of Extreme Oceanic Events in the North Atlantic, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19617, https://doi.org/10.5194/egusphere-egu26-19617, 2026.

X5.99
|
EGU26-19697
Alexandru Dumitrescu

High-resolution gridded climate datasets are essential for Earth system modelling and impact assessments, yet generating them from sparse, irregularly distributed station networks remains a significant challenge, particularly in regions with complex topography. This study evaluates the Spatial Multi-Attention Conditional Neural Process (SMACNP), a probabilistic deep learning framework, for the daily spatial interpolation of air temperature and precipitation, marking the first application of its localized encoder variant to the challenge of gridding climate data from a sparse station network. We investigate two distinct encoder configurations—Global and Localized—to determine the optimal structural prior for capturing spatial dependencies in data-scarce regimes. The models were developed and evaluated using data from a sparse network of meteorological stations in Romania from 2020 to 2023. To ensure applicability for long-term historical reconstruction, the input features were restricted to static topographic predictors derived from a Digital Elevation Model (DEM). Performance was benchmarked against Regression Kriging (RK), a standard geostatistical baseline that incorporates these same topographic covariates. Results demonstrate that the SMACNP architectures substantially outperform the RK baseline for both variables.

The SMACNP (Localized) configuration, which utilizes an attention mechanism, emerged as the most robust model, achieving the lowest Mean Absolute Error (MAE) and the highest correlation across the majority of seasons. The performance gains were particularly pronounced for precipitation, where the deep learning models effectively captured fine-scale spatial heterogeneity and non-linearities that traditional methods tended to over-smooth. These findings indicate that localized neural process-based models offer a powerful, scalable, and physically plausible alternative to geostatistical methods for generating high-quality gridded climate datasets in complex, data-sparse environments.
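The localized attention idea can be illustrated with a crude kernel-attention stand-in: each query grid point attends to nearby stations with softmax weights derived from distance, and the prediction is the attended average of station values. This is a fixed-kernel sketch, not the learned attention of the SMACNP, and all coordinates and scales are made up:

```python
import numpy as np

rng = np.random.default_rng(7)

def attentive_interpolate(xq, xc, yc, length_scale=50.0):
    """Localized attention over context stations: each query location attends
    to stations with weights from an RBF kernel over distance (a crude,
    non-learned stand-in for a neural-process attention encoder)."""
    d2 = ((xq[:, None, :] - xc[None, :, :]) ** 2).sum(-1)   # (Q, C) sq. dist
    logits = -d2 / (2.0 * length_scale ** 2)
    w = np.exp(logits - logits.max(1, keepdims=True))
    w /= w.sum(1, keepdims=True)                            # softmax attention
    return w @ yc                                           # attended prediction

xc = rng.uniform(0, 200, size=(30, 2))                      # station coords (km)
yc = 10.0 + 0.05 * xc[:, 0] + rng.normal(scale=0.3, size=30)  # temperatures
xq = np.array([[100.0, 100.0], [10.0, 10.0]])               # two grid points

pred = attentive_interpolate(xq, xc, yc)
print(pred.round(1))
```

In the actual model the kernel is replaced by learned attention, which is what lets the network adapt its effective neighbourhood to the local station density and terrain.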

This research was supported by the project “Cross-sectoral Framework for Socio-Economic Resilience to Climate Change and Extreme Events in Europe (CROSSEU)” funded by the European Union Horizon Europe Programme, under Grant agreement n° 101081377.

How to cite: Dumitrescu, A.: A deep learning framework for gridding daily climate variables from a sparse station network, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19697, https://doi.org/10.5194/egusphere-egu26-19697, 2026.

X5.100
|
EGU26-19718
|
ECS
Vitor Miranda, Maria Castro, João Paixão, Ines Girão, Bruno Marques, Rune Magnus Koktvedgaard Zeitzen, Rita Cunha, Caio Fonteneles, Élio Pereira, Manvel Khudynian, Peter Thejll, Hjalte Jomo Danielsen Sørup, Quentin Paletta, and Ana Patrícia Oliveira

As climate change intensifies, urban areas are increasingly exposed to more frequent, severe and longer-lasting temperature extremes, particularly heatwaves. This growing thermal amplitude represents a major challenge for highly urbanised and ageing societies, with direct consequences for public health, energy systems and social equity. Cities are especially vulnerable due to the Urban Heat Island effect, whereby land cover characteristics, urban morphology and reduced vegetation cover amplify thermal stress. Despite this vulnerability, effective local adaptation remains constrained by the limited availability of high-resolution operational air temperature data, to support early warning systems, urban planning, and scenario-based assessments. 

CLIM4cities is a European Space Agency (ESA)-funded project under the Artificial Intelligence Trustworthy Applications for Climate programme that applies Machine Learning (ML) techniques to downscale near-surface air temperature (T2m) and land surface temperature (LST) in urban environments. By integrating numerical weather prediction outputs, Earth Observation data, and quality-controlled crowdsourced observations, CLIM4cities provides sub-kilometric urban temperature information tailored to local decision-making needs. The project constitutes a key step towards the development of cost-effective Urban Climate and Weather components that are interoperable with local Digital Twin systems. 

During its first phase, CLIM4cities developed and evaluated coupled ML-based downscaling models for T2m and LST across four Danish metropolitan areas (Aalborg, Aarhus, Odense, and Copenhagen), demonstrating the feasibility and transferability of the proposed approach. For LST, Sentinel-3 thermal observations and vegetation-related predictors were employed within a scale-invariance downscaling approach, with independent validation using Landsat 8/9 data. Results show that while non-linear ML models can enhance predictive skill at coarser spatial scales, their performance at finer resolutions is limited by the breakdown of scale-invariance assumptions. Incorporating residual correction proved essential to recover fine-scale variability, whereas timestamp-specific linear models often outperformed more complex ML architectures. Model performance exhibits strong seasonal dependence, with the highest skill achieved in summer (R² ≈ 0.75), when reduced cloudiness and drier conditions enhance the representation of urban thermal patterns.

In contrast, T2m downscaling achieved its highest skill using comparatively simpler modelling approaches. Random Forest models consistently performed well across both spatial and temporal evaluation datasets, and increased model complexity did not yield substantial gains. Model performance was assessed under average conditions as well as during heatwave and cold-wave events, complemented by sensitivity analyses of key hyperparameters. The results indicate an R² of 0.98 under average conditions, remaining stable during heatwaves and decreasing marginally to 0.97 during cold events. Mean absolute errors below 1 K across all subsets confirm the robustness and operational suitability of the approach for monitoring urban-scale atmospheric temperature variability.

Building on these results, the ongoing CLIM4cities project extension focuses on replicating and validating the T2m ML framework across additional European metropolitan regions spanning diverse climatic and urban contexts. Case studies include Copenhagen, Athens, Seville, and Lisbon, enabling a systematic evaluation of model behaviour across climate zones. 

How to cite: Miranda, V., Castro, M., Paixão, J., Girão, I., Marques, B., Magnus Koktvedgaard Zeitzen, R., Cunha, R., Fonteneles, C., Pereira, É., Khudynian, M., Thejll, P., Jomo Danielsen Sørup, H., Paletta, Q., and Oliveira, A. P.: CLIM4cities - from Citizen Science, Machine Learning and Earth Observation towards Urban Climate Services, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19718, https://doi.org/10.5194/egusphere-egu26-19718, 2026.

X5.101
|
EGU26-19946
|
ECS
Alex Marshall, Chris Lucas, Nans Addor, Natalie Lord, Jorge Sebastian Moraga, Jannis Hoch, and Oliver Wing

The accurate assessment of extreme flood events and their associated losses requires massive sample sizes (e.g., 50,000+ years of weather data) for statistical robustness and a comprehensive coverage of event characteristics. Generating such a large dataset using dynamical Earth System Models would be extremely computationally intensive, so instead, we propose a lightweight and computationally efficient climate emulator built upon a video diffusion architecture. 

The model is trained to reproduce the statistical properties and physical dynamics of the Community Earth System Model version 2 (CESM2) over Europe. It operates autoregressively to generate synthetic, multivariate, daily atmospheric data (including temperature, specific humidity, wind vectors, and surface pressure) at ~100 km resolution. The model utilizes a U-Net architecture that is conditioned on previous time-steps to produce and evolve weather patterns with spatial and temporal consistency. To enhance the stability of long-term generation and improve the faithful reproduction of extremes, we employ a seasonality-aware standardization scheme, training the model to learn the dynamics in anomaly space rather than physical space.
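The seasonality-aware standardization described above, training in anomaly space rather than physical space, can be sketched with a toy daily series: compute a day-of-year climatology and standardize each day against it. The synthetic temperature series is illustrative only, not CESM2 output:

```python
import numpy as np

rng = np.random.default_rng(3)

# 40 years of daily temperature with an annual cycle (toy data)
days = np.arange(40 * 365)
doy = days % 365
temp = (10.0 + 12.0 * np.sin(2 * np.pi * doy / 365)
        + rng.normal(scale=2.0, size=days.size))

# day-of-year climatology: per-calendar-day mean and standard deviation
clim_mean = np.array([temp[doy == d].mean() for d in range(365)])
clim_std = np.array([temp[doy == d].std() for d in range(365)])

# standardize into anomaly space, where the emulator learns the dynamics;
# the inverse transform recovers physical units at generation time
anom = (temp - clim_mean[doy]) / clim_std[doy]
print(anom.mean().round(3), anom.std().round(3))
```

Removing the seasonal cycle this way means the generative model never has to spend capacity reproducing a known, deterministic signal, which helps long autoregressive rollouts stay stable.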

We demonstrate that this approach successfully reproduces the complex spatiotemporal dependencies within CESM2, captures atmospheric dynamics, including the frequency and persistence of dominant circulation types, and can maintain stability over multi-decadal generation windows. Furthermore, the output of this emulator can be fed into existing downscaling models to produce higher resolution multivariate meteorological data fields to drive downstream impact models. We validate this full modeling chain by demonstrating that the resulting hydrological statistics exhibit physical characteristics consistent with the CESM2-driven benchmark.

This computationally efficient generative model offers a pathway to generating thousands of years of physically consistent flood events. 

How to cite: Marshall, A., Lucas, C., Addor, N., Lord, N., Moraga, J. S., Hoch, J., and Wing, O.: Fast emulation of climate models for precipitation and flood impact modelling using autoregressive video diffusion, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19946, https://doi.org/10.5194/egusphere-egu26-19946, 2026.

X5.102
|
EGU26-20756
Sébastien Tétaud and Jean Marc Delouis

Remote sensing datasets for land cover classification are mostly distributed in the UTM projection, which introduces significant geometric distortions, particularly at high latitudes, and fails to respect the spherical geometry of the Earth. These distortions propagate into deep learning models trained on such data, leading to latitude-dependent biases, edge artifacts in tile-based processing, and poor generalization across geographic boundaries. While convolutional neural networks (CNNs) have achieved state-of-the-art performance on benchmark datasets like BigEarthNet, they operate on Euclidean grids and cannot naturally handle the structure of a sphere.

Here we introduce a comprehensive pipeline for transforming the BigEarthNet dataset—comprising 549,488 multispectral image patches—from its original UTM projection into the HEALPix (Hierarchical Equal Area isoLatitude Pixelization) representation. HEALPix, originally developed for cosmic microwave background analysis, offers equal-area partitioning of the sphere, ensuring uniform statistical treatment of pixels regardless of latitude, and provides a natural hierarchical structure for multi-resolution analysis.
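The equal-area property that motivates HEALPix can be verified numerically: a HEALPix tessellation at resolution parameter nside has exactly 12·nside² pixels of identical solid angle, whereas a regular lat-lon grid has cell areas that shrink with cos(latitude). A small check, using only the defining formula:

```python
import numpy as np

# HEALPix: npix = 12 * nside^2 pixels, all with identical solid angle
nside = 64
npix = 12 * nside**2
healpix_area = 4 * np.pi / npix          # steradians, the same for every pixel

# a regular 1-degree lat-lon grid, by contrast, has cells shrinking as cos(lat)
lats = np.deg2rad(np.arange(-89.5, 90.0, 1.0))   # cell-centre latitudes
dlat = dlon = np.deg2rad(1.0)
latlon_areas = np.cos(lats) * dlat * dlon

ratio = latlon_areas.max() / latlon_areas.min()
print(f"HEALPix cells: all {healpix_area:.2e} sr; "
      f"lat-lon cell areas vary by a factor of {ratio:.0f}")
```

This hundred-fold spread in cell area on a lat-lon grid is precisely the latitude-dependent bias the paper argues propagates into models trained on UTM or plate-carrée tiles.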

We implement and evaluate spherical CNN architectures, designed for data on spherical manifolds, against traditional planar CNN baselines (U-Net/ResNet50) trained on the HEALPix-transformed data, benchmarking classification performance for multi-label land cover prediction using the 19-class BigEarthNet nomenclature with metrics suited to imbalanced settings (macro/micro F1, precision, recall, average precision).

This work represents the first large-scale application of HEALPix projection to Remote Sensing classification and validates the effectiveness of spherical deep learning for real-world remote sensing beyond traditional climate science domains. Our experimental design employs matched training protocols and comparable model capacities, demonstrating that spherical representations eliminate projection-induced artifacts, enable seamless cross-boundary analysis, and provide rotation equivariance that reduces the need for extensive spatial data augmentation—key advantages for global-scale Earth observation applications.

How to cite: Tétaud, S. and Delouis, J. M.: BigEarthNet-HEALPix: Spherical CNNs for Land Cover Classification, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-20756, https://doi.org/10.5194/egusphere-egu26-20756, 2026.

X5.103
|
EGU26-4507
|
ECS
Shivanshi Asthana, Erwan Koch, Sven Kotlarski, and Tom Beucler

Regional Climate Models (RCMs) are vital for capturing mesoscale variability, yet they remain too coarse for impact assessments in complex topographies like Switzerland. In this study, we bridge the "km-scale gap" by introducing a generative super-resolution pipeline to downscale the EURO-CORDEX ensemble to a 1 km grid over Switzerland.

We establish the added value of both a deterministic, pixel-based residual U-Net and a generative residual Latent Diffusion model over operational baselines and conventional bias correction (BC) methods such as Cumulative Distribution Function transform (CDF-t), Empirical Quantile Mapping (EQM), and dynamical Optimal Transport Correction (dOTC). Our results demonstrate that super-resolved fields have superior distributional skill and visual fidelity, improved trend preservation, and better representation of interannual variability across diverse biogeographical regions and major population centres such as Bern, Zurich, and Locarno. Further, as demonstrated by a marked reduction in bias for 20-, 50-, and 100-year return levels of multi-day precipitation totals, super resolution (SR) also complements BC for an improved representation of extremes in our km-scale downscaled EURO-CORDEX. Our findings establish that while BC methods remain essential for distributional fidelity, residual generative models offer a potent, actionable pathway for producing high-resolution climate information from coarse climate fields.
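Of the BC baselines named above, Empirical Quantile Mapping is the simplest to sketch: build a transfer function from the model's historical quantiles to the observed quantiles, then apply it to future model output. The gamma-distributed toy "precipitation" below is illustrative, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(5)

def eqm(model_hist, obs, model_fut, n_quantiles=101):
    """Empirical quantile mapping: map each future model value to the
    observed value at the same empirical quantile of the historical period."""
    q = np.linspace(0.0, 1.0, n_quantiles)
    mq = np.quantile(model_hist, q)       # model quantiles (historical)
    oq = np.quantile(obs, q)              # observed quantiles
    return np.interp(model_fut, mq, oq)   # piecewise-linear transfer function

obs = rng.gamma(shape=2.0, scale=3.0, size=5000)         # "observed" precip
model_hist = rng.gamma(shape=2.0, scale=2.0, size=5000)  # dry-biased model
model_fut = rng.gamma(shape=2.0, scale=2.0, size=5000)

corrected = eqm(model_hist, obs, model_fut)
print(model_hist.mean().round(1), obs.mean().round(1), corrected.mean().round(1))
```

Note that `np.interp` clips values beyond the historical range to the end quantiles, which is one reason plain EQM struggles with unprecedented extremes and why the abstract pairs it with super resolution.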

How to cite: Asthana, S., Koch, E., Kotlarski, S., and Beucler, T.: Next-Generation Climate Projections: Insights from Blending Bias Correction with Super Resolution over Complex Terrain, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-4507, https://doi.org/10.5194/egusphere-egu26-4507, 2026.

X5.104
|
EGU26-13326
|
ECS
Nathan Mankovich, Andrei Gavrilov, Feini Huang, Gustau Camps-Valls, Fangfei Lan, and Alejandro Bodas-Salcedo

Cloud feedback is one of the key sources of uncertainty in the sensitivity of climate projections to anthropogenic forcing in Earth system models (ESMs). Improving its representation remains challenging because clouds sit at the intersection of radiation, dynamics, and microphysics, and small errors in any of these can strongly affect climate sensitivity. Consequently, analysing and understanding errors in simulated cloud feedback, evaluated against observations, is essential for advancing cloud parameterizations in ESMs.

In this work, we explore methodological frameworks for evaluating cloud feedback in climate models that move beyond simple model–observation comparisons toward physically interpretable insights into model properties and dynamics. We propose two advances: (1) improved cloud regime identification by extending standard k-means clustering to Wasserstein k-means, and (2) the use of explainable machine-learning methods to evaluate the extent to which ESMs capture the realistic sensitivity of cloud radiative anomalies to key cloud-controlling factors. We demonstrate these approaches by evaluating different versions of the HadGEM model in AMIP experiments against observations, illustrating their potential to support a more physically grounded diagnosis of cloud-feedback behaviour in climate models.
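The Wasserstein k-means extension can be sketched in one dimension, where the Wasserstein-1 distance between equal-size samples is just the mean absolute difference of the sorted values, and cluster centroids are averages of quantile functions. The two toy "regimes" below differ only in spread, which a mean-based distance cannot separate; details such as the naive initialization are illustrative, not the authors' algorithm:

```python
import numpy as np

rng = np.random.default_rng(11)

def wasserstein_kmeans(samples, k, iters=10):
    """k-means over 1-D empirical distributions: assignment by the 1-D
    Wasserstein-1 distance (mean |sorted a - sorted b|), centroids as
    quantile-function averages (1-D Wasserstein barycenters)."""
    sorted_s = np.sort(samples, axis=1)
    centers = sorted_s[:: max(1, len(samples) // k)][:k]   # naive spread init
    for _ in range(iters):
        dists = np.abs(sorted_s[:, None, :] - centers[None, :, :]).mean(-1)
        labels = dists.argmin(1)
        centers = np.array([sorted_s[labels == j].mean(0) for j in range(k)])
    return labels

# two toy "cloud regimes" with identical means but different spread
narrow = rng.normal(0.0, 0.5, size=(20, 200))
wide = rng.normal(0.0, 3.0, size=(20, 200))
samples = np.vstack([narrow, wide])

labels = wasserstein_kmeans(samples, k=2)
print(labels)   # the two regimes fall into separate clusters
```

Because W1 compares entire quantile functions rather than a single summary statistic, regimes distinguished by their full distribution of, say, cloud-top properties become separable.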

How to cite: Mankovich, N., Gavrilov, A., Huang, F., Camps-Valls, G., Lan, F., and Bodas-Salcedo, A.: Explainable Cloud Feedback Evaluation in Earth System Models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13326, https://doi.org/10.5194/egusphere-egu26-13326, 2026.

X5.105
|
EGU26-2500
Kalle Nordling

Exploring uncertainty and internal variability across future emission pathways remains computationally demanding with state-of-the-art Earth system models (ESMs). We present a diffusion-based machine-learning emulator trained on output from the CESM2 Large Ensemble to reproduce absolute annual-mean temperature and year-to-year variability, conditioned on anthropogenic CO₂ and sulfate emissions from the SSP3-7.0 scenario. The emulator employs a three-dimensional U-Net architecture that learns the spatiotemporal distribution of global temperature fields in latitude–longitude–time space. Conditioning variables include cumulative CO₂ and aerosol emissions, enabling the generation of physically consistent climate responses under arbitrary emission trajectories. To enhance physical interpretability, we integrate explainable AI (XAI) methods, including gradient-based attribution and sensitivity analyses, to quantify how emission-related conditioning variables influence regional temperature responses. The emulator reduces computational cost by several orders of magnitude compared to full ESM simulations, enabling rapid scenario exploration and uncertainty assessment. This framework provides a scalable and interpretable pathway for fast climate-response emulation.

How to cite: Nordling, K.: Emulating absolute annual temperatures and variability from the CESM2 Large Ensemble using a diffusion model, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-2500, https://doi.org/10.5194/egusphere-egu26-2500, 2026.

X5.106
|
EGU26-10055
|
ECS
Homer Durand, Gherardo Varando, and Gustau Camps-Valls

Statistical causality methods are becoming increasingly widespread in climate teleconnection analysis, but they typically require a prior reduction of high-dimensional, multivariate climate fields. Most common aggregation techniques, such as spatial averaging or Principal Component Analysis (PCA) (widely known as Empirical Orthogonal Functions, EOF, in the climate community) [1], are not designed to preserve causal structure and can mask spatially complex or low-variance causal signals.

We introduce Granger PCA [2], a novel dimensionality reduction method that explicitly extracts components influenced by a causal driver. Instead of maximizing variance, Granger PCA identifies spatial patterns whose associated time series are maximally Granger-caused by an external variable, such as a large-scale climate mode. This is achieved by optimizing spatial weights to maximize the Granger causality F-statistic and yields a low-dimensional representation that captures the Granger-causal information present in the field.
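The weight optimization can be sketched as follows. This is an illustrative reimplementation, not the authors' code: it maximizes a lag-1 Granger F-statistic over unit-norm spatial weights with a generic optimizer, on a toy field whose first, low-variance pixel is driven by the external variable z while the other pixels are high-variance noise.

```python
import numpy as np
from scipy.optimize import minimize

def granger_f(y, z, p=1):
    """F-statistic of the test 'z Granger-causes y' with lag order p."""
    T = len(y)
    Y = y[p:]
    lag = lambda s: np.column_stack([s[p - 1 - i:T - 1 - i] for i in range(p)])
    Xr = np.column_stack([lag(y), np.ones(T - p)])          # restricted: y lags only
    Xf = np.column_stack([lag(y), lag(z), np.ones(T - p)])  # full: add z lags
    rss = lambda X: np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
    return ((rss(Xr) - rss(Xf)) / p) / (rss(Xf) / (T - p - Xf.shape[1]))

def granger_component(X, z, p=1, seed=0):
    """Unit-norm spatial weights w maximizing the F-stat of z -> X @ w."""
    rng = np.random.default_rng(seed)
    obj = lambda w: -granger_f(X @ (w / np.linalg.norm(w)), z, p)
    res = minimize(obj, rng.standard_normal(X.shape[1]), method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-6})
    return res.x / np.linalg.norm(res.x), -res.fun

# Toy field: 4 high-variance noise pixels, one low-variance pixel driven by z.
rng = np.random.default_rng(1)
T = 400
z = rng.standard_normal(T)
X = 2.0 * rng.standard_normal((T, 5))
X[1:, 0] = 0.5 * z[:-1] + 0.05 * rng.standard_normal(T - 1)
X[0, 0] = 0.0
w, f = granger_component(X, z)
```

Variance-based PCA would load on the noisy pixels here; the F-statistic objective instead concentrates the weights on the causally driven, low-variance pixel.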

The method is particularly effective in cases where causal effects are spatially heterogeneous, have low variance, or are hidden by strong local autocorrelation. In such cases, variance-based methods can fail even when robust causal influence exists.

We apply Granger PCA to several teleconnection problems, including the influence of the North Atlantic Oscillation on precipitation and the impact of ENSO on vegetation variability. Granger PCA recovers physically interpretable patterns that are not captured by PCA or correlation-based approaches.

In summary, Granger PCA provides a simple and interpretable framework for causally oriented dimensionality reduction and offers a new tool for teleconnection analysis in climate science.

References

  • [1] A. Hannachi, I. T. Jolliffe, D. B. Stephenson et al., “Empirical orthogonal functions and related techniques in atmospheric science: A review,” International Journal of Climatology, vol. 27, no. 9, pp. 1119–1152, 2007.
  • [2] G. Varando, M.-Á. Fernández-Torres, J. Muñoz-Marí, and G. Camps-Valls, “Learning causal representations with Granger PCA,” in UAI 2022 Workshop on Causal Representation Learning, 2022.

How to cite: Durand, H., Varando, G., and Camps-Valls, G.: Granger PCA: Extracting Granger-causal patterns in climate fields, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-10055, https://doi.org/10.5194/egusphere-egu26-10055, 2026.

X5.107
|
EGU26-4514
Sandip Dhomse and Martyn Chipperfield

Understanding long-term trends in stratospheric species is vital for evaluating the success of the Montreal Protocol and its amendments. However, reliable trend estimation remains challenging due to the sparse spatial and temporal coverage of high-quality observations, such as those from the Atmospheric Chemistry Experiment–Fourier Transform Spectrometer (ACE-FTS).

To overcome this limitation, we present an innovative machine learning framework that fuses ACE-FTS observations with the continuous output of the TOMCAT global Chemical Transport Model (CTM). Using XGBoost regression, we constrain TOMCAT tracers against co-located ACE-FTS measurements, generating the TCOM (TOMCAT CTM and occultation-measurement-based) stratospheric profile datasets for key species: CFC-11, CFC-12, HCl, HF, HNO3, O3, CH4, N2O, and H2O.

The latest TCOM release (version 2.0) provides gap-free, global daily vertical profiles from 2000 to 2024. Validation demonstrates substantial improvements over TOMCAT, including the removal of systematic low biases in simulated CFC concentrations. Interpretable machine learning analysis reveals that XGBoost primarily acts as a “transport corrector,” with dynamical features such as Age-of-Air, temperature, and long-lived tracers exerting the greatest influence. This finding highlights that circulation biases dominate TOMCAT’s baseline errors.

TCOM datasets are publicly available and offer an observationally constrained benchmark for refining chemical models, improving stratospheric transport representation, and reducing uncertainties in ozone-depleting substance (ODS) trend analyses.

Dataset links:

  • CFC-11 v2: https://doi.org/10.5281/zenodo.18145730
  • CFC-12 v2: https://doi.org/10.5281/zenodo.18147392
  • CH4 v2: https://doi.org/10.5281/zenodo.18197333
  • N2O v2: https://doi.org/10.5281/zenodo.18197444
  • HCl v2: https://doi.org/10.5281/zenodo.18184430
  • HF v2: https://doi.org/10.5281/zenodo.18184779
  • HNO3 v2: https://doi.org/10.5281/zenodo.18199002
  • O3 v2: https://doi.org/10.5281/zenodo.18199586
  • H2O v2: https://doi.org/10.5281/zenodo.18199962
  • COF2 v2: https://doi.org/10.5281/zenodo.18201786

How to cite: Dhomse, S. and Chipperfield, M.: Machine Learning For Atmospheric Chemistry: Creating Global, Gap-Free Stratospheric Datasets for Montreal Protocol Assessments, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-4514, https://doi.org/10.5194/egusphere-egu26-4514, 2026.

X5.108
|
EGU26-821
|
ECS
Rosa Lyngwa, Akshaya Nikumbh, and Subimal Ghosh

Generating high-resolution (HR) weather and climate information at ~10 km or finer across the Himalayan regions remains a major challenge due to the extremely high computational cost of forecasting models and the complexity of atmospheric processes. Most operational global weather prediction systems run at a low resolution (LR) of ~25 km or coarser, which is inadequate for impact-based analyses of the highly localized extreme weather events common to these regions. To bridge this gap, downscaling is essential for producing climate information at impact-relevant scales, with both statistical and dynamical approaches remaining widely used despite major shortcomings. The former is computationally efficient but often fails under future climate non-stationarity, while the latter, though physically consistent, is computationally expensive and constrained by domain-resolution trade-offs. Currently, there is no efficient data-driven approach that can produce regional-model-scale precipitation fields for the Himalayan region. This work presents WGAN, a deterministic deep generative adversarial network (GAN)-based emulator of the Weather Research and Forecasting (WRF) model for HR precipitation downscaling over the Himalayan region. The model is conditioned on LR meteorological variables from the European Centre for Medium-Range Weather Forecasts Reanalysis version 5 (ERA5; 0.25°×0.25°) as input and is trained against HR precipitation from WRF (0.1°×0.1°), which uses ERA5 as boundary conditions. The architecture uses the Wasserstein-1 distance (WGAN) in the generator and critic value functions with a gradient penalty for stable training. WGAN demonstrated the ability to generate fine-scale precipitation fields that closely match WRF's outputs, accurately capturing spatial patterns and mean values.
Incorporating terrain and an extreme-aware weighted MSE (mean squared error) loss function in the model further improves precipitation magnitude representation, reduces biases, and yields a ~29% reduction in RMSE in the upper decile. The model effectively captures low-frequency (large-scale) variability and better matches WRF's power spectrum at mid-to-high frequencies (short-scale variability). This raises the probability of detection and lowers the false alarm rate across thresholds. In a case study, WGAN captured the fine-scale spatial distribution of precipitation in the mountains and foothills, on both an extreme-precipitation day and under dry conditions, outperforming CNN-based precipitation outputs. These results underscore the capability of WGAN as a fast and efficient tool for precipitation downscaling over the Himalayan region, operating at only a fraction of the computational cost. The model has strong potential for operational use in early warning, risk assessment, vulnerability analysis, disaster management, and other sectors that rely on localized climate information, ultimately supporting the preparedness of communities living in and around these mountains.
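One way to realize an extreme-aware weighted MSE is to boost the squared error wherever the target exceeds a high quantile; the abstract does not specify the authors' exact weighting scheme, so the threshold, boost factor, and toy values below are hypothetical.

```python
import numpy as np

def extreme_weighted_mse(pred, target, q=0.9, boost=4.0):
    """MSE with errors above the q-quantile of the target up-weighted by `boost`."""
    thr = np.quantile(target, q)
    w = np.where(target >= thr, boost, 1.0)
    return np.mean(w * (pred - target) ** 2)

# Heavy-tailed toy precipitation (mm) with a uniform 1 mm underestimate.
target = np.array([0.0, 1.0, 2.0, 3.0, 20.0])
pred = target - 1.0
loss = extreme_weighted_mse(pred, target)   # > plain MSE: the 20 mm point counts 4x
```

Penalizing upper-decile errors more strongly pushes the generator away from the smooth, drizzle-biased fields that plain MSE tends to reward.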

How to cite: Lyngwa, R., Nikumbh, A., and Ghosh, S.: A Generative-driven Model for Precipitation Downscaling Over Himalayan Region, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-821, https://doi.org/10.5194/egusphere-egu26-821, 2026.

X5.109
|
EGU26-7678
|
ECS
Andrei Gavrilov, Nathan Mankovich, Moritz Link, Feini Huang, and Gustau Camps-Valls

Earth system model (ESM) intercomparison is essential for assessing model performance and identifying future challenges in climate modeling. The Taylor diagram [1] is one of the most widely used tools for this purpose, as it provides an intuitive summary of standard evaluation metrics — such as correlation, root-mean-square error, and standard deviation — by comparing multiple simulated datasets against a reference, typically observations or a ground truth, within a single plot.

However, in several relevant applications, including the development of new ESM parameterizations, the comparison of conceptual models, or the evaluation of simulated statistical distributions, classic linear correlation and RMSE metrics may be insufficient. Here, we propose a set of extensions to the Taylor diagram based on a generalization of cross-covariance using kernels, allowing both nonlinear relationships and distributional aspects of similarity to be taken into account. Nonlinear similarity is characterized through a kernel-space analogue of rotational alignment, while distributional similarity can be quantified using metrics such as maximum mean discrepancy, as originally introduced in [2], as well as alternative kernel-based measures. Using controlled synthetic experiments, we show that the proposed kernel Taylor diagrams can resolve differences in model skill that remain indistinguishable under the classical Taylor diagram. These results indicate that the kernel-based extensions provide complementary diagnostic information to standard metrics and can support more informative Earth system model evaluation and development.
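The distributional measure mentioned above, maximum mean discrepancy (MMD), can be sketched in a few lines. The biased RBF-kernel estimator and toy samples below are illustrative only, not the diagram construction of [2].

```python
import numpy as np

def mmd2(x, y, gamma=1.0):
    """Biased estimate of squared MMD with RBF kernel k(a,b)=exp(-gamma*|a-b|^2)."""
    k = lambda a, b: np.exp(-gamma * ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

# Two simulated "models" against a reference: same vs shifted distribution.
rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, size=(200, 1))
model_same = rng.normal(0.0, 1.0, size=(200, 1))
model_shifted = rng.normal(2.0, 1.0, size=(200, 1))
```

A model matching the reference in correlation and standard deviation can still differ in higher moments; MMD registers such differences, which is what motivates placing it alongside the classical Taylor statistics.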

[1] Taylor, K. E. (2001), Summarizing multiple aspects of model performance in a single diagram, J. Geophys. Res., 106(D7), 7183–7192, doi:10.1029/2000JD900719.

[2] Wickstrøm, K., Johnson, J. E., Løkse, S., Camps-Valls, G., Mikalsen, K. Ø., Kampffmeyer, M., & Jenssen, R. (2022). The Kernelized Taylor Diagram. doi:10.48550/arXiv.2205.08864

How to cite: Gavrilov, A., Mankovich, N., Link, M., Huang, F., and Camps-Valls, G.: Kernel Taylor Diagram for Earth System Model Evaluation, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7678, https://doi.org/10.5194/egusphere-egu26-7678, 2026.
