AS5.1 | AI for Weather and Climate: Opportunities and Challenges
Co-organized by CL5/ESSI1/NP5
Convener: Sebastian Engelke | Co-conveners: Erich Fischer, Pedram Hassanzadeh (ECS), Tim Whittaker (ECS)
Orals | Thu, 07 May, 10:45–12:30 (CEST) | Room M2
Posters on site | Attendance Thu, 07 May, 08:30–10:15 (CEST) | Display Thu, 07 May, 08:30–12:30 | Hall X5
Posters virtual | Wed, 06 May, 14:12–15:45 (CEST) | vPoster spot 5
vPoster Discussion | Wed, 06 May, 16:15–18:00 (CEST)
The recent revolution in data-driven forecasting systems based on artificial intelligence (AI) has opened new research possibilities in weather forecasting, climate science, and various other areas. At the same time, many open questions remain, such as how to properly evaluate the model outputs in terms of generalizability under climate change, whether models extrapolate to unseen extremes, and to what extent they are consistent with physical principles. This session focuses on new scientific approaches emerging from this AI revolution, limitations of current models, and strategies to overcome them. We encourage submissions that explore a wide range of topics, including evaluations of outputs and comparisons to numerical models, technical advancements in initial condition optimization or model fine-tuning, novel techniques from explainable AI, and other relevant studies. Bringing together experts from AI, climate sciences, statistics, and applied math will foster interdisciplinary collaborations and guide scientific progress in this quickly evolving field of research.

Orals: Thu, 7 May, 10:45–12:30 | Room M2

The oral presentations are given in a hybrid format supported by a Zoom meeting featuring on-site and virtual presentations. The button to access the Zoom meeting appears just before the time block starts.
Chairpersons: Sebastian Engelke, Tim Whittaker
10:45–10:50
10:50–11:00 | EGU26-8870 | On-site presentation
Gregory Hakim

Recently developed AI weather models have been widely recognized for revolutionizing weather prediction, producing forecasts more skillful than traditional models at a fraction of the computational cost. Here I will argue that the next phase of the revolution involves the adjoints of these models, applied to a wide range of problems, including novel exploration of dynamical processes in weather and climate variability, extreme events, and new data assimilation systems. Adjoints are derived from gradient operations on the forward model, and are useful for measuring the sensitivity of model outputs to inputs and parameters. Historically, adjoints have been derived for a limited set of traditional models, and mainly applied to problems in data assimilation. The ubiquitous availability of adjoints for AI models makes these tools easily accessible and available for a much wider range of applications. Specific examples I will discuss include shadowing trajectories for predictability, "gray swans" and a factory for out-of-sample extreme events, and mechanistic interpretability of specific phenomena.
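
As a rough illustration of what an adjoint computes, the sketch below propagates the gradient of a forecast quantity backwards through a toy model and checks it against finite differences. The Lorenz-63 system with forward-Euler steps is an assumed stand-in for a forecast model, and all function names are hypothetical; none of this is taken from the talk. In an AI model the same backward sweep would typically come from the framework's automatic differentiation.

```python
# Toy adjoint sensitivity: Lorenz-63 with forward-Euler steps stands in for
# a forecast model (an illustrative assumption, not a model from the talk).

SIGMA, RHO, BETA, DT = 10.0, 28.0, 8.0 / 3.0, 0.01


def step(s):
    """One forward-Euler step of Lorenz-63."""
    x, y, z = s
    return (x + DT * SIGMA * (y - x),
            y + DT * (x * (RHO - z) - y),
            z + DT * (x * y - BETA * z))


def step_jacobian(s):
    """Jacobian of `step` with respect to the state, as a 3x3 nested list."""
    x, y, z = s
    return [[1 - DT * SIGMA, DT * SIGMA, 0.0],
            [DT * (RHO - z), 1 - DT, -DT * x],
            [DT * y, DT * x, 1 - DT * BETA]]


def forecast(s0, n_steps):
    """Roll the model forward, keeping the trajectory for the backward sweep."""
    traj = [tuple(s0)]
    for _ in range(n_steps):
        traj.append(step(traj[-1]))
    return traj


def adjoint_sensitivity(s0, n_steps):
    """Gradient of J = x(final) with respect to the initial state."""
    traj = forecast(s0, n_steps)
    lam = [1.0, 0.0, 0.0]  # dJ/d(final state) for J = x(final)
    for s in reversed(traj[:-1]):
        jac = step_jacobian(s)
        # Adjoint update: lam <- (Jacobian transposed) applied to lam
        lam = [sum(jac[i][j] * lam[i] for i in range(3)) for j in range(3)]
    return lam


def finite_difference(s0, n_steps, eps=1e-6):
    """Central finite-difference estimate of the same gradient, as a check."""
    grad = []
    for k in range(3):
        plus, minus = list(s0), list(s0)
        plus[k] += eps
        minus[k] -= eps
        grad.append((forecast(plus, n_steps)[-1][0]
                     - forecast(minus, n_steps)[-1][0]) / (2 * eps))
    return grad


s0 = (1.0, 1.0, 1.0)
grad_adj = adjoint_sensitivity(s0, 50)
grad_fd = finite_difference(s0, 50)
```

The adjoint needs one forward and one backward pass regardless of the state dimension, whereas the finite-difference check needs two forward runs per input component.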

How to cite: Hakim, G.: Using Adjoints of AI-based Weather Models to Study Predictability and Extreme Events, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8870, https://doi.org/10.5194/egusphere-egu26-8870, 2026.

11:00–11:10 | EGU26-17600 | On-site presentation
Freddy Bouchet, Dorian Abbot, Laurent Dubus, Pedram Hassanzadeh, Amaury Lancelin, Jonathan Weare, Peter Werner, and Alexander Wikner

In the climate system, extreme events and tipping points (transitions between climate attractors) are of primary importance for understanding the impacts of climate change and for designing effective adaptation and mitigation strategies. Recent extreme heat waves with severe societal consequences, as well as prolonged periods of very low renewable energy production in electricity systems, are striking examples. A key challenge in studying such phenomena is the lack of available data: these events are inherently rare, and realistic climate models are computationally expensive and highly complex. This data scarcity severely limits the applicability of traditional approaches, whether based on modelling, physics, or statistical analysis.

In this talk, I will present new algorithms and theoretical approaches based on rare-event simulations, climate-model emulators, machine-learning methods for stochastic processes, and an up-to-date blend of data and model output to estimate generalized extreme value (GEV) distributions. These methods are specifically designed to predict the probability that an extremely rare event will occur, to produce huge catalogues of dynamical trajectories leading to the event, and to use the best available historical and model data. The rare-event simulation/emulator approach combines, on the one hand, state-of-the-art AI-based emulators that reproduce the full atmospheric dynamics of climate models, and, on the other hand, rare-event simulation techniques that reduce by several orders of magnitude the computational cost of sampling extremely rare events. In parallel, the Bayesian GEV approach mixes information from historical observations and CMIP model output to produce the best possible estimate of extreme event probabilities.
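
For readers less familiar with the GEV distribution, the following minimal sketch shows how its three parameters translate into an exceedance probability and a return level. It covers only the standard GEV formulas, not the Bayesian estimation described in the abstract, and the parameter values (`MU`, `SIGMA`, `XI`) are made up for illustration.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV cumulative distribution function with location mu, scale sigma, shape xi."""
    if xi == 0.0:
        return math.exp(-math.exp(-(x - mu) / sigma))  # Gumbel limit
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_return_level(return_period, mu, sigma, xi):
    """Level exceeded on average once per `return_period` blocks (e.g. years)."""
    y = -math.log(1.0 - 1.0 / return_period)
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)


# Hypothetical parameters for annual maximum temperature in degrees Celsius.
MU, SIGMA, XI = 30.0, 2.0, -0.2

level_100 = gev_return_level(100, MU, SIGMA, XI)
exceedance_prob = 1.0 - gev_cdf(level_100, MU, SIGMA, XI)
```

With a negative shape parameter, as assumed here, the distribution has a finite upper bound at `MU - SIGMA / XI`, which is one reason the shape estimate matters so much for very rare events.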

To illustrate the performance of these tools, I will present results on midlatitude extreme heat waves and on extremes of renewable energy production, with a particular focus on their implications for the resilience of electricity systems.

How to cite: Bouchet, F., Abbot, D., Dubus, L., Hassanzadeh, P., Lancelin, A., Weare, J., Werner, P., and Wikner, A.: Rare event simulations, emulators, machine learning, and Bayesian GEV estimation, for predicting extreme heat waves and extremes of renewable electricity production, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-17600, https://doi.org/10.5194/egusphere-egu26-17600, 2026.

11:10–11:20 | EGU26-2971 | On-site presentation
Sebastian Lerch

Artificial intelligence (AI)-based data-driven weather prediction (AIWP) models have progressed rapidly in recent years. They achieve impressive results and demonstrate substantial improvements over state-of-the-art physics-based numerical weather prediction (NWP) models across a range of variables and evaluation metrics. However, most efforts in data-driven weather forecasting have been limited to deterministic, point-valued predictions, making it impossible to quantify forecast uncertainties, which is crucial in research and for optimal decision making in applications.

I will present recent work on uncertainty quantification (UQ) methods in the context of data-driven weather prediction. The post-hoc use of UQ methods enables the generation of skillful probabilistic weather forecasts from a state-of-the-art deterministic AIWP model [1]. Further, by subjecting the deterministic backbone of physics-based and data-driven models post hoc to the same UQ technique, and computing the in-sample mean continuous ranked probability score of the resulting forecast, we propose a new measure that enables fair and meaningful comparisons of single-valued output from AIWP and NWP models, called potential continuous ranked probability score [2].
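
The continuous ranked probability score (CRPS) mentioned above can be sketched for an ensemble forecast as follows. This is the standard empirical estimator, not the potential CRPS proposed in [2], and the function names and sample numbers are illustrative assumptions.

```python
def crps_ensemble(members, obs):
    """Empirical CRPS of an ensemble forecast for one scalar observation.

    CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|, lower is better.
    """
    m = len(members)
    accuracy = sum(abs(x - obs) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return accuracy - spread


def mean_crps(forecasts, observations):
    """Average CRPS over a set of forecast cases."""
    scores = [crps_ensemble(f, y) for f, y in zip(forecasts, observations)]
    return sum(scores) / len(scores)


# Hypothetical 2 m temperature ensembles (degrees Celsius) for two cases.
forecasts = [[0.0, 2.0], [14.0, 15.0, 16.0]]
observations = [1.0, 15.0]
score = mean_crps(forecasts, observations)
```

For a single-member (deterministic) forecast the spread term vanishes and the CRPS reduces to the absolute error, which is what makes the score usable for comparing probabilistic and point forecasts on one scale.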

References

[1] Bülte, C., Horat, N., Quinting, J. and Lerch, S. (2025). Uncertainty quantification for data-driven weather models. Artificial Intelligence for the Earth System, in press. DOI:10.1175/AIES-D-24-0049.1

[2] Gneiting, T., Biegert, T., Kraus, K., Walz, E.-M., Jordan, A. I., and Lerch, S. (2025). Probabilistic measures afford fair comparisons of AIWP and NWP model output. Preprint, arXiv:2506.03744. DOI:10.48550/arXiv.2506.03744

How to cite: Lerch, S.: Uncertainty quantification for data-driven weather prediction: From probabilistic forecasts to fair model comparisons, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-2971, https://doi.org/10.5194/egusphere-egu26-2971, 2026.

11:20–11:30 | EGU26-4301 | Highlight | On-site presentation
Zhongwei Zhang, Erich Fischer, Jakob Zscheischler, and Sebastian Engelke

Artificial intelligence (AI)-based models are revolutionizing weather forecasting and have surpassed leading numerical weather prediction systems on various benchmark tasks. However, their ability to extrapolate and reliably forecast unprecedented extreme events remains unclear. Here, we show that for record-breaking weather extremes, the numerical model High RESolution forecast (HRES) from the European Centre for Medium-Range Weather Forecasts still consistently outperforms state-of-the-art AI models GraphCast, GraphCast operational, Pangu-Weather, Pangu-Weather operational, and Fuxi. We demonstrate that forecast errors in AI models are consistently larger for record-breaking heat, cold, and wind than in HRES across nearly all lead times. We further find that the examined AI models tend to underestimate both the frequency and intensity of record-breaking events, and they underpredict hot records and overestimate cold records with growing errors for larger record exceedance. Our findings underscore the current limitations of AI weather models in extrapolating beyond their training domain and in forecasting the potentially most impactful record-breaking weather events that are particularly frequent in a rapidly warming climate. Further rigorous verification and model development are needed before these models can be solely relied upon for high-stakes applications such as early warning systems and disaster management.
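
To make "record-breaking" and "record exceedance" concrete, here is a minimal sketch that finds hot records in a single time series and computes the mean forecast bias on those dates. The abstract does not give the study's exact record definition, so the simple running-maximum rule, the function names, and the sample values are assumptions.

```python
def hot_records(series):
    """Return (index, exceedance margin) for each value above all earlier values."""
    records = []
    running_max = series[0]
    for t, value in enumerate(series[1:], start=1):
        if value > running_max:
            records.append((t, value - running_max))
            running_max = value
    return records


def record_bias(observed, forecast):
    """Mean forecast-minus-observed error on the dates of observed hot records."""
    indices = [t for t, _ in hot_records(observed)]
    if not indices:
        return None
    return sum(forecast[t] - observed[t] for t in indices) / len(indices)


# Hypothetical annual maximum temperatures (degrees Celsius) at one location.
observed = [30.0, 29.0, 31.5, 31.0, 33.0]
forecast = [30.0, 29.0, 31.0, 31.0, 32.0]

records = hot_records(observed)
bias = record_bias(observed, forecast)
```

A negative bias on record dates corresponds to the underprediction of hot records reported in the abstract; cold records would use the running minimum instead.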

How to cite: Zhang, Z., Fischer, E., Zscheischler, J., and Engelke, S.: Numerical models outperform AI weather forecasts of record-breaking extremes, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-4301, https://doi.org/10.5194/egusphere-egu26-4301, 2026.

11:30–11:50 | EGU26-9387 | solicited | On-site presentation
Hannah Christensen, Bobby Antonio, and Kristian Strommen

Understanding how fast atmospheric variability shapes slow climate variability and sensitivity is a central challenge in Earth-system science. Recent advances in machine-learned (ML) atmospheric models have demonstrated remarkable skill on weather timescales, but their emergent behaviour in a fully coupled climate system is largely unexplored. We present results from a new hybrid modelling framework that couples a machine-learned atmosphere to a dynamical ocean model. We report on a set of 70-year coupled simulations (1950–2020 historical forcing and fixed-1950s control) in which the ACE2 ML climate emulator is interactively coupled to the NEMO ocean model. These experiments represent, to our knowledge, the first multi-decadal integrations of a machine-learned atmosphere interacting with a full-depth dynamical ocean. We assess the behaviour of the coupled system, with particular focus on low-frequency tropical variability and the climate response to greenhouse-gas forcing. Preliminary results indicate realistic emergent El Niño-like variability and a physically plausible climate sensitivity, suggesting that key atmosphere–ocean feedbacks can be captured within a hybrid ML–dynamical framework. These results help to evaluate the possible role of entirely machine-learned components in next-generation Earth-system models.

How to cite: Christensen, H., Antonio, B., and Strommen, K.: Evaluating emergent climate behaviour in a hybrid machine learned atmosphere -- dynamical ocean model, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9387, https://doi.org/10.5194/egusphere-egu26-9387, 2026.

11:50–12:00 | EGU26-15037 | ECS | On-site presentation
Renu Singh, Robert Brunstein, Antonia Anna Jost, Yana Hasson, Guillaume Couairon, Christian Lessig, and Claire Monteleoni

The last 5 years have seen an AI revolution in weather forecasting, with data-driven models trained on ERA5 (such as Pangu-Weather and GraphCast) surpassing the skill of numerical models at a fraction of the compute cost. Furthermore, stochastic modeling approaches are now state-of-the-art, as they can model the uncertainty in the dynamics of the Earth system (GenCast, FGN). Similarly, there have been recent advances in long-term climate emulation using data-driven methods, although they either use deterministic models (ACE2, Lucie) or are trained on simulated climate data from physical models (ArchesClimate). Here, we evaluate a stochastic modeling approach, ArchesWeatherGen, on historical climate timescales (last 40 years) and its response to ocean forcings in an AMIP run setup (atmospheric model forced with sea surface temperature and sea ice). These simulations contribute to AIMIP (AI Model Intercomparison Project), an initiative to organize and compare the current state-of-the-art AI climate models.

ArchesWeather and ArchesWeatherGen are efficient data-driven models built for medium-range weather forecasting. ArchesWeather is a deterministic transformer-based model and ArchesWeatherGen is a probabilistic generative model based on flow matching, with the same transformer backbone, that corrects the deterministic model prediction and accounts for variability in the time evolution.

In adherence to the AIMIP Stage 1 protocol, we adapt the models to serve as an atmospheric climate model for AMIP climate simulations on the historical period of 1979–2024. ArchesWeather and ArchesWeatherGen are extended to take into account monthly mean forcings for sea surface temperature (SST) and sea ice cover computed from ERA5. These models are trained on daily averaged 1-degree ERA5 data and they predict the state of the atmosphere at a forecast lead time of 24 hours given initial conditions.

We examine the ability of both models to stably emulate the current climate by quantitatively and qualitatively comparing them to the ERA5 climatology. Our results show that the models are able to emulate the current climate faithfully and reproduce many teleconnections as well as modes of annular variability correctly. We ablate different model configurations against each other and investigate the influence of the residual predictions of ArchesWeatherGen on the quality of the climate simulations compared to the deterministic predictions of ArchesWeather. We also analyse the models' capability to reproduce extreme weather statistics. Lastly, we examine the models’ response to forcings by evaluating the stability, trend, and physical correlations when running the model in different forcing scenarios, such as no forcings, annually repeating forcings, and increased SST.

How to cite: Singh, R., Brunstein, R., Jost, A. A., Hasson, Y., Couairon, G., Lessig, C., and Monteleoni, C.: Evaluating ArchesWeather and ArchesWeatherGen under Multi-Decadal AMIP-style climate simulations, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-15037, https://doi.org/10.5194/egusphere-egu26-15037, 2026.

12:00–12:10 | EGU26-6394 | ECS | On-site presentation
Sarah Schöngart, Lukas Gudmunsson, Chris Womack, Carl-Friedrich Schleussner, and Sonia Seneviratne

Machine-learning-based weather and climate emulators are rapidly transforming how climate information is generated and applied by enabling fast scenario exploration, large ensemble analysis, and the generation of decision-relevant climate data at scales beyond the reach of traditional climate models. Emulators are increasingly integrated into policy-relevant assessments and are expected to play a growing role in upcoming IPCC reports. Yet the field remains fragmented as task definitions and evaluation standards differ across communities, and frameworks for connecting short-term weather emulation to long-term climate projections are missing.

Here, we synthesise 77 studies on spatially explicit climate, hybrid weather-climate, and weather emulators within a unified conceptual framework, mapping inputs and outputs, methodological choices, validation practices, and computational requirements. Three structural patterns emerge. First, most climate emulators prioritise computational speed and scenario agility but offer limited output flexibility, typically generating gridded fields for a narrow set of variables. Second, the emulator landscape is fragmented: weather and hybrid weather-climate emulators form a coherent, machine-learning-driven cluster, whereas climate emulators are more heterogeneous, less connected to machine-learning advances, and validated inconsistently. Third, state-of-the-art weather emulators often rely on specialised hardware and institutional resources concentrated in a few organisations, raising questions of computational equity and “agility for whom”.

Our findings suggest that realizing genuine agility will require future research to focus on user-tailored outputs, rigorous evaluation across forcing scenarios, cross-domain methodological integration, and equitable access to computational resources. These priorities will help the field transition from methodological innovation toward policy-relevant application.

How to cite: Schöngart, S., Gudmunsson, L., Womack, C., Schleussner, C.-F., and Seneviratne, S.: A review of spatially explicit climate emulators for enhancing modelling agility, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-6394, https://doi.org/10.5194/egusphere-egu26-6394, 2026.

12:10–12:20 | EGU26-19652 | ECS | On-site presentation
Lily-belle Sweet, Christoph Müller, Jonas Jägermeyr, Weston Anderson, and Jakob Zscheischler

Climate impacts such as crop yield failure arise from complex combinations of weather conditions acting across multiple time scales, making it challenging to identify the most relevant climate drivers from high-resolution weather data. However, with data limitations, and the existence of complex and interacting relationships between growing-season climate conditions and plant growth, complex machine learning models that show high performance in predicting crop yield are often ‘right for the wrong reasons’. Process-based crop model simulations, which embody known functional relationships, could provide a useful testbed for developing and evaluating more trustworthy and robust methods. We present a novel two-stage, data-driven framework designed to extract a parsimonious set of climate drivers from multivariate daily meteorological inputs by systematically generating, evaluating and discarding candidate features using machine learning and then producing a set of drivers that are robust across locations, years and predictive feature combinations. We first validate the method using simulated U.S. maize yield failure data from two global gridded crop models, using rigorous out-of-sample testing: training on only early 20th-century data and holding out over 70 subsequent years for evaluation. The drivers identified using our approach align with known crop model mechanisms and rely solely on model input variables. Parsimonious logistic regression models built from these drivers achieve strong predictive skill under non-stationary climate conditions.

After validating the methodology on simulated data, we apply the same approach to observed county-level yields and daily multivariate weather data in rainfed and irrigated US maize systems. We identify compact sets of five climate drivers that effectively reproduce interannual variability and major historic failure events, including the 1993 Midwest floods and the 2012 drought. In rainfed systems, yield failure risk is strongly associated with extended periods of high soil moisture conditions after establishment, seasonal precipitation levels and vapor pressure deficit (VPD), with more than 40 high-VPD days between flowering and maturity markedly increasing odds of yield failure. In irrigated systems, critical drivers include soil moisture conditions surrounding planting, hot or dry days after establishment, and dewpoint temperatures near harvest. Our results demonstrate the transferability of the method from simulations to observations, and suggest its applicability to other crops, locations and further climate-related impacts. By avoiding reliance on post-hoc interpretability of black-box models, this framework enables the use of inherently interpretable, statistical models while still leveraging the predictive power of high-dimensional meteorological datasets.
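
A driver such as "number of high-VPD days between flowering and maturity" is a threshold-exceedance count over a growth-stage window. The sketch below shows one way such a feature could be computed; the VPD threshold, the stage dates, and the synthetic season are hypothetical, since the abstract does not state them.

```python
def count_exceedance_days(daily_values, start_day, end_day, threshold):
    """Count days in [start_day, end_day) whose value exceeds the threshold."""
    window = daily_values[start_day:end_day]
    return sum(1 for value in window if value > threshold)


# Hypothetical inputs: a 150-day season of daily VPD (kPa), a flowering day,
# a maturity day, and a VPD threshold chosen only for illustration.
daily_vpd = [1.0] * 60 + [2.6] * 45 + [1.2] * 45
FLOWERING_DAY, MATURITY_DAY = 60, 130
HIGH_VPD_THRESHOLD = 2.5

high_vpd_days = count_exceedance_days(
    daily_vpd, FLOWERING_DAY, MATURITY_DAY, HIGH_VPD_THRESHOLD
)
exceeds_40_days = high_vpd_days > 40
```

A handful of counts like this one can then serve as inputs to a small logistic regression, which keeps the final model directly interpretable.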

How to cite: Sweet, L., Müller, C., Jägermeyr, J., Anderson, W., and Zscheischler, J.: Using process-based model simulations to develop and validate a data-driven approach for identifying climate drivers of maize yield failure, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19652, https://doi.org/10.5194/egusphere-egu26-19652, 2026.

12:20–12:30 | EGU26-13814 | ECS | On-site presentation
Robert Brunstein and Christian Lessig

The capabilities and skill of emerging data-driven weather forecasting and climate models are steadily increasing, and their quality has improved significantly in recent years. Data-driven weather forecasting models predict the state of the atmosphere for a single step, e.g. 6 h. Longer lead times are obtained using time-stepping, where predictions are fed back into the model for the next step. Although many models exhibit stable behaviour for long rollouts, the training only considers short trajectories. The trained models are therefore statistically not well calibrated at longer lead times, and for phenomena like blocking patterns or teleconnections, which happen on time scales larger than a few days, the predictions are poorly constrained by the training. To address this issue, the training of data-driven models needs to consider information about the atmospheric conditions from several days up to several weeks.
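
The time-stepping described above can be sketched with a toy scalar system. The "true" and "learned" one-step maps below are hypothetical stand-ins for a 6 h atmospheric step; the point is only that a small one-step error can grow over a long rollout that the one-step training never saw.

```python
# Toy autoregressive rollout with made-up scalar dynamics.

STEP_HOURS = 6


def true_step(x):
    """Reference one-step map."""
    return 0.90 * x + 1.0


def learned_step(x):
    """Emulator with a slightly wrong coefficient, mimicking a small one-step error."""
    return 0.91 * x + 1.0


def rollout(step_fn, state, n_steps):
    """Feed each prediction back in as the input for the next step."""
    states = [state]
    for _ in range(n_steps):
        state = step_fn(state)
        states.append(state)
    return states


n_steps = 10 * 24 // STEP_HOURS  # a 10-day forecast needs 40 six-hour steps
truth = rollout(true_step, 5.0, n_steps)
forecast = rollout(learned_step, 5.0, n_steps)
errors = [abs(f - t) for f, t in zip(forecast, truth)]
```

In this toy case the error after 40 steps is many times the one-step error, even though the rollout itself stays stable.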

We approach this problem by using ArchesWeather and ArchesWeatherGen. ArchesWeather provides a deterministic prediction of the next state of the atmosphere. ArchesWeatherGen, a probabilistic flow-matching model, corrects the deterministic prediction to obtain a probabilistic prediction that matches the ground truth state. We tackle the long lead time calibration problem by applying ArchesWeatherGen after a large number of deterministic forecasting steps, in contrast to the single step used for ArchesWeatherGen for medium-range weather forecasting. We therefore condition ArchesWeatherGen on an entire long forecast trajectory produced by the deterministic model. Through this, ArchesWeatherGen obtains more temporal information about the atmosphere as well as the error development and can explicitly learn longer-time correlation patterns in the atmospheric dynamics. This leads to a better calibrated model at longer lead times. It also reduces the number of diffusion steps, and hence the computational costs, as we only correct the mean prediction after a larger number of deterministic autoregressive forecasting steps. For our study, we examine the influence of the length of the input trajectory and evaluate the improvement of our approach compared to the results obtained with a single step model correction.

How to cite: Brunstein, R. and Lessig, C.: Statistical Calibration of ArchesWeatherGen for Enhanced Sub-Seasonal and Longer Predictions, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13814, https://doi.org/10.5194/egusphere-egu26-13814, 2026.

Posters on site: Thu, 7 May, 08:30–10:15 | Hall X5

The posters scheduled for on-site presentation are only visible in the poster hall in Vienna. If authors uploaded their presentation files, these files are linked from the abstracts below.
Display time: Thu, 7 May, 08:30–12:30
Chairpersons: Sebastian Engelke, Tim Whittaker
X5.199 | EGU26-719 | ECS
Pankaj Sahu, Sukumaran Sandeep, and Hariprasad Kodamana

Machine Learning Weather Prediction (MLWP) models, specifically GraphCast, Pangu-Weather, Aurora, and FourCastNet, show great promise for competing with physics-based Numerical Weather Prediction (NWP) models by providing global forecasts at a low computational cost. However, a thorough physical evaluation is needed before they can be used in place of NWP models. Our comprehensive study comparing these four leading MLWP models with NWP and observations in Tropical Cyclone (TC) forecasting across all tropical basins uncovers a significant duality: MLWP models are very good at predicting the TC track (with an average error of less than 200 km at a 96-hour lead time) because they accurately capture the underlying dynamics. However, they always underestimate the maximum sustained wind speeds (intensity). This systematic low intensity bias is directly related to biases that come from their ERA5 training data and are made worse by penalties. Even with this limitation, the models accurately depict important physical structures, such as low-level convergence and the vertical warm core, while also keeping different physical fields consistent. This suggests that the models learn how different dynamical and thermodynamical processes are related to each other in a way that makes sense. Ultimately, although MLWPs, especially Aurora, exhibit an implicit comprehension of TC dynamics, their enduring intensity bias requires additional refinement prior to their complete substitution of NWP models.

How to cite: Sahu, P., Sandeep, S., and Kodamana, H.: Does AI Learn Physics? Assessing the Physical Fidelity of Data-Driven Tropical Cyclone Forecasts, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-719, https://doi.org/10.5194/egusphere-egu26-719, 2026.

X5.200 | EGU26-2452 | ECS
Gurjeet Singh, Frantzeska Lavda, and Alexandros Kalousis
Deep generative models such as flow matching and diffusion models have great potential for learning complex dynamical systems, but they typically act as black boxes, neglecting underlying physical structure. In contrast, physics-based models governed by ODEs and PDEs provide interpretability and physical consistency, yet are often incomplete due to unresolved processes, missing source terms, or uncertain parameterisations. Bridging these two paradigms is a central challenge in data-driven weather and climate modelling.

We propose a Climate Grey-Box Dynamics Matching framework for weather and climate systems that explicitly combines existing physical models with data-driven learning: known physical operators are embedded directly into the learned dynamics, and the data-driven component captures the unresolved dynamics. Our framework learns from observational trajectories alone and operates in a simulation-free manner inspired by gradient matching and flow matching methods. By avoiding numerical solvers, it eliminates the memory overhead, computational cost, and numerical instability associated with Neural ODE–based approaches.

To capture temporal dependencies in our simulation-free method, we introduce a lightweight attention-based temporal encoder that aggregates short-term history in a physically consistent manner. This design enables the model to represent unresolved dynamics without increasing computational complexity, making it well-suited for high-dimensional spatiotemporal climate systems. We apply this framework to weather and climate forecasting and demonstrate its effectiveness against ClimODE, a state-of-the-art solver-based grey-box model. Reformulating ClimODE as a simulation-free grey-box model reduces training complexity from O(L) to O(1), where L denotes the number of solver steps. Beyond computational gains, the simulation-free formulation yields substantial memory efficiency: training is possible on a single RTX 3060 (12 GB), whereas ClimODE requires at least 25 GB of GPU memory with a small batch size. This enables efficient training on commodity hardware and improves accessibility for large-scale climate modelling.
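
The general idea behind simulation-free (gradient-matching) training of a grey-box model can be sketched on a one-dimensional toy problem: time derivatives are estimated directly from the observed trajectory, and the unknown term is fitted to the residual left by the known physics, so no ODE solver runs inside the fit. This is not the authors' framework; the damping law, the constant source term that stands in for a neural network, and all names are assumptions.

```python
import math

DT = 0.05
TRUE_SOURCE = 2.0  # hidden from the fit; used only to generate "observations"


def known_physics(x):
    """Known part of the dynamics: linear damping."""
    return -x


# Observed trajectory of dx/dt = -x + c with x(0) = 5, written in closed form.
observations = [TRUE_SOURCE + (5.0 - TRUE_SOURCE) * math.exp(-k * DT)
                for k in range(101)]


def fit_source(obs, dt):
    """Least-squares fit of the constant residual c in dx/dt = physics(x) + c."""
    residuals = []
    for k in range(1, len(obs) - 1):
        dxdt = (obs[k + 1] - obs[k - 1]) / (2.0 * dt)  # central difference
        residuals.append(dxdt - known_physics(obs[k]))
    # For a constant residual model, the least-squares solution is the mean.
    return sum(residuals) / len(residuals)


source_hat = fit_source(observations, DT)
```

Each training target here depends only on neighbouring observations, so the cost per sample does not grow with the number of solver steps L, which is the intuition behind the O(L) to O(1) reduction quoted above.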

Experiments on weather and climate benchmarks show that the proposed method achieves improved forecast accuracy and faster convergence compared to simulation-based and fully data-driven baselines. The method demonstrates particular robustness to long horizons, as performance gains become more pronounced with extended forecast times—indicating enhanced temporal stability and resistance to error accumulation, an essential property for reliable long-range climate prediction.

How to cite: Singh, G., Lavda, F., and Kalousis, A.: Climate Grey-Box Flow Matching for Robust Climate and Weather Prediction, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-2452, https://doi.org/10.5194/egusphere-egu26-2452, 2026.

X5.201 | EGU26-3091 | ECS
Tim Whittaker and Alejandro Di Luca

Atmospheric rivers (ARs) are the dominant drivers of hydrological extremes along the western coast of North America, yet the physical upper limits of their intensity remain poorly understood and weakly constrained by the short observational record. While thermodynamic amplification of ARs under climate change is well-documented, the potential for dynamical amplification driven by the wind field remains uncertain and computationally expensive to sample using conventional techniques such as large ensembles of simulations. Here, we address this sampling barrier by leveraging techniques from machine learning, specifically combining a differentiable global climate model with high-resolution regional downscaling to generate storylines of unprecedented AR events in western Canada. By formulating the event generation as an optimal control problem, we compute the gradients of the model’s output to learn minimal, physically plausible perturbations to historical initial states that maximize the AR-associated integrated vapour transport at landfall. These optimized storylines are further dynamically downscaled using a high-resolution regional climate model, producing extreme precipitation events that significantly exceed historical benchmarks.
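
The trade-off in such an optimal control formulation, maximizing a target quantity while keeping the perturbation small, can be sketched with penalized gradient ascent. The linear response below is a hypothetical stand-in for a differentiable model's landfall integrated vapour transport, and the sensitivity vector and penalty weight are made up; the abstract does not specify the actual objective.

```python
# Penalized gradient ascent on an initial-condition perturbation (toy example).

SENSITIVITY = [0.8, -0.3, 0.5]  # assumed d(target)/d(initial state)
PENALTY = 2.0                   # assumed weight on perturbation size


def objective(delta):
    """Target gain minus a quadratic penalty on the perturbation."""
    gain = sum(a * d for a, d in zip(SENSITIVITY, delta))
    size = sum(d * d for d in delta)
    return gain - 0.5 * PENALTY * size


def objective_gradient(delta):
    """Gradient of `objective`; a differentiable model would supply this."""
    return [a - PENALTY * d for a, d in zip(SENSITIVITY, delta)]


def optimize_perturbation(n_iter=200, learning_rate=0.1):
    """Start from zero perturbation and follow the gradient uphill."""
    delta = [0.0] * len(SENSITIVITY)
    for _ in range(n_iter):
        grad = objective_gradient(delta)
        delta = [d + learning_rate * g for d, g in zip(delta, grad)]
    return delta


delta_opt = optimize_perturbation()
```

In this linear toy case the optimum is simply the sensitivity divided by the penalty weight; a larger penalty yields smaller perturbations that stay closer to the historical state.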

How to cite: Whittaker, T. and Di Luca, A.: Learning to sample unprecedented atmospheric rivers, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-3091, https://doi.org/10.5194/egusphere-egu26-3091, 2026.

X5.202 | EGU26-8656
Chia-Ying Tu, Yu-Chi Wang, Chung-Cheh Chou, and Zheng-Yu Yan

Recent advancements in AI/ML-based Data-Driven Weather Prediction (DWP) have revolutionized meteorological forecasting. By leveraging deep learning architectures trained on the ECMWF ERA5 reanalysis, DWP models can iteratively predict atmospheric states with accuracy comparable to traditional Numerical Weather Prediction (NWP) while requiring orders of magnitude less computational power. However, DWP’s reliance on historical training data poses challenges for climate-scale simulations, particularly in representing evolving phenomena influenced by non-stationary climate change. This study investigates the applicability of the GraphCast DWP model for climate research, specifically focusing on its potential for global climate downscaling and bias correction.

To evaluate performance across varying initial conditions, we conducted three distinct 72-hour GraphCast integration experiments. The first experiment utilized high-resolution (0.25°) ERA5 data from 2000–2010 to assess model reproducibility (H-ERA5), while the second experiment employed low-resolution (1.0°) ERA5 data to quantify sensitivity to initial horizontal grid spacing (L-ERA5). In the third experiment, we utilized 36 years (1979–2014) of HiRAM climate simulations as initial conditions to evaluate a novel DWP-based climate modeling framework (GC-HiRAM).

Results from the H-ERA5 and L-ERA5 experiments demonstrate that GraphCast effectively reproduces the climate mean state and variance of the ERA5 dataset. However, both experiments exhibited an underestimation of tropical cyclone (TC) frequency and intensity, consistent with known TC climatology biases in ERA5. Notably, the GC-HiRAM experiment closely aligned with the mean states and long-term trends of the original HiRAM simulations while yielding precipitation and surface temperature variances comparable to ERA5. Interestingly, the inherent TC underestimation in GraphCast served as a functional bias correction for HiRAM, which traditionally overestimates TC frequency, thereby improving overall simulation skill. Our findings suggest that this innovative DWP-driven approach provides a computationally efficient and robust framework for global climate modeling, effectively capturing essential climate phenomena while introducing a viable pathway for high-resolution climate downscaling and ensemble simulations.

How to cite: Tu, C.-Y., Wang, Y.-C., Chou, C.-C., and Yan, Z.-Y.: Harnessing Data-Driven Weather Prediction (DWP) Model for Climate Modeling, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8656, https://doi.org/10.5194/egusphere-egu26-8656, 2026.

X5.203
|
EGU26-12464
Georgie Logan, Daniel Cotterill, Mark McCarthy, Andrew Ciavarella, Henry Addison, Peter Watson, and Tomas Wetherell

Probabilistic attribution of extreme events requires large-ensemble climate model simulations, for both present and counterfactual climates, to adequately capture the tails of the distribution. Accurately modelling rainfall extremes, particularly those involving convection, or rainfall over regions with complex topography, requires high-resolution climate models. High-resolution climate data is particularly important for impact attribution to simulate realistic flood inundation as input to flood models.

Large ensembles of climate model runs for pre-industrial climates do not currently exist at convection-permitting resolution, as conventional convection-permitting models are computationally expensive to run. Therefore, attribution studies on extreme localised convective rainfall events are limited, despite the large impacts these events have on society.

To address this, we create a convection-permitting-resolution, large-ensemble dataset for England and Wales using a generative AI approach to downscale a pre-existing large ensemble of attribution runs from the HadGEM3 climate model. We use the diffusion model CPMGEM from Addison et al. (2025), which is trained and tested on the convection-permitting-resolution UK local Climate Projections data. Because CPMGEM generates samples stochastically, it can produce multiple high-resolution precipitation samples per coarse model input from our original large-ensemble dataset. This process is relatively computationally cheap and enables creation of a high-resolution dataset that is larger than the input dataset.

We first investigate the ability of CPMGEM to be applied to a different configuration of the model it was trained on, and on an alternative set of counterfactuals. We also explore its ability to conserve climate trends and reproduce realistic values for the extremes.

We then assess the validity of using the downscaled dataset for attribution studies. If suitable, we will revisit a number of relevant attribution studies of extreme rainfall events and compare the original results from the coarse climate model HadGEM3-A to our new results using the high-resolution downscaled CPMGEM output. Overall, this could significantly extend the capability to attribute localised extreme rainfall events.

How to cite: Logan, G., Cotterill, D., McCarthy, M., Ciavarella, A., Addison, H., Watson, P., and Wetherell, T.: Attribution of convective rainfall events using AI-downscaling – how extreme can we go?, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12464, https://doi.org/10.5194/egusphere-egu26-12464, 2026.

X5.204
|
EGU26-15189
|
ECS
Tom Wood and Tom Matthews

This study addresses recent calls for greater focus on understanding unprecedented extreme events (e.g. Kelder et al., 2025; Matthews et al., in review) by exploring the potential to use downscaled ‘synthetic data’ from climate model projections to train cutting-edge, computationally efficient deep learning models and generate very large ensembles of high-resolution extreme weather events under future perturbed climates. The study seeks to advance understanding of the plausible upper limits of high-impact, low-likelihood (HILL), record-shattering extremes and unprecedented tail risks, focusing initially on the threat of uncompensable heat with the potential to result in catastrophic mass mortality impacts. We address a number of open questions in this nascent field by testing a set of recently developed tools in new ways to understand the benefits and limitations of this approach.

Can we generate new insights beyond what can be achieved using traditional methods, such as large ensembles of physics-based models and advances such as ensemble boosting? What are the benefits of producing very large stochastic ensembles of plausible extreme weather systems and how does this complement (or otherwise) other approaches with similar motivations (e.g. emulators)? Can we identify and validate plausible physical climate storylines leading to unprecedented extreme events e.g., by identifying and clustering meteorological setups leading to very large, compound, or concurrent non-contiguous regional extremes? Can we robustly constrain this method to ensure physical plausibility in unprecedented climates? Can we advance understanding of rare event probability under a non-stationary climate from various emissions pathways? What are the limitations due to aleatoric and epistemic uncertainty? How do we mitigate biases and limit their propagation? Can we investigate downward counterfactuals and identify meteorological conditions aligning with imagined worst-case scenarios?

By addressing these questions, this study seeks to advance knowledge of the threats posed by the most extreme plausible weather events posing potentially catastrophic risks to society.

How to cite: Wood, T. and Matthews, T.: How can AI tools be used to explore unprecedented future climate and weather extremes?, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-15189, https://doi.org/10.5194/egusphere-egu26-15189, 2026.

X5.205
|
EGU26-16518
|
ECS
Yuki Maeda and Masaki Satoh

The Western North Pacific Subtropical High (WNPSH) is one of the dominant subtropical anticyclonic circulations over the western North Pacific during boreal summer, strongly influencing East Asian extremes such as tropical cyclone tracks, heatwaves, and the Baiu/Meiyu front. WNPSH variability reflects both midlatitude teleconnections and tropical intraseasonal oscillations (BSISO). Therefore, to clarify predictability, it is essential to identify and quantify how individual events contribute to forecast skill and uncertainty.

We develop a probabilistic deep learning framework to predict a WNPSH index with explicit uncertainty, represented as Gaussian regression outputs (μ, σ), and assess its predictability up to a 1-month lead. We adopt a model that combines a three-dimensional convolutional neural network with self-attention. To capture diverse representations, we pretrain the model using a millennial-scale ensemble dataset from d4PDF and then fine-tune it with the ERA5 reanalysis. As a result, the prediction skill reaches ACC = 0.6 at 10-day lead time. With deep learning models, the prediction problem can be formulated as an explainable AI (XAI) task, in which precursor signals relevant to the forecast can be estimated directly from spatial patterns and input variables (Maeda and Satoh, 2025). Here, we analyze the predictability using a combination of XAI and the concept of windows of opportunity. During opportunity events, forecast skill improves to about a 15-day lead time. Clear precursor patterns emerge in the initial conditions, including signatures of intraseasonal oscillations and midlatitude wave trains. These signals are consistent with heatmap-based interpretations from XAI, providing quantitative statistics on the sources of predictability for prominent events.
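As an illustration of what Gaussian regression outputs (μ, σ) imply for training and scoring, the sketch below computes a Gaussian negative log-likelihood in plain Python. The abstract does not state which loss the authors use, so the choice of this loss, the function names, and all numbers are assumptions made only for illustration.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of observation y under N(mu, sigma^2)."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + 0.5 * ((y - mu) / sigma) ** 2


def mean_nll(targets, mus, sigmas):
    """Average NLL over a batch of (target, mu, sigma) triples."""
    losses = [gaussian_nll(y, m, s) for y, m, s in zip(targets, mus, sigmas)]
    return sum(losses) / len(losses)


# Hypothetical index values and predicted means (not from the study).
targets = [0.3, -1.2, 0.8]
mus = [0.1, -1.0, 1.5]

# The same means scored with an overconfident and a more cautious spread.
confident = mean_nll(targets, mus, [0.2, 0.2, 0.2])
cautious = mean_nll(targets, mus, [0.6, 0.6, 0.6])
```

With these toy numbers the larger σ scores better because one prediction misses by 0.7; the loss penalises a small σ whenever the error is large relative to it, which is what makes the predicted spread usable as an uncertainty estimate.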

How to cite: Maeda, Y. and Satoh, M.: Probabilistic Deep Learning Identifies Windows of Opportunity and Precursors for Western North Pacific Subtropical High Prediction, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16518, https://doi.org/10.5194/egusphere-egu26-16518, 2026.

X5.206
|
EGU26-16579
|
ECS
Jeong-Hwan Kim, Daehyun Kang, Young-Min Yang, Jae-Heung Park, and Yoo-Geun Ham

Artificial intelligence has advanced global weather forecasting, outperforming traditional numerical models in both accuracy and computational efficiency. Nevertheless, extending predictions beyond subseasonal timescales requires the development of deep learning (DL)–based ocean–atmosphere coupled models that can realistically simulate complex oceanic responses to atmospheric forcing. This study presents KIST-Ocean, a DL-based global three-dimensional ocean general circulation model. Comprehensive evaluations confirmed the model’s robust ocean predictive skill and efficiency. Moreover, it accurately reproduces realistic ocean responses, such as Kelvin and Rossby wave propagation, and vertical motions induced by rotational wind stress, demonstrating its ability to represent key ocean–atmosphere interactions underlying climate phenomena, including the El Niño–Southern Oscillation. These findings reinforce confidence in DL-based global weather and climate models by demonstrating their capacity to capture essential ocean-atmosphere relationships. Building upon this foundation, the present study paves the way for extending DL-based modeling frameworks toward integrated Earth system simulations, thereby offering substantial potential for advancing long-range climate prediction capabilities.

How to cite: Kim, J.-H., Kang, D., Yang, Y.-M., Park, J.-H., and Ham, Y.-G.: Data-driven global ocean model resolving atmospherically forced ocean dynamics, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16579, https://doi.org/10.5194/egusphere-egu26-16579, 2026.

X5.207
|
EGU26-16636
Nina Effenberger and Luca Schmidt

Earth System Models (ESMs) represent our most comprehensive tools for understanding and projecting climate change impacts; yet, they are highly computationally demanding and technically complex. Climate model emulators offer an alternative approach by approximating components or full ESM outputs at a reduced computational cost. Such emulators can range from reduced-order climate models to fully data-driven machine learning surrogates. As the demand for climate information increases, interest in climate model emulation has grown across both climate science and machine learning research, leading to rapid methodological development. Despite this shared interest, the two research fields remain largely disconnected and the application of machine learning climate emulators in climate science remains challenging [1]. Many emulators, therefore, remain unused in decision-making contexts, not because they lack value, but because methodological developers and users lack a shared framework for communication, evaluation, and practical guidance.
This work examines this disconnect and takes a step towards facilitating the use of machine learning–based climate emulators in applied research and decision-making. We analyze and contrast methodological and applied perspectives on emulators, identify points of misalignment, and highlight opportunities for improved interaction. Building on these insights, we propose a tutorial-style framework that connects the two perspectives and provides practical guidance for developing, evaluating, and using climate emulators in research and decision-making contexts.

[1] Fowler, H. J., Mearns, L. O. and Wilby, R. L. [2025], Downscaling future climate projections: Compounding uncertainty but adding value?, in ‘Uncertainty in Climate Change Research: An Integrated Approach’, Springer, pp. 185–197.

How to cite: Effenberger, N. and Schmidt, L.: How can climate model emulators be aligned more closely with the needs of applied researchers?, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16636, https://doi.org/10.5194/egusphere-egu26-16636, 2026.

X5.208
|
EGU26-17113
|
ECS
Skye Williams-Kelly, Lisa Alexander, Steefan Contractor, and Sahani Pathiraja

Accurate precipitation predictions are vital for water resource management and risk mitigation. Interpolated precipitation estimates derived from in situ observations are frequently used to evaluate climate models and analyse trends. However, these interpolated products inadequately represent the spatio-temporal characteristics of precipitation and significantly smooth out extremes, inhibiting effective evaluation of dynamical models and analysis of trends. Machine learning methods may be suited to addressing these limitations due to their ability to identify patterns in large datasets and their use of GPU acceleration. Therefore, we compare three ML-based approaches for improving observational daily precipitation datasets: Gaussian Processes, Bayesian Neural Fields, and Neural Processes. Their performance is evaluated using traditional and distributional metrics, including for out-of-sample prediction, enabling an objective assessment of generalisation skill and representation of extremes. Results are further compared against existing precipitation products to identify the relative strengths and limitations of each method.

How to cite: Williams-Kelly, S., Alexander, L., Contractor, S., and Pathiraja, S.: Evaluating machine learning approaches to improve observational daily precipitation datasets, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-17113, https://doi.org/10.5194/egusphere-egu26-17113, 2026.

X5.209
|
EGU26-18038
|
ECS
Mozhgan Amiramjadi, Christopher Roth, and Peer Nowack

Data-driven weather prediction models have demonstrated remarkable skill, yet their ability to maintain a physically consistent three-dimensional atmospheric structure under out-of-distribution (OOD) conditions remains poorly understood. If OOD performance criteria could be met approximately, AI models would open up entirely new possibilities to generate large AI weather ensembles under future climate scenarios—for example, if initialized from climate model simulations (Rackow et al., 2024). This study conducts a multi-scale diagnostic evaluation of four state-of-the-art models—NeuralGCM (a deterministic hybrid model), GraphCast (a deterministic graph neural-network model), AIFS (a deterministic transformer-based model), and GenCast (an ensemble generative and diffusion-based model)—initialized across three distinct climate states: 1955 (cold), 2023 (neutral), sourced from ERA5 reanalysis, and 2049 (warm) simulated by the nextGEMS climate model (Segura et al., 2025).

Over 1–10-day leads, we find no detectable resolution-dependence for NeuralGCM's global skill, though the 1.4° configuration minimizes mean drift. A dominant spatial signature emerges across all models: a robust land–ocean contrast where oceans maintain smaller biases and slower Anomaly Correlation Coefficient (ACC) decay. Cross-hemispheric skill comparisons reveal that this contrast drives a significant asymmetry in error characteristics. In the 2049 warming scenario, the land-heavy Northern Hemisphere (NH, 39% land coverage) is the primary site of GraphCast's systematic "cool-drift" toward its training distribution, which peaks during boreal summer (JJA). In contrast, the generative GenCast model develops a pronounced warm bias localized in the oceanic Southern Hemisphere (SH, with about 20% land coverage).

For all three climate states, we further evaluate model performance across the entire troposphere and, as far as available, the stratosphere. While all four models maintain high variance-explained in the present-day mid-troposphere, performance degrades non-linearly under OOD forcing elsewhere, particularly within the stratosphere (< 200 hPa) and the boundary layer (> 900 hPa). Latitudinal R²-score cross-sections reveal that this degradation is most severe at polar latitudes; notably, in the 2049 scenario, GenCast exhibits a near-total collapse of skill by day 10, whereas NeuralGCM and GraphCast maintain localized predictive skill within the tropical troposphere.

The architecture-dependence of these simulated ensembles is confirmed by projecting day-10 drifts onto inter-climate "fingerprints" (T2049 - T2023 and T1955 - T2023). While AIFS and NeuralGCM show superior stability, GraphCast exhibits a systematic "cool-drift" toward its training climatology, and GenCast develops a distinct warm ocean drift. Beyond evaluating skill in surface variables, our results underline the need to assess data-driven models comprehensively across vertical, hemispheric, and seasonal diagnostics when applied to climate science scenarios, with implications for future AI model development.
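The projection of a forecast drift onto an inter-climate fingerprint can be sketched as a weighted least-squares coefficient. The abstract does not give the exact formulation, so the cosine-latitude weighting, the function name, and the toy four-point "grid" below are assumptions for illustration only.

```python
import math


def weighted_projection(drift, fingerprint, weights):
    """Least-squares coefficient of `drift` along `fingerprint` (flattened fields)."""
    num = sum(w * d * f for w, d, f in zip(weights, drift, fingerprint))
    den = sum(w * f * f for w, f in zip(weights, fingerprint))
    if den == 0.0:
        raise ValueError("fingerprint has zero weighted norm")
    return num / den


# Toy grid: latitudes in degrees, temperatures in K (all values hypothetical).
lats = [-60.0, -20.0, 20.0, 60.0]
weights = [math.cos(math.radians(lat)) for lat in lats]  # area weighting
t_2023 = [270.0, 295.0, 296.0, 272.0]
t_2049 = [271.0, 296.0, 297.5, 274.0]

# Warm-minus-neutral fingerprint, and a forecast that drifts halfway back.
fingerprint = [warm - ref for warm, ref in zip(t_2049, t_2023)]
drift = [-0.5 * f for f in fingerprint]

alpha = weighted_projection(drift, fingerprint, weights)
```

In this convention a negative coefficient means the drift opposes the warming pattern, which is how a "cool-drift" toward the training climatology would show up; a coefficient near zero would indicate a drift unrelated to the fingerprint.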

References:

Rackow, T., et al. (2024). Robustness of AI-based weather forecasts in a changing climate. arXiv preprint arXiv:2409.18529. https://doi.org/10.48550/arXiv.2409.18529

Segura, H., et al. (2025). nextGEMS: entering the era of kilometer-scale Earth system modeling. Geosci. Model Dev., 18, 7735–7761, https://doi.org/10.5194/gmd-18-7735-2025

How to cite: Amiramjadi, M., Roth, C., and Nowack, P.: Architectural Sensitivity of AI Weather Prediction Models to 3D Structural and Seasonal Climate Forcing, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18038, https://doi.org/10.5194/egusphere-egu26-18038, 2026.

X5.210
|
EGU26-18557
|
ECS
Sabine Scholle and Felix Pithan

Bias-Correcting Arctic ERA5 Surface Air Temperatures using Deep Learning 

Fine-tuning AtmoRep, a climate dynamics foundational model for improved Arctic 2m temperature predictions 

Due to the Arctic's harsh environment, comprehensive observational networks remain incomplete, leading to a reliance on biased reanalysis datasets such as ERA5. [1] This study investigates the potential of fine-tuning AtmoRep, a pre-trained transformer model for global atmospheric dynamics, to improve bias correction of Arctic 2-meter temperature (t2m) predictions. [2] 

Our methodology involves fine-tuning AtmoRep using ERA5 fields as input and bias-corrected synthetic Arctic t2m data from a parallel project as the target. [3] The project goal is to leverage AtmoRep's global climate representations to further improve on the bias-corrected synthetic Arctic t2m data, given ERA5 as input (evaluated against observational data).

Preliminary results demonstrate stable validation performance of AtmoRep over the Arctic, achieving a t2m RMSE of 0.27 K during fine-tuning. Model robustness was further evaluated under severely masked target fields (up to 90% masking) and by comparing BERT-style reconstruction with a forecasting-based training strategy.
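To make the idea of evaluating against a heavily masked target concrete, the following sketch computes an RMSE only over unmasked grid points. It is a generic illustration, not the AtmoRep fine-tuning code; the helper names, the random masking scheme, and the toy values are assumptions.

```python
import math
import random


def masked_rmse(pred, target, keep):
    """RMSE over the points where keep[i] is True (unmasked targets only)."""
    sq = [(p - t) ** 2 for p, t, k in zip(pred, target, keep) if k]
    if not sq:
        raise ValueError("no unmasked points to evaluate")
    return math.sqrt(sum(sq) / len(sq))


def random_keep_mask(n, mask_fraction, seed=0):
    """Boolean list that hides about `mask_fraction` of n points, keeping at least one."""
    rng = random.Random(seed)
    n_keep = max(1, round(n * (1.0 - mask_fraction)))
    kept = set(rng.sample(range(n), n_keep))
    return [i in kept for i in range(n)]


# Hypothetical flattened t2m field (K) and a prediction with a uniform 0.5 K offset.
target = [float(i) for i in range(100)]
pred = [t + 0.5 for t in target]

keep = random_keep_mask(len(target), mask_fraction=0.9)
score = masked_rmse(pred, target, keep)
```

Here 90% of the target points are hidden and the score is computed from the remaining ten; in a training setting the same mask would restrict the loss to the points where a target value is available.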

This study represents a novel application of pretrained climate foundation models for bias correction in sparsely observed Arctic regions, highlighting the potential of machine learning approaches to advance atmospheric science.

[1] Tian, T., Yang, S., Høyer, J. L., Nielsen-Englyst, P., & Singha, S. (2024). Cooler Arctic surface temperatures simulated by climate models are closer to satellite-based data than the ERA5 reanalysis. Communications Earth & Environment, 5(1). https://doi.org/10.1038/s43247-024-01276-z
[2] Lessig, C., Luise, I., Gong, B., Langguth, M., Stadtler, S., & Schultz, M. (2023). AtmoRep: A stochastic model of atmosphere dynamics using large scale representation learning. arXiv preprint arXiv:2308.13280. https://arxiv.org/abs/2308.13280
[3] Hossain, A., Keil, P., Grover, H., et al. (2025). Machine Learning Eliminates Reanalysis Warm Bias and Reveals Weaker Winter Surface Cooling over Arctic Sea Ice. ESS Open Archive, December 24, 2025. https://doi.org/10.22541/essoar.176659533.30384251/v1

How to cite: Scholle, S. and Pithan, F.: Bias-Correcting Arctic ERA5 Surface Air Temperatures using Deep Learning , EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18557, https://doi.org/10.5194/egusphere-egu26-18557, 2026.

X5.211
|
EGU26-20173
|
ECS
Marco Froelich and Sebastian Engelke

There has been recent interest in exploiting the differentiability of AI-weather models to compute model sensitivities to initial conditions directly. In the field of machine learning, adversarial attacks leverage these sensitivities to influence the output of the prediction system by finding optimal initial condition perturbations. In weather forecasting, this methodology can be seen under two lenses: differentiable models are susceptible to malicious attacks aimed at distorting operational forecasts [1], while having access to sensitivities is an opportunity to further our understanding of real events through the generation of synthetic forecasts. Adversarial examples (perturbed initial conditions obtained from adversarial attacks) have been used in [2] to create even more extreme forecasts of a heatwave, providing a storyline approach to understanding black swan heatwave events.

We further this effort by exploring adversarial attacks of tropical cyclone predictions at 0.25° resolution using Operational GraphCast. Although AI-weather models are known to improve tropical cyclone track predictions relative to numerical systems, it remains challenging to forecast high intensities, particularly at high resolution. Indeed, AI-weather models trained with MSE-type losses on reanalysis are known to suffer from 'blurred' forecasts due to the implicit down-weighting of small-scale features. We find that while standard adversarial attacks of tropical cyclone forecasts are effective in controlling tropical cyclone tracks, they fail to reproduce realistic gradients of temperature, geopotential and wind fields, effectively worsening blurring effects. This is true also for attacks on the AMSE-finetuned Operational GraphCast model [3], which otherwise shows significant improvements in representing small-scale features. We then borrow insights from the machine learning literature on the low-frequency bias of neural networks and its relationship to adversarial examples to address this limitation and explore the capabilities of AI-weather models in global high-resolution tropical cyclone forecasting.
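The basic mechanism of a gradient-based attack on initial conditions can be sketched on a toy differentiable "forecast". The example below uses projected signed-gradient ascent within a bounded perturbation; the toy model, its analytic gradient, and all parameter values are assumptions for illustration, whereas a real application would obtain gradients by automatic differentiation through the weather model.

```python
import math


def intensity(x, w):
    """Toy differentiable 'forecast': a scalar intensity computed from initial state x."""
    return sum(wi * math.tanh(xi) for wi, xi in zip(w, x))


def intensity_grad(x, w):
    """Analytic gradient of `intensity` with respect to x."""
    return [wi * (1.0 - math.tanh(xi) ** 2) for wi, xi in zip(w, x)]


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pgd_attack(x0, w, eps=0.1, step=0.02, n_steps=20):
    """Maximise intensity while keeping every component within eps of x0."""
    x = list(x0)
    for _ in range(n_steps):
        grad = intensity_grad(x, w)
        # Signed-gradient ascent step on each component.
        x = [xi + step * sign(gi) for xi, gi in zip(x, grad)]
        # Project back into the box [x0 - eps, x0 + eps].
        x = [min(max(xi, x0i - eps), x0i + eps) for xi, x0i in zip(x, x0)]
    return x


# Hypothetical initial state and model weights.
x0 = [0.2, -0.4, 0.1]
w = [1.0, -2.0, 0.5]
x_adv = pgd_attack(x0, w)
```

The bound `eps` plays the role of keeping the perturbed initial condition close to the analysed state; nothing in this toy enforces physical realism of the perturbed fields, which is the limitation the abstract reports for standard attacks.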

 

References: 
[1] Imgrund, E., Eisenhofer, T., Rieck, K., 2025. Adversarial Observations in Weather Forecasting.
[2] Whittaker, T., Luca, A.D., 2025. Constructing Extreme Heatwave Storylines with Differentiable Climate Models.
[3] Subich, C., Husain, S.Z., Separovic, L., Yang, J., 2025. Fixing the Double Penalty in Data-Driven Weather Forecasting Through a Modified Spherical Harmonic Loss Function.

How to cite: Froelich, M. and Engelke, S.: Exploring Adversarial Attacks in AI Weather Models for Generation of High-resolution Tropical Cyclones, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-20173, https://doi.org/10.5194/egusphere-egu26-20173, 2026.

X5.212
|
EGU26-3927
Yoo-Geun Ham, Seol-Hee Oh, and Gyuhui Kwon

Reliable prediction of climate variables and high-impact extremes in the mid-latitudes is crucial for climate risk assessment, agricultural planning, water resource management, and disaster preparedness. However, conventional deep learning–based approaches for mid-latitude climate prediction that are trained on dynamical climate model output (e.g., CMIP models) can suffer from systematic errors in capturing the observed climate-relevant signals, ultimately limiting prediction skill. These limitations highlight the need to improve mid-latitude prediction by detecting climate signals solely from the limited amount of reliable observational climate data. To address the challenge of limited training samples, we employ the model-agnostic meta-learning (MAML) algorithm along with domain-knowledge-based data augmentation to predict mid-latitude winter temperatures. The proposed data augmentation is based purely on observed data: labels are defined using the large-scale climate variability associated with the target variable. The MAML-applied convolutional neural network (CNN) demonstrates superior correlation skill for winter temperature anomalies compared to a reference model (i.e., the CNN without MAML) and state-of-the-art dynamical forecast models across all target lead months during the boreal winter seasons. Moreover, occlusion sensitivity results reveal that the MAML model better captures the physical precursors that influence mid-latitude winter temperatures, resulting in more accurate predictions.
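Occlusion sensitivity, mentioned above, can be sketched as follows: patches of the input field are blanked one at a time and the resulting change in the prediction is recorded. This is a generic illustration rather than the authors' implementation; the patch size, fill value, and the stand-in `toy_predict` function are assumptions.

```python
def occlusion_sensitivity(predict, field, patch=2, fill=0.0):
    """Change in prediction when each patch x patch block of a 2D field is set to `fill`."""
    n_rows, n_cols = len(field), len(field[0])
    baseline = predict(field)
    heatmap = [[0.0] * n_cols for _ in range(n_rows)]
    for r0 in range(0, n_rows, patch):
        for c0 in range(0, n_cols, patch):
            rows = range(r0, min(r0 + patch, n_rows))
            cols = range(c0, min(c0 + patch, n_cols))
            # Copy the field and blank out one block.
            occluded = [row[:] for row in field]
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            delta = baseline - predict(occluded)
            # Assign the sensitivity to every point of the occluded block.
            for r in rows:
                for c in cols:
                    heatmap[r][c] = delta
    return heatmap


def toy_predict(field):
    """Hypothetical stand-in for a trained CNN: only the top-left 2x2 block matters."""
    return sum(field[r][c] for r in range(2) for c in range(2))


field = [[1.0] * 4 for _ in range(4)]
heatmap = occlusion_sensitivity(toy_predict, field)
```

Regions where the heatmap is large are those the model relies on; comparing such maps with known physical precursors is one way to judge whether the learned relationships are plausible.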

How to cite: Ham, Y.-G., Oh, S.-H., and Kwon, G.: Few-shot learning for mid-latitude climate forecasts, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-3927, https://doi.org/10.5194/egusphere-egu26-3927, 2026.

X5.213
|
EGU26-5719
|
ECS
Ilenia Manco, Otavio Medeiros Feitosa, Mario Raffa, and Paola Mercogliano

High-resolution climate datasets are fundamental for monitoring extreme events, assessing climate variability, and supporting climate adaptation strategies. However, producing high-resolution climate reanalyses usually requires computationally expensive dynamical downscaling. As a result, near-real-time high-resolution climate services remain limited, since most downscaling products are generated retrospectively with delays of months to years (Hersbach et al., 2020; Harris et al., 2022). Recent advances in generative machine learning enable the generation of realistic fine-scale atmospheric fields that preserve spatial coherence and key statistics, including extremes (Rampal et al., 2025; Camps-Valls et al., 2025). Hybrid statistical–dynamical approaches therefore provide an efficient and physically consistent pathway for operational high-resolution dataset production (Glawion et al., 2025; Schmidt et al., 2025). This work presents the progress achieved in the development of a high-resolution climate dataset over the Italian Peninsula at 2.2 km resolution, exploiting a conditional Generative Adversarial Network (cGAN) model developed in Manco et al. (2025). The framework follows a hybrid statistical–dynamical downscaling strategy, in which ERA5 reanalysis data at 0.25° resolution are downscaled using cGANs trained against the very-high-resolution dynamical product VHR-REA_IT (Raffa et al., 2021). The system has been extended to multiple near-surface atmospheric variables, including mean, minimum, and maximum 2 m temperature, surface relative humidity, cumulative precipitation, and 10 m wind (speed and direction), the latter two representing particularly challenging targets (Fig. 1). Each variable is downscaled using a dedicated cGAN trained independently to learn the non-linear spatial relationships between coarse-resolution ERA5 predictors and high-resolution VHR-REA_IT targets, while employing a common network architecture and loss function to ensure methodological consistency.
This enabled the production of a high-resolution historical dataset covering the period 1990–2024 at daily frequency, with 1990–2000 used for training. Since January 2025, the framework (Fig. 2) has been integrated into an operational chain and used to generate high-resolution fields in near real time, automatically updating the dataset as new ERA5 data become available, with an average latency of approximately six days. All data are distributed in NetCDF format through the CMCC Data Delivery System (https://dds.cmcc.it/) within the FAIR (Fast AI Reanalysis) product, with daily maps accessible via the Dataclime dashboard (https://www.dataclime.com/). Both deterministic and probabilistic configurations of the cGAN framework are presented. Results, evaluated against the dynamically downscaled fields available at the same resolution over the common historical period, show that the proposed approach robustly reproduces spatial patterns (Fig. 3), mean values, and variability across all variables. The probabilistic configuration improves uncertainty representation and shows skill in capturing both mean conditions and extremes. Overall, the framework represents a versatile and robust solution for the generation of high-resolution climate datasets in both historical and operational contexts. Remaining limitations primarily concern the representation of extreme precipitation percentiles in regions characterized by complex orography, which will be the focus of future developments.

Fig. 1 – Wind speed at 10 m for a random day.

Fig. 2 – cGAN training framework.

Fig. 3 – Seasonal Analysis. 2-m minimum temperature.

 

How to cite: Manco, I., Feitosa, O. M., Raffa, M., and Mercogliano, P.: An AI-based framework for high-resolution climate dataset over Italy: from historical reconstruction to an operational chain, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5719, https://doi.org/10.5194/egusphere-egu26-5719, 2026.

X5.214
|
EGU26-7801
Emilia Diaconescu, Jean-François Caron, Valentin Dallerit, Stéphane Gaudreault, Syed Husain, Shoyon Panday, Carlos Pereira Frontado, Leo Separovic, Christopher Subich, Siqi Wei, and Sasa Zhang

Environment and Climate Change Canada (ECCC) is actively advancing the integration of artificial intelligence (AI) into numerical weather prediction (NWP) through a coordinated research-to-operations strategy that combines state-of-the-art machine learning approaches with established physical modeling frameworks. This presentation summarizes the progress achieved to date.

We first describe the development of GEML (Global Environmental eMuLator), a global AI forecast model, based on Google DeepMind’s GraphCast, trained and fine-tuned in-house using ERA5 reanalysis and ECMWF operational analyses. Building on GEML, ECCC has implemented an experimental hybrid AI–NWP global forecasting system, GDPS-SN, which applies large-scale spectral nudging to improve the operational Global Deterministic Prediction System (GDPS) by leveraging the large-scale accuracy of GEML.

The presentation also introduces PARADIS, a fully Canadian, physically inspired, AI-based weather forecast model developed by ECCC and its partners. These activities illustrate ECCC’s strategic vision for AI-enabled weather prediction: combining scientific rigor, collaboration, and operational relevance to deliver more accurate forecasting systems.

 

How to cite: Diaconescu, E., Caron, J.-F., Dallerit, V., Gaudreault, S., Husain, S., Panday, S., Pereira Frontado, C., Separovic, L., Subich, C., Wei, S., and Zhang, S.: Bridging Physics and Machine Learning to Enhance Weather Forecasting at ECCC, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7801, https://doi.org/10.5194/egusphere-egu26-7801, 2026.

X5.215
|
EGU26-9811
|
ECS
Tim Radke, Susanne Fuchs, Iuliia Polkova, Christian Wilms, Johanna Baehr, and Marc Rautenhaus

Detection of atmospheric features in gridded datasets is typically done by means of rule-based algorithms. Recently, the feasibility of learning feature detection tasks using supervised learning with convolutional neural networks (CNNs) has been demonstrated. This approach corresponds to semantic segmentation tasks widely investigated in computer vision. However, while in recent studies the performance of CNNs was shown to be comparable to human experts, CNNs are largely treated as a “black box”, and it remains unclear whether they learn the features for physically plausible reasons. Here, we build on recently published studies that discuss datasets containing features of tropical cyclones (TCs), atmospheric rivers (ARs), and atmospheric surface fronts (SFs) as detected by human experts. We adapt the explainable artificial intelligence technique “Layer-wise Relevance Propagation” to the semantic segmentation task and investigate which input information CNNs with the Context-Guided Network (CGNet) and U-Net architectures use for feature detection. We find that for the detection of TCs and ARs, both CNNs indeed consider plausible patterns in the input fields of atmospheric variables. For instance, relevant patterns include point-shaped extrema in vertically integrated precipitable water (TMQ) and circular wind motion for TCs. For ARs, relevant patterns include elongated bands of high TMQ and eastward winds. Such results help to build trust in the CNN approach. In contrast, for the detection of SFs, we find only partially physically plausible patterns. While U-Net uses regions of changing temperature and humidity as well as strong wind shears to detect SFs, we also find noisy patterns relating to spurious correlations with the background data. To assess whether these implausible patterns reduce U-Net's generalizability, we evaluate it on a different SF dataset. 
Here, depending on the domain, SFs are often erroneously detected, especially in the Tropics and Arctic, highlighting the importance of analyzing whether patterns learned by a CNN are physically plausible. We also demonstrate application of the approach for finding the most relevant input variables and evaluating detection robustness when changing the input domain.
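As a rough illustration of the relevance-propagation idea named above, the following sketch applies the epsilon rule of Layer-wise Relevance Propagation to a tiny fully connected ReLU network in NumPy. It is not the authors' adaptation to semantic segmentation with CGNet or U-Net; the network size, the random weights, and the function names `forward` and `lrp_epsilon` are hypothetical and chosen only to show how an output score is redistributed onto the inputs.

```python
import numpy as np

def forward(x, weights, biases):
    """Return the activations of every layer (input included) of a ReLU MLP."""
    activations = [x]
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = activations[-1] @ W + b
        # ReLU on hidden layers, linear output layer
        is_last = i == len(weights) - 1
        activations.append(z if is_last else np.maximum(z, 0.0))
    return activations

def lrp_epsilon(activations, weights, biases, target, eps=1e-6):
    """Propagate the score of output unit `target` back to the inputs (epsilon rule)."""
    relevance = np.zeros_like(activations[-1])
    relevance[target] = activations[-1][target]
    layers = zip(reversed(weights), reversed(biases), reversed(activations[:-1]))
    for W, b, a in layers:
        z = a @ W + b
        z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilise the division
        s = relevance / z
        relevance = a * (W @ s)  # share of each lower-layer unit
    return relevance

# Hypothetical toy network: 4 inputs, 5 hidden units, 3 outputs, zero biases.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 5)), rng.normal(size=(5, 3))]
biases = [np.zeros(5), np.zeros(3)]
x = rng.normal(size=4)

acts = forward(x, weights, biases)
relevance = lrp_epsilon(acts, weights, biases, target=0)
print("input relevance:", relevance)
print("sum of relevance:", relevance.sum(), "output score:", acts[-1][0])
```

With zero biases the input relevances sum to the explained output score up to the small epsilon term, which is the conservation property that makes the resulting maps interpretable as a decomposition of the network output.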

How to cite: Radke, T., Fuchs, S., Polkova, I., Wilms, C., Baehr, J., and Rautenhaus, M.: Explaining neural networks for detection of atmospheric features in gridded data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9811, https://doi.org/10.5194/egusphere-egu26-9811, 2026.

X5.216
|
EGU26-17080
Seonyu Kang, Yoo-Geun Ham, and Dongjin Cho

While deep learning-based atmospheric models have been actively developed, the development of ocean prediction models that allow multi-decade simulations through autoregressive operation has been largely limited. This study developed a deep learning-based global ocean prediction model on the HEALPix grid system that is capable of multi-decade integration at a daily time step while successfully reproducing the observed global ocean statistics. Model training uses Fourier amplitude and phase losses to preserve low-frequency spatial structure and phase consistency and a batch anomaly loss to learn anomalous variability; the model sequentially ingests past-to-present atmospheric forcing to enable physically consistent coupled atmosphere–ocean dynamics in long-term integration. Long-term ocean model integration experiments with the observed atmospheric forcing demonstrate a drift-free, stable climatology for 20-yr simulations, with realistic Niño3.4 variations and ENSO-related global oceanic anomaly patterns consistent with observations. Furthermore, oceanic subsurface temperature responses to westerly wind bursts (WWBs) over the equatorial western Pacific successfully capture the eastward propagation associated with oceanic Kelvin waves.
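The abstract does not define the Fourier amplitude and phase losses, so the following is only one plausible form, sketched in NumPy on a regular 2-D array rather than on the HEALPix mesh. The function name `fourier_losses`, the squared-amplitude difference, and the `1 - cos` phase penalty are assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

def fourier_losses(pred, target):
    """Amplitude and phase mismatch between two 2-D fields in spectral space."""
    f_pred = np.fft.rfft2(pred)
    f_true = np.fft.rfft2(target)
    # Amplitude loss: penalises wrong spectral energy, blind to displacement
    amp_loss = np.mean((np.abs(f_pred) - np.abs(f_true)) ** 2)
    # Phase loss: 1 - cos(phase difference) is zero when phases agree
    # and is insensitive to 2*pi wrap-around
    dphi = np.angle(f_pred) - np.angle(f_true)
    phase_loss = np.mean(1.0 - np.cos(dphi))
    return amp_loss, phase_loss

# Hypothetical example: a field and a copy displaced by three grid points.
rng = np.random.default_rng(1)
field = rng.normal(size=(16, 32))
shifted = np.roll(field, 3, axis=1)

amp_same, phase_same = fourier_losses(field, field)
amp_shift, phase_shift = fourier_losses(shifted, field)
print("identical fields:", amp_same, phase_same)
print("shifted field:   ", amp_shift, phase_shift)
```

The displaced copy leaves the amplitude term essentially at zero while the phase term becomes large, which is why the two terms are usually combined: one constrains the spectrum, the other constrains where the structures sit.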

How to cite: Kang, S., Ham, Y.-G., and Cho, D.: Deep learning-Based Global Ocean prediction model on the HEALPix Mesh, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-17080, https://doi.org/10.5194/egusphere-egu26-17080, 2026.

X5.217
|
EGU26-21336
|
ECS
Rebecca Herman and Jakob Runge

Climate scientists are increasingly exploring the possible applications of artificial intelligence to climate modeling, whether for use inside the model to replace parameterized model components, or for use separately as an emulator of observed or simulated climate. However, a major limitation of standard artificial intelligence techniques is that they cannot distinguish between statistical association and causality. While this is not a drawback for the purpose of statistical prediction in an unchanging system, it can pose a problem for generalization of parameterizations and emulators under climate change, and furthermore, it means that it is not sound to use such techniques to predict the response of the climate system to unobserved interventions, including proposed climate engineering initiatives. The framework of causal inference attempts to address this limitation, providing techniques for discovering qualitative (“discovery”) and quantitative (“effect estimation”) information about the system’s response to interventions from purely observational data (or imperfect experiments) using causal reasoning. However, it was not originally developed for application to spatiotemporal dynamical systems such as the climate system.

In previous work, we developed a unified framework for causal effect estimation in spatiotemporal dynamical systems. In contrast to the hard interventions on univariate representations of coupled climate phenomena that until now have been more commonly used, our framework allows the user to investigate the effect of a spatiotemporal perturbation of a climate variable in one finite region on another variable in a different finite region at another time after specifying the qualitative causal relationships between the regions as a whole. This framework advances causal effect estimation for climate science because spatiotemporal perturbations are better defined, more actionable, and more interpretable than hard interventions on conceptual climate phenomena.

Here, we evaluate its performance using CMIP6-class models, focusing initially on the effect of the El Niño Southern Oscillation (ENSO) on the North Atlantic Oscillation as an example query. We assess the robustness of the method to data sample size, resolution, and other methodology choices by comparing the causal effect for a given model calculated from different subsets of its pre-Industrial control simulation using various amounts of spatial data and various values of other parameters of the algorithm. We use these results to assess the expected uncertainty on any inferences made using this technique from the short observational record or CMIP6 historical simulations, and make recommendations for best practices in different circumstances. Finally, we evaluate the accuracy of the predictions by using a causal model trained on historical simulations to predict the output of Tropical Basin Interaction Model Intercomparison Project experiments from the same climate model that nudge Pacific Sea Surface Temperature in the ENSO region in a manner comparable to our perturbation intervention.

How to cite: Herman, R. and Runge, J.: Performance of Spatiotemporal Causal Effect Estimation in Coupled Climate Models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21336, https://doi.org/10.5194/egusphere-egu26-21336, 2026.

X5.218
|
EGU26-21303
Étienne Plésiat, Maximilian Witte, Johannes Meuer, and Christopher Kadow

We present a flexible deep learning framework for climate data analysis that leverages message-passing graph neural networks.

The framework is fully configurable and allows users to construct diverse architectures. In particular, it supports encoder-processor-decoder configurations in which geophysical fields are mapped onto a hierarchy of multi-icosahedral meshes, enabling information to propagate across scales before being mapped back to the original spatial grid. The model architecture is defined through a set of graph operators, including transformer-based graph convolutions. The framework operates on both regular and irregular grids, and enables flexible multivariate processing with spatial consistency. It further incorporates adaptive graph connectivity, enabling robust handling of missing data through dynamic edge construction. Additionally, several explainable AI (XAI) techniques are integrated to facilitate interpretation and physical attribution.
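To make the idea of dynamic edge construction for missing data concrete, here is a minimal NumPy sketch of one mean-aggregation message-passing step in which edges leaving nodes without data are dropped before aggregation. It is not the framework's implementation; the function name `masked_mean_aggregate`, the edge layout, and the choice of mean aggregation are hypothetical.

```python
import numpy as np

def masked_mean_aggregate(features, edges, valid):
    """One message-passing step that ignores senders without data.

    features: (n_nodes, n_channels) node values (NaN allowed where invalid)
    edges:    (n_edges, 2) integer array of (sender, receiver) pairs
    valid:    (n_nodes,) boolean mask, False where the node has no data
    """
    keep = valid[edges[:, 0]]  # rebuild the edge set from the data mask
    senders, receivers = edges[keep].T
    agg = np.zeros_like(features)
    count = np.zeros(features.shape[0])
    np.add.at(agg, receivers, features[senders])  # sum incoming messages
    np.add.at(count, receivers, 1.0)
    return agg / np.maximum(count, 1.0)[:, None]  # mean; 0 if no messages

# Hypothetical 3-node graph in which node 2 has no observation.
features = np.array([[1.0], [3.0], [np.nan]])
valid = np.array([True, True, False])
edges = np.array([[0, 2], [1, 2], [2, 0]])

updated = masked_mean_aggregate(features, edges, valid)
print(updated)
```

Node 2 receives the mean of its two observed neighbours, while the edge leaving node 2 is discarded so that its missing value never contaminates node 0.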

These features make the framework suitable for a broad range of climate and Earth-system applications, including data infilling, downscaling and process attribution. Its capabilities are illustrated through two case studies: (i) the reconstruction of global precipitation fields from incomplete observations, with comparison to established statistical and deep learning methods, and (ii) the attribution of large-scale drivers contributing to an extreme heatwave event.

The framework is currently being deployed as a web processing service that supports operational inference for selected climate applications.

How to cite: Plésiat, É., Witte, M., Meuer, J., and Kadow, C.: Multiscale Graph Neural Networks for Climate Data Analysis, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21303, https://doi.org/10.5194/egusphere-egu26-21303, 2026.

X5.219
|
EGU26-19650
|
ECS
Marc Girona-Mata, Andrew Orr, and Richard Turner

Recent probabilistic machine learning weather forecasting models have demonstrated competitive skill relative to state-of-the-art (SOTA) numerical weather prediction ensemble systems. However, a rigorous global assessment of their skill, particularly in the distribution tails relevant for extremes as well as across different geographical regions, remains limited. Here, we present a systematic evaluation of various SOTA probabilistic AI weather forecasting systems against ECMWF’s Integrated Forecasting System Ensemble (IFS ENS), focusing on forecast skill across the full range of event intensities.

We analyse global forecasts at 24- and 72-hour lead times for near-surface temperature, 10 m wind speed, and total precipitation at 0.25° resolution over the 2024–2025 period. Forecasts are evaluated using the fair Continuous Ranked Probability Score (fCRPS) to account for differing ensemble sizes, as well as other complementary metrics. We also employ the threshold-weighted CRPS (twCRPS) computed for different quantiles ranging from the median up to the one-in-a-million extreme event. Scores are area-weighted and analysed i) globally, ii) over land only, and iii) for different regions.
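The scores named above can be sketched in a few lines of NumPy. The fair CRPS below uses the standard ensemble estimator with the 1/(2m(m-1)) spread correction. The threshold-weighted variant assumes an upper-tail indicator weight, implemented through the chaining function v(z) = max(z, t), and the area weighting assumes cosine-latitude weights on a regular grid. These choices, and the function names, are illustrative assumptions, not necessarily the exact configuration used in the study.

```python
import numpy as np

def fair_crps(ens, obs):
    """Fair CRPS; `ens` has shape (members, ...) and `obs` has shape (...)."""
    m = ens.shape[0]
    skill = np.mean(np.abs(ens - obs), axis=0)
    pair_diff = np.abs(ens[:, None] - ens[None, :]).sum(axis=(0, 1))
    spread = pair_diff / (2 * m * (m - 1))  # unbiased for finite ensembles
    return skill - spread

def tw_crps(ens, obs, threshold):
    """Threshold-weighted CRPS for the upper tail via v(z) = max(z, threshold)."""
    return fair_crps(np.maximum(ens, threshold), np.maximum(obs, threshold))

def area_mean(score, lat_deg):
    """Cosine-latitude weighted mean of a (lat, lon) score field."""
    w = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones_like(score)
    return np.sum(w * score) / np.sum(w)

# Hypothetical three-member ensemble at a single point.
ens = np.array([1.0, 1.0, 1.0])
print(fair_crps(ens, 3.0))     # all members miss the observation by 2
print(tw_crps(ens, 3.0, 2.0))  # only the part above the threshold counts
print(tw_crps(ens, 3.0, 5.0))  # nothing exceeds the threshold

# Hypothetical score field on three latitudes and four longitudes.
lat = np.array([-60.0, 0.0, 60.0])
score = np.array([[1.0] * 4, [2.0] * 4, [1.0] * 4])
print(area_mean(score, lat))
```

Because the spread term is divided by m(m-1) rather than m squared, ensembles of different sizes can be compared without penalising the smaller one, which is the reason for using the fair variant when systems differ in member count.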

AI-based forecasts demonstrate comparable or improved probabilistic skill relative to the IFS ensemble in the bulk of the distribution, with particularly strong performance over tropical and mid-latitude oceans. However, skill systematically degrades at high quantiles for most variables, with more pronounced losses over land and at short lead times. Both diffusion- and CRPS-based probabilistic forecasts are competitive, but their relative skill varies across variables. Spatial diagnostics reveal coherent regime-dependent behaviour, with AI models underperforming in complex terrain and coastal regions where the IFS ENS retains a clear advantage. 

These results highlight both the promise and current limitations of probabilistic AI weather forecasting models, emphasising that headline global skill can mask substantial degradation in extreme-event and regional reliability.

How to cite: Girona-Mata, M., Orr, A., and Turner, R.: Global Evaluation of Probabilistic AI Weather Forecasts Across Extremes and Regimes, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19650, https://doi.org/10.5194/egusphere-egu26-19650, 2026.

Posters virtual: Wed, 6 May, 14:00–18:00 | vPoster spot 5

The posters scheduled for virtual presentation are given in a hybrid format for on-site presentation, followed by virtual discussions on Zoom. Attendees are asked to meet the authors during the scheduled presentation & discussion time for live video chats; onsite attendees are invited to visit the virtual poster sessions at the vPoster spots (equal to PICO spots). If authors uploaded their presentation files, these files are also linked from the abstracts below. The button to access the Zoom meeting appears just before the time block starts.
Discussion time: Wed, 6 May, 16:15–18:00
Display time: Wed, 6 May, 14:00–18:00

EGU26-20844 | Posters virtual | VPS4

Machine Learning-Based Prediction of Tropical Cyclone Intensification Over the North Indian Ocean Using ERA5 Reanalysis  

Dhanya Madhu, Neha Meriya Binu, and Maneesha Vinodini Ramesh
Wed, 06 May, 14:12–14:15 (CEST)   vPoster spot 5

Machine learning models are rapidly becoming popular for complementing, enhancing, and in some cases replacing traditional numerical models. This study presents a data-driven framework for predicting 24-hour tropical cyclone intensification over the North Indian Ocean using supervised machine learning and ERA5 reanalysis data. Cyclones that formed over the Bay of Bengal and the Arabian Sea during the period 1990–2024 are considered. We integrated environmental parameters from ERA5 with intensity records from the IBTrACS archive, excluding early developmental stages and retaining only dynamically mature systems. Intensification is formulated as a binary classification problem based on the sign of the 24-hour change in maximum sustained wind speed. While this captures general strengthening behaviour, it does not distinguish between moderate and rapid intensification, nor does it estimate the magnitude of intensity change. Five machine learning models (Logistic Regression, Random Forest, Extra Trees, Support Vector Machine, and Multilayer Perceptron) are trained and evaluated. Results indicate that the Random Forest classifier achieved the highest accuracy. Feature-importance analysis reveals strong physical consistency, highlighting the dominant roles of upper-level circulation, sea surface temperature, vertical wind shear, and atmospheric moisture in regulating short-term intensification. Cyclone Montha (2025) is used as an out-of-sample test case, outside the historical data, to illustrate the model's real-world applicability. The model-predicted intensification probability is estimated as 0.943, which indicates good performance for this case. Although a single case study does not constitute statistical validation, this illustrates the applicability of data-driven models in tropical cyclone intensity estimation.
The results encourage further investigations into the use of such data-driven models in tropical cyclone intensity prediction, which aids disaster management efforts.
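The binary target described in the abstract can be sketched as follows. The sketch assumes 6-hourly intensity records, so that 24 hours correspond to four time steps, and treats a zero change as non-intensifying; both choices, and the function name `intensification_labels`, are assumptions made for illustration and are not stated in the abstract.

```python
import numpy as np

def intensification_labels(wind_kt, steps_per_24h=4):
    """Label each record 1 if the wind 24 h later is stronger, else 0.

    wind_kt: maximum sustained wind speed along one storm track,
             sampled at a fixed interval (assumed 6-hourly here).
    """
    wind = np.asarray(wind_kt, dtype=float)
    delta = wind[steps_per_24h:] - wind[:-steps_per_24h]  # 24-hour change
    return (delta > 0).astype(int)

# Hypothetical track: strengthening for a day, then weakening.
track = [35, 40, 45, 50, 55, 50, 45, 40]
labels = intensification_labels(track)
print(labels)  # [1 1 0 0]
```

The last four records of the track receive no label because their 24-hour successor is not available, which is worth keeping in mind when matching the labels to ERA5 predictors sampled at the same times.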

How to cite: Madhu, D., Binu, N. M., and Ramesh, M. V.: Machine Learning-Based Prediction of Tropical Cyclone Intensification Over the North Indian Ocean Using ERA5 Reanalysis, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-20844, https://doi.org/10.5194/egusphere-egu26-20844, 2026.
