NP5.1 | Statistical methods in an era of AI in geosciences: forecasting, verification, and interpretability
Co-organized by AS5/HS13
Convener: Maxime Taillardat | Co-conveners: Philine Bommer (ECS), Jieyu Chen (ECS), Sebastian Lerch, Romain Pic (ECS), Sándor Baran, Stéphane Vannitsem
Orals | Wed, 06 May, 16:15–18:00 (CEST), Room -2.21 | Thu, 07 May, 08:30–12:30 (CEST), Room -2.15
Posters on site | Attendance Wed, 06 May, 08:30–10:15 (CEST) | Display Wed, 06 May, 08:30–12:30 | Hall X4
This session explores forecasting in geosciences using statistical methods. Ranging from linear regression to the most advanced machine learning (ML) or artificial intelligence (AI) algorithms, the session welcomes all contributions developing and/or using these tools for various applications such as AI-based numerical weather prediction and nowcasting, time series forecasting in geosciences, forecast blending, statistical post-processing, and downscaling.
This session also welcomes contributions advancing the assessment of AI-based forecasts. Aiming at a proper and in-depth assessment of the strengths and weaknesses of AI-based models, the session will report on benchmarking activities, new verification methodologies, diagnostics of forecast realism, and progress in the interpretability of AI-based models.

This session is designed to foster interdisciplinary discussions among geoscientists from meteorology, climate, hydrology, and other related communities, promoting the use of statistical methods in forecasting, verification, and beyond.

Orals: Wed, 6 May, 16:15–18:00 | Room -2.21

The oral presentations are given in a hybrid format supported by a Zoom meeting featuring on-site and virtual presentations. The button to access the Zoom meeting appears just before the time block starts.
Chairpersons: Philine Bommer, Zied Ben Bouallegue, Jochen Broecker
16:15–16:20
16:20–16:40 | EGU26-4103 | solicited | Highlight | Virtual presentation
David Harrison, Israel Jirak, and Patrick Marsh

Artificial intelligence (AI) and machine learning (ML) tools are rapidly growing in capability and application across the weather enterprise.  Fully AI-based numerical weather prediction (NWP) emulators are beginning to outperform traditional NWP, and many weather agencies have started to adopt ML-derived guidance products into the forecast process.  For example, the United States National Weather Service’s Storm Prediction Center (SPC) has implemented a number of ML models to aid in the prediction and detection of tornadoes, severe wind, hail, and wildfires.  However, the development of these AI/ML products and their subsequent transition into SPC operations revealed several challenges which potentially slowed their overall adoption into the forecasters’ workflow.  This presentation will discuss several factors that impacted the adoption of AI/ML into forecast operations and highlight some best practices used by SPC to help streamline the research-to-operations transition.  Case studies of AI/ML projects that were successfully transitioned into SPC operations will help illustrate the application of these best practices and showcase some of the common pitfalls faced by AI/ML development for operational applications.

How to cite: Harrison, D., Jirak, I., and Marsh, P.: Lessons Learned from the Development and Implementation of AI Forecast Guidance at the U.S. National Weather Service’s Storm Prediction Center, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-4103, https://doi.org/10.5194/egusphere-egu26-4103, 2026.

16:40–16:50 | EGU26-153 | ECS | On-site presentation
Ilaria Luise, Savvas Melidonis, Julius Polz, Sorcha Owens, Timothee Hunter, Christian Lessig, and Michael Tarnawa

The next generation of machine learning (ML) weather and climate models is increasingly trained on a wide variety of datasets, including reanalyses, forecasts, and observations. This diversity typically cannot be handled by existing evaluation tools, which are often limited to gridded data or fixed lead times. Furthermore, many existing evaluation frameworks are developed internally by institutions, remain closed-source, and lack interoperability across platforms and high-performance computing (HPC) environments. This creates a gap in the ability to systematically assess model skill across different data streams, experiments, and computing infrastructures.

The WeGen FastEvaluation tool, developed within the WeatherGenerator project, aims to bridge this gap. It provides a flexible, open-source framework designed to evaluate machine learning–based weather prediction models across a wide range of dataset types and formats. Unlike most existing tools, WeGen FastEvaluation makes minimal assumptions about data structure, allowing consistent analysis of both gridded and unstructured inputs, deterministic and probabilistic outputs, and multiple forecast lead times. Built on xarray, WeGen FastEvaluation supports multi-dimensional data handling, including probabilistic outputs and ensemble forecasts. The tool enables efficient computation of skill metrics and generation of 2D visualizations, allowing users to compare an arbitrary number of model runs across different data streams and forecast configurations.

The presentation will introduce the design and capabilities of the WeGen FastEvaluation, highlighting its integration within the WeatherGenerator workflow. Through examples, we demonstrate how the WeGen FastEvaluation tool enables consistent benchmarking, collaborative analysis across HPC systems, and reproducible ML-for-weather research.
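The abstract does not show the tool's interface, so the sketch below is only a generic illustration of the kind of lead-time-dependent skill computation such a framework performs; all names are hypothetical, and the flattened point dimension stands in for either gridded cells or unstructured station locations.

```python
import numpy as np

def rmse_per_lead_time(forecast, truth):
    """RMSE for each lead time.

    forecast, truth: arrays of shape (n_cases, n_lead_times, n_points);
    the last axis can index grid cells or unstructured station locations.
    """
    err = forecast - truth
    return np.sqrt(np.mean(err ** 2, axis=(0, 2)))

rng = np.random.default_rng(0)
truth = rng.normal(size=(50, 4, 100))
# Toy forecast whose error grows with lead time.
noise = rng.normal(size=(50, 4, 100)) * (1.0 + np.arange(4))[None, :, None]
scores = rmse_per_lead_time(truth + noise, truth)
```

Because the metric is computed over a flat point axis, the same call serves gridded and station-based evaluations alike, which is the design property the abstract emphasizes.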



How to cite: Luise, I., Melidonis, S., Polz, J., Owens, S., Hunter, T., Lessig, C., and Tarnawa, M.: WeGen FastEvaluation: An open-source tool for the evaluation and comparison of machine learning models in weather and climate applications, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-153, https://doi.org/10.5194/egusphere-egu26-153, 2026.

16:50–17:00 | EGU26-1553 | ECS | On-site presentation
Jakub Lewandowski, Leif Denby, and Andrew Ross

Nowcasting - the prediction of weather conditions over the next few hours - is critical for mitigating the impacts of severe convective storms. Machine learning offers new opportunities for improving nowcasting, particularly for convective precipitation, where traditional numerical models struggle. Yet, despite rapid progress in model development, evaluating these models remains a major challenge. Current verification practices typically rely on a narrow set of standard metrics that often fail to capture the complexity of atmospheric phenomena and cannot distinguish between different types of errors, providing limited insight into the specific weaknesses of the models.

This research introduces a comprehensive verification framework that combines carefully crafted datasets with sensitivity analyses, aiming to transform metric-based evaluation into a more informative process. Synthetic datasets are generated using ArtPrecip, a novel tool that randomly generates radar-like precipitation fields while allowing full control over properties such as motion, initiation, and evolution. Observational radar data are classified based on synoptic setting and observed precipitation properties, using different dimension-reduction methods. Sensitivity analyses examine how existing metrics respond to various error patterns, providing guidance on interpreting benchmark results.

The resulting system provides a well-defined and well-described set of benchmarks and enables reproducible, objective, and meaningful comparison of models. By addressing gaps in evaluation methodology, this work contributes to a more robust assessment of machine learning nowcasting skill and its applicability to severe weather forecasting.
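ArtPrecip itself is not described in implementation detail here; the sketch below only illustrates the underlying idea of synthetic verification data — radar-like fields built from random rain cells and advected by an exactly known motion vector — with all names hypothetical, not ArtPrecip's API.

```python
import numpy as np

def make_field(ny, nx, n_cells, rng, sigma=5.0):
    """Radar-like field: a sum of Gaussian rain cells at random positions."""
    y, x = np.mgrid[0:ny, 0:nx]
    field = np.zeros((ny, nx))
    for cy, cx in rng.uniform(0, [ny, nx], size=(n_cells, 2)):
        field += np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
    return field

def advect(field, dy, dx):
    """Shift the field by a prescribed motion vector (periodic domain)."""
    return np.roll(field, shift=(dy, dx), axis=(0, 1))

rng = np.random.default_rng(1)
frame0 = make_field(64, 64, n_cells=6, rng=rng)
frame1 = advect(frame0, dy=2, dx=3)  # the true motion (2, 3) is known exactly
```

Because the motion is prescribed rather than estimated, any nowcast error against `frame1` can be attributed unambiguously, which is the point of fully controlled synthetic benchmarks.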

How to cite: Lewandowski, J., Denby, L., and Ross, A.: Deriving meaning from metrics – a new approach for machine learning nowcasting verification, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-1553, https://doi.org/10.5194/egusphere-egu26-1553, 2026.

17:00–17:10 | EGU26-21878 | Virtual presentation
Amy McGovern, Taylor Mandelbaum, and Daniel Rotenberg

Properly evaluating AI and NWP models before deployment will help to ensure that the final models are trustworthy. Currently, most evaluation is done at a global scale, such as with WeatherBench, rather than focusing on high-impact events. While this global evaluation is important, it can obscure the results of how a model performs on high-impact events. For example, a heat wave may be poorly forecast by one model but the model may look promising overall when examining global Root Mean Squared Error. Only by examining specific case studies do we get the bigger picture of how the model performs on phenomena that impact humanity around the world.

We introduce Extreme Weather Bench (EWB), a new community driven benchmarking suite with almost 300 case studies of high-impact weather events across the globe. EWB facilitates model validation and verification on a variety of high-impact hazards that matter to people around the globe. EWB provides a standard set of case studies (spanning multiple spatial and temporal scales and different parts of the weather spectrum), observational data, impact-based metrics, and open-source code for users to evaluate their models. The case studies include tropical cyclones, atmospheric rivers, convective weather outbreaks, heat waves and major freeze events. To facilitate ease-of-use, EWB is distributed as a pure Python package, and integrates with either local data or data saved on the cloud.

EWB will help to drive the science forward for all weather models, enabling true comparisons across models and enabling people to evaluate their models on specific high-impact phenomena while diving deeply into case studies. EWB is a free open-source community-driven system and will be adding additional phenomena, test cases and metrics in collaboration with the worldwide weather and forecast verification community.

How to cite: McGovern, A., Mandelbaum, T., and Rotenberg, D.: ExtremeWeatherBench 1.0: A Flexible Evaluation Framework for Extreme Weather Events, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21878, https://doi.org/10.5194/egusphere-egu26-21878, 2026.

17:10–17:20 | EGU26-7223 | ECS | On-site presentation
Mehzooz Nizar, Reinhard Schiemann, Andrew G Turner, Kieran Hunt, and Steffen Tietsche

India relies on agriculture as one of its main sources of income. Therefore, reliable prediction of Indian summer monsoon rainfall is crucial to the country’s policy making and development of crop management strategies. The recent development of global AI Weather Prediction (AIWP) models has revolutionized weather forecasting. Owing to the very recent emergence of AIWP models, their performance in simulating the Indian monsoon system is still insufficiently explored. In this study, we verify the precipitation forecast skill of the AIWP models GraphCast and FuXi at lead times of 1–9 days during the Indian summer monsoon of 2023 and compare their performance to the physics-based model ECMWF IFS-HRES (IFS). The satellite-derived precipitation dataset IMERG is used as the ground truth to verify precipitation, along with ERA5 precipitation. Root mean squared error (RMSE), pattern correlation coefficient (PCC), structure (S), amplitude (A) and location (L) errors, and stable equitable error in probability space (SEEPS) were the metrics used to evaluate the models.

A number of case studies and the seasonal and intra-seasonal characteristics of the precipitation forecasts at various lead times were analysed during June–September 2023. The case studies reveal that the AIWP models generally have lower RMSE and higher PCC than IFS, while the AIWP models smoothen precipitation (positive S error) at longer leads. FuXi consistently underestimates precipitation (negative A error) in the case studies. Analysing the daily mean rainfall for the country as a whole and the precipitation bias at a lead time of 5 days confirms that FuXi shows a systematic dry bias in forecasting monsoon rainfall. Non-parametric statistical tests were conducted to decide which model performs best on each metric in forecasting the entire season at various lead times. FuXi consistently achieved the lowest RMSE, IFS delivered the best S, and GraphCast recorded the smallest SEEPS score at lead times of 1, 5 and 9 days, while no model shows a significant advantage in PCC, A or L. The AIWP models also outperformed IFS in RMSE and PCC while having larger S errors than IFS, corroborating the findings of the case studies. FuXi scored the largest A error across all lead times. The loss functions used to train AIWP models directly penalise point-wise errors, which likely explains their RMSE advantage over IFS.

These results show that even though AIWP models have good overall accuracy and correlation with observed precipitation, they lack realism in capturing the spatial distribution and intensity of precipitation. Model skill is also metric-dependent, and the choice between an AIWP and a physics-based model should hinge on the forecaster’s priority.
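For reference, RMSE and the centred pattern correlation coefficient used in such comparisons are quickly stated in code; the toy example below (synthetic data, not the study's) also shows why the two must be read together: a damped, overly smooth forecast keeps a perfect PCC while still carrying amplitude error.

```python
import numpy as np

def rmse(forecast, obs):
    return float(np.sqrt(np.mean((forecast - obs) ** 2)))

def pattern_correlation(forecast, obs):
    """Centred pattern correlation between two spatial fields."""
    f = forecast - forecast.mean()
    o = obs - obs.mean()
    return float(np.sum(f * o) / np.sqrt(np.sum(f ** 2) * np.sum(o ** 2)))

rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 2.0, size=(40, 40))   # stand-in rainfall field
damped = 0.5 * obs + 0.5 * obs.mean()      # overly smooth forecast
```

Here `pattern_correlation(damped, obs)` is exactly 1 while `rmse(damped, obs)` remains sizeable, mirroring how smoothed AIWP output can score well on correlation yet misrepresent intensity.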

How to cite: Nizar, M., Schiemann, R., Turner, A. G., Hunt, K., and Tietsche, S.: How skilful are AI-based forecasts of 2023 Indian summer monsoon precipitation?, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7223, https://doi.org/10.5194/egusphere-egu26-7223, 2026.

17:20–17:30 | EGU26-4550 | ECS | On-site presentation
Leonardo Olivetti, Gabriele Messori, Paolo Avner, and Stéphane Hallegatte
Recent years have witnessed rapid advances in data-driven weather forecasting, with an ever-increasing number of AI-based models reporting skill comparable to or exceeding that of physical models. Comparing AI and physical forecasting systems, however, remains challenging: these models often exhibit a different set of strengths and weaknesses, making their real-world value strongly dependent on the specific application. Yet, most existing comparisons of AI and physical models focus exclusively on meteorological skill, largely overlooking the question of forecast value in real-world decision-making.
 
In this talk, we tackle this question by proposing an application-dependent framework to evaluate the real-world value of AI weather forecasts. The framework is based on the classical concept of relative economic value, which we extend in several novel ways to better reflect realistic use cases. Besides allowing for varying cost–loss ratios to represent different protection and forecast costs, we introduce flexible penalty functions to account for compounding losses from sequential forecast misses as well as declining user trust due to repeated false alarms.
 
We apply the framework to a number of case studies, comprising cities exposed to high economic losses from weather-related natural hazards. We show that forecast value in these contexts depends not only on forecast and prevention costs, but also on the choice of penalty function and on whether compound losses from repeated misses or false alarms are considered. We thus advocate for evaluating real-world value alongside meteorological skill when developing and comparing forecasting models, to ensure that improvements in predictive accuracy translate into meaningful societal and economic benefits.
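The authors' penalty-function and trust extensions are not reproduced here, but the classical relative economic value they build on can be sketched from a 2x2 contingency table under the textbook cost-loss setup: a user pays cost C to protect or risks loss L, and forecast value is measured against the cheaper of always or never protecting (the example table below is invented).

```python
def relative_economic_value(hits, misses, false_alarms, correct_neg, cost_loss_ratio):
    """Classical relative economic value for a user with alpha = C/L.

    Expenses are per case, in units of the loss L; V = 1 for a perfect
    forecast and V <= 0 when climatology is at least as cheap.
    """
    n = hits + misses + false_alarms + correct_neg
    a = cost_loss_ratio
    e_forecast = (hits + false_alarms) / n * a + misses / n
    s = (hits + misses) / n           # climatological base rate
    e_climate = min(a, s)             # always protect vs never protect
    e_perfect = s * a                 # protect exactly when the event occurs
    return (e_climate - e_forecast) / (e_climate - e_perfect)

v = relative_economic_value(80, 20, 30, 870, cost_loss_ratio=0.2)
```

Sweeping `cost_loss_ratio` for a fixed table reproduces the familiar value curve over user types; the abstract's extensions then replace the constant per-miss and per-false-alarm costs with compounding penalty functions.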

How to cite: Olivetti, L., Messori, G., Avner, P., and Hallegatte, S.: From Forecast Skill to Forecast Value: Do AI Weather Forecasts Deliver Real-World Economic Benefits?, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-4550, https://doi.org/10.5194/egusphere-egu26-4550, 2026.

17:30–17:40 | EGU26-7988 | ECS | Virtual presentation
Angela Iza-Wong, Gabriel Moldovan, Zied Ben Bouallegue, Becky Hemingway, Matthew Chantry, and David A. Lavers

Accurate precipitation forecasting remains challenging, particularly in regions with complex terrain and sparse observational networks. This study evaluates precipitation forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) Integrated Forecasting System (IFS) and Artificial Intelligence/Integrated Forecasting System (AIFS) (ECMWF, 2024, 2025), including experimental models trained on the Integrated Multi-satellite Retrievals for GPM (IMERG) and Multi-Source Weighted-Ensemble Precipitation (MSWEP) datasets, the high-resolution (4 km) model developed within the Destination Earth (DestinE) initiative (ECMWF et al., 2025), and the GraphCast model (Lam et al., 2022). The evaluation is based on 2 years of observational data (2023–2024) from 30 Ecuadorian weather stations in coastal and Andean regions and considers forecast lead times of 1–10 days. Throughout the evaluation period, AIFS exhibits the highest overall predictive skill, whereas DestinE is most effective at identifying extreme precipitation events. Most models display a marked positive bias, particularly within the Andean region. AIFS models trained on IMERG and MSWEP demonstrate the lowest bias and highest skill, as indicated by the Stable Equitable Error in Probability Space (SEEPS) (Rodwell et al., 2010) and the Equitable Threat Score (ETS). The Frequency Bias Index (FBI) decreases across all models as thresholds increase from the 90th to the 99th percentile, with consistently elevated FBI values observed over mountainous terrain. AIFS (IMERG) achieves the best overall performance, while GraphCast demonstrates the lowest skill in both total and mountainous regions. Overall, in the Ecuadorian tropics, AI-based models generally outperform physical models, except during extreme precipitation events, when physical models remain more reliable.
These results underscore the critical importance of training data for AI-based systems and the ongoing challenges of forecasting high-impact precipitation across both operational and experimental models.
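FBI and ETS are standard contingency-table scores; a minimal sketch follows (the example counts are invented, not the study's data; FBI > 1 corresponds to the over-forecasting bias described above).

```python
def contingency_scores(hits, misses, false_alarms, correct_neg):
    """Frequency Bias Index and Equitable Threat Score from a 2x2 table."""
    n = hits + misses + false_alarms + correct_neg
    fbi = (hits + false_alarms) / (hits + misses)
    # ETS discounts the hits expected by chance given the marginal totals.
    hits_random = (hits + misses) * (hits + false_alarms) / n
    ets = (hits - hits_random) / (hits + misses + false_alarms - hits_random)
    return fbi, ets

fbi, ets = contingency_scores(hits=50, misses=25, false_alarms=40, correct_neg=885)
```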

Keywords: Precipitation forecasting, artificial intelligence, ECMWF, GraphCast, Ecuador, extreme rainfall

References

ECMWF. (2024). IFS Documentation CY49R1 - Part I: Observations. In IFS Documentation CY49R1. ECMWF. https://doi.org/10.21957/fd16c61484

ECMWF. (2025). ECMWF’s AI forecasts become operational ECMWF. https://www.ecmwf.int/en/about/media-centre/news/2025/ecmwfs-ai-forecasts-become-operational

ECMWF, EUMETSAT, & ESA. (2025). Destination Earth (DestinE)-digital model of the Earth. https://destination-earth.eu/

Lam, R., Sanchez-Gonzalez, A., Willson, M., Wirnsberger, P., Fortunato, M., Alet, F., Ravuri, S., Ewalds, T., Eaton-Rosen, Z., Hu, W., Merose, A., Hoyer, S., Holland, G., Vinyals, O., Stott, J., Pritzel, A., Mohamed, S., & Battaglia, P. (2022). GraphCast: Learning skillful medium-range global weather forecasting. http://arxiv.org/abs/2212.12794

Rodwell, M. J., Richardson, D. S., Hewson, T. D., & Haiden, T. (2010). A new equitable score suitable for verifying precipitation in numerical weather prediction. Quarterly Journal of the Royal Meteorological Society, 136(650), 1344–1363. https://doi.org/10.1002/qj.656

How to cite: Iza-Wong, A., Moldovan, G., Bouallegue, Z. B., Hemingway, B., Chantry, M., and Lavers, D. A.: Assessment of high-resolution physical and AI-based precipitation forecasts in the Ecuadorian Tropics, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7988, https://doi.org/10.5194/egusphere-egu26-7988, 2026.

17:40–17:50 | EGU26-13910 | On-site presentation
Sabrina Wahl

Current state-of-the-art artificial-intelligence weather prediction (AI-WP) systems are trained on a large archive of atmospheric reanalysis data. The training objective is to replicate the analysis at a future time step using the previous time steps. Loss functions guide the model to minimize the prediction error on known data. An analysis-based verification of forecasts derived from unseen data will reveal the strength and weaknesses of the AI-WP model in reproducing the statistical and dynamical characteristics of the underlying reanalysis.

In contrast, the development and fine-tuning of traditional physics-based numerical weather prediction (NWP) systems relies on verification against observations, with the aim of reducing discrepancies relative to various observational systems. This fundamental difference raises the question of what to expect when applying observation-based verification to AI-WP models that are trained on reanalysis rather than directly on observations.

Reanalysis datasets have well-known errors with respect to observations which are documented in literature. Consequently, observation-based verification of AI-WP systems will inherently reflect the observational error characteristics of the reanalysis. Deviations from this expectation are particularly informative: a larger error than that of the reanalysis may indicate deficiencies in emulation, whereas a smaller error raises the question of whether, and from where, additional information beyond the reanalysis has been obtained.

To address these questions, we apply the multiple correlation decomposition based on partial correlations introduced by Glowienka-Hense et al. (2020). This method decomposes the explained variance of two different datasets with respect to the same observations into a component of information contained in both datasets (shared explained variance) and the respective added values, i.e., information present in one dataset but not in the other. This decomposition enables quantification of the information transferred from the reanalysis into the forecasts and reveals potential deficiencies, or improvements relative to the reanalysis, in the training process. Furthermore, it facilitates comparison of different forecasting systems in terms of their shared and unique information content. The method is demonstrated using 2m-temperature station observations and global deterministic AI-WP and NWP forecasts.
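The exact partial-correlation formalism of Glowienka-Hense et al. (2020) is not reproduced here; as a rough sketch of the underlying idea, the toy example below partitions the variance of observations explained by two datasets into a shared component and each dataset's added value, using plain least-squares regression (synthetic data, illustrative only).

```python
import numpy as np

def explained_variance(X, y):
    """R^2 of y regressed (with intercept) on the columns of X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return 1.0 - (y - X1 @ coef).var() / y.var()

def decompose(ds_a, ds_b, obs):
    """Shared explained variance and the added value of each dataset."""
    r2a = explained_variance(ds_a[:, None], obs)
    r2b = explained_variance(ds_b[:, None], obs)
    r2ab = explained_variance(np.column_stack([ds_a, ds_b]), obs)
    return {"shared": r2a + r2b - r2ab,
            "added_by_a": r2ab - r2b,
            "added_by_b": r2ab - r2a}

rng = np.random.default_rng(0)
signal = rng.normal(size=2000)                # common predictable part
obs = signal + 0.3 * rng.normal(size=2000)
ds_a = signal + 0.5 * rng.normal(size=2000)   # e.g. a reanalysis
ds_b = signal + 0.5 * rng.normal(size=2000)   # e.g. a forecast
parts = decompose(ds_a, ds_b, obs)
```

By construction the three parts sum to the joint explained variance, so a forecast that merely emulates the reanalysis shows a large shared component and near-zero added value.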

Glowienka-Hense et al. (2020): Comparing forecast systems with multiple correlation decomposition based on partial correlation, ASCMO, 6, 103–113, https://doi.org/10.5194/ascmo-6-103-2020

How to cite: Wahl, S.: Observation-based verification of AI weather prediction models: What can we expect?, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13910, https://doi.org/10.5194/egusphere-egu26-13910, 2026.

17:50–18:00 | EGU26-11765 | ECS | On-site presentation
Soufiane Karmouche, Linus Magnusson, Tim Hewson, and Thomas Haiden

Standard scores such as the root mean squared error provide limited insight into whether machine learning (ML) weather prediction systems reproduce the physically consistent dynamical structures that underpin high-impact weather. Here, we present a multi-faceted assessment of the physical realism of ECMWF’s Artificial Intelligence Forecasting System (AIFS), combining case-study diagnostics of severe extratropical storms with conditional verification based on large-scale circulation.

We first examine two North Atlantic storms: Storm Amy (October 2025) and Storm Eowyn (January 2025). Using diagnostics inspired by Charlton-Perez et al. (2024), we analyse frontal structure, vorticity, and surface and upper-air wind fields in AIFS-Single and AIFS Ensemble Control forecasts, benchmarked against the IFS Control and analysis. While ML systems capture storm tracks and large-scale frontal geometry well, they systematically smooth sharp gradients, compact vorticity cores, and localized wind maxima, leading to underestimation of extreme winds. Probabilistic training in the ensemble configuration improves realism but does not fully overcome these structural limitations.

We then present ongoing work assessing the physical consistency of ML forecasts using diagnostics of the ageostrophic-to-geostrophic wind ratio at multiple pressure levels. These reveal systematic differences between ML-based and physics-based models, particularly in dynamically active midlatitude regions.

Finally, we present regime-based verification results highlighting improved AIFS performance for 2-m temperature forecasts during persistent wintertime anticyclonic conditions, illustrating ML strengths in stable large-scale regimes where physics-based forecasts suffer from long-standing systematic biases.

Overall, our results highlight the importance of moving beyond general verification scores toward diagnostic and physically interpretable evaluation frameworks when assessing AI-based weather forecasts, especially for high-impact weather events.

This work is funded by the Destination Earth project.

REFERENCES:

Charlton-Perez, A.J., Dacre, H.F., Driscoll, S. et al. Do AI models produce better weather forecasts than physics-based models? A quantitative evaluation case study of Storm Ciarán. npj Clim Atmos Sci 7, 93 (2024). https://doi.org/10.1038/s41612-024-00638-w

How to cite: Karmouche, S., Magnusson, L., Hewson, T., and Haiden, T.: Assessing the physical realism of AI-based weather forecasts: insights from extratropical storms and large-scale flow diagnostics., EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11765, https://doi.org/10.5194/egusphere-egu26-11765, 2026.

Orals: Thu, 7 May, 08:30–12:30 | Room -2.15

The oral presentations are given in a hybrid format supported by a Zoom meeting featuring on-site and virtual presentations. The button to access the Zoom meeting appears just before the time block starts.
Chairpersons: Sebastian Lerch, Philine Bommer, Maxime Taillardat
XAI / verification and diagnostics (II)
08:30–08:40 | EGU26-9685 | ECS | Virtual presentation
Sudhanyasree Prasanna Ravikumar, Sakila Saminathan, and Subhasis Mitra

Precipitation forecasts generated by Numerical Weather Prediction (NWP) models often exhibit systematic biases arising from limitations in model resolution, representation of sub-grid-scale processes, and uncertainties in initial conditions. This study systematically assesses different predictor combinations (PCs) obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF) model to improve short-range precipitation forecasts using data-driven approaches over the peninsular Indian region. Different data-driven formulations, comprising four machine learning (ML) models and two deep learning (DL) models, were implemented and systematically compared. Further, the different PCs and data-driven formulations are evaluated and compared against the traditional Bayesian Model Averaging (BMA) approach, widely adopted for precipitation forecast enhancement. The improvement in precipitation forecast skill was assessed using standard deterministic and probabilistic verification metrics. The results indicate that incorporating exogenous predictor variables leads to a slight improvement in precipitation forecast skill, while DL models exhibit performance comparable to that of traditional ML models. Overall, the exogenous-variable PC achieved higher forecast skill than the other PCs and the traditional BMA, yielding an approximate 20% improvement in RMSE compared to 14% for the traditional BMA. Feature importance analysis revealed that total precipitation, wind speed, and 2-m temperature consistently ranked among the top five most influential variables across the different data-driven formulations, underscoring the interpretability of the models.
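The abstract does not specify how feature importance was computed; the sketch below shows one common, model-agnostic option, permutation importance (shuffle one predictor and record the RMSE increase), with an invented linear toy model standing in for the study's ML/DL formulations.

```python
import numpy as np

def permutation_importance(X, y, predict, rng, n_repeats=10):
    """Mean RMSE increase when each predictor column is shuffled."""
    base = np.sqrt(np.mean((predict(X) - y) ** 2))
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            scores[j] += np.sqrt(np.mean((predict(Xp) - y) ** 2)) - base
    return scores / n_repeats

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))        # e.g. raw precip, wind speed, 2-m temp
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=500)
# Fit a simple linear model as the stand-in predictor.
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(500), X]), y, rcond=None)
predict = lambda Z: coef[0] + Z @ coef[1:]
importance = permutation_importance(X, y, predict, rng)
```

Because only `predict` is needed, the same routine ranks predictors for ML, DL, or BMA formulations without touching their internals.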

How to cite: Prasanna Ravikumar, S., Saminathan, S., and Mitra, S.: Comparative Assessment of Predictor Variable Combinations within Data Driven Approaches for NWP based Precipitation Forecast Enhancement, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9685, https://doi.org/10.5194/egusphere-egu26-9685, 2026.

08:40–08:50 | EGU26-8532 | ECS | On-site presentation
Alex Schuddeboom, Christian Zammit, David Plew, Piet Verburg, and Aidin Jabbari

The ERA5-land Global Gridded Stochastic Weather Generator (EGGS-WG) model was released to the public last year as an open-source and freely accessible stochastic weather generator. Its purpose is to provide an easy-to-use, low-resource, modern stochastic weather generator that can produce rainfall, air temperature and dew point temperature. The model offers several advancements over existing freely available stochastic weather generators, including the ability to simulate any terrestrial region of the planet, moving from a single-site simulation approach to an entire gridded domain, and increasing the temporal resolution of temperature simulation from daily to hourly.

Validation case studies have been performed over a range of different regions that represent substantially different climates. In general, EGGS-WG shows a strong ability to recreate the statistical behaviour seen in the ERA5-Land dataset. Precipitation occurrence rates and daily rainfall amounts are shown to be reproduced accurately by the model. Several different aspects of these variables are validated, including seasonality, spatial correlations and rainfall spells. While the general quality of the simulation is high, there are some clear issues in the simulation of the most extreme precipitation values, as well as some unique issues in consistently wet climates. Analysis of the air temperature and dew point temperature simulations shows stronger agreement. In particular, the spatial distributions and diurnal cycles of temperature are shown to be well simulated.

Many future developments have been planned that build on the released software package. Most prominent amongst these is the expansion of the simulated variables to include winds and radiation, which introduces a unique set of challenges due to the strong diurnal patterns and spatial organisation. Additionally, integrated support for CMIP6 driven future warming simulation is a high priority. These extensions are in various stages of development and are likely to be released over the coming year.

How to cite: Schuddeboom, A., Zammit, C., Plew, D., Verburg, P., and Jabbari, A.: Design, operation and validation of the ERA5-land Global Gridded Stochastic Weather Generator, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8532, https://doi.org/10.5194/egusphere-egu26-8532, 2026.

08:50–09:00 | EGU26-17204 | ECS | On-site presentation
Martin Bonte, Stéphane Vannitsem, and Lesley De Cruz

The variability in ensemble forecasts can be generated dynamically (as is usually done with Numerical Weather Prediction (NWP) models), stochastically, or with new approaches such as generative AI techniques. As these approaches are in their infancy for geophysical applications, the properties of the ensembles produced by generative models are still far from clear, especially if those models are to be used in operational activities. This aspect is investigated here for nowcasting models.

This work provides a predictability analysis over Belgium for the generative AI nowcasting model LDCast [1], as well as for the stochastic STEPS nowcasting algorithm (pysteps implementation [2]). Both models correctly estimate the error at almost all scales by means of their ensemble spread (i.e. good spread/error relationship), and they adapt the morphology of their ensembles depending on whether the event dynamics is convective or stratiform. Surrogate ensembles are also derived from the ensembles of STEPS and LDCast, and used as benchmarks with which to compare the spatial scores of the models. This reveals that both STEPS and LDCast ensembles struggle to provide added value for the spatial localization of the uncertainty associated with the growth and decay of rainfall. Therefore, STEPS and LDCast ensembles seem to be accurate statistically but not dynamically.
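The spread/error relationship mentioned above compares mean ensemble spread with the RMSE of the ensemble mean, which should roughly match for a statistically reliable ensemble; a minimal sketch with a toy ensemble that is reliable by construction (illustrative only, not the LDCast or STEPS setup):

```python
import numpy as np

def spread_and_error(ens, obs):
    """(mean ensemble spread, RMSE of the ensemble mean).

    ens: (n_members, n_cases); obs: (n_cases,).  For a reliable ensemble
    the two agree up to a small finite-ensemble-size correction.
    """
    rmse = np.sqrt(np.mean((ens.mean(axis=0) - obs) ** 2))
    spread = np.sqrt(np.mean(ens.var(axis=0, ddof=1)))
    return spread, rmse

rng = np.random.default_rng(0)
truth = rng.normal(size=5000)
members = truth[None, :] + rng.normal(size=(20, 5000))  # members: truth + noise
obs = truth + rng.normal(size=5000)                     # obs drawn the same way
spread, rmse = spread_and_error(members, obs)
```

A near-unit spread/error ratio, as here, is the statistical accuracy the abstract credits to both STEPS and LDCast; their dynamical deficiencies only show up in spatially resolved diagnostics.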

[1] Leinonen, J., et al. (2023). Latent diffusion models for generative precipitation nowcasting with accurate uncertainty quantification. arXiv preprint arXiv:2304.12891.

[2] Pulkkinen, S., et al. (2019). Pysteps: an open-source python library for probabilistic precipitation nowcasting (v1.0). GMD, 12(10):4185–4219.

How to cite: Bonte, M., Vannitsem, S., and De Cruz, L.: Dynamical evaluation of the error representation in the generative AI nowcasting model LDCast, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-17204, https://doi.org/10.5194/egusphere-egu26-17204, 2026.

09:00–09:10
|
EGU26-13223
|
ECS
|
On-site presentation
Britta Seegebrecht, Sabrina Wahl, Stefanie Hollborn, Erik Pavel, Wael Almikaeel, Michael Langguth, Martin Schultz, Christian Lessig, Ilaria Luise, Juergen Gall, Anas Al-Iahham, and Mohamad Hakam Shams Eddin

Data-driven weather prediction models based on artificial intelligence (AI) have rapidly advanced in recent years and are frequently reported to outperform traditional physics-based numerical weather prediction (NWP) models for selected verification scores. However, optimization with respect to a specific loss function can adversely affect other metrics, potentially leading to unrealistic forecast characteristics, such as overly smooth spatial structures when mean-squared or mean-absolute error–based loss functions are used.

A robust and meaningful comparison of AI-based and NWP models therefore requires a carefully chosen and diverse set of verification metrics that accounts for potential dependencies. The main focus is placed on the prominent forecast accuracy-activity tradeoff, associated with the double penalty problem of deterministic forecasts. Related questions include: How sensitive is the relationship between accuracy and activity metrics to the choice of verification measure? Are there systematic differences between AI-based and NWP models? What is the impact of the (in)dependence between the AI training loss function and the verification metrics on the assessment of forecast skill?
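
The double-penalty effect behind the accuracy-activity tradeoff is easy to reproduce: a forecast with the right structure but a displacement error is penalized at both the observed and the predicted location, so smoothing it lowers the RMSE while also lowering the forecast activity. A synthetic numpy illustration (not based on WP MIP data):

```python
import numpy as np

n, wavelength = 512, 32
x = np.arange(n)
truth = np.sin(2 * np.pi * x / wavelength)

# "Active" forecast: correct structure, displaced by a quarter wavelength,
# so it is penalized both where the feature was observed and where predicted
forecast = np.roll(truth, wavelength // 4)

# "Smooth" forecast: the same forecast after a running-mean filter
window = wavelength // 2
kernel = np.ones(window) / window
smooth = np.convolve(np.pad(forecast, window, mode='wrap'), kernel,
                     mode='same')[window:-window]

def rmse(f):
    return np.sqrt(np.mean((f - truth) ** 2))

def activity(f):
    return np.std(f)
```

Comparing the two forecasts shows the tradeoff: the smoothed field scores better on RMSE while being less active than the displaced one.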

These questions are addressed using both scale-independent and scale-dependent verification metrics, allowing the quantification of forecast performance on individual spatial scales.

As a starting point, global deterministic forecasts are considered. The analysis is partly based on forecasts from the Weather Prediction Model Intercomparison Project (WP MIP), which provides a collection of NWP and AI-model forecasts from multiple national weather services and research institutions.

The work is conducted within the RAINA project, which aims to develop a foundation model for the atmosphere with a particular focus on reliable, high-resolution forecasts of extreme wind and precipitation events. Consequently, the relation between, e.g., forecast activity and the predictive capability for extreme weather is of special interest.

How to cite: Seegebrecht, B., Wahl, S., Hollborn, S., Pavel, E., Almikaeel, W., Langguth, M., Schultz, M., Lessig, C., Luise, I., Gall, J., Al-Iahham, A., and Shams Eddin, M. H.: Scale-dependent analysis of the accuracy–activity trade-off in AI weather forecasts, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13223, https://doi.org/10.5194/egusphere-egu26-13223, 2026.

09:10–09:15
09:15–09:25
|
EGU26-12781
|
On-site presentation
Dominique Brunet, Laura Huang, Jonathan Belletête, Ahmed Mahidjiba, and Sudesh Boodoo

Recent research and development at Environment and Climate Change Canada has been conducted on improving the current operational radar precipitation nowcasting by transitioning from an optical flow method (Farnebäck smoothed) to machine learning (ML)-based nowcasts. Two ML-based nowcasting models were trained on the Canadian radar composite: RainNet, a convolutional neural network based on the U-Net architecture, and NowcastNet, which combines a Generative Adversarial Network with an Evolution Network to explicitly model precipitation dynamics. Verification of radar precipitation nowcasts revealed that the optimal method depends on both lead time and precipitation threshold. RainNet performed best for low precipitation thresholds (0.1-1 mm/h) at all lead times, highlighting its ability to capture widespread, weak precipitation, while NowcastNet outperformed the others at longer lead times (beyond one hour) and for higher precipitation thresholds (4+ mm/h). Farnebäck smoothed remained the most skillful for nowcasting heavy precipitation (12+ mm/h) during the first hour, likely due to its robust short-term motion estimation. 

Building on these results, we propose a Lagrangian blending method that optimally combines the predicted motion paths and the growth and decay of precipitation intensity components of the different nowcasting methods.  While optical flow methods assume constant motion and intensity evolution, ML-based methods produce time-varying motion vectors and precipitation intensities, which are explicitly leveraged in the blending framework. For deterministic nowcasts, we apply a bias-correction followed by the blending of both motion paths and intensity, allowing the generation of time-evolving blended motion fields with growth and decay.  

We also generate probabilistic nowcasts of precipitation occurrence (0.1 mm/h) and extreme precipitation (50 mm/h) by determining the optimal spatial smoothing for each model and lead time based on the area under the ROC curve. We then calibrate the resulting probabilities using isotonic (i.e. monotonically increasing) regression. Experiments are conducted using both static and dynamically varying weighting strategies for both deterministic and probabilistic radar precipitation nowcasting. The goal is to produce a blended and post-processed nowcast that outperforms each individual method across all lead times and precipitation thresholds. 
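
Isotonic regression constrains the calibration map from raw probability to observed relative frequency to be non-decreasing; the standard fitting algorithm is pool-adjacent-violators. The abstract does not specify the implementation used, so the following is a minimal pure-numpy sketch of the idea:

```python
import numpy as np

def isotonic_fit(p_raw, y):
    # Pool-adjacent-violators: fit non-decreasing values (ordered by raw
    # probability) that best match the observed binary outcomes
    order = np.argsort(p_raw)
    y_sorted = np.asarray(y, dtype=float)[order]
    vals, wts = list(y_sorted), [1] * len(y_sorted)
    i = 0
    while i < len(vals) - 1:
        if vals[i] > vals[i + 1]:              # violation: pool the two blocks
            merged = (vals[i] * wts[i] + vals[i + 1] * wts[i + 1]) / (wts[i] + wts[i + 1])
            vals[i:i + 2] = [merged]
            wts[i:i + 2] = [wts[i] + wts[i + 1]]
            i = max(i - 1, 0)                  # a merge can create a new violation
        else:
            i += 1
    fitted = np.repeat(vals, wts)
    out = np.empty_like(fitted)
    out[order] = fitted
    return out

# Example: four raw probabilities against binary outcomes
p_raw = np.array([0.1, 0.2, 0.3, 0.4])
outcome = np.array([0, 1, 0, 1])
calibrated = isotonic_fit(p_raw, outcome)
```

The single violation (1 followed by 0) is pooled into a block of 0.5, leaving a monotone calibration.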

How to cite: Brunet, D., Huang, L., Belletête, J., Mahidjiba, A., and Boodoo, S.: A Lagrangian blending of optical flow and ML-based radar precipitation nowcasts, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12781, https://doi.org/10.5194/egusphere-egu26-12781, 2026.

09:25–09:35
|
EGU26-4804
|
On-site presentation
Ying Zhang, Ziming Zou, and Yurong Liu

Multi-step time series forecasting is a fundamental problem across geoscientific applications, including meteorology, hydrology, climate analysis, and space and environmental sciences. A persistent challenge in such tasks is the progressive degradation of predictive accuracy as the forecast horizon increases. This phenomenon is primarily driven by the accumulation and temporal propagation of forecast errors, while most existing statistical and machine learning models lack explicit mechanisms to characterize, model, and correct the evolving dynamics of horizon-dependent residuals.

To address this limitation, we propose an adaptive error post-processing framework termed the Adaptive Residual Decay Mechanism (ARDM). ARDM is designed as an end-to-end predictive optimization strategy that enhances forecasting stability, robustness, and generalization across diverse temporal patterns and application scenarios. Rather than modifying the internal structure of forecasting models, ARDM operates as a residual-aware modification layer that can be seamlessly integrated with a wide range of statistical and machine-learning-based forecasting pipelines.

The proposed framework systematically integrates data preprocessing, initial multi-step forecasting, residual sequence construction, residual dependency modeling, dynamic error modification, and final output refinement. By explicitly constructing residual time series from preliminary forecasts, ARDM captures both short-term and long-term temporal dependencies in forecast errors, enabling structured modeling of error evolution across lead times. Within a symmetrical residual modeling architecture, a time-sensitive adaptive decay function is introduced to dynamically estimate and correct horizon-dependent forecast errors, allowing error adjustments to evolve consistently with increasing prediction horizons.

The decay function and its parameters are optimized through a joint multi-metric loss formulation evaluated across geoscientific and cross-domain time series forecasting datasets. This optimization strategy balances sensitivity to error magnitude with robustness to directional deviations, ensuring stable and reliable post-processing behavior, particularly for longer-range forecasts. Furthermore, ARDM systematically exploits historical residual information during the observation phase, enabling horizon-aware and dynamically consistent refinement of prediction errors through structured residual dependencies without increasing model complexity.
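
As a purely hypothetical illustration (the abstract does not give ARDM's decay function or its parameterization), a horizon-dependent residual correction can be sketched by damping a per-lead-time bias estimate with an exponential decay in lead time; `lam` below is an assumed decay rate:

```python
import numpy as np

def decay_corrected(raw_forecast, hist_residuals, lam=0.1):
    # Estimate a per-horizon bias from historical residuals (forecast - obs)
    # and subtract it, damped by an exponential decay over lead times
    H = len(raw_forecast)
    mean_resid = hist_residuals.mean(axis=0)
    weight = np.exp(-lam * np.arange(H))
    return raw_forecast - weight * mean_resid

# Toy usage: a 12-step forecast with a constant +2 bias at every horizon
H = 12
obs = np.zeros(H)
raw = obs + 2.0
hist = np.full((50, H), 2.0)          # archive of past residuals
corrected = decay_corrected(raw, hist)
```

In this toy case the correction removes the bias entirely at lead 0 and partially at longer horizons, where the decay weight shrinks.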

Extensive experiments conducted on multiple real-world geophysical time series datasets, including representative geomagnetic indices, demonstrate that ARDM consistently outperforms mainstream baseline statistical and machine learning methods across a range of standard evaluation metrics, including MAE, MSE, RMSE, MAPE, SSE, and the index of agreement (IA). Performance improvements are especially pronounced at longer prediction horizons, highlighting ARDM’s effectiveness in mitigating error accumulation in multi-step forecasting of geophysical processes. These results suggest that residual-aware, horizon-adaptive statistical post-processing provides a powerful and flexible pathway for improving the reliability of geophysical time series forecasting, with direct relevance to space weather and broader Earth system applications.

How to cite: Zhang, Y., Zou, Z., and Liu, Y.: ARDM: Adaptive Residual Decay Mechanism for Dynamic Error Modification in Geophysical Time Series Forecasting, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-4804, https://doi.org/10.5194/egusphere-egu26-4804, 2026.

09:35–09:45
|
EGU26-19403
|
ECS
|
On-site presentation
Shin-Hau Chen and Chung-Chieh Wang

Uncertainty remains a major challenge in typhoon rainfall forecasting over Taiwan, even when cloud-resolving numerical weather prediction models are employed. Individual forecasts often exhibit large variability in rainfall amount and spatial distribution, particularly at long lead times, while their credibility is generally unknown at forecast time.

This study presents a machine learning–based framework for the a priori diagnosis of uncertainty in typhoon rainfall forecasts. Approximately fifteen years of cloud-resolving regional model forecasts and corresponding precipitation observations are used to quantify forecast quality through a similarity skill score (SSS), which measures the spatial agreement between forecasted and observed accumulated rainfall during the typhoon impact period. The machine learning model is designed to predict the future SSS of individual forecasts using only information available at forecast time, including diagnostics from the regional model and large-scale environmental and track-related predictors derived from global forecasts.

To ensure robust evaluation, the dataset is split by independent typhoon cases and time periods to avoid information leakage. Preliminary analyses suggest that the proposed approach can capture variations in forecast credibility, with forecasts predicted to have high SSS exhibiting a substantially higher likelihood of achieving high observed SSS.

Rather than improving rainfall forecasts themselves, this study focuses on statistical post-processing and uncertainty diagnosis, demonstrating the potential of machine learning as an objective tool for assessing the credibility of high-resolution typhoon rainfall forecasts.

How to cite: Chen, S.-H. and Wang, C.-C.: A Priori Diagnosis of Uncertainty in Cloud-Resolving Typhoon Rainfall Forecasts over Taiwan Using Machine Learning, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19403, https://doi.org/10.5194/egusphere-egu26-19403, 2026.

09:45–09:55
|
EGU26-8449
|
On-site presentation
Charles Jones, Callum Thompson, David Siuta, Nathan Quinn, and Nicholas Sette

California is prone to extreme fire weather conditions characterized by high winds, elevated temperatures, and low humidity. Accurate predictions with high spatial resolution are critical for emergency operations to monitor and respond to fast-spreading wildfires. While current operational numerical weather prediction models, such as the NOAA Global Forecast System (GFS), offer reliable probabilistic forecasts in the medium range (up to 15 days), their coarse spatial resolution (typically 0.25° latitude/longitude, ~25 km) limits their utility for localized fire risk assessment. This resolution is insufficient for capturing terrain-driven wind patterns and microclimate variations that drive fire behavior, especially in regions of complex topography like the wildland–urban interface.

High-resolution probabilistic forecasts of fire weather conditions are generated by downscaling GFS ensemble outputs from a native resolution of 0.25° latitude/longitude to 1.5 km horizontal grid spacing over a domain encompassing California and Nevada. The downscaling framework integrates singular value decomposition (SVD), UNet-based convolutional neural networks, and diffusion models to capture both large-scale variability and fine-scale terrain-driven features. Models are trained using GFS initial conditions (00 UTC) and paired with 1.5 km Weather Research and Forecasting (WRF) simulations spanning the period 2015–2020. To evaluate forecast skill, ten high-impact case studies characterized by strong wind events in the Sierra Nevada and Southern California are analyzed. Probabilistic predictions of surface air temperature, relative humidity, and wind speed are validated against surface meteorological observations. The study includes a discussion of forecast skill metrics, operational applications, and ongoing research directions.

How to cite: Jones, C., Thompson, C., Siuta, D., Quinn, N., and Sette, N.: High-resolution Probabilistic Forecasts of Fire Weather Conditions in California using Downscaling Machine Learning Models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8449, https://doi.org/10.5194/egusphere-egu26-8449, 2026.

09:55–10:05
|
EGU26-19639
|
Virtual presentation
Guy Even, Andreas Karrenbauer, Rex Lei, Jonatan Ostrometzky, and Christian Sohler

Commercial microwave links (CMLs) are part of the infrastructure of wireless networks. Their measured attenuations have been studied as an opportunistic source for monitoring spatiotemporal rainfall and other atmospheric phenomena. CML attenuation measurements can enhance the spatiotemporal accuracy and resolution of existing weather monitoring instruments. In addition, they can serve as stand-alone monitoring devices in places where dedicated instruments are scarce or nonexistent.

Current techniques for 2D rainfall map reconstruction usually reduce CML measurements to virtual rain gauges (i.e., point measurements) and rely on interpolation techniques such as inverse distance weighting or kriging. While effective in many scenarios, these methods are suboptimal because they do not address the mis-modeling introduced by reducing a path-integrated attenuation to a single-point rain-intensity measurement.

In this study, we revisit the rainfall map reconstruction problem from CML signal attenuation measurements as a principled optimization approach. We formulate the problem of the partial-to-complete field reconstruction as a physics-informed optimization problem. The reconstructed rainfall field is quantized and represented by pixel-rainfall variables whose values are constrained to agree with the observed CML signal attenuations. The resulting solution minimizes a weighted sum of the attenuation errors along the links, spatial differences between neighboring pixels, and the total rainfall in all the pixels of the map.
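
The stated objective is a regularized nonnegative least-squares problem. The sketch below uses illustrative weights and a projected-gradient solver; the authors' discretization, weights, and solver may differ:

```python
import numpy as np

def reconstruct(A, b, D, alpha=1.0, beta=0.1, gamma=0.01,
                steps=5000, lr=0.05):
    # minimize  alpha*||A r - b||^2 + beta*||D r||^2 + gamma*sum(r),  r >= 0
    # A: (n_links, n_pixels) path weights of each CML through each pixel
    # b: (n_links,) observed path-integrated attenuations
    # D: (n_edges, n_pixels) differences between neighbouring pixels
    r = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = (2 * alpha * A.T @ (A @ r - b)
                + 2 * beta * D.T @ (D @ r)
                + gamma)
        r = np.maximum(r - lr * grad, 0.0)   # projected gradient step
    return r

# Toy 1-D "map" with 4 pixels and 3 links
A = np.array([[1., 1., 0., 0.],
              [0., 0., 1., 1.],
              [0., 1., 1., 0.]])
b = np.array([2., 2., 4.])                   # consistent with rain [0, 2, 2, 0]
D = np.array([[-1., 1., 0., 0.],
              [0., -1., 1., 0.],
              [0., 0., -1., 1.]])
r = reconstruct(A, b, D)
```

The smoothness and total-rainfall penalties pull the solution slightly away from the exact field, but the link attenuations are reproduced closely and the rain maximum is localized correctly.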

To evaluate our approach, we create a benchmark of hundreds of rainfall maps and CML locations and attenuations. Rainfall maps are algorithmically extracted by identifying rain events in EURADCLIM rain maps (the European climatological high-resolution gauge-adjusted radar precipitation dataset). We identify rain events consisting of patches of about 50x50 km² over various terrain types and rain patterns. We overlay CMLs on each patch using the free "Four-year commercial microwave link dataset for the Netherlands" (publicly available on the 4TU.ResearchData platform). We then apply the ITU-R P.838 model at the pixel level to compute the CML attenuations from the rainfall, yielding noiseless attenuation measurements.

We apply the inverse optimization procedure to the CML attenuations to reconstruct the rainfall maps. The accuracy of the reconstructed rainfall map is evaluated and compared with the inverse distance weighting approach. Overall, this study reframes rainfall reconstruction from opportunistic sensing networks as a well-posed inverse problem with an explicit objective function. Our reconstruction framework can also assist in explaining AI solutions in the absence of ground truth.

How to cite: Even, G., Karrenbauer, A., Lei, R., Ostrometzky, J., and Sohler, C.: Discrete Learning Algorithms for Precipitation Estimation from Commercial Microwave Links, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19639, https://doi.org/10.5194/egusphere-egu26-19639, 2026.

10:05–10:15
|
EGU26-8733
|
On-site presentation
Tae-Jin Oh, In-Chae Na, and Woo-Yeon Park

This study outlines the development of an artificial intelligence (AI)-based precipitation forecasting system at the Korea Institute of Atmospheric Prediction Systems (KIAPS). The system is designed with three main components: an observation-based model for very short-term forecasting (nowcasting), a post-processing model that corrects numerical weather prediction (NWP) fields at longer lead times, and a hybrid model, still to be built, that will integrate the two approaches. The nowcasting model utilizes a U-Net architecture incorporating ConvLSTM at the bottleneck. It uses radar and satellite data sequences to produce 6-hour forecasts; the training strategy involves pretraining on radar/satellite data followed by fine-tuning with 1-hour accumulated rainfall gauge data from Automatic Weather Stations (AWS). The post-processing model employs a ConvNeXt v2 U-Net to correct Korea Integrated Model (KIM) NWP fields for forecasts up to 24 hours. Performance evaluations show that the observation-based model excels at shorter lead times, with a 34% improvement in the Critical Success Index (CSI) for precipitation exceeding 8 mm/hr, averaged over the 1–6 hour forecast period, compared to the baseline KIM forecast. Meanwhile, the post-processing model, which incorporates a differentiable CSI loss function for robust heavy precipitation forecasting, achieves a 31% CSI improvement relative to KIM averaged over the 24-hour forecast period, with reduced performance degradation at longer lead times. Future work will focus on developing the hybrid model to merge these outputs for optimal accuracy across all forecast lead times.
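
A differentiable CSI loss can be built by replacing the hard exceedance indicators in CSI = hits / (hits + misses + false alarms) with sigmoids; the exact KIAPS formulation is not given in the abstract, so the numpy sketch below only illustrates the generic idea (`sharpness` is an assumed parameter):

```python
import numpy as np

def soft_csi_loss(pred, obs, thresh, sharpness=10.0):
    # Replace the hard indicator 1{x > thresh} by a sigmoid so that the
    # score is differentiable in pred; returns 1 - soft CSI (a loss)
    def soft_exceed(x):
        return 1.0 / (1.0 + np.exp(-sharpness * (x - thresh)))
    p, o = soft_exceed(pred), soft_exceed(obs)
    hits = np.sum(p * o)
    false_alarms = np.sum(p * (1.0 - o))
    misses = np.sum((1.0 - p) * o)
    return 1.0 - hits / (hits + false_alarms + misses + 1e-8)

# A perfect forecast gives a loss near 0; an exactly opposite one, near 1
obs = np.array([20.0, 0.0, 20.0, 0.0])        # rain rates in mm/hr
loss_perfect = soft_csi_loss(obs, obs, thresh=8.0)
loss_opposite = soft_csi_loss(obs[::-1], obs, thresh=8.0)
```

As `sharpness` grows the sigmoid approaches the hard threshold and the loss approaches 1 minus the standard CSI.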

How to cite: Oh, T.-J., Na, I.-C., and Park, W.-Y.: Development of AI-based precipitation forecasting at KIAPS, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8733, https://doi.org/10.5194/egusphere-egu26-8733, 2026.

Coffee break
Chairpersons: Jieyu Chen, Stéphane Vannitsem, Sándor Baran
10:45–10:55
|
EGU26-10414
|
On-site presentation
Marko Laine, Leila Hieta, Tuukka Himanka, Mikko Partio, and Olle Räty

Advances in data-driven artificial intelligence (AI) weather models are transforming how national meteorological services produce forecasts. The Finnish Meteorological Institute (FMI) has developed Aila, a regional AI model inspired by Met Norway's Bris AI model and built using the Anemoi framework, an open European initiative that integrates machine learning techniques with meteorology. Aila has been trained on 40 years of European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 global reanalysis data and about three years of high-resolution Harmonie analyses over the Scandinavian region, utilizing the computational power of the LUMI supercomputer. The model's graph-based neural network architecture enables enhanced spatial resolution and improved representation of atmospheric processes over Northern Europe.

This study focuses on evaluating Aila's performance during cold winter conditions in Finland, a key challenge for numerical weather prediction models. Prolonged low-temperature episodes are often governed by persistent high-pressure systems and strong temperature inversions that prove difficult to forecast accurately. Using case studies from recent winters, we evaluate Aila’s skill in forecasting 2-meter temperatures during cold spells by comparing its predictions against FMI's operational forecast products and observations.

The results demonstrate that the AI-based Aila model achieves competitive accuracy in temperature forecasts during challenging cold weather conditions while providing substantial computational efficiency compared to traditional numerical approaches. Future development efforts will focus on implementing a multi-decoder approach where the Aila model will be fine-tuned using observational data to better capture extreme cold temperatures and improve forecast reliability.

How to cite: Laine, M., Hieta, L., Himanka, T., Partio, M., and Räty, O.: Forecasting Cold Winter Temperatures in Finland with the Aila AI Weather Model, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-10414, https://doi.org/10.5194/egusphere-egu26-10414, 2026.

10:55–11:05
|
EGU26-4713
|
ECS
|
On-site presentation
Yang Xiao and Paula Moraga

Accurate real-time tracking of infectious diseases is often challenged by reporting delays. Existing nowcasting methods typically struggle with three major limitations: they either (1) oversimplify complex reporting delays; (2) ignore spatial connections by treating regions separately; or (3) are too computationally expensive when handling detailed spatio-temporal data, making them impractical for real-time use.

To solve these issues, we propose a flexible Bayesian spatio-temporal framework that incorporates a delay adjustment structure, allowing the framework to adapt to changing reporting behaviors while effectively capturing spatial dependencies. To ensure this complex model is fast enough for real-time applications, we implement it via inlabru using a novel linear approximation strategy. This method significantly improves computational efficiency, enabling scalable inference without the speed bottlenecks of traditional MCMC methods.

We validate the framework by monitoring dengue in Brazilian states during 2025. Our model outperforms the baseline model in 22 out of 26 states (85% win rate), successfully capturing rapid trend shifts and providing more precise estimates compared to existing systems.

Our findings demonstrate that combining detailed delay dynamics with a spatio-temporal structure effectively balances model flexibility with computational speed. This offers a robust, scalable solution for monitoring epidemics in diverse geographical regions.

How to cite: Xiao, Y. and Moraga, P.: Bayesian spatio-temporal disease nowcasting using parametric time-varying functions of cumulative reporting probability, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-4713, https://doi.org/10.5194/egusphere-egu26-4713, 2026.

11:05–11:15
|
EGU26-12477
|
On-site presentation
Tamas Bodai

I present a data-driven forecast system applied to the Indian summer monsoon rain. By forecasting pentads, i.e. 5-day rain totals, the system is well suited to forecasting the monsoon onset/withdrawal as well as its progression, also known as intra-seasonal variability. I will provide a comparison of the forecast skill with those of other systems, both physics-based NWP and AI systems. The skill of the JJA seasonal forecast issued on 1 May in terms of the Pearson correlation coefficient far surpasses that of GLOSEA5. I will also discuss delicate questions about forecast skill, namely what is conceptually sound and what can actually be computed.

How to cite: Bodai, T.: Data-driven seasonal weather forecast: An application to the Indian summer monsoon rain, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12477, https://doi.org/10.5194/egusphere-egu26-12477, 2026.

11:15–11:25
|
EGU26-17711
|
ECS
|
On-site presentation
Tobias Biegert, Nils Koster, and Sebastian Lerch

In recent years, significant progress in machine learning technologies has enabled the development of various artificial intelligence weather prediction (AIWP) models that approach, or even surpass, the skill of numerical weather prediction (NWP) models.

However, despite these advancements, several important questions remain open. Most data-driven models primarily focus on deterministic point forecasts and lack the capability to generate probabilistic predictions, which, however, is crucial for optimal decision making and quantifying weather risk in applications. Further, while it has been widely demonstrated that physics-based NWP models substantially benefit from post-processing methods, which aim to correct systematic errors, the use of post-processing for data-driven weather models has not been explored in detail.

Our overarching aim thus is to investigate the application of various post-processing techniques to potentially improve predictions, as well as to generate probabilistic forecasts from deterministic AIWP as well as NWP model outputs. We assess whether AI-based weather models benefit from post-processing to a similar extent as physics-based NWP, enabling a fair comparison between post-processed AIWP and NWP forecasts. The resulting post-processed AIWP forecasts also yield a relatively simple probabilistic benchmark for evaluating whether inherently probabilistic AIWP models deliver commensurate skill improvements given their increased computational cost.

Experiments are based on the WeatherBench 2 framework, which provides a standardized archive of prominent AIWP as well as operational NWP model outputs. Specifically, we apply a suite of established statistical and machine learning post-processing methods to model outputs for the eight variables defined as headline scores (Z500, T850, Q700, WV850, T2M, WS10, MSLP, TP24hr) in the WeatherBench 2 framework, and systematically evaluate the effectiveness of these methods for improving deterministic and probabilistic forecasts.

Results show that post-processed probabilistic forecasts can outperform the ensemble predictions from the European Centre for Medium-Range Weather Forecasts for shorter lead times of up to one week for selected variables, but the results vary across variables, lead times, post-processing methods and forecasting models.

How to cite: Biegert, T., Koster, N., and Lerch, S.: Probabilistic Benchmarks and Post-Processing for Data-Driven Weather Forecasting, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-17711, https://doi.org/10.5194/egusphere-egu26-17711, 2026.

11:25–11:35
|
EGU26-5760
|
ECS
|
Virtual presentation
M'hamed Oubouisk, Driss Bari, and Soumia Mordane

Fog and low stratus forecasting remains a challenge due to the high sensitivity of these phenomena to boundary layer processes. One-dimensional models, such as COBEL–ISBA, offer physical consistency but often lead to systematic errors in key surface variables. This work proposes a novel hybrid calibration framework combining physical modeling with machine learning (ML) to correct COBEL–ISBA forecasts at Nouasseur Airport, Morocco. Using two winter seasons of model outputs and SYNOP observations, we calibrate five variables (2-m temperature and humidity, 10-m wind components, visibility) for each forecast run and lead time (0–12 h).

Two ML architectures are tested: direct correction (ML–COBEL) and a residual-learning approach (ML–Phys), both using Random Forest and XGBoost. For visibility, a two-stage classification–regression model is implemented, and an oversampling technique is used to address class imbalance. Results are benchmarked against classical bias correction and quantile mapping.

The ML–Phys approach outperforms traditional methods across all variables and lead times, reducing errors (bias, RMSE) while preserving observed temporal variability. Furthermore, it also improves low-visibility event detection. In contrast, traditional methods show limited skill, often degrading beyond short lead times. This work demonstrates the potential of hybrid AI–physics strategies to mitigate 1D model limitations, providing a path toward more reliable operational fog and visibility forecasting.
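
The residual-learning (ML–Phys) idea, fitting a model to the residual (observation minus physical forecast) and adding the predicted residual back, can be sketched on synthetic data; a linear fit stands in here for the Random Forest/XGBoost regressors of the study, and all names and numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Synthetic stand-in: observations and a "physical" forecast with a
# state-dependent bias (the real setup uses COBEL-ISBA outputs)
state = rng.uniform(0, 10, n)                  # predictor known at run time
obs = 2.0 * state + rng.normal(0, 0.5, n)      # truth with noise sigma = 0.5
phys = 2.2 * state + 1.5                       # biased physical forecast

# Residual learning: fit obs - phys, then add the predicted residual back
resid = obs - phys
coef = np.polyfit(state, resid, 1)             # linear stand-in for RF/XGBoost
corrected = phys + np.polyval(coef, state)

rmse_phys = np.sqrt(np.mean((phys - obs) ** 2))
rmse_corr = np.sqrt(np.mean((corrected - obs) ** 2))
```

Because the residual model only has to capture the (simpler) error structure rather than the full physics, the corrected forecast error drops to near the irreducible noise level.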

How to cite: Oubouisk, M., Bari, D., and Mordane, S.: Hybrid AI-Physics Calibration of a 1D Fog Model: Improving Near-Surface and Visibility Forecasts at a Moroccan Airport, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5760, https://doi.org/10.5194/egusphere-egu26-5760, 2026.

11:35–11:45
|
EGU26-11641
|
ECS
|
On-site presentation
Andrejs Cvečkovskis, Juris Seņņikovs, and Uldis Bethers

Forecasting of local renewable energy variables such as solar irradiance and wind speed is critically important for operational grid management and energy markets. We present a hybrid machine learning model that combines Adaptive Fourier Neural Operator (AFNO) architectures with physics-informed loss constraints, designed to capture both learned spatial–temporal patterns and key physical relationships in atmospheric fields. The model is trained on reanalysis and high-resolution observational datasets over the Baltic region and evaluated in comparison with baseline statistical and numerical weather prediction benchmarks.

Our contributions include: (i) a hybrid modelling strategy that enforces approximate physical consistency via penalised residuals of key balance equations during training; (ii) a detailed benchmarking framework for lead-time dependent forecast skill on solar and wind energy generation targets; and (iii) an assessment of uncertainty and calibration properties using probabilistic scoring metrics. Results are evaluated against numerical weather prediction baselines, highlighting the strengths and limitations of the hybrid approach and outlining a viable pathway for future improvements in sub-daily renewable energy forecasting.

This work contributes to the session’s themes of advanced machine learning and statistical forecasting methods in geosciences and highlights the potential of hybrid approaches for enhancing short-term predictive skill.

How to cite: Cvečkovskis, A., Seņņikovs, J., and Bethers, U.: Hybrid Neural Operator and Physics-Informed Learning for Renewable Energy Forecasting, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11641, https://doi.org/10.5194/egusphere-egu26-11641, 2026.

11:45–11:55
|
EGU26-20150
|
On-site presentation
Joris Van den Bergh and Geert Smet

As a proper scoring rule, the continuous ranked probability score (CRPS) is widely used within the field of statistical postprocessing of ensemble forecasts, both for forecast verification and as a loss function for parameter estimation with distributional regression approaches. This includes standard ensemble model output statistics (EMOS) and machine learning (ML) based approaches such as distributional regression networks (DRN). It is known that the CRPS admits equivalent representations as an integral of the Brier score over probability thresholds or an integral of the quantile score over quantile levels. The CRPS can be further generalized with a weighting function to put more weight on certain regions of the predictive distribution (the threshold-weighted CRPS or twCRPS), or to put more weight on certain quantiles of the distribution (quantile weighting, denoted qwCRPS). In this work, we consider a general two-parameter class of weight functions that give rise to an analytical expression for the qwCRPS for certain predictive distributions such as the logistic distribution. This generalized version of the CRPS puts a different penalty on over- or underforecasting the meteorological variable, allowing tailored postprocessing for end users with specific cost-loss ratios. We apply a DRN approach using the qwCRPS as loss function to various use cases, including the postprocessing of wind power forecasts for the Belgian Offshore Zone, and compare with the use of the standard CRPS as loss function. We also perform validation using the quantile score and the continuous generalisation of the relative economic value.
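
The quantile-score representation used above can be made concrete: integrating the pinball score over quantile levels, optionally multiplied by a weight w(tau), gives the (qw)CRPS. The sketch below checks the unweighted case against the closed-form CRPS of the standard logistic distribution, CRPS(y) = y - 2 log F(y) - 1; the weight tau^2 is just an example of upper-tail emphasis, not the two-parameter class of the abstract:

```python
import math
import numpy as np

def qw_crps(quantile_fn, y, weight=lambda tau: np.ones_like(tau), n=999):
    # qwCRPS = 2 * integral_0^1 w(tau) * pinball(q(tau), y) dtau,
    # approximated on an equidistant grid of interior quantile levels
    taus = np.arange(1, n + 1) / (n + 1)
    q = quantile_fn(taus)
    pinball = (taus - (y < q)) * (y - q)   # quantile (pinball) score
    return 2.0 * np.sum(weight(taus) * pinball) / (n + 1)

# Standard logistic predictive distribution: quantile function ln(tau/(1-tau))
logis_q = lambda tau: np.log(tau / (1 - tau))
crps_numeric = qw_crps(logis_q, 0.0)
crps_exact = 2 * math.log(2) - 1       # closed form y - 2*log(F(y)) - 1 at y = 0

# Quantile weighting, e.g. w(tau) = tau^2 to emphasize the upper tail
tail_crps = qw_crps(logis_q, 0.0, weight=lambda tau: tau ** 2)
```

With weight identically 1 the numerical integral recovers the plain CRPS; a weight bounded by 1 necessarily yields a smaller score, since the pinball integrand is nonnegative.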

How to cite: Van den Bergh, J. and Smet, G.: Tailored postprocessing of ensemble forecasts with distributional regression networks and a quantile-weighted version of the continuous ranked probability score, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-20150, https://doi.org/10.5194/egusphere-egu26-20150, 2026.

11:55–12:05
|
EGU26-13365
|
On-site presentation
Alice Lake

As meteorological organisations transition to high-resolution ensemble-based forecasting, they risk leaving behind downstream users who rely on deterministic data: a need that may arise from the inability to process large volumes of data, or difficulty integrating probabilistic information into decision-making processes. Current solutions for such users typically involve providing the control (unperturbed) member of the ensemble, or deriving a single-value forecast through the independent treatment of variables (e.g., taking a median). However, relying solely on the control member discards the valuable information encoded within the full ensemble, fundamentally undermining the purpose of the ensemble. Meanwhile, univariate approaches can result in forecasts that lack physical consistency across variables. This limitation becomes critical when variables are interpreted jointly in real‑world decision‑making. Wind speed and direction exemplify this: these variables are used together in sectors such as renewable energy, where they inform turbine operation and resource planning, and aviation, where they underpin safety‑critical decisions around take‑off and landing. For these users, unrealistic combinations of speed and direction can translate directly into flawed risk assessments. 


To address this gap, we present a novel ensemble post-processing technique that generates physically-consistent spot forecasts of wind speed and direction by exploiting the full ensemble distribution. The method constructs joint predictive probability density functions (PDFs) using a Gamma kernel for wind speed and a von Mises kernel for wind direction, accommodating the distinct statistical properties of these variables: non-negativity and skewness for speed, and circularity for direction. A single-value forecast is then obtained by selecting the ensemble member that maximizes its log-likelihood under the joint density across a specified forecast horizon. Because the selected forecast corresponds to one of the original ensemble members, it represents a physically plausible atmospheric state and maintains consistency across all variables, including those not directly analysed. This is critical for operational users: approaches that treat wind speed and direction separately (such as taking independent averages or applying separate post-processing to each variable) can produce unrealistic artefacts when passed through downstream physical or statistical models.  
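The member-selection step described above can be sketched as follows; the kernel bandwidth parameters (`k_shape`, `kappa`) are illustrative assumptions, and the sketch scores a single lead time rather than a full forecast horizon.

```python
import numpy as np
from scipy.stats import gamma, vonmises

def joint_density(speed, direction, ens_speed, ens_dir,
                  k_shape=50.0, kappa=20.0):
    # Gamma kernel in speed (mean at each member's speed) times a
    # von Mises kernel in direction, averaged over members
    dens_speed = gamma.pdf(speed, a=k_shape, scale=ens_speed / k_shape)
    dens_dir = vonmises.pdf(direction, kappa=kappa, loc=ens_dir)
    return float(np.mean(dens_speed * dens_dir))

def select_member(ens_speed, ens_dir):
    # pick the ensemble member with the highest log-likelihood under the
    # joint kernel density, so the output is a real atmospheric state
    loglik = [np.log(joint_density(s, d, ens_speed, ens_dir) + 1e-300)
              for s, d in zip(ens_speed, ens_dir)]
    return int(np.argmax(loglik))
```

Because the winner is an original member, consistency with variables not entering the density is preserved automatically.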


This method was evaluated using the Met Office convective-scale ensemble, MOGREPS-UK, over the UK domain for a full calendar year, with verification at both the surface and aloft. Results are promising: the approach demonstrates the potential to outperform the control member, particularly at longer lead times where ensemble spread is greatest. These findings highlight an important step toward improving our offering to users and ensuring they remain supported as we transition to purely ensemble-based forecasting. Crucially, this work is not just theoretical; the next stage is to embed the technique into operational workflows and deliver it within user-facing products, ensuring these advances translate directly into improved real-world decision-making.

How to cite: Lake, A.: Joint Forecasting of Wind Speed and Direction via Ensemble Post-Processing, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13365, https://doi.org/10.5194/egusphere-egu26-13365, 2026.

12:05–12:15
|
EGU26-2357
|
On-site presentation
Candan Gokceoglu and Ahmet Ozcan

Rapid population growth and the continuous restructuring of economic relationships have significantly increased global demand for efficient transportation systems. In this context, accurate prediction of the Rate of Penetration (ROP) of the Tunnel Boring Machine (TBM) is crucial for construction planning, cost estimation, and real-time operational decision-making in TBM tunneling. When TBMs are appropriately selected in terms of type and capacity according to route conditions and are operated in compliance with sound engineering principles, they enable the excavation of tunnels at very high rates of penetration while maintaining economic feasibility. Estimating tunnel completion time based on geological and geotechnical conditions along the tunnel alignment and the operational capacity of the TBM has been one of the most intensively studied topics in tunneling research over the past two decades. However, recent advances in artificial intelligence (AI) techniques offer significant potential for achieving higher predictive performance in ROP estimation. In light of these developments, this study evaluates the performance of various AI algorithms using data obtained from the T2 tunnel of the Bahçe–Nurdağ (Türkiye) twin tunnels, the longest railway tunnels in Türkiye. In addition, synthetic input parameters were generated to enhance prediction accuracy beyond that achieved in previous studies. The results demonstrate that incorporating these synthetic input parameters leads to improved model performance, with an increase of up to 2.65% in the correlation coefficient. Given the already high predictive capability achieved without synthetic inputs (R² = 0.8637), the improvement obtained in this study (R² = 0.8866) is particularly noteworthy.
Overall, the findings indicate that ensemble-based artificial intelligence models incorporating synthetic input data can predict ROP of TBM with very high accuracy, thereby offering a robust and reliable tool for estimating tunnel completion times in TBM tunneling projects.

How to cite: Gokceoglu, C. and Ozcan, A.: Use of Synthetic Input Parameters for Enhancing Prediction Performance of Rate of Penetration of TBM , EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-2357, https://doi.org/10.5194/egusphere-egu26-2357, 2026.

12:15–12:25
|
EGU26-17479
|
On-site presentation
Luca Glawion, Julius Polz, Harald Kunstmann, Benjamin Fersch, and Christian Chwala

Global reanalysis products such as ERA5 are indispensable for climate and hydrological studies, yet their coarse spatial and temporal resolution limits the representation of localised and short-lived precipitation extremes. Building on our earlier work [1], we now present the published and ready-to-use version of spateGAN-ERA5, a generative AI framework for global spatio-temporal downscaling of ERA5 precipitation to kilometre and sub-hourly scales (2 km, 10 min) [2].

The model, trained using gauge-adjusted radar observations over Germany, generates realistic high-resolution precipitation ensembles conditioned on ERA5 inputs. We demonstrate robust performance across multiple climate regimes through independent evaluations over Germany, the United States, and Australia, showing clear improvements in spatial structure, temporal coherence, and extreme rainfall representation compared to native ERA5 fields. Ensemble generation further enables probabilistic uncertainty quantification.

To facilitate broad adoption, we provide a public, easy-to-use downscaling tool [3] that enables on-demand generation of high-resolution precipitation for any region and time period worldwide. The approach is computationally efficient and applicable on modest GPU hardware, making it suitable for both regional studies and large-scale applications. spateGAN-ERA5 thus establishes a practical pathway toward global high-resolution precipitation products for climate impact analysis, hydrological modelling, and AI-based weather and climate research.

[1] Glawion, L., Polz, J., Kunstmann, H., Fersch, B., & Chwala, C. (2023). spateGAN: Spatio‑temporal downscaling of rainfall fields using a cGAN approach. Earth and Space Science, 10, e2023EA002906. https://doi.org/10.1029/2023EA002906

[2] Glawion, L., Polz, J., Kunstmann, H., Fersch, B., & Chwala, C. (2025). Global spatio‑temporal ERA5 precipitation downscaling to km and sub‑hourly scale using generative AI. npj Climate and Atmospheric Science, 8, 219. https://doi.org/10.1038/s41612-025-01103-y

[3] https://github.com/LGlawion/spateGAN_ERA5

How to cite: Glawion, L., Polz, J., Kunstmann, H., Fersch, B., and Chwala, C.: From ERA5 to Precipitation Extremes: Global km-Scale, Sub-Hourly Downscaling with Generative AI, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-17479, https://doi.org/10.5194/egusphere-egu26-17479, 2026.

12:25–12:30

Posters on site: Wed, 6 May, 08:30–10:15 | Hall X4

The posters scheduled for on-site presentation are only visible in the poster hall in Vienna. If authors uploaded their presentation files, these files are linked from the abstracts below.
Display time: Wed, 6 May, 08:30–12:30
Chairpersons: Philine Bommer, Maxime Taillardat, Stéphane Vannitsem
X4.8
|
EGU26-1764
|
ECS
Emily O'Riordan

Both dynamical and AI-based NWP have seen success in using spectral transformations to represent atmospheric variables efficiently. In particular, Fourier-based representations are widely adopted due to fast computational methods and compact encoding of large-scale structure. However, as the NWP community targets higher-resolution models, Fourier bases may inadequately represent the sharp gradients and multi-scale features that often characterise extreme weather events. Furthermore, for limited-area domains, Fourier representations can impose artificial periodicity, making them less physically appropriate.

In this work, we investigate whether alternative spectral transformations better support AI-based NWP in regional, extreme-weather settings. We systematically compare neural forecasting models trained using Fourier, wavelet, and Legendre spectral representations, assessing their ability to predict multiple atmospheric variables over the Aotearoa New Zealand domain.  Wavelet and polynomial bases are explicitly designed for bounded domains and provide multi-scale, non-periodic representations, making these transformations more suitable for the regional forecasting task.

Aotearoa New Zealand provides an ideal test-bed for these methods, as a region with complex coastlines, steep orography, and frequent exposure to high-impact weather systems. Models are trained and evaluated on reanalysis datasets (ERA5 and BARRA-2), using standard verification metrics and case studies of major Aotearoa New Zealand storms such as Cyclones Gabrielle and Bola. Our results demonstrate that spectral choice has a measurable impact on forecast skill, particularly for extremes and fine-scale structure.

By analysing how different spectral representations influence AI-NWP performance in a regional context, this work provides guidance on the appropriate use of spectral methods for limited-area forecasting, and contributes to the development of more accurate and physically consistent AI-driven weather prediction systems for localised and extreme events.

How to cite: O'Riordan, E.: Spectral representations for regional AI-based weather prediction, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-1764, https://doi.org/10.5194/egusphere-egu26-1764, 2026.

X4.9
|
EGU26-2056
|
ECS
Ryu Shimabukuro, Tomohiko Tomita, Tsuyoshi Yamaura, and Ken-ichi Fukui

Quasi-stationary convective bands over Kyushu, Japan, frequently trigger rainy-season disasters, and hours with ≥50 mm h⁻¹ rainfall are increasing. Yet skillful nowcasts beyond 3 h remain limited. This study presents FlowsNet, an observation-based multi-sensor fusion model that learns directly from radar/rain gauge-analyzed precipitation, surface variables from ground stations, geostationary satellite imagery, and satellite-derived precipitation context. The model targets category-4 (C4; ≥50 mm h⁻¹) rainfall and incorporates two attention mechanisms: a channel-wise module that weights informative modalities and a spatial module that aligns features with banded structures at multi-hour leads. Training uses a tail-aware ordinal loss that couples focal reweighting with Earth Mover’s Distance to highlight rare extremes. FlowsNet maintains a non-zero C4 Critical Success Index through 6 h. From 4 to 6 h, it matches or exceeds the Japan Meteorological Agency’s very-short-range forecast, and it outperforms a leading extrapolation method and current deep-learning nowcasters. Case studies show preserved band geometry and corridor placement at long lead times over complex terrain. Ablation experiments identify satellite water-vapor context and near-surface humidity as key for long-lead C4 prediction; combining satellite context with surface observations stabilizes placement and reduces false alarms. By avoiding numerical weather prediction model state and objective analyses/reanalyses, the approach reduces latency and hardware demand, improves portability and resilience when model cycles degrade, and offers a practical route to earlier and more transferable warnings for extreme rainfall events.
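For reference, the Critical Success Index used to evaluate C4 events is computed from hits, misses, and false alarms of threshold exceedances; a minimal sketch:

```python
import numpy as np

def csi(forecast, observed, threshold=50.0):
    # Critical Success Index: hits / (hits + misses + false alarms)
    f = np.asarray(forecast) >= threshold
    o = np.asarray(observed) >= threshold
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    denom = hits + misses + false_alarms
    return hits / denom if denom else np.nan
```

A "non-zero CSI through 6 h" means the forecast still places some hits against the ≥50 mm h⁻¹ observations at that lead time.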

How to cite: Shimabukuro, R., Tomita, T., Yamaura, T., and Fukui, K.: An NWP-Free, Observation-Driven Deep Learning Approach to Heavy-Rainfall Nowcasting Beyond the Three-Hour Limit , EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-2056, https://doi.org/10.5194/egusphere-egu26-2056, 2026.

X4.10
|
EGU26-4039
|
ECS
Younes Essafouri, Corentin Seznec, Luciano Drozda, Laure Raynaud, and Laurent Risser

Each day, potentially critical decisions made by governments and organizations depend on accurate weather forecasts, determining whether to evacuate for a storm or simply to carry an umbrella. In this context, Deep Learning (DL) models are becoming a popular and computationally efficient alternative to traditional Numerical Weather Prediction (NWP) models, offering the potential to capture complex data patterns that may be missed by explicit physical equations (Lam et al., 2023). However, their opaque (black-box) nature remains a barrier to operational trust.

Explainable AI (XAI) aims to address this opacity by revealing the decision process behind predictions. Indeed, classical XAI techniques reveal when DL models rely on spurious correlations rather than causal physical mechanisms to deliver predictions (Geirhos et al., 2020). However, their direct application to meteorological data often yields attribution maps that are noisy (Kim et al., 2019) and difficult to interpret due to their high dimensionality. It additionally remains unclear whether these tools can consistently identify the complex physical drivers inherent in NWP (Bommer et al., 2024).

Based on previous works (Bommer et al., 2024; Kim et al., 2023; Yang et al., 2024), we establish a framework to generate compact and interpretable explanations of local weather forecast predictions obtained using deep neural networks. These explanations build on the output of gradient-based methods like VanillaGrad and SmoothGrad (Smilkov et al., 2017), which are scalable to high-dimensional data. More specifically, our framework first allows for targeted analysis by selecting a region of interest (e.g., Paris area) and a target variable (e.g., accumulated precipitation). It therefore answers the question: "Why did the neural network predict this feature at this location?" To do so, it first computes dense attribution maps with respect to all input variables (e.g., wind components at varying altitudes). Traditionally, bounding boxes are used to define the region of importance in these maps (Kim et al., 2023). However, they are unable to provide detailed directional information. We therefore propose in our framework to determine regions of importance using "confidence ellipses" that summarize the center, main directions, and importance of the most concentrated regions. Unlike bounding boxes, the representation of these ellipses, with the raw attribution maps as a background, provides rich and easily interpretable information regarding the directionality and spatial spread of the model's focus.
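The confidence-ellipse idea can be sketched as a weighted-covariance summary of an attribution map (a generic construction under our reading of the abstract, not necessarily the authors' exact estimator):

```python
import numpy as np

def confidence_ellipse(attr):
    # Summarize a 2-D attribution map by its centre of mass, principal
    # directions, and axis lengths, using |attribution| as weights.
    w = np.abs(attr).astype(float)
    w /= w.sum()
    ys, xs = np.indices(attr.shape)
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    wf = w.ravel()
    centre = wf @ pts                       # weighted mean (x, y)
    d = pts - centre
    cov = (d * wf[:, None]).T @ d           # weighted covariance
    evals, evecs = np.linalg.eigh(cov)      # ascending eigenvalues
    lengths = np.sqrt(np.maximum(evals, 0.0))
    return centre, lengths, evecs           # columns of evecs = axes
```

Unlike a bounding box, the eigenvectors encode the directionality of the model's focus and the eigenvalue ratio its elongation.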

Preliminary results on the hybrid transformer-convolutional-based model UNETR++ (Shaker et al., 2024) trained and tested on the TITAN dataset from Météo-France (comprising hourly surface and vertical profiles of wind, temperature, and geopotential over metropolitan France) demonstrate our framework's pertinence for explaining predictions from deep neural networks. We were able to verify that different trained models successfully capture the vertical hierarchy of atmospheric variables, evidenced by an effective receptive field that expands with increasing altitude. More interestingly, our framework allowed us to identify systematic biases learned during training that correlate with known physical occurrences. These findings serve as a foundational step for future work on developing novel explainability methods to detect whether trained models capture complex physical mechanisms.

How to cite: Essafouri, Y., Seznec, C., Drozda, L., Raynaud, L., and Risser, L.: A Framework for Explainable AI in Weather Forecasting: Diagnosing Deep Learning Models via Gradient-Based Attributions, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-4039, https://doi.org/10.5194/egusphere-egu26-4039, 2026.

X4.11
|
EGU26-5377
|
ECS
Mária Lakatos

A widely recognized limitation of most post-processing methods is that they are typically applied independently for each forecast horizon, location, and variable, potentially neglecting important dependencies across these dimensions. Despite the development of numerous statistical and machine learning methods for modeling these dependencies, the topic remains the subject of ongoing research.
In this work, the proposed approach employs a graph neural network (GNN) trained with a composite loss function that combines the energy score (ES) and the variogram score (VS) for the multivariate postprocessing of ensemble forecasts. The method is evaluated using WRF-based solar irradiance forecasts over northern Chile and ECMWF visibility forecasts over Central Europe.
Across all multivariate verification metrics, the dual-loss GNN consistently outperforms empirical copula–based postprocessing methods as well as GNNs trained solely with CRPS or ES. For the WRF forecasts, the learned rank-order structure captures dependency information more effectively, leading to improved restoration of spatial relationships compared with both the raw ensemble and historical observational ranks. Moreover, incorporating VS into the training loss also improves univariate predictive performance for both forecast targets.
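The two ingredients of the composite loss can be sketched as verification-style functions (for training they would be rewritten in a differentiable framework; the mixing weight `alpha` is an illustrative assumption):

```python
import numpy as np

def energy_score(ens, obs):
    # Energy score for a multivariate ensemble (members x dimensions)
    t1 = np.mean(np.linalg.norm(ens - obs, axis=1))
    t2 = np.mean(np.linalg.norm(ens[:, None, :] - ens[None, :, :], axis=-1))
    return t1 - 0.5 * t2

def variogram_score(ens, obs, p=0.5):
    # Variogram score of order p with unit weights over dimension pairs
    do = np.abs(obs[:, None] - obs[None, :]) ** p
    df = np.mean(np.abs(ens[:, :, None] - ens[:, None, :]) ** p, axis=0)
    return np.sum((do - df) ** 2)

def composite_loss(ens, obs, alpha=0.5):
    # ES + VS combination; alpha balances calibration vs. dependence
    return alpha * energy_score(ens, obs) + (1 - alpha) * variogram_score(ens, obs)
```

The variogram term is what rewards correct pairwise dependence, which the ES alone discriminates only weakly.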

Lakatos, M. (in press). A composite-loss graph neural network for the multivariate post-processing of ensemble weather forecasts. Quarterly Journal of the Royal Meteorological Society.

How to cite: Lakatos, M.: A Composite-Loss Graph Neural Network for the Multivariate Post-Processing of Ensemble Weather Forecasts , EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5377, https://doi.org/10.5194/egusphere-egu26-5377, 2026.

X4.12
|
EGU26-8100
Elisa Perrone, Maurits Flos, Bastien François, Irene Schicker, and Kirien Whan

Weather predictions are often provided as ensembles generated by repeated runs of numerical weather prediction models. These forecasts typically exhibit bias and inaccurate dependence structures due to numerical and dispersion errors, requiring statistical postprocessing for improved precision. A common correction strategy is the two-step approach: first adjusting the univariate forecasts, then reconstructing the multivariate dependence. The second step is usually handled with nonparametric methods, which can underperform when historical data are limited. Parametric alternatives, such as the Gaussian Copula Approach (GCA), offer theoretical advantages but often produce poorly calibrated multivariate forecasts due to random sampling of the corrected univariate margins. In this work, we introduce COBASE, a novel copula-based postprocessing framework that preserves the flexibility of parametric modeling while mimicking the nonparametric techniques through a rank-shuffling mechanism. This design ensures calibrated margins and realistic dependence reconstruction. We evaluate COBASE on multi-site 2-meter temperature forecasts from the ALADIN-LAEF ensemble over Austria and on joint forecasts of temperature and dew point temperature from the ECMWF system in the Netherlands. Across all regions, COBASE variants consistently outperform traditional copula-based approaches, such as GCA, and achieve performance on par with state-of-the-art nonparametric methods like SimSchaake and ECC, with only minimal differences across settings. These results position COBASE as a competitive and robust alternative for multivariate ensemble postprocessing, offering a principled bridge between parametric and nonparametric dependence reconstruction.
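The rank-shuffling mechanism that such methods build on can be illustrated with a minimal ensemble copula coupling (ECC) sketch, in which each variable of the calibrated sample inherits the rank order of the raw ensemble:

```python
import numpy as np

def ecc_shuffle(raw_ens, calibrated):
    # Reorder calibrated samples (members x variables) so that, per
    # variable, their ranks match the raw ensemble's rank order.
    raw_ens = np.asarray(raw_ens, float)
    calibrated = np.asarray(calibrated, float)
    out = np.empty_like(calibrated)
    for j in range(raw_ens.shape[1]):
        ranks = np.argsort(np.argsort(raw_ens[:, j]))
        out[:, j] = np.sort(calibrated[:, j])[ranks]
    return out
```

COBASE replaces the empirical dependence template with a parametric copula while keeping a shuffling step of this kind, so the calibrated margins survive intact.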

How to cite: Perrone, E., Flos, M., François, B., Schicker, I., and Whan, K.: COBASE: A new copula-based shuffling method for ensemble weather forecast postprocessing, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8100, https://doi.org/10.5194/egusphere-egu26-8100, 2026.

X4.13
|
EGU26-9336
Suhwan Kim, Dongjin Kim, and Jong-Min Yeom

Fog detection using geostationary satellite data has the advantage of monitoring large areas in a short period of time. However, because fog exhibits highly diverse optical characteristics in both space and time, it is difficult to achieve reliable detection with a single satellite-based detection strategy that does not consider environmental conditions. Therefore, this study utilized data from GEO-KOMPSAT-2A (GK2A) to pre-define fog occurrence environments, construct appropriate input data and labels for each environmental condition, and then apply a category-specific, deep-learning-based fog detection system.

First, fog was identified when ground-station visibility was under 1 km. To create reliable training data, the ground-station visibility data was spatially aligned with fog labels from the Korea Meteorological Administration (KMA) for GK2A observations. Only areas consistently identified as fog by both ground-station observations and KMA fog labels were selected and cropped. In this process, a spatial grouping method was used to eliminate noise and ensure the fog regions had continuous spatial coverage.        

In constructing the input data, variables representing surface characteristics were chosen to optimize detection accuracy for each environmental condition. Using this high-quality dataset, data were organized into different groups based on four seasons, three time periods (daytime, nighttime, dawn/dusk), and two surface types (land, ocean). Separate DeepLabV3+ models were trained for each category, with 2022 data used for training and 2023 data for validation.

To evaluate the model's ability to generalize, the entire 2024 dataset not included in training was used as an independent test set. For accurate assessment, post-processing filtering with a cloud mask was applied to measure detection performance in cloud-free regions. The results revealed notable seasonal fluctuations in performance, indicating that detection efficiency depends on environmental conditions. This suggests that, even with the same deep learning architecture, careful data preprocessing and environment-specific strategies can help advance satellite-based fog detection technology.


This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2025-00515357).

How to cite: Kim, S., Kim, D., and Yeom, J.-M.: Environment-Specific Fog Detection over the Korean Peninsula Using GEO-KOMPSAT-2A and DeepLabV3+, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9336, https://doi.org/10.5194/egusphere-egu26-9336, 2026.

X4.14
|
EGU26-12713
|
ECS
Sebastian Buschow and Wael Almikaeel and the WeatherGenerator Team

Data-driven weather models have demonstrated their ability to learn various aspects of the weather prediction problem. While their point-to-point skill is well established, the precise nature of their errors is not yet fully understood.

This contribution takes a first look at the spatial precipitation patterns simulated by the Weather Generator – a foundation model trained on diverse data sources with the goal of learning the underlying behavior of the atmosphere as a whole.  We analyze the correlation structure of the simulated precipitation fields using spatial verification techniques including two-dimensional wavelet transforms. Some attention is paid to the problem of applying these methods to global data on an irregular grid. The results can be compared to observations, reanalysis and potentially other data-driven forecast models.

How to cite: Buschow, S. and Almikaeel, W. and the WeatherGenerator Team: Verifying the spatial structure of precipitation fields from a foundation model of the atmosphere, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12713, https://doi.org/10.5194/egusphere-egu26-12713, 2026.

X4.15
|
EGU26-14107
Katharine Grant and Gavin Evans

Visibility forecasting is critical for aviation, transportation, and public safety, yet remains a challenging aspect of meteorology due to complex atmospheric processes and aerosol interactions. Accurate visibility prediction is essential for operational decision-making, but traditional approaches often struggle with physical realism and probabilistic reliability. 

This study addresses these challenges within the Met Office’s IMPROVER (Integrated Model post-PROcessing and VERification) framework, which provides probabilistic post-processing of Numerical Weather Prediction (NWP) output for customers including the UK Public Weather Service. Historically, visibility diagnostics in IMPROVER have been constrained by limitations in the underlying NWP model. To overcome this, two key enhancements were introduced. First, the integration of VERA (Visibility Employing Realistic Aerosols), an existing diagnostic within the Unified Model (UM), which incorporates polydisperse aerosol effects to deliver a more physically consistent representation of visibility.  
Second, building on this improved foundation, a statistical post-processing step was implemented using Quantile Regression Forests (QRF), marking the first application of machine learning within IMPROVER. QRF was chosen for its ability to capture complex, non-linear relationships and produce calibrated probabilistic forecasts. 

The primary objective was to improve forecast skill at operationally significant thresholds, particularly <7.5 km and <1 km, which are critical for aviation and road safety. Benchmarking on the EUPPBench dataset compared QRF against reliability calibration and Distribution Regression Networks (DRN). QRF demonstrated superior performance, achieving a 45% improvement in Ranked Probability Skill Score (RPSS) over the raw NWP output. Subsequent testing using Met Office data also showed significant improvement, with QRF delivering a 9% RPSS increase for thresholds <7.5 km and a 22% improvement in Continuous RPSS across all thresholds. 
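The Ranked Probability Skill Score used for benchmarking can be sketched as follows (the category definitions and reference forecast here are illustrative assumptions, not the study's configuration):

```python
import numpy as np

def rps(probs, obs_cat):
    # Ranked probability score for one forecast: squared differences of
    # cumulative forecast vs. cumulative (one-hot) observed categories
    cf = np.cumsum(probs)
    co = np.cumsum(np.eye(len(probs))[obs_cat])
    return np.sum((cf - co) ** 2)

def rpss(probs_fc, probs_ref, obs_cats):
    # Skill relative to a reference: 1 = perfect, 0 = no improvement
    r_fc = np.mean([rps(p, o) for p, o in zip(probs_fc, obs_cats)])
    r_ref = np.mean([rps(p, o) for p, o in zip(probs_ref, obs_cats)])
    return 1.0 - r_fc / r_ref
```

A "45% improvement in RPSS over the raw NWP output" corresponds to the mean RPS of the calibrated forecast shrinking relative to the raw forecast's RPS against the same observations.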

This work demonstrates the value of combining physically realistic NWP diagnostics with machine learning techniques to enhance probabilistic visibility forecasts. These improvements pave the way for more reliable decision-making in sectors sensitive to visibility conditions. Putting this research into operational production as of early 2026 represents a significant step forward in the quality of our visibility forecasts. 

How to cite: Grant, K. and Evans, G.: Improvements to the Met Office operational Visibility diagnostic using Machine Learning , EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14107, https://doi.org/10.5194/egusphere-egu26-14107, 2026.

X4.16
|
EGU26-14658
|
ECS
Jasmin Haupt, Hyunju Jung, Marie Müller, Steffen Tietsche, Tobias Selz, Peter Knippertz, and Julian Quinting

Equatorial waves are a key process in shaping tropical weather and have been linked to tropical-extratropical teleconnections. Moreover, they are one of the reasons the predictability limit is higher in the tropics than in the extratropics. Yet, their correct representation in weather prediction models is a long-standing challenge, even at model resolutions on the km-scale, leaving substantial potential in global weather predictions unused.

In this study, we systematically quantify and compare the representation of equatorial waves in 10-day forecasts of operational deterministic state-of-the-art weather prediction models (numerical, hybrid, and data-driven). The forecast data initialized from 01 January 2020 to 16 December 2020 are provided by WeatherBench2 and dedicated experiments with AIFS from the European Centre for Medium-Range Weather Forecasts (ECMWF). Equatorial Kelvin, Rossby, and westward-moving mixed Rossby-Gravity waves have been identified based on 850-hPa winds and geopotential height using the approach of Yang et al. (2003). The filtered data-driven forecast data are evaluated against ERA5 and operational ECMWF analysis for wave amplitude and pattern correlation, and compared with the numerical weather prediction (NWP) model Integrated Forecasting System (IFS) from ECMWF.

The key finding is that, for 2020, all data-driven weather prediction models outperform the NWP-based IFS forecasts in representing equatorial wave patterns beyond 3 days lead time, as evaluated with the Pearson correlation coefficient, except for the Rossby wave mode n=1, which all models represent equally well.
For Kelvin waves, the difference in forecast skill is most remarkable, with the forecast horizon extending from 8 to 10 days in most models. In terms of Kelvin wave activity bias, ML models exhibit a smaller systematic error than the IFS, which locally underestimates Kelvin wave activity by up to 30% when evaluated against ERA5, with the strongest underestimation in the Pacific. Interestingly, the equatorial wave representation in the data-driven model Pangu-Weather depends on the initialization dataset. We are currently investigating the reason for this difference by systematically comparing ML forecasts initialized from ERA5 with those initialized from the operational ECMWF analysis.
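The pattern-correlation evaluation can be sketched as a centred Pearson correlation over the grid points of the filtered fields (unweighted here; an area-weighted variant would multiply by cos(latitude)):

```python
import numpy as np

def pattern_correlation(forecast, analysis):
    # Pearson correlation between two 2-D fields, flattened over grid points
    f = forecast.ravel() - forecast.mean()
    a = analysis.ravel() - analysis.mean()
    return float(f @ a / np.sqrt((f @ f) * (a @ a)))
```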

How to cite: Haupt, J., Jung, H., Müller, M., Tietsche, S., Selz, T., Knippertz, P., and Quinting, J.: Representation of equatorial waves in state-of-the-art data-driven weather prediction models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14658, https://doi.org/10.5194/egusphere-egu26-14658, 2026.

X4.17
|
EGU26-15199
|
ECS
Eun-Tae Kim and Jung-Hoon Kim

This study aims to objectively estimate the supercooled liquid water content (SLWC) from in situ aircraft observation data in order to construct an objective and consistent long-term dataset of aircraft icing intensity. SLWC was estimated using two conventional calibration methods and a newly proposed Gated Recurrent Unit (GRU) model, based on measurements from the Rosemount Icing Detector (RICE) and collocated in situ aircraft observations. The observations were collected by the NARA research aircraft operated by the National Institute of Meteorological Sciences in South Korea, which has conducted regular atmospheric observations since February 2018. The GRU-based approach demonstrated substantially improved performance compared to the calibration methods, achieving a Pearson correlation coefficient of 0.945 and a Nash–Sutcliffe efficiency of 0.891 when evaluated against independent observations not used in model training. In particular, the proposed method enables a more detailed representation of SLWC evolution by providing time-series SLWC estimates, whereas calibration-based approaches typically provide a single representative value for each icing event. The GRU-based estimates closely reproduce the observed temporal variability of SLWC in NARA icing cases, further demonstrating the capability of the proposed method to capture realistic SLWC evolution. The SLWC estimates from the proposed model were subsequently used to classify icing intensity based on operationally established SLWC thresholds for each icing intensity category, resulting in a robust long-term icing intensity dataset spanning over six years. The outcomes of this study are expected to contribute not only to aircraft icing research but also to a broad spectrum of applications including remote-sensing-based hydrometeor detection, cloud microphysical processes, and numerical weather prediction model parameterizations.
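For reference, the Nash–Sutcliffe efficiency reported above is one minus the ratio of the model error variance to the variance of the observations; a minimal sketch:

```python
import numpy as np

def nash_sutcliffe(sim, obs):
    # NSE = 1 - SSE(model) / SSE(climatological mean); 1 is perfect,
    # 0 means no better than predicting the observed mean
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
```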

Acknowledgement: This research is supported by the Korea Meteorological Administration Research and Development Program under Grant RS-2022-KM220310 and RS-2022-KM220410.

How to cite: Kim, E.-T. and Kim, J.-H.: A Novel Method for Estimating the Supercooled Liquid Water Content Using In Situ Aircraft Observation Data and Gated Recurrent Unit, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-15199, https://doi.org/10.5194/egusphere-egu26-15199, 2026.

X4.18
|
EGU26-18581
Yu-Lun Mai, Xin-Ni Chen, and Tien-Hsuan Lu

Taiwan is located along the circum-Pacific seismic belt and is frequently affected by destructive earthquakes. Identifying reliable preseismic anomalies is therefore crucial for seismic hazard mitigation. Previous studies have demonstrated that groundwater levels are influenced not only by nontectonic factors—such as precipitation, atmospheric pressure, tides, and temperature—but also by stress redistribution associated with earthquake preparation processes. However, robust quantitative methods capable of separating nontectonic influences from tectonic anomalies remain limited. In this study, the 2016 Meinong earthquake in southern Taiwan was investigated as a case study. Support vector regression (SVR) models were developed using meteorological variables and groundwater level observations to construct predictive models of groundwater fluctuations and to identify preseismic anomalies related to crustal stress accumulation. Groundwater monitoring stations located west of the epicenter were first selected based on their clear coseismic responses and strong spatial correspondence with observed surface deformation. Using air temperature, precipitation, and atmospheric pressure as explanatory variables, the SVR model and the Akaike Information Criterion (AIC) were applied to determine optimal lag structures and to establish pre-earthquake groundwater prediction models. The trained models were then used to simulate groundwater levels over the two years preceding the earthquake, and residual analysis was performed to identify anomalous signals. Among the 12 analyzed stations, 9 exhibited coefficients of determination (R²) ranging from 0.18 to 0.79. Stations situated in coastal fine-sand aquifers showed substantially higher predictive performance (R² = 0.42–0.79) than those located in mountainous regions (R² = 0.18–0.49). 
Six stations displayed pronounced negative residual anomalies exceeding two standard deviations approximately one year prior to the earthquake, followed by a gradual recovery toward the event. This temporal pattern is consistent with deformation trends observed at nearby surface monitoring stations. In addition, three stations exhibited short-term residual anomalies exceeding two standard deviations within approximately one month before the earthquake. These results demonstrate that groundwater level anomalies derived from physically informed predictive models can be systematically linked to surface deformation and short-term precursory processes preceding earthquakes. Our findings highlight the potential of groundwater monitoring as a complementary indicator for earthquake precursor detection and seismic hazard assessment.
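The lag-selection step described above can be illustrated compactly. In this sketch, ordinary least squares stands in for the SVR of the study (to keep the example dependency-free), but the AIC machinery is the same idea: fit each candidate lag order, score it with n·ln(RSS/n) + 2k, and keep the minimizer.

```python
import numpy as np

def aic_best_lag(y, X, max_lag=10):
    """Select a lag order for predicting groundwater level y from
    driver matrix X (time, n_drivers) by minimizing the AIC.
    OLS stands in here for the SVR used in the study."""
    best_aic, best_lag = np.inf, None
    for L in range(1, max_lag + 1):
        # design matrix: intercept plus drivers at lags 1..L
        rows = [np.concatenate([X[t - l] for l in range(1, L + 1)])
                for t in range(L, len(y))]
        A = np.column_stack([np.ones(len(rows)), np.array(rows)])
        yy = y[L:]
        beta, *_ = np.linalg.lstsq(A, yy, rcond=None)
        rss = float(np.sum((yy - A @ beta) ** 2))
        n, k = len(yy), A.shape[1]
        aic = n * np.log(rss / n + 1e-12) + 2 * k   # small floor avoids log(0)
        if aic < best_aic:
            best_aic, best_lag = aic, L
    return best_lag
```

The residual-anomaly step of the abstract would then compare observed levels against this model's predictions and flag departures beyond two standard deviations.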

How to cite: Mai, Y.-L., Chen, X.-N., and Lu, T.-H.: Identification of Tectonic Anomalies Prior to the Meinong Earthquake in Taiwan Using a Support Vector Regression–Based Groundwater Level Model, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18581, https://doi.org/10.5194/egusphere-egu26-18581, 2026.

X4.19
|
EGU26-20598
|
ECS
Manasa Pawar, Nicoletta Noceti, and Antonella Galizia

Short-term precipitation nowcasting, the prediction of rainfall over lead times from a few minutes to about an hour, remains challenging because radar-derived precipitation fields evolve not only through motion but also through rapid, non-linear changes such as growth, decay, and structural reorganization. Classical extrapolation methods are efficient yet struggle to represent these intensity and morphology changes, while many learning-based approaches become costly when scaled to large, high-resolution radar grids. 

Our approach treats temporal learning and spatial reconstruction as two separate problems. A compact 3D convolutional encoder processes a short radar sequence to capture how precipitation structures evolve over time. We then convert the encoder feature volumes into 2D skip representations through depth aggregation and channel compression and use a lightweight 2D decoder to reconstruct full-resolution forecasts.

The framework is evaluated on the RYDL dataset derived from the German Weather Service radar composite, providing 2D radar fields every five minutes over Germany at 1 × 1 km resolution on a 900 × 900 grid. Performance is benchmarked against persistence and a strong 2D convolutional baseline using complementary verification measures, including mean absolute error, critical success index at multiple intensity thresholds, and fractions skill score with spatial tolerance. Across benchmark lead times, the proposed approach reduces MAE from 0.22 to 0.20 at 5 min, from 0.35 to 0.28 at 30 min, and from 0.44 to 0.42 at 60 min relative to the 2D baseline, indicating improved robustness at intermediate horizons while retaining competitive short-range accuracy. These results suggest that combining explicit spatio-temporal encoding with efficient two-dimensional reconstruction offers a practical route to scalable radar nowcasting on large domains. 
Keywords: Radar nowcasting, precipitation forecasting, deep learning, spatio-temporal representation learning, forecast verification 
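The depth-aggregation and channel-compression step bridging the 3D encoder and the 2D decoder can be sketched in NumPy. Mean pooling over the depth axis and a matrix multiply over channels (the equivalent of a 1×1 convolution) are assumptions for illustration, not details taken from the abstract:

```python
import numpy as np

def depth_aggregate(feat, w):
    """Collapse a 3D encoder feature volume (C, D, H, W) into a 2D skip
    map: mean over the temporal depth D, then a 1x1-conv-equivalent
    matrix multiply compressing C channels to w.shape[0] channels."""
    pooled = feat.mean(axis=1)              # (C, H, W): depth aggregation
    C, H, W = pooled.shape
    flat = pooled.reshape(C, -1)            # (C, H*W)
    return (w @ flat).reshape(-1, H, W)     # (C_out, H, W): channel compression
```

A lightweight 2D decoder can then consume these skip maps at full resolution without ever touching the 3D volumes again, which is where the claimed efficiency on a 900 × 900 grid would come from.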

How to cite: Pawar, M., Noceti, N., and Galizia, A.: Efficient deep learning for radar precipitation nowcasting using spatiotemporal encoding and two-dimensional reconstruction, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-20598, https://doi.org/10.5194/egusphere-egu26-20598, 2026.

X4.20
|
EGU26-21929
William Kleiber and Nicolas Coloma

As the power grid moves to a more renewable future, energy sources from weather-driven phenomena such as solar power will form an increasingly large portion of electricity generation. The predictability, non-Gaussianity and intermittency of solar resources challenge current grid operation paradigms, and realistic data scenarios are required for grid planning and operational studies. However, such data are not available at the space-time resolution needed for realistic grid models. Given sparse spatial samples that are high-resolution in time, we introduce a functional data analysis framework for spatiotemporal prediction and downscaling when data exhibit nonstationary phase misalignment. The approach is illustrated on a challenging irradiance dataset and compares favorably against existing methods.

How to cite: Kleiber, W. and Coloma, N.: Estimation and spatial prediction methods for high-frequency space-time solar irradiance, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21929, https://doi.org/10.5194/egusphere-egu26-21929, 2026.

X4.21
|
EGU26-20244
Mátyás Kocsis and Sándor Baran

Weather forecasts are issued by numerical weather prediction models, which describe the dynamic behaviour of the atmosphere. Due to the chaotic nature of the atmospheric processes, assessing the uncertainty of forecasts is essential. The state-of-the-art method is to run the prediction models several times with different initialisation and/or parameterisation to obtain an ensemble of forecasts, better representing the possible scenarios.

In the last few years, AI-based models have become the centre of attention in weather forecasting due to their accuracy and efficiency. The European Centre for Medium-Range Weather Forecasts (ECMWF) has developed its Artificial Intelligence/Integrated Forecasting System (AIFS) model, which was the first to provide data-driven ensemble forecasts in June 2024. Since July 2025, the AIFS ensemble model has been operational, running in parallel with ECMWF's physics-based Integrated Forecasting System (IFS) model, which is considered the gold standard in weather prediction. The new AIFS model can generate forecasts ten times faster than the classical physics-based one, while consuming approximately a thousand times less energy.

We present the results of our systematic comparison of the performances of the IFS and AIFS models by investigating the accuracy of raw and post-processed 10-metre wind-speed forecasts generated by the two models between July 2025 and November 2025 across several thousand station locations. The post-processed case involves the application of the parametric Ensemble Model Output Statistics method as well as a nonparametric quantile regression approach to correct any systematic biases and dispersion inaccuracies in the raw forecasts, which are usually detectable in the case of ensemble predictions.
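For context on the post-processing step: Gaussian EMOS maps the ensemble mean and variance to a predictive distribution whose parameters are fit by minimizing the CRPS, which has a closed form for the normal distribution. (For wind speed one would typically use a truncated normal; the plain normal below is a simplification, and the coefficients a, b, c, d are placeholders for values estimated on training data.)

```python
import numpy as np
from math import erf, pi, sqrt

def crps_normal(mu, sigma, y):
    """Closed-form CRPS of a normal predictive distribution: the
    verification score minimized when fitting EMOS coefficients."""
    z = (y - mu) / sigma
    Phi = 0.5 * (1 + erf(z / sqrt(2)))            # standard normal CDF
    phi = np.exp(-0.5 * z ** 2) / sqrt(2 * pi)    # standard normal PDF
    return sigma * (z * (2 * Phi - 1) + 2 * phi - 1 / sqrt(pi))

def emos_predictive(ens, a, b, c, d):
    """Gaussian EMOS predictive parameters: affine maps of the
    ensemble mean and variance correct bias and dispersion errors."""
    m, v = ens.mean(), ens.var()
    return a + b * m, np.sqrt(c + d * v)
```

Minimizing the mean of `crps_normal` over a training set yields the (a, b, c, d) that correct the systematic biases and dispersion errors mentioned in the abstract; the quantile regression alternative drops the parametric assumption entirely.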

How to cite: Kocsis, M. and Baran, S.: AI and Physics-Based Weather Forecasting: A Comparative Study of ECMWF's Operative AIFS and IFS Ensemble Wind Speed Predictions, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-20244, https://doi.org/10.5194/egusphere-egu26-20244, 2026.

X4.22
|
EGU26-2503
|
ECS
Romain Pic, Zhongwei Zhang, Johanna Ziegel, and Sebastian Engelke

Receiver Operating Characteristic (ROC) and Precision–Recall (PR) curves are widely used to assess the discrimination ability of forecasts for binary events, such as threshold exceedances or warnings of extreme events. In weather forecasting, forecasts are provided as spatial fields, yielding location-wise ROC and PR curves that are often aggregated to facilitate comparison, although the effect of the aggregation strategy on performance assessment remains poorly understood.

We investigate how different aggregation strategies for ROC and PR curves affect the assessment of discrimination ability. In particular, we identify conditions under which aggregation strategies satisfy two desirable properties for fair comparison: preservation of dominance between forecasts and preservation of concavity of the curves. We review commonly used aggregation approaches from the literature, analyze their theoretical properties, and highlight potential pitfalls that may lead to misleading interpretations. Based on these findings, we provide practical guidelines for the interpretation of aggregated ROC and PR curves. The proposed framework is illustrated using AI-based global weather forecasts, showing how different aggregation strategies can lead to different rankings.
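The core issue is easy to exhibit: pooling all forecast–observation pairs across locations and averaging location-wise curves at common thresholds generally disagree whenever event frequencies differ between locations. A minimal sketch (hit and false-alarm rates at fixed probability thresholds; an illustration, not the authors' framework):

```python
import numpy as np

def roc_curve(p, y, thresholds):
    """Hit rate (TPR) and false-alarm rate (FPR) of probability
    forecasts p for binary observations y at given thresholds."""
    tpr, fpr = [], []
    for t in thresholds:
        f = p >= t
        tpr.append((f & (y == 1)).sum() / max((y == 1).sum(), 1))
        fpr.append((f & (y == 0)).sum() / max((y == 0).sum(), 1))
    return np.array(fpr), np.array(tpr)

def pooled_vs_averaged(ps, ys, thresholds):
    """Two aggregation strategies over locations: (i) pool all
    forecast-observation pairs, (ii) average location-wise TPR/FPR
    at common thresholds. These generally yield different curves."""
    pooled = roc_curve(np.concatenate(ps), np.concatenate(ys), thresholds)
    per_loc = [roc_curve(p, y, thresholds) for p, y in zip(ps, ys)]
    avg = (np.mean([f for f, _ in per_loc], axis=0),
           np.mean([t for _, t in per_loc], axis=0))
    return pooled, avg
```

With one location scoring 2 hits out of 2 events and another missing its single event, the pooled hit rate is 2/3 while the averaged hit rate is 1/2, so the two strategies can already rank forecasts differently at a single threshold.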

How to cite: Pic, R., Zhang, Z., Ziegel, J., and Engelke, S.:  Spatial aggregation of ROC and PR curves, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-2503, https://doi.org/10.5194/egusphere-egu26-2503, 2026.

X4.23
|
EGU26-5091
Sándor Baran and Martin Leutbecher

In evaluating multivariate probabilistic forecasts predicting vector quantities such as a weather variable at multiple locations or a wind vector, an important step is the assessment of their calibration and reliability. Here, we focus on the logarithmic score and are interested in the specific case when the density is multivariate normal with mean and covariance structure given by the ensemble mean and ensemble covariance matrix, respectively. Under the assumptions of multivariate normality and exchangeability of the ensemble members, a relationship is derived that describes the dependence on ensemble size. It is exploited to introduce a fair logarithmic score for multivariate ensemble forecasts [1].

An application to medium-range weather forecasts demonstrates the usefulness of the ensemble size adjustments when multivariate normality is only an approximation, where we consider ensemble predictions of sizes from 8 to 100 of vectors consisting of several different combinations of upper air variables. We show how the logarithmic score depends on ensemble size for various examples and to what extent the fair logarithmic score reduces this dependence.
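For reference, the unadjusted logarithmic score of a multivariate normal fitted from an ensemble is the negative log density below; the ensemble-size correction that makes the score "fair" is derived in reference [1] and is not reproduced here.

```python
import numpy as np

def mvn_log_score(ens, y):
    """Negative log density of observation y under N(mean, cov), with
    mean and covariance estimated from the ensemble (members in rows).
    This is the plain score whose ensemble-size dependence the fair
    version corrects."""
    mu = ens.mean(axis=0)
    S = np.cov(ens, rowvar=False)        # unbiased ensemble covariance
    d = len(mu)
    r = y - mu
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * (d * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(S, r))
```

Because the plug-in covariance of a small ensemble is noisy, this score systematically depends on ensemble size even for a perfectly calibrated forecast, which is exactly the effect the fair adjustment removes.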

References

1. Leutbecher, M. and Baran, S., Ensemble size dependence of the logarithmic score for forecasts issued as multivariate normal distributions. Q. J. R. Meteorol. Soc. 151 (2025), paper e4898, doi:10.1002/qj.4898.

*Research was supported by the Hungarian National Research, Development and Innovation Office under Grant No. K142849.

How to cite: Baran, S. and Leutbecher, M.: Fair logarithmic score for multivariate Gaussian forecasts, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5091, https://doi.org/10.5194/egusphere-egu26-5091, 2026.

X4.24
|
EGU26-7775
|
ECS
Zhixiao Niu, Song Chen, Zhihuo Xu, Joshua Lee, Hugh Zhang, Shuping Ma, Yaomin Wang, Xinyue Liu, and Xiaogang He

Rainfall nowcasting of deep convection in the tropics is extremely challenging, particularly in highly urbanized coastal regions such as Singapore, where high spatial resolution is required. Conventional optical flow-based nowcasting methods typically struggle with capturing the initiation, duration, and spatiotemporal evolution of deep convection and rainfall. When it comes to extreme rainfall, these existing methods cannot deliver skillful nowcasts due to rapid changes in localized features of individual deep convection events. Recent advances in AI-based data-driven models, particularly deep generative models utilizing high-resolution radar imagery, have improved nowcasting accuracy at longer lead times. However, they often serve as black boxes, neglecting the underlying physics, potentially missing unseen extremes, and underestimating their rainfall intensity. To better tackle convection onset prediction, we adopt a novel importance sampling strategy that targets convective initiation by identifying convective cells based on a 35 dBZ threshold and fitting a linear growth trend across frames. Samples with steeper growth and fewer initial convective cells are prioritized to emphasize early-stage development. To enhance physical realism in deep tropics, we further propose a physics-informed deep generative model that incorporates diurnal and seasonal cycles to reflect tropical weather variability. Moreover, the model includes three-dimensional physical information such as Doppler wind and multi-altitude reflectivity. With the incorporation of additional physical information, the proposed generative framework consistently outperforms baseline models, particularly at early forecast lead times. Relative to the original DGMR driven solely by precipitation inputs, the physics-informed model achieves substantially higher skill across multiple rainfall thresholds. 
Over a 90-min forecast horizon, the average probabilities of detection (POD) reach 0.70, 0.47, and 0.21 at 1.0, 4.0, and 16.0 mm h⁻¹, corresponding to relative improvements of 27%, 25%, and 25%, respectively, with associated critical success indices (CSI) of 0.47, 0.30, and 0.15. In addition, spatial correlation is enhanced across pooling scales of 0.5, 2.0, and 8.0 km, yielding average Pearson correlation coefficients (PCC) of 0.27, 0.32, and 0.46, representing relative gains of 15–16% compared with the baseline. Attribution analysis further indicates that multi-altitude reflectivity contributes most strongly to nowcasting skill, followed by composite reflectivity, while the influence of time-regime information increases with forecast lead time and the contribution of three-dimensional wind fields remains comparatively modest. Our novel physics-informed deep generative model provides valuable insight into convective precipitation processes, supports more reliable nowcasting, and helps guide future data collection in tropical regions.
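The importance-sampling criterion — steep growth in the number of ≥35 dBZ cells, few cells at the start — can be caricatured as a scalar weight per training sequence. The exact weighting used in the study is not given in the abstract, so the functional form below is an illustrative guess (pixel counts proxy cell counts, a linear trend proxies growth):

```python
import numpy as np

def initiation_weight(refl_seq, thresh=35.0, eps=1.0):
    """Sampling weight favoring convective initiation: count pixels at
    or above the 35 dBZ threshold in each frame, fit a linear trend
    across frames, and reward steep growth from few initial cells."""
    counts = np.array([(f >= thresh).sum() for f in refl_seq], float)
    slope = np.polyfit(np.arange(len(counts)), counts, 1)[0]  # growth rate
    return max(slope, 0.0) / (counts[0] + eps)                # steep + sparse start
```

Sequences sampled in proportion to such a weight would over-represent early-stage development, which is the stated goal of the strategy.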

How to cite: Niu, Z., Chen, S., Xu, Z., Lee, J., Zhang, H., Ma, S., Wang, Y., Liu, X., and He, X.: Advancing Rainfall Nowcasting in Tropical Southeast Asia with Physics-Informed Deep Generative Models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7775, https://doi.org/10.5194/egusphere-egu26-7775, 2026.

X4.25
|
EGU26-18999
Takeshi Enomoto, Aki Saito, and Saori Nakashita

Data-driven forecasting of the atmosphere and ocean is evolving rapidly. Recent reports on machine learning weather prediction (MLWP) demonstrate that these models rival or even outperform traditional numerical weather prediction (NWP) from leading operational centres. While inference is faster than in physics-based models, MLWP typically requires Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs) with significant memory, and the computational requirements for training remain enormous.

Certain applications prioritize efficiency, such as sea-surface temperature (SST) prediction on research vessels with limited communication bandwidth. We address this problem by proposing a lightweight alternative to convolutional neural networks (CNNs) or vision transformers (ViTs). To this end, we utilize gradient boosting, specifically XGBoost, which is highly efficient for tabular data. To incorporate spatial patterns, we conduct a Singular Value Decomposition (SVD) to derive Empirical Orthogonal Functions (EOFs). We train the model on four years of 0.1° Himawari-based SST data over the Western Pacific (120°E–150°E, 20°N–50°N). Preliminary 5-day forecasts show a median error improvement to −0.082 K from 0.10 K and a reduction in standard deviation to 0.68 K from 0.74 K compared to the persistence baseline.
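The reduction described here — SVD of the anomaly matrix to obtain EOFs, then a regressor on the principal components — can be sketched as follows. A per-mode AR(1) fit stands in for the XGBoost regressor of the abstract, so this shows the dimensionality-reduction plumbing rather than the authors' model:

```python
import numpy as np

def eof_forecast(field_ts, k=3):
    """One-step field forecast via EOF truncation: SVD of the anomaly
    matrix (time x space) gives EOFs (rows of Vt) and principal
    components; each retained PC is advanced with an AR(1) fit, a
    stand-in for the gradient-boosting regressor, then the field is
    reconstructed from the forecast PCs."""
    mean = field_ts.mean(axis=0)
    A = field_ts - mean                         # anomalies
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    pcs = U[:, :k] * s[:k]                      # (time, k) principal components
    next_pcs = np.empty(k)
    for j in range(k):
        x, y = pcs[:-1, j], pcs[1:, j]
        phi = (x @ y) / (x @ x + 1e-12)         # AR(1) coefficient per mode
        next_pcs[j] = phi * pcs[-1, j]
    return mean + next_pcs @ Vt[:k]             # reconstructed field forecast
```

Replacing the AR(1) step with XGBoost trained on lagged PC features would recover the spirit of the abstract's pipeline while keeping the EOF compression that makes it cheap enough for shipboard use.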

Acknowledgements: This work was supported by JSPS KAKENHI 24H02226.

How to cite: Enomoto, T., Saito, A., and Nakashita, S.: Machine learning sea-surface temperature forecasting based on empirical orthogonal functions, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18999, https://doi.org/10.5194/egusphere-egu26-18999, 2026.
