ESSI1.1 | Development and Explainability of Large Scale and Foundations Models for Weather and Climate
Convener: Christian Lessig | Co-conveners: Todd Jones, Tom Dunstan, Anna-Louise Ellis, Sebastian Hickman, Ilaria Luise, Sebastian Schemm
Orals
| Mon, 04 May, 16:15–18:00 (CEST)
 
Room -2.92
Posters on site
| Attendance Mon, 04 May, 10:45–12:30 (CEST) | Display Mon, 04 May, 08:30–12:30
 
Hall X4
Posters virtual
| Wed, 06 May, 14:15–15:45 (CEST)
 
vPoster spot 1b, Wed, 06 May, 16:15–18:00 (CEST)
 
vPoster Discussion, Wed, 06 May, 14:15–15:45 (CEST)
 
Recent advances in machine learning are transforming weather and climate science, from the emergence of large‑scale foundation models (e.g. Aurora, ORBIT, WeatherGenerator and Walrus) to the rapid development of explainable and trustworthy AI methods that aim to make these models scientifically credible and operationally usable. This session brings together contributions on the development, evaluation, and application of large‑scale and foundation‑style machine learning systems, alongside state‑of‑the‑art research on interpretability, trust, diagnostics, and validation of ML models across Earth system applications. We welcome studies that address the methodological and scientific challenges associated with pre‑training and scaling ML models on diverse atmospheric and climate datasets; the assessment of training strategies, physical consistency, and model behaviour at scale; and post/pre‑training adaptation approaches such as fine‑tuning, distillation, and latent‑space steering. We equally encourage contributions that advance explainable AI (XAI) for weather and climate science, including feature attribution, causal inference, model bias diagnosis, uncertainty communication, human‑in‑the‑loop validation, and stakeholder‑oriented interpretability. Contributions that develop scalable, robust XAI techniques for high‑dimensional geoscientific problems are particularly welcome. By bridging foundation‑model development with explainability, trust, and scientific insight, this session aims to support a more transparent, reliable, and physically grounded development of ML tools for weather, climate, and environmental applications that push the boundary in terms of skill and quality.

Orals: Mon, 4 May, 16:15–18:00 | Room -2.92

The oral presentations are given in a hybrid format supported by a Zoom meeting featuring on-site and virtual presentations. The button to access the Zoom meeting appears 15 minutes before the time block starts.
Chairpersons: Ilaria Luise, Anna-Louise Ellis, Sebastian Schemm
16:15–16:20
16:20–16:30
|
EGU26-18011
|
ECS
|
On-site presentation
Firat Ozdemir, Yun Cheng, Salman Mohebi, Fanny Lehmann, Simon Adamov, Leonardo Trentini, Langwen Huang, Levi Lingsch, Zhenyi Zhang, Oliver Fuhrer, Benedikt Soja, Siddhartha Mishra, Torsten Hoefler, Sebastian Schemm, and Mathieu Salzmann

With the increased availability of high-quality, diverse weather data, including reanalysis, satellite, surface-station, and climate model data, the number of data-driven foundation models (FMs) in the environmental field has grown significantly in recent years, with forecasting performance matching and sometimes exceeding that of physics-based numerical models. However, most FMs are trained on one dataset or a few datasets with similar sampling and/or resolution properties. While the proposed models achieve remarkable results on the datasets and variables they are trained on, it is hard to anticipate similar performance when observations are partially missing across different dimensions at test time. Similarly, typical design choices risk limiting the use of these FMs on other heterogeneous datasets of interest to the broader Earth sciences community.

We propose the Earth System Foundation Model (ESFM), an FM capable of handling heterogeneous observations (i) across different resolutions, (ii) of both spatially gridded and non-gridded nature, and (iii) with little to extreme sparsity. We achieve this through simple architectural design choices and a masked training protocol. Namely, we bin similar grid resolutions together and optimize a separate set of tokenizers for each significantly different resolution bin, so that a single FM can accommodate observations across different resolutions. Similarly, we tokenize non-gridded data (i.e., station data) separately with a single-pixel patch size. Finally, we use variable-specific tokenizers, coupled with learnable missing-observation tokens, that allow ESFM to naturally accommodate various subsets of available variables across different spatiotemporal positions.
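As an illustration only, the idea of variable-specific tokenizers combined with missing-observation tokens can be sketched in a few lines of plain Python. This is not the ESFM implementation: the variable names (`t2m`, `z500`), the embedding size, the random linear projections that stand in for learned tokenizers, and the helper names `make_tokenizer` and `tokenize` are all assumptions made for this sketch.

```python
import random


def make_tokenizer(patch_len, dim, rng):
    """Random linear projection (dim x patch_len) standing in for a learned tokenizer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(patch_len)] for _ in range(dim)]


def tokenize(patches_by_var, tokenizers, missing_tokens):
    """Embed every patch with its variable's tokenizer.

    A patch given as None is treated as a missing observation and is
    replaced by that variable's (here: randomly initialised) missing token.
    """
    tokens = []
    for var, patches in patches_by_var.items():
        weights = tokenizers[var]
        for patch in patches:
            if patch is None:
                tokens.append(list(missing_tokens[var]))
            else:
                tokens.append([sum(w * x for w, x in zip(row, patch)) for row in weights])
    return tokens


rng = random.Random(0)
DIM, PATCH_LEN = 4, 3  # hypothetical embedding size and flattened patch length
variables = ["t2m", "z500"]  # hypothetical variable names
tokenizers = {v: make_tokenizer(PATCH_LEN, DIM, rng) for v in variables}
missing_tokens = {v: [rng.uniform(-0.1, 0.1) for _ in range(DIM)] for v in variables}

# Two positions per variable; each variable is missing at one of them.
sample = {
    "t2m": [[280.0, 281.5, 279.0], None],
    "z500": [None, [5500.0, 5520.0, 5490.0]],
}
tokens = tokenize(sample, tokenizers, missing_tokens)
```

In a trained model the projections and the missing tokens would be optimised jointly; the sketch only shows that every position yields a token of the same size whether or not the observation is available.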

In this exploratory study, we show that ESFM is a flexible FM that can achieve impressive forecasting performance on ERA5 under different adverse setups with test data missing across any dimension, whether spatio-temporal or inter-variable. We further test the forecasting performance of ESFM on very sparse satellite imagery (3% pixel occupancy) as well as on station data.

The proposed framework, which is also compatible with backbone architectures other than the one we experimented with, provides a general approach for integrating diverse Earth system data sources with varying resolutions, sampling patterns, and availability. This makes ESFM particularly relevant for the broader environmental sciences and Earth and space sciences, where challenges related to data heterogeneity and missing observations are central to the development of next-generation data-driven environmental modeling systems.

How to cite: Ozdemir, F., Cheng, Y., Mohebi, S., Lehmann, F., Adamov, S., Trentini, L., Huang, L., Lingsch, L., Zhang, Z., Fuhrer, O., Soja, B., Mishra, S., Hoefler, T., Schemm, S., and Salzmann, M.: ESFM - A foundation model framework for heterogeneous data integration, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18011, https://doi.org/10.5194/egusphere-egu26-18011, 2026.

16:30–16:40
|
EGU26-19620
|
ECS
|
On-site presentation
Yun Cheng, Firat Özdemir, Salman Mohebi, Fanny Lehmann, Simon Adamov, Leonardo Trentini, Langwen Huang, Levi Lingsch, Zhenyi Zhang, Oliver Fuhrer, Benedikt Soja, Siddhartha Mishra, Torsten Hoefler, Sebastian Schemm, and Mathieu Salzmann

Weather foundation models are increasingly expected to operate under heterogeneous and imperfect observation settings while remaining computationally scalable. Building on the Earth System Foundation Model (ESFM) setting for heterogeneous data integration, we explore how Mixture-of-Experts (MoE) can support robust and efficient learning in multi-modal weather foundation models.

We introduce ESFM-MoE, an exploratory direction that combines conditional computation with climate-semantic routing, a routing principle that encourages expert specialization aligned with meaningful geophysical structure, rather than treating expert selection as a purely generic scaling mechanism. The motivation is that Earth-system data exhibit strong spatial organization, regime-like variability, and modality-dependent uncertainties; MoE offers a natural way to allocate capacity adaptively and promote structured specialization under such heterogeneity.

In this work, we discuss the design space and practical considerations of integrating MoE into Earth-system foundation models, focusing on how routing objectives and inductive biases can shape expert behavior and improve utilization. We highlight potential benefits for robustness to missing observations, scalable training and inference, and outline promising directions for climate-aware expert specialization in next-generation weather foundation models.

How to cite: Cheng, Y., Özdemir, F., Mohebi, S., Lehmann, F., Adamov, S., Trentini, L., Huang, L., Lingsch, L., Zhang, Z., Fuhrer, O., Soja, B., Mishra, S., Hoefler, T., Schemm, S., and Salzmann, M.: ESFM-MoE: Climate-semantic routing for Earth System Foundation Model (ESFM), EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19620, https://doi.org/10.5194/egusphere-egu26-19620, 2026.

16:40–16:50
|
EGU26-13486
|
ECS
|
Virtual presentation
Sebastian Hickman, Sophie Xhonneux, Ilaria Luise, Julian Kuehnert, Matthias Karlbauer, Kerem Tezcan, Yura Perugachi Diaz, Timothee Hunter, and Christian Lessig

In general, pre-training of large machine learning models uses self-supervised learning to generate expressive latent representations. These can then be used for downstream applications with little to no fine-tuning. The WeatherGenerator project follows this paradigm and aims to train a foundation model on a large number of weather and climate datasets to learn general and useful representations that may be used for a variety of downstream tasks, such as forecasting, downscaling or data assimilation. A wide variety of self-supervised tasks and training paradigms exist in other domains, such as computer vision, where they provide impressive performance. However, the extent to which these strategies transfer to atmospheric dynamics, and the physical sciences in general, has not been widely explored except for a few notable cases (Lessig et al., 2023; Parker et al., 2025).

We explore how different pre-training approaches, including masked token modelling and student-teacher methods (Caron et al., 2021; Zhou et al., 2022; Assran et al., 2023), can be adapted to learn representations for atmospheric dynamics using reanalysis, forecast, and observation datasets. We then show how linear probing and small non-linear decoders can be used to evaluate the quality of the representations learned by different pre-training strategies. The relationship between the pre-training task and the quality of the representations learned for different tasks is explored. Finally, we illustrate the importance of including varied and representative datasets during pre-training and compare this to the specific pre-training method used.
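For readers unfamiliar with linear probing, the following minimal sketch shows the general technique: the representations are kept fixed and only a linear map from latent vectors to a target quantity is fitted, so the probe error reflects how linearly accessible the target is. The synthetic latents, the synthetic target, and the gradient-descent settings are assumptions for illustration; this is not the WeatherGenerator evaluation code.

```python
import random


def fit_linear_probe(features, targets, lr=0.1, steps=2000):
    """Fit y ~ w.z + b by full-batch gradient descent on the mean squared error.

    The features are treated as frozen: only w and b are updated.
    """
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for z, y in zip(features, targets):
            err = sum(wi * zi for wi, zi in zip(w, z)) + b - y
            for j in range(d):
                grad_w[j] += 2.0 * err * z[j] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def probe_mse(features, targets, w, b):
    """Mean squared error of the fitted probe."""
    errors = [sum(wi * zi for wi, zi in zip(w, z)) + b - y for z, y in zip(features, targets)]
    return sum(e * e for e in errors) / len(errors)


# Synthetic stand-ins: 2-D "latents" and a target that is exactly linear in them.
rng = random.Random(0)
latents = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(50)]
target = [2.0 * z[0] - 1.0 * z[1] + 0.5 for z in latents]

w, b = fit_linear_probe(latents, target)
mse = probe_mse(latents, target, w, b)
```

In practice the probe would be fitted on held-out data and compared across pre-training strategies; a small non-linear decoder follows the same pattern with a slightly larger trainable head.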

Parker, L., Lanusse, F., Shen, J., Liu, O., Hehir, T., Sarra, L., Meyer, L., Bowles, M., Wagner-Carena, S., Qu, H. and Golkar, S., 2025. AION-1: Omnimodal Foundation Model for Astronomical Sciences. arXiv preprint arXiv:2510.17960. 

Lessig, C., Luise, I., Gong, B., Langguth, M., Stadtler, S. and Schultz, M., 2023. AtmoRep: A stochastic model of atmosphere dynamics using large scale representation learning. arXiv preprint arXiv:2308.13280. 

Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A., 2021. Emerging Properties in Self-Supervised Vision Transformers. https://doi.org/10.48550/arXiv.2104.14294

Assran, M., Duval, Q., Misra, I., Bojanowski, P., Vincent, P., Rabbat, M., LeCun, Y., Ballas, N., 2023. Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture. https://doi.org/10.48550/arXiv.2301.08243

Zhou, J., Wei, C., Wang, H., Shen, W., Xie, C., Yuille, A., Kong, T., 2022. iBOT: Image BERT Pre-Training with Online Tokenizer. https://doi.org/10.48550/arXiv.2111.07832 

How to cite: Hickman, S., Xhonneux, S., Luise, I., Kuehnert, J., Karlbauer, M., Tezcan, K., Perugachi Diaz, Y., Hunter, T., and Lessig, C.: Learning representations from different pre-training strategies in the WeatherGenerator, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13486, https://doi.org/10.5194/egusphere-egu26-13486, 2026.

16:50–17:00
|
EGU26-5252
|
On-site presentation
Cristian Lussana, Rolf Heilemann Myhre, Amélie Neuville, Even Marius Nordhagen, Ivar Ambjørn Seierstad, and Thomas Nils Nipen

Within the WeatherGenerator project, the Norwegian Meteorological Institute (MET Norway) is applying the project’s foundation model to reconstruct several decades (ideally the most recent 40 years) of atmospheric fields over Scandinavia. The primary objective of this work is to assess the potential of the WeatherGenerator framework for climate monitoring applications.

WeatherGenerator is a pan-European initiative that combines state-of-the-art machine-learning architectures with high-performance computing to develop an open, kilometer-scale foundation model of the coupled Earth system. The project is organized into four thematic areas; the application presented here is one of twenty-two applications developed by project partners within Theme 3.

The reconstructed datasets include near-surface atmospheric variables as well as variables at multiple pressure levels. The approach integrates heterogeneous data sources, ranging from in situ observations and reanalysis products to numerical model output, leveraging the foundation model to generate consistent, high-resolution fields suitable for climate and weather monitoring. The target spatial resolution is 1 km, achieved through data fusion techniques and a task-specific tail network trained to produce gridded analyses at this scale. Multiple temporal resolutions are explored, including hourly data and daily to monthly aggregations.

This contribution represents MET Norway’s first presentation of WeatherGenerator-related results at a scientific conference. The focus is therefore on preliminary results, outlining the overall methodological framework and demonstrating the potential of these novel approaches for high-resolution climate monitoring.

Note: The WeatherGenerator project (grant agreement No. 101187947) is funded by the European Union. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the Commission. Neither the European Union nor the granting authority can be held responsible for them.

How to cite: Lussana, C., Heilemann Myhre, R., Neuville, A., Nordhagen, E. M., Seierstad, I. A., and Nipen, T. N.: Retrospective reconstruction of 40 years of atmospheric fields in Northern Europe using the WeatherGenerator foundation model, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5252, https://doi.org/10.5194/egusphere-egu26-5252, 2026.

17:00–17:10
|
EGU26-14545
|
On-site presentation
Wael Almikaeel and the WeatherGenerator Team

Foundation models have shown strong potential for data-driven weather and climate forecasting by supporting multiple tasks with limited task-specific engineering. The use of architectures that extract maximum value from large, heterogeneous datasets is important in this approach. WeatherGenerator follows this paradigm by learning from diverse observational and reanalysis sources to encode a latent representation of atmospheric dynamics. In this work, we examined the impact of integrating Mixture of Experts (MoE) layers, developed for large language models, into WeatherGenerator, and assessed how MoE can best be incorporated within its decoder architecture.

The motivation behind MoE is straightforward: during training, a router learns to assign tokens to specialized experts, allowing different parts of the decoder to focus on distinct spatial regions or physical regimes. We build on this idea by introducing spatially aware routing, in which geographic context is provided to the router, and by evaluating loss-aware routing strategies that favor experts by minimizing local prediction errors.
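A toy version of spatially aware top-1 routing is sketched below to make the mechanism concrete. It is not the WeatherGenerator decoder: the hand-set router weights, the two trivial experts, and the choice to append normalised latitude and longitude to the token features are assumptions for illustration, and the loss-aware variant is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(token, lat, lon, router_weights):
    """Top-1 routing: the router sees the token plus normalised lat/lon."""
    context = list(token) + [lat / 90.0, lon / 180.0]
    logits = [sum(w * x for w, x in zip(row, context)) for row in router_weights]
    gates = softmax(logits)
    expert = max(range(len(gates)), key=gates.__getitem__)
    return expert, gates


def moe_layer(token, lat, lon, router_weights, experts):
    """Apply only the selected expert and scale its output by the gate value."""
    expert, gates = route(token, lat, lon, router_weights)
    return [gates[expert] * y for y in experts[expert](token)]


# Hand-set router (one row per expert): expert 0 prefers northern latitudes,
# expert 1 southern ones. A trained router would learn these weights.
router_weights = [
    [0.0, 0.0, 5.0, 0.0],
    [0.0, 0.0, -5.0, 0.0],
]
experts = [
    lambda t: [2.0 * x for x in t],  # stand-in for a small expert network
    lambda t: [-1.0 * x for x in t],
]

north_expert, north_gates = route([1.0, 2.0], 60.0, 10.0, router_weights)
south_expert, south_gates = route([1.0, 2.0], -60.0, 10.0, router_weights)
north_out = moe_layer([1.0, 2.0], 60.0, 10.0, router_weights, experts)
```

Because only the selected expert is evaluated for each token, capacity can grow with the number of experts while the per-token compute stays roughly constant.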

We evaluate four MoE decoder configurations, based on the use of spatial context and loss-aware routing, and compare them against the baseline model. Experiments are conducted using ERA5 reanalysis data, with performance measured using global RMSE and MAE for wind components (u, v), temperature (2t, t850), geopotential height (z500), and specific humidity over three 6-hour autoregressive forecast steps.

Across experiments, MoE architectures consistently improve performance for thermodynamic and large-scale variables. In particular, z500 RMSE is reduced by 26–31% at the first forecast step, with spatially aware routing performing best. Near-surface temperature shows a 7% improvement in RMSE and an 11% improvement in MAE when combining spatial and loss-aware routing. These improvements appear early in training, within the first few epochs, indicating efficient use of the available data. On the other hand, for the wind components the MoE variants show limited or slightly negative effects at the second and last forecast steps, where the baseline performs similarly or better, especially at the last forecast step.
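For reference, the error metrics and the relative improvements quoted above follow standard definitions, sketched here on made-up numbers. The unweighted global mean is a simplifying assumption of this sketch (latitude weighting is common on regular latitude-longitude grids), and none of the values below are taken from the study.

```python
import math


def rmse(pred, truth):
    """Root mean squared error over paired values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))


def mae(pred, truth):
    """Mean absolute error over paired values."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)


def relative_improvement(baseline_error, candidate_error):
    """Percentage reduction of the candidate's error relative to the baseline."""
    return 100.0 * (baseline_error - candidate_error) / baseline_error


# Made-up values purely to exercise the functions.
truth = [1.0, 2.0, 3.0]
baseline_pred = [1.0, 2.0, 5.0]
candidate_pred = [1.0, 2.0, 4.0]

baseline_rmse = rmse(baseline_pred, truth)
candidate_rmse = rmse(candidate_pred, truth)
gain = relative_improvement(baseline_rmse, candidate_rmse)
```

A positive value of `gain` means the candidate has lower error than the baseline; a negative value means it is worse.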

These preliminary results indicate that MoE provides variable-dependent benefits, with notable improvements for slowly varying, large-scale thermodynamic fields, but less impact on highly dynamic momentum variables. Ongoing work will further assess performance across longer forecast horizons, different climatic regions, and training with multiple datasets from different sources.

How to cite: Almikaeel, W. and the WeatherGenerator Team: Mixture of Experts with Spatial Routing in a Weather Foundation Model: Early Results from WeatherGenerator, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14545, https://doi.org/10.5194/egusphere-egu26-14545, 2026.

17:10–17:20
|
EGU26-3879
|
ECS
|
Virtual presentation
Ryan O'Loughlin

AI-driven climate models are often criticized as “black boxes,” raising concerns about their credibility for scientific and policy-relevant decision making. Explainable artificial intelligence (XAI) is frequently proposed as a solution, focusing on identifying systematic relationships between model input and output data to characterize model behavior. This paper builds on prior work arguing that trust in both dynamical and AI models depends not on such input-output characterizations alone but on scientists’ component-level understanding of their models (O’Loughlin et al. 2025). Component-level understanding refers to scientists’ ability to point to specific model components or parts in the model architecture as the culprit for erratic model behaviors or as the crucial reason why the model functions well.

We argue that component-level understanding plays a distinctive role in establishing credibility because it expands scientists’ ability to answer a wider range of what-if-things-had-been-different questions. For example, when a model exhibits unexpected sensitivity or instability, component-level understanding enables scientists to ask (and design targeted tests to determine) whether the behavior would persist if a specific parameterization, architectural module, or physically informed constraint were altered. We see examples of this in CMIP, e.g., diagnosing the effect of a cloud microphysics scheme on a model’s climate sensitivity (Gettelman et al. 2019; Zelinka et al. 2020) and in AI-driven climate science as well, e.g., attributing model instability to particular architectural choices such as unconstrained neural network layers or inappropriate spectral representations (e.g., Beucler et al., 2021; Bonev et al., 2023). By linking model behavior to specific components or architectural features, scientists are better positioned to diagnose misbehavior, explore counterfactual scenarios, and explain why a model behaves as it does under varying conditions. This explanatory capacity enables scientists to establish credibility with decision-makers by demonstrating when, why, and under what conditions AI-driven climate models can be trusted.

Such explanations will inevitably be incomplete and context-dependent, particularly in complex models whose components interact in nonlinear ways and are often intended to represent emergent climate phenomena. Nevertheless, we argue that credibility is built through explanatory practices involving model successes and failures alike. We conclude by outlining several pathways for strengthening component-level understanding in AI-driven climate science: scientists may develop such understanding themselves; work in close collaboration with AI model builders and domain experts; design model intercomparison projects that explicitly support component-level diagnosis; or adopt evaluation and benchmarking practices that prioritize explanatory and counterfactual insight alongside predictive performance. On this view, establishing credibility requires organizing scientific work so that explanation remains a central and achievable activity.

References

Bonev, B., et al.: Spherical Fourier Neural Operators…, arXiv [preprint], https://doi.org/10.48550/arXiv.2306.03838, 2023.

Beucler, T., et al.: Enforcing analytic constraints in neural networks…, Physical Review Letters, 126(9), 098302, 2021.

Gettelman, A., et al.: High Climate Sensitivity in the Community Earth System Model Version 2 (CESM2), Geophys. Res. Lett., 46, 8329–8337, https://doi.org/10.1029/2019GL083978, 2019.

O'Loughlin, R. J.: Moving beyond post hoc explainable artificial intelligence…, https://doi.org/10.5194/gmd-18-787-2025, 2025.

Zelinka, M. D., et al.: Causes of Higher Climate Sensitivity in CMIP6 Models, Geophys. Res. Lett., 47, e2019GL085782, https://doi.org/10.1029/2019GL085782, 2020.

How to cite: O'Loughlin, R.: Earning Credibility in AI-Driven Climate Science: The Role of Component-Level Understanding, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-3879, https://doi.org/10.5194/egusphere-egu26-3879, 2026.

17:20–17:30
|
EGU26-23208
|
ECS
|
Virtual presentation
Ieuan Higgs, Kieran Hunt, Todd Jones, and Anna-Louise Ellis

As artificial intelligence (AI) systems transition from research prototypes to operational tools in Earth system science and forecasting, establishing confidence and trust in their predictions becomes increasingly critical. Although the inputs and outputs of AI models are observable, their internal decision-making processes are often highly complex and difficult for humans to interpret, leading to their frequent characterisation as “black boxes” which are potentially untrustworthy.

In this work, we examine a range of explainable artificial intelligence (XAI) techniques designed to provide insight into AI model predictions. Many of these methods have been developed primarily with classification tasks in mind, raising important questions about their suitability for the regression-based problems that dominate geoscientific applications. We investigate the application of XAI methods to a machine learning derived emulator of the Lorenz ’63 system (an archetypal chaotic dynamical model) and review existing case studies that apply XAI in regression settings relevant to Earth sciences.
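For context, the Lorenz ’63 system is the three-variable ODE dx/dt = σ(y − x), dy/dt = x(ρ − z) − y, dz/dt = xy − βz. The sketch below integrates it with a fourth-order Runge-Kutta step and collects (state, next state) pairs of the kind an emulator could be trained on. The classic parameter values σ = 10, ρ = 28, β = 8/3, the step size, and the initial condition are assumptions of this sketch, not settings reported by the authors.

```python
def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the Lorenz '63 system at the given state."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""

    def shifted(base, k, scale):
        return tuple(b + scale * ki for b, ki in zip(base, k))

    k1 = lorenz63(state)
    k2 = lorenz63(shifted(state, k1, dt / 2.0))
    k3 = lorenz63(shifted(state, k2, dt / 2.0))
    k4 = lorenz63(shifted(state, k3, dt))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def trajectory(initial, dt=0.01, steps=1000):
    """Integrate forward and return the list of visited states."""
    states = [initial]
    for _ in range(steps):
        states.append(rk4_step(states[-1], dt))
    return states


states = trajectory((1.0, 1.0, 1.0))
# One-step-ahead training pairs for a regression emulator: state_t -> state_{t+1}.
pairs = list(zip(states[:-1], states[1:]))
```

Because the target is a continuous three-component state rather than a class label, any attribution method applied to such an emulator has to explain a regression output, which is the setting the abstract is concerned with.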

We highlight key challenges and limitations of current, general-purpose XAI approaches when applied to chaotic, continuous, high-dimensional, and physically constrained systems. Finally, we identify gaps in existing methodologies and discuss future directions for developing XAI techniques better aligned with the context-specific needs of regression problems in geoscientific modelling and forecasting.

How to cite: Higgs, I., Hunt, K., Jones, T., and Ellis, A.-L.: Evaluating explainable AI Methods for geoscientific regression: insights from applications and a chaotic toy model, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-23208, https://doi.org/10.5194/egusphere-egu26-23208, 2026.

17:30–17:40
|
EGU26-6319
|
On-site presentation
Hyoungnyoun Kim and Jeong Hoon Cho

This study proposes a novel diagnostic framework for systematically and intuitively evaluating the performance of medium-range weather forecasting models from a multivariate perspective. Traditional evaluation methods have primarily relied on point-wise error metrics (e.g., RMSE) for single variables at specific altitudes, which limits the analysis of inter-variable correlations and the dynamic evolution of forecast structures over lead times. To address these limitations, we present a methodology that integrates multivariate data into a shared, topology-preserving feature space, enabling the comparison and diagnosis of model-specific prediction trajectories.

The framework first represents multivariate atmospheric variables as images to extract semantic feature representations. To ensure robustness against spatial shifts and noise, we employ contrastive learning with data augmentation, effectively capturing the core physical characteristics of the atmospheric state. Subsequently, we apply a parametric manifold embedding specifically designed to preserve both the local neighborhood relationships of the high-dimensional feature space and its temporal continuity. This approach allows for a coherent and aligned comparison of prediction trajectories from diverse forecasting models within a unified coordinate system.

For the experimental setup, the feature space was defined using ERA5 reanalysis data from 2020 to 2024, with the 2025 ECMWF analysis serving as the reference ground truth. We analyzed a total of nine forecast configurations, combining three AI-based models (FourCastNet, GraphCast, and Pangu-Weather) with three operational numerical weather prediction initializations (IFS, KIM, and UM). By tracking trajectories at 6-hour intervals for up to 48 lead times, we visually analyzed model-specific dispersion and bias characteristics. Furthermore, the diagnostic validity of the framework was verified by comparing trajectory evolutions across different pressure levels and analyzing structural changes induced by varying variable compositions.

The proposed framework supplements conventional univariate and direction-agnostic metrics by enabling structure-aware, directional diagnostics in a multivariate feature space. It provides deep analytical insights into model-specific behaviors, serving as a critical diagnostic tool for future research on atmospheric pattern analysis and inter-variable correlation structures.

How to cite: Kim, H. and Cho, J. H.: Topology-preserving Feature-Space Analysis for Diagnostic Comparison of Weather Forecasting Models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-6319, https://doi.org/10.5194/egusphere-egu26-6319, 2026.

17:40–17:50
|
EGU26-15558
|
ECS
|
Virtual presentation
Rio Fear

Foundation models trained on text and images are known to develop abstract internal features that align with human concepts, and that can be directly manipulated via activation steering in order to alter model behaviour. Whether scientific foundation models learn similarly abstract and domain-general representations has remained an open question. Inspired by recent work identifying single directions in activation space which control complex behaviours in LLMs, we show that Walrus, a large physics foundation model, learns linearly steerable representations of physical phenomena. By computing the delta between activations representing contrasting physical regimes, we identify single directions in activation space that correspond to vorticity, diffusion, and even temporal progression. We find that injecting these concept directions back into the model during inference enables fine-grained causal control: vortices can be induced or removed, diffusion enhanced or suppressed, and simulations sped up or slowed down. Moreover, the concept directions we identified also appear to transfer successfully between unrelated physical systems, indicating that they are domain-general. These results suggest that scientific foundation models indeed learn general representations of physical principles and provide further evidence for the Linear Representation Hypothesis.
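The difference-of-means construction described above can be illustrated with a toy sketch. The activation vectors below are made-up two-dimensional lists rather than activations from Walrus, and the helper names `concept_direction` and `steer` as well as the scaling factor `alpha` are assumptions for illustration.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equally sized vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def concept_direction(acts_with, acts_without):
    """Delta between mean activations of two contrasting regimes."""
    mu_with = mean_vector(acts_with)
    mu_without = mean_vector(acts_without)
    return [a - b for a, b in zip(mu_with, mu_without)]


def steer(activation, direction, alpha):
    """Add the scaled concept direction to an activation at inference time."""
    return [a + alpha * d for a, d in zip(activation, direction)]


# Toy activations: e.g. inputs with a vortex vs. inputs without one.
acts_with_concept = [[1.0, 2.0], [3.0, 4.0]]
acts_without_concept = [[0.0, 0.0], [2.0, 0.0]]

direction = concept_direction(acts_with_concept, acts_without_concept)
enhanced = steer([0.0, 0.0], direction, alpha=2.0)     # push towards the concept
suppressed = steer([0.0, 0.0], direction, alpha=-1.0)  # push away from it
```

In a real model the direction would be computed at a chosen layer and added to that layer's activations during the forward pass; the sign and magnitude of `alpha` control whether and how strongly the concept is induced or suppressed.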

How to cite: Fear, R.: Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-15558, https://doi.org/10.5194/egusphere-egu26-15558, 2026.

17:50–18:00
|
EGU26-7781
|
ECS
|
On-site presentation
Philine Lou Bommer, Marlene Kretschmer, Anna Hedstroem, Fanny Lehmann, and Marina M.-C. Hoehne

While AI-based weather foundation models have revolutionized predictive capabilities, their opaque nature and susceptibility to training data biases pose significant challenges for operational trust. A prominent example is Aurora, a state-of-the-art foundation model that demonstrates exceptional hurricane tracking accuracy but consistently underestimates cyclone wind speeds. Because this bias is inherited from the underlying reanalysis data, standard retraining often fails to alleviate the systematic error.

In this work, we propose a novel paradigm for bias correction by adapting AI Steering, a technique recently established for monitoring and adjusting Large Language Model (LLM) behavior, to the domain of climate science. Rather than relying on traditional post-processing or computationally expensive retraining, steering allows us to interrogate and shift the internal neural representations of Aurora without modifying the underlying weights. By identifying the latent features associated with wind speed intensity, we can shift the model’s internal state to align more closely with high-resolution observations.

To evaluate this approach, we run forecasts initialized with IFS-HRES conditions and validate our results against IBTrACS observations. Our results demonstrate that this interpretability-driven approach mitigates systematic biases, significantly reducing wind speed errors while preserving model integrity and maintaining Aurora’s high-fidelity track accuracy. Furthermore, we show that steering enables a form of "Human-in-the-Loop" oversight, providing a transparent mechanism for meteorologists to adjust model outputs based on physical constraints and domain expertise. By bridging the gap between LLM interpretability and AI-based weather forecasting, we highlight the potential of steering to improve operational forecasts and offer a scalable, transparency-first framework for diagnosing and mitigating failure modes in complex AI-based climate and weather models.

How to cite: Bommer, P. L., Kretschmer, M., Hedstroem, A., Lehmann, F., and Hoehne, M. M.-C.: Guiding the Forecast: Interpretability and AI Steering in Climate Science, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7781, https://doi.org/10.5194/egusphere-egu26-7781, 2026.

Posters on site: Mon, 4 May, 10:45–12:30 | Hall X4

The posters scheduled for on-site presentation are only visible in the poster hall in Vienna. If authors uploaded their presentation files, these files are linked from the abstracts below.
Display time: Mon, 4 May, 08:30–12:30
Chairpersons: Anna-Louise Ellis, Tom Dunstan, Sebastian Schemm
X4.46
|
EGU26-4957
Robin Guillaume-Castel, Stefan Sobolowski, and Camille Li

Neural networks are powerful and widely used tools in weather and climate sciences, but their reliability under climate change remains uncertain as future conditions may be different from their training distribution. One way to build trust in these models is to assess whether they learn physically meaningful relationships rather than spurious correlations. Here, we present a case study investigating whether a simple convolutional neural network (CNN) predicts the occurrence of heavy rainfall in Western Norway for physically interpretable reasons. Since such rainfall is primarily associated with North Atlantic cyclones, we use explainable AI to assess whether the CNN identifies and uses the “correct” cyclones for its predictions.

Using ERA5 reanalysis data, we train a CNN to predict the occurrence of daily heavy rainfall events up to six days ahead from gridded wind and pressure fields. We apply layer-wise relevance propagation (LRP) to identify which regions of the atmospheric input fields contribute most to the model’s predictions. We find that model relevance is spatially aggregated into a small number of coherent patches, with one to three positive relevance patches dominating the prediction in more than 90% of the cases. Physical consistency is assessed by comparing the relevance patterns to objectively tracked cyclones. Interpreting cyclones as being “used” by the network when they spatially overlap with a patch, we show that cyclones contribute positively to the network’s predictions in about 95% of heavy rainfall events. In addition, we show that cyclones highlighted by the network are physically plausible; their trajectories follow the North Atlantic storm track, shifting from the western and central North Atlantic towards the eastern Atlantic and the Norwegian coast as the prediction lead time decreases. These results demonstrate that the CNN learns physically interpretable large-scale dynamics associated with North Atlantic cyclones, providing evidence that explainable AI methods can be used to assess and build trust in machine learning models for weather and climate applications.
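One possible reading of the overlap test is sketched below: a relevance map is thresholded, contiguous cells are grouped into patches by a 4-connected flood fill, and a cyclone counts as "used" if its grid cell lies inside a patch. The threshold, the connectivity rule, the representation of a cyclone by a single grid cell, and the toy relevance values are all assumptions of this sketch; the abstract does not specify these details.

```python
def label_patches(relevance, threshold):
    """Label 4-connected regions where relevance exceeds the threshold.

    Returns a grid of patch ids (0 = no patch) and the number of patches.
    """
    rows, cols = len(relevance), len(relevance[0])
    labels = [[0] * cols for _ in range(rows)]
    n_patches = 0
    for i in range(rows):
        for j in range(cols):
            if relevance[i][j] > threshold and labels[i][j] == 0:
                n_patches += 1
                labels[i][j] = n_patches
                stack = [(i, j)]
                while stack:
                    r, c = stack.pop()
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        rr, cc = r + dr, c + dc
                        if (
                            0 <= rr < rows
                            and 0 <= cc < cols
                            and relevance[rr][cc] > threshold
                            and labels[rr][cc] == 0
                        ):
                            labels[rr][cc] = n_patches
                            stack.append((rr, cc))
    return labels, n_patches


def cyclones_used(labels, cyclone_cells):
    """True for each cyclone whose grid cell falls inside a relevance patch."""
    return [labels[r][c] != 0 for r, c in cyclone_cells]


# Toy relevance map with two positive patches.
relevance = [
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.0, 0.7, 0.0, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.0, 0.9],
    [-0.5, 0.0, 0.0, 0.0, 0.0],
]
labels, n_patches = label_patches(relevance, threshold=0.5)
used = cyclones_used(labels, cyclone_cells=[(1, 1), (3, 0)])
```

Counting the share of events in which at least one tracked cyclone is flagged this way gives a statistic of the same kind as the percentages reported above.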

How to cite: Guillaume-Castel, R., Sobolowski, S., and Li, C.: A convolutional network learns about the North Atlantic storm track to predict heavy rainfall in Western Norway, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-4957, https://doi.org/10.5194/egusphere-egu26-4957, 2026.

X4.47
|
EGU26-7283
|
Thomas Chitson, James Fallon, Piers Buchanan, and James Shapland

Weather forecasting for aviation allows tens of thousands of flights to operate daily with prior warning of global hazards including in-flight icing, turbulence, convection, and fog. Machine learning (ML) methods have begun to be utilised across the aviation weather forecasting sector and can provide greater skill, lower false alarm rates, and cheaper running costs than conventional equivalent products. Often these products are built on existing numerical weather prediction techniques, but they can also be standalone products that make predictions based only on observations. Aviation is a highly regulated and safety-critical industry, so weather forecasting products must meet stringent quality-control standards, and machine learning processes must be trusted by customers.

The Aviation Applications Team at the Met Office has developed a set of 'Trustworthy AI' principles that ML products must strive to adhere to. These principles have guided the recent development of a range of ML-driven weather forecasting solutions for aviation, including convective cloud detection at UK airfields, auto-TAF (Terminal Aerodrome Forecast) verification, and a global convective forecasting capability. In each of these use cases the aviation sector end-users have been considered to ensure the products are trustworthy and explainable.

This study showcases a range of aviation weather forecasting case studies and how they have applied trustworthy AI techniques, including explainable AI (XAI) and representative AI, and considers how existing 'research to operations' pipelines can be exploited to add trust to machine learning models. The research group has worked with the UK's aviation regulator, the Civil Aviation Authority, to consider what the industry requires to be able to use machine learning safely in UK aviation operations and what can be learned from the long-standing collaboration between the two organisations in developing trusted weather forecasting products.

Future challenges in operationalising ML-driven weather forecasting products in the aviation sector include sparsity of observations for some hazards, shifting baselines for long-term deployment of products, and regulatory hurdles for the approval of AI products.

How to cite: Chitson, T., Fallon, J., Buchanan, P., and Shapland, J.: Integrating 'Trustworthy AI' Principles into Machine Learning for Aviation Weather Forecasting, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7283, https://doi.org/10.5194/egusphere-egu26-7283, 2026.

X4.48
|
EGU26-16587
Jens Pruschke and Roland Potthast

Machine Learning (ML) models, particularly deep neural networks, are often seen as black boxes, offering limited insight into how their predictions are made. This lack of transparency becomes especially important when ML is applied to critical domains such as numerical weather prediction, where traditional models are based on physical laws and differential equations.

Explainable AI (XAI) methods aim to address the black-box behavior by providing tools to interpret and understand model decisions. One such method is Layer-Wise Relevance Propagation (LRP), which traces the output of a neural network backward to assign relevance scores to input features based on their contribution to the prediction.
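
To make the backward redistribution concrete, the following is a minimal sketch of one widely used LRP variant, the epsilon rule, applied to a single dense layer. The abstract does not state which LRP rule is used, so the choice of rule, the toy weights, and the function name `lrp_epsilon` are assumptions.

```python
def lrp_epsilon(activations, weights, biases, relevance_out, eps=1e-6):
    """Redistribute output relevance to the inputs of one dense layer (LRP-epsilon rule).

    weights[i][j] connects input i to output j.
    """
    n_in, n_out = len(activations), len(biases)
    # Forward pre-activations: z_j = sum_i a_i * w_ij + b_j
    z = [
        sum(activations[i] * weights[i][j] for i in range(n_in)) + biases[j]
        for j in range(n_out)
    ]
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        # The small stabiliser eps keeps the division well defined when z_j is near zero.
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        for i in range(n_in):
            # Each input receives relevance in proportion to its contribution a_i * w_ij.
            relevance_in[i] += activations[i] * weights[i][j] / denom * relevance_out[j]
    return relevance_in

# Toy layer with two inputs and two outputs (illustrative values only).
a = [1.0, 2.0]
W = [[0.5, -1.0], [0.25, 1.0]]
b = [0.0, 0.0]
R_in = lrp_epsilon(a, W, b, relevance_out=[1.0, 1.0])
```

With zero biases the input relevances sum (up to the stabiliser) to the total output relevance, which is the conservation property that makes the scores interpretable as shares of the prediction.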

LRP has since been extended to Graph Neural Networks (GNNs) through the introduction of relevant walks, enabling interpretability in graph-structured data (GNN-LRP). These extensions have shown promise in areas such as image classification, sentiment analysis, and quantum chemistry. At the German National Weather Service (DWD), the AICON forecasting model employs a GNN architecture with message passing, similar in design to the GraphCast model.

In this work, we present an initial exploration of applying GNN-LRP to a simplified, toy version of a GNN model used as a representative of the AICON model. We investigate both saliency map-like visualizations and relevance walks, aiming to identify the most influential input features and their geographical location. While the current results are preliminary and limited in scope, this study tries to lay the groundwork for potential further research into explainability in graph-based weather prediction models.

How to cite: Pruschke, J. and Potthast, R.: Exploring Explainability for Graph-Based Weather Forecasting Models Using Layer-Wise Relevance Propagation, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16587, https://doi.org/10.5194/egusphere-egu26-16587, 2026.

X4.49
|
EGU26-19181
|
ECS
From Prediction to Understanding: Using Explainable AI to Reveal Temporal Drivers of Ecosystem Productivity
(withdrawn)
Thomas Hughes
X4.51
|
EGU26-16880
Ahmet Melih Afşar and Güven Bölükbaşı

The rapid growth of wind and solar power is transforming energy markets, but their inherent variability makes accurate, real-time forecasting more essential than ever. Errors in day-ahead forecasting directly drive up imbalance costs, while the fast-paced nature of intra-day trading requires model inference that is much faster than traditional weather simulations. Foundation weather models such as the WeatherGenerator (WG) offer strong generalization and the potential for low-latency deployment, but their value for the energy sector depends on effective adaptation, as they are not originally designed for plant-specific tasks.

We will present results from applying WG to site-level wind and solar production forecasting in Turkey. The downstream task targets individual plants and is trained and evaluated on historical production observations across a multi-site portfolio. Our focus is on adapting WG for this operational setting by evaluating a spectrum of adaptation strategies, ranging from training task-specific 'tail' networks to fine-tuning the entire model. We report how these choices affect forecast performance and consistency across different sites and conditions, and we describe the resulting workflow in a form that can be carried over to portfolio-scale deployment.

Performance is benchmarked against our current operational baseline, which combines NWP results with machine-learning post-processing. We report MAE as the primary metric and discuss application-oriented indicators that relate forecast improvements to operational value in day-ahead and intra-day settings. The goal is to provide practical guidance on how to translate a foundation weather model into measurable benefits for renewable energy forecasting workflows.

How to cite: Afşar, A. M. and Bölükbaşı, G.: From foundation weather models to renewable operations: Adapting the WeatherGenerator for wind and solar production forecasting, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16880, https://doi.org/10.5194/egusphere-egu26-16880, 2026.

X4.52
|
EGU26-10318
|
ECS
Even Nordhagen, Jesica Pinon Rodriguez, Kjetil Thøgersen, Fabio Zeiser, Erik Tjøtta, and Gaute Lappegard

Long-term energy system planning requires realistic weather scenarios that capture both short-term variability and long-term climate statistics, as well as rare but high-impact events. By preserving spatial and inter-variable correlations, we ensure robust multi-year energy market modelling in systems with large storage capacities, such as the Nordic power market. 

Current weather scenarios are based on ERA5 (Hersbach et al., 2020), where a period of 20 years (2003-2022) is used to establish synthetic weather scenarios (Martino et al., 2017). These scenarios consist of real weather, stitched together from 10-day segments drawn from the 20 years of samples. Several statistical techniques, including quantile mapping, are applied during this process. However, this pipeline can introduce unphysical results and is both complex and time-consuming. In contrast, data-driven models offer a cost-effective solution for generating long-term forecasts efficiently.
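
The segment-stitching idea can be sketched as follows. This is a deliberately simplified illustration, not the in-house pipeline: it omits quantile mapping and all other statistical corrections, and the rule of keeping each segment at its original calendar position, the toy data, and the function name `stitch_scenario` are assumptions.

```python
import random

def stitch_scenario(history, n_segments, segment_days=10, seed=0):
    """Build a synthetic series by concatenating randomly chosen historical segments.

    history[y][d] is the value on day d of historical year y.
    """
    rng = random.Random(seed)
    n_years = len(history)
    scenario = []
    for k in range(n_segments):
        start = k * segment_days       # assumption: keep the calendar position of each segment
        year = rng.randrange(n_years)  # draw the source year at random
        scenario.extend(history[year][start:start + segment_days])
    return scenario

# Toy history: 3 "years" of 30 days; each value encodes year * 1000 + day of year.
history = [[year * 1000 + day for day in range(30)] for year in range(3)]
scenario = stitch_scenario(history, n_segments=3)
```

The discontinuities at the segment boundaries are one place where a resampling pipeline of this kind can produce unphysical jumps.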

In this study, the WeatherGenerator is employed to generate year-long independent weather scenarios by running the model under varying initial conditions. The analysis focuses on the Nordic region, where we evaluate the capability of the WeatherGenerator to reproduce long-term climate statistics for key variables. 

Its performance is benchmarked against weather scenarios produced by the current in-house methodology and, potentially, by alternative data-driven models such as AIFS or Bris.

Note: The WeatherGenerator project (grant agreement No. 101187947) is funded by the European Union. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the Commission. Neither the European Union nor the granting authority can be held responsible for them.

 

(Hersbach et al., 2020) H. Hersbach, B. Bell, P. Berrisford, S. Hirahara, A. Horányi, J. Muñoz-Sabater, J. Nicolas, C. Peubey, R. Radu, D. Schepers, A. Simmons, C. Soci, S. Abdalla, X. Abellan, G. Balsamo, P. Bechtold, G. Biavati, J. Bidlot, M. Bonavita, G. De Chiara, P. Dahlgren, D. Dee, M. Diamantakis, R. Dragani, J. Flemming, R. Forbes, M. Fuentes, A. Geer, L. Haimberger, S. Healy, R. J. Hogan, E. Hólm, M. Janisková, S. Keeley, P. Laloyaux, P. Lopez, C. Lupu, G. Radnoti, P. de Rosnay, I. Rozum, F. Vamborg, S. Villaume, and J.-N. Thépaut, “The ERA5 global reanalysis,” Quarterly Journal of the Royal Meteorological Society, vol. 146, no. 730, pp. 1999–2049, 2020.

(Martino et al., 2017) S. Martino, T. N. Nipen, C. Lussana and S. Kolberg “A stochastic weather generator based on resampling historical ensemble weather forecasts and its application to hydrological simulation”, 2017, SINTEF Energi AS, ISSN: 1504-9795 

 

How to cite: Nordhagen, E., Pinon Rodriguez, J., Thøgersen, K., Zeiser, F., Tjøtta, E., and Lappegard, G.: Data-Driven Weather Scenario Generation for Long-Term Energy System Planning in the Nordic Region, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-10318, https://doi.org/10.5194/egusphere-egu26-10318, 2026.

X4.53
|
EGU26-4637
Congyi Nai, Xi Chen, Shangshang Yang, Ziniu Xiao, and Baoxiang Pan

Accurate weather forecasting is essential for a broad range of socioeconomic activities. While emerging data-driven models match numerical weather prediction accuracy with reduced computational cost, their deterministic nature overlooks uncertainties in initial state estimates, model systematic biases, and stochasticity arising from unresolved subgrid physical processes. This obliviousness results in over-confident deterministic predictions that render uncertainty quantification inaccessible, thereby limiting their utility for risk-based decision-making.

To address these challenges, we present the Generative Ensemble Prediction System (GenEPS), a framework that systematically explores uncertainties in initial states, model formulations, and model stochasticity. GenEPS functions as a foundation model that has explicitly learned the probability distribution of high-dimensional atmospheric states. It provides a plug-and-play solution for ensemble forecasting with arbitrary deterministic models. Specifically, GenEPS utilizes deterministic forecasts as conditions to perform generative sampling, producing an ensemble of states projected back into the realistic atmospheric phase space defined by ERA5. This stochastic sampling process quantifies uncertainties in initial conditions and forecast dynamics while ensuring physical consistency. Crucially, by treating each step as a re-initialization within the valid state space, the framework decouples state evolution from specific model formulations, enabling seamless cross-model integration to mitigate systematic biases.

By explicitly representing all three sources of uncertainty, GenEPS outperforms state-of-the-art numerical ensemble predictions and data-driven predictions when evaluated against ERA5 reanalysis data using both deterministic and probabilistic metrics. GenEPS also enhances extreme event predictions, offering physically consistent forecast fields. These advances establish a new paradigm in ensemble forecasting through multi-model generative integration, combining a surging number of data-driven weather forecasting models and potentially numerical models, to achieve more reliable predictions.

How to cite: Nai, C., Chen, X., Yang, S., Xiao, Z., and Pan, B.: GenEPS: A Generative Foundation Model for Probabilistic Weather Forecasting, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-4637, https://doi.org/10.5194/egusphere-egu26-4637, 2026.

X4.55
|
EGU26-11888
|
ECS
Tropospheric Transport Simulation with the WeatherGenerator Prototype Model
(withdrawn)
Belkis Asma Semcheddine, Savvas Melidonis, Martin G. Schultz, and Christian Lessig
X4.56
|
EGU26-14846
|
ECS
|
Piotr Wilczyński, Fanny Lehmann, Firat Ozdemir, Salman Mohebi, Yun Cheng, Oliver Fuhrer, Siddhartha Mishra, Mathieu Salzmann, Benedikt Soja, Sebastian Schemm, and Torsten Hoefler

Foundation models for the Earth system have gained popularity, as they are starting to surpass numerical solvers in the accuracy of predicting Earth’s condition while requiring fewer computational resources. The Earth System Foundation Model (ESFM) contributes to this research direction by further extending the foundation models' flexibility.

The forecasting capabilities of ESFM are achieved in an autoregressive manner, using data from the t0 - Δt and t0 timesteps to produce a prediction for t0 + Δt. This approach is effective on weather timescales. Moreover, we find that it also delivers encouraging results for long-term forecasts, showing reasonable zero-shot subseasonal-to-seasonal (S2S) predictions (15–40 days). 
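
A minimal sketch of such an autoregressive rollout is given below. The function `toy_step` is a stand-in for the trained model and simply extrapolates linearly; it is hypothetical, as are the names `rollout` and `forecast`.

```python
def rollout(step_fn, state_prev, state_now, n_steps):
    """Autoregressive rollout: each prediction is fed back as input for the next step."""
    trajectory = []
    for _ in range(n_steps):
        # Predict t0 + dt from the states at t0 - dt and t0.
        state_next = step_fn(state_prev, state_now)
        trajectory.append(state_next)
        # Shift the two-step input window forward by one step.
        state_prev, state_now = state_now, state_next
    return trajectory

def toy_step(prev, now):
    """Hypothetical stand-in for the model: linear extrapolation of each value."""
    return [2 * n - p for p, n in zip(prev, now)]

forecast = rollout(toy_step, [0.0, 10.0], [1.0, 9.0], n_steps=3)
```

Because every step consumes earlier predictions, errors accumulate with lead time, which is why long rollouts are harder than single-step forecasts.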

However, S2S predictions can be further improved while preserving weather skills. This work investigates strategies for this purpose. On such timescales, it is crucial to produce probabilistic predictions to better represent inherent uncertainty. Probabilistic predictions are realised with the introduction of multiple decoder heads (tails) for each variable. Each tail is intended to simulate a different possible trajectory, which, when combined, provides an estimate of the most probable outcome together with the spread of feasible values. To better estimate the distribution of possible values on the S2S timescale, additional trajectories are generated by running multiple predictive rollouts with different initial conditions.
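
As a small illustration of how several trajectories can be condensed into a central estimate and a spread, the sketch below computes the ensemble mean and standard deviation per grid point. Using the mean as the central estimate, the population standard deviation as the spread, and the name `ensemble_summary` are assumptions; the abstract does not specify how the tails are combined.

```python
from statistics import mean, pstdev

def ensemble_summary(members):
    """Per-grid-point ensemble mean and spread (population standard deviation).

    members[m][p] is the value predicted by member m at grid point p.
    """
    columns = list(zip(*members))  # regroup values by grid point
    return [mean(col) for col in columns], [pstdev(col) for col in columns]

# Three hypothetical members (e.g. decoder tails or perturbed rollouts), two grid points.
members = [[1.0, 4.0], [3.0, 4.0], [5.0, 4.0]]
ens_mean, ens_spread = ensemble_summary(members)
```

The second grid point has zero spread because all members agree there, while the first shows how disagreement between members translates into a wider uncertainty estimate.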

Another strategy to improve S2S rollouts is to fine-tune the model to produce outputs for more distant steps. To this end, we leverage LoRA adapters (Hu et al., 2022), which are trained for each subsequent rollout step. This approach effectively improves predictive performance on long horizons, without significantly affecting training complexity or inference cost.
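
The low-rank update at the core of LoRA can be sketched in a few lines. The effective weight is the frozen base weight plus a scaled product of two small matrices, W + (alpha / r) B A, following Hu et al. (2022). How ESFM attaches adapters to its layers is not described in the abstract, so the per-step dictionary, the toy matrices, and the helper names below are assumptions.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def lora_weight(base, lora_b, lora_a, alpha):
    """Effective weight W + (alpha / r) * B @ A, with the base weight W left frozen."""
    rank = len(lora_a)  # r = number of rows of A
    delta = matmul(lora_b, lora_a)
    scale = alpha / rank
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]

base = [[1.0, 0.0], [0.0, 1.0]]  # frozen pre-trained weight (2 x 2)
# One rank-1 adapter (B, A) per rollout step; values are illustrative only.
adapters = {
    1: ([[0.0], [0.0]], [[1.0, 1.0]]),  # B is zero, so step 1 reproduces the base model
    2: ([[0.5], [0.0]], [[1.0, 1.0]]),
}
weights_by_step = {
    step: lora_weight(base, b, a, alpha=1.0) for step, (b, a) in adapters.items()
}
```

Only B and A are trained, so the number of new parameters per rollout step scales with the rank r rather than with the size of the full weight matrix.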

We also observe that some predictive variables of the model, such as climate forcings, are slowly evolving and can benefit from incorporating inputs from a more distant past than the t0 - Δt and t0 timesteps commonly used. To investigate this, we introduce an Attention Temporal Aggregator in the encoder, which leverages learned patch embeddings from an arbitrary number of previous timesteps and attends to those that are most informative for a given variable. In this way, for rapidly changing variables such as wind speed, the model focuses on the most recent data, whereas for slowly evolving variables such as sea surface temperature, it can utilise a broader range of inputs.
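
The idea of attending over past timesteps can be sketched with single-query softmax attention. This is not the ESFM implementation: the per-variable query vector, the use of the embeddings as both keys and values, and the function name `temporal_attention` are assumptions chosen to keep the example short.

```python
import math

def temporal_attention(query, embeddings):
    """Aggregate per-timestep embeddings with softmax attention weights.

    query is a (hypothetical) learned vector for one variable;
    embeddings[t] is the patch embedding at past timestep t.
    """
    dim = len(query)
    # Scaled dot-product score between the query and each timestep embedding.
    scores = [
        sum(q * e for q, e in zip(query, emb)) / math.sqrt(dim) for emb in embeddings
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Weighted sum of the embeddings over time.
    pooled = [sum(w * emb[k] for w, emb in zip(weights, embeddings)) for k in range(dim)]
    return pooled, weights

# Three past timesteps; only the most recent embedding aligns with the query.
pooled, weights = temporal_attention([1.0, 0.0], [[0.0, 1.0], [0.0, 1.0], [4.0, 0.0]])
```

A query that aligns mostly with recent embeddings behaves like the fast-variable case described above, whereas a query with near-uniform weights draws on the whole history.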

Overall, our experiments provide new insights into the development of foundation models for the Earth system, enabling improved predictions on S2S timescales, while conserving performance for weather forecasts.

References:
E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, et al. LoRA: Low-rank adaptation of large language models. ICLR, 2022.

How to cite: Wilczyński, P., Lehmann, F., Ozdemir, F., Mohebi, S., Cheng, Y., Fuhrer, O., Mishra, S., Salzmann, M., Soja, B., Schemm, S., and Hoefler, T.: Subseasonal-to-Seasonal strategies for the Earth System Foundation Model (ESFM), EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14846, https://doi.org/10.5194/egusphere-egu26-14846, 2026.

X4.57
|
EGU26-19816
|
ECS
Ivica Obadic, Luca Rigon, and Xiaoxiang Zhu

Attention-based deep learning models are becoming a ubiquitous approach for modeling the complex temporal dependencies in many vital Earth observation applications, such as agricultural monitoring. They typically consist of multiple attention heads, with each head containing attention weights that determine how temporal information is combined for the model's prediction. While analyzing the attention weights can provide insights into the model's workings, the existence of multiple heads makes it difficult to comprehend the temporal information extracted by the model. To overcome this issue, we propose an inherently interpretable approach that automatically weights the importance of each head during the model's forward pass. Our evaluation on the task of crop-type classification shows that the model maintains high accuracy while simplifying interpretation by highlighting only the most significant attention heads.

How to cite: Obadic, I., Rigon, L., and Zhu, X.: An Inherently-Interpretable Approach to Uncover the Head Importance of Attention Networks, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19816, https://doi.org/10.5194/egusphere-egu26-19816, 2026.

X4.58
|
EGU26-20782
|
ECS
Neural Compression of Remote Sensing Data for the Pre-Training of Geospatial Foundation Models
(withdrawn)
Sebastian Hoffmann, Markus Zehner, Vitus Benson, Marieke Wesselkamp, Georg Martius, and Markus Reichstein
X4.59
|
EGU26-21352
Bart Schilperoort, Robin Richardson, Peter Kalverla, and Gijs van den Oord

Accurate nowcasting of high-intensity precipitation is essential for flood modeling, disaster management, and decision making. Due to the nature of precipitation, its intensity and timing can vary strongly in space. While some areas of the world have dense networks of openly available automated weather stations or weather radars, these are not available everywhere. In sub-Saharan Western Africa, high-intensity precipitation has a high risk of causing hazardous flash floods, and with very little radar data available in the region, nowcasting is mostly restricted to available satellite products. 

Using the WeatherGenerator atmospheric foundation model, we explore the viability of training a machine learning model to accurately nowcast heavy precipitation in Western Africa. We investigate fine-tuning the pre-trained WeatherGenerator to SEVIRI output, training a tail network that predicts the rainfall retrieval from the MSG-CPP product. We also explore transfer learning with WeatherGenerator, using a decoder trained on EURADCLIM over the European continent with SEVIRI input and assessing its accuracy over the target region. 

This effort adds to our understanding of the flexibility and added value of WeatherGenerator as a foundation model for weather and climate. It also serves as a pilot for upcoming service projects that the WeatherGenerator consortium will offer to the earth-scientific community, focusing on a broad range of applications and stakeholders. 

How to cite: Schilperoort, B., Richardson, R., Kalverla, P., and van den Oord, G.: Precipitation nowcasting over Western Africa using transfer learning with WeatherGenerator, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21352, https://doi.org/10.5194/egusphere-egu26-21352, 2026.

Posters virtual: Mon, 4 May, 14:00–18:00 | vPoster spot 1b

The posters scheduled for virtual presentation are given in a hybrid format for on-site presentation, followed by virtual discussion on Zoom. Attendees are asked to meet the authors during the scheduled presentation & discussion time for live video chats; onsite attendees are invited to visit the virtual poster sessions at the vPoster spots (equal to PICO spots). If authors uploaded their presentation files, these files are also linked from the abstracts below. The button to access the Zoom meeting appears 15 minutes before the time block starts.
Discussion time: Mon, 4 May, 16:15–18:00
Display time: Mon, 4 May, 14:00–18:00
Chairperson: Filippo Accomando

EGU26-1413 | Posters virtual | VPS21

Bridging Global AI Models and Local Extremes: A Dual-Stream Framework for Correcting and Downscaling GraphCast Rainfall Predictions 

Dandan Chen
Mon, 04 May, 14:30–14:33 (CEST)   vPoster spot 1b

Data-driven global weather models, such as GraphCast, have revolutionized medium-range forecasting but often exhibit systematic limitations in quantitative precipitation forecasting (QPF). Specifically, these models tend to produce over-smoothed, blurry rainfall fields and underestimate localized extremes, primarily due to the inherent uncertainties in their reanalysis training data (e.g., ERA5) and the use of mean-squared-error-based loss functions.

To bridge the gap between coarse-resolution global AI forecasts and the need for precise, high-impact weather prediction, we introduce SynQPF-Net, a deep learning framework designed to synergize GraphCast’s dynamical background fields with high-resolution observational analyses. The model employs a dual-stream spatiotemporal encoder to process heterogeneous inputs: the 0.25° dynamical forecasts from GraphCast and the 0.0625° precipitation analyses from the China Meteorological Administration Land Data Assimilation System (CLDAS). A specialized hybrid loss function, combining classification (Dice) and regression (Weighted MSE) objectives, is utilized to jointly optimize the spatial structure and intensity of precipitation.
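
One way a combined Dice and weighted-MSE objective could look is sketched below. The abstract names the two components but not their exact form, so the soft Dice formulation, the weighting scheme (extra weight on cells with at least 10 mm of observed rain), the mixing factor `lam`, and all function names are assumptions.

```python
def dice_loss(prob, target, smooth=1e-6):
    """Soft Dice loss between predicted rain probabilities and a binary rain mask."""
    intersection = sum(p * t for p, t in zip(prob, target))
    return 1.0 - (2.0 * intersection + smooth) / (sum(prob) + sum(target) + smooth)

def weighted_mse(pred, obs, heavy_threshold=10.0, heavy_weight=5.0):
    """MSE in which cells with observed rain >= heavy_threshold count heavy_weight times."""
    weights = [heavy_weight if o >= heavy_threshold else 1.0 for o in obs]
    squared = sum(w * (p - o) ** 2 for w, p, o in zip(weights, pred, obs))
    return squared / sum(weights)

def hybrid_loss(prob, mask, pred, obs, lam=0.5):
    """Convex combination of the classification (Dice) and regression (weighted MSE) terms."""
    return lam * dice_loss(prob, mask) + (1.0 - lam) * weighted_mse(pred, obs)

# A perfect forecast gives (numerically) zero loss.
perfect = hybrid_loss([1.0, 0.0, 1.0], [1.0, 0.0, 1.0], [12.0, 0.0, 3.0], [12.0, 0.0, 3.0])
# Missing a 10 mm cell is penalised five times as strongly as missing a light-rain cell.
missed = weighted_mse([0.0, 0.0], [1.0, 10.0])
```

The Dice term rewards getting the rain area right regardless of amount, while the weighted regression term counteracts the tendency of a plain MSE to smooth out the heaviest cells.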

Evaluated on warm-season events in Southern China, our approach demonstrates significant skill improvements. SynQPF-Net effectively sharpens the forecast, doubling the Critical Success Index (CSI) for heavy rainfall (≥10 mm) at the 6-hour lead time compared to the raw GraphCast output. Crucially, interpretability analysis reveals that the model learns physically consistent meteorological principles: it predominantly relies on extrapolating recent observational patterns for short lead times (≤12 h) and dynamically shifts its focus to large-scale circulation and moisture variables (e.g., 700 hPa specific humidity) as the forecast horizon extends. This work provides a validated pathway for correcting and downscaling global AI weather models, offering a robust solution for short-range extreme precipitation forecasting.
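
For reference, the CSI is computed from the contingency table of forecast and observed threshold exceedances as hits / (hits + misses + false alarms). The sketch below uses the 10 mm threshold mentioned above; the sample values and the function name are illustrative only.

```python
def critical_success_index(forecast, observed, threshold=10.0):
    """CSI = hits / (hits + misses + false alarms) for exceedance of a rain threshold."""
    hits = misses = false_alarms = 0
    for f, o in zip(forecast, observed):
        f_event, o_event = f >= threshold, o >= threshold
        if f_event and o_event:
            hits += 1
        elif o_event:
            misses += 1
        elif f_event:
            false_alarms += 1
        # Correct negatives (no event forecast, none observed) do not enter the CSI.
    total = hits + misses + false_alarms
    return hits / total if total else float("nan")

# Five grid cells of accumulated rainfall in mm (made-up values).
csi = critical_success_index([12.0, 3.0, 11.0, 0.0, 15.0], [10.0, 12.0, 2.0, 0.0, 20.0])
```

Because correct negatives are ignored, the score is not inflated by the many dry cells that dominate a typical rainfall field.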

How to cite: Chen, D.: Bridging Global AI Models and Local Extremes: A Dual-Stream Framework for Correcting and Downscaling GraphCast Rainfall Predictions, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-1413, https://doi.org/10.5194/egusphere-egu26-1413, 2026.

Posters virtual: Wed, 6 May, 14:00–18:00 | vPoster spot 1b

Discussion time: Wed, 6 May, 16:15–18:00
Display time: Wed, 6 May, 14:00–18:00
Chairperson: Andrea Barone

EGU26-7344 | Posters virtual | VPS22

Investigations into the Reaction of the Pangu ML Weather Model to Different Initial Conditions
(withdrawn)

Helen Buttery
Wed, 06 May, 14:15–14:18 (CEST)   vPoster spot 1b

EGU26-16232 | Posters virtual | VPS22

Bias Correction of Numerical Weather Prediction Wind Fields in Southern Tamil Nadu Region Using Machine Learning Techniques 

Vishnu Pm and Balaji Chakravarthy
Wed, 06 May, 14:18–14:21 (CEST)   vPoster spot 1b

Accurate high resolution wind field prediction is essential for wind resource assessment, renewable energy planning, and regional weather analysis. Although Numerical Weather Prediction (NWP) models such as the Weather Research and Forecasting (WRF) model provide physically consistent wind forecasts, their outputs often suffer from systematic biases arising from uncertainties in surface characteristics, simplified physical parameterizations, and resolution limitations. Furthermore, increasing model resolution to the kilometer scale significantly raises computational cost. To address these challenges, this study presents a machine learning–based framework for bias correction of WRF-simulated wind fields over the Southern Tamil Nadu region, with particular focus on the Muppandal wind farm area.

An extensive validation of WRF configurations was first performed using multiple physics scheme combinations and domain setups, evaluated against ERA5 reanalysis data. The optimal configuration was identified and used to generate three years (2023–2025) of wind simulations at 3 km × 3 km resolution. Significant biases were observed in the raw WRF outputs, motivating the application of an Artificial Neural Network (ANN) based bias correction approach. A Random Forest algorithm was employed for feature selection, followed by Principal Component Analysis (PCA) to reduce dimensionality while retaining 95% of the variance. A feedforward neural network with multiple hidden layers was trained to correct the U10 and V10 wind components, with the hyperbolic tangent activation function yielding the best performance. The bias-corrected wind fields exhibited substantial improvement in mean and extremes, achieving low error metrics and strong correlation with ERA5 data.

The results demonstrate that combining physically based NWP simulations with machine learning driven bias correction provides an accurate and computationally efficient approach for generating high-resolution wind fields. This hybrid framework offers significant potential for wind energy assessment and localized meteorological applications in data-sparse regions.
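
The "retain 95% of the variance" criterion used in the PCA step amounts to keeping the smallest number of leading components whose cumulative explained variance reaches the target. A minimal sketch, assuming the component variances (eigenvalues of the covariance matrix) have already been computed; the eigenvalues and the function name are made up for illustration:

```python
def n_components_for_variance(eigenvalues, target=0.95):
    """Smallest number of leading principal components whose cumulative
    explained-variance ratio reaches the target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(ordered)

# Hypothetical component variances for five candidate features.
n_keep = n_components_for_variance([6.0, 3.0, 0.7, 0.2, 0.1])
```

In this toy case the first two components explain 90% of the variance and the first three explain 97%, so three components are kept.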

How to cite: Pm, V. and Chakravarthy, B.: Bias Correction of Numerical Weather Prediction Wind Fields in Southern Tamil Nadu Region Using Machine Learning Techniques, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16232, https://doi.org/10.5194/egusphere-egu26-16232, 2026.
