ESSI3.4 | Scientific software development in the Geosciences: good practices, pitfalls and solutions
Co-organized by AS5/BG10/GD6/GI2/GMPV12
Convener: Diego Jiménez de la Cuesta Otero (ECS) | Co-conveners: Clarissa Kroll (ECS), Iris Ehlert
Orals | Tue, 05 May, 16:15–17:50 (CEST) | Room -2.33
Posters on site | Attendance Mon, 04 May, 16:15–18:00 (CEST) | Display Mon, 04 May, 14:00–18:00 | Hall X4
Posters virtual | Mon, 04 May, 14:27–15:45 (CEST)
vPoster Discussion | vPoster spot 1b, Mon, 04 May, 16:15–18:00 (CEST)
Motivation

Although in some communities (e.g., meteorology, climate science) software writing has a long tradition, most scientists are not trained software engineers. For early-stage scientific software projects, which are typically developed within small research groups, there is often little expectation that the code will (1) be used by a larger community, (2) be further developed or extended by others, or (3) be integrated into larger projects. This can lead to an “organic” evolution of code bases, resulting in challenges related to documentation, maintainability, usability, reusability, and the overall quality of the software and its results.

The wider availability of large computing resources in recent decades, along with the emergence of large datasets and increasingly complex numerical models, has made it more important than ever for scientific software to be well designed, documented, and maintainable. However, (1) established practices in scientific programming, (2) pressures to produce high-quality results efficiently, and (3) rapidly growing user and developer communities can make it challenging for scientific software projects to

- follow a common set of standards and a consistent style,
- be fully documented,
- be user-friendly, and
- be maintainable, easily extended, or reused.

Session content and objectives

We invite developers and users of software projects to prepare presentations about challenges and successes in the following topics:

- Good practices for developing scientific software
- Modularization
- Documentation
- Linting
- Version control
- Open source and open development
- Automation of quality checks and unit testing
- Planning new projects
- User requirements and the user-turned-developer problem
- Painless and energy-efficient programming solutions across computing architectures
- Modularization and reliability vs performance and multiplatform capacity
- Large-dataset compression and storage workflows

These presentations will show how different projects across geoscientific fields tackle these problems. We can discuss new strategies for improving scientific software development and for raising awareness within the scientific community that robust and well-structured software development enables meaningful and reproducible results, supports researchers—especially doctoral students and post-doctoral researchers—in their work, and accelerates advances in data- and modelling-driven science.

Orals: Tue, 5 May, 16:15–17:50 | Room -2.33

The oral presentations are given in a hybrid format supported by a Zoom meeting featuring on-site and virtual presentations. The button to access the Zoom meeting appears just before the time block starts.
Chairpersons: Diego Jiménez de la Cuesta Otero, Clarissa Kroll, Iris Ehlert
16:15–16:20
Community-friendly software development
16:20–16:30
|
EGU26-1645
|
On-site presentation
Wilton Jaciel Loch

natESM is a project that brings together German resources to develop a seamless, multiscale Earth System Modelling framework that can serve multiple purposes. This system is composed of several independent and diverse software models from the community, each addressing different parts of the Earth system. Given the variety of programming languages, model sizes, and software architectures involved, as well as the differing levels of experience among the responsible model developers, challenges arise in portability, performance, and software quality.

A key part of the natESM approach is the technical support to model developers provided by Research Software Engineers (RSEs). Their work focuses not only on integration, portability and performance, but also on systematically improving software quality within and across model components. This talk will outline the progress made so far, highlight lessons learned from the RSE-scientist collaborations, and present our future plans for assessing and enhancing software quality. The experiences and methods developed in natESM might serve as an example for improving software sustainability in Earth System Modeling more broadly.

How to cite: Loch, W. J.: The natESM Journey for Improving Software Quality in Earth System Modelling, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-1645, https://doi.org/10.5194/egusphere-egu26-1645, 2026.

16:30–16:40
|
EGU26-8712
|
Virtual presentation
Aidan Heerdegen, Tommy Gatti, Harshula Jayasuriya, Thomas McAdam, Johanna Basevi, and Kelsey Druken

Modern software development practices such as continuous integration (compilation, testing, and deployment) are a requirement for robust and trusted climate model development. However, this can be very challenging to achieve with climate models, which often include legacy code requiring very specific versions of scientific libraries and must run on complex HPC systems. In addition, climate models have very long support timeframes (5+ years) and a requirement for absolute bitwise reproducibility, which demands precise control and provenance of the entire software stack.

Australia’s Climate Simulator (ACCESS-NRI) is a national research infrastructure tasked with supporting the development and use of the Australian Community Climate and Earth System Simulator (ACCESS) model suite for the research community. At ACCESS-NRI we use Spack, a build-from-source package manager targeting HPC, to create infrastructure that makes it easy to build ACCESS climate models and their supporting software stacks with full provenance and build reproducibility.

Now the challenge for us at ACCESS-NRI, as an infrastructure supporting a wide range of user needs, is to scale this effort to multiple models, with many permutations of components and versions, without creating a very large support burden for our software engineers.  

We do this by focusing on modularity and generic workflows to achieve our desired scale efficiently. Spack's modular design has meant ACCESS-NRI has been able to create entirely generic GitHub workflows for building, testing and deploying many climate models on our target HPC, Australia’s National Computational Infrastructure (NCI), as well as run test builds on standard Linux virtual machines.  

As a result, the support burden is dramatically reduced: the CI/CD code is centralised, maintained in one location, and reused in many places. It is also extremely simple to add CI testing for new model components with just a few lines of GitHub Actions code.

The choice of tools allowing a focus on a modular approach and generic workflows has been validated: we currently support seven models with nineteen discrete components, and have grown from one deployment in 2023 to eleven in 2024 and twenty-nine in 2025, in addition to many thousands of pre-release test builds in the last quarter alone. This gives us confidence that we can continue to scale efficiently, without a support burden requiring onerous resourcing that might otherwise place a technical limit on future activities.

How to cite: Heerdegen, A., Gatti, T., Jayasuriya, H., McAdam, T., Basevi, J., and Druken, K.: Modern tools to scale the compilation, testing and deployment of scientific software, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8712, https://doi.org/10.5194/egusphere-egu26-8712, 2026.

16:40–16:50
|
EGU26-13932
|
On-site presentation
Rene Gassmöller, Wolfgang Bangerth, Juliane Dannberg, Daniel Douglas, Menno Fraters, Anne Glerum, Timo Heister, Lorraine Hwang, Robert Myhill, John Naliboff, Arushi Saxena, and Cedric Thieulot

Modeling software is integral to computational geodynamics, enabling quantitative investigation of planetary mantle, lithosphere and core dynamics across a wide range of spatial and temporal scales. Over the past two decades, the field’s software ecosystem has shifted significantly: codes that were once developed and maintained within single research groups have increasingly evolved into large, modular packages sustained by multi-institutional and often international collaborations. One important factor in this transition has been the establishment of community organizations like the Computational Infrastructure for Geodynamics (CIG), which has provided coordination and shared capacity that individual groups typically cannot sustain on their own.
In this contribution, I highlight benefits and lessons learned from work within CIG and from the development of the geodynamic modeling software ASPECT (Advanced Solver for Planetary Evolution, Convection, and Tectonics). Community organizations can accelerate scientific software development in several ways. Shared infrastructure (project landing pages, established user forums) improves discoverability and supports software adoption by the community. Targeted support, including seed funding, helps projects invest in feature development and maintenance. By streamlining software release and distribution and promoting robust development and testing workflows, community organizations improve software quality and reliability. Training the next generation of computational geoscientists through workshops, tutorials, and user support builds shared expertise and makes community software more sustainable. Collectively, these activities reduce duplicated effort, lower barriers to entry for new users and contributors, and create pathways for software to evolve in step with scientific and numerical-method advances.
ASPECT provides a concrete example of this community-driven model. Designed to simulate thermal convection with a primary emphasis on Earth’s mantle, it has now been used for a broad range of applications including crustal deformation, magma dynamics and fluid flow, convection on icy satellites, deformation of the inner core, and digital twins of mineral physics experiments. This widening scope has been possible because ASPECT prioritizes usability and extensibility to accommodate evolving model complexity, and leverages modern numerical methods such as adaptive mesh refinement and robust linear/nonlinear solvers. From the start, ASPECT has been designed for the large-scale parallel simulations required for problems with small-scale features embedded in mantle-scale domains. It also strategically builds on established external libraries (e.g., deal.II, Trilinos, p4est) rather than re-implementing core algorithms. ASPECT’s success has been enabled by a well-tested framework, extensive documentation, a plugin architecture that simplifies customization, and active encouragement of community contributions through support and recognition. Together, these elements illustrate how organizational infrastructure and software design choices support long-term development and continued methodological innovation in geodynamic modeling, enabling robust simulations that address increasingly complex scientific questions.

How to cite: Gassmöller, R., Bangerth, W., Dannberg, J., Douglas, D., Fraters, M., Glerum, A., Heister, T., Hwang, L., Myhill, R., Naliboff, J., Saxena, A., and Thieulot, C.: Software as Scientific Infrastructure: CIG’s Role in Computational Geodynamics and Lessons from Developing ASPECT, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13932, https://doi.org/10.5194/egusphere-egu26-13932, 2026.

Adapting to new architectures
16:50–17:00
|
EGU26-21175
|
On-site presentation
Daniel Hupp, Mauro Bianco, Anurag Dipankar, Till Ehrengruber, Nicoletta Farabullini, Abishek Gopal, Enrique Gonzalez Paredes, Samuel Kellerhals, Xavier Lapillonne, Magdalena Luz, Christoph Müller, Carlos Osuna, Christina Schnadt, William Sawyer, Hannes Vogt, and Yilu Chen

MeteoSwiss uses the ICON model to produce high-resolution weather forecasts at kilometre scale, with GPU support enabled through an OpenACC-based Fortran implementation. While effective, this approach limits portability, maintainability, and development flexibility. Within the EXCLAIM project, we focus on the dynamical core of the model—responsible for approximately 55% of the total runtime—and explore alternatives based on a domain-specific Python framework. In particular, we reimplemented the computational stencils using GT4Py and integrated them into the existing Fortran codebase, enabling the partial replacement of key components. This hybrid approach aims to improve developer productivity and code adaptability while preserving performance. In this contribution, we present our strategy for developing software for a weather and climate model involving multiple institutions and stakeholders. We present several optimisation techniques and compare the performance of the new implementation with the original OpenACC version. Our results show improved computational efficiency alongside a substantial improvement in the development workflow. Finally, we discuss the practical challenges of integrating Python components into operational numerical weather prediction systems.
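
For readers less familiar with stencil programming, the sketch below illustrates, in plain NumPy rather than in GT4Py's own field-operator syntax, the kind of neighbourhood computation the abstract refers to; a domain-specific framework such as GT4Py allows a kernel like this to be written once and compiled for different CPU and GPU backends. The field and smoothing coefficient here are invented for illustration.

```python
# Illustrative only: a five-point Laplacian stencil in plain NumPy,
# standing in for the kind of kernel EXCLAIM re-expresses in GT4Py.
import numpy as np

def laplacian_2d(field: np.ndarray) -> np.ndarray:
    """Five-point Laplacian of the interior points of a 2D field."""
    lap = np.zeros_like(field)
    lap[1:-1, 1:-1] = (
        field[2:, 1:-1] + field[:-2, 1:-1]
        + field[1:-1, 2:] + field[1:-1, :-2]
        - 4.0 * field[1:-1, 1:-1]
    )
    return lap

temperature = np.random.default_rng(0).random((64, 64))
smoothed = temperature + 0.1 * laplacian_2d(temperature)  # one diffusion step
```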

How to cite: Hupp, D., Bianco, M., Dipankar, A., Ehrengruber, T., Farabullini, N., Gopal, A., Gonzalez Paredes, E., Kellerhals, S., Lapillonne, X., Luz, M., Müller, C., Osuna, C., Schnadt, C., Sawyer, W., Vogt, H., and Chen, Y.: A Python Dynamical Core for Numerical Weather Prediction, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21175, https://doi.org/10.5194/egusphere-egu26-21175, 2026.

17:00–17:10
|
EGU26-17569
|
On-site presentation
Annika Lauber, Chiara Ghielmini, Daniel Hupp, and Claire Merker

Porting large numerical models to heterogeneous computing architectures introduces significant challenges for software validation and testing, as results from CPU- and GPU-based executions are typically not bit-identical. These differences arise from variations in floating-point arithmetic, execution order, and the use of architecture-specific mathematical libraries. Traditional regression testing approaches based on exact reproducibility therefore become inadequate, particularly in continuous integration (CI) workflows.

Probtest is a lightweight testing framework developed to address this problem in the ICON numerical weather and climate model. It implements a probabilistic, tolerance-based testing strategy that enables robust numerical consistency checks between CPU and GPU runs while remaining fast and resource-efficient. Tolerances are derived from ensembles generated by perturbing prognostic variables in the initial conditions. From a larger ensemble of CPU reference runs, a representative subset is selected to compute variable-specific tolerance ranges that define acceptable numerical deviations. This approach allows reliable validation across architectures without constraining model development or optimization.
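
The following is a minimal sketch of the tolerance idea described above, not the actual Probtest code: per-variable tolerances are derived from the spread of a perturbed reference ensemble, and a candidate (e.g., GPU) run passes if its deviation from the reference stays within them. The variable names and the simple max-deviation rule are assumptions made for illustration.

```python
import numpy as np

def ensemble_tolerances(reference, ensemble, factor=1.0):
    """Per-variable tolerance: the largest absolute deviation of any
    perturbed ensemble member from the reference, optionally scaled."""
    return {
        var: factor * max(np.max(np.abs(member[var] - reference[var]))
                          for member in ensemble)
        for var in reference
    }

def within_tolerance(reference, candidate, tolerances):
    """True if the candidate run (e.g. GPU) deviates from the reference
    by no more than the ensemble-derived tolerance for every variable."""
    return all(
        np.max(np.abs(candidate[var] - reference[var])) <= tolerances[var]
        for var in reference
    )

rng = np.random.default_rng(1)
ref = {"temperature": rng.random(100)}
ens = [{"temperature": ref["temperature"] + 1e-6 * rng.standard_normal(100)}
       for _ in range(10)]
tol = ensemble_tolerances(ref, ens)
gpu = {"temperature": ref["temperature"] + 1e-7}  # toy "GPU" result
print(within_tolerance(ref, gpu, tol))  # passes if deviations are ensemble-like
```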

Recent developments focus on improving extensibility, usability, and reproducibility. Support for Feedback Output Files (FOF) has been added, enabling consistency checks for observation-based diagnostics in addition to model state variables. Furthermore, Probtest has been fully containerized, with each release published on Docker Hub. This removes local installation barriers, ensures reproducible testing environments, and simplifies integration into CI pipelines and collaborative development workflows. These developments strengthen Probtest as a practical and portable tool for validating ICON across heterogeneous computing platforms.

How to cite: Lauber, A., Ghielmini, C., Hupp, D., and Merker, C.: Latest Developments in Probtest: Probabilistic Testing for Robust CPU/GPU Validation of Scientific Models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-17569, https://doi.org/10.5194/egusphere-egu26-17569, 2026.

Best Practices
17:10–17:20
|
EGU26-7565
|
On-site presentation
Patrick Jöckel, Astrid Kerkweg, Kerstin Hartung, and Bastian Kern

Earth System Models (ESMs) aim at replicating the essence of the Earth Climate System in numerical simulations on high performance computing (HPC) systems. The underlying software is often rather complex, comprising several source code entities (modules and libraries, sometimes combining different programming languages), and has in many cases grown over decades. ESMs are usually structured as “multi-compartment” models, i.e. disassembled into a set of different components, each of which describes a different compartment in the Earth System, such as the atmosphere, the land surface, the ocean, the cryosphere, the biosphere, etc. Each compartment model, in turn, comprises a series of algorithms (numerical solvers, parametrizations), each of which represents a specific physical, chemical or socio-economic process. The behaviour of the “system as a whole” (i.e., the development of its state over time, its response to perturbations) is characterized by non-linear interactions and feedbacks between the different compartments and processes.

The implementation of such numerical models representing these inter-compartment and inter-process connections (i.e., the coupling) poses a challenging task for software development, in particular given the need for (scalable) continuous further development and integration of new components, aiming to keep pace with our knowledge about the real Earth System. Common requirements for such software are maintainability, sustainability (e.g., for new HPC architectures), resource efficiency (performance at run time), but also development scalability.

More than twenty years ago (in 2005), we proposed the Modular Earth Submodel System (MESSy) as a potential new approach to Earth System modelling. Here, we present how we started as an “atmospheric chemistry add-on” to a specific General Circulation Model, but already with a wider range of applications in mind. We further show how we went through our second development cycle, finally arriving at our current state, the MESSy Integrated Framework, which is soon to be released as open source. Although our four major software design principles (which will be presented!) did not change significantly from the early stage, we had to undergo several implementation revisions to reach the current state. Despite the continuous development, MESSy was always “state-of-the-art” and “in operation”, i.e. used for scientific research. Thus, in retrospect, we present some of the milestones achieved by “pragmatic” software engineering in practice.

How to cite: Jöckel, P., Kerkweg, A., Hartung, K., and Kern, B.: The Modular Earth Submodel System (MESSy): lessons learned from 20+ years of continuous development, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7565, https://doi.org/10.5194/egusphere-egu26-7565, 2026.

17:20–17:30
|
EGU26-8659
|
On-site presentation
Micael J. T. Oliveira, Edward Yang, Manodeep Sinha, and Kelsey Druken

Australia’s Climate Simulator, ACCESS-NRI, is Australia’s National Research Infrastructure (NRI) for climate modelling, supporting the development and community use of the Australian Community Climate and Earth System Simulator (ACCESS). 

As the ACCESS modelling system evolves to meet user requirements, so does the basic infrastructure that underpins our ability to efficiently run the models, with HPC architectures rapidly shifting towards GPUs and new developments in machine learning disrupting how models are developed and used. Under such circumstances, it is easy for scientists and software engineers to focus on more pressing matters and spend less time worrying about software maintainability. Although this kind of "tactical" programming might bring benefits in the short term, long-term software maintainability and sustainability require a more strategic approach.

Using ACCESS-NRI as a case study, this presentation argues that addressing these challenges is not about any single tool or practice, but about adopting an integrated and coordinated strategy for scientific software development. I will describe how ACCESS-NRI is tackling these challenges by bridging skills and training gaps between scientists and software engineers, adopting well-established industry standards where appropriate (e.g. CMake, Git), and embedding software engineering best practices across development workflows. Alongside these technical efforts, addressing the social challenges of collaboratively developing large, open-source software is a key part of our approach, ensuring contributors can work effectively towards shared goals. 

A concrete example is GPU porting within the ACCESS modelling system. Successfully porting code to GPUs has required close collaboration with existing code owners, careful consideration of scientific and performance constraints, and a strong emphasis on avoiding divergent code paths that are difficult to maintain. This experience highlights the importance of the social dimensions of software development: changes cannot simply be imposed, but must be developed collaboratively to balance reliability, performance, portability, and long-term sustainability. 

By reflecting on what has worked—and what has not—this talk aims to share practical lessons that are transferable to other scientific software projects as they grow beyond small research teams into widely used, community-supported systems.

How to cite: Oliveira, M. J. T., Yang, E., Sinha, M., and Druken, K.: Improving long-term maintainability of the ACCESS models while transitioning to new architectures: challenges and opportunities, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8659, https://doi.org/10.5194/egusphere-egu26-8659, 2026.

17:30–17:40
|
EGU26-3484
|
ECS
|
On-site presentation
Lionel Constantin and Dominik Brunner

Scientific software often begins as an internal research tool developed by scientists rather than trained software engineers, resulting in limited usability, documentation, and maintainability. emiproc, a tool for processing emission inventories for atmospheric chemistry and transport models, originally followed this trajectory: it grew organically within our laboratory, offered only a command-line interface, and lacked a clear structure, extensibility, and user-oriented documentation. We recently undertook a full modernization of emiproc following best practices in scientific software development: a redesign of the code base into modular components, a consistent object-oriented Python API, automated testing with continuous integration, extensive documentation for both users and developers, and publication in the Journal of Open Source Software. The updated software now supports some of the most widely used emission inventories, such as EDGAR and CAMS, as well as more specific ones like the City of Zurich inventory, and produces output for various transport models such as ICON-ART, WRF, or GRAL. We will highlight our approaches for transforming emiproc into a sustainable and user-friendly tool and reflect on the challenges we encountered along the way. By sharing our experience, we aim both to contribute to the discussion on improving scientific software development and to learn from the approaches used by others.
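
As a schematic illustration of the modular, object-oriented pattern described above, consider the sketch below; the class and method names are invented for illustration and do not necessarily match emiproc's actual API.

```python
# Schematic sketch of an inventory object plus per-model exporters.
# Names are hypothetical, not emiproc's real interface.
from dataclasses import dataclass

@dataclass
class Inventory:
    """An emission inventory: per-substance values on some grid."""
    name: str
    emissions: dict  # e.g. {"CO2": [...]} per grid cell

class ModelExporter:
    """One exporter per target transport model keeps concerns separated."""
    def __init__(self, model_name: str):
        self.model_name = model_name

    def export(self, inv: Inventory) -> str:
        # A real exporter would remap to the model grid and write files.
        return f"{inv.name} exported for {self.model_name}"

inv = Inventory("city_inventory", {"CO2": [1.0, 2.5, 0.3]})
print(ModelExporter("ICON-ART").export(inv))
```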

How to cite: Constantin, L. and Brunner, D.: Scientific Software Development: Lessons from our emission inventory processing software emiproc, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-3484, https://doi.org/10.5194/egusphere-egu26-3484, 2026.

17:40–17:50
|
EGU26-16877
|
On-site presentation
Eric Hutton, Gregory Tucker, Mark Piper, and Tian Gan

Lowering the barrier to scientific contribution requires more than adopting good software practices; it requires software structures and standards that make contribution and reuse safe, scoped, and sustainable. We describe how the Community Surface Dynamics Modeling System (CSDMS) addresses these challenges through two complementary efforts: the Landlab modeling framework and the Basic Model Interface (BMI).

Landlab is a Python package designed as a platform for building Earth-surface process models. Over time, we discovered its architecture also promoted the user-turned-developer pathway, which has been critical to its success. While good software practices such as automated testing, continuous integration, documentation, and linting provide a foundation of reliability, Landlab’s component-based architecture has been central to enabling contribution. This design offers contributors clearly scoped and isolated entry points for adding new process models without needing to understand or modify the entire codebase. By enabling contributions from a growing set of domain experts and supporting them through shared maintenance infrastructure, this model expands the pool of invested contributors and reduces reliance on a small number of core developers, strengthening the prospects for long-term project sustainability.

The Basic Model Interface (BMI) complements this approach by providing a lightweight, language-agnostic interface standard that defines how models expose their variables, parameters, and time-stepping controls to the outside world. By separating scientific algorithms from model orchestration, BMI enables models to be reused, coupled, and tested across different frameworks without requiring changes to their internal implementations. Ongoing, community-guided work toward BMI 3.0 aims to extend these capabilities by improving support for parallel execution, clearer state management, and optional interface extensions.
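
To make the interface idea concrete, here is a heavily simplified sketch of a BMI-style wrapper around a toy model. The method names follow the published BMI specification, but the full standard defines many more functions (grid metadata, variable units, etc.), and the toy diffusion model here is invented for illustration.

```python
import numpy as np

class HillslopeBmi:
    """Toy hillslope diffusion model exposing a (simplified) BMI."""

    def initialize(self, config_file: str = "") -> None:
        # A real component would read grid and parameters from config_file.
        self.elevation = np.linspace(10.0, 0.0, 50)
        self.time, self.dt, self.kappa = 0.0, 1.0, 0.1

    def update(self) -> None:
        # One explicit diffusion step on the interior nodes.
        z = self.elevation
        z[1:-1] += self.kappa * self.dt * (z[2:] - 2 * z[1:-1] + z[:-2])
        self.time += self.dt

    def get_current_time(self) -> float:
        return self.time

    def get_value(self, name: str, dest: np.ndarray) -> np.ndarray:
        # BMI fills a caller-provided buffer, decoupling memory ownership.
        dest[:] = {"topographic__elevation": self.elevation}[name]
        return dest

    def finalize(self) -> None:
        pass

# A framework can drive any BMI component with the same calls:
model = HillslopeBmi()
model.initialize()
for _ in range(100):
    model.update()
buffer = np.empty(50)
print(model.get_value("topographic__elevation", buffer)[:3])
```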

Together, Landlab and BMI illustrate how framework design and community-driven standards can reduce technical debt and enable researchers to contribute reusable and interoperable software without requiring them to become full-time software engineers.

How to cite: Hutton, E., Tucker, G., Piper, M., and Gan, T.: Beyond Good Practices: Designing Scientific Software for Contribution and Reuse, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16877, https://doi.org/10.5194/egusphere-egu26-16877, 2026.

Posters on site: Mon, 4 May, 16:15–18:00 | Hall X4

The posters scheduled for on-site presentation are only visible in the poster hall in Vienna. If authors uploaded their presentation files, these files are linked from the abstracts below.
Display time: Mon, 4 May, 14:00–18:00
Chairpersons: Diego Jiménez de la Cuesta Otero, Clarissa Kroll, Iris Ehlert
X4.120
|
EGU26-6222
|
ECS
Diego Jiménez de la Cuesta Otero and Andrea Kaiser-Weiss

Modern scientific projects typically rely on software, e.g., for implementing numerical models, performing data pre- and post-processing, solving inverse problems, or assimilating observations. Consequently, the reliability and reproducibility of scientific results critically depend on software quality. Scientific results are also intended to be shared or reused, and so is the software that produces them, especially in operational settings where traceability and maintainability are essential. Therefore, a sustainable software development strategy becomes key to a project's success. Nevertheless, software standards are often treated as a secondary concern. This can lead to difficulties when introducing new features, delays in users' projects, limited reproducibility, strained collaborations, and ultimately a lack of suitability for operational use.
 
We present the case of the German Weather Service (DWD) contributions within the Integrated Greenhouse Gas Monitoring System for Germany (ITMS). The primary objective of ITMS is the verification of greenhouse gas emissions, which imposes particularly high requirements on the results' traceability and reproducibility. Accordingly, most if not all software-based components of our system should adhere to software development standards that ensure these requirements. We provide an overview of our software development standards and their application, and discuss lessons learned that are transferable to both legacy and newly developed scientific software projects.

How to cite: Jiménez de la Cuesta Otero, D. and Kaiser-Weiss, A.: Preparing for an Operational Environment: Software Development Standards in the Integrated Greenhouse Gas Monitoring System for Germany, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-6222, https://doi.org/10.5194/egusphere-egu26-6222, 2026.

X4.121
|
EGU26-5192
J. Zhou ZhangZhou

Geochemistry π is an open-source automated machine learning Python framework. Geochemists need only provide tabulated data (e.g., an Excel spreadsheet) and select the desired options to clean data and run machine learning algorithms. The process operates in a question-and-answer format and thus does not require that users have coding experience. Version 0.7.0 includes machine learning algorithms for regression, classification, clustering, dimensionality reduction, and anomaly detection. After either automatic or manual parameter tuning, the framework provides users with performance and prediction results for the trained machine learning model. Built on the scikit-learn library, Geochemistry π establishes a customized automated process for implementing machine learning. The framework achieves extensibility and portability through a hierarchical pipeline architecture that separates data transmission from algorithm application. The AutoML module uses the Cost-Frugal Optimization and Blended Search Strategy hyperparameter search methods from FLAML (A Fast and Lightweight AutoML Library), and the model parameter optimization process is accelerated by the Ray distributed computing framework. The MLflow library is integrated into machine learning lifecycle management, which allows users to compare multiple trained models at different scales and manage the data and diagrams generated. In addition, the front end and back end are separated to build the web portal, which presents the machine learning model and data science workflow through a user-friendly web interface. In summary, Geochemistry π provides a Python framework for users and developers to accelerate their data mining efficiency, with both online and offline operation options. All source code is available on GitHub (https://github.com/ZJUEarthData/geochemistrypi), with a detailed operational manual catering to both users and developers (https://geochemistrypi.readthedocs.io/en/latest/).
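
For orientation, the snippet below shows in plain scikit-learn the kind of train-and-evaluate step that such a framework automates behind its question-and-answer dialogue; the data table and column names are invented for illustration, and this is not Geochemistry π's internal code.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Invented example table standing in for a user's uploaded spreadsheet.
rng = np.random.default_rng(42)
df = pd.DataFrame({"SiO2": rng.uniform(40, 75, 200),
                   "MgO": rng.uniform(0, 20, 200)})
df["target"] = 0.5 * df["SiO2"] - 0.8 * df["MgO"] + rng.normal(0, 1, 200)

# Split, fit, and score a regression model -- the steps the framework
# wraps in its guided dialogue and hyperparameter tuning.
X_train, X_test, y_train, y_test = train_test_split(
    df[["SiO2", "MgO"]], df["target"], random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
print(f"R^2 on held-out data: {model.score(X_test, y_test):.2f}")
```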

How to cite: ZhangZhou, J. Z.: Geochemistry π: Machine Learning for Geochemists Who Don’t Want to Code, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5192, https://doi.org/10.5194/egusphere-egu26-5192, 2026.

X4.122
|
EGU26-5393
Sebastian G. Mutz
 

Advances in computing, statistics, and machine learning (ML) techniques have significantly changed research practices across disciplines. Despite Fortran’s continued importance in scientific computing and long history in data-driven prediction, its statistics and ML ecosystem remains thin. FSML (Fortran Statistics and Machine Learning) is developed to address this gap and make data-driven research with Fortran more accessible. 

The following points are considered carefully in its development, and each comes with its own challenges, solutions, and successes:

  • Good sustainable software development practices: FSML is developed openly, conforms to language standards and paradigms, uses a consistent coding and comment style, and includes examples, tests, and documentation. A contributor’s guide ensures consistency for future contributions. 
  • Accessibility: FSML keeps the code clean and simple, avoids overengineering, and has minimal requirements. Additionally, example-rich HTML documentation and tutorials are automatically generated with the FORtran Documenter (FORD) from code, comments, and simple markdown documents. Furthermore, it is developed to support compilation with LFortran (in addition to GFortran), so it can be used interactively like popular packages for interpreted languages. 
  • Community: FSML integrates community efforts and feedback. It uses the linear algebra interfaces of Fortran’s new de-facto standard library (stdlib) and the Fortran package manager (fpm) for easy building and distribution. Its permissive license (MIT) allows developers to integrate FSML into their projects without the restrictions often imposed by other licenses. Its simplicity, documentation, contributor’s guide, and GitHub templates remove barriers for new contributors and users. 
  • Communication: FSML updates are shared through a variety of methods with different communities. This includes a journal article (https://doi.org/10.21105/joss.09058) for visibility among academic colleagues, frequently updated online documentation (https://fsml.mutz.science/), social media updates, as well as a blog and Fortran Discourse posts to keep Fortran’s new and thriving online community updated. 

Early successes of FSML’s approach and design include: 1) students with little coding experience were able to learn the language and use the library with only Fortran-lang’s tutorials and FSML’s documentation; 2) early career researchers with no prior experience in Fortran used FSML’s functions to conduct research on predicting future climate extremes; 3) FSML gained a new contributor and received a pull request only days after its first publicised release. 

The development of FSML demonstrates the merits of using good and open software development practices for academic software, as well as the potential of using the new Fortran development ecosystem and building bridges to the wider (non-academic) developer community. 

How to cite: Mutz, S. G.: Developing a modern Fortran statistics and machine learning library (FSML), EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5393, https://doi.org/10.5194/egusphere-egu26-5393, 2026.

X4.123
|
EGU26-1499
|
ECS
Lakshmi Aparna Devulapalli

As a Research Software Engineer in the natESM project, you have the opportunity to work with a wide range of Earth System Models (ESMs) developed by the German scientific community. Many of these models, originating in the 1990s, were predominantly written in Fortran. While the broader scientific software world has since transitioned toward languages such as C/C++ and Python, the ESM community is still in the process of catching up. As a result, legacy Fortran code—often 20 years old or more—presents unique and sometimes amusing challenges when attempting to adapt or port to modern technologies.

This talk offers a humorous look at these challenges through the eyes of an RSE navigating outdated code in order to accomplish present-day tasks. Topics will include unsustainable methods of structuring software, relic configuration files used for input, ambiguous naming conventions, unused or nonfunctional code that has never been removed, version control practices that can be improved, and other long-standing programming habits that need to evolve. The session will also highlight more modern and maintainable alternatives to these practices, offering a lighthearted yet constructive perspective on bringing legacy ESM code into the future.

How to cite: Devulapalli, L. A.: Navigating legacy Earth System Model software, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-1499, https://doi.org/10.5194/egusphere-egu26-1499, 2026.

X4.124
|
EGU26-10884
Dominic Kempf, Hannah Weiser, Dmitrii Kapitan, and Bernhard Höfle

The Heidelberg LiDAR Operations Simulator (HELIOS) is scientific software for high-fidelity, general-purpose virtual laser scanning (VLS) [1]. Using models for virtual scenes, scanner devices, and platforms, HELIOS allows users to reproduce diverse scan scenarios over various geographical environments (forests, cities, mountains) and laser scanning systems (airborne and UAV-borne, mobile, terrestrial). Used for algorithm development, data acquisition planning, and training of supervised machine learning methods, HELIOS has been successfully integrated into research workflows across the international laser scanning community.

HELIOS was initially developed in a research-driven environment in Java and released as open-source software [2]. Motivated by growing interest in the scientific community, the codebase was re-implemented in C++ to improve its memory footprint, runtime performance, and functionality [3]. Since then, we have been actively developing new features. Recent additions include support for dynamic scenes [4], new deflector mechanisms, and plug-ins for other open-source software such as Blender. In view of the continually growing user community, current software development specifically prioritizes quality assurance, reliability, long-term maintainability, and user-friendliness.

Supported by the DFG under the program "Research Software - Quality Assured and Re-usable" [5], the HELIOS++ developer team partnered with the Scientific Software Center (SSC), a research software engineering service department at Heidelberg University. Combining the expertise of the domain scientists from the HELIOS team and the research software engineers (RSEs) of the SSC, we are strengthening the sustainability and usability of HELIOS. Measures presented in our talk include improving testing strategies and continuous integration, rewriting the CMake build system, packaging HELIOS as a Conda package, creating standalone installers, introducing a new Python API, and developing new strategies for sharing and reproducing HELIOS simulations. Additionally, we will reflect on the benefits as well as key challenges in fostering fruitful collaborations between domain scientists and RSEs. To this end, we will present as a domain scientist/RSE tandem.

References:

[1] HELIOS++: https://github.com/3dgeo-heidelberg/helios

[2] Bechtold, S., & Höfle, B. (2016): https://doi.org/10.5194/isprs-annals-III-3-161-2016

[3] Winiwarter, L et al. (2022): https://doi.org/10.1016/j.rse.2021.112772

[4] Weiser, H., & Höfle, B. (2026): https://doi.org/10.1111/2041-210x.70189

[5] Project website: https://www.geog.uni-heidelberg.de/en/3dgeo/projects-of-the-3dgeo-research-group/fostering-a-community-driven-and-sustainable-helios-scientific-software

How to cite: Kempf, D., Weiser, H., Kapitan, D., and Höfle, B.: Teaming up as domain scientists and research software engineers for a sustainable HELIOS++ scientific software, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-10884, https://doi.org/10.5194/egusphere-egu26-10884, 2026.

X4.125
|
EGU26-21322
Matthieu Leclair, Julian Geiger, Alexander Goscinski, and Rico Häuselmann

With the increase in simulation resolution, climate and weather models now potentially output petabytes of data. The largest projects can thus require complex workflows tightly integrating pre-processing, computing, post-processing, monitoring, potential downstream applications, and archiving. We introduce Sirocco, a new climate and weather workflow tool written in Python, developed in collaboration between ETHZ, PSI, and CSCS with special care for the ICON model.

Sirocco is written with separation of concerns in mind: users should only care about expressing their desired workflow and bringing the scripts/sources for each task independently. That is why "Sirocco" first designates a user-friendly YAML-based configuration format. Inspired by cylc and AiiDA, it describes the workflow graph by equally integrating data nodes (input and output) alongside task nodes. Workflows thus become truly composable, in the sense that no task makes any assumption about the behavior of others.
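
As an illustration of a graph description that treats data and tasks as equal nodes, consider the hypothetical configuration below; the schema shown is invented for this example and is not Sirocco's actual format.

```python
import yaml  # PyYAML

# Hypothetical workflow description: tasks declare the data nodes they
# consume and produce, so the graph is explicit and tasks stay decoupled.
config = yaml.safe_load("""
data:
  - initial_conditions
  - forecast_output
  - plots
tasks:
  - name: run_model
    input: [initial_conditions]
    output: [forecast_output]
  - name: postprocess
    input: [forecast_output]
    output: [plots]
""")

# Print the data-task-data edges implied by the configuration.
for task in config["tasks"]:
    for src in task["input"]:
        print(f"{src} -> {task['name']} -> {', '.join(task['output'])}")
```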

Sirocco currently defines two types of tasks, called "plugins". The "shell" plugin is dedicated to tasks for which users provide their own main executable, including any auxiliary set of files. The only requirement is the ability to interface with Sirocco, either through executables accepting command-line arguments and environment variables and/or by parsing a YAML file providing the necessary context for task execution. The "icon" plugin is a dedicated, user-friendly interface to the ICON model. On top of the integration into Sirocco workflows, it provides easy ways of handling matters like date changes, namelist modifications, restart files, or predefined setups for the target machine and architecture. By design, other plugins can be written to facilitate the integration of any other application/model.

Once an internal representation is generated from the configuration file, two possible back-ends can orchestrate the workflow. The first one, called "stand-alone", is entirely implemented inside Sirocco and runs autonomously on the target machine, only relying on the HPC scheduler daemon to keep the workflow running. The second one interfaces with the low-level workflow library AiiDA and its satellite packages, running on a dedicated server with its own daemon and dumping workflow metadata in a queryable database. Both orchestrators implement the novel concept of a deep dynamical task front that propagates through the graph, enabling the ahead-of-time submission of an arbitrary number of task generations.

At the end of the day, Sirocco not only provides the ability to run complex workflows and a nice interface to ICON but also, through its workflow manager nature, facilitates shareability and reproducibility in the community.

How to cite: Leclair, M., Geiger, J., Goscinski, A., and Häuselmann, R.: Sirocco: a new workflow tool for Climate and Weather including explicit data representation and ICON support, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21322, https://doi.org/10.5194/egusphere-egu26-21322, 2026.

X4.126
|
EGU26-20382
Max Eysholdt, Maximilian Zinnbauer, and Elke Brandes

Many countries in the EU fail to protect their waters adequately from nitrogen and phosphorus inputs (European Environment Agency 2024), often originating from agricultural sources (Sutton 2011). The European Court of Justice found Germany guilty of insufficient implementation of the EU Nitrates Directive, which requires the protection of waters from nutrient pollution from agriculture (European Court of Justice 2018). In response, Germany introduced a monitoring system for assessing the impact of the recently updated fertilizer application ordinance, which implements the EU Nitrates Directive. This monitoring creates time series of pollution-related spatial indicators ranging from land use to modelled nutrient budgets. Input data on land use are sourced from the Integrated Administration and Control System. The results are used by German authorities for reporting to the EU as well as for national and regional water protection policy.

We present the technical concept, infrastructure and workflows established for this data-intensive, long-term project and discuss challenges and limitations when operating in the science-policy nexus. We aim to share good practices in modularization, automation, and reproducibility, and discuss strategies for efficient maintenance of scientific software development in context of long-term, policy-relevant monitoring projects.

Our system is designed to handle heterogeneous data with different levels of data protection requirements related to the General Data Protection Regulation (GDPR). A modular structure was chosen to enhance usability and maintenance. Reproducibility is ensured through version-controlled, script-based software development. For efficiency, consistency, and streamlined workflows, reporting is automated and an ever-growing set of user-facing functions is bundled into a package. To allow for advances in data preparation and modelling, a submission-based approach was chosen, recalculating all indicator time series in each reporting year. This requires robust data management, reproducibility, and resilient workflows to accommodate evolving input data.

We still face challenges in handling Open Science principles, political stakeholder interests, and the GDPR. Similarly, scientific advances lead to updated results, which may conflict with the authorities' need for clear and unambiguous outcomes. Regular deadlines and stakeholder needs have resulted in an organically grown code base and sometimes cause quality checks and unit testing to be neglected. Additionally, the interaction between reproducible, script-based solutions and “traditional” workflows based on Microsoft Word is inefficient. The changing structure of the yearly gathered data hinders automation of data processing. Due to this and the annual advances in the processing of the input data, maintaining the database is also challenging. We would like to share and discuss these issues with other teams facing similar problems.

Our system is tailored to handle heterogeneous and sensitive data from different sources, producing reliable results and accommodating advances in data preparation and modelling in the long run. However, navigating technical limitations, good scientific practice, and policymakers’ interests is challenging for us.

Literature

European Court of Justice (2018): European Commission against Federal Republic of Germany. Infringement Proceedings - Directive 91/676/EEC.

European Environment Agency (2024): Europe's state of water 2024: the need for improved water resilience. Publications Office.

Sutton, M. A. (Ed.) (2011): The European nitrogen assessment. Sources, effects and policy perspectives. Cambridge.

 

How to cite: Eysholdt, M., Zinnbauer, M., and Brandes, E.: User-turned-developer: Scientific software development for a national nutrient policy impact monitoring in Germany, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-20382, https://doi.org/10.5194/egusphere-egu26-20382, 2026.

X4.127
|
EGU26-21348
Florian Wagner, Camilla Lüttgens, Andrea Balza Morales, Marc S. Boxberg, Marcel Nellesen, and Marius Politze

Scientific software is essential for accelerating research and enabling transparent, reproducible results, but increasing adoption also increases support demands that can overwhelm small academic development teams. Since most scientists are not trained as software engineers, early-stage research software often lacks the resources and structure needed for broader use, making streamlined support workflows crucial for both users and developers. Addressing these issues is essential to ensure that researchers can focus on their core activities while streamlining processes that benefit both users and developers.

Our project CAES3AR (Collaborative and Efficient Scientific Software Support Architecture) aims to provide researchers with a more open and efficient infrastructure for software support by developing a collaborative architecture. The framework is currently being developed and evaluated using pyGIMLi, an open-source library for modeling and inversion in geophysics (www.pygimli.org), while being designed to remain transferable to a broad range of open-source projects. Thanks to its practicality and gallery of existing examples, pyGIMLi has become widely adopted in the near-surface geophysical community. At the same time, its use across diverse user environments introduces recurring support challenges, since variations in operating systems and installed dependencies can make issue reproduction and debugging time-intensive, which often reduces the capacity for methodological and software innovation.

To address these challenges efficiently, the CAES3AR framework aims to automate key aspects of user support through a generic toolchain that integrates seamlessly with existing infrastructures such as GitHub and Jupyter. It facilitates user engagement by allowing users to create GitHub or GitLab issues that include links to temporary code execution environments (e.g., JupyterLab) equipped with collaborative editing features—potentially integrated with existing JupyterHub and cloud-based infrastructures. Additionally, automated bots powered by GitHub Actions or GitLab jobs will provide real-time feedback on whether the issue persists across all platforms and with the latest software versions. If a problem persists, supporters can directly modify the user's code within Jupyter without requiring any downloads or installations. Proposed changes will be presented as formatted code alterations (“diffs”) attributed to their authors in the Git issue for future reference, ensuring clarity and continuity even after the temporary JupyterHub instance is no longer available.
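
One conceivable building block of such a toolchain is sketched below using the documented GitHub REST endpoint for creating issues; this is not the CAES3AR implementation, and the repository name and environment URL are placeholders.

```python
import requests

def open_support_issue(token: str, repo: str, env_url: str) -> int:
    """Open a GitHub issue whose body links to a temporary execution
    environment, so supporters can reproduce the problem in one click."""
    response = requests.post(
        f"https://api.github.com/repos/{repo}/issues",
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/vnd.github+json"},
        json={"title": "Reproducible support request",
              "body": f"Runnable example: {env_url}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["number"]

# Placeholders for illustration only:
# open_support_issue("ghp_...", "example-org/pygimli-support",
#                    "https://hub.example.org/tmp/abc")
```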

We recently hosted a community workshop to assess developer and user needs, identify challenges in current support practices, and gather requirements for practical adoption. This presentation summarizes key findings from those discussions and introduces early CAES3AR prototypes developed for the pyGIMLi ecosystem. As CAES3AR remains in active development, we conclude by inviting community feedback on additional features and design priorities, with the broader aim of ensuring transferability and long-term utility across multiple open-source scientific software projects.

Project website: https://caesar.pages.rwth-aachen.de/

 

How to cite: Wagner, F., Lüttgens, C., Balza Morales, A., Boxberg, M. S., Nellesen, M., and Politze, M.: CAES3AR: Collaborative and Efficient Scientific Software Support Architecture, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21348, https://doi.org/10.5194/egusphere-egu26-21348, 2026.

X4.128
|
EGU26-17128
Antonia Degen, Yi-Chen Pao, and Andrea Ackermann

In Germany, each federal state is committed to collecting the required information on funding, farming practices, and land use with an “Integrated Administration and Control System” (IACS) (Deutscher Bundestag 2014).

Based on the land parcel identification system (LPIS), one of the core elements of IACS (European Commission 2025), georeferenced data along with ancillary data have been collected annually since 2005. Mandatory requirements for checks and on-site validations ensure high data quality, which makes IACS data very suitable for research purposes (Leonhardt et al. 2024). Our goal is to create a nation-wide time series based on IACS data that contains detailed information on land use, animal husbandry, and farm statistics and can be used for comprehensive land use, soil, agricultural-policy, and biodiversity research. Despite this, IACS data remain underused for scientific research due to the following challenges:

  • Data protection: Obtaining and handling IACS data requires a legal agreement between the research project and the respective federal state including Data Usage Agreements.
  • Data heterogeneity: All federal states have unique data processing workflows and historical changes in processing practices, resulting in different data types, formats, structures, keys, encodings, etc.
  • Data volume: Large storage volumes, processing capacities, and back-up systems with high security levels are required. Efficiency and data minimization are important principles for the design of the processing workflows.

 

In this contribution, we, as user-turned-developers, want to show how we utilize our toolbox of open-source software (Linux, Bash, R, PostgreSQL/PostGIS, Python, GitLab) in a suitably modularized workflow to meet these challenges.

The first module is tailored to pre-process the data according to the specific characteristics of each federal state. Modules two and three contain more general functions to ensure machine readability. All data are then processed in a data cleaning workflow and imported into our PostgreSQL/PostGIS database.

We use our database for data harmonization by implementing modularized functions to handle different use cases.

The resulting harmonized datasets are provided to research teams with data protection clearance for the respective federal state and year. Harmonized tables are versioned as releases, both to ensure reproducibility and to provide necessary updates.

Figure 1: Modularized workflow for IACS data processing towards a nation-wide harmonized time series.

Reproducibility is ensured by using script-based procedures that are stored and versioned in GitLab, as well as by extensive code documentation and automated file-based processing documentation.

Our modularization process lays the foundation for sustainable handling of complex administrative agricultural data and is a first step towards a software development approach.

Literature

European Commission (2025): Integrated Administration and Control System (IACS). Available online: https://agriculture.ec.europa.eu/common-agricultural-policy/financing-cap/assurance-and-audit/managing-payments_en

Deutscher Bundestag (2014): Gesetz über die Verarbeitung von Daten im Rahmen des Integrierten Verwaltungs- und Kontrollsystems nach den unionsrechtlichen Vorschriften für Agrarzahlungen. InVeKoS-Daten-Gesetz - InVeKoSDG, vom 5 (2019). Available online: https://www.gesetze-im-internet.de/invekosdg_2015/

Leonhardt, H., Wesemeyer, M., Eder, A., Hüttel, S., Lakes, T., Schaak, H., Seifert, S., and Wolff, S. (2024): Use cases and scientific potential of land use data from the EU’s Integrated Administration and Control System: A systematic mapping review, Ecological Indicators, Volume 167, ISSN 1470-160X, https://doi.org/10.1016/j.ecolind.2024.112709.

How to cite: Degen, A., Pao, Y.-C., and Ackermann, A.: A modularized workflow for processing heterogeneous agricultural land use data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-17128, https://doi.org/10.5194/egusphere-egu26-17128, 2026.

X4.130
|
EGU26-3667
Plug-and-Play Collaboration: Zero-Translation APIs Bridge the Communication Gulf Between Geoscientists and Information Engineers
(withdrawn)
Yi Ding, Tao Wang, Ying Tong, and He Huang
X4.131
|
EGU26-21181
|
ECS
Yi-Chen Pao and Boineelo Moyo

The Integrated Administration and Control System (IACS) is a key instrument of the European Union's (EU) Common Agricultural Policy to monitor agricultural subsidies and support evidence-based policy. IACS provides the most comprehensive EU-wide dataset that combines detailed geospatial data with thematic attributes related to land use, livestock, and measures, making it highly valuable for research on agri-environmental policies and agrobiodiversity (Leonhardt et al., 2024). In Germany, these data are collected independently by 14 federal states, resulting in substantial heterogeneity across datasets in terms of file format, encoding, data structure, and level of completeness. These inconsistencies present major challenges for efficient data management, scientific assessments, reproducibility, and the long-term reuse of the data.

This contribution presents an ongoing automated framework designed to standardise and validate raw IACS datasets across our data management pipeline, from data collection and harmonisation to data import and long-term management. Our main goal is to reduce redundancy and manual effort in the data quality check process, while enabling scalable and reproducible data quality assurance. The objective is therefore to develop an optimised, non-redundant data check system that captures structural, semantic, and geospatial metadata from heterogeneous datasets using a single-pass folder scan. To achieve this objective, we focus on the following approaches:

  • Develop an inventory-based data pipeline / architecture: A lightweight inventory object containing metadata for each file in the delivery folder
  • Automate routine and error-prone data quality scripts: Replace manual checks with modular and reusable automated components from a central inventory system
  • Enable reproducible execution and reporting: Implement a Quarto-based framework (an open-source system for reproducible computational documents combining code, results, and narrative) that produces human-readable visualisations for technical and non-technical users

Our system leverages a diverse set of programming tools, including R, Quarto, Bash, Python, and SQL, from data delivery or collection to data management in the database. The approach is based on an inventory-first architecture: a lightweight yet expressive data structure generated from a single scan of the raw input folder containing different data formats. The inventory captures essential metadata for each file, such as file types, attribute schemas, geospatial extents, and identifier patterns (e.g., farm identifier, land parcel identifier). A consolidated framework of all data check scripts then enables all subsequent quality-check modules to operate efficiently without repeated file access. Executing the consolidated framework performs a range of automated data quality checks, such as file integrity verification, cross-file joinability analysis, schema consistency assessment, and geospatial coherence analysis.
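
A minimal sketch of the inventory-first idea under simplified assumptions is shown below; a real implementation would also extract attribute schemas and geospatial extents, and the field names here are illustrative.

```python
from pathlib import Path

def build_inventory(delivery_folder: str) -> list[dict]:
    """Single pass over a delivery folder: record lightweight metadata
    per file so later quality checks never have to rescan the tree."""
    inventory = []
    for path in sorted(Path(delivery_folder).rglob("*")):
        if path.is_file():
            inventory.append({
                "path": str(path),
                "format": path.suffix.lower(),
                "size_bytes": path.stat().st_size,
            })
    return inventory

# Downstream checks query the inventory instead of touching the disk again.
inventory = build_inventory(".")
shapefiles = [f for f in inventory if f["format"] == ".shp"]
print(f"{len(inventory)} files scanned, {len(shapefiles)} shapefiles found")
```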

The resulting output, an interactive Quarto dashboard, provides a comprehensive first assessment of the delivered data, in which all essential metadata and errors for each file can be inspected in one place. This workflow minimises the manual work of checking each file separately, limits error propagation, and ensures traceable, documented logs.

Our results show that implementing such automated data checks considerably accelerates harmonisation and improves the data management lifecycle.

How to cite: Pao, Y.-C. and Moyo, B.: Automating Data Quality Checks for Heterogenous Datasets: A scalable approach for IACS data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21181, https://doi.org/10.5194/egusphere-egu26-21181, 2026.

X4.132
|
EGU26-15058
rSTEMMUS‑SCOPE: a user‑friendly open‑source R package wrapping a coupled soil–canopy process-based model for urban soil‑moisture and ET — good practices and lessons learned
Alby Duarte Rocha and Basem Aljoumani

Process‑based models that explicitly couple soil water and heat transport, canopy radiative transfer, photosynthesis, and surface–atmosphere exchange are increasingly used to connect in‑situ observations with remote‑sensing–relevant land‑surface processes. However, their practical adoption, particularly in heterogeneous urban environments, remains challenging due to complex software dependencies, fragmented preprocessing pipelines, and limited transparency in model configuration. These challenges are exacerbated when such models are accessed through low‑level implementations that are difficult for domain scientists to adapt, reproduce, or extend.

We present rSTEMMUS‑SCOPE, an open‑source R interface to the coupled STEMMUS‑SCOPE modelling framework, designed to apply good practices in scientific software development to a hybrid soil–canopy model frequently used by practitioners and researchers in ecohydrology, urban climate, and remote sensing. The interface lowers barriers to reproducible experimentation by providing a modular, script‑based workflow that integrates eddy‑covariance forcing, in‑situ soil measurements, vegetation parameters, and multilayer soil discretisation within a transparent R‑based environment supporting the full workflow, from data pre-processing to the visualization of results.

From a software‑engineering perspective, rSTEMMUS‑SCOPE adopts a modular, script‑based architecture that separates data inputs, model settings, execution, and post‑processing. The package provides reproducible pipelines for preprocessing eddy‑covariance meteorological forcing, precipitation, vegetation parameters, and multilayer soil discretisation (>50 layers), enabling fully scripted end‑to‑end simulations within R. Version‑controlled configuration files, consistent function interfaces, and documented defaults are used to support transparency and extensibility, while example workflows and vignettes lower the entry barrier for users who are domain scientists rather than trained software developers. The design follows a “user‑turned‑developer” paradigm, allowing advanced users to adapt parameterisations and forcing strategies while preserving a stable core interface.
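
As a language-agnostic illustration of this separation of settings from execution (the package itself is written in R; the names below are hypothetical and not part of its API), a version-controlled configuration object with documented defaults might look like this in Python:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class RunConfig:
        site_id: str                       # e.g. an eddy-covariance station
        n_soil_layers: int = 50            # documented default for the multilayer discretisation
        forcing_file: str = "forcing.csv"  # preprocessed meteorological forcing

    def run_simulation(cfg: RunConfig) -> None:
        """Stable core interface: advanced users change cfg, not the internals."""
        print(f"Running site {cfg.site_id} with {cfg.n_soil_layers} soil layers")

Keeping the configuration immutable and separate from execution makes each run reproducible from a file that can be committed alongside the code.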

We demonstrate these design choices using an urban case study in a temperate green space in Berlin, where hourly simulations were performed for 2019–2020. Observations from an eddy‑covariance tower and in‑situ soil moisture sensors are used as a software stress test rather than as the primary scientific result. Volumetric soil water content at 60 cm depth was reproduced well (Kling–Gupta Efficiency = 0.82; r = 0.88; α = 1.01), while simulated evapotranspiration captured diurnal and seasonal dynamics (r ≈ 0.67), with systematic biases during low‑energy conditions. Sensitivity experiments illustrate how differences in input data sources and parameter choices propagate through the modelling workflow, highlighting the importance of transparent, reproducible pipelines for diagnosing model behaviour.
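
For reference, the Kling–Gupta Efficiency quoted above combines the correlation r, the variability ratio α and a bias ratio β into a single score (Gupta et al., 2009); a minimal implementation is:

    import numpy as np

    def kge(sim: np.ndarray, obs: np.ndarray) -> tuple[float, float, float, float]:
        """Return (KGE, r, alpha, beta) for simulated vs. observed series."""
        r = np.corrcoef(sim, obs)[0, 1]     # linear correlation
        alpha = np.std(sim) / np.std(obs)   # variability ratio
        beta = np.mean(sim) / np.mean(obs)  # bias ratio
        return 1 - np.sqrt((r - 1)**2 + (alpha - 1)**2 + (beta - 1)**2), r, alpha, beta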

We conclude by discussing practical lessons learned in wrapping complex process‑based models in high‑level languages: trade‑offs between modularity and performance, documenting urban‑specific parameter choices without constraining expert use, and testing strategies when upstream physics models are computationally expensive. rSTEMMUS‑SCOPE demonstrates how applying robust software practices enables meaningful, reproducible results and supports early‑career researchers working at the interface of modelling, data, and urban environmental science.

Software availability

rSTEMMUS‑SCOPE (open source): https://github.com/EcoExtreML/rSTEMMUS_SCOPE

How to cite: Duarte Rocha, A. and Aljoumani, B.: rSTEMMUS‑SCOPE: a user‑friendly open‑source R package wrapping a coupled soil–canopy process-based model for urban soil‑moisture and ET — good practices and lessons learned, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-15058, https://doi.org/10.5194/egusphere-egu26-15058, 2026.

X4.133
|
EGU26-12310
WRF-Chem-Polar: an open, collaborative, and reproducible framework for modeling the polar atmosphere
Jennie L. Thomas, Lucas Bastien, Ruth Price, Rémy Lapere, Ian Hough, Erfan Jahangir, Lucas Giboni, and Louis Marelle

Over the past 15 years, substantial developments have been made to adapt the regional chemistry-climate model WRF-Chem for applications in polar environments, with a main focus on the Arctic. These developments address key processes that are either absent from, or insufficiently represented in, the standard WRF-Chem distribution, particularly those controlling aerosol-cloud interactions, boundary layer chemistry, and surface-atmosphere coupling over snow, sea ice, and the polar ocean. However, until now, these advances have been distributed across multiple publications, code branches, and project-specific implementations, limiting transparency, reproducibility, and community use.

Here we present WRF-Chem-Polar, a consolidated and openly available modeling framework that integrates our polar-specific model developments into a single, traceable code base. The framework is hosted on GitHub and is structured around two tightly linked components: (i) a unified WRF-Chem-Polar model code that incorporates developments for polar aerosol and cloud processes and (ii) a dedicated infrastructure for compiling, running, and analyzing simulations.

A key objective of WRF-Chem-Polar (including the model code and infrastructure) is to enable transparent model evolution. All developments are tracked through version control, with automated test cases designed to systematically compare model behavior across code versions. This approach allows scientific changes to be evaluated quantitatively, supports regression testing, and facilitates controlled experimentation when introducing new parameterizations or process representations. The infrastructure also provides transparent workflows for simulation setup, post-processing, and diagnostics, improving reproducibility across users and platforms. Code quality, readability, and consistency are improved via coding style guides and modern software tools, including unit testing and automatic enforcement of linting rules.
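
A hedged sketch of what such a regression test can look like in practice follows; the file paths, variable name and tolerance are hypothetical and do not describe the project's actual test suite:

    import numpy as np
    import xarray as xr  # WRF-Chem writes NetCDF output, so xarray is a natural fit

    def test_surface_pm25_regression():
        """Compare a new run against stored reference output from the previous version."""
        ref = xr.open_dataset("reference/surface_fields.nc")  # hypothetical path
        new = xr.open_dataset("output/surface_fields.nc")     # hypothetical path
        np.testing.assert_allclose(
            new["pm25"].values, ref["pm25"].values,
            rtol=1e-10, atol=0.0,  # tight tolerance for refactoring-only changes
        )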

By making these developments openly accessible and actively maintained, WRF-Chem-Polar lowers the barrier for the community to apply advanced polar chemistry–aerosol–cloud representations, while providing a robust framework for continued development and evaluation. This effort supports both fundamental process studies and applied research, contributes to broader open-science and FAIR modeling practices, and furthers the uptake of our work within the Earth system modeling community.

How to cite: Thomas, J. L., Bastien, L., Price, R., Lapere, R., Hough, I., Jahangir, E., Giboni, L., and Marelle, L.: WRF-Chem-Polar: an open, collaborative, and reproducible framework for modeling the polar atmosphere, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12310, https://doi.org/10.5194/egusphere-egu26-12310, 2026.

X4.134
|
EGU26-7637
Insights and tips for maintainability, robustness, usability, and reproducibility of geo-scientific models
Konstantin Gregor, Benjamin Meyer, Joao Darela-Filho, and Anja Rammig

The complexity of geoscientific modelling workflows, spanning pre-processing, model execution, and post-processing, poses major challenges to maintainability, reproducibility, and accessibility, even when FAIR data principles are followed.

Based on a survey of the 20 dynamic global vegetation models participating in the Global Carbon Project, we present the current state of, and potential improvements to, software engineering and reproducibility practices within the community. We also share notable successful practices from the community that could be helpful for all geoscientists, including:
- version control
- workflow management systems
- containerization
- automated documentation
- continuous integration
- automated visualizations

These approaches enable reproducible, portable, and automated workflows, improve code reliability, and enhance access to scientific results.

We conclude with a showcase of a fully reproducible and portable workflow implemented for one model, illustrating how these practices can be adopted by other modeling communities. This example can serve as a practical resource for improving reproducibility, accessibility, and software engineering standards across the geosciences.

How to cite: Gregor, K., Meyer, B., Darela-Filho, J., and Rammig, A.: Insights and tips for maintainability, robustness, usability, and reproducibility of geo-scientific models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7637, https://doi.org/10.5194/egusphere-egu26-7637, 2026.

X4.135
|
EGU26-9441
The Data-to-Knowledge Package - A Framework for publishing reproducible and reusable analysis workflows in Earth System Science
Markus Konkol, Simon Jirka, Sami Domisch, Merret Buurman, Vanessa Bremerich, and Astra Labuce

More and more funders, reviewers, and publishers ask researchers to follow Open Science principles and make their research results publicly accessible. For a computational analysis workflow, this means providing access to the data and code that produced the figures, tables, and numbers reported in a paper. However, doing so, even in line with the FAIR Principles, does not mean others can easily reuse the materials and continue the research. It still takes effort to understand an analysis script (e.g., written in R or Python) and to extract those parts of a workflow (i.e., the code snippets) that generate, for instance, a particular figure.

In this contribution, we demonstrate the concept and realization of the Data-to-Knowledge Package (D2K-Package), a collection of digital assets that facilitates the reuse of computational research results [1]. The heart of a D2K-Package is the reproducible basis, composed of the data and code underlying, for instance, a statistical analysis. Instead of simply providing access to the analysis script as a whole, the idea is to structure the code into self-contained, containerized functions, making the workflow steps more reusable. Each function follows an input-processing-output logic and fulfills a certain task such as data processing, analysis, or visualization. Creating such a reproducible basis makes it possible to derive the following components, which are also part of the D2K-Package:

A virtual lab is a web application, for example a JupyterLab environment provided via MyBinder. Users access it in the browser and obtain a computational environment with all dependencies and the runtime pre-installed. Creating such a virtual lab is possible because all code is containerized and the image is built from a specification of the required libraries, the runtime, and their versions. A virtual lab can help users with programming expertise engage with the code in a ready-to-use programming environment.

A web API service exposes the encapsulated and self-contained functions such that every function has a dedicated URL endpoint. Users can send requests from their analysis script to that endpoint and obtain the results via HTTP. Hence, they can reuse the functions without copying the code snippets or struggling with dependencies. Such a service can be realized using OGC API Processes and pygeoapi.
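
A hedged example of what calling such a function endpoint from an analysis script could look like (the URL and payload below are hypothetical; OGC API Processes defines the general shape of the execution request):

    import requests

    # Hypothetical endpoint for a containerized "make figure" function
    resp = requests.post(
        "https://example.org/processes/make-figure-2/execution",
        json={"inputs": {"dataset_url": "https://example.org/data.csv",
                         "threshold": 0.5}},
        timeout=60,
    )
    resp.raise_for_status()
    result = resp.json()  # e.g. a link to, or an encoding of, the generated figure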

The computational workflow connects the functions into an executable analysis pipeline and acts as an entry point to a complex analysis. Such a workflow can help users gain a better understanding of the functions and the relevant input parameters. By using workflow tools such as the Galaxy platform, even users without programming experience can change the parameter configuration and see how the new settings affect the final output.

Beyond the concepts outlined above, this contribution also reports on real demonstrators that showcase the D2K-Package idea.

This project has received funding from the European Commission’s Horizon Europe Research and Innovation programme. Grant agreement No 101094434.

[1] Konkol et al. (2025): https://doi.org/10.12688/openreseurope.20221.3

How to cite: Konkol, M., Jirka, S., Domisch, S., Buurman, M., Bremerich, V., and Labuce, A.: The Data-to-Knowledge Package - A Framework for publishing reproducible and reusable analysis workflows in Earth System Science, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9441, https://doi.org/10.5194/egusphere-egu26-9441, 2026.

X4.136
|
EGU26-23282
Evolving Scientific Software in Long-Running Observatories: Lessons from the TERENO Sensor Management Migration
Ulrich Loup, Werner Küpper, Christof Lorenz, Rainer Gasche, Ralf Kunkel, Ralf Gründling, Jannis Groh, Nils Brinckmann, Jan Bumberger, Marc Hanisch, Tobias Kuhnert, Rubankumar Moorthy, Florian Obersteiner, David Schäfer, and Thomas Schnicke

Scientific software in geosciences often grows organically: initial solutions are developed within small teams to meet immediate research needs, and over time they evolve into critical infrastructure. While this organic growth can be highly effective, it frequently leads to challenges in maintainability, documentation, and reuse when systems are expected to support larger communities or integrate with new platforms. In this contribution, we share lessons learned from evolving the software infrastructure of the TERENO environmental observatories.

For more than a decade, TERENO relied on tightly coupled systems in which observational data and sensor metadata were managed together. This data infrastructure proved robust in daily operations but gradually accumulated inconsistencies, implicit conventions, and project-specific extensions that were insufficiently documented. As TERENO is now being integrated into the Earth & Environment DataHub, these limitations became visible and required a systematic rethinking of how sensor and measurement metadata are managed.

As part of the infrastructure redesign within the Earth & Environment DataHub initiative, we adopted the Helmholtz Sensor Management System (SMS), an open, community-driven software platform. To support the transition, we developed and extended the Python tool ODM2SMS, which enables reproducible and configurable migration of metadata from the legacy system into SMS. This process exposed several common pitfalls in scientific software development: hidden assumptions in data structures, incomplete documentation, and software that worked well for its original developers but was hard to adapt for new use cases.

We addressed these challenges by applying a set of pragmatic good practices. These included increasing modularity and configurability in ODM2SMS, explicitly documenting previously implicit rules, and combining automated migration steps with manual review where scientific context was required. A particularly instructive example is the migration of complex lysimeter installations, involving hundreds of interconnected devices. This case highlighted the importance of clear abstractions, shared terminology, and close interaction between users and developers.
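
The following Python fragment sketches this pattern of turning an implicit convention into an explicit, version-controlled rule while routing unresolved cases to manual review; the names are illustrative and not actual ODM2SMS code:

    # Previously implicit unit convention, now documented and version-controlled
    LEGACY_TO_SMS_UNITS = {"degC": "°C", "m3/m3": "m³/m³"}

    def migrate_unit(legacy_unit: str) -> tuple[str, bool]:
        """Return (sms_unit, needs_manual_review) for one legacy metadata entry."""
        if legacy_unit in LEGACY_TO_SMS_UNITS:
            return LEGACY_TO_SMS_UNITS[legacy_unit], False
        return legacy_unit, True  # unknown convention: flag for expert review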

Our contribution reflects on how community engagement, open development, and incremental refactoring can improve long-lived scientific software without disrupting ongoing research. We conclude by discussing transferable lessons for researchers facing similar challenges: balancing rapid development with sustainability, making software usable beyond its original context, and turning legacy systems into maintainable, future-ready tools.

How to cite: Loup, U., Küpper, W., Lorenz, C., Gasche, R., Kunkel, R., Gründling, R., Groh, J., Brinckmann, N., Bumberger, J., Hanisch, M., Kuhnert, T., Moorthy, R., Obersteiner, F., Schäfer, D., and Schnicke, T.: Evolving Scientific Software in Long-Running Observatories: Lessons from the TERENO Sensor Management Migration, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-23282, https://doi.org/10.5194/egusphere-egu26-23282, 2026.

X4.137
|
EGU26-17829
Evolution of the EPOS Platform Open Source
Marco Salvi, Valerio Vinciarelli, Rossana Paciello, Daniele Bailo, Alessandro Crocetta, Kety Giuliacci, Manuela Sbarra, Alessandro Turco, Mario Malitesta, Jean-Baptiste Roquencourt, Martin Carrere, Jan Michalek, Baptiste Roy, and Christopher Card

The development of sustainable and reusable scientific software infrastructures remains a significant challenge in geosciences, particularly when transitioning from single-purpose systems to platforms intended for broader community adoption. This presentation shares experiences and lessons learned from developing the EPOS Platform as an open-source, reusable data integration and visualization system, demonstrating how intentional architectural decisions and tooling investments can transform research infrastructure software into widely adoptable solutions.

The EPOS Platform (European Plate Observing System) initially served as the technical backbone for EPOS ERIC (https://www.epos-eu.org/epos-eric), providing integrated access to solid Earth science data across ten thematic domains. Built on a choreography architecture using Docker and Kubernetes, the system successfully fulfilled its original mandate. However, as other research infrastructures expressed interest in similar capabilities, we recognized the potential for broader impact and initiated a strategic shift toward creating a genuinely reusable open-source platform.

The transition required addressing fundamental challenges in software reusability. Initially, deployment necessitated manual configuration and deep infrastructure knowledge, creating significant adoption barriers. To overcome this, we developed the epos-opensource CLI tool (https://github.com/EPOS-ERIC/epos-opensource), a command-line interface with an integrated terminal user interface (TUI) that reduces deployment from a complex manual process to a single command. This tool enables researchers and developers to deploy fully functional instances locally using either Docker Compose or Kubernetes, significantly accelerating both external adoption and internal development workflows.

We released the complete platform under the GPL v3 license, ensuring that all code, including that powering the production EPOS Platform (https://www.ics-c.epos-eu.org/), remains open and community-accessible. Within EPOS ERIC, the open-source release and deployment tooling facilitate rapid provisioning of testing environments for developers and metadata contributors. Comprehensive documentation was developed using Docusaurus, following standard open-source practices to provide installation guides, system architecture references, and user tutorials. The EPOS Platform Open Source has been leveraged to enhance data sharing by multiple research initiatives, including ENVRI-Hub NEXT (https://envri.eu/envri-hub-next/), DT-GEO (https://dtgeo.eu/), IPSES (https://www.ipses-ri.it), and Geo-INQUIRE (https://www.geo-inquire.eu/), demonstrating the platform's versatility across different research contexts.

Our experience demonstrates that developing reusable scientific software requires deliberate investment beyond initial functionality. Key factors include comprehensive documentation following community standards, simplified deployment through user-friendly tooling, architectural flexibility for diverse use cases, and genuine open-source practices where production and community code remain unified. These principles, while resource-intensive, are essential for scientific software to achieve meaningful impact and contribute to a more sustainable, collaborative research infrastructure ecosystem.

This presentation will explore the evolution of the EPOS Platform Open Source, demonstrating how strategic investments in deployment tooling, comprehensive documentation, and architectural flexibility enabled the transformation from a single-purpose infrastructure to a widely adoptable community resource.

How to cite: Salvi, M., Vinciarelli, V., Paciello, R., Bailo, D., Crocetta, A., Giuliacci, K., Sbarra, M., Turco, A., Malitesta, M., Roquencourt, J.-B., Carrere, M., Michalek, J., Roy, B., and Card, C.: Evolution of the EPOS Platform Open Source, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-17829, https://doi.org/10.5194/egusphere-egu26-17829, 2026.

Posters virtual: Mon, 4 May, 14:00–18:00 | vPoster spot 1b

The posters scheduled for virtual presentation are given in a hybrid format for on-site presentation, followed by virtual discussions on Zoom. Attendees are asked to meet the authors during the scheduled presentation & discussion time for live video chats; onsite attendees are invited to visit the virtual poster sessions at the vPoster spots (equal to PICO spots). If authors uploaded their presentation files, these files are also linked from the abstracts below. The button to access the Zoom meeting appears just before the time block starts.
Discussion time: Mon, 4 May, 16:15–18:00
Display time: Mon, 4 May, 14:00–18:00
Chairperson: Filippo Accomando

EGU26-11154 | ECS | Posters virtual | VPS21

Choosing an I/O approach for Earth system models: lessons learned from a modular I/O server for MESSy 

Aleksandar Mitic, Patrick Jöckel, Astrid Kerkweg, Kerstin Hartung, Bastian Kern, and Moritz Hanke
Mon, 04 May, 14:27–14:30 (CEST)   vPoster spot 1b

Modern Earth system models increasingly hit I/O limits—not only in performance, but also in reproducibility, maintainability, and developer productivity. As data volumes and workflows evolve, tightly coupled, file-centric I/O approaches can become hard to scale and hard to extend.

We present the design of, and lessons learned from, an asynchronous, modular I/O server concept introduced in the Modular Earth Submodel System (MESSy). I/O operations were decoupled from the Fortran-based scientific core and implemented as separate Python services, with communication between the two components handled by the Yet Another Coupler (YAC) library. This architecture was chosen to improve flexibility and long-term maintainability, while enabling heterogeneous workflows and evolving storage backends.
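
Conceptually, and leaving aside the details of YAC's actual interface, the decoupling resembles a producer-consumer pattern: the model core hands fields to a queue and keeps computing while a separate service writes them out. A minimal single-process Python sketch of that idea:

    import queue
    import threading

    out_queue: "queue.Queue[tuple[str, bytes]]" = queue.Queue()

    def io_server() -> None:
        """Separate I/O service: drains queued fields and writes them asynchronously."""
        while True:
            name, payload = out_queue.get()
            if name == "__stop__":
                break
            with open(f"{name}.bin", "wb") as f:  # stand-in for a real NetCDF backend
                f.write(payload)
            out_queue.task_done()

    threading.Thread(target=io_server, daemon=True).start()
    # The model time loop only enqueues and moves on:
    out_queue.put(("t2m_step42", b"\x00" * 1024))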

Using MESSy as a case study, we discuss practical decision criteria for selecting an I/O concept in large models (e.g., scaling behavior, accessibility for developers, testing and CI strategies, and reproducibility). We conclude with lessons learned from bridging the Fortran and Python communities and from lowering entry barriers for user-developers in a large modeling system.

How to cite: Mitic, A., Jöckel, P., Kerkweg, A., Hartung, K., Kern, B., and Hanke, M.: Choosing an I/O approach for Earth system models: lessons learned from a modular I/O server for MESSy, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11154, https://doi.org/10.5194/egusphere-egu26-11154, 2026.
