ESSI2.3 | HPC and Cloud workflows for Earth Observation and Modeling data and Pangeo and DGGS, Enabling FAIR, Open, and Reusable Workflows for Transdisciplinary Earth Science
Convener: Vasileios Baousis | Co-conveners: Tina Odaka, Anne Fouilloux, Marica Antonacci, Max Jones, Stathes Hadjiefthymiades, Mohanad Albughdadi
Orals | Mon, 04 May, 08:30–10:15 (CEST) | Room -2.92
Posters on site | Attendance Tue, 05 May, 08:30–10:15 (CEST) | Display Tue, 05 May, 08:30–12:30 | Hall X4
Cloud computing and high-performance computing (HPC) have become essential infrastructures for processing large-scale Earth Observation (EO) and Earth System modeling data. The convergence of these paradigms—combined with containerization, AI/ML frameworks, and cloud-native storage—is reshaping how we manage, analyze, and share geoscientific information.
Pangeo (pangeo.io) is a global open community developing scalable, interoperable workflows using tools such as Xarray, Dask, Zarr, and Jupyter. Discrete Global Grid Systems (DGGS) offer a complementary paradigm: equal-area, multi-resolution indexing that enables seamless integration across domains and scales. Together, these approaches support FAIR (Findable, Accessible, Interoperable, Reusable) data management and reproducible, transdisciplinary research.
We invite contributions that explore Cloud and HPC workflows for Earth science, including but not limited to:
• Big data platforms, cloud federations, and interoperable infrastructures (IaaS, PaaS, SaaS)
• Cloud-HPC convergence for EO and modeling workloads
• DGGS-based data organization, indexing, and multi-resolution analysis
• Cloud-native AI/ML applications for geoscientific data
• Reproducible workflows and executable notebooks using Pangeo tools
• Cloud storage solutions, data lakes, and FAIR data management
• Sustainable and green computing practices
We welcome case studies, technical developments, and community-driven initiatives that advance open, scalable, and interoperable Earth data science.

Orals: Mon, 4 May, 08:30–10:15 | Room -2.92

The oral presentations are given in a hybrid format supported by a Zoom meeting featuring on-site and virtual presentations. The button to access the Zoom meeting appears 15 minutes before the time block starts.
Chairpersons: Vasileios Baousis, Tina Odaka, Marica Antonacci
08:30–08:33
08:33–08:43 | EGU26-19899 | On-site presentation
Jacopo Nespolo, Matteo Poggi, Cecilia Zagni, Alberto Pastorutti, Stefano Querin, Giorgio Bolzon, Stefano Piani, Fabio Di Sante, Gian Franco Marras, Gabriella Scipione, Antonello Bruschi, and Francesca Catini

We present the biogeochemical forecasting system developed within the MER (Marine Ecosystem Restoration) project (Actions B32-B35). This case study leverages cloud-based workflow orchestration and traditional HPC systems to deliver daily operational marine biogeochemistry forecasts for Italian seas, as a downscaling of the Copernicus Marine Service (CMS).

The basins are divided into 7 regional high-resolution domains at ~500 m resolution and a further 10 selected very-high-resolution nested sites at ~100 m resolution. The downscaling pipelines we implemented retrieve heterogeneous input data from multiple third-party sources (CMS, EFAS, ItaliaMeteo, ECMWF), preprocess them to feed the MITgcm-BFM coupled physical-biogeochemical model, postprocess the outputs, and publish the final products. The implementation further provides observability, failsafes and fallbacks in case of missing data, and notifications on the status of operations.

Such a complex operational oceanographic system faces competing requirements. On one hand, computationally intensive numerical simulations demand HPC resources; on the other, orchestrating several interdependent extract-transform-load workflows while guaranteeing monitoring and observability requires capable management systems. The latter are often incompatible with HPC cluster policies (e.g., length of standing processes, security, …) and better suited to a cloud environment. On top of this, care must be taken when moving large volumes of data between the orchestrator and the HPC cluster.

We address these competing requirements through a hybrid architecture that combines cloud computing with HPC infrastructures for workflow orchestration and compute-intensive simulations, respectively. Our system, rewritten following software engineering best practices (modular architecture, separation of concerns, CI testing, …), employs Apache Airflow as the workflow manager, deployed in a fully containerised fashion on CINECA's OpenStack-based cloud infrastructure. A custom integration layer allows interfacing with the Slurm workload manager, offloading computationally intensive tasks onto CINECA's Leonardo HPC cluster. Parallel computing and distributed filesystems are efficiently exploited through modern technologies, particularly the cloud-native Zarr data format in conjunction with xarray and dask as Python-based numerical computing libraries.
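
As a rough illustration of this pattern (not the project's actual code; the SSH host, script names, and domains below are invented), a cloud-hosted Airflow task can offload a Slurm submission to the HPC cluster while the DAG itself runs in the cloud:

```python
# Hypothetical sketch of cloud-side orchestration offloading to Slurm.
# The SSH host "hpc-login" and the run scripts are invented placeholders.
import subprocess
from datetime import datetime

from airflow.decorators import dag, task


@task
def submit_mitgcm_run(domain: str) -> str:
    """Submit a Slurm job for one regional domain and return its job id."""
    result = subprocess.run(
        ["ssh", "hpc-login", "sbatch", "--parsable", f"run_mitgcm_{domain}.sh"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()  # Slurm job id, usable by downstream tasks


@dag(schedule="@daily", start_date=datetime(2026, 1, 1), catchup=False)
def daily_forecast():
    # one mapped task instance per regional domain
    submit_mitgcm_run.expand(domain=["adriatic", "tyrrhenian", "ionian"])


daily_forecast()
```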

Our setup demonstrates the viability of hybrid cloud-HPC architectures for operational Earth system modelling. It meets efficiency and scalability goals that would be challenging with either infrastructure alone. The software is planned to be open-sourced in the second half of 2026.

This work is developed by eXact lab Srl in partnership with OGS and CINECA within the MER project, led by ISPRA, funded by the NextGenerationEU program (Italian National Recovery and Resilience Plan, investment M2C4 ‐ I3.5).

How to cite: Nespolo, J., Poggi, M., Zagni, C., Pastorutti, A., Querin, S., Bolzon, G., Piani, S., Di Sante, F., Marras, G. F., Scipione, G., Bruschi, A., and Catini, F.: Cloud-based orchestration of the biogeochemical forecasting system for Italian seas within the MER project, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19899, https://doi.org/10.5194/egusphere-egu26-19899, 2026.

08:43–08:53 | EGU26-9485 | On-site presentation
Fabian Wachsmann and Chaturika Wickramage

We present the Kilometer-Scale Cloud data server (https://km-scale-cloud.dkrz.de) and its underlying software stack Cloudify (https://gitlab.dkrz.de/data-infrastructure-services/cloudify) developed and deployed at the German Climate Computing Center (DKRZ). The km-scale cloud provides open and analysis-ready access to prominent climate datasets from projects such as the European Eddy-Rich Earth System Models (EERIE) stored across heterogeneous storage tiers through standardized cloud-native interfaces, without requiring physical data reformatting or migration.

Within the EERIE project, kilometer-scale Earth System Models (~10 km atmosphere and ~5 km ocean) generate petabyte-scale output that exceeds the capabilities of traditional file-based access patterns. Cloudify addresses this challenge by emulating Zarr stores for data residing on file systems or tape, enabling efficient HTTP access to large datasets. This approach allows users to interact with HPC-resident data using established cloud-native tools and workflows.

The km-scale cloud offers several key advantages: (i) seamless bridging of HPC and cloud ecosystems, enabling interactive and scalable analysis without data duplication; (ii) analysis-ready data access, supporting chunk-based and parallel I/O patterns suited for modern data analytics and machine-learning workflows; (iii) improved data discoverability and reuse, facilitated by standardized interfaces and metadata services such as STAC catalogs; and (iv) lower entry barriers for external users, who can access large climate datasets without requiring direct HPC accounts or specialized system knowledge.
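
From the user side, access then looks like any other cloud-native Zarr read; a minimal sketch (the dataset path and variable name below are illustrative, not a real endpoint):

```python
# Illustrative client-side access to a Cloudify-served dataset; the
# dataset path and variable name are placeholders.
import xarray as xr

ds = xr.open_dataset(
    "https://km-scale-cloud.dkrz.de/datasets/eerie/example.zarr",
    engine="zarr",
    chunks={},  # lazy, chunk-wise reads over HTTP
)
# Only the chunks needed for this computation are transferred:
monthly_mean = ds["sst"].sel(time="2025-01").mean("time").compute()
```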

By deploying Cloudify as a data service, the km-scale cloud demonstrates a scalable pathway towards interoperable, cloud-enabled access to next-generation climate model output.

How to cite: Wachsmann, F. and Wickramage, C.: The Kilometer-Scale Cloud by DKRZ, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9485, https://doi.org/10.5194/egusphere-egu26-9485, 2026.

08:53–09:03 | EGU26-8000 | ECS | On-site presentation
Oscar J. Pellicer-Valero, Cesar Aybar, Mikolaj Czerkawski, Carmen Oliver, Kevin Monsálvez, Julio Contreras, and Gustau Camps-Valls

The field of Artificial Intelligence for Earth Observation (AI4EO) currently suffers from significant data friction, especially when moving petabyte-scale archives from cloud object storage to High Performance Computing (HPC) nodes. We present TACO (Transparent Access to Cloud-Optimized datasets), a production-grade standard designed to replace file-centric legacy workflows with a high-throughput streaming paradigm.

We showcase the practical implications of this architecture for the deployment of geospatial Foundation Models (FMs) by running pretrained FMs on downstream inference tasks (such as semantic segmentation or land-cover classification) directly on arbitrary samples of arbitrary cloud-hosted datasets, quickly and without local staging or any dataset-specific preprocessing. TACO bridges the gap between static cloud archives and dynamic HPC processing, enabling seamless, scalable AI4EO workflows and fulfilling the as-yet-unfulfilled promise of FMs to "train once, apply everywhere".
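
The underlying access pattern can be sketched generically (this is not the TACO API; the URL and sample file are invented): individual samples are fetched from object storage via byte-range requests and fed straight into inference:

```python
# Generic streaming sketch (not the TACO API): fetch a single sample from
# cloud storage on demand instead of staging whole archives locally.
import fsspec
import numpy as np

fs = fsspec.filesystem("https")  # HTTP(S) byte-range access
url = "https://example.org/eo-dataset/sample_0001.npy"  # hypothetical

with fs.open(url, "rb") as f:  # only the requested bytes travel
    patch = np.load(f)

print(patch.shape)  # ready for model inference, no local copy kept
```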

References: 

  • Cesar Aybar, et al. (2025). The Missing Piece: Standardising for AI-ready Earth Observation Datasets. Poster at TerraBytes-ICML 2025 Workshop. Vancouver, Canada
  • TACO Foundation. (2025, November 21). The TACO specification (Version 2.0.0). https://tacofoundation.github.io/specification 

How to cite: Pellicer-Valero, O. J., Aybar, C., Czerkawski, M., Oliver, C., Monsálvez, K., Contreras, J., and Camps-Valls, G.: TACO: Operationalizing AI-Ready EO datasets, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8000, https://doi.org/10.5194/egusphere-egu26-8000, 2026.

09:03–09:13 | EGU26-14414 | On-site presentation
Antonis Troumpoukis, Mohanad Albughdadi, Ioannis Kakogeorgiou, Giorgos Petsangourakis, Theodoros Aivalis, Vasileios Vatellis, and Vasileios Baousis

Earth Observation (EO) and environmental research increasingly rely on AI methods that require access to large datasets, scalable cloud infrastructures, and high-performance computing (HPC) resources. At the same time, the transition of research outcomes into operational, industry-ready services remains challenging, often demanding substantial re-engineering of data pipelines, execution environments, and deployment models. This separation between research-oriented and industry-oriented infrastructures continues to limit the reuse, scalability, and real-world impact of EO innovations.

Addressing this gap, the European AI-on-Demand Platform (AIoD) [1] was recently expanded to support both research and industry within a unified digital infrastructure. The platform brings together research-driven AI assets (such as models, workflows, and datasets) with industry-grade tools and services for the development, training, and operationalisation of AI applications across cloud and HPC infrastructures, in an efficient and responsible manner. As a unified gateway, the AIoD connects previously fragmented resources across the European AI ecosystem, making them accessible, reusable, and adaptable to diverse user needs. In parallel, efforts are underway to explore interoperability with emerging European AI Factory initiatives, including PHAROS (the Greek AI Factory) [2], aiming to support future federated access to specialised AI computing resources.

We illustrate this approach through Earth Observation and environmental services and use cases that are jointly accessible to researchers and practitioners, including the mapping of sea surface features and marine pollutants, satellite image enhancement through super-resolution, and AI-based prediction and analysis of extreme weather events, enabling a seamless transition from experimentation and validation to scalable, operational deployment. These developments extend earlier work on European AI and Earth Observation convergence [3].

[1] http://aiodp.eu
[2] https://www.pharos-aifactory.eu
[3] A. Troumpoukis et al., European AI and EO convergence via a novel community-driven framework for data-intensive innovation. Future Gener. Comput. Syst. 160: 505-521 (2024) https://doi.org/10.1016/j.future.2024.06.013

This work has received funding from the European Union’s Digital Europe Programme (DIGITAL) under grant agreement No 101146490.

How to cite: Troumpoukis, A., Albughdadi, M., Kakogeorgiou, I., Petsangourakis, G., Aivalis, T., Vatellis, V., and Baousis, V.: Advancing the Research-to-Industry Continuum for Earth Observation AI in Europe via the AI-on-Demand Platform, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14414, https://doi.org/10.5194/egusphere-egu26-14414, 2026.

09:13–09:23 | EGU26-7413 | On-site presentation
Federico Fornari, Claudio Pisa, Marica Antonacci, Vasileios Baousis, Tolga Kaprol, and Mohanad Albughdadi

MediTwin is a European research initiative aimed at developing digital twin technologies for the Mediterranean region, integrating Earth observation data, numerical modelling and artificial intelligence to support environmental monitoring and decision-making. In this context, the MediTwin Summer School 2025 was organised to provide hands-on training on data-driven workflows, cloud-native tools and AI/ML techniques for Earth system applications. The school targeted early-career researchers, PhD students and technical staff from research institutions, with a total of 20 participants. 

The school required a scalable, secure and reproducible cloud infrastructure capable of supporting hands-on training activities in Earth system digital twins, data analysis and AI/ML workflows. This contribution presents the design and provisioning of the cloud-native infrastructure deployed to support the school, with a focus on Infrastructure as Code (IaC), Kubernetes-based orchestration and hybrid GPU-enabled workloads.

The infrastructure was deployed on the ECMWF on-premises cloud, based on OpenStack and backed by Ceph software-defined storage, providing elastic compute, networking and persistent storage services. The Kubernetes cluster was provisioned in a high-availability configuration using Terraform and Rancher Cluster Manager, following established GitOps best practices. The cluster architecture comprised dedicated control-plane, worker, ingress and GPU nodes, enabling both standard cloud-native services and accelerated AI/ML workloads. Cluster lifecycle management, configuration drift prevention and application delivery were handled through a GitOps approach using Rancher Fleet. 

GitLab acted as the central orchestration platform for source control, CI/CD pipelines and IaC automation, hosting Terraform modules, Helm charts, Rancher cluster definitions and configuration templates. This ensured full traceability, auditability and reproducibility of both infrastructure and application deployments. Sensitive credentials and API keys were securely managed using HashiCorp Vault and dynamically injected into workloads. 

To support interactive training activities, a JupyterHub service was deployed on Kubernetes using the official Helm chart, customised for resource management, authentication and storage integration. GPU acceleration was enabled via the NVIDIA GPU Operator, which automated driver installation, device discovery and scheduler integration. In addition, outside the Kubernetes environment, 20 GPU-enabled virtual machines were provisioned directly on OpenStack using an Ansible role executed through AWX, itself deployed on the Kubernetes cluster, to accommodate specific student exercises requiring isolated VM-based access. 

This experience demonstrates how modern cloud-native and DevSecOps practices can be effectively applied to provision short-lived yet production-grade scientific training infrastructures, ensuring scalability, security and reproducibility for future Earth observation and digital twin education initiatives. 

How to cite: Fornari, F., Pisa, C., Antonacci, M., Baousis, V., Kaprol, T., and Albughdadi, M.: Provisioning a Cloud-Native Training Infrastructure for the MediTwin Summer School 2025, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7413, https://doi.org/10.5194/egusphere-egu26-7413, 2026.

09:23–09:33 | EGU26-10862 | On-site presentation
Layla Loffredo, Tim Kok, George Bampilis, and Marco Kerstens

Numerical Weather Prediction (NWP), Earth Observation (EO), and Earth System Modeling workflows commonly span high-throughput computing (HTC) and high-performance computing (HPC): EO ingestion and pre-processing on HTC, data assimilation and model execution on HPC, and verification back on HTC. Most infrastructures lack integrated mechanisms to coordinate these heterogeneous environments, leading to manual workflow orchestration and ad-hoc data transfers.

We present KuHOP (Kubernetes-orchestrated Hybrid Operations Platform), an architecture that applies cloud-native orchestration concepts to hybrid HTC/HPC workflows. KuHOP uses Kubernetes as a unified control plane to describe, schedule, and monitor workflows across heterogeneous environments while preserving existing SLURM schedulers through native job submission. Containerized services and Kubernetes operators translate declarative workflow specifications into scheduler-specific jobs and manage data movement between clusters, enabling consistent observability without replacing established resource managers.
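
A minimal sketch of the translation step such an operator might perform (the field names and script contents are invented, not KuHOP's actual schema):

```python
# Hypothetical reconciliation step: render a declarative workflow step
# into a Slurm batch script. All field names here are invented.
spec = {
    "name": "data-assimilation",
    "partition": "hpc",
    "nodes": 4,
    "command": "srun ./run_da.sh",
}


def render_sbatch(step: dict) -> str:
    """Translate a declarative step spec into an sbatch script."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={step['name']}",
        f"#SBATCH --partition={step['partition']}",
        f"#SBATCH --nodes={step['nodes']}",
        step["command"],
    ])


print(render_sbatch(spec))  # submitted via sbatch on the target cluster
```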

By treating HTC and HPC systems as backends within a single orchestration framework, KuHOP aims to improve portability and reproducibility of hybrid workflows through version-controlled, declarative definitions. The modular design supports GPU-accelerated AI/ML components and envisions multi-tiered resource federation. The Kubernetes control plane also allows institutions to deploy complementary services such as data streaming pipelines, AI inference endpoints, and custom dashboards as their needs evolve.

Developed at SURF, the Dutch national digital infrastructure provider, KuHOP targets both operational and non-operational Earth science workflows where hybrid computing is essential. It offers institutions a practical path to automate HTC/HPC coordination without abandoning existing infrastructure or losing the specialized capabilities of each environment.

How to cite: Loffredo, L., Kok, T., Bampilis, G., and Kerstens, M.: KuHOP: Kubernetes-Based Orchestration for Hybrid Earth Science Computing, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-10862, https://doi.org/10.5194/egusphere-egu26-10862, 2026.

09:33–09:35
09:35–09:45 | EGU26-21395 | On-site presentation
Krasen Samardzhiev, Deyan Samardzhiev, Anca Anghelea, and Ewelina Dobrowolska

The EarthCODE Open Science Catalog (https://opensciencedata.esa.int/catalog) currently contains over 300 data products, most of them the result of peer-reviewed scientific research. These exist as disparate individual datasets, mostly grouped under themes or variables. This fragmentation creates a barrier to interoperability: a scientist has to manually combine the datasets, for example by reprojecting, regridding, or temporally resampling heterogeneous data.

EarthCODE is creating a new category of products, combined data cubes for each of the Open Science Catalog's themes, to streamline access for researchers and ensure the data is truly analysis-ready (ARD). Combining the data products onto a single grid and projection drastically reduces the overhead researchers face when harmonizing the appropriate datasets. The workflow focuses on combining different datasets and on collaborating with scientists to curate the appropriate data and minimise disruption during the transformation process, since any reprojection or regridding introduces uncertainties.

We demonstrate the efficacy of this Pangeo-aligned workflow through the Antarctica InSync project (https://discourse-earthcode.eox.at/t/antartica-insync-data-cubes/107). This was a multi-stage pipeline that included close collaboration with the scientific community. The first step was aggregating the relevant Antarctic datasets. This step is important in itself, since it centralizes domain knowledge and ensures the Open Science Catalog contains the latest datasets relevant to the research community.

The second step involved processing the data using cloud-native tools to convert it to the same projection, a common grid, and in some cases the same resolution (creating coherent STAC Collections). The third step involved generating detailed metadata at the variable level for all datasets to ensure high findability and reusability. Furthermore, we provide visualisation tools to explore the data cube via cloud-optimized formats, without downloading it, in addition to a discussion forum. To foster open science and reproducibility, our accompanying library will contain all generalizable functions used to generate these data, allowing the community to reuse the workflows for other domains.
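
The harmonization step can be sketched as follows (dataset names and dimension labels are hypothetical; the real pipeline involves curated, per-dataset choices):

```python
# Minimal harmonization sketch with invented dataset names: interpolate
# one product onto another's grid, then merge them into a single cube.
import xarray as xr

ice = xr.open_dataset("sea_ice_thickness.zarr", engine="zarr")
sst = xr.open_dataset("sea_surface_temperature.zarr", engine="zarr")

# Regrid SST onto the ice product's grid; as noted above, any regridding
# introduces uncertainty and is done in consultation with data producers.
sst_on_ice_grid = sst.interp(lat=ice.lat, lon=ice.lon)

cube = xr.merge([ice, sst_on_ice_grid])
cube.to_zarr("antarctica_theme_cube.zarr", mode="w")
```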

How to cite: Samardzhiev, K., Samardzhiev, D., Anghelea, A., and Dobrowolska, E.: From Disparate Datasets to Analysis-Ready Data Cubes with Pangeo on EarthCODE, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21395, https://doi.org/10.5194/egusphere-egu26-21395, 2026.

09:45–09:55 | EGU26-18064 | ECS | Virtual presentation
Hillary Koros, Nishadh Kalladath, Max Jones, Sean Harkins, Jason Kinyua, Mark Lelaono, Ezra Limo, Masilin Gudoshava, and Ahmed Amdihun

Global Ensemble Prediction Systems (EPS) from ECMWF and NOAA, such as IFS and GEFS, generate petabyte-scale datasets essential for early warning systems, probabilistic forecasting, and AI/ML weather applications. However, the GRIB format, designed for efficient archival storage, resists cloud-native random-access patterns. Converting archives to Analysis-Ready Cloud-Optimized (ARCO) formats would require prohibitive storage duplication. Virtual Zarr datasets, enabled by the VirtualiZarr library, offer a transformative alternative: lightweight reference layers exposing the original GRIB files through cloud-native interfaces without data conversion.

This approach creates a win-win-win solution. Data producers maintain GRIB files without additional processing. Cloud providers serve data efficiently through byte-range requests. End users access ensemble forecasts via familiar tools (xarray, Dask) as if the data were in Zarr format. Previous work on the Grib-Index-Kerchunk (GIK) method (https://github.com/icpac-igad/grib-index-kerchunk) demonstrated this paradigm by exploiting a critical insight: GRIB index files (.idx text for GEFS, .index JSON for ECMWF) contain all the byte-offset information needed for virtual reference creation. Rather than scanning the entire corpus of GRIB files (computationally expensive at ~2,400 files per GEFS run, or ~85 files of ~5 GB each for ECMWF), the GIK method reads only lightweight index files (a few KB to MB each) plus 1-2 sample GRIB files to extract the metadata structure. This achieves regional data access while reading less than 5% of the original GRIB data.
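
The core of the index trick can be sketched in a few lines (the URL is illustrative; GEFS-style .idx lines have the colon-separated form `msg:byte_offset:date:variable:level:forecast:member`):

```python
# Sketch: derive HTTP byte ranges for one variable from a GEFS-style
# .idx file, avoiding any scan of the GRIB file itself. URL is invented.
import fsspec

url = "https://example-bucket/gefs/gep01.t00z.pgrb2a.0p50.f012"

with fsspec.open(url + ".idx", "r") as f:
    entries = [line.strip().split(":") for line in f if line.strip()]

ranges = []
for i, parts in enumerate(entries):
    if parts[3] == "TMP" and parts[4] == "2 m above ground":
        start = int(parts[1])
        # a message ends where the next one starts (None = end of file)
        stop = int(entries[i + 1][1]) if i + 1 < len(entries) else None
        ranges.append((start, stop))

print(ranges)  # byte ranges to fetch via HTTP range requests
```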

Building on this foundation, we develop GEFS and ECMWF custom parsers following the VirtualiZarr Parser protocol, with a native Zarr v3 ArrayBytesCodec using gribberish, a Rust-based decoder delivering order-of-magnitude performance improvements. Following the patterns of the HRRR parser (https://github.com/virtual-zarr/hrrr-parser), our parsers construct a chunk manifest store. Virtual references persist to Icechunk transactional storage following the Zarr specification, enabling version-controlled datasets whose chunks reference the original GRIB bytes. The resulting stores integrate with xarray and Dask for parallel ensemble processing across 30-51 members and 85+ forecast timesteps.

For regional climate centers, this replaces custom pipelines with community-extensible parsers. By contributing GEFS or IFS product-specific custom parsers to VirtualiZarr, we transform operational necessity into reusable infrastructure—enabling cloud-native ensemble access: `xr.open_zarr("icechunk://gefs")`. 

How to cite: Koros, H., Kalladath, N., Jones, M., Harkins, S., Kinyua, J., Lelaono, M., Limo, E., Gudoshava, M., and Amdihun, A.: Virtual Zarr for Ensemble Prediction Systems: VirtualiZarr Custom Parsers for Cloud-Native GRIB Access, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18064, https://doi.org/10.5194/egusphere-egu26-18064, 2026.

09:55–10:05 | EGU26-10033 | ECS | On-site presentation
Daniel Loos, Gregory Duveiller, and Fabian Gans

Discrete Global Grid Systems (DGGS) have emerged as a transformative approach to minimizing spatial distortions in geospatial data processing. They are not only used for geocoding but also offer a highly efficient data structure, thanks to the absence of the tile overlap found in Sentinel-2 imagery and elsewhere. The performance of lookup operations on DGGS-native data cubes is intrinsically linked to the cell index, which plays a crucial role in data management and retrieval. Most DGGS implementations utilize a hierarchical one-dimensional index to name and sort cells, optimizing them for parent-child queries such as up- and downsampling. However, many real-world applications, such as visualization or convolutions, require efficient handling of distant-neighbour queries based on spatial distances.

Here, we present the tools DGGS.jl and DGGSexplorer for creating and visualising DGGS-native data cubes. A three-dimensional index based on the Icosahedral Snyder Equal Area projection is utilized, enabling compact and efficient data cube arrays stored in the cloud-optimized Zarr format. Furthermore, we developed an XYZ tile map server that generates maps on the fly, allowing DGGS data to be viewed in QGIS, in the web browser, and elsewhere. This is especially helpful for integrating multi-sensor data at different spatial resolutions while minimising spatial distortions and computational resources in all subsequent processing steps.

How to cite: Loos, D., Duveiller, G., and Gans, F.: Creating and visualising DGGS native data cubes with DGGS.jl, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-10033, https://doi.org/10.5194/egusphere-egu26-10033, 2026.

10:05–10:15 | EGU26-14554 | Highlight | On-site presentation
Jean-Marc Delouis, Benoît Bovy, Anne Fouilloux, Alexander Kmoch, Justus Magin, Pablo Richard, Vincent Dumoulin, and Tina Odaka

The increasing volume and diversity of Earth Observation (EO) and climate data produced by Copernicus missions and the Destination Earth (DestinE) initiative pose a major challenge for interoperability and large-scale analysis. Today, global datasets are distributed on heterogeneous spatial grids, forcing users to repeatedly perform ad-hoc regridding steps, which are costly, error-prone, and difficult to reproduce.

The GRID4EARTH project addresses this issue by promoting a common Discrete Global Grid System (DGGS) as a foundation for analysis-ready Earth data. In this context, HEALPix emerges as a strong candidate due to its equal-area property, hierarchical structure, and long-standing adoption in global modelling and large-scale data analysis. These properties enable efficient multi-resolution workflows, scalable data access, and natural integration with modern cloud-native formats such as Zarr.

However, a key limitation remains: HEALPix is formally defined on the sphere, whereas EO data are naturally referenced to the WGS84 ellipsoid. While often ignored at coarse resolutions, this mismatch introduces non-negligible area distortions at Copernicus resolutions and may bias zonal or regional analyses.

To overcome this limitation, GRID4EARTH explores an extension of HEALPix to the WGS84 ellipsoid using the authalic sphere associated with the ellipsoid. By preserving equal-area properties on the ellipsoid, this approach provides a consistent spatial framework bridging spherical climate models and ellipsoidal EO data. It enables a unified representation for DestinE model outputs and Copernicus satellite data, while remaining compatible with existing HEALPix-based tools and workflows.
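
The sphere-to-ellipsoid bridge rests on the authalic latitude; a small sketch using the standard series expansion (Snyder 1987, eq. 3-18) for WGS84:

```python
# Geodetic -> authalic latitude on WGS84: the mapping that lets a
# spherical HEALPix index preserve equal areas on the ellipsoid.
# Series coefficients from Snyder (1987), eq. 3-18.
import numpy as np

E2 = 6.69437999014e-3  # WGS84 first eccentricity squared


def authalic_latitude(phi_deg):
    phi = np.radians(phi_deg)
    e2, e4, e6 = E2, E2**2, E2**3
    beta = (
        phi
        - (e2 / 3 + 31 * e4 / 180 + 59 * e6 / 560) * np.sin(2 * phi)
        + (17 * e4 / 360 + 61 * e6 / 1260) * np.sin(4 * phi)
        - (383 * e6 / 45360) * np.sin(6 * phi)
    )
    return np.degrees(beta)


print(authalic_latitude(45.0))  # ~44.87 deg: cells indexed on this sphere
```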

This contribution presents the motivation, principles, and expected benefits of ellipsoidal HEALPix within GRID4EARTH, and discusses its role as a practical and scalable DGGS for next-generation Earth system data infrastructures.

How to cite: Delouis, J.-M., Bovy, B., Fouilloux, A., Kmoch, A., Magin, J., Richard, P., Dumoulin, V., and Odaka, T.: GRID4EARTH: Toward an Ellipsoidal HEALPix Grid for Analysis-Ready Earth Observation and Climate Data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14554, https://doi.org/10.5194/egusphere-egu26-14554, 2026.

Posters on site: Tue, 5 May, 08:30–10:15 | Hall X4

The posters scheduled for on-site presentation are only visible in the poster hall in Vienna. If authors uploaded their presentation files, these files are linked from the abstracts below.
Display time: Tue, 5 May, 08:30–12:30
Chairpersons: Max Jones, Mohanad Albughdadi, Vasileios Baousis
X4.82 | EGU26-3863 | ECS
Mark Melotto, Rolf Hut, and Claudia Vitolo

The goal of the eWaterCycle project is to facilitate hydrological modelling that is Findable, Accessible, Interoperable and Reusable (FAIR).

High- (hyper-) resolution and/or large-sample hydrological modelling, including runs driven by Destination Earth (DestinE) Digital Twin (DT) inputs, often requires HPC infrastructure for model runs. Designing such studies, however, benefits from users working on interactive cloud infrastructures. Migrating workflows from cloud infrastructure to HPC infrastructure requires deep knowledge of the systems in place, which typical (hydrological-expert) users don't have. A core design philosophy of the eWaterCycle platform is that domain (hydrology) users should not need to become computer science experts to carry out their hydrological research.

To address this, we have developed a workflow that seamlessly upscales any hydrological workflow designed on cloud infrastructure to a SLURM high-performance compute cluster, with only small changes compared to working in the cloud environment: setting up paths (e.g. scratch folders) and supplying key argument parameters (e.g. the specified region). Users are not required to have any prior knowledge of HPC systems.
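
Schematically, the user-visible difference between the two environments reduces to a handful of settings (the names below are invented for illustration, not the platform's actual configuration):

```python
# Illustrative only: the few settings a user adjusts when moving a
# notebook from the cloud environment to the Slurm cluster.
import os

ON_HPC = "SLURM_JOB_ID" in os.environ  # set automatically inside Slurm jobs

config = {
    # paths: scratch and forcing locations differ between environments
    "scratch_dir": "/scratch/ewatercycle" if ON_HPC else "/tmp/ewatercycle",
    "forcing_dir": "/projects/destine/forcing" if ON_HPC else "data/forcing",
    # key arguments: the study region selected by the user
    "region": "example-catchment",
}
```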

This 'seamless' workflow can be run from any JupyterHub environment. We use, and are working to integrate with, the services that are part of DestinE.

As a 'large-sample hydrology' example, we run a climate change impact on flood frequency analysis on each of the 6830 catchments in the entire . We facilitate using ERA5, ERA-Interim and CMIP6 data, as well as the data provided by the Digital Twin (DT), as input to these model runs. In this presentation we will share our results obtained using eWaterCycle with the DT data. This workflow serves as an example of our seamless scaling between cloud infrastructure and HPC systems and provides lessons learned for others setting up similar services.

How to cite: Melotto, M., Hut, R., and Vitolo, C.: Hydrological simulations with seamless scaling between Cloud and High Performance Computing environments on DestinE from the comfort of your own browser using eWaterCycle, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-3863, https://doi.org/10.5194/egusphere-egu26-3863, 2026.

X4.83 | EGU26-21049
Serkan Girgin, Francesco Nattino, Martin Brandt, and Maarten Plieger

Data accessibility is crucial in modern research across the Natural and Engineering Sciences (NES), including the geosciences, and is central to the push toward Open Science. Yet accessing and efficiently processing rapidly growing datasets, such as Earth-related spatiotemporal data, remains challenging as sources diversify and collection frequency increases. Most of these datasets are hosted in the cloud, and cloud-native data access and processing are becoming core digital competences. Cloud-based processing is especially beneficial because bringing computation close to the data boosts efficiency and reduces analysis time. Despite this, many researchers still rely on the inefficient approach of downloading data for local analysis. Sometimes this is unavoidable because the data are not provided in cloud-friendly formats, but often it also reflects a lack of skills for cloud-based access and processing. A similar problem occurs in data publishing, where research datasets are frequently shared in formats that impede efficient cloud access and interoperability, even though cloud-optimized formats could be used at no additional cost.

The CLOUD-NES project, funded by the Dutch Research Council (NWO) via the Thematic Digital Competence Centre NES (TDCC-NES), aims to advance cloud-native tools and workflows for publishing, accessing, and processing research data in the Netherlands. The project demonstrates the benefits of cloud-native approaches through reproducible performance benchmarks and equips researchers with practical training to strengthen digital competencies. A public cloud-native data repository with co-located analysis capabilities is being developed, featuring object-based scalable storage and a STAC-compliant data catalog, and hosting selected datasets from large-scale geospatial data providers in the Netherlands such as PDOK and KNMI, transformed into cloud-optimized formats. Through iterative benchmarking, we are assessing the performance of cloud-native storage formats, access patterns, and analysis workflows, generating reproducible evidence of efficiency gains to support community adoption. All infrastructure, ingestion pipelines, and benchmarking code will be open source, accompanied by detailed guidelines and documentation. To further accelerate adoption, domain-specific open training materials will be developed and hands-on workshops for researchers and data providers will be organized. Training covers cloud-native data access, workflow design, dataset publishing, and infrastructure deployment, using common domain-specific workflows as case studies. Community events and mini-symposia will foster community building and knowledge exchange, while lessons learned and best practices will be disseminated nationally and internationally.
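
A benchmark in this framework boils down to timing cloud-native reads under controlled conditions; a minimal sketch (the store URL, variable, and dimensions are hypothetical):

```python
# Minimal read-benchmark sketch against a hypothetical cloud-optimized
# Zarr store: time a subset read, the core access pattern being tested.
import time

import xarray as xr

ds = xr.open_dataset(
    "https://data.example.nl/knmi/precipitation.zarr",  # invented URL
    engine="zarr",
    chunks={},
)

t0 = time.perf_counter()
subset = ds["precip"].isel(time=slice(0, 24)).load()  # pull only needed chunks
elapsed = time.perf_counter() - t0
print(f"read {subset.nbytes / 1e6:.1f} MB in {elapsed:.2f} s")
```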

By combining demonstrable benchmarks, practical training, and clear guidance for data providers, CLOUD-NES aims to accelerate the adoption of cloud-native research practices across the Dutch research community and beyond, improving efficiency, reproducibility, and accessibility of large, complex datasets. This presentation provides an overview of the CLOUD-NES project, covering the design and operation of its reproducible cloud-native benchmarking framework and the structure of its open training materials. Planned project activities, including community-building events and mini-symposia on effective cloud-native practices, will also be highlighted.

How to cite: Girgin, S., Nattino, F., Brandt, M., and Plieger, M.: Advancing cloud-native data access and processing for Natural and Engineering Sciences: CLOUD-NES, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21049, https://doi.org/10.5194/egusphere-egu26-21049, 2026.

X4.84 | EGU26-7440
Federico Fornari, Marica Antonacci, Claudio Pisa, Vasileios Baousis, Tolga Kaprol, and Mohanad Albughdadi

CLIMRES (Leadership for Climate Resilient Buildings) is a European project addressing the growing vulnerability of buildings and urban environments to climate change impacts. The project combines climate data, Earth Observation (EO) products, and impact assessment methodologies to identify climate-driven hazards, assess building- and urban-scale vulnerabilities, and support decision-making through dedicated tools and measures. These solutions are validated through large-scale pilot demonstrations across several European countries.

A key enabling outcome of CLIMRES is the design and deployment of a scalable, cloud-native infrastructure hosting the project’s Federated Data Exchange Platform (FDXP) and digital services. The infrastructure is hosted by ECMWF on a dedicated Kubernetes cluster within the Common Cloud Infrastructure (CCI), part of the European Weather Cloud co-managed by ECMWF and EUMETSAT. The underlying cloud environment is based on OpenStack with Ceph storage, providing elastic compute and scalable object storage capabilities for data-intensive workloads. This infrastructure provides the technical backbone for integrating heterogeneous datasets, executing data-processing workflows, and delivering operational services that underpin climate resilience assessments and decision-support applications. 

The CLIMRES platform follows cloud-native design principles and adopts containerization and microservice-based architectures to ensure modularity, scalability, and operational robustness. Kubernetes is used as the core orchestration layer, while Rancher provides centralized cluster management, monitoring, and operational visibility. All services, including the FDXP and supporting applications, are deployed consistently across environments using GitOps principles, ensuring reproducibility, traceability, and elimination of configuration drift. 

Continuous Integration and Continuous Delivery (CI/CD) pipelines automate the full software lifecycle, from source code changes to container image building and deployment. Docker images are built through automated pipelines and deployed via Git-driven workflows, enabling transparent, auditable, and predictable releases. Semantic versioning and changelog generation are fully automated, ensuring consistent release management across services. 

This contribution presents the CLIMRES cloud infrastructure as a production-ready case study for EO- and climate-driven applications. It demonstrates how cloud-native technologies can effectively support scalable data management platforms and operational services for climate resilience. 

How to cite: Fornari, F., Antonacci, M., Pisa, C., Baousis, V., Kaprol, T., and Albughdadi, M.: A Cloud-Based Infrastructure for EO-Driven Climate Resilience Services, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7440, https://doi.org/10.5194/egusphere-egu26-7440, 2026.

X4.85 | EGU26-9640
Mostafa Hadizadeh, Martin Bergemann, Etor Lucio Eceiza, Andrej Fast, and Christopher Kadow

Modern climate archives are increasingly distributed across heterogeneous storage systems, while analysis workflows are becoming more interactive, distributed, and cloud-native. Moreover, many high-performance computing centres (HPC) host large climate datasets on traditional file-based storage infrastructures, whereas computational resources are often located at different sites. This separation between data location and compute resources creates significant barriers to efficient, interactive, and scalable data access.

This situation calls for climate data access services that are scalable, flexible, and independent of specific client-side environments, while supporting common climate data formats such as NetCDF, GeoTIFF, Zarr, HDF5, and GRIB. Nevertheless, efficient remote access to large and heterogeneous climate archives remains a major bottleneck for modern scientific workflows.

We present a service, the Freva Data Loader, which implements the logic required to open datasets from diverse storage backends and expose them as Zarr chunks through a lightweight, web-friendly REST interface with modern authentication mechanisms.

The Freva Data Loader is implemented as a stateless worker service exposing a REST interface for dataset access and Zarr endpoint generation. Upon receiving an authenticated request, the service resolves dataset metadata, opens the underlying data from the appropriate storage backend (e.g. POSIX file systems or object storage), and exposes the data as a Zarr-compatible, chunked stream. Authentication and authorisation are handled centrally using OAuth2, ensuring secure and controlled access across institutional boundaries. Requests are coordinated by a Loader component and distributed to worker instances via a message broker (Redis), enabling asynchronous execution and horizontal scalability.

The service decouples data access from client-side tooling and enables users and applications to access data stored on traditional POSIX HPC file systems, tape archives, and cloud-based object storage through a unified Zarr interface. Instead of transferring complete files between data centres or downloading them in full, clients retrieve only the required data chunks on demand. Users and client applications can request chunked array access over the network and process data incrementally, supporting interactive exploration and scalable downstream computation using cloud-native, chunked storage semantics, while remaining compatible with existing analysis stacks based on Zarr, xarray, and Dask.
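
From a client's perspective the flow might look as follows (the endpoint path, parameter names, and dataset identifier are invented for illustration):

```python
# Hypothetical client flow: request a Zarr view of a dataset from the
# service, then stream chunks lazily. Paths and parameters are invented.
import requests
import xarray as xr

api = "https://freva.example.org/api/data-loader"
headers = {"Authorization": "Bearer <oauth2-token>"}

resp = requests.get(api, params={"dataset": "cmip6.tas.historical"},
                    headers=headers, timeout=30)
zarr_url = resp.json()["zarr_url"]  # worker-backed, chunked endpoint

ds = xr.open_dataset(zarr_url, engine="zarr", chunks={})
# only the chunks backing this reduction cross the network:
print(ds["tas"].isel(time=0).mean().compute())
```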

How to cite: Hadizadeh, M., Bergemann, M., Lucio Eceiza, E., Fast, A., and Kadow, C.: A Service-Oriented Distributed Zarr Solution for Climate Data Access Across Heterogeneous HPC and Storage Infrastructures, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9640, https://doi.org/10.5194/egusphere-egu26-9640, 2026.

X4.86 | EGU26-21900
David Hassell, Valeriu Predoi, Bryan Lawrence, Ezequiel Cimadevilla, and Kai Mühlbauer

Programmatic access to remote high-volume multi-dimensional geophysical data was nearly impossible before the advent of high-speed networks and public cloud storage. Even then, data often had to be made "analysis-ready" before such access was possible. Once analysis-ready data is available, remote access becomes possible, with only the bytes needed by the client transferred across the network. In many cases such access is faster and more energy-efficient than downloading the entire dataset that contains the relevant variables (or parts of variables). Moreover, even when it is not more efficient than downloading data on a case-by-case basis, it may not be possible to cache the data locally, and remote access may be the only option. Hence the notion of analysis-ready data has become very popular, and it has often been understood to mean "made available on an object store in Zarr format". However, the key aspects of analysis-ready data can be delivered via other interfaces and formats, provided the right software stack is available.

Here we present such a stack in the context of how we expect to enable remote access to NetCDF4 data from the upcoming CMIP7 Assessment Fast Track (and other data to be held in the newly upgraded Earth System Grid Federation, ESGF). The new ESGF will expose data via HTTP servers supporting remote range-get requests for portions of files, which provides essentially the same remote access capabilities as an object store. The requirements for using such a stack, for the new ESGF and for object stores alike, are (1) the data must be appropriately chunked (partitioned into suitably dimensioned hyperslabs), (2) the chunk indices must be efficiently stored, and (3) the reading software, using tools such as Dask, must be fully parallelisable. If either of the first two criteria is not met, data access can be impossibly slow even for relatively small problems; if the third is not met, large problems cannot be addressed efficiently.

To address the first two issues, we present: `cmip7-repack`, a tool to ensure that key aspects of the CMIP7 data are chunked appropriately; `pyfive`, a pure-Python, thread-safe library for reading HDF data performantly in both serial and parallel applications; and a `pyfive`-enabled version of the `h5netcdf` library for facilitating remote and/or parallel data access using the NetCDF4 API. With these tools we show that reformatting the NetCDF4 data preferred by modellers into additional formats such as Zarr, and/or maintaining duplicate copies of chunk indices made by tools such as kerchunk, will no longer be necessary for most workloads.
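
The access pattern this stack enables can be sketched as follows (the URL is an invented stand-in for a future ESGF-style endpoint supporting range requests):

```python
# Sketch: open a remote NetCDF4/HDF5 file over HTTP range requests and
# read only the chunks backing the requested hyperslab. URL is invented.
import fsspec
import xarray as xr

url = "https://esgf.example.org/cmip7/tas_Amon_model_historical.nc"

with fsspec.open(url, "rb") as f:
    ds = xr.open_dataset(f, engine="h5netcdf")
    # only the byte ranges for this tropical-band subset are fetched:
    series = ds["tas"].sel(lat=slice(-10, 10)).mean(("lat", "lon")).load()

print(series)
```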

How to cite: Hassell, D., Predoi, V., Lawrence, B., Cimadevilla, E., and Mühlbauer, K.: Supporting remote access to HDF5 datasets, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21900, https://doi.org/10.5194/egusphere-egu26-21900, 2026.

X4.87 | EGU26-12863
Robert Griffioen, Layla Loffredo, Robert-Jan Bood, Raymond Oonk, Els Kuipers, Oliver Schmitz, and Derek Karssenberg

Geoscience research faces enormous data growth: larger, more versatile datasets from satellites, IoT devices, and measurement instruments. To make full use of these data opportunities and to meet the demand for integrated analysis, new IT solutions are needed. SURF, the Dutch national digital infrastructure provider for research and education, is investigating a data lakehouse architecture in the context of an innovation project and the SAGE European Green Deal Data Space project (https://www.greendealdata.eu/). In SAGE we collaborate with geoscientists from the Department of Geography at Utrecht University to process heterogeneous environmental monitoring datasets into data products for further research.

The data lakehouse architecture combines the flexibility of a data lake for handling heterogeneous data and ML workflows with the transactional properties of a database (ACID) and the governance of a data warehouse. We explore this architecture using SURF services, such as the object store, and open-source software from existing geoscience ecosystems like Pangeo and Earthmover. The exact properties of the data lakehouse depend on the software packages used. We present the lakehouse solution for the UU use case of serving and publishing exposome data products. Currently, data processing of the data products is handled by a batch service. We will discuss how the lakehouse architecture could be extended to both serve the resulting data products and cover the processing stage and subsequent analysis workflows.
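
The storage layer of such a lakehouse can be sketched generically (the bucket, endpoint, and product names are invented; the actual stack may add transactional layers such as Icechunk on top):

```python
# Generic sketch of the lakehouse storage layer with invented names:
# publish a product to object storage as Zarr, then read it back lazily.
import xarray as xr

opts = {"endpoint_url": "https://objectstore.example.nl"}  # invented

product = xr.open_dataset("exposome_no2.nc")  # hypothetical local product
product.to_zarr(
    "s3://sage-lakehouse/products/exposome_no2.zarr",
    mode="w",
    storage_options=opts,
)

ds = xr.open_zarr(
    "s3://sage-lakehouse/products/exposome_no2.zarr",
    storage_options=opts,
)
```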

How to cite: Griffioen, R., Loffredo, L., Bood, R.-J., Oonk, R., Kuipers, E., Schmitz, O., and Karssenberg, D.: A data lakehouse solution for geoscience workflows, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12863, https://doi.org/10.5194/egusphere-egu26-12863, 2026.

X4.88 | EGU26-18770 | ECS
Adhitya Bhawiyuga, Serkan Girgin, Rolf A. de By, and Raul Zurita-Milla

The Earth observation (EO) community has increasingly adopted cloud platforms for processing large datasets, and EO data archives grow by approximately 100 PB annually. However, the energy costs and environmental footprint of this processing remain largely invisible. This oversight is particularly contradictory for a community focused on environmental monitoring and climate mitigation. In this study, we present a gap analysis of energy awareness and energy efficiency in cloud-based EO data processing, using Pangeo's Kubernetes-based architecture as a case study. Through a literature review and architectural analysis, we identify five interconnected problems that prevent energy-efficient cloud operations in the EO domain.

According to our analysis, the most critical gap is the absence of granular energy observability. While Pangeo deployments on self-managed Kubernetes can access resource metrics, e.g. through Prometheus, they lack energy attribution at the task level. Tools like Kepler provide pod-level power estimates on bare-metal infrastructure but face limitations in virtualized cloud environments, where hypervisors restrict access to hardware sensors. On fully managed cloud platforms, limited provider transparency worsens the problem, as providers offer only monthly service-level carbon footprints. Without this visibility, researchers can optimize workflows only for execution time and cost, leaving energy efficiency as an invisible dimension. Furthermore, the EO community lacks standardized benchmarking frameworks for evaluating energy-performance trade-offs in realistic workflows. Researchers reporting energy improvements for specific algorithms cannot provide reproducible comparisons, as different studies use varying datasets, baseline systems, and measurement methodologies.
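
Where pod-level estimates are available, attribution starts from queries like the following sketch (the Prometheus URL is invented, and Kepler's metric and label names vary across versions):

```python
# Sketch: query Prometheus for Kepler's per-pod energy counters and
# convert joule rates to watts. Server URL and labels are assumptions.
import requests

PROM = "http://prometheus.monitoring.svc:9090/api/v1/query"  # invented
query = "sum by (pod_name) (rate(kepler_container_joules_total[5m]))"

resp = requests.get(PROM, params={"query": query}, timeout=10)
for series in resp.json()["data"]["result"]:
    pod = series["metric"].get("pod_name", "<unknown>")
    watts = float(series["value"][1])  # J/s == W
    print(f"{pod}: {watts:.2f} W")
```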

From a system-level perspective, current Kubernetes orchestration policies optimize for resource availability and load balancing but ignore hardware-specific energy profiles. Pangeo deployments consequently distribute workloads across multiple underutilized nodes rather than consolidating them to enable node shutdown. Similarly, Dask schedulers prioritize data locality and workload balance but cannot incorporate energy awareness when assigning tasks. When processing continent-scale mosaicking operations, schedulers may mismatch task characteristics with hardware capabilities, assigning compute-intensive operations to high-power nodes when energy-efficient alternatives could handle the workload.

To address these interconnected gaps, we propose a multi-phase research roadmap. The first phase should focus on developing energy-monitoring toolkits that combine hardware sensors with application profiling and modeling frameworks to account for hidden energy consumption in unmeasured components such as disk I/O and network peripherals. This phase should also establish standardized benchmarking frameworks comprising representative EO big data workflows to enable reproducible energy-performance evaluation across different platforms and algorithms. Building on this measurement infrastructure, subsequent phases should develop predictive models that estimate task-level energy consumption from workflow characteristics and hardware specifications before execution takes place. These models will enable proactive decisions about algorithm selection, hardware provisioning, and resource allocation. The final phase focuses on system-level optimization: designing energy-aware Kubernetes orchestration through workload consolidation and heterogeneous hardware selection, and developing multi-objective task schedulers for distributed frameworks like Dask that co-optimize energy consumption, execution time, and cost when assigning tasks to worker nodes. These directions aim to make energy consumption a measurable, optimizable metric in cloud-based EO processing, aligning computational practices with environmental sustainability goals.

How to cite: Bhawiyuga, A., Girgin, S., de By, R. A., and Zurita-Milla, R.: Energy Efficiency in Cloud-Based Earth Observation Data Processing: Gap Analysis and Research Directions, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18770, https://doi.org/10.5194/egusphere-egu26-18770, 2026.

X4.89 | EGU26-9862
Andrej Fast, Tobias Kölling, and Fabian Wachsmann

Earth System Models (ESMs) produce output on a wide range of structured and unstructured grids. However, exploring these heterogeneous datasets remains challenging, often requiring specialized software, access to high-performance computing resources, or time-consuming regridding to regular latitude-longitude grids that can introduce interpolation artifacts.

We present GridLook, an open-source, browser-based WebGL visualization tool that enables interactive exploration of cloud-hosted Zarr datasets directly on their native grids without any software installation. GridLook leverages the Pangeo ecosystem by consuming Zarr stores from any CORS-enabled cloud storage (including S3, Swift, and Google Cloud), making it immediately compatible with FAIR data principles and cloud-native workflows.

Key features include: (1) client-side GPU rendering; (2) automatic grid type detection from CF-compliant metadata, supporting a wide variety of grids; and (3) shareable URLs that encode the complete visualization state including dataset location, variable selection, and view parameters.

The architecture follows a serverless design in which all rendering occurs in the user’s browser, removing the need for backend infrastructure and enabling real-time interaction with large datasets through Zarr's chunked access patterns. By combining cloud-native data formats (Zarr), standardized metadata conventions (CF), and modern web technologies (WebGL), GridLook reduces time-to-plot and supports lightweight, shareable visualization workflows. This facilitates rapid visual inspection of model output by both data users and model developers, enabling quicker communication of spatial features and identification of potential bugs during model development.

The tool is freely available at https://gridlook.pages.dev with source code on GitHub, and we invite community contributions for additional grid types and features.

How to cite: Fast, A., Kölling, T., and Wachsmann, F.: GridLook: A browser-based ESM data-viewer, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9862, https://doi.org/10.5194/egusphere-egu26-9862, 2026.

X4.90 | EGU26-18208 | ECS
Justus Magin, Benoît Bovy, Pablo Richard, Jean-Marc Delouis, and Tina Odaka

Discrete Global Grid Systems (DGGS), and in particular HEALPix, have become increasingly popular in the Earth sciences over the past few years, mainly due to HEALPix's equal-area nature and readily available Python libraries. However, this adoption and the ever-increasing amounts of data come with their own set of challenges. In particular, the existing Python libraries were written for use in astronomy and thus only work on a sphere, resulting in small but often non-negligible variations in cell areas when applied to the surface of the Earth.

Additionally, large spatial coverage at high resolutions requires a large amount of memory just to represent the spatial information (e.g., roughly 100 GB for full Earth coverage at 100 m resolution).

Finally, the storage format of HEALPix was not standardized until very recently, with the release of the CF conventions version 1.13 and the upcoming Zarr DGGS convention.

We present healpix-geo, a HEALPix implementation for Python built on top of the cdshealpix, moc, and geodesy Rust crates with minimal Python dependencies (numpy and, optionally, shapely). It supports the most common HEALPix indexing schemes (nested, ring, zuniq), allows conversion of cell indices to and from ellipsoidal coordinates, and contains a range-based data structure suitable for indexing a large number of cells with a small memory footprint.

We further show how healpix-geo integrates with xdggs, an xarray extension that enables high-level interaction with DGGS datasets, including efficient subsetting, analysis-ready representations, and visualization within Pangeo workflows.

xdggs also provides an extensible mechanism to easily import/export DGGS data from/to a variety of models or conventions, with built-in support for the CF HEALPix conventions and the Zarr DGGS conventions. Together, healpix-geo and xdggs provide an end-to-end, standards-aligned pathway for scalable HEALPix-based geospatial analysis on the ellipsoid.
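
In practice the integration aims to look roughly like this (a sketch; the accessor and method names follow xdggs's documented design but may evolve, and the dataset path is invented):

```python
# Hedged sketch of the xdggs workflow; method names may differ slightly
# from the released API, and the dataset path is invented.
import xarray as xr
import xdggs

ds = xr.open_dataset("healpix_dataset.zarr", engine="zarr")

# interpret the cell-id coordinate according to its DGGS metadata
ds = ds.pipe(xdggs.decode)

centers = ds.dggs.cell_centers()  # ellipsoidal lon/lat of each cell
```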

How to cite: Magin, J., Bovy, B., Richard, P., Delouis, J.-M., and Odaka, T.: Ellipsoidal HEALPix for the Earth sciences: healpix-geo and xdggs integration into the Pangeo ecosystem, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18208, https://doi.org/10.5194/egusphere-egu26-18208, 2026.

X4.91 | EGU26-17943 | ECS
Pablo Richard, Jean-Marc Delouis, Justus Magin, and Tina Odaka

The quality of analysis-ready Earth Observation (EO) products strongly depends on the ability of the processing chain to accurately model the mapping from the Earth’s surface to the detector geometry. This mapping involves a convolution with the instrumental Point Spread Function (PSF) and an orthorectification step that corrects for terrain-induced geometric distortions using a Digital Elevation Model (DEM).

These distortions are challenging, as they can lead to spatially varying PSF deformations. Even worse, some areas with strong topographic gradients lead to an effective PSF with multiple modes and may cause the failure of standard operational orthorectification algorithms.

To anticipate these failures, we introduce critical incidence maps. For a given point on a DEM, such a map provides the maximum sensor incidence angle at which this point remains visible, i.e. not occluded by surrounding terrain, in any azimuthal direction. We show that, for moderate incidence angles (typically below ~20°), the rugged areas responsible for orthorectification failures cover only a very small fraction of Earth's surface, therefore leaving ample room for more robust algorithms to tackle these thorny points.

For smooth regions, we introduce and compare various semi-analytical orthorectification schemes that achieve appealing trade-offs between computational cost and geometric precision. We then combine two distinct orthorectification strategies, tailored respectively to smooth and rugged terrain, and express the overall ground-to-detector mapping as a sparse linear operator.

This formulation yields an efficient forward model that accurately captures terrain-induced PSF distortions, including multimodality. Finally, we apply this model to invert the data projection, from the detector to the Earth’s surface, in the context of a HEALPix discrete grid.
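
A toy illustration of the sparse formulation (all sizes and weights below are arbitrary): each detector sample is a weighted sum of ground cells, so the whole ground-to-detector mapping reduces to one sparse matrix-vector product:

```python
# Toy sketch of the sparse forward model: detector = A @ ground, where
# A encodes PSF and orthorectification weights. All numbers are arbitrary.
import numpy as np
from scipy.sparse import csr_matrix

n_ground, n_detector = 10_000, 2_500

# each detector sample sees a handful of ground cells with equal weight
rng = np.random.default_rng(0)
rows = np.repeat(np.arange(n_detector), 4)
cols = rng.integers(0, n_ground, rows.size)
vals = np.full(rows.size, 0.25)

A = csr_matrix((vals, (rows, cols)), shape=(n_detector, n_ground))

ground = rng.random(n_ground)  # surface signal, e.g. on HEALPix cells
detector = A @ ground          # forward model applied in one product
```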

How to cite: Richard, P., Delouis, J.-M., Magin, J., and Odaka, T.: Modeling orthorectification and PSF distortions on a HEALPix grid, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-17943, https://doi.org/10.5194/egusphere-egu26-17943, 2026.

X4.92 | EGU26-18919
Sylvain Corlay, Matthias Meschede, Martin Renou, Gregory Mooney, and Arjun Verma

JupyterGIS is an open-source web-GIS (Geographic Information System) designed to bring the iterative and interactive workflows of Jupyter to geospatial data analysis. By leveraging the Jupyter ecosystem, it seamlessly interleaves code and visualization, providing access to the vast range of existing geospatial libraries and interfaces.
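
For instance, a project document can be driven from a notebook roughly as follows (a sketch; the method names reflect the project's Python API but may evolve, and the tile URL is just an example):

```python
# Hedged sketch of scripting a JupyterGIS project from a notebook;
# method names follow the documented Python API but may change.
from jupytergis import GISDocument

doc = GISDocument()  # backed by the shared, collaborative JSON document
doc.add_raster_layer(
    url="https://tile.openstreetmap.org/{z}/{x}/{y}.png",
    name="basemap",
)
doc  # displaying the document renders the interactive map
```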

The architecture of JupyterGIS is based on a single, serializable JSON project document that encapsulates all project information. This document is implemented as a collaborative Conflict-free Replicated Data Type (CRDT), a "ydoc", ensuring real-time synchronization when edited by multiple instances or components simultaneously. This design enables teams to work collaboratively on geospatial data in real time, a feature particularly valuable for organizations. Additionally, it opens possibilities for co-editing with LLM-based AI agents, greatly expanding the potential for automation and advanced analysis.

JupyterGIS offers very flexible deployment options. It can run on high-performance backend servers, including scalable Kubernetes clusters, to handle large-scale datasets and computationally intensive tasks, such as those commonly encountered in Earth Observation applications. Alternatively, it can be deployed as a static website via WebAssembly and JupyterLite, executing computations directly in the user's browser. The latter eliminates the need for any backend infrastructure, making JupyterGIS suitable for creating embeddable, highly scalable, and accessible applications, such as lightweight embedded maps.

Initiated in 2024, JupyterGIS is a young but rapidly growing project that has garnered significant attention, community contributions, and organizational support, bundled in the Pangeo and GeoJupyter initiatives. As a fully open-source and sovereign solution, it provides a self-hostable alternative to proprietary platforms. This is particularly advantageous for handling sensitive data, as all components are auditable and under the user's control. Its modular and extensible architecture also ensures easy integration into existing systems and adaptability to new use cases.

JupyterGIS thus serves multiple roles for working with geospatial data: as a local or remote Integrated Development Environment (IDE), as an interface integrated into large-scale organizational portals, and as an embedded solution for small maps and web applications.

Our overview of JupyterGIS will cover its underlying architecture, showcase the UI and its features with examples, and compare its strengths and weaknesses with other platforms. The goal is to provide a comprehensive understanding of this novel tool, enabling listeners to assess its applicability to their use cases and guiding them on how to get started.

How to cite: Corlay, S., Meschede, M., Renou, M., Mooney, G., and Verma, A.: JupyterGIS: A Flexible, Open-Source Platform for Geospatial Analysis, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18919, https://doi.org/10.5194/egusphere-egu26-18919, 2026.
