ESSI3.2 | From Principles to Practice: Community-Driven Approaches to FAIR, Open Data and Interoperable FAIR Digital Objects in Earth and Environmental Sciences
Co-organized by HS13/OS4/SM9, co-sponsored by AGU
Convener: Alice Fremand | Co-conveners: Shelley Stall, Lesley Wyborn, Marco Kulüke, Natalie Raia, Ivonne Anders, Anne Fouilloux
Orals
| Thu, 07 May, 14:00–18:00 (CEST)
 
Room -2.33
Posters on site
| Attendance Fri, 08 May, 16:15–18:00 (CEST) | Display Fri, 08 May, 14:00–18:00
 
Hall X4
Posters virtual
| Mon, 04 May, 14:09–15:45 (CEST)
 
vPoster spot 1b, Mon, 04 May, 16:15–18:00 (CEST)
 
vPoster Discussion
Making data Findable, Accessible, Interoperable and Reusable (FAIR) is now widely recognised as essential to advancing open and reproducible research. Increasingly, this requires not only shared principles but also concrete digital implementations that enable interoperability across systems, disciplines, and infrastructures. Translating these principles into practical data management guidelines or operational digital solutions across disciplines remains difficult, however. The goal of the session is to explore how data management best practices are developed, implemented, and adopted across disciplines, including through interoperable digital objects, persistent identifiers, and emerging data space concepts.

As part of this session, we invite submissions that:
1) Share good or bad experiences developing, implementing, and adopting data practices that align with both FAIR principles and the evolving needs of specific research communities.
2) Propose strategies for engaging researchers in adopting and refining best practices, with a focus on community-driven approaches to technical standards and infrastructures.
3) Explore the role of cultural change in enabling adoption of sustainable data practices.
4) Highlight efforts that harmonise data formats and workflows across disciplines while respecting domain-specific requirements, for example through interoperable data architectures or research data spaces.
5) Present technical or conceptual approaches that support the transition from data silos to interoperable, FAIR-aligned data ecosystems.

This session is aligned with the objectives of the Research Data Alliance (RDA) Earth, Space, and Environmental Sciences (ESES) Data Community of Practice and aims to foster cross-disciplinary dialogue, particularly among researchers in hydrology, seismology, and ocean sciences. However, we welcome contributions from all disciplines, especially where they provide insights or novel approaches to community engagement.
By learning from diverse experiences, this session seeks to advance collective understanding of how to build and sustain data practices that are both FAIR and fit for purpose, from community processes to concrete, interoperable digital implementations.

Orals: Thu, 7 May, 14:00–18:00 | Room -2.33

The oral presentations are given in a hybrid format supported by a Zoom meeting featuring on-site and virtual presentations. The button to access the Zoom meeting appears just before the time block starts.
Chairpersons: Marco Kulüke, Ivonne Anders, Anne Fouilloux
14:00–14:05
14:05–14:15
|
EGU26-7717
|
solicited
|
On-site presentation
Claus Weiland, Lena Perzlmaier, Daniel Bauer, Jonas Grieb, Julian Oeser, Taimur Khan, Sharif Islam, and Niels Raes

The EU’s Biodiversity Strategy for 2030, a core part of the European Green Deal, addresses the complex relationship between human society and its environment by prioritizing the restoration of ecosystems and building resilience against climate change, deforestation, and biodiversity loss.

These environmental stressors do more than degrade ecosystems; they create a pressing need for policymakers, researchers, and society to actively track and mitigate ecological shifts. To design effective mitigation strategies, new political frameworks and massive simulation infrastructures are being developed with the aim of establishing a common European Green Deal Data Space. The involved initiatives rely on the integration and standardization of diverse, large-scale datasets, ranging from long-term biodiversity records (e.g., eDNA) to real-time IoT sensor data (e.g., camera traps) and global Earth observation (EO) data combined with model-derived reanalysis datasets like ERA5.

‘Biodiversity Meets Data’ (BMD) is a Horizon Europe project delivering a unified access point for AI-assisted biodiversity monitoring and cross-realm (terrestrial, marine, freshwater) analysis tools, representing a key contribution to the thematic expansion of the European Green Deal Data Space ecosystem. By providing a robust technical infrastructure, BMD facilitates the quantification of diverse ecological pressures - ranging from climate change to land-use shifts - on biodiversity. The project is strategically focused on the EU Natura 2000 network, equipping stakeholders such as conservation managers and policymakers with the necessary tools to implement and evaluate EU Nature Directives such as the Birds and Habitats Directives.

In this talk, we will present how BMD leverages FAIR Digital Objects (FDOs) and data space concepts around governance, licensing, and provenance tracking to synthesize computational workflows and diverse datasets into actionable knowledge units (“Workflow Run RO-Crate”, Figure 1). We will demonstrate our implementation path for such data-rich, self-contained digital containers building on web-based technologies such as RO-Crate (lightweight data packages) and FAIR Signposting (machine-interpretable layer describing resources). Those webby FDOs are designed to bridge the gap between practical needs of conservation stakeholders such as supporting data-driven decision making and technical capabilities of the Green Deal Data Space ecosystem.
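The packaging pattern described above can be sketched as a minimal RO-Crate metadata document. The file names and crate contents below are hypothetical, and a real Workflow Run RO-Crate adds further provenance entities (e.g., a CreateAction recording the execution) per that profile:

```python
import json

def minimal_workflow_crate(name, workflow_file, result_file):
    """Build a minimal RO-Crate 1.1 metadata document (a dict ready to be
    serialized as ro-crate-metadata.json) bundling a workflow and a result."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # the metadata file describes itself and points at the root
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # root dataset aggregating everything in the crate
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "hasPart": [{"@id": workflow_file}, {"@id": result_file}],
            },
            {"@id": workflow_file,
             "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"]},
            {"@id": result_file, "@type": "File"},
        ],
    }

crate = minimal_workflow_crate(
    "Example workflow run", "workflow/run_model.py", "results/output.nc")
print(json.dumps(crate, indent=2))
```

Because the crate is plain JSON-LD, a FAIR Signposting layer can then expose it to machines via standard HTTP Link relations without any bespoke tooling.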

Integration of targeted feedback from stakeholders, notably Natura 2000 site managers, into our development process ensures that the FAIR-compliant data products and FDO service framework are not only technically robust, but also socially and politically actionable.

 

Figure 1. Throughout its life cycle in the BMD data space, data is represented as RO-Crate. Initially (left), the data and the computational workflow are bundled as Workflow RO-Crate. Following processing, this is combined with the results and enriched with retrospective provenance and metadata to form a Workflow Run RO-Crate (right). Finally, these are presented as webby FAIR Digital Objects, incorporating a machine-interpretable layer based on FAIR Signposting (bottom).

 

How to cite: Weiland, C., Perzlmaier, L., Bauer, D., Grieb, J., Oeser, J., Khan, T., Islam, S., and Raes, N.: Integrating biodiversity in situ data, Earth observation and stakeholder engagement - from machine- to policy-actionability, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7717, https://doi.org/10.5194/egusphere-egu26-7717, 2026.

14:15–14:25
|
EGU26-9158
|
On-site presentation
Hannes Thiemann, Ivonne Anders, Marco Kulueke, Beate Kruess, and Karsten Peters-von Gehlen

FAIR Digital Objects (FDOs) provide an actionable framework for implementing the FAIR principles by combining persistent identifiers with machine-readable metadata, explicit typing, and structured relations. The FDO Forum, as an open, community-driven initiative, develops and coordinates specifications and reference concepts to support interoperable digital objects across infrastructures. A key challenge, however, is demonstrating how these specifications can be applied in practice within existing data ecosystems, where established domain standards and evolving collections must be integrated rather than replaced.

In this contribution, a practical implementation of FDO specifications is presented using the SpatioTemporal Asset Catalog (STAC) as an example. As a widely adopted standard for spatio-temporal data, STAC's modular design makes it an ideal bridge between established community practices and the FDO paradigm. The demonstration shows how STAC objects are transformed into typed FDOs using Handle-based PIDs and registered object types via a Data Type Registry (DTR). This approach enables machine-actionable navigation and interpretation that transcends domain-specific tooling.

The approach is illustrated using a STAC-based catalog developed at the German Climate Computing Center (DKRZ), reflecting typical characteristics of climate research and climate modelling data, such as evolving and versioned collections and multiple levels of aggregation. The focus is on the practical application of FDO specifications, illustrating how typing, identifiers, and relations can be introduced in a standards-compliant manner without disrupting existing infrastructures, while enabling stable referencing, automated discovery, and seamless integration into data-processing workflows.
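The STAC-to-FDO mapping can be illustrated with a small sketch. The Handle prefix, type PID, and field names below are placeholders for illustration, not the registered identifiers used at DKRZ:

```python
def stac_item_to_fdo(item, handle_prefix="21.TEST"):
    """Map a STAC item (plain dict) onto a minimal FDO-style record with a
    Handle-like PID, a reference to a registered object type, and typed
    attributes. The prefix and type PID are illustrative placeholders."""
    return {
        "pid": f"{handle_prefix}/{item['id']}",
        # in practice this would point at a Data Type Registry (DTR) entry
        "typePid": f"{handle_prefix}/stac-item-type",
        "attributes": {
            "datetime": item["properties"]["datetime"],
            "bbox": item.get("bbox"),
            "isPartOf": item.get("collection"),  # relation to the collection
            "digitalObjectLocation": item["assets"]["data"]["href"],
        },
    }

item = {  # hypothetical climate-data STAC item
    "id": "tas_day_2020",
    "collection": "cmip6-example",
    "bbox": [-180, -90, 180, 90],
    "properties": {"datetime": "2020-01-01T00:00:00Z"},
    "assets": {"data": {"href": "https://example.org/tas_day_2020.zarr"}},
}
fdo = stac_item_to_fdo(item)
print(fdo["pid"])  # 21.TEST/tas_item -> here: 21.TEST/tas_day_2020
```

The point of the sketch is that typing and relations are layered on top of the existing STAC record rather than replacing it, so established catalog tooling keeps working unchanged.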

The results show that implementing FDO specifications through STAC is a pragmatic and transferable pathway from specification-level concepts to operational adoption. The implementation enables the creation of interoperable, machine-actionable data spaces while building on established standards and tooling, and provides lessons learned for other infrastructures aiming to operationalize FAIR Digital Objects in practice.

How to cite: Thiemann, H., Anders, I., Kulueke, M., Kruess, B., and Peters-von Gehlen, K.: Making STAC FDO-ready: A Practical Path toward FAIR Digital Objects in Geoscientific Data Spaces, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9158, https://doi.org/10.5194/egusphere-egu26-9158, 2026.

14:25–14:35
|
EGU26-16338
|
On-site presentation
Tim Rawling, Angus Nixon, Bryant Ware, Alex Hunt, Jens Klump, Anusuriya Devaraju, Rebecca Farrington, and Lesley Wyborn

As the volume and complexity of Earth science data continue to grow, driven by the availability of advanced instrumentation and the need for new approaches to address geoscience questions and challenges, there is an increasing demand for robust, end-to-end approaches to data management across the full data life cycle. Earth science datasets are, however, notoriously heterogeneous, spanning disciplines from geochemistry to geophysics and Earth observation, observation levels from nanoscale to global, and data volumes from megabytes to multi-petabyte collections. Yet for the vast majority of these datasets, the ‘raw’ observations collected by instrumentation, or Primary Observational Datasets (PODs), are not routinely reported or associated with the downstream, analysis-ready data products used to inform scientific or policy decisions. To enable reproducible and repurposable science, particularly in a context where technical advances continue to push data requirements upstream towards the primary observations, these PODs must be preserved for potential future applications and linked with the outputs they underpin.

AuScope, Australia’s national geoscience research infrastructure funded through the National Collaborative Research Infrastructure Strategy (NCRIS), supports the geoscience community by providing data, data products, and software that align with the FAIR and CARE principles. Recognising that a single, monolithic repository cannot serve all disciplines, data types, or user communities, AuScope is developing an Earth Science Data Ecosystem that enables seamless access to PODs hosted across high-performance compute–data (HPC-D) and cloud environments, and provides pathways to connect raw observational data with curated, analysis-ready products delivered through distributed platforms and portals. A critical component of this ecosystem is strengthening digital infrastructure at the point of data generation and associating that primary observation with the published output. To address persistent challenges associated with manual data transfer, incomplete metadata capture, and limited long-term reuse, AuScope has embarked on the scoping and implementation of an Australian-first repository and capture system for PODs in geochemistry. By strengthening digital infrastructure at the point of data generation and embedding standards throughout the data life cycle, this work supports more efficient, interoperable, and collaborative Earth science research, maximising the long-term value of publicly funded data.

How to cite: Rawling, T., Nixon, A., Ware, B., Hunt, A., Klump, J., Devaraju, A., Farrington, R., and Wyborn, L.: Today’s research for tomorrow’s challenges – building national research infrastructure across the full data life cycle, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16338, https://doi.org/10.5194/egusphere-egu26-16338, 2026.

14:35–14:45
|
EGU26-9425
|
ECS
|
On-site presentation
Adam Rynkiewicz, Raul Palma, Paulina Poniatowska-Rynkiewicz, and Malgorzata Wolniewicz

Achieving higher levels of FAIR-ness for research artefacts demands not only structured packaging but also semantic enrichment that links textual resources to knowledge bases. ROHub, a reference platform implementing the Research Object paradigm, enables scientists to package and share research outputs as structured Research Object Crates (RO-Crates) - combining data, methods, software, and associated metadata into a unified, machine-processable entity. 

While RO-Crates inherently improve metadata richness and FAIR compliance by aggregating diverse resources with persistent identifiers and schema-based annotations, many research outputs still contain unlinked textual artefacts (e.g., reports, questionnaires, narratives) whose contextual semantics remain underutilized. Manual semantic annotation to link these textual elements to external knowledge bases - such as domain ontologies or vocabularies - is time-consuming and error-prone, yet crucial for enhancing findability, semantic interoperability, and machine-actionability. 

To address this gap, we extend ROHub with an automated semantic annotation service that identifies entities within text resources and links them to relevant knowledge bases, producing enriched metadata that feeds back into the RO-Crate structure. This service integrates entity linking techniques to reduce manual curation overhead and systematically increase the FAIRness and discoverability of research objects - making them more amenable to machine discovery, integration, and automated workflows. The result is a FAIR research object ecosystem where textual content, semantic context, and structured metadata co-exist in a machine-processable form, enhancing both human and computational reuse.
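A minimal illustration of the entity-linking idea, assuming a toy vocabulary with placeholder IRIs (a production service like the one described would use NER and entity linking against real domain ontologies, not exact string matching):

```python
import re

# toy knowledge base: surface form -> concept IRI (placeholder IRIs)
VOCAB = {
    "provenance": "https://example.org/vocab/Provenance",
    "ro-crate": "https://example.org/vocab/ROCrate",
    "metadata": "https://example.org/vocab/Metadata",
}

def annotate(text):
    """Naive entity linker: scan the text for known surface forms and emit
    annotation records linking character spans to knowledge-base IRIs."""
    annotations = []
    for match in re.finditer(r"[\w-]+", text.lower()):
        iri = VOCAB.get(match.group())
        if iri:
            annotations.append({
                "target": {"start": match.start(), "end": match.end()},
                "body": {"@id": iri},
            })
    return annotations

anns = annotate("Provenance is captured in the RO-Crate.")
print(len(anns))  # 2: "Provenance" and "RO-Crate" are linked
```

Emitting span-plus-IRI records of this shape is what lets the enriched annotations feed back into the RO-Crate as ordinary machine-processable metadata.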

How to cite: Rynkiewicz, A., Palma, R., Poniatowska-Rynkiewicz, P., and Wolniewicz, M.: Advancing FAIR Digital Objects for Machine-Actionable Research: Integrating Semantic Enrichment in Research Object Ecosystems, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9425, https://doi.org/10.5194/egusphere-egu26-9425, 2026.

14:45–14:55
|
EGU26-5601
|
ECS
|
On-site presentation
Lukas Kluft and Tobias Kölling

During field campaigns, timely data sharing across distributed teams is essential, yet access to central repositories is often constrained by limited bandwidth. As a result, preliminary datasets are frequently exchanged offline, which commonly leads to confusion about dataset versions once post-campaign releases occur.

We present a proof-of-concept approach to campaign data dissemination based on content-addressable storage. During the ORCESTRA campaign, observations were converted into analysis-ready Zarr stores and published via the InterPlanetary File System (IPFS). By accessing data through immutable content identifiers (CIDs), teams can use datasets offline in the field while ensuring that the exact same, verifiable data objects remain accessible after the campaign.

To improve discoverability and usability, we developed the ORCESTRA Data Browser, which dynamically generates dataset landing pages by fetching metadata client-side directly from IPFS. Together, these components demonstrate how decentralized, content-addressed data access can support version clarity, reproducibility, and robust data sharing for field campaigns and beyond.
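The key property of content addressing can be sketched with a plain hash. Real IPFS CIDs wrap such a digest in multihash/multibase encodings and chunk large files into a DAG, but the versioning argument is the same:

```python
import hashlib

def content_id(data: bytes) -> str:
    """Toy content identifier: the SHA-256 digest of the bytes. Unlike a
    location-based URL, the identifier is derived from the content itself,
    so identical bytes always resolve to the identical ID and any revision
    necessarily yields a new one."""
    return hashlib.sha256(data).hexdigest()

# hypothetical preliminary vs. corrected field measurements
v1 = b"station,temperature\nBCO,27.3\n"
v2 = b"station,temperature\nBCO,27.4\n"
assert content_id(v1) == content_id(v1)  # identical bytes -> identical ID
assert content_id(v1) != content_id(v2)  # any revision -> a new ID
```

This is why datasets exchanged offline during a campaign cannot silently diverge from the post-campaign release: two copies either share a CID or are demonstrably different versions.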

How to cite: Kluft, L. and Kölling, T.: Providing analysis-ready campaign data via the InterPlanetary File System, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5601, https://doi.org/10.5194/egusphere-egu26-5601, 2026.

14:55–15:05
|
EGU26-15996
|
Virtual presentation
Julia Martin, Kerry Levett, and Hamish Holewa

Australian environmental, biodiversity and climate research generates vast and diverse datasets from a wide variety of organisations across the research, government, public and private sectors: all with significant potential to inform research, management and policy. However, these data are frequently stored across multiple institutional and government repositories that lack consistent governance, adequately rich metadata, and consistent application of externally agreed community standards that are fundamental to machine-to-machine discovery and interoperability. As a result, valuable long-tail data remain difficult to find, access and reuse, limiting their impact and hindering translation into decision-making and environmental management. National consultation led by the Australian Research Data Commons (ARDC) confirmed that poor discoverability of domain-specific data is a major barrier to research progress and evidence-based decision-making.

The Domain Data Portals (DDP) program, delivered through the ARDC Planet Research Data Commons, addresses this challenge by improving access to FAIR (Findable, Accessible, Interoperable and Reusable) environmental and climate data held in distributed repositories. The program equips data stewards with tools and capabilities to make long-tail datasets FAIR for knowledge creation. This program partners with the National Environmental Science Program (NESP), Australia’s longest-running environmental research initiative, and the Australian Plant Phenomics Network (APPN). NESP is led by the Australian Government Department of Environment, Climate Change, Energy and Water (DCCEEW) and has 29 research partner organisations. NESP has four hubs in different environmental disciplines: 1) marine and coastal, 2) terrestrial ecology, 3) waste and sustainability, and 4) climate systems. APPN is an Australian National Collaborative Research Infrastructure Strategy (NCRIS) Facility with nine research nodes. The DDP program is working with data managers across the nodes and disciplines to harmonise data formats and workflows while respecting domain-specific requirements.

The program is delivering cohesive, domain-level discovery of NESP and APPN research outputs through a dedicated portal within ARDC Research Data Australia, which is a metadata aggregation service that enables findability, accessibility, and reuse of data for research from over one hundred Australian research organisations, government agencies, and cultural institutions. To enable Research Data Australia to programmatically harvest the NESP and APPN metadata into the relevant portal, ARDC and the DDP project leads have worked with the institutions and repositories in scope to develop guidelines on how to include relevant Persistent Identifiers in the metadata for their funded research outputs and ensure rich FAIR-compliant metadata. By developing rich, standardised metadata for all project outputs and leveraging national infrastructure, including persistent identifiers, controlled vocabularies and data publishing services, the DDP program enables robust, efficient aggregation and national discoverability of datasets.

This approach supports consistent adoption of community standards and enhances data visibility, integration and reuse. The Domain Data Portals approach can be applied to other research communities in Australia to make their data FAIR, leveraging components of ARDC’s national information infrastructure.

How to cite: Martin, J., Levett, K., and Holewa, H.: Enhancing discoverability and impact of dispersed data through persistent identifiers in Australia, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-15996, https://doi.org/10.5194/egusphere-egu26-15996, 2026.

15:05–15:15
|
EGU26-13673
|
On-site presentation
Juliano Ramanantsoa, Angelo Strollo, Florian Haslinger, Javier Quinteros, Daniele Bailo, Otto Lange, Samshuijzen Laurens, Sven Peter Naesholm, and Mathilde B. Sørensen

The conceptual clarity of any scientific field depends fundamentally on the precision and standardisation of its terminology. Prior studies have shown that an absence of standardized terminologies can lead to interpretive ambiguity, imprecise outputs, and divergent interpretations across research communities. In seismology, terminologies remain scattered across institutional glossaries, impeding data FAIRness (Findability, Accessibility, Interoperability, and Reusability), metadata consistency, and collaboration with adjacent fields such as transdisciplinary research and AI engineering.

This work, carried out within the Geo-INQUIRE* project, introduces a vocabulary generation framework and a prototype database implementing three integrated innovations that consolidate the sparse seismological terminologies into a structured, machine-readable format: i) authority-first retrieval, ii) AI-mediated semantic triangulation, and iii) participatory expert governance.

The authority-first pathway performs weighted, priority-ranked extraction from eight expert-curated data centre sources (including FDSN, USGS, EarthScope, EPOS, and other relevant documents from the community), ensuring that the definitions originate from trusted references. The AI fallback pathway is activated only when authoritative retrieval fails, employing a semantic triangulation method in which three large language models - such as OpenAI's GPT-5.2, Anthropic's Claude Opus 4.5, and Google's Gemini 3 - independently generate candidate definitions. Embedding-based similarity analysis determines synthesis eligibility; if cross-model agreement falls below 50 percent, an expert flag is raised to prevent semantic uncertainty. When synthesis proceeds, a transparent concept-merging process extracts common and unique contributions from each model, recording all reasoning steps and preserving full provenance, overcoming a critical limitation of black-box AI knowledge generation.
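The triangulation step can be sketched as pairwise cosine similarity over embeddings of the candidate definitions. The 3-d vectors and the use of the minimum pairwise similarity as the agreement score are illustrative assumptions, standing in for real sentence embeddings and whatever aggregation rule the framework actually applies:

```python
from itertools import combinations
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def triangulate(embeddings, threshold=0.5):
    """Compare one embedding per model's candidate definition; if the lowest
    pairwise agreement falls below the threshold, raise an expert flag
    instead of synthesizing a merged definition."""
    sims = [cosine(a, b) for a, b in combinations(embeddings, 2)]
    return {"agreement": min(sims), "expert_flag": min(sims) < threshold}

# three nearly parallel toy embeddings: the models converge on one meaning
agree = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1]]
print(triangulate(agree)["expert_flag"])  # False
```

Flagging on the weakest pairwise link, rather than the average, is the conservative choice: a single dissenting model is enough to route the term to human review.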

Beyond technical generation, this work embeds vocabulary development within a participatory framework that transforms terminology from static definitions into community-validated knowledge. Through structured digital deliberation involving more than ten domain experts via a GitHub-based workflow, the approach delivers transparency, auditability, and collective ownership. Experts validate AI-retrieved content, resolve edge cases, and steward terminology evolution through documented discussion threads, ensuring definitions reflect both institutional authority and practitioner consensus while fostering public trust in seismology.

The system produces vocabulary encoding scheme-compliant entries with dual definitions: an authoritative version weighted by source priority, and an AI-synthesized alternative with full provenance. The source-weighting mechanism is fully flexible, ensuring the reusability of the framework. Applied to over 500 terms across 4 thematic clusters, this framework demonstrates that AI can systematically extend vocabulary completeness while participatory governance safeguards epistemic integrity. By coupling algorithmic precision with community oversight, this framework strengthens data discovery, metadata coherence, and research infrastructure interoperability across European and international seismological networks that advance transparent, reproducible, and interoperable seismological science.

*Geo-INQUIRE (Geosphere INfrastructures for QUestions into Integrated REsearch) is funded by the European Union (GA 101058518).

 

 

How to cite: Ramanantsoa, J., Strollo, A., Haslinger, F., Quinteros, J., Bailo, D., Lange, O., Laurens, S., Naesholm, S. P., and Sørensen, M. B.: Bridging fragmented terminologies: advancing vocabulary harmonization in Seismology through AI and community co-creation, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13673, https://doi.org/10.5194/egusphere-egu26-13673, 2026.

15:15–15:25
|
EGU26-20023
|
On-site presentation
Emanuel Soeding, Dorothee Kottmeier, Andrea Poersch, Stanislav Malinovschii, Johann Wurz, and Sören Lorenz

At the Helmholtz Association, we aim to establish a harmonized data space that connects information across distributed infrastructures. Ideally, this should work within and beyond our organization. Achieving this requires standardizing dataset descriptions using suitable metadata. A handy strategy is to use persistent identifiers (PIDs) and their metadata records to harmonize central parts of the metadata. This ensures a first level of interoperability and machine actionability even between datasets from unrelated disciplines.

While harmonizing PID metadata is a key step, practical implementation depends on a number of factors: 1. Leadership to support the necessary change processes; 2. A general awareness of roles and responsibilities across the whole research organization; 3. An implementation plan that prioritizes tasks, identifies the right people and interfaces, and specifies the tools and services required to record metadata; 4. An implementation group comprising people with the relevant expertise to implement and communicate the change process; 5. Informational material and training to onboard those affected by the change; 6. An organization's management supporting the upcoming change; and 7. Funding to overcome the initial obstacles and get everything up and running.

For example, ORCID identifies research contributors. While often associated with publishing scientists, other contributors—such as technicians, data managers, and administrative staff—also play vital roles. Their contributions are often overlooked or not systematically recorded. To change this, PID workflows should begin early, ideally at the hiring stage, to ensure people's roles are captured and linked to datasets.

Similarly, the PIDINST system—developed by an RDA working group—provides unique identifiers for scientific instruments. It includes a simple schema for recording key metadata about instruments, enabling the reliable identification of measurements made with specific devices. Here, workflows should begin with instrument acquisition and include responsibilities for updating metadata, typically assigned to technicians.

In this presentation, we propose tailored PID workflows involving key stakeholder groups within Helmholtz. We outline strategies for implementing ORCID, ROR, PIDINST, IGSN, DataCite and CrossRef DOIs and assign responsibilities for metadata curation. Our goal is to embed PID usage in day-to-day research processes across all centers of our organization and clarify stakeholder roles, thereby strengthening the quality and interoperability of our metadata.

How to cite: Soeding, E., Kottmeier, D., Poersch, A., Malinovschii, S., Wurz, J., and Lorenz, S.: Paving the Road to FAIR – Strategies and Considerations to activate PIDs in a large Organization, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-20023, https://doi.org/10.5194/egusphere-egu26-20023, 2026.

15:25–15:35
|
EGU26-7771
|
ECS
|
On-site presentation
Abigail Nalesnik, Kristi Wallace, Andrei Kurbatov, Kerstin Lehnert, and Stephen Kuehn

The tephra research community spans diverse disciplines—from volcanology to archaeology—but faces persistent challenges due to fragmented databases and limited data accessibility. To address these issues, the global tephra community has developed best practices for standardized data collection and reporting, documented in Wallace et al. (2022; zenodo.org/records/6568306). These guidelines and templates for physical and geochemical datasets promote FAIR principles by improving data consistency, discoverability, and interoperability. Implementing these practices can significantly enhance multidisciplinary research and foster collaboration.

To advance data discovery and accessibility, the tephra community has partnered with the Interdisciplinary Earth Data Alliance (IEDA²) to create the Tephra Information Portal (TIP). TIP serves as an integrated framework that connects tephra data from existing cyberinfrastructures—such as EarthChem, PetDB, GeoDIVA, SESAR, TephraBase, and StraboSpot—allowing users to search across tephra platforms using common criteria, enhancing data findability and reuse. Standardized data submissions to these platforms are therefore critical for improving the findability of samples and datasets through TIP, and their adoption is strongly encouraged by the tephra community.

How to cite: Nalesnik, A., Wallace, K., Kurbatov, A., Lehnert, K., and Kuehn, S.: Standardizing and encouraging best practices in tephra sample and data collection, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7771, https://doi.org/10.5194/egusphere-egu26-7771, 2026.

15:35–15:45
|
EGU26-3293
|
Virtual presentation
Wouter Addink and Sharif Islam

In recent years, significant progress has been made in digitizing natural history collections using increasingly industrialized workflows involving conveyor belts, digital camera setups, robotics and Artificial Intelligence (AI). New technologies have also become available to analyse the specimens. Analysis of both biodiversity and geodiversity samples has shifted from destructive analysis to non-destructive, high-resolution, and automated techniques, accelerating the creation of new information. However, the resulting data is often fragmented across systems and repositories. Efforts to reconnect these data to the original specimen or derived samples frequently fail because identifiers were missing at the time of analysis, are not globally unique, change over time, or are referenced incorrectly. These issues can be solved by maintaining a digital object on the internet that is created at the time of collecting the sample and contains contextual information and (links to) its derived data as this becomes available. This is called a Digital Specimen, and different entities (human or machine) that create an analysis can add information to the digital object. A one-to-one relationship with the physical sample preserved as a specimen can be kept by giving the physical object a persistent identifier such as an IGSN, International Generic Sample Number. The digital object also gets a persistent identifier: a Digital Specimen identifier in the form of a FAIR Digital Object compliant DOI (Digital Object Identifier).

The Digital Specimen is a citable, machine-actionable proxy for physical specimens that is FAIR by design (FAIR Digital Object compliant) and has a Persistent Identifier (PID) in the form of a DOI to create a self-contained unit of knowledge. This design enables seamless linkage to derived data—such as chemical analysis, digital media, and publications. To implement this, DiSSCo (Distributed System of Scientific Collections) developed the open Digital Specimen (openDS) specification. By integrating community standards like Darwin Core with W3C PROV-O and JSON-LD, openDS provides a common semantic language for global interoperability.
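A Digital Specimen record of this kind can be sketched as a small JSON-LD-style document. The field names, DOI, and IGSN below are simplified placeholders; the openDS specification defines the normative terms and required metadata:

```python
def digital_specimen(doi, igsn, name):
    """Minimal, illustrative Digital Specimen record in JSON-LD style,
    pairing the digital object's DOI with the physical sample's IGSN."""
    return {
        "@id": f"https://doi.org/{doi}",   # PID of the digital object
        "@type": "DigitalSpecimen",
        "name": name,
        # one-to-one link to the physical sample via its IGSN
        "physicalSpecimenId": igsn,
        # analyses, media, and publications are appended here over time,
        # by humans or by machine annotation services
        "derivedData": [],
    }

ds = digital_specimen("10.5000/example-ds", "IGSN-XYZ123", "Basalt hand sample")
print(ds["@id"])  # https://doi.org/10.5000/example-ds
```

Keeping `derivedData` mutable while the two identifiers stay fixed is what lets later chemical analyses or images attach to the same citable object.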

DiSSCo is currently in transition from its project phase into becoming an operational European research infrastructure. It has already created the first millions of FDO-compliant Digital Specimens and has developed infrastructure to allow the annotation of these digital objects with new data or improvements, either by humans or machines. AI-fueled Machine Annotation Services (MAS) developed by third parties can operate in the infrastructure for analysis of the data or knowledge extraction from specimen images.

In the presentation we will show how the FDO design supports advanced capabilities such as multiple redirects to different digital representations for either humans or machines; versioning and provenance to allow mutable objects; tooltips in journal systems that show contextual information about a referenced sample in a publication through the PID record; and machine-actionable metadata that enables machines to act on the data.

How to cite: Addink, W. and Islam, S.: DiSSCo's Vision Applied: (Re-)connecting Fragmented Specimen Data through FAIR Digital Objects, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-3293, https://doi.org/10.5194/egusphere-egu26-3293, 2026.

Chairpersons: Alice Fremand, Lesley Wyborn, Shelley Stall
16:15–16:20
16:20–16:30
|
EGU26-13278
|
solicited
|
On-site presentation
Martina Stockhause, José Manuel Gutiérrez, Ezequiel Cimadevilla Alvarez, Maialen Iturbide, Lina Sitz, and Antonio S. Cofiño

The FAIR principles — Findable, Accessible, Interoperable, and Reusable — underpin Open Science but do not fully ensure the long-term usability of interactive data services like the IPCC WGI Interactive Atlas. Drawing on lessons learned from developing and operating the Interactive Atlas, this presentation explores the challenges of sustaining such services, which rely not only on FAIR-compliant data and software but also on continuous stewardship, infrastructure maintenance, and institutional commitment.

Scientific quality and transparency of the Interactive Atlas are supported through expert assessment by the IPCC authors, provenance documentation, and Complex Citation, which combines the attribution of credit for assessed digital objects with the traceability of digital IPCC results. Yet, sustaining reliability requires ongoing stewardship of both data and software to prevent degradation and preserve reproducibility. Addressing these needs demands joint efforts of the IPCC Data Distribution Centre (DDC) Partners to maintain data, documentation, and interactive components for a diverse user community. FAIR alone is not enough — long-term data preservation and infrastructure maintenance are essential to ensure the sustainability and trustworthiness of interactive data services in Earth system science.

By reflecting on both the successes and limitations of the Interactive Atlas, this contribution offers insights relevant to other Earth system science communities developing interactive or service-oriented data products. These approaches are also applicable to fields beyond Earth system science. 

How to cite: Stockhause, M., Gutiérrez, J. M., Cimadevilla Alvarez, E., Iturbide, M., Sitz, L., and Cofiño, A. S.: Is FAIR Sufficient for Interactive Data Services? Ensuring Sustainability and Reliability of the IPCC WGI Interactive Atlas, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13278, https://doi.org/10.5194/egusphere-egu26-13278, 2026.

16:30–16:40
|
EGU26-5023
|
On-site presentation
María Piquer-Rodríguez, Esther Asef, Sophia Reitzug, and Andreas Hübner

The Earth, Space, and Environmental Sciences are research disciplines in which a large amount of research data is generated and in which the principles of FAIR and open data are now receiving considerable attention.

Both FAIR and open data aim to enable and enhance the reusability of data, but before research data can be made available for broad reuse, it is essential to clarify rights and permissions: who is authorized to share the data and with whom, who may publish it, how credit for data-related work will be attributed, and what arrangements apply if a researcher transfers to another institution.

Concrete regulation of usage rights for research data continues to pose major challenges for researchers and research institutions alike. There are legal uncertainties due to room for interpretation in the general legal requirements, and in many cases, there are no systematised workflows for defining usage rights. To close this gap, a working group at the Department of Earth Sciences at Freie Universität Berlin has developed and implemented a ‘Data Agreement’ that provides clarity on the exercise of usage rights to research data within the group (for students and researchers) and also helps to operationalise FAIR and CARE principles in everyday research practice.

The ‘Data Agreements’ are used as an opportunity to discuss expectations regarding data management and to define and agree on binding rights of use for research data with each new member of the group and for students’ thesis projects. We present the key aspects of the ‘Data Agreements’ and report on practical experiences with their use. We show how they not only facilitate clear agreements but also prevent subsequent disagreements. In addition to legal aspects, practical matters such as backup strategies and storage locations can also be specified within this process, thus improving data management practice within the group.

The ‘Data Agreements’ [1] were developed in the working group together with the Research Data Management team and the university's legal office and are available under CC0 for reuse in other research groups or institutions. While the agreements were developed within a university context and relate to German academic practice and law, they may be reused or serve as templates for other research institutions, in other national or international contexts, and over a wide variety of Earth, Space, and Environmental Sciences disciplines and beyond.

[1] http://dx.doi.org/10.17169/refubium-46356

How to cite: Piquer-Rodríguez, M., Asef, E., Reitzug, S., and Hübner, A.: Using “Data Agreements” in universities to clarify research data rights of use, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5023, https://doi.org/10.5194/egusphere-egu26-5023, 2026.

16:40–16:50
|
EGU26-21754
|
On-site presentation
Kaylin Bugbee, Deborah Smith, Emma Koontz, Rhea Bridgeland, Emily Foshee, Jaclyn Stursma, Dhanur Sharma, Rishab Dey, and Fred Kepner

Effective data governance requires a collective approach rather than isolated efforts. To achieve this, the NASA Science Mission Directorate (SMD) governance team—part of the Data and Analysis Services Project (DASP)—is implementing a strategy to support the Chief Science Data Officer’s vision for interdisciplinary, interoperable open science. The DASP governance team focuses on several key functions. First, the governance team has developed a framework to create governance and guidance for the data, information, and software used across the SMD community to ensure compliance with agency and government policies. The current governance model employs a rapid-response approach, using focused initiatives to identify high-priority needs and develop practical solutions. Second, the DASP governance team works to streamline operations and reduce friction for scientists and data stewards by utilizing automation and targeted training. Third, the DASP governance team is building a robust community of data repositories to empower open science and foster collaboration between divisions. To enhance these efforts, DASP has launched a centralized online hub designed to strengthen connections between SMD data stewards. This centralized platform allows for governance initiative reviews, community updates and sharing of relevant resources. This presentation will share the high-level SMD governance process, the development of the centralized governance community platform, and lessons learned from the first initiatives developed via the governance process. 

How to cite: Bugbee, K., Smith, D., Koontz, E., Bridgeland, R., Foshee, E., Stursma, J., Sharma, D., Dey, R., and Kepner, F.: Collaborative Governance Solutions for NASA Science Mission Directorate (SMD) Data, Information, and Software, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21754, https://doi.org/10.5194/egusphere-egu26-21754, 2026.

16:50–17:00
|
EGU26-6870
|
ECS
|
On-site presentation
Daniel Nüst, Anne Sennhenn, Jörg Seegert, Andreas Hübner, Khabat Vahabi, Stephan Hachinger, Markus Möller, Carsten Hoffmann, Lars Bernard, James M. Anderson, Sarah Fischer, Markus Reichstein, Mélanie Weynants, Carsten Keßler, Katharina Koch, Klaus-Peter Wenz, Nicole van Dam, and Babette Regierer

Many research communities and disciplines are undergoing a transformation towards promoting, facilitating, and recognising FAIRness (Wilkinson et al., 2016; https://doi.org/10.1038/sdata.2016.18) and Openness in Research Data Management (RDM) practices. These transformations require buy-in from stakeholders at multiple levels and warrant many conversations between all roles to be sustainable. One approach to facilitating and documenting this stakeholder ownership is the use of so-called commitments, where public endorsements by individuals or organisations serve as a driver to normalize desirable practices and offerings. Commitments can establish a community norm, whose practices may eventually turn into standards, requirements, and guarantees.

The Earth System Sciences (ESS) consortium of the German Research Data Infrastructure (NFDI) programme, NFDI4Earth (https://nfdi4earth.de/), and the NFDI consortium for the agrosystems research community, FAIRagro (https://fairagro.net), take deliberate steps to initialize cultural change in the form of commitments. The NFDI4Earth and FAIRagro FAIRness and Openness Commitments (https://doi.org/10.5281/zenodo.10123880, published in September 2024; https://doi.org/10.5281/zenodo.14925202 from February 2025) help to start conversations about changing the way that research data is collected, created, published, used, and recognised, and request institutions to engage in the implementation and operation of FAIR RDM and related services. The signature of members and representatives of the respective communities signals agreement with the goals and values of the Commitments and with the consortia's missions, products, and services. The signatories build a community of practice that takes into account diverse expertise, roles, and user groups for a sustainable shift towards more and diversified FAIR research outputs, and increasing adoption of Open Science and Open Research principles and practices.

The Commitments consist of two matching main statements and twelve supporting statements. The main statements are: (1) We commit to advance FAIRness and Openness in Earth System Science/Agricultural Sciences and beyond. (2) We value data infrastructures and data experts. The supporting statements concretise the engagement and give starting points for the implementation. Changes in the supporting statements enabled FAIRagro to incorporate community-specific aspects in its adoption of the NFDI4Earth Commitment. The NFDI4Earth and FAIRagro Commitments have 8 and 7 institutional signatories and 70 and 54 group or individual signatories, respectively (https://nfdi4earth.de/commitment, https://fairagro.net/en/commitment/).

In this work, we present the two Commitments and recap the process for their creation (cf. https://doi.org/10.5194/egusphere-egu23-14456), their differences, and lessons learned. We report on the interactions sparked by the Commitments with community stakeholders. We focus on the role of organisations and groups, because they are crucial to implement cultural change: they can set requirements, provide incentives for their members, and match these with supporting services and infrastructures. Specifically, we report from an exchange of experiences between representatives of institutional and group signatories from a workshop that connected institutions, created a space for open exchange, and laid a foundation for generalisable approaches.

How to cite: Nüst, D., Sennhenn, A., Seegert, J., Hübner, A., Vahabi, K., Hachinger, S., Möller, M., Hoffmann, C., Bernard, L., Anderson, J. M., Fischer, S., Reichstein, M., Weynants, M., Keßler, C., Koch, K., Wenz, K.-P., van Dam, N., and Regierer, B.: FAIRness and Openness Commitments as a catalyst for cultural change in research organisations, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-6870, https://doi.org/10.5194/egusphere-egu26-6870, 2026.

17:00–17:10
|
EGU26-14101
|
ECS
|
On-site presentation
Sarvenaz Ghafourian, Sean Tippett, and Chantel Ridsdale

Indigenous Data Sovereignty reflects the inherent rights of Indigenous Peoples to govern data relating to their communities, lands, and knowledge, while Indigenous Data Governance concerns how these rights are enacted within data systems. Translating this into practice within large-scale environmental data infrastructures remains a challenge.

Ocean Networks Canada (ONC) hosts long-term, near real-time coastal and oceanographic datasets that are widely reused across research, operational, and increasingly automated and machine-assisted workflows. In this context, ensuring that Indigenous governance expectations are clearly communicated and respected throughout the data lifecycle is critical. This work presents ONC’s ongoing efforts to implement Local Contexts Traditional Knowledge and Biocultural Labels and Notices as part of its research data management infrastructure, bridging ethical principles with operational practice.

We describe how Local Contexts information is being integrated into ONC’s metadata profiles, dataset landing pages, and persistent identifier workflows using established standards such as ISO 19115 and DataCite, making the metadata human- and machine-readable. This approach ensures that governance signals, including community-defined use expectations and restrictions, remain visible and interpretable to both human and machine users as data moves through downstream discovery platforms and reuse pathways.
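A hypothetical sketch of the kind of metadata integration described: a DataCite-style record carrying a Local Contexts Notice in its rights list so the governance signal remains machine-readable downstream. The record structure, notice name, and URI are placeholders, not ONC's actual implementation.

```python
# Hypothetical DataCite-style record carrying a Local Contexts Notice.
# Placing the notice in the rights list keeps the governance signal
# human- and machine-readable as the record propagates to discovery platforms.
def add_local_contexts_notice(record: dict, notice_name: str, notice_uri: str) -> dict:
    record.setdefault("rightsList", []).append({
        "rights": notice_name,
        "rightsUri": notice_uri,
    })
    return record

metadata = {
    "titles": [{"title": "Example coastal observatory dataset"}],
    "publisher": "Ocean Networks Canada",
}

add_local_contexts_notice(
    metadata,
    notice_name="Traditional Knowledge (TK) Notice",
    notice_uri="https://localcontexts.org/notice/tk-notice/",  # placeholder URI
)
```

Downstream harvesters that already parse rights information would then surface the notice alongside licence statements without any bespoke handling.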

This work is being undertaken as a pilot project and proof of concept, using ONC-owned datasets within the Local Contexts Test Hub. Due to capacity constraints faced by many Indigenous communities, full implementation with community-generated labels is not yet in place. Instead, this pilot allows ONC to explore technical integration pathways, identify challenges related to metadata standardization and machine-readability, and develop documentation, guidance, and technical support in advance. This approach is intentionally designed to ensure that, when communities are ready to engage, they are provided with clear resources and meaningful options for participation without undue technical burden.

This case study demonstrates how Indigenous Data Sovereignty can be meaningfully embedded into existing Earth science data infrastructures without compromising FAIR principles or interoperability. By operationalizing CARE-aligned governance within metadata and identifier systems, this work offers a practical, scalable model for repositories seeking to support ethical, transparent, and community-centred data reuse in the Earth and environmental sciences.

How to cite: Ghafourian, S., Tippett, S., and Ridsdale, C.: Embedding Indigenous Data Governance in Research Data Infrastructures through Local Contexts, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14101, https://doi.org/10.5194/egusphere-egu26-14101, 2026.

17:10–17:20
|
EGU26-22747
|
On-site presentation
Anna Kelbert, Alberto Accomazzi, Edwin Henneken, Kelly Lockhart, Jennifer Bartlett, and Michael Kurtz
The NASA-funded Science Explorer (SciX) is an open, curated information discovery platform for Earth and space science providing trusted access to interdisciplinary scientific resources. Developed as an extension of the Astrophysics Data System (ADS), a cornerstone of scholarly communication in astrophysics, SciX is designed to serve a broader scientific community, with a strong focus on supporting Earth science research, applications, and societal decision-making.
 
At the heart of SciX is a carefully curated database, where all indexed content (literature, datasets, and software) is sourced from reputable, authoritative providers. This ensures that users engage only with credible scientific information, making SciX a trusted environment for discovery and decision support. The system integrates peer-reviewed research, preprints, conference and meeting abstracts, funded projects, mission and archival datasets, and software tools across domains, fostering connections between Earth and space sciences. This multidisciplinarity is essential for addressing complex societal challenges such as climate adaptation and disaster resilience, as well as larger research questions such as the origin of the solar system and the presence of life in the universe. The key ingredient that SciX provides is a unified and precise full-text search across these curated resources. We discuss our efforts to enrich these resources with common disciplinary and cross-disciplinary controlled vocabularies to enhance findability and cross-disciplinary dialogue.
 
We also discuss our efforts to build a knowledge graph at SciX that connects the literature and the data and software resources, exposing the use of data and software in research and tracking the impact of these resources. In doing so, we hope to facilitate a cultural shift in the Earth and space science communities to streamline adoption of data and software citations, and to better align academic incentives with FAIR practices that have broad societal impact, such as metadata transparency, and resource accessibility and reuse.

How to cite: Kelbert, A., Accomazzi, A., Henneken, E., Lockhart, K., Bartlett, J., and Kurtz, M.: Fostering cross-disciplinary dialogue and credit attribution practices through Science Explorer, a digital library that tracks impact of literature, software and data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-22747, https://doi.org/10.5194/egusphere-egu26-22747, 2026.

17:20–17:30
|
EGU26-10344
|
On-site presentation
Otto Lange, Laurens Samshuijzen, Enoc Martinez, Javier Quinteros, Helle Pedersen, Angelo Strollo, Carine Bruyninx, Florian Haslinger, Marc Urvois, Laurentiu Danciu, and Anna Miglio

The Geo-INQUIRE* project is an initiative in which, in a cross-domain setting, the European ESFRI landmark environmental research infrastructures EPOS, EMSO, and ECCSEL, the Center of Excellence for Exascale Computing ChEESE, and the ARISE infrasound community exploit innovative techniques to meet their FAIR data ambitions. At EGU25 we informed the audience about the project’s data management objectives and the strategies applied to translate the abstract concept of FAIRness into practices that could be widely adopted in a large, heterogeneous landscape of data producers. Specifically, we demonstrated how we established a pipeline for the assessment of levels of FAIRness by integrating the F-UJI tool. This Geo-INQUIRE FAIRness Assessment Pipeline (GiFAP) has now been in use for about two years, during which it has proven to be a valuable instrument for the ongoing evaluation of the FAIRness of multiple datasets over time. However, interpreting and comparing snapshots of the resulting score collections is by no means trivial and must be managed and communicated with care.

Because integrating an assessment tool like F-UJI means adopting a solution that is itself under active development, which can hinder the reproducibility of outcomes, special care must be taken with the versions used of both the tool itself and the underlying metrics framework. It is also essential to understand how choices made during repeated assessments over time affect the FAIR scores and their subsequent interpretation. The practical use of the overall pipeline as a tool to guide improvements in the FAIRness of data, mainly by adapting and improving the metadata, has revealed valuable insights into the subtleties of applying the FAIR data concept in different communities and to different data types.
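The versioning concern can be made concrete with a small sketch: each assessment snapshot is stored together with the tool and metrics-framework versions that produced it, and scores are only treated as comparable when those versions match. The record structure and version strings below are illustrative, not F-UJI's actual output format.

```python
from dataclasses import dataclass

@dataclass
class Assessment:
    dataset_pid: str
    fair_score: float     # e.g. an overall percentage from the assessment tool
    tool_version: str     # version of the assessment tool (e.g. F-UJI)
    metrics_version: str  # version of the underlying metrics framework

def comparable(a: Assessment, b: Assessment) -> bool:
    """Scores are only meaningfully comparable when both the tool and
    the metrics framework versions are identical."""
    return (a.tool_version == b.tool_version
            and a.metrics_version == b.metrics_version)

before = Assessment("doi:10.1234/example", 62.5, "fuji-3.0", "metrics-0.5")
after = Assessment("doi:10.1234/example", 71.0, "fuji-3.0", "metrics-0.5")
drifted = Assessment("doi:10.1234/example", 71.0, "fuji-3.2", "metrics-0.7")

assert comparable(before, after)        # same versions: delta reflects metadata changes
assert not comparable(before, drifted)  # version drift: delta may be an artefact
```

Pinning versions this way distinguishes genuine metadata improvements from score changes caused by updates to the tool or its metrics.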

As an important real-world example of applying the FAIR concept in a complex, dynamic data-lifecycle setting, we will explain how we technically integrated the F-UJI instrument into the existing infrastructure. A special focus will be placed on possible pitfalls, and their solutions, regarding the versioning issues that naturally arise when comparisons are made over a longer period of time. We will discuss the importance of managing expectations, the dependency on data managers, and the interplay with applications for long-tail researchers, and explain how we addressed these within the project. Finally, we will explain how the Geo-INQUIRE solution could be adopted for comparable scenarios.

* Geo-INQUIRE is funded by the European Union (GA 101058518)



How to cite: Lange, O., Samshuijzen, L., Martinez, E., Quinteros, J., Pedersen, H., Strollo, A., Bruyninx, C., Haslinger, F., Urvois, M., Danciu, L., and Miglio, A.: FAIR Assessment in Geo-INQUIRE: Lessons Learned from Two Years of Experience, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-10344, https://doi.org/10.5194/egusphere-egu26-10344, 2026.

17:30–17:40
|
EGU26-13109
|
ECS
|
Virtual presentation
Barbara Riedler, Sophia Klaußner, Stefan Lang, and Khizer Zakir

The increasing availability of spatial data, coupled with the utilization of artificial intelligence, makes it essential to focus on the evaluation of data quality. At the same time, the fragmentation of existing quality frameworks hinders the attainment of comparable assessment results. We introduce a novel, modular framework for the evaluation of geospatial data quality with particular emphasis on FAIRness, transferability, reusability, and spatial consistency. The framework thereby accommodates data of differing processing levels, types, and contexts. The hierarchical structure integrates common quality dimensions (e.g., completeness, accuracy, consistency) with new dimensions emphasizing upstream validity (metadata, traceability of input data, reproducibility) and downstream usability (applicability, transferability). Additionally, the framework enables the evaluation of two interlinked concepts: general data quality (DQ) and data adequacy (DA). The latter incorporates the relevance of data and the fit to use-case-specific requirements. DQ and DA are measured through a combination of machine-evaluable metrics and structured expert judgment, aggregated as indicators at the dimension and domain levels. The assessment protocol is implemented in the form of a spreadsheet and a web-based survey tool. The overall objectives of this development are (1) to harmonize existing quality concepts to facilitate cross-disciplinary data integration; (2) to support data selection processes in geospatial applications that involve multiple data sources and/or time-critical situations, through the reusability of evaluation results; and (3) to promote reflective data usage and integration into operational workflows through the consideration of spatial uncertainties and the implementation of aspects of FAIRness.
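The aggregation of metric scores into dimension- and domain-level indicators can be sketched as follows. The dimension names and scores are illustrative, and the sketch uses simple unweighted means, whereas the framework itself combines machine-evaluable metrics with structured expert judgment.

```python
def aggregate(metrics_by_dimension: dict) -> tuple:
    """Aggregate metric scores (0 to 1) into per-dimension indicators,
    then into a single domain-level indicator (unweighted means)."""
    dimensions = {
        dim: sum(scores) / len(scores)
        for dim, scores in metrics_by_dimension.items()
    }
    domain = sum(dimensions.values()) / len(dimensions)
    return dimensions, domain

# Illustrative scores for three common quality dimensions
dims, domain = aggregate({
    "completeness": [1.0, 0.8],
    "accuracy": [0.9],
    "consistency": [0.7, 0.9],
})
```

Keeping per-dimension indicators alongside the domain score lets downstream users judge fitness for their own use case rather than relying on a single number.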

How to cite: Riedler, B., Klaußner, S., Lang, S., and Zakir, K.: A harmonized, modular data quality framework facilitating cross-disciplinary usage and time-efficient evaluation of geospatial data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13109, https://doi.org/10.5194/egusphere-egu26-13109, 2026.

17:40–17:50
|
EGU26-16876
|
On-site presentation
Fernando Aguilar Gómez, Daniel García Díaz, Antonio López, Aina García-Espriu, and Cristina González-Haro

The Geospatial Open Science Yielding Applications (GOYAS) project, developed under the Horizon Europe OSCARS framework, demonstrates a comprehensive pathway from FAIR principles to operational practice for Earth observation (EO) data products. While the FAIR principles (Findable, Accessible, Interoperable, Reusable) are widely endorsed by research data communities, translating them into reproducible and scalable workflows across heterogeneous data providers remains challenging. This contribution presents concrete results and lessons learned from the GOYAS project, which has developed and implemented a FAIR-by-design system that supports community adoption and cross-disciplinary data reuse.

At its core, GOYAS comprises a set of customized software components, including an automated data production pipeline, a georeferenced data repository, and an OGC-standard API endpoint. The data ingestion pipeline integrates automation that reduces the initial effort required from data producers to generate FAIR data, by automatically producing standardized metadata, provenance information, and quality metrics as a by-product of routine processing. This approach enables transparency, consistency, and long-term reuse across all stages of the data lifecycle. To enforce the “F” of Findability, persistent identifiers (PIDs) are minted for mature data products using EOSC-Beyond services, ensuring persistent, machine-actionable references and reliable data product traceability.

A key outcome of GOYAS is the implementation of a validation framework that acts as a prerequisite for the publication of final data products, whereby persistent identifiers are assigned only to validated outputs. Each product undergoes:

  • Metadata standard validation, ensuring compliance with agreed schemas and machine-readability requirements (ISO 19139);

  • INSPIRE alignment, verifying that spatial data components meet European geospatial interoperability standards;

  • FAIRness evaluation using FAIR EVA (Evaluator, Validator and Advisor), assessing the degree to which products comply with FAIR principles through automated tests.

Only when all validation checks are successfully passed is a product considered mature for publication and assigned a persistent identifier (PID), thereby guaranteeing discoverability and long-term referenceability within EOSC and beyond.
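The gating rule (a PID is assigned only to validated outputs) can be sketched as an all-checks-pass predicate. The check functions below are stubs standing in for the actual ISO 19139 schema validation, INSPIRE verification, and FAIR EVA tests; the field names and threshold value are invented for illustration.

```python
from typing import Callable, List

def ready_for_pid(product: dict, checks: List[Callable[[dict], bool]]) -> bool:
    """A PID is minted only when every validation check passes."""
    return all(check(product) for check in checks)

# Stub checks standing in for the real validators
def metadata_schema_valid(p): return p.get("schema") == "ISO 19139"
def inspire_aligned(p):       return p.get("inspire", False)
def fair_eva_passed(p):       return p.get("fair_score", 0) >= 0.75  # illustrative threshold

checks = [metadata_schema_valid, inspire_aligned, fair_eva_passed]
product = {"schema": "ISO 19139", "inspire": True, "fair_score": 0.82}

if ready_for_pid(product, checks):
    pid = "10.XXXX/placeholder"  # PID minting would happen here (placeholder)
```

Treating validation as a hard precondition for PID minting guarantees that anything discoverable by identifier has already passed the full check suite.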

We discuss how FAIR-by-design principles were embedded at key architectural layers, including metadata generation, PID minting, and automated quality assessment, and how these design choices support not only technical interoperability but also community adoption. Lessons learned highlight the importance of early integration of FAIR requirements into workflow design, the practical challenges of harmonizing cross-domain standards (FAIR and INSPIRE), and the role of automation in enabling scalable FAIR implementations without imposing additional effort on data producers.

By providing a documented and operational model that combines FAIR principles, persistent identification, standards compliance, and automated validation, GOYAS advances the practical implementation of FAIR and open data management in environmental sciences and offers transferable insights for related research communities.

How to cite: Aguilar Gómez, F., García Díaz, D., López, A., García-Espriu, A., and González-Haro, C.: What happens when FAIR is built in from the start? Insights from the GOYAS Project, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16876, https://doi.org/10.5194/egusphere-egu26-16876, 2026.

17:50–18:00
|
EGU26-18884
|
On-site presentation
Kelsey Druken, Joshua Torrance, Romain Beucher, Martin Dix, Aidan Heerdegen, Paige Martin, Charles Turner, and Spencer Wong

Making research data Findable, Accessible, Interoperable and Reusable (FAIR) is now widely recognised as essential for open and reproducible science. In practice, however, translating FAIR principles into everyday data management remains challenging, particularly in climate modelling, which involves large data volumes and complex software and data environments on high-performance computing (HPC) platforms. Research rarely follows a simple path from data generation to publication, and FAIR is still often treated as a final, optional step rather than as a set of practices embedded and maintained throughout scientific workflows. 

We present a case study from Australia’s Climate Simulator (ACCESS-NRI) that examines how FAIR principles can be advanced through two complementary approaches applied in parallel. One focuses on the social and practical aspects of FAIR, supporting researchers to apply FAIR practices as part of their everyday research activities. The other centres on embedding FAIR directly into tools and processes, thereby reducing reliance on manual effort and helping to minimise the errors and inconsistencies that naturally arise in complex, collaborative environments. 

Through an open, merit-allocation based approach, ACCESS-NRI provides multiple data sharing pathways, from shorter-term spaces that support active development and collaboration to more curated, publication-ready datasets for longer-term access. This staged model supports the progressive application and uplift of FAIR practices as data are generated, shared, and refined over time, substantially streamlining later curation. Alongside this, we have also focused on improving the consistency and standardisation of ACCESS model outputs by embedding established community conventions and defined data specifications directly in the ACCESS software and release processes. This helps reduce variation across model outputs, supports reuse across tools and researchers, and shifts FAIR from a largely manual effort towards standard practice. 

This case study demonstrates how FAIR principles can be advanced through practical, community-aligned approaches that fit within real research contexts. For ACCESS-NRI, these efforts provide a foundation for tackling deeper FAIR data challenges, with lessons that are relevant to other Earth and environmental science domains facing similar constraints. 

How to cite: Druken, K., Torrance, J., Beucher, R., Dix, M., Heerdegen, A., Martin, P., Turner, C., and Wong, S.: Scaling FAIR Data Practices in Climate Modelling , EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18884, https://doi.org/10.5194/egusphere-egu26-18884, 2026.

Posters on site: Fri, 8 May, 16:15–18:00 | Hall X4

The posters scheduled for on-site presentation are only visible in the poster hall in Vienna. If authors uploaded their presentation files, these files are linked from the abstracts below.
Display time: Fri, 8 May, 14:00–18:00
Chairpersons: Alice Fremand, Lesley Wyborn, Marco Kulüke
X4.85
|
EGU26-22121
Donna Scott, Siri Jodha Singh Khalsa, Shannon Leslie, Amanda Leon, Amy Steiker, and Ann Windnagel

Applying the FAIR (Findable, Accessible, Interoperable, and Reusable) principles to enable open and reproducible science is now a core goal across research communities. Yet, for well-established data centers and specialized domains, translating these principles into everyday, sustainable practice remains a significant challenge. Using the National Snow and Ice Data Center (NSIDC) as a case study—founded in 1976 as the World Data Center for Glaciology—we examine how legacy data holdings, evolving research practices, and emerging standards converge in the pursuit of FAIR-aligned stewardship.

This presentation highlights both progress and hurdles in modernizing four decades of passive microwave snow and ice data records from the SMMR, SSM/I, and SSMIS sensors managed by the NSIDC Distributed Active Archive Center (DAAC) and NOAA@NSIDC data programs. Many of these data products predate mature metadata, provenance, and interoperability standards and were originally distributed in basic binary formats with limited documentation and access options. We describe efforts to migrate these legacy products to self-describing formats, enhance provenance, improve transparency, broaden accessibility and services, and align repository operations with contemporary expectations for FAIR and Open Science.

Equally important are the cultural and organizational shifts needed to foster engagement among researchers, data producers, and data managers in adopting and refining best practices that serve the cryospheric community’s specific needs. We share strategies for balancing standardization with domain-specific requirements, and reflect on how lessons learned from cryospheric data stewardship may inform broader FAIR implementation across the Earth sciences. By sharing these experiences, we hope to contribute to interdisciplinary dialogue on building sustainable, community-driven data ecosystems that support open and reproducible scientific research.

How to cite: Scott, D., Khalsa, S. J. S., Leslie, S., Leon, A., Steiker, A., and Windnagel, A.: Translating FAIR Principles into Practice: Lessons from Four Decades of Cryospheric Data Stewardship, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-22121, https://doi.org/10.5194/egusphere-egu26-22121, 2026.

X4.86
|
EGU26-1564
Alice Fremand, Jens Klump, Sarah Manthorpe, Mari Whitelaw, France Gerard, Wendy Garland, Charles George, and Thabo Semong

The use of Remotely Piloted Aerial Systems (RPAS), also referred to as Uncrewed Aerial Vehicles (UAVs) and more generally as drones, is increasingly prevalent across scientific disciplines, enabling the collection of large volumes of data for diverse research applications. These technologies are revolutionising data collection by offering higher temporal and spatial resolutions and by enabling data collection in hazardous and inaccessible areas. However, the volume of data generated and the absence of standardised workflows to document operations and data processing often complicate data sharing and publication.

As part of the Research Data Alliance (RDA) Small Uncrewed Aircraft and Autonomous Platforms Data Working Group, we have developed guidelines on how best to improve the Findability, Accessibility, Interoperability and Reusability (FAIR, Wilkinson et al. 2016) of these data and processing workflows. The working group compiled use cases showcasing RPAS applications across various research disciplines, documenting best practices and identifying gaps and challenges researchers face when handling their RPAS-derived data. We paid specific attention to legal, privacy and ethical considerations. Drawing on these insights, the group has now developed guidelines and recommendations to improve RPAS data management throughout the research life cycle, from mission planning to data publication and archiving, linking to existing resources and examples from the scientific community.

How to cite: Fremand, A., Klump, J., Manthorpe, S., Whitelaw, M., Gerard, F., Garland, W., George, C., and Semong, T.: Managing your drone data through the data life cycle: RDA guidelines for FAIR and responsible UAV Use, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-1564, https://doi.org/10.5194/egusphere-egu26-1564, 2026.

X4.87
|
EGU26-4891
|
ECS
Akash Koppa, Son Pham-Ba, Felix Bauer, Olivier Bonte, Oscar Baez-Villanueva, Reda El Ghawi, Alexander Winkler, Diego G. Miralles, Fabrizio Fenicia, Charlotte Gisèle Weil, and Sara Bonetti

Hybrid modeling, which integrates physics-based and machine learning (ML) components, is a growing research area in hydrology and the broader Earth science community. By combining the interpretability of process-based models with the predictive power of data-driven algorithms, these hybrid architectures offer improved accuracy and representation of complex environmental processes. However, their adoption is currently constrained by significant challenges regarding the FAIR principles (Findable, Accessible, Interoperable, Reusable). Unlike traditional physics-based models, the reusability of hybrid systems is frequently hindered by the dynamic nature of ML components, which are inextricably linked to specific training datasets and hyperparameter configurations. Furthermore, existing data and model repositories are rarely designed to host such models.

To address these systemic barriers, we collaboratively designed and implemented a standardized FAIR protocol specifically tailored for hydrological hybrid models. This framework, termed FRAME, consists of three critical components: (a) a set of interoperability coding standards for the physics and ML modules, (b) a unified metadata specification that captures the disparate requirements of both physics-based parameters and ML architectures, and (c) a specialized online repository designed for the persistent hosting and sharing of integrated hybrid assets. To facilitate user adoption, we developed an associated command line interface (CLI) for automated retrieval and setup of these models. To ensure the long-term impact and scalability of this protocol, we are actively soliciting participation from the global hydrologic modeling community. By establishing a community-driven standard, this protocol aims to provide a robust foundation for the transparent, reproducible, and collaborative advancement of hybrid modeling in hydrology.
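A unified metadata record of the kind component (b) describes might look like the sketch below. All field names and values here are illustrative assumptions, not the published FRAME schema: the point is that one record must describe both the physics parameters and the ML component's training provenance before a hybrid model can be reused.

```python
# Hypothetical unified hybrid-model metadata record (field names illustrative).
record = {
    "model": "example-hybrid-hydro",
    "physics": {
        "routing": "muskingum",
        "parameters": {"k_hours": 12.0, "x": 0.2},
    },
    "ml": {
        "architecture": "lstm",
        "hyperparameters": {"hidden_size": 128, "learning_rate": 1e-3},
        # ML reusability hinges on pinning the exact training data and weights
        "training_data": {"doi": "10.0000/example", "period": "1990-2010"},
        "weights_checksum": "sha256:...",
    },
}

def validate(rec):
    """A hybrid record is reusable only if both components are fully described."""
    return all(k in rec for k in ("physics", "ml")) and "training_data" in rec["ml"]
```

A repository enforcing a check like `validate` at deposit time would prevent exactly the reusability gap the abstract identifies: ML components separated from their training datasets and hyperparameter configurations.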

How to cite: Koppa, A., Pham-Ba, S., Bauer, F., Bonte, O., Baez-Villanueva, O., El Ghawi, R., Winkler, A., G. Miralles, D., Fenicia, F., Gisèle Weil, C., and Bonetti, S.: A FAIR Protocol for Hybrid Models and Data in Hydrology, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-4891, https://doi.org/10.5194/egusphere-egu26-4891, 2026.

X4.88
|
EGU26-5326
Bernadette Ni Chonghaile, Dick Schaap, and Aodhan Fitzgerald

AQUARIUS is an ongoing Horizon Europe funded project. Through Transnational Access (TA) calls, it makes available an impressive range of 57 research infrastructure services, including research vessels, mobile marine observation platforms, fixed marine facilities, experimental research facilities, river & basin supersites, aircraft, drones, satellite services, and sophisticated data infrastructures.

As a result of the TA projects, many new data sets in a large variety of data types are being collected by TA teams, using and combining multiple and different observation installations. A major aim of AQUARIUS is supporting the EU Mission to Restore our Ocean and Waters by 2030, and other marine initiatives, including contributing to the European Digital Twin of the Ocean and the UN Decade of Ocean Science.

There is a strong effort in AQUARIUS to get the maximum return on investment from the TA activities. An open data policy has been adopted and implemented with a dedicated Data Management approach to ensure that all gathered metadata and data are managed in line with the FAIR principles. These should become part of the repositories managed and operated by leading European data management infrastructures, such as SeaDataNet, EurOBIS, ELIXIR-ENA, ICOS-Ocean, and Copernicus INSTAC, for quality assurance, long-term stewardship, and wide access and use. These infrastructures in turn feed into EMODnet, Copernicus Marine, Blue-Cloud (EOSC), and Digital Twin of the Ocean (DTO) developments, and globally into, e.g., GEOSS and the UN-IOC Ocean Decade programme.

To achieve the best results, the TA scientific teams are supported by data centres experienced in marine data management and well connected to the European data management infrastructures; most are National Oceanographic Data Centres (NODCs). They train and coach the TA teams through the AQUARIUS data management flow scheme, which covers steps from planning through training and deployment to publishing, supported by a number of instruments. One of these is the AQUARIUS TA Data Summary Log App, used by PIs of TA projects to keep an overview and index of data collection events. It produces a list that tells the data centres what data to expect, from where and from whom, and serves as a checklist for the next steps. The AQUARIUS TA Data Summary Log contains only metadata, no data. As a follow-up, the TA teams and assigned data centres will work on elaborating the collected data to prevailing standards and on their inclusion in the European repositories. That progress is made visible through the AQUARIUS Dataflow Dashboard (ADD), integrated in the AQUARIUS website, which follows each awarded TA project from the planning stage through to publication of results. The ultimate goal is to provide discovery of and public access to the research data sets collected and processed, and the data products generated, by the TA research teams as part of the AQUARIUS TA projects.

The presentation will provide more background information on the AQUARIUS project and will highlight more details about the data management approach.

How to cite: Ni Chonghaile, B., Schaap, D., and Fitzgerald, A.: AQUARIUS, Integrating Research Infrastructures, Connecting Scientists, and Enabling Transnational Access for Healthy and Sustainable Marine and Freshwater Ecosystems, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5326, https://doi.org/10.5194/egusphere-egu26-5326, 2026.

X4.89
|
EGU26-5581
Dick M. A. Schaap, Serge Scory, Steven Piel, and Thierry Schmitt

SeaDataNet is a major pan-European infrastructure for managing and providing access to marine data sets, acquired by European organisations from research cruises and other observational activities in European coastal marine waters, regional seas and the global ocean. Founding partners are National Oceanographic Data Centres (NODCs), and major marine research institutes. The SeaDataNet network gradually expanded its network of data centres and infrastructure, during a series of dedicated EU RTD projects, and by engaging as core data management infrastructure and network in leading European Commission initiatives such as the European Marine Observation and Data network (EMODnet), Copernicus Marine Service (CMS), and the European Open Science Cloud (EOSC).

SeaDataNet develops, governs and promotes common standards, vocabularies, software tools, and services for marine data management, which are widely adopted. A core service is the CDI data discovery and access service which provides online unified discovery and access to vast resources of data sets, managed by 115+ connected SeaDataNet data centres from 34 countries around European seas, both from research and monitoring organisations. Currently, it gives access to more than 3 million data sets, originating from 1000+ organisations in Europe, covering physical, geological, chemical, biological and geophysical data, acquired in European waters and global oceans. Standard metadata and data formats are used, supported by an ever-increasing set of controlled vocabularies, resulting in rich and highly FAIR metadata and data sets. SeaDataNet provides core services in EMODnet Chemistry, Bathymetry, and Physics for bringing together and harmonizing large amounts of marine data sets, which are used by EMODnet groups for generating thematic data products.

EMODnet Bathymetry has been active since 2008 and maintains a Digital Terrain Model (DTM) for the European seas. This is published every two years, each time extending coverage and improving quality and precision. The DTMs are produced from surveys and aggregated data sets that are referenced with metadata via the SeaDataNet Catalogue services. Bathymetric survey data sets are gathered and populated by national hydrographic services, marine research institutes, and companies in the SeaDataNet CDI Data Discovery & Access service. Currently, this amounts to more than 45,000 datasets from 78 data providers. A major selection of these datasets was used to prepare the 2024 release of the EMODnet DTM for all European waters and the Caribbean, which has been published on the EMODnet portal. Work is currently ongoing on a new 2026 version.

The EMODnet DTM has a grid resolution of 1/16 * 1/16 arc minutes (circa 115 * 115 m), covering all European seas. It is based upon more than 22,000 in situ datasets. It can be downloaded in tiles and viewed as map layers in the EMODnet portal. The maps are derived from EMODnet Bathymetry OGC WMS, WMTS, and WFS services. The EMODnet Bathymetry products are very popular: in 2024–2025 more than 100,000 EMODnet DTM files were downloaded, and more than 60 million OGC service requests were registered over the two years. EMODnet Bathymetry also manages the European contribution to the international Seabed 2030 project.

How to cite: Schaap, D. M. A., Scory, S., Piel, S., and Schmitt, T.: SeaDataNet, pan-European infrastructure for marine and ocean data management and major pillar under EMODnet Bathymetry for generating the best Digital Bathymetry for European Seas, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5581, https://doi.org/10.5194/egusphere-egu26-5581, 2026.

X4.90
|
EGU26-7777
Thomas Saillour and Panagiotis Mavrogiorgos

Accurate tide gauge records are essential for coastal monitoring, sea level analysis, and the calibration and validation of numerical models. However, records from global sea level data providers such as the Intergovernmental Oceanographic Commission (IOC) [1] often contain inconsistencies related to vertical datums, step changes, sensor noise, and undocumented interventions, which limit their direct applicability for modelling and validation purposes.

We present ioc_cleanup (github.com/oceanmodeling/ioc_cleanup), an open-source Python repository designed to clean tide gauge time series using a reproducible and transparent workflow defined in structured JSON files. All transformations are traceable and version-controlled using Git, allowing for consistent quality control, peer review, and community-driven improvements. The framework explicitly addresses common data quality issues, including spikes, sensor noise, sensor replacement or substitution, and step changes, as well as the challenge of distinguishing bad data from genuine physical events such as storm-driven sea level extremes or tsunamis.
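A JSON-declared cleaning workflow of this kind can be sketched as follows. The schema, operation names, and threshold values below are hypothetical illustrations of the approach, not the repository's actual file format.

```python
import json
import numpy as np

# Hypothetical transformation spec in the spirit of ioc_cleanup's JSON workflow;
# every correction is declared in data, so it can be diffed and reviewed in Git.
SPEC = json.loads("""
{
  "station": "EXAMPLE",
  "transformations": [
    {"op": "drop_spikes", "threshold_m": 1.5},
    {"op": "remove_step", "start": 40, "offset_m": -0.30}
  ]
}
""")

def apply_spec(series, spec):
    """Apply each JSON-declared transformation in order; returns a cleaned copy."""
    out = np.asarray(series, dtype=float).copy()
    for t in spec["transformations"]:
        if t["op"] == "drop_spikes":
            # flag values deviating strongly from the series median as NaN
            med = np.nanmedian(out)
            out[np.abs(out - med) > t["threshold_m"]] = np.nan
        elif t["op"] == "remove_step":
            # subtract a documented datum shift from a known index onward
            out[t["start"]:] -= t["offset_m"]
    return out
```

Because the raw series is never edited in place, re-running `apply_spec` on the original data always reproduces the same cleaned product, which is what makes the curation auditable.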

The cleaned datasets have been used for the calibration and validation of a global barotropic model, revealing systematic data quality patterns across stations and regions. While the framework is applied here to sea level data, the methodology is provider-agnostic and applicable to other geophysical time series.

By formalising expert-driven flagging and corrections in a transparent manner, ioc_cleanup provides a foundation for future developments, including the potential use of machine learning techniques to assist data flagging, reduce operator subjectivity, and extend spatial and temporal coverage. The framework offers a scalable contribution to other datasets (such as GESLA4 [2]) and supports reproducible coastal data curation.

Citations:
[1] Flanders Marine Institute (VLIZ); Intergovernmental Oceanographic Commission (IOC) (2025): Sea level station monitoring facility. Accessed at https://www.ioc-sealevelmonitoring.org/ on 2025-12-15 at VLIZ. DOI: 10.14284/482

[2] Haigh, I.D., Marcos, M., Talke, S.A., Woodworth, P.L., Hunter, J.R. & Hague, B.S. et al. (2023) GESLA Version 3: A major update to the global higher-frequency sea-level dataset. Geoscience Data Journal, 10, 293–314. Available from: https://doi.org/10.1002/gdj3.174

How to cite: Saillour, T. and Mavrogiorgos, P.: Reproducible, transparent and traceable cleaning of IOC Tide Gauge Data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7777, https://doi.org/10.5194/egusphere-egu26-7777, 2026.

X4.91
|
EGU26-21811
|
ECS
Julie Auerbach and Quentin Crowley

Making marine geospatial data Findable, Accessible, Interoperable, and Reusable (FAIR) remains challenging for researchers and policy implementors, particularly in integrating geological and biological datasets for Special Areas of Conservation (SACs) management. This contribution shares experiences developing domain-specific FAIR workflows for west coast Ireland SACs (Porcupine Seabight, Belgica Mound, Inisheer Island), harmonizing INFOMAR multibeam data, EMODnet Geology, OBIS biodiversity, and Copernicus currents via the European Digital Twin Ocean (EDITO) and Destination Earth (DestinE) platforms (and others).

Seabed integrity metrics (e.g., Bedrock Suitability Index information) and substrate maps (85% accuracy, Random Forest classification) will be processed on available platforms, e.g., EDITO and DestinE HPC, post-QC for valid, best-possible geometries and INSPIRE compliance. Biodiversity connectivity matrices (previously published work and code from the coastalNet R package will be cited and explored), with pairwise probabilities (e.g., 0.35 Belgica-to-Porcupine), overlay oceanographic simulations (e.g., ESRI EMUs), deposited as interoperable WMS layers on Figshare DOIs with plain-language metadata and APIs.

Specific challenges include integrating "dark" datasets and bridging technical-policy gaps; solutions involved AI-driven summarization, automated versioning, and user-centric pilots (e.g., co-design workshops, tracking download rates, policy citations). Additional challenges include alignment with MSFD thresholds (>25% degraded seabeds) and OSPAR goals, which fostered adoption; sensitivity analyses (low BSI reduces connectivity by 20-40%) are potentially useful for informing trawling vignettes and conservation and restoration efforts (reefs at BSI > 0.7).

This approach respects ocean science needs while promoting cross-disciplinary understanding and reuse (e.g., hydrology via sediment mobility), demonstrating cultural shifts through stakeholder panels and GDPR-compliant training toolkits. Outcomes advance RDA ESES goals by scaling FAIR practices for real-time AI dashboards, inviting dialogue on community-driven refinement.

How to cite: Auerbach, J. and Crowley, Q.: FAIR Marine Data Workflows for Policy: Unifying Seabed Integrity and Connectivity in Irish SACs via EDITO and DestinE, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21811, https://doi.org/10.5194/egusphere-egu26-21811, 2026.

X4.92
|
EGU26-19834
Megan Anne French, Blakeman Samantha, Alessandra Giorgetti, Hans Mose Hansen, Marina Lipizer, Maria Eugenia Molina Jack, Gwenaelle Moncoiffe, Anna Osypchuk, and Matteo Vinci

The European Marine Observation and Data Network (EMODnet) was established in 2009 and is proposed as the European Commission (EC) in situ marine data service of the EC Directorate-General Maritime Affairs and Fisheries (DG MARE). EMODnet represents a network of organisations providing free access to European marine data available as interoperable data layers and data products for seven themes: Bathymetry, Geology, Physics, Chemistry, Biology, Seabed habitats, and Human activities. EMODnet Chemistry makes aggregated data collections and products available for contaminants, eutrophication, and marine litter following the Findable, Accessible, Interoperable, and Reusable (FAIR) principles (Wilkinson et al., 2016); for instance, the use of standardised vocabularies supports findability, interoperability, and reuse. EMODnet Chemistry uses the standardised, hierarchically mapped vocabularies of the Natural Environment Research Council (NERC) Vocabulary Server (NVS, managed by the British Oceanographic Data Centre (BODC)) for indexing and annotating meta(data). For example, the BODC Parameter Usage Vocabulary (P01, https://vocab.nerc.ac.uk/search_nvs/P01/) is used to describe variables by providing detailed information on the target chemical object (S27 vocabulary) or property and the matrix/medium including phase, while the SeaDataNet Parameter Discovery Vocabulary (P02, https://vocab.nerc.ac.uk/search_nvs/P02/) and EMODnet Chemistry chemical groups (P36, https://vocab.nerc.ac.uk/search_nvs/P36/) are used to group P01s. Recently, working group activities evaluated EMODnet Chemistry vocabulary issues and needs and proposed improvements; for example, deprecating and replacing the P36 for polychlorinated biphenyls with a new P36 for organohalogens. Thus, some new P36 vocabularies were created/deprecated and the names and definitions of other P36 chemical groups were revised for correctness and to ensure that lower-level vocabularies could be mapped. 
This work resolved numerous mapping issues for EMODnet Chemistry, allowing all chemical substances to be mapped, making more data findable and interoperable in EMODnet. It also increased alignment with the vocabularies of the International Council for the Exploration of the Sea (ICES). Overall, these efforts improve EU marine data management and support alignment with other EU frameworks.
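The hierarchical P01-to-P02-to-P36 mapping can be illustrated with a minimal sketch. The URI shapes follow the NVS collection pattern, but the specific term identifiers below are hypothetical placeholders, not real vocabulary entries.

```python
# Illustrative hierarchical vocabulary mapping (term IDs are hypothetical;
# only the URI shape follows the NVS collection pattern).
P01_TO_P02 = {
    "http://vocab.nerc.ac.uk/collection/P01/current/EXAMPLE01/":
        "http://vocab.nerc.ac.uk/collection/P02/current/EXAMPLEA/",
}
P02_TO_P36 = {
    "http://vocab.nerc.ac.uk/collection/P02/current/EXAMPLEA/":
        "http://vocab.nerc.ac.uk/collection/P36/current/EXAMPLEG/",
}

def chemical_group(p01_uri):
    """Walk P01 -> P02 -> P36 to find the chemical group for a variable.

    Returns None when a mapping is missing, which is exactly the gap the
    working group's revisions aimed to close: every P01 should resolve to
    a P36 group so that the data become findable by chemical family.
    """
    p02 = P01_TO_P02.get(p01_uri)
    return P02_TO_P36.get(p02) if p02 else None
```

A `None` result from a walk like this is how unmapped substances surface; deprecating and replacing P36 groups, as described above, removes those dead ends.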

Reference

Wilkinson et al., 2016. The FAIR Guiding Principles for scientific data management and stewardship. https://doi.org/10.1038/sdata.2016.18

How to cite: French, M. A., Samantha, B., Giorgetti, A., Hansen, H. M., Lipizer, M., Molina Jack, M. E., Moncoiffe, G., Osypchuk, A., and Vinci, M.: EMODnet Chemistry and FAIR principles; evaluating and updating vocabularies, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19834, https://doi.org/10.5194/egusphere-egu26-19834, 2026.

X4.93
|
EGU26-14565
Erwann Quimbert and the ODATIS team

ODATIS, the ocean data hub within France's Data Terra research infrastructure, demonstrates how systematic progression from assessment through certification to innovation translates FAIR principles into sustainable community practices. Through three interconnected initiatives, ODATIS provides a replicable model for implementing FAIR while respecting domain-specific requirements.

Infrastructure Foundation

ODATIS operates through ten specialized Data and Service Centers (DSC) serving 130+ French research entities in physical oceanography, biogeochemistry, coastal observations, seafloor mapping, and marine ecosystems. This territorial network connecting national research infrastructure with local researchers provides the organizational foundation for systematic FAIR adoption. Two platforms anchor the infrastructure: SEANOE, a certified repository providing DOIs and preservation, and Sextant, a geographic catalog implementing ISO 19115 and OGC standards.

Assessment: The COPILOTE Project

Before imposing solutions, ODATIS assessed current capabilities through COPILOTE using the FAIR Data Maturity Model (FDMM). Evaluations revealed heterogeneous maturity levels and identified barriers: insufficient metadata, limited controlled vocabularies, unclear licensing, and inadequate provenance tracking. Participatory assessment engaged data managers and researchers in structured dialogue, transforming abstract FAIR concepts into concrete criteria. COPILOTE produced tailored improvement roadmaps demonstrating how standardized frameworks can respect institutional diversity while driving collective progress.

Certification: CoreTrustSeal Achievement

Building on assessment findings, ODATIS DSC pursued CoreTrustSeal certification, documenting organizational infrastructure, digital object management, and preservation capabilities. Successfully certified repositories including SEANOE achieved formal recognition of their trustworthiness, providing researchers with confidence in long-term data preservation and accessibility.

Innovation: The SO'Odatis Project

Funded by France's National Fund for Open Science, SO'Odatis develops integrated services making FAIR intrinsic to workflows. Four initiatives include: launching a diamond open-access journal linking publications with datasets and software; extending Sextant to catalog software with DOIs and Software Heritage integration; developing automated data paper generation from metadata; implementing comprehensive training through the correspondent network.

Cross-Disciplinary Lessons

ODATIS's journey demonstrates critical principles. Assessment before intervention reveals actual barriers and capabilities, preventing misdirected effort. Formal certification embeds FAIR into organizational culture beyond projects. Sustainable adoption requires reducing researcher burden through automation and workflow integration, not adding compliance tasks. Territorial networks enable bidirectional knowledge flow between infrastructure and communities. Critically, FAIR implementation is iterative: each phase builds on previous achievements while identifying new opportunities.

ODATIS offers a concrete roadmap: rigorous assessment identifies gaps; certification drives organizational maturity; innovation develops enabling tools; community engagement ensures relevance. This progression provides a replicable model for infrastructures translating FAIR principles into community-supported practices across Earth and environmental sciences.

How to cite: Quimbert, E. and the ODATIS team: From Assessment to Action: ODATIS's Progressive Journey Toward FAIR Implementation in Ocean Sciences, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14565, https://doi.org/10.5194/egusphere-egu26-14565, 2026.

X4.94
|
EGU26-7107
|
ECS
Rowan Orlijan-Rhyne, Lukas Kluft, and Tobias Kölling

The Barbados Cloud Observatory (BCO), in continuous operation by the Max Planck Institute for Meteorology, has offered an extensive record of clouds in the trade wind region since its establishment in 2010. In the form of public, analysis-ready Zarr stores processed with automated workflows, the record can be studied at time scales from seconds to years and serves to drive theoretical and model advancements. As an important geoscientific research asset, data from the BCO is trustworthy, reproducible, and versioned, but also easily available.

BCO data processing employs automated workflows in Apache Airflow that append to Zarr stores whenever new data arrive. Managing dynamic, growing datasets, as opposed to static (e.g. campaign) datasets, permits many versions, all of which are accurate and can be regenerated automatically. In shepherding the data, we choose our own unique keys, including dataset version numbers, which make up an intake catalog. We also implement quality control of dataset metadata and encodings with in-house tools.
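The rolling-append and versioning pattern can be sketched in miniature. The class below is an illustrative stand-in, not the BCO codebase: it shows how idempotent daily appends produce a monotonically increasing version number that a catalog entry can cite.

```python
from datetime import date, timedelta

# Minimal sketch of the rolling-append pattern (names are illustrative):
# each day's processed chunk is appended exactly once, and every append
# bumps a dataset version that an intake-style catalog can reference.
class GrowingStore:
    def __init__(self):
        self.chunks = {}  # date -> processed data chunk
        self.version = 0

    def append(self, day, chunk):
        """Idempotent daily append; re-delivery of the same day is a no-op."""
        if day in self.chunks:
            return self.version
        self.chunks[day] = chunk
        self.version += 1  # each successful append is a citable state
        return self.version

    def as_of(self, version):
        """Reconstruct store contents at an earlier version
        (assumes chunks were appended in chronological order)."""
        days = sorted(self.chunks)[:version]
        return {d: self.chunks[d] for d in days}
```

The idempotent `append` is what makes a scheduler-driven pipeline safe to retry, and `as_of` is the property that lets an older analysis pin the exact dataset state it used.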

By allowing for rolling processing of the data, often at daily intervals, our products can be easily probed for scientific, technical, and other use. For instance, we developed a JavaScript viewer that allows users to quickly and easily visualize data from many instruments. Additionally, by providing raw (i.e. directly from the instrument, as the format permits), time-aggregated, commonly gridded, and sitewide 'best estimate' datasets, we also iterate on levels of processing complexity for a host of needs. These usability advantages are consequences of our technical approach, namely automated workflows and analysis-ready Zarr stores.

How to cite: Orlijan-Rhyne, R., Kluft, L., and Kölling, T.: Automated workflows for ever-growing, analysis-ready datasets at the Barbados Cloud Observatory, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7107, https://doi.org/10.5194/egusphere-egu26-7107, 2026.

X4.95
|
EGU26-21662
Ralf Kunkel, Marc Hanisch, Christof Lorenz, Ulrich Loup, David Schäfer, Thomas Schnicke, and Jürgen Sorg

In Earth sciences, there is an increasing demand for long-term observation data related to the hydrosphere, pedosphere, biosphere, and lower atmosphere across multiple spatial and temporal scales. In parallel, standardized methods have been developed to make these data findable, accessible, interoperable, and reusable (FAIR). Numerous centralized or distributed data infrastructures (thematic silos) exist, often with similar architectures but with a diversity of access methods, vocabularies for description, and frameworks for handling data and data flows.

DataHub is an initiative of the German Helmholtz Research Field Earth and Environment (E&U) with the aim of developing and operating a scalable, FAIR, and distributed digital research infrastructure to link research data from all compartments of the Earth system. By coordinating vocabularies, persistent identifiers (PIDs), and a common nomenclature across centres, DataHub ensures interoperability with national and international systems. The goal is the transition from isolated silos to interdisciplinary infrastructures. This is achieved by creating a community-driven digital research data ecosystem characterized by collaborative software development; the provision and use of products under a common open-source license model; a harmonized architecture of data management systems; connectivity of data via standardized interfaces (e.g., OGC STA, CSW, WMS); and, most importantly, the harmonization of data descriptions and data flows. As a first step, existing data infrastructures are integrated into the jointly developed DataHub environment.

TERENO (TERrestrial ENvironmental Observatories) is used as a reference implementation for the integration of an existing distributed data infrastructure into DataHub. TERENO is an interdisciplinary, long-term research program involving five centres of the German Helmholtz Association (FZJ, GFZ, UFZ, KIT, DLR). Running since 2008, it comprises an Earth observation network across Germany and provides long-term environmental data at multiple spatial and temporal scales to study the long-term impacts of land-use and climate change. To date, it has collected more than 3.3 billion observations from over 900 sites.

During the last decade, several drawbacks have been identified in the operation of TERENO, such as inhomogeneities in metadata describing measurement instrumentation and the observed data themselves. Moreover, different data quality routines and assessment schemes are applied.

How to cite: Kunkel, R., Hanisch, M., Lorenz, C., Loup, U., Schäfer, D., Schnicke, T., and Sorg, J.: Integration of TERENO into the DataHub Digital Ecosystem, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21662, https://doi.org/10.5194/egusphere-egu26-21662, 2026.

X4.96
|
EGU26-9152
Massimiliano Cannata, Daniele Strigaro, and Claudio Primerano

Sensor-based environmental monitoring is increasingly vital for research and decision-making, yet the current web standards used to share these data streams, such as the OGC SensorThings API (STA), do not fully support scientific reproducibility, data provenance, or data sovereignty. To meet reproducibility requirements, researchers often resort to downloading and archiving static snapshots of evolving time-series datasets, leading to unnecessary data duplication, loss of linkage with live sources, and inefficient data management.

IstSOS4Things (www.istsos.org) aims to close this critical gap by extending the STA standard with versioning and time-travel capabilities, enabling data auditing and persistent, immutable access to historical states of sensor observations through persistent URLs. Much like Git allows access to past versions of code, the proposed STA-traveltime extension lets users cite, query, and extract the exact dataset used in a study, even years later.
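A time-travel query against an extended STA endpoint might be built as follows. This is a hedged sketch: the base URL is a placeholder and the `$asOf` parameter name is an illustrative assumption, not the published syntax of the STA-traveltime extension.

```python
from urllib.parse import urlencode

# Placeholder endpoint; a real deployment would expose its own STA base URL.
BASE = "https://example.org/istsos4/v1.1"

def observations_as_of(datastream_id, as_of_iso, top=100):
    """Build a citable URL returning a Datastream's observations as they
    existed at a past instant ("$asOf" is a hypothetical parameter name)."""
    params = urlencode({
        "$asOf": as_of_iso,       # pin the immutable historical state
        "$orderby": "phenomenonTime",
        "$top": top,
    })
    return f"{BASE}/Datastreams({datastream_id})/Observations?{params}"
```

Because the instant is encoded in the URL itself, the link can be cited in a paper and will keep resolving to the same dataset state, which is the Git-like property the extension targets.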

This breakthrough addresses a long-standing limitation of geospatial web services and paves the way for fully FAIR (Findable, Accessible, Interoperable, Reusable) and reproducible research. In parallel, istSOS4Things introduces mechanisms for fine-grained access control embedded within the web service itself, empowering researchers and institutions to share their data in accordance with the principle of “as open as possible, as closed as necessary.” This helps overcome common hesitations about data sharing, ensuring trust, transparency, and legal compliance.

How to cite: Cannata, M., Strigaro, D., and Primerano, C.: istSOS4Things - FAIR & Open Source IoT platform for Open Science, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9152, https://doi.org/10.5194/egusphere-egu26-9152, 2026.

X4.97
|
EGU26-11300
Daniele Strigaro, Massimiliano Cannata, Claudio Primerano, and Andrea Salvetti

In Switzerland, short-duration and spatially concentrated rainfall events increasingly affect small catchments, where limited response times can lead to flash floods and debris flows with significant impacts on local infrastructure. These phenomena typically develop at spatial and temporal scales that are not fully captured by conventional meteorological monitoring networks.

Recent events in Southern Switzerland, including in the municipality of Lumino, have shown how localized precipitation can rapidly overload drainage systems and watercourses. Such situations highlight the need for rainfall observations with higher spatial density and minute-scale temporal resolution, able to complement regional forecasting and warning services.

National early warning systems, including those provided by MeteoSwiss, form a key component of flood risk management but may not resolve precipitation variability at local scales. To complement these systems, SUPSI and the Canton Ticino’s Ufficio dei corsi d’acqua (UCA) are testing a denser rainfall monitoring network based on rain gauges delivering one-minute data streams in near real time.

The monitoring infrastructure is designed according to FAIR data principles, ensuring that observations are findable, accessible, interoperable, and reusable. Data are managed through a cloud-based, event-driven architecture built on open geospatial standards, notably the OGC SensorThings API, implemented using the istSOS framework. Incoming data streams are processed on a computing cluster to derive cumulative rainfall indicators at multiple temporal scales (10-minute, hourly, and three-hourly), which are used to support threshold-based alerting mechanisms.
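The derivation of multi-scale cumulative indicators from a one-minute stream can be sketched with rolling windows. The snippet below is an illustrative toy (only the window lengths are taken from the abstract; everything else is assumed), not the actual istSOS processing code.

```python
from collections import deque

def cumulative_windows(minute_values, windows=(10, 60, 180)):
    """Running rainfall sums (mm) over the given window lengths
    (in minutes), fed by a stream of one-minute totals."""
    buffers = {w: deque(maxlen=w) for w in windows}
    results = []
    for value in minute_values:
        row = {}
        for w, buf in buffers.items():
            buf.append(value)          # oldest value drops out automatically
            row[w] = round(sum(buf), 3)
        results.append(row)
    return results

# Steady 0.5 mm/min rain for 15 minutes
series = cumulative_windows([0.5] * 15)
```

At minute 15 the 10-minute sum has saturated at 5.0 mm, while the hourly and three-hourly sums still equal the full 7.5 mm accumulated so far; threshold-based alerting would compare each of these running sums against its own threshold.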

By combining high-resolution observations with open, standards-based data services, the system enables real-time visualization, automated notifications, and seamless integration with existing hydrological and risk management workflows. This approach demonstrates how FAIR-by-design monitoring infrastructures can bridge the gap between regional forecasts and local-scale observations, strengthening early warning capabilities and supporting more resilient flood risk management in a changing climate.

How to cite: Strigaro, D., Cannata, M., Primerano, C., and Salvetti, A.: FAIR-compliant infrastructure based on istSOS for high-resolution rainfall monitoring and alerting, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11300, https://doi.org/10.5194/egusphere-egu26-11300, 2026.

X4.98
|
EGU26-9202
Karsten Peters-von Gehlen, Kameswar Rao Modali, Florian Ziemen, Martin Bergemann, Christopher Kadow, Karl-Hermann Wieners, Siddhant Tibrewal, Ivonne Anders, Katharina Berger, Tobias Kölling, Lukas Kluft, Marco Kulüke, and Fabian Wachsmann

The climate science enterprise both produces and depends on extremely large datasets to meet the needs of diverse scientific and downstream user communities. As climate models are increasingly run at kilometre-scale resolutions, data volumes grow rapidly, increasing the demands on data-handling infrastructures. Individual flagship simulations are no longer used by a single research group, but are routinely reused by dozens or even hundreds of researchers globally. Consequently, data findability, accessibility and reuse must be straightforward, data provenance must be transparent, and the full heritage of simulation data should be preserved in a machine-actionable manner to ensure scientific rigour, explainability and reproducibility.

In this contribution, we present a conceptual infrastructure-level approach, developed within the WarmWorld project, that leverages the versatility of globally unique persistent identifiers (PIDs) to address these challenges. Specifically, we illustrate that by assigning handles to simulation datasets already at the point of production, simulation data stored locally at an HPC data center can become part of a globally interoperable data ecosystem. In our concept, handle profiles contain a URL at which the dataset can be opened. Further, machine-actionable metadata, such as detailed provenance information describing the employed model configuration or a data reuse license and citation, would be available from the handle landing page. Thus, the motivation behind the approach we follow here is akin to that of the FAIR Digital Object (FDO) specifications.
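To make the idea concrete, a handle profile of the kind described above might carry metadata like the following. All field names and values here are illustrative assumptions, not the WarmWorld handle schema; the point is that the PID record itself carries machine-actionable pointers to access, provenance, license, and citation.

```python
import json

# Illustrative handle profile; every field name and value is assumed.
profile = {
    "handle": "21.T12345/sim-0001",                     # hypothetical PID
    "access_url": "https://example.org/data/sim-0001",  # where the dataset opens
    "provenance": {"model": "example-model", "configuration": "km-scale-run"},
    "license": "CC-BY-4.0",
    "citation": "Example et al. (2026), flagship simulation dataset",
}
record = json.dumps(profile, indent=2)
```

Because the profile is plain structured metadata, any client that resolves the handle can act on it without contacting the producing HPC centre.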

Finalized simulation datasets would be exposed through globally accessible SpatioTemporal Asset Catalogs (STAC), where PIDs serve as the authoritative entry point for discovery and access. Data access would be handled by system libraries that resolve storage locations across heterogeneous storage tiers. Crucially, data access shall be designed to be globally open without the need for credentials, reflecting a strong demand from the climate research community, as clearly demonstrated during the WCRP kilometre-scale hackathon (May 2025).

Systematic assignment and pragmatic leveraging of handles assigned to locally stored datasets can thus enable scalable and interoperable access to flagship climate datasets across infrastructures and communities, effectively integrating traditionally closed HPC data environments into the global data space and facilitating interoperability with other large-scale data holdings.

How to cite: Peters-von Gehlen, K., Modali, K. R., Ziemen, F., Bergemann, M., Kadow, C., Wieners, K.-H., Tibrewal, S., Anders, I., Berger, K., Kölling, T., Kluft, L., Kulüke, M., and Wachsmann, F.: PID-Driven Global Access to Flagship km-scale Climate Simulation Data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9202, https://doi.org/10.5194/egusphere-egu26-9202, 2026.

X4.99
|
EGU26-3136
Mathias Bavay, Patrick Leibersperger, and Øystein Godøy

Automatic Weather Stations (AWS) deployed in the context of research projects provide very valuable point data thanks to the flexibility they offer in terms of measured meteorological parameters and setup. However, this flexibility is a challenge for metadata and data management. Traditional approaches based on networks of standard stations struggle to accommodate these needs, leading to wasted data periods caused by difficult data reuse, slow identification of potential measurement problems, and a lack of metadata documenting what happened.

The Data Access Made Easy (DAME) effort is our answer to these challenges. At its core, it relies on the mature and flexible open-source MeteoIO meteorological pre-processing library. Originally developed for the needs of numerical models consuming meteorological data, it has expanded into a data standardization engine for the Global Cryosphere Watch (GCW) of the World Meteorological Organization (WMO). For each AWS, a single configuration file describes how to read and parse the data, defines a mapping between the available fields and a set of standardized names, and provides the relevant Attribute Conventions Dataset Discovery (ACDD) metadata fields. Low-level data editing is also available, such as excluding a given sensor, swapping sensors, or merging data from another AWS, for any given time period. Moreover, an arbitrary number of filters can be applied to each meteorological parameter, restricted to specific time periods if required. This makes it possible to describe the whole history of an AWS within a single configuration file and to deliver a single, consistent, standardized output file, possibly spanning many years, many input data files, and many changes in both format and available sensors.
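As a rough illustration of the single-configuration-file idea, a station description might look like the sketch below. The keys are simplified placeholders in the spirit of MeteoIO's INI-style configuration files, not verbatim MeteoIO syntax; consult the MeteoIO documentation for the real key names.

```ini
; Illustrative sketch only -- simplified keys, not the exact MeteoIO schema
[Input]
METEO    = CSV
STATION1 = aws_glacier_01.csv

[InputEditing]
; exclude a faulty humidity sensor for a given period (illustrative syntax)
STATION1::EXCLUDE = RH 2023-06-01 2023-07-15

[Filters]
; min/max plausibility filter on air temperature
TA::FILTER1   = MIN_MAX
TA::ARG1::MIN = -60
TA::ARG1::MAX = 50

[Output]
ACDD_CREATOR = Example Institute
```

The whole station history — sensor swaps, exclusions, filter changes — accumulates in this one file, so regenerating the standardized output is fully reproducible.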

Through the EU project Arctic Passion, a web interface has been developed that allows data owners to manage the configuration files for their stations, refresh their data at regular intervals, inspect the data QA log files, receive notification emails, and trigger on-demand data generation. The same interface allows other users to request data on demand for any time period.

How to cite: Bavay, M., Leibersperger, P., and Godøy, Ø.: Data Access Made Easy: flexible, on the fly data standardization and processing, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-3136, https://doi.org/10.5194/egusphere-egu26-3136, 2026.

X4.100
|
EGU26-11768
Nils Brinckmann and Markus Bradke

The rapid growth of Global Navigation Satellite System (GNSS) observations, driven by dense station networks, high-rate data streams, and the modernisation of satellite constellations, places increasing demands on data centers in terms of scalability, reliability, and reproducibility. Traditional monolithic GNSS data management systems are often difficult to scale and adapt to evolving processing and analysis workflows. To address these challenges, we are developing a cloud-native GNSS data center architecture based on container orchestration and streaming technologies.

Our system is built on Kubernetes to enable flexible deployment, horizontal scalability, and fault tolerance of GNSS services. Data ingestion is handled through Apache Kafka, which provides a robust, high-throughput messaging backbone for streaming GNSS observations from heterogeneous sources. This approach decouples data producers and consumers, allowing independent scaling of ingestion, processing, and downstream analytics.

For long-term storage and analytical access, GNSS data are ingested via ETL pipelines into an Apache Iceberg data lakehouse. Iceberg provides schema evolution, partition management, and ACID (Atomicity, Consistency, Isolation, and Durability) guarantees, enabling efficient access to large, time-series GNSS datasets for both batch and interactive analysis.

System performance, data flow, and service health are continuously monitored using Prometheus, with operational and scientific metrics visualized through Grafana dashboards. This monitoring framework facilitates operational stability, performance optimization, and transparent reporting of data latency and availability.

We present the overall system design, implementation details, and initial performance results, and discuss how this architecture improves scalability, resilience, and reproducibility compared to conventional GNSS data centers. The proposed approach provides a flexible foundation for next-generation GNSS services and can be extended to other geodetic and Earth observation data streams.

How to cite: Brinckmann, N. and Bradke, M.: A Cloud-Native GNSS Data Lakehouse for Scalable Ingestion, Processing, and Analysis, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11768, https://doi.org/10.5194/egusphere-egu26-11768, 2026.

X4.101
|
EGU26-10308
|
ECS
Patrick Leibersperger, Mathias Bavay, Ionut Iosifescu Enescu, and Chase Núñez

Environmental research relies on seamless data exchange between institutions globally, but inadequate documentation and complex formats hinder collaboration. We introduce iCSV, a self-describing, human-readable format that combines the simplicity of CSV with the metadata richness of NetCDF/CF. iCSV ensures long-term interpretability, interoperability and user accessibility, addressing key challenges in environmental data stewardship. By embedding structured metadata directly in a human-readable text file, iCSV enables automated validation, supports FAIR principles and lowers the barrier to data sharing and reuse while ensuring data remains interpretable for future users and maintaining broad compatibility with existing software. This work motivates the need for a simple, self-describing tabular format for environmental time series, presents the iCSV specification, positions it within existing binary and human-readable format ecosystems through comparative analysis, and discusses current limitations with directions for future improvements.
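The general idea of a self-describing, human-readable tabular file can be sketched as follows. The header syntax shown is an illustrative stand-in, not the actual iCSV specification: metadata lines precede the CSV body, so a plain-text editor, a CSV reader, and a metadata-aware parser can all make sense of the same file.

```python
import csv
import io

# Illustrative only: '#'-prefixed metadata header, then ordinary CSV.
# The real iCSV header syntax differs from this sketch.
text = """\
# station_id = AWS_01
# fields = timestamp,TA
# units = ISO8601,degC
timestamp,TA
2024-01-01T00:00,-3.2
2024-01-01T01:00,-2.8
"""

meta, body = {}, []
for line in text.splitlines():
    if line.startswith("#"):
        # metadata line: split "key = value"
        key, _, value = line.lstrip("# ").partition("=")
        meta[key.strip()] = value.strip()
    else:
        body.append(line)

# the remaining lines are standard CSV, readable by any existing tool
rows = list(csv.DictReader(io.StringIO("\n".join(body))))
```

The design pay-off is exactly the one the abstract argues for: the metadata travels in the same file as the data, yet the data portion stays compatible with every CSV-consuming tool.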

How to cite: Leibersperger, P., Bavay, M., Enescu, I. I., and Núñez, C.: Interoperable CSV for Environmental Data Archival and Exchange – iCSV, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-10308, https://doi.org/10.5194/egusphere-egu26-10308, 2026.

X4.102
|
EGU26-18803
Markus Möller, Mahdi Hedayat Mahmoudi, and Paul Peschel

Making geospatial data FAIR requires more than metadata standardization - it demands transparent, structured reporting of data quality and uncertainty that allows researchers to assess fitness-for-purpose across diverse applications. Yet most FAIR implementations still treat quality as a generic metadata field, while uncertainty and fitness‑for‑purpose remain buried in narrative documentation and disciplinary tacit knowledge.

In the FAIRagro consortium, we operationalize an application‑oriented quality framework using the example of Germany‑wide phenology time series (1 km, 1993-2022) by combining three components: (1) standardized producer‑side quality metrics (global R² and RMSE following ISO 19157‑1 for each crop, phase, and year), (2) spatially explicit local uncertainty layers, and (3) a machine‑actionable, application‑specific data quality matrix (AS‑DQM) that captures documented use contexts, validation strategies, limitations, and fitness‑for‑purpose statements from existing publications and workflow descriptions. Large Language Models (LLMs) are central to this workflow: after structure‑preserving conversion of PDFs to enriched Markdown, multimodal LLMs extract quality‑relevant concepts from text, tables, and figures, normalize them against a formal schema, and generate provenance‑linked AS‑DQM JSON profiles that can be queried and reused across applications.
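A minimal sketch of what a machine-actionable AS-DQM profile could look like is given below. All field names and values are hypothetical illustrations, not the published FAIRagro schema; the point is that each fitness-for-purpose statement stays linked to its documentary provenance and can be queried programmatically.

```python
import json

# Hypothetical AS-DQM entry; every field name and number is made up
# for illustration and does not reflect the actual FAIRagro schema.
as_dqm_entry = {
    "dataset": "phenology-de-1km-1993-2022",
    "application": "regional-scale crop modelling",
    "quality_metrics": {"global_r2": 0.81, "rmse_days": 5.2},  # illustrative values
    "limitations": ["reduced accuracy in mountainous regions"],
    "fitness_for_purpose": "conditionally suitable",
    "provenance": {"source_document": "doi:10.xxxx/placeholder",
                   "extracted_by": "LLM"},
}
serialized = json.dumps(as_dqm_entry)
```

A query layer over many such entries could then answer questions like "which datasets are at least conditionally suitable for this application?" directly from the profiles.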

These quality, uncertainty, and fitness profiles are then packaged as FAIR Digital Objects using interoperable containers (ARCs) for version‑controlled, reproducible workflows and RO‑CRATE standards for structured research object metadata - enabling seamless integration with research data management infrastructure and discovery systems. This approach ensures that quality reasoning, local uncertainty estimates, and application contexts travel together with phenology data through the research lifecycle, preserving provenance and enabling automated quality‑aware dataset selection.

This poster represents a transferable template for domain-specific FAIR implementation, demonstrating that structured uncertainty reporting, ISO-compliant quality metrics, LLM-assisted formalization of fitness-for-purpose information, and user-centered fitness-for-purpose assessments are essential bridges between abstract FAIR principles and practical, cross-disciplinary data reuse. In application, users can query not only "where are data FAIR?" but "where are data sufficiently accurate, well‑validated, and uncertainty‑constrained for this specific decision context?". By embedding LLM‑derived quality knowledge, uncertainty products, and an application matrix into machine‑actionable FAIR Digital Objects, we move from static compliance towards dynamic, evidence‑based fitness‑for‑purpose assessment - thereby strengthening trust in public data sets.

How to cite: Möller, M., Hedayat Mahmoudi, M., and Peschel, P.: Operationalizing Data Fitness-for-Purpose Through Standardized Metrics, Local Uncertainty, and LLM-Extracted Quality Reasoning, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18803, https://doi.org/10.5194/egusphere-egu26-18803, 2026.

X4.103
|
EGU26-18954
|
ECS
Gisela Romero Candanedo, Julia Wagemann, Sabrina H. Szeto, Emmanuel Mathot, Felix Delattre, Ciaran Sweet, James Banting, Sharla Gelfand, and Tom Christian

The European Space Agency (ESA), through the Earth Observation Processor Framework (EOPF), is reprocessing Sentinel-1, -2, and -3 archives into the cloud-optimised format Zarr. Through the EOPF Sentinel Zarr Samples Service, Sentinel data users can get early access to sample data in the new EOPF Zarr format.

The ESA-funded EOPF Toolkit project supports users transitioning from the legacy .SAFE Sentinel format to the cloud-optimised EOPF Zarr standard. The core development is EOPF 101, a comprehensive online resource designed to help users explore EOPF Sentinel Zarr data in the cloud. Through step-by-step and hands-on tutorials, Sentinel data users learn how to effectively use EOPF Sentinel Zarr products and build Earth Observation workflows that scale.

Chapter 1 - About EOPF provides a high-level, easy-to-understand overview of the EOPF project by ESA. Chapter 2 - About EOPF Zarr provides a practical introduction to the cloud-optimised Zarr data format. It shows the benefits of the format, gives an overview of the data structure, and includes performance comparisons with other formats. Chapter 3 - About Chunking introduces the chunking paradigm and lets users explore how to optimise their workflows. Chapter 4 - About EOPF STAC gives easy-to-understand practical examples of how to discover and access data with the EOPF STAC catalog. Chapter 5 - Tools to work with Zarr provides a collection of practical examples of languages, libraries and plug-ins that support users in working with data from the EOPF Samples Service. Chapter 6 - EOPF in Action is a collection of hands-on, practical end-to-end workflows featuring the use of EOPF Zarr data in different application areas.
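The chunking paradigm covered in Chapter 3 boils down to simple index arithmetic: a reader only fetches the chunks that intersect the requested window, rather than the whole array. The sketch below illustrates this with an assumed chunk size, not the actual EOPF Zarr chunking.

```python
# Which chunks of a chunked 2-D array intersect a requested window?
# The 512-pixel chunk size is an assumption for illustration only.
def chunks_for_window(row0, row1, col0, col1, chunk=512):
    """Half-open window [row0,row1) x [col0,col1) -> list of chunk indices."""
    rows = range(row0 // chunk, (row1 - 1) // chunk + 1)
    cols = range(col0 // chunk, (col1 - 1) // chunk + 1)
    return [(i, j) for i in rows for j in cols]

# A 600x600 window starting at pixel (1000, 1000) touches only a 3x3
# block of 512-pixel chunks instead of the whole scene.
touched = chunks_for_window(1000, 1600, 1000, 1600)
```

Matching the chunk layout to the typical access pattern (small spatial windows vs. long time series) is exactly the optimisation the chapter lets users explore.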

Besides EOPF 101, the project ran additional community engagement activities, such as a notebook competition and a collaboration with Champion Users. The notebook competition took place between October 2025 and January 2026. During this period, the Sentinel data community was invited to try out the new EOPF Zarr data format themselves and to share their workflows in the form of Jupyter Notebooks. The project further engaged with five organisations (Champion Users) to develop end-to-end workflows in different application domains.

The EOPF Toolkit bridges the gap between data provision and practical application through three pillars of engagement: structured learning, expert guidance, and competitive innovation. While EOPF 101 provides the foundational roadmap, Champion Users offer expert-level insights, and the notebook competition builds a library of community-sourced examples. Together, these initiatives create a feedback loop that transforms new adopters into active contributors, reducing the time-to-insight for the EOPF Zarr data format.

In this presentation, we will provide an overview of the community resources developed under the EOPF Toolkit and will share lessons learned from the community engagement activities.

How to cite: Romero Candanedo, G., Wagemann, J., H. Szeto, S., Mathot, E., Delattre, F., Sweet, C., Banting, J., Gelfand, S., and Christian, T.: EOPF Toolkit: Engaging the Sentinel community to adopt the EOPF Zarr data format, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18954, https://doi.org/10.5194/egusphere-egu26-18954, 2026.

X4.104
|
EGU26-5819
Heinrich Widmann, Andrea Lammert, Eileen Hertwig, Beate Krüss, Karsten Peters-von Gehlen, and Hannes Thiemann

The FAIR-by-design approach pursued by most repositories and data services today requires significant and sustained effort in the curation and quality assurance of both data and metadata. Beyond providing research data that complies with the FAIR principles, it is essential that the level of FAIRness is transparently apparent to users from the metadata prior to data access and download. FAIRness indicators benefit both data providers and reusers by rewarding high-quality curation and supporting informed data selection in complex, data-intensive Earth System Science (ESS) workflows.

In practice, making FAIRness levels visible requires repository data managers to perform FAIR evaluation, either through manual assessment or by using established FAIR assessment tools. At the World Data Center for Climate (WDCC), the fully automated F-UJI tool is applied in operational practice to assess and expose FAIRness levels across large collections of climate data.

F-UJI is a web-based service that programmatically assesses the FAIRness of research data objects at the dataset level, based on the FAIRsFAIR Data Object Assessment Metrics. Its automated and machine-aided analytics are well suited to the large number of datasets archived in WDCC and reflect established repository practices, such as the assignment of DataCite DOIs and the provision of rich, standardised metadata. At the same time, automated assessment relies on clearly machine-assessable criteria and thus cannot fully capture FAIR aspects that require human interpretation, such as reuse relevance or domain-specific semantics. In addition, FAIRness results depend on the machine-detectability of persistent identifiers resolving directly to datasets, which are not always available at higher levels of data collection hierarchies.

Based on our operational experience, we compare F-UJI results with other FAIR assessment approaches, building on findings from a previous comparative study evaluating FAIR assessment methods for WDCC datasets (Peters-von Gehlen et al., 2022). This comparison shows that automated, manual, and hybrid FAIR evaluation approaches each have distinct strengths: automated methods focus on standardised, machine-actionable criteria, while manual assessments capture contextual aspects relevant for data reuse; hybrid approaches combine these advantages and mitigate the limitations of purely automated or manual methods.

This poster shares practical experiences from conducting operational FAIRness assessment at a climate data repository and discusses benefits, limitations, and best practices of automated and hybrid FAIR evaluation approaches in Earth System Science.

How to cite: Widmann, H., Lammert, A., Hertwig, E., Krüss, B., Peters-von Gehlen, K., and Thiemann, H.: Making FAIRness Visible: Practical FAIR Assessment for Earth System Science Data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5819, https://doi.org/10.5194/egusphere-egu26-5819, 2026.

X4.105
|
EGU26-20356
Kirsten Elger, Alexander Brauser, Holger Ehrmann, Ali Mohammed, and Melanie Lorenz

In the geosciences, most research results are supported by data. These data are measured, collected, generated or compiled by humans or machines (including numerical modelling), and they represent an increasingly important part of the research outcome. They should be made openly available and shared in a reusable format wherever possible, while fully acknowledging the contributions of the individual researchers and institutions that collected or generated the data.

Research data repositories are permanent archives that provide access to data and metadata, as well as to related physical samples and scientific software. An increasing number of repositories are assigning digital object identifiers (DOIs) to the data stored in their archives. The range of services offered extends from fully self-service DOIs at large generic repositories, through institutional repositories that are open to institutional members only, to curated data publications by domain repositories specialising in data from a specific scientific field.

The involvement of skilled data curators, who are often also domain researchers, makes domain repositories the preferred destination for the publication of well-documented and reusable data. The generic metadata required for DOI registration is complemented by extensive, domain-specific metadata properties, such as information on the temporal and geospatial domains, mineral or rock names, instruments and analytical methods. Ideally, this information derives from embedded controlled vocabularies or ontologies, which increase the discoverability of the data for humans and machines. During curation, author information is also supplemented with ORCID and ROR identifiers, and the published data is digitally connected to related research articles, datasets, software, and the physical samples from which the data were obtained. However, domain repositories are facing challenges due to insufficient staffing to uphold these high publication standards. Unfortunately, the resulting delay in processing requests directs many researchers to generic repositories offering self-service DOIs that do not provide any data curation.

To address these challenges, GFZ Data Services provides intuitive tools for collecting rich metadata (metadata editors), data description templates with extensive explanations, and online instructions on topics such as recommended file formats. These tools enable researchers to provide high-quality metadata from the outset, thereby reducing the workload and time required for data curation.

In November 2025, GFZ Data Services launched ELMO, the fully revised and modernised version of our metadata editor. ELMO is not only a new web interface, but also contains many new features that improve the quality of metadata and the FAIRness of the data it describes, while simplifying the entry of information for researchers. For example, authors' names and institutions can be filled in automatically by entering an ORCID iD; affiliations can be selected from a drop-down menu linked to the Research Organization Registry (ROR); and the controlled, linked-data vocabularies already in use (e.g., GCMD and GeoSciML) are connected directly to the vocabulary services API, ensuring they are always up to date.

This presentation will outline the advantages and disadvantages of domain repositories, and introduce our new metadata editor ELMO.

How to cite: Elger, K., Brauser, A., Ehrmann, H., Mohammed, A., and Lorenz, M.: The role of domain repositories in sustaining high-quality data publications: researcher-oriented tools and strategies under limited resources, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-20356, https://doi.org/10.5194/egusphere-egu26-20356, 2026.

X4.106
|
EGU26-21042
Mahdi Hedayat Mahmoudi and Markus Möller

Making research data Findable, Accessible, Interoperable, and Reusable (FAIR) is widely recognised as essential for open and reproducible science. However, researchers often face a gap between FAIR-compliant datasets and data that are actually fit for specific scientific or operational applications. This gap arises because data quality is inherently application-dependent, while critical assumptions, limitations, and uncertainty characteristics are frequently documented only implicitly across publications, dataset metadata, and workflow descriptions. 

We present a document-driven, application-oriented approach to data quality assessment developed within the FAIRagro initiative. The method uses the Application-Specific Data Quality Matrix (AS-DQM), which systematically captures reasoning linking documented data characteristics—such as spatial and temporal resolution, validation strategies, and known limitations—to application requirements and explicit fitness-for-purpose statements (FAIRagro resources: https://zenodo.org/records/17981173). Rather than computing new quality metrics, the AS-DQM formalizes existing knowledge already generated by research communities, reduces barriers to adoption, and supports responsible data reuse.

The approach is illustrated using a Germany-wide phenology time series as a pilot example. By analysing dataset documentation together with concrete phenology-based scientific studies, the AS-DQM demonstrates how application-specific quality requirements—such as acceptable temporal uncertainty, spatial aggregation assumptions, and suitability for regional-scale analyses—can be systematically extracted and made explicit. Comparing the resulting application-level quality profile with the dataset-level documentation shows how fitness-for-purpose emerges from the interaction between data characteristics and application context, highlighting cases where datasets are conditionally suitable or explicitly unsuitable for specific analyses.

We discuss strengths, limitations, and adoption challenges of document-driven, application-oriented data quality reasoning, emphasizing its broad relevance across Earth and environmental sciences and its role in fostering sustainable, community-driven FAIR data practices.

How to cite: Hedayat Mahmoudi, M. and Möller, M.: From FAIR Principles to Fitness-for-Purpose: Document-Driven, Application-Oriented Data Quality in Agrosystem Research, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21042, https://doi.org/10.5194/egusphere-egu26-21042, 2026.

Posters virtual: Mon, 4 May, 14:00–18:00 | vPoster spot 1b

The posters scheduled for virtual presentation are given in a hybrid format for on-site presentation, followed by virtual discussions on Zoom. Attendees are asked to meet the authors during the scheduled presentation & discussion time for live video chats; onsite attendees are invited to visit the virtual poster sessions at the vPoster spots (equal to PICO spots). If authors uploaded their presentation files, these files are also linked from the abstracts below. The button to access the Zoom meeting appears just before the time block starts.
Discussion time: Mon, 4 May, 16:15–18:00
Display time: Mon, 4 May, 14:00–18:00
Chairperson: Filippo Accomando

EGU26-20391 | Posters virtual | VPS21

A Scalable, FAIR‑Aligned Data Lake Architecture for Earth System Modelling: From Heterogeneous Raw Archives to Curated, Metadata‑Rich, Analysis‑Ready Climate Data 

Bushra Amin, Jakob Zscheischler, Luis Samaniego, Jian Peng, Almudena García-García, and Toni Harzendorf
Mon, 04 May, 14:09–14:12 (CEST)   vPoster spot 1b

Modern Earth system research relies on integrating heterogeneous datasets such as reanalysis, satellite observations, in situ measurements, climate model ensembles, and reforecasts, yet these data are often stored in fragmented, inconsistent, and difficult-to-reuse forms. This limits reproducibility, slows modelling workflows, and constrains the development of operational digital twins for water and climate risk management.

This contribution presents a scalable, FAIR-aligned data lake architecture implemented on the EVE high-performance computing environment. The system transforms a large, unstructured source pool of more than two million files into a curated, duplication-free, metadata-rich repository designed for hydrological modelling, machine learning, and climate analytics. The architecture follows a four-stage lifecycle: raw, curated, database-ready, and ancillary GIS layers, reflecting data governance practices used by major climate centres.

A reproducible ingestion workflow classifies, deduplicates, and standardizes datasets from ERA5, ERA5-Land, MERRA-2, PRISM, E-OBS, GPM IMERG, CMIP6, ISIMIP3, ECMWF reforecasts, MODIS, CHIRPS, GFED, GRDC, GSIM, and other sources. A Python-based metadata extractor, built on CF convention standards, automatically captures variables, units, dimensions, spatial resolution, temporal coverage, coordinate reference systems, and checksums. Metadata are stored both as dataset-level JSON and as a global inventory, enabling transparent provenance tracking and rapid dataset discovery.
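The dataset-level JSON idea can be sketched as follows. The field names are assumptions modelled on CF-style attributes, and the path is a hypothetical example; this is not the project's actual extractor schema.

```python
import hashlib
import json

# Sketch of dataset-level metadata capture; field names are assumed,
# loosely modelled on CF-style attributes.
def describe(path, payload, variables):
    """Build one inventory entry for a raw file (payload given as bytes)."""
    return {
        "path": path,
        "variables": variables,  # e.g. name/units/dims per variable
        "checksum_sha256": hashlib.sha256(payload).hexdigest(),
        "n_bytes": len(payload),
    }

entry = describe(
    "raw/era5/tas_2020.nc",        # hypothetical path for illustration
    b"...binary content...",
    [{"name": "tas", "units": "K", "dims": ["time", "lat", "lon"]}],
)
inventory = json.dumps([entry])    # the global inventory as a JSON list
```

The checksum makes deduplication a set-membership test over the inventory, while the per-variable records support discovery queries without opening the underlying files.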

The curated data hub is implemented under /data/db/earth_system and organized by scientific domain, temporal resolution, spatial extent, and processing stage. The system supports SLURM-based workflows, HPC-native processing, and cloud-optimized formats such as Zarr.

This work demonstrates how a single researcher can design and operationalize a modern, HPC-native data infrastructure that accelerates hydro-climate research and forms the backbone of an emerging Digital Hydro Twin. The approach is transferable to institutions seeking to modernize their data ecosystems and improve reproducibility in environmental modelling.

How to cite: Amin, B., Zscheischler, J., Samaniego, L., Peng, J., García-García, A., and Harzendorf, T.: A Scalable, FAIR‑Aligned Data Lake Architecture for Earth System Modelling: From Heterogeneous Raw Archives to Curated, Metadata‑Rich, Analysis‑Ready Climate Data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-20391, https://doi.org/10.5194/egusphere-egu26-20391, 2026.
