HS2.4.2 | Large-Sample Hydrology: Advancing dataset developments, enhancing process understanding, and unifying insights through catchment modeling
EDI PICO
Convener: Thiago Nascimento (ECS) | Co-conveners: Martina Kauzlaric, Paul C. Astagneau (ECS), Adriane Hövel (ECS), Ryoko Araki (ECS)
PICO | Mon, 04 May, 08:30–12:30 (CEST) | PICO spot A
Large-sample hydrology (LSH) datasets are crucial for understanding and predicting hydrological variability. These datasets have grown to encompass a range of hydrological conditions across time and space, facilitating research on a wide variety of topics. This includes testing hypotheses of hydrological theories, exploring uncertainties in data and models, and enabling predictions in ungauged basins. This session highlights recent advances in LSH, with a focus on the development of datasets, the organization and synthesis of hydrological processes, modeling approaches, and improved understanding of hydrological variability. We welcome abstracts that contribute to the field, particularly (but not exclusively) on the following topics:
1. Development and improvement of large-sample datasets:
How can we address current challenges, such as uneven geographical representation, uncertainty quantification, catchment heterogeneities and human interventions, to enable fair comparisons among datasets? How can we foster the harmonization of large-sample datasets? How can we expand existing datasets to include data at higher spatial and temporal resolution? How can we test the representativeness of the available samples? How can we (systematically) represent human influences in large-sample datasets?
2. Increase our process understanding:
How can we use large samples of catchments to transfer hydrological theories and understandings from well-monitored or experimental catchments to data-scarce catchments? Can we use large-sample datasets to draw improved perceptual models and better define hydrological similarity?
3. Advance catchment modeling:
How can we improve process-based and data-driven modeling by using large samples of catchments? How can functional information and knowledge from gauged catchments be learned and applied to ungauged or data-scarce regions? How can we develop new models and workflows to infer hydrological response under changing environmental conditions, particularly those influenced by human activities?
4. Hydrological synthesis:
How can we use catchment descriptors available in large-sample datasets to infer dominant controls for relevant hydrological processes? Do we need the definition of new catchment descriptors or the inclusion of new variables to further improve catchment characterization? How can we improve our classification of catchments, their connectivity and processes?


PICO presentations are given in a hybrid format supported by a Zoom meeting featuring on-site and virtual presentations.
Chairpersons: Thiago Nascimento, Paul C. Astagneau
08:30–08:35
Development of new datasets and services
08:35–08:45 | PICOA.1 | EGU26-11939 | solicited | On-site presentation
Franziska Clerc-Schwarzenbach, Marc Vis, Ilja van Meerveld, and Jan Seibert

The field of large-sample hydrology is developing at a rapid pace. The increasing availability of hydrometeorological data and information on catchment attributes is primarily thanks to the authors of the various CAMELS datasets and similar datasets, who have put tremendous efforts into creating these resources. This progress has enabled the use of datasets from various regions to conduct large-sample studies. For some regions, large-sample datasets are becoming available at sub-daily resolutions, which will further expand the possibilities for studies in large-sample hydrology. 

Many of these datasets offer multiple time series for a given variable: for example, precipitation time series originating from different sources, or potential evapotranspiration time series calculated using different equations. Furthermore, a considerable number of catchments are represented in multiple large-sample datasets, either because there is more than one dataset for a particular country (as is the case for Brazil) or because they are included in overarching datasets such as Caravan or EStreams. While this wealth of data is a real treasure, it also poses significant challenges to users of large-sample hydrological data because decisions need to be made on what data to use (and for what reason). Furthermore, questions on the reliability of the different data are inevitable. Many users end up doing individual data checks on their own or making more or less arbitrary choices. This reduces the comparability between different studies.

In this contribution, we present examples of the challenges that arise when working with large-sample hydrological data. We show the results of comparisons between different data sources that are meant to represent the same variable and how these affect model simulations. The presentation aims to stimulate discussion about a more uniform approach to making decisions on which data to use when working with large-sample hydrological datasets. 
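As an illustration of the kind of comparison described above, the following minimal sketch computes the relative bias and the Pearson correlation between two time series that are meant to represent the same variable. The function name `compare_series` and the precipitation values are hypothetical and are not taken from the contribution.

```python
from math import sqrt

def compare_series(a, b):
    """Return (relative bias of b versus a, Pearson correlation) for paired values."""
    # Keep only time steps where both products report a value.
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two paired values")
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Relative bias of the totals (assumes the reference total is non-zero).
    rel_bias = (sum(ys) - sum(xs)) / sum(xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    corr = cov / sqrt(var_x * var_y)
    return rel_bias, corr

# Hypothetical daily precipitation (mm) for one catchment from two products.
product_a = [0.0, 5.0, 10.0, 0.0, 5.0]
product_b = [0.0, 6.0, 12.0, 0.0, 6.0]
bias, corr = compare_series(product_a, product_b)
```

In this made-up case the two products agree perfectly in timing but differ by 20 % in volume, a difference that a correlation check alone would not reveal.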

How to cite: Clerc-Schwarzenbach, F., Vis, M., van Meerveld, I., and Seibert, J.: So many catchments, so many choices: challenges with large-sample hydrological datasets, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11939, https://doi.org/10.5194/egusphere-egu26-11939, 2026.

08:45–08:47 | PICOA.2 | EGU26-6543 | ECS | On-site presentation
Ashim Maharjan, Alexander Dolich, Patrick Ludwig, Jens Kiesel, Uwe Ehret, and Ralf Loritz

Non-stationarity has long been recognized as a fundamental challenge in hydrological modelling, as climate change and human activities continuously alter catchment properties and the boundary conditions under which hydrological systems operate. However, translating this long-standing recognition into systematic model evaluation remains challenging, as suitable large-sample hydrological datasets that explicitly represent temporal change are still scarce. Most existing datasets adopt a static design, with time-invariant catchment attributes and hydro-meteorological time series limited to retrospective observations, which constrains the systematic testing of hypotheses and models targeting non-stationary hydrological behaviour. Here, we introduce Changing-CAMELS, a pan-European, CAMELS-style dataset explicitly designed to support non-stationary hydrological modelling. Building on the strengths of existing datasets such as CAMELS, Caravan, and EStreams, the developed dataset moves beyond static representations by incorporating time-varying catchment attributes and future climate forcing on a European scale.

In particular, dynamic land-use and land-cover changes are derived from the European LUCAS dataset, providing annual updates to vegetation and land-cover fractions. Simultaneously, the inclusion of both raw and bias-corrected daily-resolution regional and global climate model datasets extends hydro-meteorological forcing beyond the historical period. This integration enables consistent analyses of evolving land cover, shifting climate regimes, and their combined impacts on hydrological responses. Changing-CAMELS covers 4,575 catchments across Europe and harmonizes observations and attributes across national boundaries. By providing both retrospective and prospective information within a unified framework, the dataset allows researchers to systematically evaluate competing hypotheses, compare stationary and non-stationary model formulations, and assess the robustness and uncertainty of hydrological models under climate and land-use change.
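The difference between a static and a time-varying attribute design can be sketched as follows. This is only an illustration: the function `attribute_on`, the forest fractions and the update years are hypothetical and do not describe the actual Changing-CAMELS file layout.

```python
def attribute_on(annual_values, iso_date):
    """Return the latest annual value at or before the year of iso_date."""
    year = int(iso_date[:4])
    years = [y for y in annual_values if y <= year]
    if not years:
        raise KeyError(f"no attribute value available for {year}")
    return annual_values[max(years)]

# Hypothetical forest fraction of one catchment for a few years.
forest_fraction = {2000: 0.42, 2005: 0.40, 2010: 0.35}

# Static design: one value is used for the whole simulation period.
static_view = forest_fraction[2000]
# Time-varying design: the value depends on the simulation date.
dynamic_view = attribute_on(forest_fraction, "2012-06-01")
```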

How to cite: Maharjan, A., Dolich, A., Ludwig, P., Kiesel, J., Ehret, U., and Loritz, R.: Changing-CAMELS: A pan-European dataset for non-stationary hydrological modeling under climate and land-use change, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-6543, https://doi.org/10.5194/egusphere-egu26-6543, 2026.

08:47–08:49 | PICOA.3 | EGU26-9503 | ECS | On-site presentation
Adriane Hövel, Andreas Hartmann, Shijie Jiang, Hongli Liu, and Larisa Tarasova

Streamflow and springflow events contain essential information on how catchments store and release water in response to incoming precipitation. When evaluated together with the corresponding hydro-meteorological event conditions, they can reveal the dominant runoff generation processes in a catchment. However, there is currently no consistent, objective approach for identifying and assessing such events on a global scale. To provide an open-access, global inventory of events and their generation processes to the hydrological community, we first employ an objective automated event identification method (Giani et al., 2022) and adapt the algorithm to different climatic conditions. For each of the identified events, we calculate event-scale hydrometric signatures (e.g., event runoff coefficients). In a second step, we classify all identified events based on their hydro-meteorological conditions (e.g., snowmelt, intensive rainfall). To do this, we set up a deep learning model for each catchment and predict streamflow events using observed hydro-meteorological information (precipitation and temperature) and global simulations of a hydrological model for soil moisture and snowmelt. We use explainable machine learning to reveal the importance of each of the predictors during the event build-up period and to infer the corresponding generation process for each of the identified events (e.g., snowmelt-induced event, rainfall-induced event during wet antecedent conditions). The global inventory of streamflow and springflow events can provide useful process-oriented information based on event-scale signatures for the evaluation of large-scale hydrological models. Furthermore, it potentially serves as a basis for a more effective parameter regionalization based on similarity of dominant hydro-meteorological event conditions across different locations.

Giani, G., Tarasova, L., Woods, R. A., & Rico‐Ramirez, M. A. (2022). An objective time‐series‐analysis method for rainfall‐runoff event identification. Water Resources Research, 58(2), e2021WR031283. https://doi.org/10.1029/2021WR031283
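The event runoff coefficient mentioned in the abstract is commonly defined as the ratio of event runoff depth to event precipitation depth. The sketch below assumes that an event has already been delineated and uses a constant baseflow as a deliberate simplification; it is not the event identification method of Giani et al. (2022), and the function name and values are hypothetical.

```python
def event_runoff_coefficient(precip_mm, streamflow_mm, baseflow_mm=0.0):
    """Ratio of event runoff depth to event precipitation depth."""
    total_precip = sum(precip_mm)
    if total_precip <= 0:
        raise ValueError("event has no precipitation")
    # Subtract a constant baseflow (simplifying assumption) and clip at zero.
    event_runoff = sum(max(q - baseflow_mm, 0.0) for q in streamflow_mm)
    return event_runoff / total_precip

# Hypothetical event: depths in mm per time step.
precip = [10.0, 20.0, 0.0, 0.0]
flow = [1.0, 4.0, 6.0, 3.0]
rc = event_runoff_coefficient(precip, flow, baseflow_mm=1.0)
```

Here 10 mm of event runoff follow 30 mm of precipitation, so about one third of the rainfall leaves the catchment as event flow.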

How to cite: Hövel, A., Hartmann, A., Jiang, S., Liu, H., and Tarasova, L.: A global inventory of streamflow and springflow events and their generation processes, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9503, https://doi.org/10.5194/egusphere-egu26-9503, 2026.

08:49–08:51 | PICOA.4 | EGU26-3253 | Highlight | On-site presentation
Oleg Zlydenko, Rotem Mayo, Moral Bootbool, Frederik Kratzert, Amitay Sicherman, Ido Zemach, and Deborah Cohen

A critical barrier to advancing large-sample hydrology and global risk assessment is the absence of a comprehensive, high-resolution historical event dataset. Existing resources are often geographically constrained, lack temporal or spatial precision, or are too sparse to support robust global synthesis. To address this gap, we introduce Groundsource, a novel, large-scale global dataset of historical flood events automatically constructed from diverse online news sources. By leveraging Google’s unique web page annotation capabilities and Gemini's natural language processing, we developed a pipeline to systematically identify and structure information about real-world flood events.
Our methodology first filters millions of news articles to isolate reports of actual, past floods, distinguishing them from warnings, policy discussions, and articles that mention floods in other contexts. For each relevant article, we prompt Gemini to extract the specific dates and locations of the flooding. This structured data is then geocoded and aggregated to produce the Groundsource dataset. The dataset contains ~800,000 events with an estimated 75% precision.
While acknowledging the limited accuracy of LLM-based data extraction and the inherent limitations of a news-based approach (such as recency, population, and coverage bias), Groundsource represents a significant leap forward in data availability. As a publicly available, open resource covering over 100 countries, it provides a tool of unprecedented scale. Groundsource enables the research community to investigate global flood seasonality and temporal trends, to synthesize the socio-hydrological footprint of extreme events worldwide, to train data-driven models and to validate global flood forecasting systems.
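The aggregation step, in which several articles reporting the same flood are merged into one event, might look roughly like the sketch below. The record fields (`location`, `date`, `url`) and the grouping key are assumptions made for illustration; the abstract does not specify how Groundsource geocodes or merges reports.

```python
from collections import defaultdict

def aggregate_reports(reports):
    """Group article-level extractions into events keyed by (location, date)."""
    events = defaultdict(set)
    for r in reports:
        # Normalise the place name so trivially different spellings merge.
        key = (r["location"].strip().lower(), r["date"])
        events[key].add(r["url"])
    # Number of distinct supporting articles per event.
    return {key: len(urls) for key, urls in events.items()}

# Hypothetical extractions from three articles.
reports = [
    {"location": "Town A", "date": "2020-01-05", "url": "https://example.org/1"},
    {"location": "town a ", "date": "2020-01-05", "url": "https://example.org/2"},
    {"location": "Town B", "date": "2020-01-05", "url": "https://example.org/3"},
]
events = aggregate_reports(reports)
```

Counting the supporting articles per event is one possible way to expose how well an event is documented, which matters when the overall precision is estimated rather than verified per record.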

How to cite: Zlydenko, O., Mayo, R., Bootbool, M., Kratzert, F., Sicherman, A., Zemach, I., and Cohen, D.: Groundsource - a Gemini constructed dataset of real world flood events from news, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-3253, https://doi.org/10.5194/egusphere-egu26-3253, 2026.

08:51–08:53 | PICOA.5 | EGU26-9793 | On-site presentation
Simon Moulds, Thiago Nascimento, Ryan Riggs, George Allen, and Frederik Kratzert

Large-sample hydrology datasets (e.g. CAMELS) provide structured hydro-meteorological time series data together with time-varying and static catchment attributes. They are fundamental to modern hydrological analysis, supporting hypothesis testing, model development, and the synthesis of generalisable hydrological insights across large and heterogeneous sets of river basins. However, the present generation of large-sample datasets has several shortcomings. First, they are difficult to update as new information becomes available. In addition, they often provide only a small subset of the variables collected at hydrometric gauging sites and usually exclude sub-daily data, while inconsistent naming conventions across the various datasets make data integration challenging. Finally, they may not include the quality flags that are often assigned to individual measurements by the measuring authority. To address these issues, we present RivRetrieve-Python (https://github.com/kratzert/RivRetrieve-Python), a new open source library that provides access to streamflow, stage and river temperature from more than 18 hydrometric APIs with more than 60 000 gauge stations in total at the time of writing (January 2026). An object-oriented design abstracts the implementation details of hydrometric APIs to provide users with a consistent interface irrespective of the data source or variable. We provide helper functions to simplify data retrieval from multiple catchments at once. We suggest that RivRetrieve-Python will streamline global to continental hydrological analysis and enable future research on real-time river monitoring and digital twins, hydrological prediction, and sub-daily hydrological variability and extremes.
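The idea of hiding API-specific details behind one consistent interface can be sketched as follows. All class and function names here (`GaugeSource`, `FakeAgencyA`, `FakeAgencyB`, `fetch_many`) are hypothetical and are not the RivRetrieve-Python API; the two "agencies" return hard-coded values instead of calling a real service.

```python
from abc import ABC, abstractmethod

class GaugeSource(ABC):
    """Hypothetical common interface; not the RivRetrieve-Python API."""

    @abstractmethod
    def fetch(self, site_id, variable):
        """Return a list of (ISO date, value) tuples in harmonised units."""

class FakeAgencyA(GaugeSource):
    # Stands in for an API that reports discharge in litres per second.
    def fetch(self, site_id, variable):
        raw = [("2026-01-01", 1500.0), ("2026-01-02", 2500.0)]
        return [(d, v / 1000.0) for d, v in raw]  # convert to m3/s

class FakeAgencyB(GaugeSource):
    # Stands in for an API that already reports discharge in m3/s.
    def fetch(self, site_id, variable):
        return [("2026-01-01", 0.8)]

def fetch_many(requests):
    """Retrieve several sites through the same call, whatever the source."""
    return {site: source.fetch(site, variable) for source, site, variable in requests}

data = fetch_many([
    (FakeAgencyA(), "A1", "discharge"),
    (FakeAgencyB(), "B7", "discharge"),
])
```

The calling code never needs to know which agency a site belongs to or which units that agency uses, which is the benefit the abstract attributes to the object-oriented design.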

How to cite: Moulds, S., Nascimento, T., Riggs, R., Allen, G., and Kratzert, F.: RivRetrieve-Python: A Python package for facilitating and unifying access to global streamflow data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9793, https://doi.org/10.5194/egusphere-egu26-9793, 2026.

08:53–08:55 | PICOA.6 | EGU26-13662 | On-site presentation
Boen Zhang, Michel Wortmann, Yinxue Liu, Simon Moulds, and Louise Slater

Global hydro-environmental databases provide essential information for large-scale hydrological, ecological, and environmental analyses. Most existing global databases are built upon convergent river representations that do not explicitly capture bifurcating river systems. In addition, these databases primarily rely on long-term climatology or static summaries of environmental conditions derived from legacy datasets, limiting their applicability for analyses of hydroclimatic and geomorphological processes. Here we present GRIT-ADB, a new global hydro-environmental database tied to the vectorised Global River Topology (GRIT) database at 30 m resolution that provides a topology-explicit and physically realistic representation of river networks including divergent flow pathways. GRIT-ADB provides standardised hydro-environmental information for 19.6 million km of global streams and rivers. The database comprises around 60 time-varying and static attributes spanning five categories: hydrology, physiography, climate, land cover and use, and soils and geology. Hydro-environmental attributes are derived by aggregating and harmonising data from state-of-the-art global datasets and are accumulated along the river network from headwaters to basin outlets while preserving the topology of divergent flow pathways. The attributes are linked to multiple GRIT scales, including hierarchically nested subbasins, individual 1 km river reaches, and coarser-scale river segments (several kilometres long). By combining a standardised attribute framework with explicit representation of bifurcating river hydrography, GRIT-ADB enables improved large-scale analyses of river connectivity, hydrological extremes, hydro-ecological processes, and environmental change in complex river systems, supporting a wide range of global hydrological and environmental applications.
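Accumulating an attribute downstream through a network that contains bifurcations can be sketched as a traversal in topological order, where each reach passes its accumulated value to its downstream neighbours according to a split fraction. The split fractions, reach names and the function `accumulate` are hypothetical; the abstract does not state how GRIT-ADB partitions values at bifurcations.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def accumulate(local, downstream):
    """Accumulate a local quantity downstream through a network with splits.

    local: {reach: local value}
    downstream: {reach: {downstream reach: fraction routed there}}
    """
    # TopologicalSorter expects {node: set of predecessors}.
    preds = {reach: set() for reach in local}
    for up, targets in downstream.items():
        for down in targets:
            preds[down].add(up)
    total = dict(local)
    # Upstream reaches are visited before the reaches they drain into,
    # so total[reach] is final by the time it is passed on.
    for reach in TopologicalSorter(preds).static_order():
        for down, fraction in downstream.get(reach, {}).items():
            total[down] += fraction * total[reach]
    return total

# Hypothetical network: A splits equally into B and C, which rejoin at D.
local = {"A": 10.0, "B": 5.0, "C": 2.0, "D": 1.0}
downstream = {"A": {"B": 0.5, "C": 0.5}, "B": {"D": 1.0}, "C": {"D": 1.0}}
total = accumulate(local, downstream)
```

Because the fractions leaving each reach sum to one, the value at the outlet equals the sum of all local values, so nothing is double-counted where the branches rejoin.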

How to cite: Zhang, B., Wortmann, M., Liu, Y., Moulds, S., and Slater, L.: GRIT-ADB: A Global Hydro-Environmental Attributes Database for the GRIT Hydrography, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13662, https://doi.org/10.5194/egusphere-egu26-13662, 2026.

08:55–08:57 | PICOA.7 | EGU26-7752 | ECS | On-site presentation
Maria H. Grundmann, Camille Heubi, Corentin Chartier-Rescan, Corinna Frank, Giulia Bruno, Paul C. Astagneau, and Manuela I. Brunner

River water temperature affects water quality, ecosystem functions, and the usability of water for humans. While many data sources for water temperature time series exist at regional or national levels, retrieving and preprocessing such data for large-scale and large-sample studies is often time-consuming. Therefore, studies have mainly focussed on local or regional scales, leading to widespread knowledge gaps regarding large-scale river water temperature variability and impacts. The recent push towards making large-sample hydrological data available, e.g. streamflow [1,2,3], has improved our understanding of processes and trends across hydrologically diverse regions, yet most of these datasets do not include water temperature.

With TempER (Temperatures in European Rivers), we present a large-sample, long-term and high-resolution dataset of river water temperature across Europe. We provide daily water temperature data from 4757 stations, covering up to 72 years, alongside streamflow data where available. We also provide catchment outlines and catchment aggregated land-surface attributes, such as land cover, geology and topography, as well as meteorological time series for these stations. We provide water temperature regime indices for all the stations in our dataset, and the raw data where allowed. To enable updates of this dataset, we provide detailed information on how to retrieve data from over 67 sources in 26 countries. This dataset will pave the way for research projects that improve our understanding of water temperature trends, patterns and extremes across large spatial domains through analysis and modelling.  
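The abstract does not list the regime indices provided with TempER. As one hypothetical example of such an index, the sketch below computes the maximum 7-day mean of a daily water temperature series; the function name and the values are made up for illustration.

```python
def max_rolling_mean(values, window=7):
    """Largest mean over any run of `window` consecutive values."""
    if len(values) < window:
        raise ValueError("series shorter than window")
    running = sum(values[:window])
    best = running / window
    for i in range(window, len(values)):
        # Slide the window one step: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        best = max(best, running / window)
    return best

# Hypothetical daily water temperatures in degrees Celsius.
temps = [10, 11, 12, 13, 14, 15, 16, 17, 18, 12]
warmest_week = max_rolling_mean(temps)
```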

[1] Addor, N., Newman, A. J., Mizukami, N., and Clark, M. P.: The CAMELS data set: catchment attributes and meteorology for large-sample studies, Hydrol. Earth Syst. Sci., 21, 5293–5313, https://doi.org/10.5194/hess-21-5293-2017, 2017. 

[2] Kratzert, F., Nearing, G., Addor, N. et al. Caravan - A global community dataset for large-sample hydrology. Sci Data 10, 61 (2023). https://doi.org/10.1038/s41597-023-01975-w 

[3] do Nascimento, T.V.M., Rudlang, J., Höge, M. et al. EStreams: An integrated dataset and catalogue of streamflow, hydro-climatic and landscape variables for Europe. Sci Data 11, 879 (2024). https://doi.org/10.1038/s41597-024-03706-1

How to cite: Grundmann, M. H., Heubi, C., Chartier-Rescan, C., Frank, C., Bruno, G., Astagneau, P. C., and Brunner, M. I.: TempER: A large-sample dataset of temperature in European rivers, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7752, https://doi.org/10.5194/egusphere-egu26-7752, 2026.

08:57–08:59 | PICOA.8 | EGU26-1426 | On-site presentation
Edward R. Jones, Frederik Kratzert, and Michelle T. H. van Vliet

The last decade has seen a proliferation in efforts to compile, standardise and openly disseminate datasets spanning hundreds to thousands of catchments, driven by the emergence of large-sample hydrology as a sub-discipline in hydrological sciences. While these datasets have facilitated novel research into the field of water quantity (e.g. streamflow prediction), comparable advances for water quality research remain limited [1].

Here, we present the first global integration of water quality into large-sample hydrology (named Caravan-Qual). The dataset contains >70 million river water quality observations covering 100 water quality constituents, compiled from a range of national-to-global datasets covering the period 1980–2025. Water quality data have been standardised to common naming conventions and reporting units, and further processed to remove duplicates, detect outliers and handle observations below detection limits. Leveraging the Caravan [2] dataset and open-source software, water quality monitoring stations are matched to streamflow gauges, with ~31% of daily water quality observations paired to a daily streamflow measurement within a 10 km distance. Furthermore, meteorological variables (e.g. temperature, precipitation, net radiation) and catchment attributes (e.g. land cover, soil characteristics) are derived for water quality monitoring stations.
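A purely distance-based version of the station-to-gauge matching can be sketched with the haversine formula. This is a simplification under stated assumptions: the function names, coordinates and the mean Earth radius of 6371 km are illustrative, and the abstract does not say whether Caravan-Qual also checks that station and gauge lie on the same river.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def nearest_gauge(station, gauges, max_km=10.0):
    """Return (gauge id, distance) of the closest gauge within max_km, else None."""
    best = None
    for gauge_id, (lat, lon) in gauges.items():
        d = haversine_km(station[0], station[1], lat, lon)
        if d <= max_km and (best is None or d < best[1]):
            best = (gauge_id, d)
    return best

# Hypothetical coordinates (latitude, longitude) in decimal degrees.
gauges = {"g1": (50.05, 8.0), "g2": (50.5, 8.0)}
match = nearest_gauge((50.0, 8.0), gauges)
no_match = nearest_gauge((52.0, 8.0), gauges)
```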

Caravan-Qual is openly available at https://doi.org/10.5281/zenodo.17787066 [3] and is envisaged to facilitate research into topics including:

  • Spatio-temporal analysis of river water quality dynamics at local to global scales.
  • Investigation of the relationships between (constituent-specific) river water quality responses and hydrological, meteorological and catchment characteristics.
  • The development and evaluation of process-based, hybrid and data-driven water quality models across diverse hydrological and climatic conditions.

References

[1] Jones, E. R., Graham, D. J., van Griensven, A., Flörke, M. & van Vliet, M. T. H. Blind spots in global water quality monitoring. Environmental Research Letters 19, 091001 (2024). https://doi.org/10.1088/1748-9326/ad6919

[2] Kratzert, F. et al. Caravan - A global community dataset for large-sample hydrology. Scientific Data 10, 61 (2023). https://doi.org/10.1038/s41597-023-01975-w

[3] Jones, E. R., Kratzert, F. & van Vliet, M. T. H. Caravan-Qual: A global scale integration of water quality observations into a large sample hydrology dataset. Zenodo [DATASET] (2025). https://doi.org/10.5281/zenodo.17787066

How to cite: Jones, E. R., Kratzert, F., and van Vliet, M. T. H.: Caravan-Qual: A global scale integration of water quality observations into a large-sample hydrology dataset, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-1426, https://doi.org/10.5194/egusphere-egu26-1426, 2026.

08:59–09:01 | PICOA.9 | EGU26-5442 | On-site presentation
Michal Jeníček, Ondřej Ledvinka, Radovan Tyl, Petr Pavlík, Petr Kavka, Adam Vizina, Ondřej Nedělčev, Johnmark Nyame Acheampong, Mateja Fabečić, Mijael Rodrigo Vargas Godoy, Petr Šercl, Jana Bernsteinová, and Jakub Langhammer

Hydrological methods that analyse data from a large sample of catchments with diverse characteristics (large-sample hydrology; comparative hydrology) enable a comprehensive analysis of the hydrological regime and a description of hydrological variability and change in the components of the water balance. Comparative hydrology is better suited for examining the differences and similarities between river basins, thereby supporting their classification and regionalisation. Furthermore, hydrological models significantly streamline the processing of large sets of river basins.

We present CAMELS-CZ (Catchment Attributes and MEteorology for Large-sample Studies – Czechia), a database of catchment attributes for 453 catchments within Czechia, serving as a reference data platform for analysis and modelling using a large sample of catchments. The database provides catchment attributes, as well as hydrological and meteorological time series, in a structure comparable to other existing CAMELS products. The database includes catchments for which daily runoff data are available for at least 15 years. Catchment area ranges from 2 km² to 10,000 km², and the catchments cover a variety of elevations (200–1,600 m a.s.l.) and runoff regimes (from pluvial to nival-pluvial). Observed time series include runoff, precipitation, air temperature, potential evapotranspiration, and snow water equivalent. In addition to the observed data, the CAMELS-CZ time series were supplemented with simulated data of individual components of the water balance using a semi-distributed bucket-type HBV model. The model was calibrated against observed runoff and snow water equivalent. Simulated time series enable a more detailed assessment of the individual components of the water cycle.
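To give an idea of how a bucket-type model simulates snow water equivalent, the sketch below implements a basic degree-day snow routine of the kind used in HBV-type models. It is a strongly simplified textbook version (no refreezing, no liquid water storage in the snowpack, no elevation zones), and the threshold temperature `tt`, the degree-day factor `cfmax` and the input values are hypothetical rather than the calibrated CAMELS-CZ setup.

```python
def degree_day_snow(precip, temp, tt=0.0, cfmax=3.0):
    """Simulate snow water equivalent (mm) and liquid water input (mm/day).

    tt: threshold temperature (deg C) separating snowfall from rain and melt
    cfmax: degree-day factor (mm per deg C per day)
    """
    swe = 0.0
    swe_series, liquid_series = [], []
    for p, t in zip(precip, temp):
        if t < tt:
            swe += p          # precipitation falls as snow and is stored
            liquid = 0.0
        else:
            melt = min(swe, cfmax * (t - tt))  # melt limited by available snow
            swe -= melt
            liquid = p + melt  # rain plus meltwater reaches the soil
        swe_series.append(swe)
        liquid_series.append(liquid)
    return swe_series, liquid_series

# Hypothetical four-day sequence: two cold days followed by two warm days.
swe, liquid = degree_day_snow([10.0, 5.0, 0.0, 2.0], [-2.0, -1.0, 1.0, 3.0])
```

The routine conserves mass: total precipitation equals the liquid water released plus the snow still stored at the end.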

How to cite: Jeníček, M., Ledvinka, O., Tyl, R., Pavlík, P., Kavka, P., Vizina, A., Nedělčev, O., Acheampong, J. N., Fabečić, M., Vargas Godoy, M. R., Šercl, P., Bernsteinová, J., and Langhammer, J.: CAMELS-CZ: a hydro-meteorological time series and attributes database for 453 catchments in Czechia, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5442, https://doi.org/10.5194/egusphere-egu26-5442, 2026.

Extending and improving existing datasets
09:01–09:03 | PICOA.10 | EGU26-12386 | ECS | On-site presentation
Alexander Dolich, Eduardo Acuña Espinoza, Uwe Ehret, Jan Bondy, and Ralf Loritz

CAMELS datasets have been primary accelerators for Large-Sample Hydrology (LSH), providing extensive, harmonized hydro-meteorological data and establishing benchmarks that have fundamentally changed how data-driven models in hydrology are developed and evaluated. However, to date, these efforts have predominantly focused on daily resolution. While the overall performance of deep learning models for daily rainfall-runoff modelling has reached a high standard, often plateauing with "vanilla" LSTMs, significant challenges remain. These include the accurate representation of flood peaks, drought dynamics, performance under non-stationary conditions, and the capturing of rapid events in small catchments. Although initial LSH studies have explored hourly data, fully exploiting sub-daily information remains an open and pressing challenge. The shift to high-resolution datasets offers the potential to improve the modeling of extreme floods and their dynamics and to capture runoff generation processes in smaller catchments as well. However, this transition requires a reassessment of the current state-of-the-art: do the limitations of daily modelling persist at the hourly scale, are they resolved by higher resolution data, and which entirely new challenges arise?
To address these questions and facilitate the transition to sub-daily LSH, we introduce CAMELS-DE-1h, a comprehensive hourly dataset for Germany. It covers 1626 catchments with streamflow and meteorological forcing data spanning 2001–2024. Uniquely, CAMELS-DE-1h includes historical short-term meteorological forecasts (ICON-D2, 48 hours lead time) from 2021–2024, both as deterministic and ensemble forecasts. This novelty enables rigorous research regarding the propagation of meteorological uncertainty into hydrological predictions and the development of deep learning models for operational settings. With CAMELS-DE-1h, we provide open-source LSTM benchmarks for both discharge simulation and forecasting, and use these benchmarks to evaluate the transition from daily to hourly simulations. Specifically, we analyze how the transition to hourly resolution alters model behavior regarding peak flow timing and hydrograph shape, and discuss challenges such as computational costs and the need for evaluation metrics adapted to sub-daily Large-Sample Hydrology.
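Why sub-daily evaluation may need metrics beyond a single efficiency score can be illustrated with two simple measures: the Nash-Sutcliffe efficiency (NSE) and a peak timing error. The hourly values below are hypothetical, and these two functions are a generic sketch rather than the benchmark metrics used for CAMELS-DE-1h.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def peak_timing_error(obs, sim):
    """Simulated minus observed index of the maximum, in time steps."""
    return sim.index(max(sim)) - obs.index(max(obs))

# Hypothetical hourly discharge: the simulated peak is correct in size
# but arrives one hour late.
obs = [1.0, 2.0, 8.0, 4.0, 2.0, 1.0]
sim = [1.0, 2.0, 4.0, 8.0, 2.0, 1.0]
score = nse(obs, sim)
delay = peak_timing_error(obs, sim)
```

A one-hour delay of an otherwise well-simulated peak drops the NSE to about 0.11 here, whereas the same delay would vanish entirely after aggregation to daily values.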

How to cite: Dolich, A., Acuña Espinoza, E., Ehret, U., Bondy, J., and Loritz, R.: CAMELS-DE-1h: Advancing Large-Sample Hydrological Modeling by Shifting to the Hourly Scale, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12386, https://doi.org/10.5194/egusphere-egu26-12386, 2026.

09:03–09:05 | PICOA.11 | EGU26-12200 | ECS | On-site presentation
Felipe Fileni, Hayley J. Fowler, Elizabeth Lewis, Fiona McLay, Gemma Coxon, David Archer, Emma Bruce, Longzhi Yang, Matt Fry, Hollie Cooper, and Ollie Swain

High-resolution (15-min) river flow records date back almost a century in the UK. Nevertheless, these data have historically been used only in small-scale studies or operational contexts. A key reason for this limited applicability is that, unlike rainfall, which benefits from established national sub-daily products that are easily accessible (e.g. CEH-GEAR1hr and GRaD-GB), no equivalent unified sub-daily flow product has been available at the national scale. At the same time, the rapid growth of large-sample studies has transformed the field of hydrology, enabling insights into spatial patterns, flood-generating mechanisms and assessing model performance across regions. However, most large-sample hydrological datasets still rely on daily series, whose coarse temporal resolution is insufficient to represent flood-event dynamics, which often unfold on sub-daily timescales.

The creation of UK-Flow15 (available at https://doi.org/10.5285/211710ac-f01b-4b52-807f-373babb1c368) is motivated by these two limitations: the absence of a national high-resolution flow dataset for the UK and the clear scientific value enabled by large-sample analyses. The new national-scale 15-minute dataset comprises >1.8 billion observations from 1,369 gauging stations, spanning 1948–2023.

Producing UK-Flow15 required extensive harmonisation, metadata reconciliation and cross-checks to resolve structural inconsistencies. Historically, the 15-minute records received far less attention than the daily and AMAX datasets derived from them, leaving the high-resolution series inconsistently digitised, stored in multiple versions and rarely subjected to the same level of quality control or metadata curation.

Furthermore, to ensure the dataset is FAIR, we developed and applied a comprehensive quality-control procedure designed to inform users about data quality and limitations. Flagging involved manual visual inspection of all stations, consistency checks against other UK hydrological products, and automated detection of common anomalies such as spikes, truncations, discontinuities, fluctuations and other artefacts. Additional high-flow checks assess the plausibility of extreme events by comparing them with rainfall at the location and concurrent flows in nearby catchments, highlighting cases where they may be hydrologically inconsistent.
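One of the automated checks mentioned above, spike detection, can be sketched as a comparison of each value with its immediate neighbours. The rule and the threshold `factor` are hypothetical and much simpler than a production quality-control procedure; they are not the UK-Flow15 QC rules.

```python
def flag_spikes(flow, factor=5.0):
    """Flag values that exceed both neighbours by more than `factor` times."""
    flags = [False] * len(flow)
    for i in range(1, len(flow) - 1):
        neighbour_max = max(flow[i - 1], flow[i + 1])
        # An isolated jump far above both neighbours is treated as suspect.
        if neighbour_max > 0 and flow[i] > factor * neighbour_max:
            flags[i] = True
    return flags

# Hypothetical 15-minute flows (m3/s): one isolated spike, then a real rise.
flow = [1.0, 1.1, 9.0, 1.2, 1.3, 2.0, 3.5, 3.0]
flags = flag_spikes(flow)
```

The gradual rise towards the end of the series is not flagged, which reflects the main design concern of such checks: flagging artefacts without removing genuine flood peaks.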

UK-Flow15 and its QC framework form a robust standalone product, providing trustworthy, well-documented sub-hourly flow data. Beyond this, the dataset supports the enhancement of other large-sample products, including CAMELS-GB v2. The QC system is also adaptable, offering a methodology that can be extended to the creation of other sub-daily or hourly hydrological datasets.

In this work, we aim to demonstrate how UK-Flow15 was processed, what data it contains and how it can be used. We outline the harmonisation and QC workflow applied to produce consistent national 15-minute records. We present the complete flow series, QC flags and metadata now openly accessible. We highlight applications for large-sample studies, flood-wave characterisation and hydrological model evaluation. We hope the dataset contributes to better-informed decisions on sub-daily flood processes at large scale.

How to cite: Fileni, F., Fowler, H. J., Lewis, E., McLay, F., Coxon, G., Archer, D., Bruce, E., Yang, L., Fry, M., Cooper, H., and Swain, O.: UK-Flow15: Sub-hourly river flow data observations from 1369 river gauges in the UK, 1948-2023, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12200, https://doi.org/10.5194/egusphere-egu26-12200, 2026.

09:05–09:07
|
PICOA.12
|
EGU26-12861
|
On-site presentation
Martina Kauzlaric, Bailey J. Anderson, Paul C. Astagneau, Paolo Benettin, Marius Floriancic, Pascal Horton, Basil Kraft, Thiago Nascimento, Jan Schwanbeck, Rosi Siber, Maria Staudinger, Daniel Viviroli, and Maria Grazia Zanoni

New CAMELS (Catchment Attributes and MEteorology for Large-sample Studies) datasets have been increasingly released over the past decade and have allowed for the collection and dissemination of about twenty national datasets, comprising thousands of catchments all around the world. This community effort of providing hydro-meteorological time series alongside relevant catchment attributes is essential for improving hydrological process understanding and modelling across a wide range of conditions. However, the high nonlinearity of the hydrological system and scaling problems in hydrology call for expanding CAMELS datasets to different scales. Most CAMELS datasets provide daily catchment-scale time series, limiting their applicability for sub-daily processes and scaling analyses. Processes such as rainfall–runoff timing, flood generation in mountainous regions, and human flow regulation operate at sub-daily scales and cannot be adequately captured by daily data. Here, we present first efforts to upgrade the existing CAMELS-CH dataset by increasing its temporal and spatial resolution. In addition to hourly hydro-meteorological time series and statistics extracted at an even higher temporal resolution, we subdivide hydrological Switzerland into topographical catchment units of about 2 km², in order to allow for building nested catchments within the gauged catchments and for performing analyses at different scales. We also include additional attributes related to human influence and the impact of hydropower on streamflow. This dataset will be a valuable resource for different hydrological applications, and will enable the first consistent hydrological benchmarks at different spatial and temporal scales in a highly varying environment such as hydrological Switzerland.

How to cite: Kauzlaric, M., Anderson, B. J., Astagneau, P. C., Benettin, P., Floriancic, M., Horton, P., Kraft, B., Nascimento, T., Schwanbeck, J., Siber, R., Staudinger, M., Viviroli, D., and Zanoni, M. G.: Expanding CAMELS-CH: increasing resolution and including new assets, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12861, https://doi.org/10.5194/egusphere-egu26-12861, 2026.

09:07–09:09
|
PICOA.13
|
EGU26-16172
|
On-site presentation
Keirnan Fowler, Zhang Ziqi, and Hou Xue

This presentation will introduce version 2 (v2) of the Australian edition of the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) dataset. Since its initial release in 2021, CAMELS-AUS has been important in advancing research on hydrological change, arid-zone hydrology, and hydrological model refinement, with uptake by researchers both in Australia and internationally. This update significantly expands the dataset's scope and utility. The number of catchments covered has more than doubled, increasing from 222 to 561. Temporal coverage has been extended by eight years, now reaching 2022, compared to 2014 in the previous version. Furthermore, the quality and depth of attribute information has been enhanced, with improvements in information regarding hydrological signatures and streamflow uncertainty quantification. These changes position CAMELS-AUS v2 as an improved and up-to-date resource for hydrological research and practical applications across Australia. CAMELS-AUS v2 is freely downloadable from https://doi.org/10.5281/zenodo.12575680

How to cite: Fowler, K., Ziqi, Z., and Xue, H.: Updating CAMELS-AUS to increase the catchment sample, lengthen hydrometeorological timeseries, and improve attribute information, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16172, https://doi.org/10.5194/egusphere-egu26-16172, 2026.

New hydrological signatures and descriptors
09:09–09:11
|
PICOA.14
|
EGU26-9948
|
ECS
|
On-site presentation
Mattia Neri and Elena Toth

Understanding the primary controls of basin dynamics provides the fundamental basis for transferring hydrological information. When focusing on the rainfall-runoff transformation at high temporal resolutions, catchment similarity should reflect the stochastic nature and temporal sequencing of streamflow. This requires an integrated analysis of the entire hydrograph and its forcings, ensuring that the information embedded in the flow propagation and generation processes is fully captured for regionalization purposes.

In a previous study (Neri et al., 2022), we introduced a novel hydrological signature based on the concept of transfer entropy (TE). This signature quantifies the information flow between the complete time series of meteorological forcings and observed streamflow. The approach leverages these information flows to identify dominant hydrological processes and to characterize and classify basins, under the assumption that similar TE values identify similar catchments. In Neri et al. (2022), the proposed technique was applied to a densely gauged set of Austrian catchments, demonstrating the potential of transfer entropy as an additional instrument for assessing hydrological similarity and for quantifying the connection between different governing processes. Specifically, the method proved capable of distinguishing the predominant or partial roles of snowmelt and evapotranspiration in the region, assessing differences in catchment response times, and highlighting the role of high orographic precipitation in snow-dominated catchments.
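To make the idea of an information flow from forcing to streamflow concrete, here is a minimal Python sketch of a plug-in transfer entropy estimate on binned series. It is an illustration only and not the procedure of Neri et al. (2022): the equal-width binning, the single-step history, the default lag and all function names are assumptions made for this example.

```python
import math
import random
from collections import Counter


def discretize(series, n_bins):
    """Map values to integer bins of equal width (an assumed, simple scheme)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in series]


def transfer_entropy(source, target, n_bins=4, lag=1):
    """Plug-in estimate of TE(source -> target) in bits.

    Uses one step of target history and the source value `lag` steps back.
    """
    x = discretize(source, n_bins)
    y = discretize(target, n_bins)
    # Each triple is (target now, target one step back, source `lag` steps back)
    triples = [(y[t], y[t - 1], x[t - lag]) for t in range(max(1, lag), len(y))]
    n = len(triples)
    c_full = Counter(triples)
    c_now_hist = Counter((a, b) for a, b, _ in triples)
    c_hist_src = Counter((b, c) for _, b, c in triples)
    c_hist = Counter(b for _, b, _ in triples)
    te = 0.0
    for (a, b, c), k in c_full.items():
        p_joint = k / n
        p_given_hist_and_src = k / c_hist_src[(b, c)]
        p_given_hist = c_now_hist[(a, b)] / c_hist[b]
        te += p_joint * math.log2(p_given_hist_and_src / p_given_hist)
    return te


# Synthetic check: the "response" is simply the forcing delayed by one step
random.seed(0)
forcing = [random.random() for _ in range(2000)]
response = [0.0] + forcing[:-1]
te_forward = transfer_entropy(forcing, response)
te_backward = transfer_entropy(response, forcing)
```

In this synthetic case the forcing carries almost all the information about the next response value, so the forward estimate is much larger than the backward one. With real data the result depends strongly on the binning and on record length.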

In this new study, the proposed approach is tested across diverse large-sample datasets within the Caravan framework (Kratzert et al., 2023), at both national and global scales. The objective of the analysis is to determine whether the potential identified in the previous experiment is generalizable—and to what extent—to more extensive study areas. Furthermore, we investigate how the methodology can be adapted to better identify basin dynamics in regions characterized by significantly higher hydro-climatic variability. Specifically, we explore the use of various meteorological forcings and the application of transfer entropy across multiple time scales. The results, in terms of both indicator values and basin dynamics classification, are interpreted in detail against a set of geo-morphological and climatic catchment features, as well as a set of typical and consolidated streamflow signatures.

 

References

Kratzert, F., Nearing, G., Addor, N., Erickson, T., Gauch, M., Gilon, O., Gudmundsson, L., Hassidim, A., Klotz, D., Nevo, S., Shalev, G., & Matias, Y. (2023). Caravan—A global community dataset for large-sample hydrology. Scientific Data, 10(1), 61. https://doi.org/10.1038/s41597-023-01975-w

Neri, M., Coulibaly, P., & Toth, E. (2022). Similarity of catchment dynamics based on the interaction between streamflow and forcing time series: Use of a transfer entropy signature. Journal of Hydrology, 614, 128555. https://doi.org/10.1016/j.jhydrol.2022.128555

How to cite: Neri, M. and Toth, E.: A transfer entropy signature to capture hydrological similarity: a global-scale validation of a measure based on the interaction between streamflow and forcing time series, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9948, https://doi.org/10.5194/egusphere-egu26-9948, 2026.

09:11–09:13
|
PICOA.15
|
EGU26-14445
|
ECS
|
On-site presentation
Mira Anand and Wouter Berghuijs

Streamflow is typically autocorrelated, and the degree to which prior conditions influence river flow informs how we predict future conditions and can drive temporal clustering of hydrological extremes. We introduce a memory decay curve that describes streamflow memory (i.e. autocorrelation) based on two components: initial strength and persistence. These curves effectively summarize the dynamics of catchment memory at monthly to multi-annual timescales, allowing for large-sample inter-catchment comparisons.
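The abstract does not state the functional form of the memory decay curve. Purely as a sketch, the example below assumes an exponential decay of autocorrelation with lag, r(k) = strength * exp(-k / persistence), and fits it by log-linear least squares. The function names and this functional form are assumptions and may differ from the authors' curve.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var


def fit_exponential_decay(lags, acf):
    """Fit r(k) = strength * exp(-k / persistence) using positive values only.

    Returns (strength, persistence). The exponential form is an assumption.
    """
    points = [(k, math.log(r)) for k, r in zip(lags, acf) if r > 0]
    n = len(points)
    mean_k = sum(k for k, _ in points) / n
    mean_l = sum(l for _, l in points) / n
    sxx = sum((k - mean_k) ** 2 for k, _ in points)
    sxy = sum((k - mean_k) * (l - mean_l) for k, l in points)
    slope = sxy / sxx
    strength = math.exp(mean_l - slope * mean_k)
    persistence = math.inf if slope >= 0 else -1.0 / slope
    return strength, persistence


# Example with an exact synthetic curve (lags in months)
lags = [1, 2, 3, 4, 5, 6]
acf = [0.9 * math.exp(-k / 3.0) for k in lags]
strength, persistence = fit_exponential_decay(lags, acf)
```

In practice `acf` would come from `autocorrelation` applied to a deseasonalised streamflow series at each lag of interest.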

We fit these memory decay curves to streamflow measurements from thousands of EStreams stations across Europe from 1980-2021. From these curves, we distinguish four basic memory archetypes based on the combination of strong (or weak) memory and long (or short) persistence. These archetypes exhibit distinct geographic patterns across Europe, with strong and long memory most present in regions with large aquifers or deep bedrock.

Streamflow memory at different timescales shows varied connections to different surface, subsurface, and climate characteristics. We use a random forest model to predict memory from these characteristics at multiple timescales, with the highest skill for seasonal and the lowest skill for yearly predictions. Surface-related features (e.g. topography) influence model predictions at shorter timescales, whereas subsurface feature importances increase with lag time; climate features, in particular aridity, are important across all timescales. We also compare the memory present in observation-based data to the memory produced by modelled streamflow for Europe to understand how well these dynamics are represented in modelled data. The memory decay curves presented in this study demonstrate the presence of hydrologic memory in European catchments at timescales from months to years and can improve the understanding and prediction of streamflow dynamics.

How to cite: Anand, M. and Berghuijs, W.: Memory decay curves describe streamflow dynamics across Europe at multiple timescales, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14445, https://doi.org/10.5194/egusphere-egu26-14445, 2026.

09:13–10:15
Coffee break
Chairpersons: Martina Kauzlaric, Adriane Hövel
Impacts of climate and environmental change across scales
10:45–10:55
|
PICOA.1
|
EGU26-15027
|
solicited
|
On-site presentation
Georgia Destouni and Mohanna Zarei

All societies, economic sectors, and ecosystems depend on and influence the flows and storages of terrestrial water, both as vital freshwater resources and as sources of flood and drought risk. Robust assessment of the conditions and changes of these flows and storages relies on the availability, consistency, and physical realism of hydro-climatic datasets. Here, we evaluate four widely used global hydro-climatic datasets with harmonized spatiotemporal coverage (Zarei and Destouni, 2024): (i) Obs, based solely on in-situ observations; (ii) Mixed-GLEAM, combining the same observational data with model-based GLEAM evapotranspiration; and the fully model-based reanalysis products (iii) GLDAS and (iv) ERA5. 

We compare long-term means and trends among these datasets across 1,561 catchments worldwide and within four regions: the Baltic (with 69 catchments), Mediterranean (182), South America (95), and Sub-Saharan Africa (127), over the period 1980–2010. All datasets consistently show large-scale spatial trends of increasing mean temperature, precipitation, evapotranspiration, and runoff from high latitudes toward the equator. In contrast, estimates of water-storage change (DS) and its spatial patterns differ markedly among datasets. GLDAS exhibits near-zero long-term average DS, implying no systematic drying or wetting, whereas ERA5 indicates predominantly strong negative DS (systematic drying), except in the Baltic region where positive DS (systematic wetting) dominates. Temporal trend analyses further show agreement among datasets for rising temperatures, but weaker, often insignificant, and divergent trends in precipitation, runoff, and evapotranspiration, both in magnitude and direction. Overall, the intercomparison reveals that ERA5 departs substantially from observation-based estimates and from the other datasets, with systematic biases and physically implausible implications for the terrestrial water fluxes and storage changes across regions and globally.

Physically inconsistent storage-change implications affect inferred runoff-generation processes, hydrological memory, and model parameter transferability in large-sample applications and catchment modeling. The dataset intercomparison results raise fundamental concerns regarding the suitability of ERA5 for large-sample assessments of terrestrial water variability and change. In general, constraining the assessments by catchment-wise water-balance linkages and closure offers a valuable framework for diagnosing dataset realism and advancing unified understanding of the climate-and-water interplay across regions and scales.
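The water-balance closure check mentioned above can be sketched in a few lines of Python, assuming the standard catchment water balance in which storage change is the residual of precipitation, evapotranspiration and runoff (DS = P - ET - R). The function name and the example numbers are hypothetical and are not taken from the datasets compared in the abstract.

```python
def mean_storage_change(precip, evap, runoff):
    """Long-term mean storage change DS = mean(P - ET - R) for one catchment.

    All inputs are annual totals in the same units (for example mm/yr).
    """
    if not (len(precip) == len(evap) == len(runoff)):
        raise ValueError("input series must have the same length")
    residuals = [p - e - r for p, e, r in zip(precip, evap, runoff)]
    return sum(residuals) / len(residuals)


# Hypothetical annual values (mm/yr) for one catchment
p = [800.0, 900.0, 700.0]
et = [500.0, 520.0, 480.0]
r = [300.0, 370.0, 230.0]
ds = mean_storage_change(p, et, r)
```

A long-term mean DS far from zero over several decades would imply sustained wetting or drying, which is the kind of implication the abstract uses to question dataset realism.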

Reference: Zarei, M., Destouni, G. (2024). A global multi catchment and multi dataset synthesis for water fluxes and storage changes on land. Scientific Data, 11, 1333.

How to cite: Destouni, G. and Zarei, M.: Understanding Hydro-climatic Variability and Change Across World Regions and Scales: A Multi-Catchment, Multi-Dataset Approach, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-15027, https://doi.org/10.5194/egusphere-egu26-15027, 2026.

10:55–10:57
|
PICOA.2
|
EGU26-19775
|
On-site presentation
Rolf Hut and Mark Melotto

The rise of Large Sample Hydrology has given hydrologists the data and tools to systematically analyze the behavior and characteristics of catchments across the globe: across different climate regimes, geologies, countries and continents, gaining insight into which phenomena are local and which are global. With this work we launch large sample hydrology into the future by providing a unified workflow to derive forcing from any CMIP6 climate model and scenario for any catchment available in the CARAVAN dataset. We will present which regions, from a hydrological point of view, are expected to be hardest hit by climate change.

 

We present a reproducible, FAIR-by-design workflow built on the eWaterCycle platform that enables climate change impact simulations using multiple hydrological models across any catchment from the CAMELS and CARAVAN datasets. eWaterCycle is a platform that facilitates open and FAIR hydrological modeling research. Different models are seamlessly integrated into eWaterCycle as plugins using software containers and the Basic Model Interface (BMI). Because of the clear separation between experiment and model, running the same experiment with different models is straightforward.

Recently, we hosted the CARAVAN dataset on a remote (OpenDAP) server and added support in eWaterCycle to access catchment data with a single line of code. Combined with our existing functionality to generate hydrological forcing data from any CMIP6 climate model run, this now makes it easy to perform climate change impact analyses for any catchment in the CARAVAN dataset.

We will demonstrate how any hydrological modeler or researcher can use this workflow today. The workflow consists of a collection of Jupyter notebooks that anyone can use to conduct their own climate change impact analyses. We will highlight how these standardized workflows have enabled undergraduate students to independently carry out impact analyses on a region and problem of their own choosing for their thesis.

This is also an open invitation to anyone interested in performing climate change impact analyses or hydrological modeling using the CARAVAN datasets: eWaterCycle is freely available to use, and we welcome collaborations.

How to cite: Hut, R. and Melotto, M.: Climate scenario data for all of CARAVAN shows which regions will be hit hardest by Climate Change, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19775, https://doi.org/10.5194/egusphere-egu26-19775, 2026.

10:57–10:59
|
PICOA.3
|
EGU26-10361
|
ECS
|
On-site presentation
Emma Ford, Wilson Chan, Amulya Chevuturi, Eugene Magee, Rachael Armitage, Bastien Dieppois, Manuela Brunner, Hannah Christensen, and Louise Slater

Floods are hydro-climatic extremes with severe socioeconomic and environmental consequences. Many studies have examined how large-scale modes of climate variability (e.g., ENSO, NAO) influence floods, but many have relied on catchments influenced by anthropogenic activities, which obscure underlying climate-flood relationships. Here, we use the newly released ROBIN Reference Hydrometric Network, a global dataset of over 3,000 near-natural catchments with daily streamflow records, to provide an observational assessment of climate-flood relationships at the global scale. We first quantify long-term and multi-temporal trends in annual flood peaks and peak-over-threshold events and evaluate their connections with key modes of climate variability across different IPCC regions. Trend analysis reveals how flood metrics have evolved across regions and time periods, while correlation analysis reveals the modes of climate variability that are associated with year-to-year variations in flood peaks and frequencies. A signal-to-noise framework tests whether global mean surface temperature leaves a detectable fingerprint on high flow regimes. This analysis helps to clarify the extent to which climate variability influences flood occurrence and magnitude in near-natural catchments worldwide. Moreover, we propose a machine learning-based process attribution framework to identify climate and catchment controls on floods in near-natural catchments. Preliminary results indicate substantial spatial variability in dominant flood drivers across and within IPCC regions and suggest that large-scale atmospheric circulation modes exert strong, but regionally distinct, influence on seasonal flood frequency. Overall, our findings underscore the importance of regional climate modes in modulating floods and provide the first global baseline on climate-driven changes to floods in near-natural catchments.  

How to cite: Ford, E., Chan, W., Chevuturi, A., Magee, E., Armitage, R., Dieppois, B., Brunner, M., Christensen, H., and Slater, L.: Global climate signals of floods in near-natural rivers, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-10361, https://doi.org/10.5194/egusphere-egu26-10361, 2026.

10:59–11:01
|
PICOA.4
|
EGU26-8010
|
ECS
|
On-site presentation
Julia dos Santos da Silva, Bruno J. Lemaire, Francesca Pianosi, and Fanny Sarrazin

Reservoirs are present in many catchments worldwide. They allow river flow to be regulated in support of human activities, and they help to reduce flood risk and sustain low flows. Depending on how they are used and managed, they can significantly alter the natural flow of rivers and thus affect ecosystem functioning. However, the large-scale impact of reservoirs on streamflow regimes is not well understood because data on their operating rules are rarely publicly available. Hydrological studies often neglect reservoirs and are frequently limited to “natural” catchments that are not influenced by human activities.

Here we aim to assess the impact of reservoir regulation on river flow at the national scale in France over the 1970–2020 period, and to determine to what extent the impact varies depending on the reservoir purpose and the physio-climatic conditions. We focus on France, which presents a large variety of landscapes and reservoirs, and where the construction of new reservoirs is envisaged in the face of rising irrigation water demand. We compile a national reservoir dataset by combining data from different sources. We assess the impact of these reservoirs by comparing the observed streamflow between 227 regulated catchments and 908 benchmark catchments, which are assumed to be representative of natural flow conditions. We adopt different hydrological signatures that capture different aspects of the streamflow, namely its average value, its inter- and intra-annual variability, and hydrological extremes.

The results show some similarities with those of previous studies in the United Kingdom and the United States, for instance reduced seasonality and flood peaks in regulated compared to benchmark catchments. Notably, we also observe specificities for the French reservoirs, which can, among other effects, increase drought severity. Ultimately, the study allows us to better understand how reservoirs can affect regulated rivers, thus informing their management and their integration into large-scale hydrological models.

How to cite: dos Santos da Silva, J., Lemaire, B. J., Pianosi, F., and Sarrazin, F.: Impact of reservoirs on river flow from a large sample of catchments in France, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8010, https://doi.org/10.5194/egusphere-egu26-8010, 2026.

11:01–11:03
|
PICOA.5
|
EGU26-18868
|
ECS
|
On-site presentation
Anna Luisa Hemshorn de Sánchez, Wouter Berghuijs, Anne F. Van Loon, Dimmie Hendriks, and Ype van der Velde

Understanding large-scale patterns in streamflow response to precipitation variability helps identify places where precipitation changes most strongly affect streamflow. This study presents the sensitivity of annual streamflow of over 8,000 European catchments to annual and seasonal precipitation variability, as measured by observation-based streamflow elasticities. We extend the scope of the conventionally studied mean flows by incorporating annual maximum and minimum flows as well. As anticipated, both annual mean and extreme flows generally increased with higher annual mean precipitation. On average for Europe, a 1% change in annual precipitation resulted in an amplified response of 1.2% in annual mean flows, an even stronger amplification of 1.3% in annual maximum flows, and a dampened response of 0.9% in annual minimum flows. These elasticities exhibited distinct regional patterns. Northern Poland and the Baltic States featured remarkably insensitive mean and extreme streamflow. Furthermore, annual maximum flows in mountainous Central Europe were highly sensitive to summer precipitation. In Spain, a high elasticity of mean and maximum flows to winter precipitation was observed. The elasticity of low flows appeared to be more localised and less related to precipitation variability. We then employed a random forest model that incorporated 20 climate and catchment characteristics to examine their relationship with streamflow elasticities and identify the climate characteristics exhibiting the strongest correlation. Despite the high number of characteristics included, the model’s capacity to predict the elasticities based on the selected input variables was relatively low, suggesting that some key drivers remain unaccounted for. An important factor influencing streamflow elasticities that was not comprehensively addressed through the random forest input variables is human activity.
To further explore the human influence on streamflow response to precipitation, we studied approximately 150 Dutch catchments with varying degrees of human influence. We used six hydrological signatures to group the catchments into typologies of similar behaviour and analysed whether these typologies are related to the degree of management. This research advances our understanding of mean and extreme streamflow responses across regional and large scales.
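For readers unfamiliar with streamflow elasticity, the sketch below shows one common way to estimate it: the slope of a least-squares regression of log annual flow on log annual precipitation, so that a value of 1.2 means a 1% change in precipitation goes with a 1.2% change in flow. This estimator and the function name are assumptions for illustration. The abstract does not specify which elasticity estimator was used.

```python
import math


def elasticity(precip, flow):
    """Log-log regression slope of annual flow on annual precipitation.

    Assumes strictly positive values. This is one common estimator and is
    not necessarily the one used in the study.
    """
    log_p = [math.log(p) for p in precip]
    log_q = [math.log(q) for q in flow]
    n = len(log_p)
    mean_p = sum(log_p) / n
    mean_q = sum(log_q) / n
    sxx = sum((x - mean_p) ** 2 for x in log_p)
    sxy = sum((x - mean_p) * (y - mean_q) for x, y in zip(log_p, log_q))
    return sxy / sxx


# Synthetic annual series built with a known elasticity of 1.2
annual_precip = [600.0, 700.0, 800.0, 900.0, 1000.0]
annual_flow = [0.001 * p ** 1.2 for p in annual_precip]
eps = elasticity(annual_precip, annual_flow)
```

The same function can be applied to annual maximum or minimum flows, or to seasonal precipitation totals, by swapping the input series.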

How to cite: Hemshorn de Sánchez, A. L., Berghuijs, W., Van Loon, A. F., Hendriks, D., and van der Velde, Y.: Sensitivities of mean and extreme streamflow to precipitation variability across Europe, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18868, https://doi.org/10.5194/egusphere-egu26-18868, 2026.

Value of data for advancing catchment modeling
11:03–11:05
|
PICOA.6
|
EGU26-2853
|
ECS
|
On-site presentation
Ryoko Araki, Anne Holt, John Hammond, Admin Husic, Gemma Coxon, and Hilary McMillan

Understanding hydrologic processes is essential for developing effective hydrologic models and management strategies. However, we lack a continental-scale, comprehensive knowledge of which processes dominate and how their drivers vary across diverse landscapes.

To address this gap, we synthesize large-sample precipitation and streamflow datasets, from Caravan and USGS GAGES-II, to identify spatial patterns in hydrologic behavior. Then, we apply a random forest machine learning model to examine the predictability of hydrological processes and to understand their climatic and landscape drivers. We use a hydrologic signature approach, where signatures—metrics derived from observed hydroclimatic time series—capture key aspects of hydrologic dynamics. 

Using these hydrologic signatures, we developed a “dominant process map” that highlights the spatial variability of baseflow, overland flow, water balance loss, and storage capacity across the conterminous United States. The map demonstrates clear regional gradients from baseflow to overland flow regimes, as well as transitions from water-retaining to low-storage regions. 

In contrast to previous studies emphasizing climate as the primary driver of these processes, our map highlights substantial influences from landscape features. In the eastern half of the US, baseflow is primarily influenced by soils and geology, while stormflow is controlled by topography. In the western US, climate remains the dominant control of most processes. Metropolitan areas emerged as hotspots influenced by anthropogenic factors.

Our dominant process maps serve as a valuable hypothesis-generating tool for model builders and water managers to estimate regional hydrological processes a priori. Our approach to training random forest models to predict hydrologic signatures is readily applicable to other datasets; this facilitates extrapolating hydrological process knowledge from well-studied catchments to ungaged basins or other large-sample datasets.  

How to cite: Araki, R., Holt, A., Hammond, J., Husic, A., Coxon, G., and McMillan, H.: Continental-scale prediction of hydrologic signatures and processes, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-2853, https://doi.org/10.5194/egusphere-egu26-2853, 2026.

11:05–11:07
|
PICOA.7
|
EGU26-13583
|
ECS
|
On-site presentation
Larisa Tarasova, Chahinaz Ziani, and Lars Ribbe

Catchment descriptors are standard explanatory factors for catchment hydrological signatures. They are widely used to infer dominant hydrological processes, identify and transfer information across similar catchments, and upscale findings from smaller to larger scales. However, conventional approaches for deriving catchment descriptors use spatial averages over the catchment, which overlook the inherent spatial variability arising from geomorphological catchment organization. This could explain the limited accuracy of existing models in predicting hydrological responses across catchments. In this study, we examine the potential of topography to capture the spatial variability of catchment descriptors. We weight various catchment descriptors with four topographic metrics reflecting distinct aspects of spatial variability, namely, horizontal channel proximity (distance to nearest drainage), vertical drainage potential (height above the nearest drainage), flow-path length (distance to outlet), and river network hierarchy (stream order). We test the added value of the enhanced descriptors for predicting mean values and variability of streamflow event characteristics (event runoff coefficient, time scale, rise time) in 392 German catchments.
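The difference between a plain spatial average and a topographically weighted descriptor can be sketched as follows. The weighting function is not given in the abstract, so this example assumes a simple inverse-distance weight based on distance to the nearest drainage. The function name, the `scale` parameter and the numbers are hypothetical.

```python
def weighted_descriptor(values, distances, scale=1.0):
    """Catchment descriptor as a weighted mean over grid cells.

    Cells closer to the drainage network get more weight. The form
    1 / (1 + d / scale) is an assumed weighting, chosen for illustration.
    """
    weights = [1.0 / (1.0 + d / scale) for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Two cells: a near-stream cell and a hillslope cell (hypothetical values)
cell_values = [10.0, 20.0]       # e.g. a soil property per cell
cell_distances = [0.0, 1.0]      # distance to nearest drainage, in km
plain_mean = sum(cell_values) / len(cell_values)
topo_weighted = weighted_descriptor(cell_values, cell_distances)
```

Here the weighted value is pulled toward the near-stream cell, whereas the plain mean treats both cells equally.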

Results show considerable improvement in prediction accuracy of mean event rise time and time scale using catchment descriptors weighted with distance to the nearest drainage and outlet compared to standard averaged descriptors. The proximity to the drainage, which effectively controls the travel time, is likely to exert a strong control on the shape and timing of event hydrographs. The prediction of the variability of event runoff coefficient improved considerably when using descriptors weighted with height above the nearest drainage. The latter effectively captures the soil moisture levels and channel saturation that likely control the variability of runoff coefficients. However, predictions of the mean runoff coefficient, and variability of event time scale and rise time exhibited minimal gains. This indicates that these characteristics are rather governed by the climate and soil properties at larger scale, while their smaller scale variability plays only a minor role. These findings demonstrate how topographic metrics serve as effective proxies for catchment geomorphological organization. The derived topographically-enhanced catchment descriptors have potential to improve predictions of hydrological signatures.

How to cite: Tarasova, L., Ziani, C., and Ribbe, L.: The use of topographically-enhanced catchment descriptors to improve the predictions of streamflow events characteristics across German catchments, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13583, https://doi.org/10.5194/egusphere-egu26-13583, 2026.

11:07–11:09
|
PICOA.8
|
EGU26-7548
|
ECS
|
On-site presentation
Thiago V. M. do Nascimento, Julia Rudlang, Sebastian Gnann, Jan Seibert, Markus Hrachowitz, and Fabrizio Fenicia

Large-sample hydrology datasets have become increasingly popular in recent years, providing hydro-meteorological time series and catchment attributes for thousands of catchments worldwide. However, the role of such catchment attributes in informing model regionalization, and particularly the effect of their level of spatial detail on prediction in ungauged basins (PUB), remains poorly explored. This study addresses this gap by examining whether catchment attributes derived from geological maps of varying levels of detail improve model regionalization and, in turn, PUB, for both a bucket-type model and a data-driven Long Short-Term Memory (LSTM) model across 130 catchments in two independent basins: the Moselle (27 100 km²) and the Garonne (13 730 km²). We conducted five modeling experiments: a benchmark without geological information and four geology-informed configurations with increasing levels of detail (random, global, continental, and regional). A fold-based space–time cross-evaluation strategy was used to assess model performance on both time periods and catchments unseen during calibration. Performance was evaluated using a modified Nash–Sutcliffe Efficiency (NSE) and a set of streamflow signatures describing flow variability, storage, and regime behavior. Across both basins and model types, benchmark experiments yielded the lowest space–time performance, followed by the random experiment and the global geology experiment, while the experiments using continental and regional geology consistently resulted in higher NSE values. Improvements were strongest for the bucket-type model, with the most detailed geological attributes leading to consistent gains in median performance and robustness. Differences among experiments were more pronounced for streamflow signatures. 
For the bucket-type model, only the experiments adopting the continental or regional geology reproduced observed signatures with Spearman’s correlations exceeding 0.60, whereas the LSTM model already showed reasonable skill in the benchmark case but still benefited systematically from increasing geological detail. Together, these results demonstrate that incorporating detailed geological information can enhance streamflow representation and model transferability in PUB applications, and that the level of geological detail is a critical, yet often overlooked, factor in large-sample hydrology and regionalization studies.
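For reference, the standard Nash–Sutcliffe Efficiency can be computed as in the short Python sketch below. The abstract uses a modified NSE whose definition is not given here, so this block shows only the standard form and should not be read as the authors' metric.

```python
def nse(observed, simulated):
    """Standard Nash-Sutcliffe Efficiency (not the modified variant).

    1.0 is a perfect fit, and 0.0 means the simulation is no better than
    always predicting the mean of the observations.
    """
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_term = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_term


obs = [1.0, 2.0, 3.0, 4.0]
nse_perfect = nse(obs, obs)
nse_mean_model = nse(obs, [2.5, 2.5, 2.5, 2.5])
```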

How to cite: M. do Nascimento, T. V., Rudlang, J., Gnann, S., Seibert, J., Hrachowitz, M., and Fenicia, F.: How do geological map details influence hydrological model transferability to ungauged basins in large-sample studies?, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7548, https://doi.org/10.5194/egusphere-egu26-7548, 2026.

11:09–11:11
|
PICOA.9
|
EGU26-16675
|
ECS
|
On-site presentation
Iiro Seppä, Daniel Klotz, Carlos Gonzales Inca, and Petteri Alho

Despite deep learning’s recent performance dominance in rainfall-runoff modeling, we still do not know which input variables truly matter for it in the boreal zone. Only a few studies have attempted to determine which variables have the largest influence on prediction quality, and none have done so in the boreal zone. Most feature importance studies have also failed to account for the strong relationships between covariates in hydrometeorological datasets and the detrimental effects these have on many popular feature importance methods. The aim of this study is to address this research gap and increase knowledge of the dominant drivers of rainfall-runoff processes in the boreal zone. More specifically, we sought to create a ranking of feature importances for a large selection of catchments and to identify and explain regional differences in these importances.

As a baseline, an ensemble of long short-term memory networks was trained to predict daily runoff for 101 Finnish catchments using 13 dynamic meteorological variables and 36 static attributes from the CAMELS-FI (Catchment Attributes and MEteorology for Large-sample Studies, FInland) dataset. To robustly determine feature importance, three different methods were employed, each involving leaving variables out, retraining the model, and evaluating the change in performance across several performance metrics. The first method was leave-one-covariate-out (LOCO), the second was leave-one-covariate-in (LOCI), and the third excluded the variable of interest as well as all variables correlated with it (leave-one-group-out, LOGO). LOCI was implemented separately for static and dynamic features, such that models testing a static feature received all dynamic inputs, and vice versa.
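
To make the three retraining setups concrete, the following Python sketch builds the input feature list that each retrained model would receive. It is only a schematic reading of the description above: the function names, the feature names, and the hand-written correlation groups are hypothetical, and how correlated variables were actually identified is not stated in the abstract.

```python
def loco_sets(features):
    """Leave-one-covariate-out: every feature except the one under test."""
    return {f: [g for g in features if g != f] for f in features}


def loci_sets(static, dynamic):
    """Leave-one-covariate-in, applied separately to each feature type.

    A static feature is tested alone among the static inputs but together
    with all dynamic inputs, and vice versa.
    """
    sets = {f: [f] + list(dynamic) for f in static}
    sets.update({f: list(static) + [f] for f in dynamic})
    return sets


def logo_sets(features, correlated):
    """Leave-one-group-out: drop the feature and everything correlated with it.

    `correlated` maps a feature to the set of features treated as correlated
    with it (assumed to be supplied by the user in this sketch).
    """
    return {
        f: [g for g in features if g != f and g not in correlated.get(f, set())]
        for f in features
    }


# Hypothetical feature names, for illustration only.
static = ["lake_pct", "area"]
dynamic = ["precip", "temp", "snow_depth"]
correlated = {"temp": {"snow_depth"}, "snow_depth": {"temp"}}

loco = loco_sets(static + dynamic)
loci = loci_sets(static, dynamic)
logo = logo_sets(static + dynamic, correlated)
```

In each case the model would be retrained on the listed inputs and its score compared against the baseline; the retraining step itself is omitted here.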

The results demonstrate significant variations in feature importance both between the different setups and among catchments. The baseline model performed excellently (mean KGE 0.85). LOCI revealed that snow-related information is more important than precipitation everywhere except the southwest coast of Finland, for multiple metrics related to mean and high flow conditions. This is much further south than previous research has suggested. However, precipitation was the only feature whose removal caused a substantial decline in performance in the LOCO setting (mean KGE 0.74), indicating that it provides information that is both important and unique, and that other features are (almost) fully reconstructible from collinear features. Removal of all static attributes reduced the predictive power of the model substantially (mean KGE 0.67). The decline in performance was not spatially uniform. It was greatest in catchments that deviate most from “average” catchment properties, particularly those with large lake area. The importance of lakes is further supported by the fact that the performance can be mostly restored by reintroducing lake area percentage to the inputs (mean KGE 0.79).

This study highlights three key considerations for feature importance analysis in data-driven hydrological modeling.

First, focusing solely on global feature importance overlooks regional differences and variables that are important to specific catchments.

Second, hydrologists should account for the correlation structure of hydrological datasets, both when selecting a feature importance method and when interpreting the results.

Third, we argue that the methods examined here measure different aspects of feature importance, and none alone would be sufficient to provide a complete understanding.

How to cite: Seppä, I., Klotz, D., Gonzales Inca, C., and Alho, P.: Feature importance for deep learning rainfall-runoff modeling in the boreal zone, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16675, https://doi.org/10.5194/egusphere-egu26-16675, 2026.

Hydrological modeling across scales
11:11–11:13
|
PICOA.10
|
EGU26-21812
|
ECS
|
On-site presentation
Paul Royer-Gaspard, Olivier Robelin, Mathilde Puche, and Magali Troin

Predicting flows in ungauged basins is a key challenge for integrated water resource management and hydrological risk prevention. To fill these gaps, regionalization approaches have generally relied on conceptual or physically based hydrological models whose parameters are calibrated on gauged rivers and then transferred to ungauged rivers. However, these methods often have significant limitations in terms of robustness and accuracy, particularly when applied in heterogeneous hydrological contexts [1].

With the growing adoption of machine and deep learning by the hydrological community, new opportunities are emerging for operational hydrology. In particular, recurrent neural networks such as Long Short-Term Memory networks (LSTM) have proven to be effective in exploiting large databases of observed flows and climate forcings compared to traditional locally or regionally calibrated approaches [2]. Nevertheless, LSTM is still rarely used in France for prediction outside of a few research projects (e.g. [3,4]).

The objective of this study is to compare an LSTM model with the GR5J model [5,6] in a regionalization exercise with a special focus on flood prediction. The evaluation is carried out on the French catchments of the Explore 2 project, which gathers more than 600 catchments [7]. The GR5J model, which stands for Génie Rural Journalier à 5 Paramètres (5-parameter daily rural engineering model), is a widely used reference model in catchment hydrology modeling due to its simplicity and flexibility. GR5J parameters are regionalized with different algorithms, including traditional spatial proximity and catchment similarity methods as well as a machine learning method based on random forest regression [8]. A direct assessment of flood statistics is also performed with random forest regression as a benchmark for flood prediction.
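
Spatial proximity and catchment similarity methods share a common first step: ranking gauged donor catchments by their distance to the ungauged target, measured either in geographic coordinates or in attribute space. The Python sketch below shows that donor selection step only. The function name, the Euclidean distance, the number of donors, and the toy coordinates are assumptions; the regionalization algorithms actually used in the study may differ.

```python
import math


def nearest_donors(target, gauged, n_donors=3):
    """Return the names of the n_donors gauged catchments closest to target.

    `target` is a coordinate or attribute vector; `gauged` maps catchment
    names to vectors of the same length. Euclidean distance is an assumption.
    """
    ranked = sorted(gauged, key=lambda name: math.dist(target, gauged[name]))
    return ranked[:n_donors]


# Toy example with made-up planar coordinates.
gauged = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (1.0, 1.0), "D": (5.0, 5.0)}
donors = nearest_donors((0.0, 0.5), gauged, n_donors=2)
```

The parameter sets of the selected donors would then be transferred to the ungauged catchment, for example by running the model once per donor and averaging the simulated flows.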

The models are evaluated according to global criteria as well as hydrological signatures representative of flood hazards. The hydrological characteristics of the catchments are analyzed to identify favorable and unfavorable conditions for regionalization.

This study discusses the prospects offered by deep learning for hydrological regionalization and its future integration into operational applications such as hydrological projection.

 

[1] Guo et al. (2020). https://doi.org/10.1002/wat2.1487

[2] Kratzert et al. (2019). https://doi.org/10.5194/hess-23-5089-2019

[3] Hashemi et al. (2022). https://doi.org/10.5194/hess-26-5793-2022

[4] Puche et al. (2026, in review). http://dx.doi.org/10.2139/ssrn.5286855

[5] Perrin et al. (2003). https://doi.org/10.1016/S0022-1694(03)00225-7

[6] Le Moine (2008). https://hal.inrae.fr/tel-02591478v1

[7] Sauquet et al. (2025). https://doi.org/10.5194/egusphere-2025-1788

[8] Saadi et al. (2019). https://doi.org/10.3390/w11081540

How to cite: Royer-Gaspard, P., Robelin, O., Puche, M., and Troin, M.: Comparison of deep learning and conceptual models for the prediction of flood statistics in ungauged catchments in France, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21812, https://doi.org/10.5194/egusphere-egu26-21812, 2026.

11:13–11:15
|
PICOA.11
|
EGU26-16673
|
On-site presentation
Benedikt Heudorfer and Ralf Loritz

With increasingly large samples being used in deep learning hydrology, the computational cost of training models such as Long Short-Term Memory (LSTM) networks raises fundamental questions about how much (and which type of) data are actually needed. This study investigates the information content of different training data subsets and their impact on predictive skill. To do so, we systematically train LSTM models on progressively larger subsamples of the CAMELS-US dataset, using 11 different ablation/subsampling strategies that emphasize different parts of the training data, associated with different hydrological regimes, statistical representativity, temporal context, and spatial coverage. We then evaluate LSTM performance gains as a function of subsample size.
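
A minimal Python sketch of the random-sampling strategy is given below: it draws progressively larger random subsets of training samples so that performance can be plotted against subsample size. The function name, the fractions, and the choice to make the subsets nested (each larger subset contains the smaller ones) are assumptions for illustration, not details taken from the study.

```python
import random


def nested_random_subsamples(sample_ids, fractions, seed=0):
    """Return {fraction: subset} with each larger subset containing the smaller.

    Nesting is an assumption of this sketch; it isolates the effect of data
    volume from the effect of drawing a different random sample each time.
    """
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return {
        f: shuffled[: max(1, round(f * len(shuffled)))] for f in sorted(fractions)
    }


# Toy example: 100 hypothetical training samples (e.g. basin-window pairs).
subsets = nested_random_subsamples(range(100), [0.1, 0.5, 1.0])
```

A model would be trained on each subset and evaluated on a fixed test set; the other strategies differ only in how the samples are ordered before truncation.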

As training data volume increases, performance gains saturate more or less rapidly depending on the specific strategy tested. Random sampling emerges as the most robust and efficient strategy, achieving strong predictive skill (NSE > 0.7) with roughly 10% of the available data, which illustrates that random subsamples are highly representative of the full dataset. Temporal ablations reveal that surprisingly short input sequences (≈ 2 weeks) and limited historical records (≈ 2 years) suffice for competitive performance (NSE > 0.7), suggesting that time series much shorter than previously assumed useful are worth including in datasets like CAMELS. In contrast, although high-flow conditions have been shown in the literature to be particularly information-rich, training exclusively on extremes underperforms the above-mentioned ablation strategies in our setup. Likewise, we show that spatial subsampling substantially limits generalization performance, underscoring the importance of spatial hydro-climatic diversity.

Overall, the results demonstrate that training efficiency in data-driven hydrology is governed more by data representativity than by targeted selection of e.g. specific event types. These findings provide practical guidance for cost-effective model development, pre-training, and experimental design in large-sample hydrologic deep learning.

How to cite: Heudorfer, B. and Loritz, R.: Is smart sampling helping to train more efficent deep learning model in Hydrology?, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16673, https://doi.org/10.5194/egusphere-egu26-16673, 2026.

11:15–11:17
|
PICOA.12
|
EGU26-932
|
ECS
|
On-site presentation
Sacha Ruzzante, Wouter Knoben, Thorsten Wagener, Tom Gleeson, and Markus Schnorbus

Variability in river flow can be understood as the sum of irregular, seasonal and interannual variance components. Skillful simulations of irregular events are needed to accurately predict short-duration events such as floods, while skillful simulation of interannual variance is required to accurately predict long-term change and long-duration droughts. However, popular performance metrics such as the Nash-Sutcliffe Efficiency (NSE) and Kling-Gupta Efficiency (KGE) do not distinguish these three variance components. We analyse streamflow simulations from 18 process-based, machine learning, and hybrid hydrologic models from around the globe (22,089 simulated time series in total) to investigate how well large-sample hydrologic models represent each variance component. We find that in highly seasonal (tropical, alpine, and polar) catchments these models achieve very high NSE and KGE values but produce worse-than-average simulations of interannual and irregular variance. Year-to-year variability in streamflow extremes and monthly mean flows is consistently more poorly simulated in highly seasonal catchments than in less-seasonal catchments. This suggests that these hydrologic models have limited skill in predicting long-term responses to climate change in alpine, polar, and tropical regions, which are some of the most vulnerable regimes regarding climate change. There is a need to rethink the value of efficiency scores such as NSE and KGE in large-domain model evaluation, and to complement such approaches with more detailed and more process-based investigations of model performance.
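
One simple way to picture the three variance components is an additive decomposition of a regularly sampled flow series into a mean seasonal cycle, year-to-year anomalies of the annual mean, and a remainder. The Python sketch below implements that decomposition for complete years. It is an assumption-laden illustration: the function name, the additive form, and the toy data are hypothetical, and the authors' actual decomposition may differ.

```python
from statistics import fmean, pvariance


def decompose(flow, period):
    """Additive split into seasonal, interannual and irregular components.

    Assumes `flow` holds complete years with `period` values per year.
    """
    n_years = len(flow) // period
    flow = list(flow[: n_years * period])
    overall = fmean(flow)
    # Seasonal: mean of each within-year position, relative to the overall mean.
    clim = [fmean(flow[p::period]) - overall for p in range(period)]
    # Interannual: anomaly of each annual mean relative to the overall mean.
    annual = [
        fmean(flow[y * period:(y + 1) * period]) - overall for y in range(n_years)
    ]
    seasonal = [clim[t % period] for t in range(len(flow))]
    interannual = [annual[t // period] for t in range(len(flow))]
    # Irregular: whatever the two components above do not explain.
    irregular = [
        flow[t] - overall - seasonal[t] - interannual[t] for t in range(len(flow))
    ]
    return overall, seasonal, interannual, irregular


# Toy series: 3 years with 4 values per year (made-up numbers).
flow = [1.0, 4.0, 2.0, 1.0, 2.0, 6.0, 3.0, 1.0, 1.0, 3.0, 2.0, 2.0]
overall, seasonal, interannual, irregular = decompose(flow, period=4)
total_var = pvariance(flow)
parts_var = pvariance(seasonal) + pvariance(interannual) + pvariance(irregular)
```

For complete years the three components are mutually uncorrelated, so their variances sum to the total variance. A score such as NSE can then be computed on each component separately to see where a simulation gains or loses skill.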

How to cite: Ruzzante, S., Knoben, W., Wagener, T., Gleeson, T., and Schnorbus, M.: Large-sample hydrologic models poorly simulate interannual variability in seasonal catchments, despite high Nash-Sutcliffe and Kling-Gupta Efficiencies, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-932, https://doi.org/10.5194/egusphere-egu26-932, 2026.

11:17–11:19
|
PICOA.13
|
EGU26-12908
|
On-site presentation
Jan Seibert, Marc Vis, and Sandra Pool

Large-sample datasets have become available for many regions worldwide, and their availability has changed hydrological catchment modelling. Assessing model performance is an essential component of most large-sample applications, and an important question is how to interpret the values of performance measures. We have previously shown that the performance of an uncalibrated bucket-type model varies significantly across regions. In humid or snow-dominated catchments, an uncalibrated model can reach NSE values of 0.8 or higher, which are often considered good. This implies that using a fixed value of a performance measure to judge model performance, as sometimes suggested in the literature, is inappropriate. Instead, one should consider that, given the local hydroclimatic conditions and the available data quality, the performance we should expect from any model in a particular catchment can vary widely. At the same time, a perfect fit (value of 1) is usually impossible to achieve due to model and data errors and uncertainties. Therefore, it is helpful to compare model performances to lower and upper benchmarks.

The purpose of this study was twofold. First, we examined how to compute lower performance bounds from randomly chosen parameter sets, including guidance for appropriate ensemble sizes, the effects of parameter ranges, and the selection of parameter sets. We also examined the relationships between lower and upper benchmarks and catchment characteristics. Second, we utilised these findings to compute both lower and upper benchmarks for many of the existing CAMELS datasets. By providing these values to the modelling community, we aim to facilitate the broader use of lower and upper benchmarks in large-sample hydrological modelling studies. We argue that these values are valuable to the hydrological modelling community, as they provide a basis for benchmarking model performance across the various CAMELS datasets. This will allow assessment of model performance, considering what one could and should expect for a particular catchment. Such assessments are important, for instance, when one seeks to evaluate the adequacy of model structures or compare approaches for the prediction in ungauged basins.
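
The idea of a lower benchmark from random parameter sets can be sketched in a few lines of Python. The example below is deliberately simplified and rests on several assumptions: a one-parameter linear reservoir stands in for the bucket-type model, the parameter range and ensemble size are arbitrary, the median NSE is used as the summary, and the "observations" are synthetic. None of these choices is taken from the study.

```python
import random
from statistics import fmean, median


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency of sim against obs."""
    mean_obs = fmean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def linear_reservoir(precip, k, storage=0.0):
    """Toy stand-in model: outflow is a fixed fraction k of storage."""
    flows = []
    for p in precip:
        storage += p
        q = k * storage
        storage -= q
        flows.append(q)
    return flows


def lower_benchmark(precip, obs, k_range=(0.01, 0.99), n_sets=200, seed=0):
    """Median NSE over randomly drawn parameter sets (summary is an assumption)."""
    rng = random.Random(seed)
    scores = [
        nse(obs, linear_reservoir(precip, rng.uniform(*k_range)))
        for _ in range(n_sets)
    ]
    return median(scores), scores


# Synthetic example: "observed" flow generated by the same toy model.
precip = [0.0, 5.0, 0.0, 0.0, 12.0, 3.0, 0.0, 0.0, 0.0, 8.0, 1.0, 0.0]
obs = linear_reservoir(precip, k=0.3)
lower, scores = lower_benchmark(precip, obs)
```

A calibrated model's score can then be judged relative to `lower` rather than against a fixed threshold; an upper benchmark would be obtained separately and is not covered by this sketch.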

How to cite: Seibert, J., Vis, M., and Pool, S.: Setting the Bar: Benchmarks for Model Performances in Large-Sample Hydrology, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12908, https://doi.org/10.5194/egusphere-egu26-12908, 2026.

11:19–11:21
|
PICOA.14
|
EGU26-18887
|
On-site presentation
Ralf Merz and Zhenyu Wang

Conceptual bucket-type models have been a mainstay in hydrology for decades due to their simplicity and flexibility. With the emergence of large-sample datasets, regional application of these models has become increasingly feasible and relevant. However, challenges remain: how can we ensure consistent modeling across catchment boundaries, and do typical model setups capture the dominant processes shaping regional water cycles?

To address these challenges, we first leverage large datasets, including CAMELS-DE and thousands of groundwater level time series across Germany, to build and validate a conceptual, fully distributed hydrological model at the national scale. Using the SALTO model as an example, we demonstrate how the Parameter Set Shuffling (PASS) approach enables regional calibration while accounting for spatial variability.

We discuss strategies to incorporate anthropogenic impacts into regional water cycle modeling, including reservoirs, dams, drinking water abstraction, and wastewater return flows. By integrating these human influences, our approach provides a more realistic representation of Germany’s hydrology.

Additionally, we introduce an event-based model diagnostic framework that identifies which hydrological conditions are reliably represented by the model structure and highlights the potential of large-sample data to improve regional hydrological modeling.

How to cite: Merz, R. and Wang, Z.: Scaling Up Buckets – Using large sample data to build a regional hydrological model, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18887, https://doi.org/10.5194/egusphere-egu26-18887, 2026.

11:21–11:23
|
PICOA.15
|
EGU26-7000
|
ECS
|
On-site presentation
Rosanna Lane, Helen Baron, Elizabeth Cooper, and Emma Robinson

Model intercomparison projects (MIPs) have many benefits, including improving understanding of model capabilities, furthering advancements in model development, providing a benchmark of model performance, helping to quantify modelling uncertainties and fostering collaboration. Here, we introduce the UK Hydro-MIP, a community-led hydrological and land surface model intercomparison for streamflow simulation across Great Britain. This MIP encouraged members of the community to submit modelled daily river flows, following an agreed model protocol to ensure consistency in driving data and output formats. A diverse range of model types was represented, including land surface models, physically based to conceptual hydrological models, and machine learning models. The resultant large-sample dataset, including modelled river flows and evaluation metrics from over 16 models for over 628 catchments, will be released later this year.

Initial analysis of the dataset was carried out during a hackathon event, where all contributors and stakeholders were invited to an in-person meeting to discuss priorities and analyse results together. Here, we present initial results from the UK Hydro-MIP and the hackathon, highlighting the relative strengths of different modelling approaches and common modelling challenges across Great Britain.

How to cite: Lane, R., Baron, H., Cooper, E., and Robinson, E.: The UK Hydro-MIP: Evaluating modelled river flows from a diverse set of models over a large sample of British catchments, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7000, https://doi.org/10.5194/egusphere-egu26-7000, 2026.

11:23–12:30