HS2.3.10 | Advances in Hybrid Modeling for Hydrologic and Water Quality Forecasting: Integrating Machine Learning with Process-Based Approaches
Convener: Xilin Xia | Co-conveners: Elias Getahun, Zhenxing Zhang, David Hannah, Wuhuan Zhang
Fri, 08 May, 16:15–18:00 (CEST) | Room 2.17
Posters on site | Attendance Fri, 08 May, 14:00–15:45 (CEST) | Display Fri, 08 May, 14:00–18:00 | Hall A
This session explores the forefront of hybrid modeling that integrates process-based hydrologic and water quality models with AI and machine learning (ML) techniques to improve the prediction and management of water resources under climate change stresses. Hybrid modeling combines the physical realism of process-based models with the adaptive, data-driven capabilities of ML, including frontier AI such as foundation models and Large Language Models (LLMs), to overcome limitations such as data scarcity, structural deficiencies in process-based models, and the difficulty of simulating complex, non-linear hydrological processes.
Contributions are sought to advance the conceptual and practical understanding of hybrid models applied to hydrologic and water quality simulation, especially those focusing on:
• Improving streamflow and pollutant transport predictions in diverse hydro-climatic and data-scarce regions
• Hybrid approaches for simulating nonpoint source pollution and watershed-scale water quality dynamics
• Regional and catchment-scale applications demonstrating scalability, transferability, and robustness of hybrid frameworks
• Real-time forecasting and operational water management enabled by hybrid modeling
• Policy-relevant applications linking model outputs to climate adaptation, water allocation, and resilience strategies
• Methodological challenges and solutions regarding model interpretability, uncertainty quantification, computational efficiency, and equitable technology access
This session will foster interdisciplinary dialogue on designing, implementing, and applying hybrid modeling approaches that enhance hydrologic prediction and water quality assessment to support sustainable water resource management and climate resilience.

Posters on site: Fri, 8 May, 14:00–15:45 | Hall A

The posters scheduled for on-site presentation are only visible in the poster hall in Vienna. If authors uploaded their presentation files, these files are linked from the abstracts below.
Display time: Fri, 8 May, 14:00–18:00
A.14 | EGU26-16503 | ECS
Seungmin Lee, Hoyong Lee, Seonuk Baek, Imee V. Necesito, and Soojun Kim

Harmful Algal Blooms (HABs) pose a significant threat to freshwater ecosystems and public health globally, necessitating reliable early warning systems for effective water resource management. This study presents an end-to-end AI framework designed to predict 4-level ordinal algal alerts in South Korea by systematically integrating heterogeneous spatio-temporal environmental datasets, including GIS-based spatial features, water quality, meteorological, and hydrological data. Our methodological approach involves: (1) extracting spatial features via GIS; (2) optimizing time-lags and interpolating time-series data based on Spearman correlation; and (3) performing ordinal classification using a LightGBM (LGBM) model. To address the ordinal nature of algal alerts, the model was optimized using Optuna with the Quadratic Weighted Kappa (QWK) metric. A rigorous Dual Cross-Validation (CV) framework was employed to assess generalization capabilities: Year-over-Year (YoY) CV with an Embargo technique was used to evaluate temporal performance while preventing data leakage, and Leave-One-Station-Out (LOSO) CV was applied to validate spatial generalization for unobserved locations. Additionally, Isotonic Regression was implemented for probability calibration to enhance the reliability of the predicted outputs. By effectively controlling spatio-temporal information leakage, this study demonstrates superior predictive performance across unobserved timeframes and locations, providing a robust decision-support tool for practical water quality management.
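The Quadratic Weighted Kappa used above as the tuning metric can be sketched in a few lines. This is a minimal illustration of the standard definition, not the authors' code; the function name, the integer label encoding 0 to 3, and the example values are assumptions made here.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_levels=4):
    """Cohen's kappa with quadratic weights for ordinal labels 0..n_levels-1.

    Assumption: alert levels are encoded as integers 0..n_levels-1.
    """
    n = len(y_true)
    # observed[i][j]: number of samples with true level i predicted as level j
    observed = [[0] * n_levels for _ in range(n_levels)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(n_levels))
                 for j in range(n_levels)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_levels):
        for j in range(n_levels):
            weight = (i - j) ** 2 / (n_levels - 1) ** 2
            num += weight * observed[i][j]
            den += weight * true_hist[i] * pred_hist[j] / n
    # den is zero only when every label falls in one single level
    return 1.0 if den == 0 else 1.0 - num / den


# Hypothetical alert levels: one prediction is off by a single level
score = quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 2])
```

Because the weight grows with the squared distance between levels, a prediction that misses by three levels is penalized nine times more than one that misses by a single level, which is why the metric suits ordinal alerts.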

How to cite: Lee, S., Lee, H., Baek, S., Necesito, I. V., and Kim, S.: Robust Ordinal Algal Alert Prediction Framework Integrating Heterogeneous Spatio-temporal Data and Dual Cross-Validation, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16503, https://doi.org/10.5194/egusphere-egu26-16503, 2026.

A.15 | EGU26-16874
Elias Getahun and Raghav Kharosekar

Accurate prediction of nitrate‑nitrogen (NO₃‑N) and total phosphorus (TP) loads is critical for managing water quality in agricultural watersheds, where excess nutrient runoff can contribute to downstream eutrophication. This study applies Random Forest (RF) regression and conditional inference RF to predict monthly NO₃‑N and TP loads at eight monitoring gages within Conservation Reserve Enhancement Program (CREP) watersheds in the Illinois River and Kaskaskia River basins. The machine learning (ML) models were trained using hydroclimatic, land‑use, and nutrient datasets from 2000–2022 and validated with 2023 observations.
The predictor variables included discharge, precipitation, temperature, land use, total Kjeldahl nitrogen (TKN), suspended sediment, and septic system density in the watersheds. Multiple modeling strategies were evaluated, including full‑feature, reduced‑feature (i.e., derived through importance thresholds or removal of collinear nutrient variables), and hyperparameter‑tuned configurations. Model performance was assessed using Nash–Sutcliffe Efficiency (NSE), R², and RMSE, and interpretability was evaluated through feature‑importance metrics and SHAP analyses.
Monthly Random Forest models effectively captured seasonal nutrient dynamics. Discharge consistently emerged as the dominant predictor of NO₃‑N loads, while interactions among variables, particularly TKN and suspended sediment, played major roles in predicting TP. Land use and septic system density exhibited limited predictive influence. Model performance was strong across configurations, with training NSE values exceeding 0.95 and validation NSE frequently above 0.9. However, reduced skill during summer and fall suggested the influence of unrepresented processes such as evapotranspiration. The most stable performance across sites and seasons was achieved with hyperparameter‑tuned, full‑feature models. SHAP analyses revealed clear linkages between hydrologic and biogeochemical processes, while Spearman correlation heatmaps highlighted strong covariation among nutrient loads and moderate coupling with climatic variables.
These results demonstrate the value of machine‑learning approaches such as RF as complementary alternatives to process‑based models like SWAT, offering robust tools for informing nutrient‑reduction strategies and supporting policy decisions in impaired agricultural watersheds.
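For readers unfamiliar with the skill scores reported above, the following is a minimal sketch of Nash–Sutcliffe Efficiency and RMSE using their standard definitions. The function names and the sample loads are illustrative assumptions, not values from the study.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    obs_variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / obs_variance


def rmse(obs, sim):
    """Root-mean-square error, in the same units as the observations."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


# Hypothetical monthly loads (arbitrary units)
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
print(nse(observed, simulated), rmse(observed, simulated))
```

An NSE of 0 means the model is no better than predicting the observed mean everywhere, so validation values above 0.9 indicate that most of the observed variance is reproduced.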

How to cite: Getahun, E. and Kharosekar, R.: Data‑Driven Modeling of Nutrient Dynamics: Random Forest Predictions of Nitrate and Total Phosphorus Loads in Illinois, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16874, https://doi.org/10.5194/egusphere-egu26-16874, 2026.

A.16 | EGU26-16441
Weihao Wang, Fei Dong, Xiaobo Liu, and Wenqi Peng

Climate change is increasing hydro-climatic variability and amplifying water quality risks, posing major challenges for operating large-scale inter-basin water transfer projects. Operators must make real-time decisions while accounting for multiple source waters that exhibit distinct and nonstationary quality characteristics, alongside persistent limitations of process-based models and data scarcity in upstream boundary conditions. To address these challenges, we propose a three-tier hybrid modeling framework that integrates machine learning (ML), process-based hydrodynamic–water quality simulation, and multi-objective optimization to enable coordinated regulation of water quantity and quality in the extended Eastern Route of the South-to-North Water Diversion Project (SNWD).

The framework is driven by continuous observations from monitoring stations distributed along the project route and is implemented as a three-level modeling cascade. Level 1 develops ML-based upstream boundary prediction models using Long Short-Term Memory (LSTM) networks to produce 7-day-ahead forecasts of key water quality variables for heterogeneous source waters (Yellow River water, diversion water, and local water). Forecast targets include CODMn (permanganate index), NH₃–N, total nitrogen (TN), total phosphorus (TP), and dissolved oxygen (DO), while pH is treated as a compliance constraint. This anticipatory component mitigates data scarcity and captures nonlinear inflow dynamics, providing actionable boundary conditions for downstream assessment. Level 2 constructs a mechanism–data fusion module that couples process-based hydrodynamic and water quality models with ML-based corrections informed by real-time monitoring. By assimilating monitoring observations together with future engineering operation plans and diversion demand assessments, the module simulates transport, mixing, and water quality evolution along the transfer route. Level 3 applies multi-objective optimization to generate rolling diversion schedules that balance water supply reliability against pollution risk under climate-stress scenarios. The optimizer outputs updated, implementable schedules as new data and near-term plans become available, supporting operational water management.

A spatio-temporal decoupling strategy is further introduced to separate source-specific variability from in-route transport processes, enabling interpretable attribution of observed water quality changes to different sources and facilitating targeted regulation across critical segments. Operational deployment demonstrates enhanced decision support: the 7-day predictive lead time enables proactive coordination of multi-source diversions, and the optimized rolling regulation reduces concentrations of the regulated indicators (CODMn, NH₃–N, TN, and TP) by approximately 9% while improving Water Quality Index (WQI) scores by about 11% at the critical control section DiSanDian. The proposed hybrid framework provides a scalable and transferable pathway for integrating AI with process-based understanding to improve water quality simulation and real-time management, contributing to climate adaptation and resilience strategies for complex water infrastructure systems.
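The abstract does not state which optimizer Level 3 uses. As a hedged illustration of the trade-off it describes, the sketch below filters candidate diversion schedules down to the non-dominated (Pareto) set when two objectives are minimized. The function name, the tuple layout `(supply_shortfall, pollution_risk)`, and the candidate values are all hypothetical.

```python
def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    Each candidate is a tuple (supply_shortfall, pollution_risk); both
    objectives are minimized. This layout is an assumption for illustration.
    """
    front = []
    for a in candidates:
        dominated = any(
            b[0] <= a[0] and b[1] <= a[1] and (b[0] < a[0] or b[1] < a[1])
            for b in candidates
        )
        if not dominated:
            front.append(a)
    return front


# Hypothetical schedules scored on the two objectives
schedules = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
print(pareto_front(schedules))
```

Here the schedule `(3.0, 3.0)` is dropped because `(2.0, 2.0)` is better on both objectives; the remaining three represent genuine trade-offs between supply reliability and pollution risk.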

How to cite: Wang, W., Dong, F., Liu, X., and Peng, W.: From Forecasting to Rolling Optimization: Real-Time Hybrid Modeling for Water Quantity–Quality Regulation in the SNWD Extended Eastern Route, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16441, https://doi.org/10.5194/egusphere-egu26-16441, 2026.

A.17 | EGU26-4666 | ECS
Xuan Zhang, Chenlu Cui, and Gaoxu Wang

Flood forecasting in small mountainous catchments is challenging due to strong nonlinearity in runoff generation and short hydrological response times, which are often inadequately represented by conceptual models. The Xin’anjiang (XAJ) model, although widely applied, relies on simplified process representations that limit its ability to capture complex flood dynamics. In contrast, data-driven approaches such as Long Short-Term Memory (LSTM) networks offer high predictive flexibility but suffer from limited physical interpretability. To bridge this gap, we propose an interpretable physics–data hybrid framework (XAJ–LSTM), in which an LSTM network dynamically corrects residuals from the XAJ model while explicitly incorporating physically meaningful state variables. Model interpretability is enhanced using SHapley Additive exPlanations (SHAP), which quantify the contribution of different inputs to flood predictions. The framework is evaluated using 15 flood events observed between 2015 and 2018 in the Qiaodong Village catchment, a representative small mountainous basin in China. The results indicate that the XAJ–LSTM hybrid model significantly outperforms the standalone physical model, improving the Nash–Sutcliffe Efficiency (NSE) from 0.55 to 0.77 and effectively correcting peak flow errors. Moreover, the integration of physical state variables, particularly soil moisture, is crucial for improving predictive accuracy, whereas adding redundant runoff components introduces noise and degrades model performance. SHAP analysis further confirms that antecedent observed discharge and XAJ-simulated discharge are the dominant drivers of the LSTM-based correction. Overall, this hybrid framework improves flood forecasting accuracy while enhancing interpretability, offering a promising approach for physically informed modeling in nonlinear, data-limited catchments.
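The residual-correction idea above can be shown with a deliberately simplified stand-in. In this sketch an ordinary least-squares line replaces the LSTM, and a single state variable (labeled soil moisture) is the only predictor; the function names and all numbers are assumptions for illustration, not the XAJ–LSTM implementation.

```python
def fit_residual_correction(obs, sim, state):
    """Fit residual = a + b * state by least squares (a stand-in for the LSTM)."""
    residuals = [o - s for o, s in zip(obs, sim)]
    n = len(residuals)
    mean_x = sum(state) / n
    mean_r = sum(residuals) / n
    sxx = sum((x - mean_x) ** 2 for x in state)
    sxr = sum((x - mean_x) * (r - mean_r) for x, r in zip(state, residuals))
    b = sxr / sxx
    a = mean_r - b * mean_x
    return a, b


def apply_correction(sim, state, a, b):
    """Hybrid prediction: process-model output plus the learned residual."""
    return [s + a + b * x for s, x in zip(sim, state)]


# Hypothetical process-model discharge, soil-moisture state, and observations
sim_q = [1.0, 2.0, 4.0, 3.0]
soil_moisture = [0.1, 0.2, 0.4, 0.3]
obs_q = [s + 0.5 + 2.0 * x for s, x in zip(sim_q, soil_moisture)]

a, b = fit_residual_correction(obs_q, sim_q, soil_moisture)
hybrid_q = apply_correction(sim_q, soil_moisture, a, b)
```

The structure is the point: the process model stays in the loop and the learned component only predicts what the process model gets wrong, so the physical state variables keep their meaning as inputs.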

How to cite: Zhang, X., Cui, C., and Wang, G.: A Physics–Data Hybrid Xin’anjiang Flood Forecasting Model Based on LSTM Residual Correction and SHAP Interpretability, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-4666, https://doi.org/10.5194/egusphere-egu26-4666, 2026.

A.18 | EGU26-7108
Guang Han, Zhihua He, and He Sun

Traditional hydrological models have been widely applied to flood simulation across the globe, yet the accurate simulation of peak discharge remains a long-standing shortcoming of these models. Taking the upper reaches of the Dongjiang River and Beijiang River in South China as the study area, this study employed the Variable Infiltration Capacity (VIC) model to capture the peak discharge during flood events. The simulation period was divided into a calibration period (2011–2014), validation period 1 (2008–2010, before vegetation changes), and validation period 2 (2015–2020, after vegetation changes). The results demonstrated that the VIC model exhibited good applicability in both the upper Dongjiang River and upper Beijiang River basins. For the upper Beijiang River basin, the Nash-Sutcliffe Efficiency (NSE) and Kling-Gupta Efficiency (KGE) values were both above 0.6 during the calibration period and close to 0.6 in both validation periods. However, the model consistently underestimated the peak discharge in all periods. To address this limitation, a machine learning approach was introduced by coupling the VIC model with a Bidirectional Long Short-Term Memory (Bi-LSTM) network. Specifically, the soil moisture content and grid-scale runoff simulated by the VIC model, together with precipitation data, were used as training inputs for the Bi-LSTM model. The standalone VIC model and a pure Bi-LSTM model served as control groups for comparison. The results indicated that the coupled VIC-Bi-LSTM model outperformed the control groups in capturing both the runoff process and peak discharge in the two basins. During the calibration period, the NSE values of the coupled model reached 0.9, and they remained above 0.7 in both validation periods. In addition, scenarios before and after vegetation changes were designed to analyze the performance of the VIC model in simulating runoff under varying underlying surface conditions.
The results revealed that the VIC model could effectively capture the impacts of vegetation changes on runoff, with the NSE value in validation period 2 (post-vegetation change scenario) being close to that in validation period 1. Moreover, the coupling with Bi-LSTM enabled more precise simulation of runoff in the upper Dongjiang and Beijiang Rivers under the scenario of altered vegetation cover.
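The Kling-Gupta Efficiency cited above combines correlation, variability ratio, and bias ratio into one score. The sketch below follows the commonly used 2009 formulation; the function name and sample discharges are illustrative assumptions, and it assumes the observations have nonzero mean and variance.

```python
import math


def kge(obs, sim):
    """Kling-Gupta Efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    std_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / n)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / n
    r = cov / (std_obs * std_sim)   # linear correlation
    alpha = std_sim / std_obs       # variability ratio
    beta = mean_sim / mean_obs      # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical discharges: the simulation doubles every observed value
observed = [1.0, 2.0, 3.0, 4.0]
doubled = [2.0 * q for q in observed]
print(kge(observed, doubled))
```

In this example the timing is perfect (r = 1) but both the variability and the mean are doubled, so the score drops to 1 - sqrt(2). Unlike NSE, the three components can be inspected separately to see whether a low score comes from timing, spread, or bias.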

How to cite: Han, G., He, Z., and Sun, H.: Coupling Machine Learning with Physical Models to Improve Peak Flood Simulation under Vegetation and Rainstorm Variability, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7108, https://doi.org/10.5194/egusphere-egu26-7108, 2026.

A.19 | EGU26-12440 | ECS
Mario Soriano and Reed Maxwell

Transit time to baseflow refers to the amount of time between when a water parcel enters a catchment as precipitation and when it exits the system via discharge. It is a key concept that links climate variability, hydrological transport, and biogeochemical processes, with broad implications for both surface water and groundwater quality, resource sustainability, and vulnerability to climate change impacts. Transit time distributions can be inferred from spatially resolved time-series measurements of environmental tracer concentrations, but such observations are typically available only in a limited number of locations such as highly instrumented catchments. Across large regions, physically based numerical models have been shown to accurately describe transit time distributions when compared to tracer data, but these models often require extensive computational resources.

In this study, we examine machine learning approaches for efficient prediction of transit time, specifically investigating their spatial transferability across multiple large domains. We employ a continental scale physically based hydrologic model coupled with Lagrangian particle tracking to quantify transit time to baseflow metrics in four large river basins in the conterminous USA: Upper Colorado (290,000 sq km), Missouri (1,350,000 sq km), Upper Mississippi (490,000 sq km), and Ohio (420,000 sq km). We use results from the physically based model to train machine learning metamodels for predicting transit time metrics with multiple spatial aggregation units, with input predictors describing topography, climate, and geology. Functional input-output relationships learned by metamodels are assessed using model-agnostic explainability techniques and evaluated against theoretical physically based relationships. Spatial cross-validation frameworks are used to evaluate cross-domain predictive accuracy and characterize the influence of input data quantity and distribution similarity between training and target regions. Results from the analysis help elucidate the potential utility and limitations of machine learning metamodels for computationally efficient prediction of transit time metrics in data scarce regions.
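One simple form of the spatial cross-validation described above is leave-one-basin-out: train on three basins and test on the fourth. The abstract does not specify the exact scheme, so the sketch below is an assumption; the function name and the per-sample basin labels are hypothetical.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx) for each distinct group label."""
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical basin label for each sample (e.g. one per aggregation unit)
basins = ["Upper Colorado", "Missouri", "Missouri",
          "Upper Mississippi", "Ohio", "Ohio"]

for held_out, train_idx, test_idx in leave_one_group_out(basins):
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Splitting by basin rather than at random keeps neighbouring, highly similar units out of the training set for the held-out region, which gives a more honest estimate of transfer to a new domain.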

How to cite: Soriano, M. and Maxwell, R.: Large-domain transferability of machine learning metamodels for predicting water transit time to baseflow, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12440, https://doi.org/10.5194/egusphere-egu26-12440, 2026.

A.20 | EGU26-16160 | ECS
Minjeong Cho, Minhyuk Jeung, Daeun Yun, Jiye Park, Gihun Bang, and Sang-Soo Baek

Fecal coliform bacteria are widely used as an indicator of fecal contamination and associated human-health risk in water. This study simulated fecal coliform dynamics in the Bonghwang River using the Soil and Water Assessment Tool (SWAT). The SWAT bacteria subroutine, which considers in-stream bacteria die-off only, was modified to include solar radiation-associated die-off and concurrent growth and die-off within streambed sediments. To address the computational burden of SWAT, a surrogate model was developed using outputs from the modified SWAT model. The surrogate model enabled rapid watershed-scale prediction of fecal contamination by simplifying computations while preserving the predictive accuracy of SWAT. Sensitivity analysis demonstrated that solar radiation is one of the most significant fate factors of fecal coliform. The modified SWAT model improved watershed-scale estimates of bacterial concentrations, while the surrogate model enabled efficient prediction and analysis across the watershed. Overall, this approach provides predictive and reliable information on fecal contamination and can support effective watershed management.
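The abstract does not give the modified die-off equations. As a rough illustration only, a first-order decay with an added radiation-dependent rate is one common way to express solar-driven die-off. Everything in the sketch below (the functional form, the function name, the parameter names, and the numbers) is an assumption and not the SWAT subroutine or the authors' modification; sediment growth and die-off are not represented.

```python
import math


def bacteria_after_die_off(c0, days, k_base, k_solar, radiation):
    """First-order decay: C(t) = C0 * exp(-(k_base + k_solar * radiation) * t).

    k_base    : hypothetical baseline die-off rate (1/day)
    k_solar   : hypothetical die-off rate per unit radiation
    radiation : hypothetical mean solar radiation over the period
    """
    rate = k_base + k_solar * radiation
    return c0 * math.exp(-rate * days)


# Hypothetical comparison: same water, without and with solar radiation
shaded = bacteria_after_die_off(1000.0, 2.0, 0.5, 0.02, 0.0)
sunlit = bacteria_after_die_off(1000.0, 2.0, 0.5, 0.02, 20.0)
```

With these made-up coefficients the sunlit case decays faster because the radiation term adds to the baseline rate, which is the qualitative behaviour the sensitivity analysis above points to.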

How to cite: Cho, M., Jeung, M., Yun, D., Park, J., Bang, G., and Baek, S.-S.: Simulating fecal coliform dynamics at the watershed scale using a modified SWAT model and surrogate model, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16160, https://doi.org/10.5194/egusphere-egu26-16160, 2026.

A.21 | EGU26-16166 | ECS
Gihun Bang, Minjeong Cho, Jiye Park, Daeun Yun, Minhyuk Jeung, Soobin Kim, and Sang-Soo Baek

With the advancement of industrialization, the number and quantity of hazardous chemical substances being released into the environment continue to increase. Currently, over 40,000 chemical substances are available for use in South Korea, with approximately 400 new chemicals imported and distributed annually. Among these, hazardous substances such as heavy metals and pesticides, when introduced into water systems through industrial activities, agricultural runoff, or accidental spills, pose significant risks to both environmental and human health. These substances are associated with various diseases, carcinogenic risks, and endocrine disruption, necessitating proactive management strategies. Existing water quality monitoring systems primarily function as reactive measures, focusing on incident detection rather than prevention. Although real-time monitoring methods can detect anomalies, they are limited in accurately predicting the transport pathways and concentration variations of hazardous chemicals. Moreover, environmental factors such as flow velocity, precipitation, and temperature significantly impact the dispersion process, which current monitoring approaches fail to adequately incorporate. To overcome these limitations, this study aims to develop a predictive system leveraging deep learning techniques for water pollution incident simulation and forecasting. This model integrates existing hydrodynamic and water quality models (e.g., EFDC, MIKE) with data-driven approaches to enhance predictive accuracy. Additionally, an augmented reality (AR)-based visualization system will be implemented to intuitively display pollutant dispersion and high-risk areas during water pollution incidents. AR devices such as HoloLens will be utilized to provide decision-makers, including environmental management agencies and local governments, with real-time analytical capabilities for rapid response. 
Furthermore, the system is designed to transition from reactive to preventive response strategies. By applying advanced algorithms, the system will automatically recommend priority response areas for emergency discharges and pollution containment measures. This study aims to enhance response capabilities to increasing water pollution incidents both domestically and internationally. By minimizing environmental and health impacts caused by hazardous chemicals, the proposed system is expected to contribute significantly to public safety. Furthermore, the integration of deep learning and augmented reality technologies represents a substantial advancement in environmental monitoring and predictive modeling.

How to cite: Bang, G., Cho, M., Park, J., Yun, D., Jeung, M., Kim, S., and Baek, S.-S.: Development of Simulation System for Water Quality Accident using Deep Learning and Visualization Through Augmented Reality, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16166, https://doi.org/10.5194/egusphere-egu26-16166, 2026.
