Climate change, land-use change, and increasing socio-economic pressures are reshaping water and environmental systems, while the volume and heterogeneity of available data (from in situ observations to reanalysis products, remote sensing, and citizen-generated sources) continue to grow. Machine learning (ML) has become an important component of hydro-environmental modelling for forecasting, classification, and pattern discovery. In practice, however, many ML applications remain case-specific and depend on implicit expert decisions about problem formulation, predictor selection, validation design, and interpretation. These decisions are rarely documented explicitly, which makes them difficult to transfer across regions and users.
This contribution presents a human-in-the-loop hybrid intelligence framework that integrates ML workflows with Large Language Models (LLMs) to support structured reasoning during environmental model development and evaluation. Rather than using LLMs for automated optimisation or model selection, the framework positions them as a guidance and scaffolding layer that helps make modelling assumptions, choices, and limitations explicit and traceable, while retaining expert control over all final decisions.
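As an illustration of what "explicit and traceable" could look like in practice, the following minimal Python sketch records one modelling decision together with the expert who signed it off. It is a hypothetical structure, not the framework's actual implementation: the class name `DecisionRecord`, its fields, and the helper `to_jsonl` are all assumptions introduced here for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class DecisionRecord:
    """One logged modelling decision (hypothetical schema, for illustration only)."""

    stage: str            # modelling stage, e.g. "validation design"
    question: str         # question raised by the guidance layer
    options: List[str]    # alternatives that were considered
    chosen: str           # option selected by the expert
    rationale: str        # why this option was selected
    decided_by: str       # the human expert who made the final decision
    limitations: List[str] = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        # The chosen option must be one of the documented alternatives.
        if self.chosen not in self.options:
            raise ValueError("chosen option must appear in options")
        # Every decision must be attributed to a named expert.
        if not self.decided_by.strip():
            raise ValueError("decided_by must name the responsible expert")


def to_jsonl(records: List[DecisionRecord]) -> str:
    """Serialise records as JSON Lines so the log can be versioned and compared."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


example = DecisionRecord(
    stage="validation design",
    question="How should the time series be split for evaluation?",
    options=["random split", "blocked cross-validation"],
    chosen="blocked cross-validation",
    rationale="Observations are temporally autocorrelated.",
    decided_by="domain expert",
    limitations=["Short record length limits the number of blocks."],
)
log_text = to_jsonl([example])
```

In this sketch the LLM layer would only propose the `question` and `options`; the `chosen`, `rationale`, and `decided_by` fields are filled in by the expert, which mirrors the division of roles described above.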
Methodologically, the framework combines (i) hands-on ML pipelines, ranging from baseline statistical models to more advanced learning algorithms for forecasting and classification, and (ii) an LLM-based guidance layer that structures expert reasoning through prompts, checklists, and decision logs. This guidance supports key stages of the modelling process, including the definition of modelling objectives, assessment of data quality, selection of environmentally meaningful predictors, and the design of validation strategies. Particular emphasis is placed on encouraging validation schemes that account for temporal dependence and spatial heterogeneity, such as blocked or spatial cross-validation, rather than default random data splits.
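To make the contrast with random splits concrete, the sketch below shows one simple form of blocked cross-validation for time-ordered samples. It is a minimal illustration under stated assumptions, not the framework's own code: the function name `blocked_kfold` and the `gap` parameter (a buffer of samples dropped on each side of the test block to limit leakage through temporal autocorrelation) are introduced here for illustration only.

```python
from typing import Iterator, List, Tuple


def blocked_kfold(
    n_samples: int, n_blocks: int, gap: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs with one contiguous test block per fold.

    Samples are assumed to be ordered in time. Training samples closer than
    `gap` positions to the test block are excluded.
    """
    if n_blocks < 2 or n_blocks > n_samples:
        raise ValueError("n_blocks must be between 2 and n_samples")
    if gap < 0:
        raise ValueError("gap must be non-negative")

    # The first (n_samples % n_blocks) blocks receive one extra sample.
    base, extra = divmod(n_samples, n_blocks)
    start = 0
    for block in range(n_blocks):
        stop = start + base + (1 if block < extra else 0)
        test_idx = list(range(start, stop))
        # Keep only samples outside the test block and its surrounding buffer.
        train_idx = [
            i for i in range(n_samples) if i < start - gap or i >= stop + gap
        ]
        yield train_idx, test_idx
        start = stop


# Example: 10 time steps, 5 blocks, one-step buffer around each test block.
folds = list(blocked_kfold(n_samples=10, n_blocks=5, gap=1))
```

Unlike a random split, each test block here is a contiguous period, so a model is never evaluated on time steps that are interleaved with its training data. A spatial analogue would assign whole stations or regions to folds instead of time blocks. Established libraries provide related splitters (for example scikit-learn's `TimeSeriesSplit` and `GroupKFold`), which may be preferable in practice.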
The framework is currently being developed and iteratively evaluated through expert-led case studies using real hydro-environmental datasets, rather than through formal classroom deployment. Initial applications focus on groundwater level analysis and hydro-environmental forecasting problems in Greece, including collaborative work in Crete, where the framework has been used to structure modelling choices and interpret model behaviour under non-stationary conditions. Additional exploratory applications to existing datasets stress-test the transferability of the workflow across contrasting environmental settings. Ongoing extensions include applying the framework to coastal erosion modelling activities currently under development in Colombia.
The LLM layer supports explicit reasoning about why a model performs well or poorly under specific conditions, how assumptions propagate into uncertainty, and where data-driven learning diverges from physical expectations. This reflective use of hybrid intelligence helps expose failure modes and modelling sensitivities that are often hidden in automated pipelines.
Preliminary results from the expert-led evaluations indicate that the framework improves the transparency and reproducibility of modelling decisions, facilitates comparison across case studies, and supports more consistent interpretation of ML results across regions and scales. At the same time, the approach lowers the entry barrier for non-specialists without removing expert oversight or domain judgement.
The framework is being developed within the context of the Erasmus+ AI-LEARN project (Project reference: 2025-1-NL01-KA220-HED-000355215), where it serves as a methodological backbone for future training and capacity-building activities in water and environmental intelligence.