ITS1.15/NH13.1 | Natural Language Processing and Large Language Models in Geosciences, Natural Hazards and Hydrology
EDI | PICO
Convener: Mariana Madruga de Brito (ECS) | Co-conveners: Lina Stein (ECS), Gabriele Messori, Jens Klump
Thu, 07 May, 16:15–18:00 (CEST) | PICO spot 4
Recent advances in Large Language Models (LLMs) and Natural Language Processing (NLP) are rapidly changing geosciences research, offering new opportunities for knowledge discovery, data analysis, and real-time monitoring. At the same time, the increasing availability of digital text and image data—from scientific literature and newspaper articles to social media and historical archives—provides an unprecedented wealth of new data sources for geosciences research.

This session examines how geoscientists are using LLMs, NLP, and text-as-data approaches across hydrology, natural hazards research, and the broader Earth system sciences. We invite contributions that showcase innovative uses of LLMs and NLP, discuss methodological challenges, or integrate text mining techniques into geoscientific workflows.

We particularly welcome submissions on topics including, but not limited to:
- Chatbots and AI assistants in geosciences
- Assessment of natural hazard impacts (e.g., floods, droughts, landslides, heatwaves, windstorms)
- Real-time disaster monitoring and early warning systems
- Evidence synthesis and literature mapping
- Public sentiment and perception analysis
- Policy tracking and narrative analysis
- Social media analyses
- Enhancement of metadata and data descriptions
- Automation of historical data rescue
- Integration of LLMs with remote sensing or image data
- Methodological challenges in using LLMs and NLP-based analyses, including bias, reproducibility, and interpretability

By sharing case studies, technical developments, and lessons learned, we aim to promote the effective use of these tools while also highlighting the challenges that newcomers may encounter, including issues with data coverage, quality control, and reproducibility. In doing so, this session seeks to inspire collaboration and innovation in harnessing LLMs, NLP, and text-as-data in the geosciences.

PICO: Thu, 7 May, 16:15–18:00 | PICO spot 4

PICO presentations are given in a hybrid format supported by a Zoom meeting featuring on-site and virtual presentations. The button to access the Zoom meeting appears just before the time block starts.
Chairpersons: Mariana Madruga de Brito, Lina Stein, Gabriele Messori
16:15–16:20
16:20–16:22 | PICO4.1 | EGU26-13303 | ECS | Highlight | On-site presentation
Gizem Ekinci, Koketso Molepo, Sebastian Willmann, Johanna Baehr, Kevin Sieck, Felix Oertel, Bianca Wentzel, Thomas Ludwig, Martin Bergemann, Jan Saynisch-Wagner, and Christopher Kadow
Large language models (LLMs) have the potential to transform how climate scientists interact with data by lowering technical barriers and enabling more intuitive analysis workflows. Building on previous demonstrations of LLM-assisted climate analysis, we present how FrevaGPT, an LLM-powered scientific assistant integrated into Freva, a climate data search and analysis platform, supports climate scientists in their day-to-day data exploration and analysis. FrevaGPT interprets natural language queries and automatically generates traceable, editable, and reusable analysis scripts that can be executed within established scientific environments. It retrieves relevant datasets and literature, performs analyses, and visualises results, thereby allowing researchers to focus on scientific interpretation rather than coding intricacies. By leveraging a broad repository of climate observations and model output, FrevaGPT ensures transparent and reproducible workflows that adhere to best practices in climate research. It also integrates seamlessly into Jupyter-AI and, by making use of the Freva library, combines the code-generating capabilities of LLMs with contextual understanding of how to access relevant datasets on the HPC cluster. As a “co-pilot” for geoscientists, the system not only responds to explicit requests but also proactively suggests relevant climate modes, events, and next analytical steps, helping to uncover insights that might otherwise be overlooked. Practical use cases demonstrate how FrevaGPT assists with interactive exploratory analysis and hypothesis refinement across climate datasets of varying complexity. By embedding LLM-assisted natural language interaction into real-world climate research workflows, this work highlights methodological considerations and opportunities for enhancing scientific productivity, promoting broader adoption of NLP and AI tools among Earth system scientists.
We provide scientific evaluation of FrevaGPT’s capability through a benchmark suite. A live demo will be presented and can be used by the audience to do real climate analysis on a high-performance computer with access to petabytes of Earth system data - starting with a simple prompt.

How to cite: Ekinci, G., Molepo, K., Willmann, S., Baehr, J., Sieck, K., Oertel, F., Wentzel, B., Ludwig, T., Bergemann, M., Saynisch-Wagner, J., and Kadow, C.: From Natural Language to Reproducible Climate Analysis: FrevaGPT in the Geosciences, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13303, https://doi.org/10.5194/egusphere-egu26-13303, 2026.

16:22–16:24 | PICO4.2 | EGU26-2901 | ECS | On-site presentation
Yingjia Li, Feng Zhang, Xinpeng Yu, Shiruo Hu, and Jianshi Zhao

Deep learning hydrological modeling typically requires extensive expert knowledge in programming, model selection, and data engineering, creating a significant barrier to efficiency and scalability. To address this challenge, we propose HydroAIM, an agentic deep learning modeling system for hydrological time series forecasting based on Large Language Models (LLMs). Built upon the Model Context Protocol (MCP) to ensure standardized tool integration and modular extensibility, the system orchestrates a collaborative architecture comprising four specialized agents: a task analysis agent, a data preprocessing agent, a model building agent, and a result presentation agent. Supported by a comprehensive internal template library and toolbox, these agents autonomously execute the modeling pipeline from raw data to final evaluation. We conducted extensive compatibility tests across various LLMs and performed rigorous ablation studies to validate the necessity of the components. Experimental evaluation on the CAMELS dataset demonstrates that HydroAIM can generate reliable, expert-level modeling code. Moreover, the deep learning models constructed by HydroAIM, built without human intervention, perform comparably to the traditional process-based Sacramento Soil Moisture Accounting (SAC-SMA) model. Furthermore, the system also exhibits strong capability in global modeling tasks, offering a robust and scalable solution for intelligent hydrological research.

How to cite: Li, Y., Zhang, F., Yu, X., Hu, S., and Zhao, J.: HydroAIM: LLM-based Agentic Intelligent Deep Learning Modeling for Hydrological Time Series Forecasting, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-2901, https://doi.org/10.5194/egusphere-egu26-2901, 2026.

16:24–16:26 | PICO4.3 | EGU26-2906 | ECS | On-site presentation
Hammer: An Expert-Level Large Language Model for Hydro-Science and Engineering Balancing Domain Expertise and General Intelligence
(withdrawn)
Xinpeng Yu, Wenbo Shan, Yingjia Li, Shiruo Hu, Dingxiao Liu, Zhijun Zheng, Jing Liu, Wei Luo, Lizhi Wang, Bin Xu, and Jianshi Zhao
16:26–16:28 | PICO4.4 | EGU26-12821 | On-site presentation
Mayssa Kchaou, Hernan Andres Gonzalez Gongora, Alicia Chimeno Sarabia, Francisco Doblas-Reyes, and Amanda Cardoso Duarte

LLMs can effectively simplify complex textual information, yet their application in scientific domains, particularly climate science, remains limited. Climate research relies on dense, technical documents such as assessment reports that are difficult to navigate for non-specialists and time-constrained experts. We have explored the development of a climate-aware LLM that enhances access to such materials by balancing conversational fluency with strict grounding in trustworthy geoscientific sources. In this research, we compare different methodologies for developing such a climate-aware LLM, aiming for a model that bridges the gap between experts' complex reports and accessible information. This climate-aware LLM is also envisioned as a foundational component for future, more advanced AI developments in the climate domain.

A major contribution of this work is the development of a curated, large-scale synthetic dataset designed to bridge the gap between LLMs and Earth science. We created a dataset by collecting and preprocessing a vast corpus of Copernicus publications and the Intergovernmental Panel on Climate Change (IPCC) reports, which served as the foundation for generating high-quality Question-Answering pairs. By employing various prompt engineering techniques, we ensured the data covers a wide range of Earth science topics and includes diverse question categories, such as open-ended, closed-ended, and freeform queries, among others. To ensure the practical utility of the model, we also implemented optimizations to reduce generation latency for real-world applications.

Moreover, we systematically evaluate multiple architectural approaches, including retrieval-augmented generation (RAG), retrieval-augmented fine-tuning (RAFT), and full fine-tuning, using a combination of standard semantic and lexical evaluation metrics, domain-specific climate benchmarks such as the ClimaQA Benchmark, and LLM-as-a-judge evaluations to compare model outputs.
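
The retrieval step shared by the RAG and RAFT setups can be sketched with a minimal TF-IDF retriever; the corpus snippets, query, and prompt template below are illustrative stand-ins for the embedding-based retrieval actually evaluated:

```python
import math
from collections import Counter

def build_index(docs):
    """Tokenize documents and build TF-IDF vectors."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    idf = {t: 1.0 + math.log(n / df[t]) for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(toks).items()}
            for toks in tokenized]
    return idf, vecs

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, idf, vecs, k=2):
    """Return the k passages most similar to the query."""
    q = {t: c * idf.get(t, 0.0)
         for t, c in Counter(query.lower().split()).items()}
    order = sorted(range(len(docs)),
                   key=lambda i: cosine(q, vecs[i]), reverse=True)
    return [docs[i] for i in order[:k]]

corpus = [  # illustrative source snippets
    "IPCC AR6 projects global mean sea level rise of up to one metre by 2100.",
    "Copernicus services provide satellite observations of land and ocean.",
    "Heatwave frequency has increased across the Mediterranean basin.",
]
idf, vecs = build_index(corpus)
context = retrieve("projected sea level rise by 2100", corpus, idf, vecs, k=1)
# grounded prompt: the model must answer only from retrieved sources
prompt = "Answer ONLY from the sources below.\n" + "\n".join(context)
```

The same grounding pattern underpins both RAG (retrieval at inference time) and RAFT (retrieval folded into fine-tuning data).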

How to cite: Kchaou, M., Gonzalez Gongora, H. A., Chimeno Sarabia, A., Doblas-Reyes, F., and Cardoso Duarte, A.: Toward a Climate-Aware Large Language Model: A Comparative Study of Methodologies for Source-Grounded  Large Language Models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12821, https://doi.org/10.5194/egusphere-egu26-12821, 2026.

16:28–16:30 | PICO4.5 | EGU26-5757 | ECS | On-site presentation
Bing Xu and Alexander Brenning

Social media can provide rapid on-site information that helps to improve situational awareness in disaster response. Nevertheless, social media posts often provide imprecise or ambiguous location information (e.g., toponyms), leaving the exact location within the referenced area highly uncertain. In addition, the actual event time may deviate from the posting time. Existing toponym-based geocoding approaches typically reduce a place name to a single representative point, which is insufficient to capture within-area spatial uncertainty and to integrate heterogeneous evidence.

We propose an uncertainty-aware spatiotemporal inference framework that fuses geographic factors with multimodal social media information to estimate both the most likely event location and occurrence date, using landslides as an event type with topographic and hydro-climatic location and time constraints. The framework is evaluated using landslide-related social media posts monitored by the Global Landslide Detector in the contiguous United States. First, toponyms extracted from posts are geocoded into candidate geometries that constrain the spatial search domain. Second, we build a spatial probability map by combining a landslide susceptibility raster representing topographic constraints with image-derived semantic cues. CLIP is used to detect roads and water bodies from post images, which adaptively weight road/river buffer zones before normalization. Third, within a time window before the post date, we extract PRISM daily precipitation series as a hydro-climatic constraint, and fuse it with the spatial probability to form a joint spatiotemporal score. The framework outputs (i) a spatial probability map and (ii) the most likely occurrence date.
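
A toy numeric sketch of the fusion step, with invented 3x3 grids and rainfall values standing in for the susceptibility raster, the CLIP-weighted road/river buffers, and the PRISM precipitation series:

```python
# Invented 3x3 grids: susceptibility in [0, 1]; buffer weights up-rank
# cells near roads/water detected in the post images.
susceptibility = [[0.1, 0.4, 0.2],
                  [0.3, 0.9, 0.5],
                  [0.2, 0.6, 0.3]]
buffer_weight = [[1.0, 1.0, 0.5],
                 [1.0, 1.5, 1.0],
                 [0.5, 1.0, 1.0]]

weighted = [[s * w for s, w in zip(srow, wrow)]
            for srow, wrow in zip(susceptibility, buffer_weight)]
total = sum(sum(row) for row in weighted)
spatial_prob = [[v / total for v in row] for row in weighted]  # normalize

# Invented daily precipitation (mm) within the window before the post date.
precip = {"2024-05-01": 2.0, "2024-05-02": 34.5, "2024-05-03": 6.1}
rain_total = sum(precip.values())
temporal_prob = {d: p / rain_total for d, p in precip.items()}

# Joint spatiotemporal score: most likely cell and occurrence date.
best_cell = max(((i, j) for i in range(3) for j in range(3)),
                key=lambda ij: spatial_prob[ij[0]][ij[1]])
best_day = max(temporal_prob, key=temporal_prob.get)
```

The real framework operates on geocoded candidate geometries rather than a fixed grid, but the multiplicative weighting and normalization logic is the same.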

We evaluate the method using posts with manually annotated coordinates and assess map quality using the Percentile Rank (PR) of the ground-truth pixel, among other metrics. Preliminary results indicate that incorporating road–water features with image-driven semantic modulation consistently concentrates the true landslide location into smaller high-probability areas and yields event-time estimates consistent with rainfall-triggering processes. This provides an uncertainty-aware transferable framework for rapid, social-media-driven event localization and verification for event types with geographic constraints.

How to cite: Xu, B. and Brenning, A.: Uncertainty-aware Spatiotemporal Inference of Landslide Events by Fusing Multimodal Social Media Information with Geographic Features, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5757, https://doi.org/10.5194/egusphere-egu26-5757, 2026.

16:30–16:32 | PICO4.6 | EGU26-7361 | On-site presentation
Elisabetta Napolitano, Silvia Peruccacci, Massimo Melillo, Stefano Luigi Gariano, and Maria Teresa Brunetti

Reliable forecasting of rainfall-induced landslides requires historical data collected in structured and well-documented catalogues. However, scarce and inaccurate information on the timing and location of the failures often leads to high uncertainty in predictions. When properly trained, Artificial Intelligence (AI) can significantly accelerate data collection and processing, enabling the interpretation of large volumes of information much faster than traditional manual approaches.

We developed an AI-based two-step procedure for the automatic extraction of spatial and temporal information on rainfall-induced landslides from online textual documents. The procedure is a prompt-engineered framework that uses Large Language Models (LLMs) and Natural Language Processing (NLP). Starting from Google Alert-filtered news on landslides, the framework optimizes two steps: (1) date/time attribution and (2) geolocation, combining the interpretative capacity of LLMs with the OpenStreetMap API. The output is useful for building or updating landslide catalogues, such as the ITAlian rainfall-induced LandslIdes CAtalogue (ITALICA, Peruccacci et al., 2023; Brunetti et al., 2025). This approach represents a significant advancement over traditional manual extraction of landslide information from news sources, which is affected by several limitations: (1) processing hundreds of news articles is time-consuming, complex, and highly demanding; (2) manual procedures are prone to bias and error, reducing data objectivity, reliability, and reproducibility; and (3) the heterogeneity of information sources hampers the production of standardized outputs, limiting their integration into national or regional landslide catalogues. These limitations are particularly critical in operational contexts where rapid data integration is required for improving catalogue completeness, calibrating rainfall thresholds, and validating landslide early warning systems. Recent advances have partially addressed these challenges through rigorous methodologies involving multiple trained expert operators and double-validation processes (Peruccacci et al., 2023; Brunetti et al., 2025). Although expert validation remains crucial, this approach supports the reliability and objectivity of hazard modeling and prediction, contributing to global landslide research and risk reduction.
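
A minimal sketch of the two steps, assuming a hypothetical LLM response for step (1) and using the public OpenStreetMap Nominatim search endpoint for step (2); the example toponym and date are illustrative:

```python
import json
import urllib.parse

# Step 1: an LLM is prompted to return structured date/time and toponym
# attribution; 'llm_output' is a hypothetical model response for one item.
llm_output = ('{"date": "2023-11-03", "time": "07:30", '
              '"place": "Monteforte Irpino, Avellino"}')
record = json.loads(llm_output)

def nominatim_url(place):
    """Step 2: build a geocoding query for the OpenStreetMap Nominatim API."""
    params = urllib.parse.urlencode({"q": place, "format": "json", "limit": 1})
    return "https://nominatim.openstreetmap.org/search?" + params

url = nominatim_url(record["place"])
```

Parsing the model output as strict JSON before geocoding lets structurally invalid extractions be rejected automatically rather than entering the catalogue.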

This contribution is part of the AI-PERIL (AI-Powered Extraction of Rainfall-Induced Landslide Information) project, which is supported by the International Consortium on Landslides (ICL).


References:

Brunetti, M.T., Gariano, S.L., Melillo, M., Rossi, M., and Peruccacci, S.: An enhanced rainfall-induced landslide catalogue in Italy. Scientific Data, 12, 216, https://doi.org/10.1038/s41597-025-04551-6, 2025

Peruccacci, S., Gariano, S. L., Melillo, M., Solimano, M., Guzzetti, F., and Brunetti, M. T.: The ITAlian rainfall-induced LandslIdes CAtalogue, an extensive and accurate spatio-temporal catalogue of rainfall-induced landslides in Italy. Earth System Science Data, 15, 2863–2877, https://doi.org/10.5194/essd-15-2863-2023, 2023.

How to cite: Napolitano, E., Peruccacci, S., Melillo, M., Gariano, S. L., and Brunetti, M. T.: Extraction of spatial and temporal landslide information using AI, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7361, https://doi.org/10.5194/egusphere-egu26-7361, 2026.

16:32–16:34 | PICO4.7 | EGU26-9099 | ECS | On-site presentation
Shibo Cui, Ni Li, and Jianshi Zhao

China is among the countries most severely affected by flood disasters worldwide, and many studies estimate that China accounts for the largest share of global direct economic flood losses. However, a long-term, comprehensive and open database on flood disaster impacts in China has been lacking. In this study, we construct the China Flood Disaster Impacts Database (CFDID, 1949–2023) based on more than 80 official Chinese disaster yearbooks, using optical character recognition (OCR) and large language model (LLM) techniques for data extraction and structuring. The database contains over 15,000 flood disaster events from 1949 to 2023, covering five major flood types and 11 impact indicators. The direct economic losses recorded in CFDID account for more than 70% of the officially reported national flood losses (1991–2023), indicating a high degree of coverage and representativeness. CFDID provides a solid data foundation for future research on flood risk, impacts and adaptation in China. Moreover, the data collection framework developed in this study can also be extended to other countries and regions.
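
The LLM-based structuring step can be sketched as schema-constrained JSON extraction; the prompt wording, field names, and the model response below are placeholders, not actual CFDID entries:

```python
import json

# Schema-constrained extraction prompt (field names are placeholders).
SCHEMA_PROMPT = (
    "Extract one flood event from the yearbook passage as JSON with keys: "
    "year, province, flood_type, deaths, direct_economic_loss_cny. "
    "Use null for indicators that are not reported."
)

# Hypothetical LLM response for one OCR'd passage (placeholder values).
response = ('{"year": 2000, "province": "Example Province", '
            '"flood_type": "flash flood", "deaths": 3, '
            '"direct_economic_loss_cny": 1.2e8}')

event = json.loads(response)
required = {"year", "province", "flood_type", "deaths",
            "direct_economic_loss_cny"}
# Reject structurally invalid extractions before they enter the database.
assert required <= event.keys()
```

Validating every extraction against a fixed key set is what keeps heterogeneous yearbook text convertible into the uniform 11-indicator event records described above.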

How to cite: Cui, S., Li, N., and Zhao, J.: CFDID v1.0: A China Flood Disaster Impacts Database (1949-2023), EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9099, https://doi.org/10.5194/egusphere-egu26-9099, 2026.

16:34–16:36 | PICO4.8 | EGU26-18740 | ECS | On-site presentation
Daniel Pardo-García, Francisco Pastor, and Samira Khodayar

Mediterranean tropical-like cyclones, known as medicanes, are among the most damaging and socio-economically disruptive weather phenomena in the region. While their physical characteristics have been increasingly investigated, a comprehensive and systematic assessment of their societal and economic impacts remains limited, largely due to the fragmented and heterogeneous nature of impact information. 

To address this gap, we present an automated, AI-based framework to detect, classify, and monitor the socio-economic impacts associated with medicanes using unstructured textual data from diverse sources, including news articles, media reports, and documentation from international agencies. The methodology follows a two-stage workflow. First, event-related texts are identified through an advanced filtering procedure combining geographical constraints, temporal consistency, topic relevance, and keyword-based selection. Second, state-of-the-art Natural Language Processing (NLP) and Machine Learning (ML) techniques are applied to extract, classify, and quantify reported hazards and impacts across multiple sectors, such as infrastructure, population, economic activities, and emergency response. 
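
The first-stage filter can be sketched as a combined keyword, temporal, and geographic screen; the keyword list, region terms, and articles below are illustrative:

```python
from datetime import date

KEYWORDS = {"medicane", "cyclone", "storm"}  # illustrative topic terms

def stage1_filter(articles, window_start, window_end, region_terms):
    """Keep texts satisfying keyword, temporal, and geographic constraints."""
    kept = []
    for art in articles:
        toks = {w.strip(".,").lower() for w in art["text"].split()}
        if (toks & KEYWORDS
                and window_start <= art["date"] <= window_end
                and toks & region_terms):
            kept.append(art)
    return kept

articles = [  # illustrative candidate texts
    {"date": date(2020, 9, 17),
     "text": "Medicane Ianos floods roads in Kefalonia."},
    {"date": date(2020, 9, 17),
     "text": "Stock markets rallied on Thursday."},
    {"date": date(2019, 3, 2),
     "text": "A storm damaged the harbour in Kefalonia."},
]
hits = stage1_filter(articles, date(2020, 9, 15), date(2020, 9, 20),
                     region_terms={"kefalonia", "ionian", "greece"})
```

Only texts passing all three constraints move on to the second-stage NLP/ML classification of hazards and impacts.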

By integrating NLP and ML methods with geolocation tools, the framework enables the automated spatio-temporal mapping of medicane related hazards and damages, substantially reducing subjectivity and dependence on manual post-event assessments. The approach demonstrates that news-based and other textual sources can serve as consistent, scalable, and near-real-time indicators of the socio-economic consequences of complex multi-hazard events such as medicanes.

This work provides, to our knowledge, the first systematic and reproducible methodology to quantify the socio-economic footprint of Mediterranean cyclones using text-as-data approaches. The results highlight the potential of NLP-based impact detection to complement traditional hazard-focused analyses and to support integrated risk assessment, climate services, and disaster risk reduction strategies in the Mediterranean region. 

How to cite: Pardo-García, D., Pastor, F., and Khodayar, S.: Automated spatio-temporal detection of medicane hazards and socio-economic impacts from news-based data using machine learning , EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18740, https://doi.org/10.5194/egusphere-egu26-18740, 2026.

16:36–16:38 | PICO4.9 | EGU26-17783 | On-site presentation
Michele Ronco, Luca Bandelli, Lorenzo Bertolini, Sergio Consoli, Damien Delforge, Daria Mihaila, Alessio Spadaro, Marco Verile, and Christina Corbane

We explore the use of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to extract, structure, and analyze disaster information from multilingual news sources. Using over 3,000 events from the Emergency Events Database (EM-DAT, 2014–2024), we process Europe Media Monitor (EMM) news to generate structured disaster storylines and knowledge graphs that capture complex interactions among hazards, impacts, and responses—details often missing from traditional datasets. RAG enables the construction of coherent narratives detailing hazard characteristics, affected regions, fatalities, and economic losses, complementing conventional approaches such as remote sensing with richer contextual information. These structured outputs support retrospective analysis, multi-hazard risk assessment, and decision-making for disaster management. In line with the FAIR (Findable, Accessible, Interoperable and Reusable) principles, all workflows are openly accessible via an interactive exploration dashboard, and the data generated are made available through the Joint Research Data Catalogue. This study illustrates how LLMs and NLP can transform unstructured reporting into organized, reusable formats, enhancing situational awareness, early warning, and operational planning. It highlights both the opportunities and methodological considerations—including automation, reproducibility, and integration with existing hazard monitoring systems—demonstrating the potential of text-as-data approaches for advancing natural hazard research in geosciences.
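
The storyline and knowledge-graph construction can be sketched with illustrative subject-relation-object triples of the kind an LLM might extract from news text:

```python
# Illustrative subject-relation-object triples extracted from news text.
triples = [
    ("Storm Daniel", "caused", "flooding"),
    ("flooding", "affected", "Thessaly"),
    ("flooding", "resulted_in", "fatalities"),
]

# Build a directed knowledge graph as an adjacency map.
graph = {}
for subj, rel, obj in triples:
    graph.setdefault(subj, []).append((rel, obj))

def storyline(node, depth=0):
    """Walk the graph depth-first to render a structured event narrative."""
    lines = []
    for rel, obj in graph.get(node, []):
        lines.append("  " * depth + f"{node} -[{rel}]-> {obj}")
        lines.extend(storyline(obj, depth + 1))
    return lines

story = storyline("Storm Daniel")
```

Chaining triples this way is what turns isolated extracted facts into the hazard-impact-response storylines described above.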

How to cite: Ronco, M., Bandelli, L., Bertolini, L., Consoli, S., Delforge, D., Mihaila, D., Spadaro, A., Verile, M., and Corbane, C.: Turning Global News into Disaster Insights: Large Language Models and Knowledge Graphs for Multi-Hazard Analysis, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-17783, https://doi.org/10.5194/egusphere-egu26-17783, 2026.

16:38–16:40 | PICO4.10 | EGU26-17978 | On-site presentation
Eduardo Rico Carranza

Social media and consumer product portals have successfully leveraged data analytics to match users with products, friends, or information, having a significant impact on lifestyle, economy, and politics. Central to these systems is the structured storage of heterogeneous data and the use of bespoke algorithms to enable context-specific search, ranking, and retrieval. This represents a potential opportunity for spatial planning and policy-making: can similar technologies be repurposed to support evidence-based policy-making and ecological management in rural landscapes?

We present LandMatch, an AI-based framework designed to support policymakers and agribusinesses in identifying partnerships, investment opportunities, and intervention strategies that jointly address economic performance and ecological sustainability in the UK countryside. LandMatch draws on techniques from social media analytics, information retrieval, and graph-based modelling to build a Spatial Knowledge Graph (SKG). It uses Large Language Models (LLMs) to summarise and structure the underlying information into a form suitable for large-scale analysis and semantic retrieval. The spatial dimension of its graph structure enables analyses and recommendations that reflect both functional similarity and landscape-level ecological processes.

We have developed a prototype for LandMatch in the context of Chichester, West Sussex (UK). Through a series of tests, we demonstrate the feasibility of combining text-based retrieval augmented generation (RAG), automated data collection through web scraping and semantic mapping, as well as large-scale clustering and spatial graph analytics. Our work ultimately highlights a new approach to integrating social, economic, and geospatial data on a robust, interpretable, and design-ready platform.

How to cite: Rico Carranza, E.: LandMatch: Using LLMs and social media algorithms to spatial planning, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-17978, https://doi.org/10.5194/egusphere-egu26-17978, 2026.

16:40–16:42 | PICO4.11 | EGU26-19595 | ECS | On-site presentation
Taís Maria Nunes Carvalho, Jingxian Wang, Ana Maria Rotaru, Gabriela C. Gesualdo, Luca Severino, Laura Hasbini, and Mariana Madruga de Brito

Understanding how disasters impact communities and how humanitarian organisations respond is essential for improving disaster preparedness, response, and policy. However, humanitarian organisations, government agencies, and scientific institutions often report on disaster impacts and response in unstructured narrative reports, limiting their accessibility for systematic analysis. In this study, we developed a data-driven pipeline to extract and classify impact and response information from the International Federation of Red Cross and Red Crescent Societies (IFRC) disaster appeals and operational reports. We processed the text into clean sentences and manually annotated a stratified set of reports covering different climate hazard types. Sentences were labelled as reporting impacts, reporting response measures, or neither, and those describing impacts or responses were further categorised into a taxonomy of 24 impact subclasses and 26 response subclasses. Annotations were used to train four text classification models for detecting and classifying impact- and response-related sentences. Our approach demonstrates the feasibility of automatically extracting structured disaster impact and response data from humanitarian narrative reports, enabling large-scale analytics and supporting evidence-based disaster management.
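
A minimal stand-in for the trained classifiers, using bag-of-words centroids over a few invented training sentences (the actual models and the 24/26-subclass taxonomy are far richer):

```python
from collections import Counter

def tokens(text):
    return [w.strip(".,").lower() for w in text.split()]

train = [  # invented annotated sentences
    ("Floodwaters destroyed 1,200 homes in the district.", "impact"),
    ("Crops and livestock were lost across three provinces.", "impact"),
    ("Volunteers distributed food and safe drinking water.", "response"),
    ("Emergency shelters were set up for displaced families.", "response"),
    ("The rainy season usually starts in June.", "neither"),
    ("The report was published in three languages.", "neither"),
]

# One bag-of-words centroid per label.
centroids = {}
for text, label in train:
    centroids.setdefault(label, Counter()).update(tokens(text))

def classify(sentence):
    """Assign the label whose centroid shares the most normalized mass."""
    toks = Counter(tokens(sentence))
    def overlap(cent):
        return sum(toks[t] * cent[t] for t in toks) / sum(cent.values())
    return max(centroids, key=lambda lab: overlap(centroids[lab]))
```

The three-way impact/response/neither decision shown here mirrors the first labelling stage; subclass assignment then operates only on the sentences kept.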

How to cite: Nunes Carvalho, T. M., Wang, J., Rotaru, A. M., Gesualdo, G. C., Severino, L., Hasbini, L., and de Brito, M. M.: Framing impact, shaping response: Linking affectedness and action in humanitarian practice, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19595, https://doi.org/10.5194/egusphere-egu26-19595, 2026.

16:42–16:44 | EGU26-22037 | Virtual presentation
Md Adilur Rahim

Recent advances in large language models (LLMs) are transforming how geoscientists interact with data, models, and decision-support systems. Beyond literature search and text processing, LLMs now enable new forms of knowledge discovery, real-time analysis, and human–AI collaboration in natural hazards and climate-risk research. At the same time, the increasing availability of geospatial data, remote sensing images, and model outputs creates both opportunities and challenges for integrating text-as-data approaches into operational geoscientific workflows.

We present a set of applied case studies demonstrating how LLM-driven assistant agents can be embedded into geoscientific systems to support flood risk assessment, hazard communication, and mitigation planning and decision-making. The demonstrated system integrates LLM agents with hydrodynamic models (HEC-RAS), geospatial flood and exposure datasets, a building-scale digital twin, and policy and planning documents such as the Louisiana State Hazard Mitigation Plan. Through a conversational interface, users can query flood risks, building exposure, and mitigation scenarios, while the LLM agent orchestrates model execution, data retrieval, and insight synthesis.
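
The orchestration pattern can be sketched as tool routing, with placeholder functions standing in for the HEC-RAS and digital-twin integrations (function names and arguments are hypothetical):

```python
def run_flood_model(reach, return_period):
    """Placeholder for launching a hydrodynamic (e.g. HEC-RAS) scenario run."""
    return f"simulated {return_period}-yr flood for reach {reach}"

def get_exposure(parcel_id):
    """Placeholder for a building-scale digital-twin lookup."""
    return f"exposure record for parcel {parcel_id}"

TOOLS = {"run_model": run_flood_model, "get_exposure": get_exposure}

def dispatch(tool_call):
    """Route an LLM-issued tool call (name + arguments) to a registered tool."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**tool_call.get("arguments", {}))

result = dispatch({"name": "run_model",
                   "arguments": {"reach": "Amite-01", "return_period": 100}})
```

Restricting the agent to a fixed tool registry is one way to keep model execution and data retrieval auditable, which bears directly on the reproducibility concerns discussed below.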

These case studies illustrate how LLMs can translate heterogeneous data sources into interpretable, policy-relevant information for practitioners and communities. In addition to demonstrating capabilities, we discuss methodological challenges related to reproducibility, transparency, and bias when deploying LLMs in hazard and hydrology applications, including issues of data provenance, prompt sensitivity, and model-driven interpretation. By sharing practical lessons learned from demonstrations in coastal Louisiana, this contribution highlights both the promise and limitations of using LLM agents as geoscientific assistants for real-time disaster monitoring, risk assessment, and decision support.

How to cite: Rahim, M. A.: Risk to Resilience: LLM-Driven Agentic AI for Natural Hazard Assessment and Decision Support, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-22037, https://doi.org/10.5194/egusphere-egu26-22037, 2026.

16:44–16:46 | PICO4.12 | EGU26-11560 | ECS | On-site presentation
Zixin Hu, Andrea Cominola, and Heidi Kreibich

With millions of people exposed globally, riverine floods are one of the major natural hazards worldwide, resulting in a direct average annual loss of US$ 104 billion and 7 million fatalities in the twentieth century. Amidst increasing calls for accelerating climate adaptation, including the recent UNEP report, a pivotal question remains: what are the status, effectiveness, and potential of adaptation efforts to reduce future flood risks? National adaptation plans play a central role in climate risk governance by driving adaptation, yet their length and heterogeneity in language, content organization, and format pose challenges to a systematic and scalable comparison across countries. Extracting structured information from these plans requires advanced methods from natural language processing (NLP) and machine learning.

We first compile a dataset of national flood plans from different countries worldwide using a hybrid information retrieval strategy that integrates manual keyword search, GPT-5.1-assisted queries, community engagement through surveys and direct outreach, and manual validation. Building on this dataset, we implement a language model-based workflow for topic modelling and content analysis. Our workflow combines text preprocessing, embedding, and a guided topic modelling step that incorporates 18 predefined categories of flood adaptation measures from the EU Floods Directive, such as emergency response planning and water flow regulation. Our approach enables structured analysis of flood adaptation plans, mapping of measure diversity and prevalence across countries and regions, and identification of correlations with hazard characteristics, damages, and economic indicators. In addition, our workflow supports the detection of emerging or overlooked adaptation measures.
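
The guided topic-modelling step can be sketched as seed-term matching against predefined measure categories; the two categories and their seed terms below are illustrative, not the full 18-category EU Floods Directive list:

```python
# Two illustrative measure categories with seed terms (the full workflow
# guides topic assignment with 18 EU Floods Directive categories).
CATEGORIES = {
    "emergency response planning": {"emergency", "evacuation",
                                    "contingency", "response"},
    "water flow regulation": {"reservoir", "retention", "dam",
                              "regulation", "discharge"},
}

def assign_topic(sentence, min_overlap=1):
    """Assign the category whose seed terms best match the sentence."""
    toks = {w.strip(".,").lower() for w in sentence.split()}
    scores = {cat: len(toks & seeds) for cat, seeds in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_overlap else "unassigned"
```

In the actual workflow the matching runs on embeddings rather than raw tokens, but the guiding principle, anchoring clusters to predefined measure categories, is the same; unassigned sentences are where emerging or overlooked measures can surface.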

How to cite: Hu, Z., Cominola, A., and Kreibich, H.: Leveraging Large Language Models for Global Assessment of National Flood Adaptation Plans, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11560, https://doi.org/10.5194/egusphere-egu26-11560, 2026.

16:46–16:48 | PICO4.13 | EGU26-14947 | ECS | On-site presentation
Isabela Burattini Freire, Mariana Madruga de Brito, and Taís Maria Nunes Carvalho

Principles of justice and equity in climate impacts research are widely recognized as essential for the legitimacy and effectiveness of international climate agreements. Yet, quantitative evidence on global imbalances in climate knowledge production remains limited. In this study, we leverage recent advances in Natural Language Processing to provide a large-scale, data-driven assessment of global inequalities in climate impacts research, with particular focus on disparities between the Global North and the Global South, as well as differences across country income groups as defined by the World Bank's gross national income-based classification. We compile a dataset of over 40,000 open- and closed-access scientific publications from OpenAlex related to the thematic scope of IPCC Working Group II on societal impacts, vulnerability, and adaptation. Relevant publications within our database are identified using a machine-learning pipeline. Building on the relevant articles, we analyze global co-authorship networks to identify key research hubs, bridges, and communities across countries and regions. Our preliminary results show that climate impacts research is predominantly led by high-income countries, which dominate the top ten global research hubs and account for more than 60% of total authorships. Research communities exhibit strong geographic clustering, with countries collaborating more intensively with continental neighbors. However, high-income countries play a disproportionate intermediary role in global collaboration networks: despite its geographic distance, the United Kingdom intermediates twice as many scientific collaborations within the African climate impacts research community as South Africa. We further quantify structural inequalities in collaboration using temporal homophily measures in co-authorship networks.
While cross-income and North–South collaborations have increased over time, income-based homophily remains stable once research productivity is accounted for, indicating that high-income countries continue to preferentially co-author with one another. This suggests that increased connectivity has not translated into more equitable research output. By combining NLP-based literature mapping with network analysis, this work highlights their joint potential for diagnosing structural biases in climate change knowledge production. Our findings aim to provide empirical evidence to support more equitable research collaborations and more coherent international climate change policy frameworks.
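The income-based homophily measure described in the abstract can be sketched in a few lines. The data and function name below are illustrative only, not the authors' pipeline: each paper is reduced to the World Bank income groups of its authors, and homophily is the share of pairwise co-authorship ties linking the same group.

```python
from collections import Counter
from itertools import combinations

# Illustrative toy input: one list of author income groups per paper
# (World Bank classification); the data below is invented for demonstration.
papers = [
    ["high", "high"],
    ["high", "low"],
    ["high", "high", "upper-middle"],
    ["low", "low"],
    ["high", "lower-middle"],
]

def homophily_share(papers):
    """Share of pairwise co-authorship ties linking the same income group.

    Values near 1 indicate strong income-based homophily; computing this
    share per publication year gives a simple temporal homophily trend.
    """
    ties = Counter()
    for groups in papers:
        for a, b in combinations(groups, 2):
            ties["same" if a == b else "cross"] += 1
    total = ties["same"] + ties["cross"]
    return ties["same"] / total if total else 0.0

print(homophily_share(papers))  # 3 same-group vs 4 cross-group ties
```

The abstract's stabilized homophily finding additionally controls for research productivity, which this raw share does not; a null model (e.g., shuffling income labels across authors) would be needed for that comparison.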

How to cite: Burattini Freire, I., Madruga de Brito, M., and Nunes Carvalho, T. M.: Who shapes climate impacts research? An NLP-based network analysis of global hubs and bridges, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14947, https://doi.org/10.5194/egusphere-egu26-14947, 2026.

16:48–16:50
|
PICO4.14
|
EGU26-8582
|
On-site presentation
Samuel Park, David J. Yu, Hoon C. Shin, Changdeok Gim, and Jeryang Park

Effective flood management requires coordination across fragmented governance clusters, yet the institutional interdependencies connecting these clusters often remain hidden within complicated, multi-layered policy documents. This study develops an integrated analytical framework to identify two distinct types of network vulnerabilities: weak ties—critical existing connections bridging otherwise disconnected clusters—and structural holes—absent relationships whose creation would most effectively improve system integration. We extracted institutional relationships from Korean water governance documents using a rule-based text analysis approach and constructed a directed network representing actors and infrastructure components. Network analysis methods were applied to detect governance clusters and quantify both existing bridges between clusters and potential new connections that would reduce network fragmentation. Our findings reveal complementary vulnerability patterns. Weak ties in Korea's governance system function as critical linkages through central coordinating authorities, connecting national policy-making bodies with local implementation units. This concentration creates critical dependency on a few coordination channels. Structural hole analysis uncovered different leverage points: emergency response actors, despite peripheral formal positions, occupy strategic locations where new institutional linkages would most effectively enhance integration across governance domains. The distinction between weak ties and structural holes proves essential for intervention design: existing weak connections require strengthening through resource allocation and protocol clarification, while structural holes demand institutional transformation to create entirely new coordination pathways. This dual diagnostic approach provides a transferable framework for enhancing flood resilience across diverse water governance contexts.
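A minimal, standard-library sketch of the two diagnostics, assuming an undirected toy network with invented node names (the abstract's actual network is directed and extracted from policy documents): bridges stand in for weak ties, and maximally distant non-adjacent node pairs serve as a crude structural-hole proxy.

```python
from collections import deque

# Toy governance network; node names are invented for illustration only.
edges = {
    ("ministry", "agency_a"), ("ministry", "agency_b"),
    ("agency_a", "city_1"), ("agency_b", "city_2"),
    ("city_1", "city_2"),
    ("ministry", "emergency"),  # sole tie to the emergency-response actor
}

def adjacency(edge_set):
    adj = {}
    for u, v in edge_set:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_dist(adj, start):
    """Shortest-path distances from start over an undirected adjacency map."""
    dist, queue = {start: 0}, deque([start])
    while queue:
        n = queue.popleft()
        for m in adj.get(n, ()):
            if m not in dist:
                dist[m] = dist[n] + 1
                queue.append(m)
    return dist

def weak_ties(edge_set):
    """Bridges: edges whose removal disconnects their own endpoints."""
    out = []
    for e in edge_set:
        u, v = e
        if v not in bfs_dist(adjacency(edge_set - {e}), u):
            out.append(e)
    return out

def structural_hole_candidates(edge_set):
    """Non-adjacent pairs at maximum distance: adding such a tie would
    most shorten coordination paths (a crude structural-hole proxy)."""
    adj = adjacency(edge_set)
    nodes = sorted(adj)
    best, best_d = [], 0
    for i, u in enumerate(nodes):
        dist = bfs_dist(adj, u)
        for v in nodes[i + 1:]:
            if v in adj[u] or v not in dist:
                continue
            if dist[v] > best_d:
                best, best_d = [(u, v)], dist[v]
            elif dist[v] == best_d:
                best.append((u, v))
    return best, best_d
```

In the toy graph, the only weak tie is the ministry–emergency link, and the structural-hole candidates pair the emergency actor with the peripheral city nodes, mirroring the abstract's qualitative finding that emergency-response actors sit where new linkages would help most.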


Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and Technology (RS-2024-00356786).

How to cite: Park, S., Yu, D. J., Shin, H. C., Gim, C., and Park, J.: Uncovering the Overlooked: Exploring Structural Holes to Enhance Urban Flood Resilience in Institutional Networks, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8582, https://doi.org/10.5194/egusphere-egu26-8582, 2026.

16:50–18:00