Effective volcano monitoring relies on the timely detection and correct classification of diverse, time-dependent geophysical signals associated with magmatic and hydrothermal processes, including volcanic tremor, long-period and volcano-tectonic earthquakes, deformation transients, gas release, and thermal anomalies. Artificial Intelligence and Machine Learning (AI/ML) methods have emerged as powerful tools for automating event detection, classification, and forecasting in operational volcano observatories, and the number of peer-reviewed studies applying AI/ML to volcano monitoring has grown rapidly over the past decade.
Despite this rapid development, we suggest that effective operational uptake of AI/ML in volcano monitoring remains limited by four structural challenges. First, the lack of standardised, community-accepted benchmark datasets and evaluation protocols prevents meaningful comparison of algorithm performance across studies, volcanoes, and data types. Second, differing implementation, training, and testing practices limit reproducibility. Third, many AI/ML-based monitoring methods remain deterministic, with limited or no uncertainty quantification; this favours overconfident models and complicates their integration into the probabilistic, risk-based decision frameworks that are central to operational volcanology. Finally, the relative novelty of AI/ML in volcano monitoring has produced a fragmented research landscape with little coordinated community infrastructure.
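To make the third challenge concrete, the short sketch below shows one standard way to measure whether a probabilistic event detector is over- or under-confident: the expected calibration error. This is our own illustration rather than a method from any cited study; the detector outputs are hypothetical and only NumPy is assumed.

```python
# Illustrative sketch: measuring miscalibration of a probabilistic
# event detector. All detector outputs below are hypothetical.
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Expected calibration error (ECE) for binary event probabilities.

    Bins predictions by confidence and compares the mean predicted
    probability in each bin with the observed event frequency there;
    a well-calibrated detector yields an ECE near zero.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (probs >= lo) & (probs < hi)
        if not in_bin.any():
            continue
        gap = abs(probs[in_bin].mean() - labels[in_bin].mean())
        ece += in_bin.mean() * gap  # weight by fraction of samples in bin
    return ece

# Hypothetical probabilities for eight seismic windows (1 = event occurred).
probs = [0.95, 0.90, 0.85, 0.80, 0.30, 0.20, 0.15, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(f"ECE = {expected_calibration_error(probs, labels):.3f}")
```

A deterministic detector that emits only hard labels cannot be scored this way at all, which is precisely what complicates its integration into probabilistic decision frameworks.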
We propose a community-driven initiative to address these limitations through the design of a modular, open validation framework for AI/ML methods in volcano monitoring. The framework should integrate curated, benchmark-quality multi-parameter datasets that capture real-world variability in volcanic behaviour. Standardised training, testing, and evaluation protocols will enable fair, transparent, and reproducible comparison of both classical and emerging AI/ML approaches, while built-in uncertainty quantification will allow performance to be assessed not only in terms of accuracy but also in terms of reliability and decision relevance.
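As an indication of what such a protocol could look like in practice, the sketch below pins a versioned train/test split to the dataset and scores any submitted model with a proper scoring rule alongside accuracy. The interfaces shown (BenchmarkSplit, evaluate, predict_proba) are hypothetical placeholders for the framework's eventual API, not an existing package.

```python
# Hypothetical shape of a standardised evaluation protocol: a frozen,
# versioned data split plus a fixed metric suite applied to every model.
from dataclasses import dataclass
from typing import Callable, Sequence

import numpy as np

@dataclass(frozen=True)
class BenchmarkSplit:
    """Fixed train/test indices shipped with a benchmark dataset."""
    name: str
    train_idx: Sequence[int]
    test_idx: Sequence[int]
    seed: int = 0  # pinned seed so every submission trains identically

def evaluate(predict_proba: Callable[[np.ndarray], np.ndarray],
             X: np.ndarray, y: np.ndarray,
             split: BenchmarkSplit) -> dict:
    """Score a model on the held-out part of a benchmark split.

    Reports accuracy alongside the Brier score (a proper scoring rule)
    so that reliability, not just raw correctness, enters the comparison.
    """
    X_test = X[np.asarray(split.test_idx)]
    y_test = y[np.asarray(split.test_idx)]
    p = predict_proba(X_test)                  # P(event) per test sample
    accuracy = float(np.mean((p >= 0.5) == y_test))
    brier = float(np.mean((p - y_test) ** 2))  # lower = better calibrated
    return {"split": split.name, "accuracy": accuracy, "brier": brier}
```

Because the split, seed, and metric suite would travel with the dataset rather than with each paper, any two studies using the same BenchmarkSplit would produce directly comparable numbers.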
By establishing shared benchmarks and open evaluation practices, we aim to accelerate methodological development, improve reproducibility, and support the responsible transfer of AI/ML tools into operational volcano observatories. We will present a prototype as a starting point and invitation to the volcanological and data science communities to help design and implement this validation framework.