ESSI2.4 | Data compression and reduction for Earth System Sciences datasets
Co-organized by CR6/GD12/GI2/GMPV12/NP4/PS7/SM9/SSS10/TS10
Convener: Juniper Tyree (ECS) | Co-conveners: Sara Faghih-Naini (ECS), Clément Bouvier (ECS), Oriol Tinto (ECS)

Earth System Sciences (ESS) datasets, particularly those generated by high-resolution numerical models, continue to grow in resolution and size. These datasets are essential for advancing ESS, supporting critical activities such as climate change policymaking, weather forecasting in the face of increasingly frequent natural disasters, and modern applications like machine learning.

The storage, usability, transfer and shareability of such datasets have become a pressing concern within the scientific community. State-of-the-art applications now produce outputs so large that even the most advanced data centres and infrastructures struggle not only to store them but also to keep them usable and processable, including by downstream machine learning. Ongoing and upcoming community initiatives, such as digital twins and the 7th Phase of the Coupled Model Intercomparison Project (CMIP7), are already pushing infrastructures to their limits. With future investment in hardware likely to remain constrained, a critical and viable way forward is to explore (lossy) data compression and reduction methods that balance efficiency with the needs of diverse stakeholders. Interest in compression has therefore grown as a means to 1) make data volumes more manageable and 2) reduce transfer times and computational costs, while 3) preserving the quality required for downstream scientific analyses.

Nevertheless, many ESS researchers remain cautious about lossy compression, concerned that critical information or features may be lost for specific downstream applications. Identifying these use-case-specific requirements and ensuring they are preserved during compression are essential steps toward building trust so that compression can become widely adopted across the community.

This session will present and discuss recent advances in data compression and reduction for ESS datasets, focusing on:

1) Advances in and reviews of methods, including classical, learning-based, and hybrid approaches, with attention to computational efficiency of compression and decompression.
2) Approaches to enhance shareability and processing of high-volume ESS datasets through data compression (lossless and lossy) and reduction.
3) Interdisciplinary case studies of compression in ESS workflows.
4) Understanding domain- and use-case-specific requirements, and developing lossy compression methods that guarantee these requirements are met.
