SC2.5 | Data compression and reduction for Earth System Sciences datasets in practice
Co-organized by AS6/CL6/ESSI6/GI2/GM11/HS11/NP9
Convener: Juniper Tyree (ECS) | Co-conveners: Sara Faghih-Naini (ECS), Clément Bouvier (ECS), Oriol Tinto (ECS)

Earth System Sciences (ESS) datasets, particularly those generated by high-resolution numerical models, continue to grow in resolution and size. These datasets are essential for advancing ESS, supporting critical activities such as climate change policymaking, weather forecasting in the face of increasingly frequent natural disasters, and modern applications like machine learning.

The storage, usability, transfer and shareability of such datasets have become a pressing concern within the scientific community. State-of-the-art applications now produce outputs so large that even the most advanced data centres and infrastructures struggle not only to store them but also to keep them usable and processable, including by downstream machine learning. Ongoing and upcoming community initiatives, such as digital twins and the 7th phase of the Coupled Model Intercomparison Project (CMIP7), are already pushing infrastructures to their limits. With future investment in hardware likely to remain constrained, a critical and viable way forward is to explore (lossy) data compression and reduction methods that balance efficiency with the needs of diverse stakeholders. Interest in compression has therefore grown as a means to 1) make data volumes more manageable and 2) reduce transfer times and computational costs, while 3) preserving the quality required for downstream scientific analyses.

Nevertheless, many ESS researchers remain cautious about lossy compression, concerned that critical information or features may be lost for specific downstream applications. Identifying these use-case-specific requirements and ensuring they are preserved during compression are essential steps toward building trust so that compression can become widely adopted across the community.
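
To make the idea of preserving a use-case-specific requirement concrete, the following is a minimal sketch and not one of the frameworks used in the course. It assumes the requirement is a pointwise absolute error bound (here a hypothetical 0.01 K on a synthetic temperature series), applies uniform quantization followed by lossless zlib coding, and then verifies the bound on the decompressed data. All function names, the synthetic data and the chosen bound are illustrative assumptions.

```python
import math
import struct
import zlib


def compress(values, abs_error):
    """Lossy-compress floats so each value is recovered within abs_error.

    Values are rounded to the nearest multiple of 2 * abs_error, stored as
    32-bit integers, and the integer stream is then losslessly deflated.
    """
    step = 2.0 * abs_error
    codes = [round(v / step) for v in values]
    raw = struct.pack(f"<{len(codes)}i", *codes)
    return zlib.compress(raw, 9)


def decompress(blob, abs_error):
    """Invert compress(): inflate, unpack the integers, rescale to floats."""
    step = 2.0 * abs_error
    raw = zlib.decompress(blob)
    codes = struct.unpack(f"<{len(raw) // 4}i", raw)
    return [c * step for c in codes]


# Synthetic stand-in for a temperature series in kelvin (assumption).
values = [280.0 + 10.0 * math.sin(i / 50.0) + 0.5 * math.sin(i / 3.0)
          for i in range(10_000)]
abs_error = 0.01  # hypothetical requirement: at most 0.01 K pointwise error

blob = compress(values, abs_error)
restored = decompress(blob, abs_error)

original_bytes = 8 * len(values)  # size if stored as float64
max_error = max(abs(a - b) for a, b in zip(values, restored))
mean_shift = abs(sum(values) / len(values) - sum(restored) / len(restored))

# The tiny relative tolerance only absorbs floating-point rounding.
bound_holds = max_error <= abs_error * (1.0 + 1e-9)

print(f"compression ratio: {original_bytes / len(blob):.1f}x")
print(f"max abs error: {max_error:.5f} K, bound holds: {bound_holds}")
print(f"shift in mean: {mean_shift:.6f} K")
```

The same pattern (compress, decompress, then test the property that matters for the downstream analysis) applies whatever compressor is used; only the property being checked changes.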

This short course is designed as a practical introduction to compressing ESS datasets with various compression frameworks, and it shares tips on preserving important data properties throughout the compression process. After the hands-on exercises, which can be done with either your own or provided data, time will be set aside for debate and discussion to address questions about lossy compression and to exchange wishes and concerns regarding this family of methods. A short document summarising the discussion will be produced and made freely available afterwards.
