SC2.28 | How to feed Machine Learning algorithms with proper metrics?
Co-organized by CR8/ESSI6/HS11
Convener: Kai Hartmann | Co-convener: Annette Rudolph (ECS)

Most multivariate statistics and machine learning algorithms assume Euclidean metrics on unconstrained data spaces. Most variables in the geosciences, however, are strictly positive and bounded by physical constraints, so ordinary arithmetic measures such as differences and means can lose their meaning. Ignoring these constraints may obscure meaningful patterns, produce spurious correlations, or yield senseless measures of model quality. This short course presents useful recipes, with hands-on examples, for overcoming common pitfalls in multivariate statistics and machine learning for (a) common physically constrained and (b) compositional data spaces.
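
As a minimal illustration of why arithmetic differences can mislead on strictly positive data, the sketch below compares an absolute difference with a difference on the log scale. The function name `log_distance` and the example values are hypothetical and are not taken from the course material; the log scale is only one common choice for positive variables.

```python
import math


def log_distance(a, b):
    """Distance between two strictly positive values, measured on the log scale.

    Illustrative helper (an assumption, not part of the course material).
    """
    if a <= 0 or b <= 0:
        raise ValueError("values must be strictly positive")
    return abs(math.log(a) - math.log(b))


# Two hypothetical pairs of positive measurements (e.g. concentrations).
# The first pair differs by a factor of two, the second by one percent,
# yet both have the same absolute (Euclidean) difference.
pairs = [(1.0, 2.0), (100.0, 101.0)]

for a, b in pairs:
    print(f"a={a}, b={b}, absolute={abs(a - b)}, log-scale={log_distance(a, b):.4f}")
```

The absolute difference treats both pairs as equally far apart, while the log-scale distance reflects the relative change, which is often the physically relevant quantity for positive variables.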

The course is structured into four topics:
a) Why are common metrics meaningless in constrained data spaces?
b) Challenges of modeling physical extremes
c) Basic recipes for physically constrained data spaces 
d) Meaningful transformation for compositional data
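
For topic (d), one widely used option is the centred log-ratio (clr) transform. The abstract does not say which transformations the course covers, so the sketch below is an assumption for illustration only; the helper names `closure`, `clr`, and `aitchison_distance` are hypothetical.

```python
import math


def closure(parts):
    """Rescale strictly positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centred log-ratio transform of a composition.

    Each part is log-transformed and centred by the mean of the logs,
    which equals the log of the geometric mean of the parts.
    """
    if any(p <= 0 for p in parts):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr coordinates of two compositions."""
    return math.dist(clr(x), clr(y))


# Hypothetical three-part compositions (e.g. sand, silt, clay fractions).
sample_a = closure([10.0, 20.0, 70.0])
sample_b = closure([20.0, 20.0, 60.0])

print(clr(sample_a))
print(aitchison_distance(sample_a, sample_b))
```

Because clr coordinates depend only on ratios between parts, the result is the same whether the composition is expressed as fractions, percentages, or raw amounts, and Euclidean tools can then be applied to the transformed data.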

This course is held interactively with interdisciplinary hands-on experience. Advanced statistical or mathematical knowledge is not required, but bringing your own laptop with an R, Python, or MATLAB environment will help you follow the presented recipes and exercises!
