Articles | Volume 2, issue 2
12 Oct 2016

Mixture model-based atmospheric air mass classification: a probabilistic view of thermodynamic profiles

Jérôme Pernin, Mathieu Vrac, Cyril Crevoisier, and Alain Chédin

Abstract. Air mass classification has become an important area in synoptic climatology, simplifying the complexity of the atmosphere by dividing it into discrete groups with similar thermodynamic patterns. However, the constant growth of atmospheric databases in both size and complexity implies the need to develop new adaptive classifications. Here, we propose a robust unsupervised and supervised classification methodology that partitions a large thermodynamic dataset, on a global scale and over several years, into discrete air mass groups homogeneous in both temperature and humidity, and that also provides the underlying probability laws. Temperature and humidity at different pressure levels are aggregated into a set of cumulative distribution function (CDF) values instead of classical ones. The method is based on a Gaussian mixture model and uses the expectation–maximization (EM) algorithm to estimate the parameters of the mixture. Spatially gridded thermodynamic profiles come from ECMWF reanalyses spanning the period 2000–2009. Several aspects are investigated, such as the sensitivity of the classification process to both the temporal and the spatial sampling of the training dataset. Comparisons between the classifications produced by the EM algorithm and by the widely used k-means algorithm show that the former can be viewed as a generalization of the latter. Moreover, the EM algorithm delivers, for each observation, the probabilities of belonging to each class, as well as the associated uncertainty. Finally, a decision tree is proposed as a tool for interpreting the different classes, highlighting the relative importance of temperature and humidity in the classification process.
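As an illustration of the estimation step named in the abstract, the following is a minimal sketch of the EM algorithm for a Gaussian mixture, reduced to one dimension and two classes. It is not the authors' implementation: the function names (`responsibilities`, `fit_gmm_em`), the synthetic data, the initialization at the data extremes, and the fixed iteration count are all assumptions made for the example.

```python
import math
import random


def responsibilities(x, weights, means, variances):
    """E-step for one observation: posterior probability of each class.

    Computed in the log domain to avoid numerical underflow.
    """
    log_joint = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi * v) - 0.5 * (x - m) ** 2 / v
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(log_joint)
    unnorm = [math.exp(lj - top) for lj in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def fit_gmm_em(data, init_means, n_iter=100, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM (hypothetical helper, fixed iterations)."""
    k = len(init_means)
    n = len(data)
    means = list(init_means)
    weights = [1.0 / k] * k
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    for _ in range(n_iter):
        # E-step: soft assignment of every observation to every class.
        resp = [responsibilities(x, weights, means, variances) for x in data]
        # M-step: re-estimate weights, means and variances from the soft counts.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)  # floor keeps the variance positive
    return weights, means, variances


# Synthetic stand-in for a thermodynamic variable: two well-separated groups.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
data += [random.gauss(5.0, 1.0) for _ in range(200)]

weights, means, variances = fit_gmm_em(data, init_means=[min(data), max(data)])
print("weights:", weights)
print("means:", means)
print("variances:", variances)
```

Unlike k-means, each observation contributes to every class in proportion to its responsibility. Forcing equal weights, equal variances and hard (0 or 1) responsibilities would reduce the update of the means to the k-means centroid update, which is one way to read the generalization mentioned above.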

Short summary
Here, we propose a methodology for classifying various space-time atmospheric datasets into discrete air mass groups homogeneous in temperature and humidity from a probabilistic point of view: both the classification process and the data are probabilistic. Unlike conventional classification algorithms, this methodology provides the probability of belonging to each class as well as the corresponding uncertainty, which can be used in various applications.
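To make the notion of class-membership probability concrete, here is a small sketch that evaluates, for an already fitted two-class Gaussian mixture, the probability of belonging to each class and one possible uncertainty measure. The mixture parameters are made up for the example, and the use of Shannon entropy as the uncertainty measure is an assumption; the summary does not say which measure the authors use.

```python
import math


def class_probabilities(x, weights, means, variances):
    """Posterior probability of each class for observation x (1-D mixture)."""
    log_joint = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi * v) - 0.5 * (x - m) ** 2 / v
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(log_joint)
    unnorm = [math.exp(lj - top) for lj in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def shannon_entropy(probs):
    """Assumed uncertainty measure: 0 for a certain class, log(K) for a tie."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical fitted parameters: two equally weighted classes.
weights, means, variances = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]

for x in (0.0, 2.0):
    probs = class_probabilities(x, weights, means, variances)
    hard_label = probs.index(max(probs))  # what a hard classifier would report
    print(x, probs, hard_label, shannon_entropy(probs))
```

An observation at a class centre gets a probability close to 1 and an entropy close to 0, whereas an observation midway between the two centres gets equal probabilities and the maximum entropy. A hard classifier would report a single label in both cases and lose that distinction.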