Paper ID: 2209.09030

SMIXS: Novel efficient algorithm for non-parametric mixture regression-based clustering

Peter Mlakar, Tapio Nummi, Polona Oblak, Jana Faganeli Pucer

We investigate a novel non-parametric regression-based clustering algorithm for longitudinal data analysis. By combining natural cubic splines with Gaussian mixture models (GMM), the algorithm can produce smooth cluster means that describe the underlying data well. However, the algorithm has two shortcomings: high computational complexity in the parameter estimation procedure and a numerically unstable variance estimator. To increase the usability of the method, we incorporate approaches that reduce its computational complexity, and we develop a new, more stable variance estimator and a new smoothing parameter estimation procedure. We show that the resulting algorithm, SMIXS, performs better than GMM on a synthetic dataset in terms of both clustering and regression performance. We also demonstrate the impact of the computational speed-ups, which we formally prove in the new framework. Finally, we present a case study in which SMIXS clusters vertical atmospheric measurements to determine different weather regimes.

Submitted: Sep 19, 2022
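
The general idea named in the abstract, a Gaussian mixture whose cluster means are smoothed curves, can be illustrated with a minimal sketch. This is not the authors' SMIXS implementation. It assumes equally spaced time points, one spherical variance per cluster, a fixed smoothing parameter `lam`, and a discrete second-difference roughness penalty standing in for the natural cubic spline penalty. The function names `second_difference_penalty` and `fit_smooth_mixture`, and the synthetic data, are hypothetical and chosen only for illustration.

```python
import numpy as np


def second_difference_penalty(p):
    """Roughness matrix K = D^T D, where D takes second differences.

    Assumption: a discrete stand-in for a natural cubic spline penalty.
    """
    D = np.diff(np.eye(p), n=2, axis=0)  # shape (p - 2, p)
    return D.T @ D


def fit_smooth_mixture(Y, n_clusters, lam=10.0, n_iter=30, seed=0):
    """EM for a Gaussian mixture with roughness-penalized mean curves.

    Y has shape (n_subjects, n_time_points). lam is held fixed here;
    it is not estimated from the data.
    """
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    K = second_difference_penalty(p)
    identity = np.eye(p)

    # Farthest-point initialisation of the cluster mean curves.
    centers = [Y[rng.integers(n)]]
    for _ in range(1, n_clusters):
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    mu = np.array(centers, dtype=float)
    var = np.full(n_clusters, Y.var())
    pi = np.full(n_clusters, 1.0 / n_clusters)

    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        sq = ((Y[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        log_r = np.log(pi) - 0.5 * p * np.log(2 * np.pi * var) - 0.5 * sq / var
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: mixing weights, penalized means, plug-in variances.
        nk = r.sum(axis=0) + 1e-12
        pi = nk / n
        for k in range(n_clusters):
            ybar = r[:, k] @ Y / nk[k]  # weighted mean curve
            # Maximise sum_i r_ik * (-||y_i - mu||^2 / (2 var)) - (lam / 2) mu^T K mu,
            # which gives (I + (lam * var / nk) K) mu = ybar.
            mu[k] = np.linalg.solve(identity + (lam * var[k] / nk[k]) * K, ybar)
            resid = ((Y - mu[k]) ** 2).sum(axis=1)
            var[k] = max(r[:, k] @ resid / (nk[k] * p), 1e-8)
    return pi, mu, var, r


# Synthetic example: two groups of noisy curves on a common time grid.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 30)
group_a = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal((40, 30))
group_b = 3.0 + np.cos(2 * np.pi * t) + 0.3 * rng.standard_normal((40, 30))
Y = np.vstack([group_a, group_b])

pi, mu, var, r = fit_smooth_mixture(Y, n_clusters=2)
labels = r.argmax(axis=1)  # hard cluster assignment per curve
```

Each row of `mu` is a smoothed cluster mean curve, and larger values of `lam` pull the curves toward straight lines. The variance update here is the ordinary weighted plug-in estimate, not the more stable estimator the abstract refers to.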