Paper ID: 2404.17940

CBMAP: Clustering-based manifold approximation and projection for dimensionality reduction

Berat Dogan

Dimensionality reduction methods are employed to decrease data dimensionality, either to enhance machine learning performance or to facilitate data visualization in two or three-dimensional spaces. These methods typically fall into two categories: feature selection and feature transformation. Feature selection retains significant features, while feature transformation projects data into a lower-dimensional space, with linear and nonlinear methods. While nonlinear methods excel in preserving local structures and capturing nonlinear relationships, they may struggle with interpreting global structures and can be computationally intensive. Recent algorithms, such as the t-SNE, UMAP, TriMap, and PaCMAP prioritize preserving local structures, often at the expense of accurately representing global structures, leading to clusters being spread out more in lower-dimensional spaces. Moreover, these methods heavily rely on hyperparameters, making their results sensitive to parameter settings. To address these limitations, this study introduces a clustering-based approach, namely CBMAP (Clustering-Based Manifold Approximation and Projection), for dimensionality reduction. CBMAP aims to preserve both global and local structures, ensuring that clusters in lower-dimensional spaces closely resemble those in high-dimensional spaces. Experimental evaluations on benchmark datasets demonstrate CBMAP's efficacy, offering speed, scalability, and minimal reliance on hyperparameters. Importantly, CBMAP enables low-dimensional projection of test data, addressing a critical need in machine learning applications. CBMAP is made freely available at this https URL and can be installed from the Python Package Directory (PyPI) software repository with the command pip install cbmap.

Submitted: Apr 27, 2024