Paper ID: 2407.15893

Cascaded two-stage feature clustering and selection via separability and consistency in fuzzy decision systems

Yuepeng Chen, Weiping Ding, Hengrong Ju, Jiashuang Huang, Tao Yin

Feature selection is a vital technique in machine learning, as it can reduce computational complexity, improve model performance, and mitigate the risk of overfitting. However, the increasing complexity and dimensionality of datasets pose significant challenges in the selection of features. Focusing on these challenges, this paper proposes a cascaded two-stage feature clustering and selection algorithm for fuzzy decision systems. In the first stage, we reduce the search space by clustering relevant features and addressing inter-feature redundancy. In the second stage, a clustering-based sequentially forward selection method that explores the global and local structure of data is presented. We propose a novel metric for assessing the significance of features, which considers both global separability and local consistency. Global separability measures the degree of intra-class cohesion and inter-class separation based on fuzzy membership, providing a comprehensive understanding of data separability. Meanwhile, local consistency leverages the fuzzy neighborhood rough set model to capture uncertainty and fuzziness in the data. The effectiveness of our proposed algorithm is evaluated through experiments conducted on 18 public datasets and a real-world schizophrenia dataset. The experiment results demonstrate our algorithm's superiority over benchmarking algorithms in both classification accuracy and the number of selected features.

Submitted: Jul 22, 2024