Paper ID: 2302.08623

A Hybrid Chimp Optimization Algorithm and Generalized Normal Distribution Algorithm with Opposition-Based Learning Strategy for Solving Data Clustering Problems

Sayed Pedram Haeri Boroujeni, Elnaz Pashaei

This paper is concerned with data clustering to separate clusters based on the connectivity principle for categorizing similar and dissimilar data into different groups. Although classical clustering algorithms such as K-means are efficient techniques, they often trap in local optima and have a slow convergence rate in solving high-dimensional problems. To address these issues, many successful meta-heuristic optimization algorithms and intelligence-based methods have been introduced to attain the optimal solution in a reasonable time. They are designed to escape from a local optimum problem by allowing flexible movements or random behaviors. In this study, we attempt to conceptualize a powerful approach using the three main components: Chimp Optimization Algorithm (ChOA), Generalized Normal Distribution Algorithm (GNDA), and Opposition-Based Learning (OBL) method. Firstly, two versions of ChOA with two different independent groups' strategies and seven chaotic maps, entitled ChOA(I) and ChOA(II), are presented to achieve the best possible result for data clustering purposes. Secondly, a novel combination of ChOA and GNDA algorithms with the OBL strategy is devised to solve the major shortcomings of the original algorithms. Lastly, the proposed ChOAGNDA method is a Selective Opposition (SO) algorithm based on ChOA and GNDA, which can be used to tackle large and complex real-world optimization problems, particularly data clustering applications. The results are evaluated against seven popular meta-heuristic optimization algorithms and eight recent state-of-the-art clustering techniques. Experimental results illustrate that the proposed work significantly outperforms other existing methods in terms of the achievement in minimizing the Sum of Intra-Cluster Distances (SICD), obtaining the lowest Error Rate (ER), accelerating the convergence speed, and finding the optimal cluster centers.

Submitted: Feb 16, 2023