Paper ID: 2408.03526
Minimum Enclosing Ball Synthetic Minority Oversampling Technique from a Geometric Perspective
Yi-Yang Shangguan, Shi-Shun Chen, Xiao-Yang Li
Class imbalance refers to the significant difference in the number of samples from different classes within a dataset, making it challenging to identify minority class samples correctly. This issue is prevalent in real-world classification tasks, such as software defect prediction, medical diagnosis, and fraud detection. The synthetic minority oversampling technique (SMOTE) is widely used to address class imbalance issue, which is based on interpolation between randomly selected minority class samples and their neighbors. However, traditional SMOTE and most of its variants only interpolate between existing samples, which may be affected by noise samples in some cases and synthesize samples that lack diversity. To overcome these shortcomings, this paper proposes the Minimum Enclosing Ball SMOTE (MEB-SMOTE) method from a geometry perspective. Specifically, MEB is innovatively introduced into the oversampling method to construct a representative point. Then, high-quality samples are synthesized by interpolation between this representative point and the existing samples. The rationale behind constructing a representative point is discussed, demonstrating that the center of MEB is more suitable as the representative point. To exhibit the superiority of MEB-SMOTE, experiments are conducted on 15 real-world imbalanced datasets. The results indicate that MEB-SMOTE can effectively improve the classification performance on imbalanced datasets.
Submitted: Aug 7, 2024