Imbalanced Datasets

Imbalanced datasets, in which one class greatly outnumbers the others, are a major challenge in machine learning because models trained on them tend to favor the majority class and predict minority classes poorly. Current research addresses the imbalance with techniques such as resampling, loss function modification, and data augmentation, applied to models including deep neural networks, gradient boosting machines, and generative models (such as GANs and diffusion models). These advances matter most in applications where identifying rare events is critical, from medical diagnosis and fraud detection to industrial anomaly detection and environmental monitoring. The field is exploring both classical and novel approaches to improve model performance and fairness under skewed data distributions.
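
Two of the techniques named above, resampling and loss function modification, can be illustrated with a minimal sketch. It is not taken from any particular paper: the function names `inverse_frequency_weights` and `random_oversample` are hypothetical, the 95/5 toy dataset is made up for illustration, and the weighting formula is the common inverse-frequency heuristic (the same form as scikit-learn's "balanced" class weights). Only the Python standard library is assumed.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class loss weights: n_samples / (n_classes * class_count).

    Rare classes get larger weights, so their errors count more in a
    weighted loss. (Hypothetical helper, for illustration only.)
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size.

    (Hypothetical helper; real pipelines should resample the training split
    only, never the validation or test data.)
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)

    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for cls, rows in by_class.items():
        # Sample with replacement to fill the gap to the largest class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(cls)
    return out_x, out_y


# Toy data (made up): 95 majority samples, 5 minority samples.
labels = [0] * 95 + [1] * 5
features = [[float(i)] for i in range(100)]

weights = inverse_frequency_weights(labels)
bal_x, bal_y = random_oversample(features, labels)

print(weights)         # minority class 1 receives the larger weight
print(Counter(bal_y))  # both classes now have the same count
```

The two approaches are alternatives more often than complements: reweighting leaves the data untouched and changes only how errors are penalized, while oversampling changes the training distribution itself and can encourage overfitting to the duplicated minority rows.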

Papers