Imbalanced Datasets
Imbalanced datasets, where one class significantly outnumbers the others, pose a major challenge in machine learning because models trained on them tend to predict minority classes poorly. Current research addresses the imbalance through techniques such as resampling, loss function modification, and data augmentation, applied to model families including deep neural networks, gradient boosting machines, and generative models (including GANs and diffusion models). These advances matter wherever rare events must be identified accurately, from medical diagnosis and fraud detection to industrial anomaly detection and environmental monitoring. The field is exploring both classical and novel approaches to improve model performance and fairness under skewed data distributions.
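As a rough illustration of two of the techniques named above, the sketch below shows inverse-frequency class weights (one common form of loss function modification) and random oversampling (one simple form of resampling). It is a minimal example in plain Python, not the method of any paper listed here; the function names `class_weights` and `random_oversample`, the weighting formula, and the 95/5 toy dataset are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumed formula for illustration; rarer classes get larger weights,
    which can then scale each sample's contribution to the loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def random_oversample(features, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in by_class.items():
        # Draw with replacement from the class's own rows to fill the gap.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical toy data: 95 majority samples (class 0), 5 minority samples (class 1).
labels = [0] * 95 + [1] * 5
features = [[float(i)] for i in range(100)]

weights = class_weights(labels)
bal_x, bal_y = random_oversample(features, labels)
```

With this toy data the minority class receives a weight roughly 19 times that of the majority class, and the oversampled set contains 95 rows per class. Oversampling should normally be applied to the training split only, so that duplicated rows do not leak into evaluation data.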
Papers
Supervised Contrastive Learning with Tree-Structured Parzen Estimator Bayesian Optimization for Imbalanced Tabular Data
Shuting Tao, Peng Peng, Qi Li, Hongwei Wang
Active Learning for Imbalanced Civil Infrastructure Data
Thomas Frick, Diego Antognini, Mattia Rigotti, Ioana Giurgiu, Benjamin Grewe, Cristiano Malossi