Imbalanced Dataset
Imbalanced datasets, in which one class significantly outnumbers the others, pose a major challenge in machine learning. A model trained to minimize average error can score well by favoring the majority class while performing poorly on the rare classes, which are often the ones that matter most. Current research focuses on techniques that mitigate this imbalance, such as oversampling minority classes, adjusting loss functions (e.g., weighted losses or novel loss functions such as IWL), and incorporating data augmentation or semi-supervised learning. These advances matter across diverse fields, from healthcare (e.g., heart disease and glaucoma diagnosis) to cybersecurity (malware detection) and safety analytics (accident prediction), where they improve the reliability of machine learning models in real-world applications. The ultimate goal is robust models that classify all classes accurately, even when data for some classes is scarce.
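Two of the remedies named above can be illustrated with a minimal sketch, assuming a toy dataset stored as plain Python lists. The first is random oversampling, which duplicates minority-class examples (sampling with replacement) until every class matches the largest one. The second is a class-weighted cross-entropy whose weights are inversely proportional to class frequency. This is a generic illustration, not the method of any paper listed on this page, and it does not implement IWL or any other specific published loss. The function names are hypothetical.

```python
import math
import random
from collections import Counter


# Hypothetical helper: random oversampling with replacement.
def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class examples until every class is as large as the biggest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw extra copies (with replacement) from this class's original examples only.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical helper: weight_c = n_samples / (n_classes * count_c), so rarer classes weigh more.
def inverse_frequency_weights(labels):
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


# Hypothetical helper: weighted mean of -log p(true class), normalized by the total weight.
def weighted_cross_entropy(probs, labels, weights):
    total = sum(weights[y] for y in labels)
    loss = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return loss / total


# Usage on a toy 8:2 imbalanced dataset.
labels = [0] * 8 + [1] * 2
samples = list(range(10))
xs, ys = random_oversample(samples, labels)
print(Counter(ys))                         # both classes now have 8 examples
print(inverse_frequency_weights(labels))   # {0: 0.625, 1: 2.5}
```

The weighting formula is the same one scikit-learn uses for `class_weight="balanced"`. Oversampling is normally applied only to the training split, so that duplicated examples do not leak into evaluation data.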
Papers
Learning with Noisy Labels over Imbalanced Subpopulations
MingCai Chen, Yu Zhao, Bing He, Zongbo Han, Bingzhe Wu, Jianhua Yao
PU GNN: Chargeback Fraud Detection in P2E MMORPGs via Graph Attention Networks with Imbalanced PU Labels
Jiho Choi, Junghoon Park, Woocheol Kim, Jin-Hyeok Park, Yumin Suh, Minchang Sung