Imbalanced Dataset

Imbalanced datasets, in which one class greatly outnumbers the others, are a persistent challenge in machine learning: a model trained on such data tends to favor the majority class and to predict minority classes poorly. Current research focuses on mitigating this imbalance by oversampling minority classes, adjusting loss functions (for example, weighted losses or newer losses such as IWL), and incorporating data augmentation or semi-supervised learning. These techniques matter in fields ranging from healthcare (e.g., heart disease and glaucoma diagnosis) to cybersecurity (malware detection) and safety analytics (accident prediction), where the rare class is often the one of greatest interest. The goal is robust models that classify every class accurately, even when data for some classes is scarce.
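Two of the techniques named above, minority-class oversampling and weighted losses, can be illustrated with a minimal NumPy sketch. It is not taken from any particular paper: the function names are hypothetical, the weights use the common inverse-frequency heuristic `n_samples / (n_classes * class_count)` as an assumption, and the oversampler simply duplicates existing minority samples at random rather than synthesizing new ones.

```python
import numpy as np


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_per_class).

    Assumed heuristic for illustration; rarer classes get larger weights.
    """
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


def random_oversample(features, labels, seed=0):
    """Resample each class with replacement up to the majority-class count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    indices = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(labels == cls)
        # Draw the missing samples from the class's existing rows.
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        indices.append(np.concatenate([cls_idx, extra]))
    order = np.concatenate(indices)
    return features[order], labels[order]


def weighted_cross_entropy(probs, labels, class_weights):
    """Cross-entropy where each sample is scaled by the weight of its true class.

    `probs` has shape (n_samples, n_classes); `labels` holds integer class ids.
    The result is normalized by the sum of the weights (a design choice here).
    """
    w = np.array([class_weights[int(y)] for y in labels])
    picked = probs[np.arange(len(labels)), labels]
    return float(np.sum(-w * np.log(picked)) / np.sum(w))
```

With 90 majority and 10 minority labels, `balanced_class_weights` assigns the minority class a weight of 5.0 and the majority class roughly 0.56, and `random_oversample` returns 90 samples of each class. In practice only one of the two remedies is usually applied at a time, since combining them compensates for the imbalance twice.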

Papers