Data Imbalance
Data imbalance, where some classes in a dataset are significantly under-represented compared to others, is a major challenge for machine learning models: it biases predictions toward majority classes and degrades performance on minority classes. Current research mitigates imbalance through data augmentation (e.g., synthetic oversampling using LLMs), algorithmic modifications (e.g., cost-sensitive learning and novel loss functions such as LDAM and IWL), and ensemble methods, often applied to models such as XGBoost, graph neural networks, and deep neural networks. Addressing data imbalance is crucial for improving the fairness, reliability, and generalizability of machine learning models across diverse applications, from medical diagnosis and fraud detection to environmental monitoring and materials science.
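As a rough illustration of the cost-sensitive idea mentioned above, the following minimal sketch computes inverse-frequency class weights and uses them to score a trivial classifier. It is not taken from any of the listed papers; the function names `balanced_class_weights` and `weighted_error` and the toy 90/10 label split are assumptions made for this example. The weighting formula, n_samples / (n_classes * class_count), is the common "balanced" heuristic.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weight each class by n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_error(y_true, y_pred, weights):
    """Hypothetical helper: misclassification rate where each sample
    counts in proportion to the weight of its true class."""
    total = sum(weights[t] for t in y_true)
    wrong = sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Toy data (assumed for illustration): 90 majority and 10 minority samples.
labels = ["neg"] * 90 + ["pos"] * 10
weights = balanced_class_weights(labels)

# A classifier that always predicts the majority class.
always_neg = ["neg"] * len(labels)

plain_err = sum(t != p for t, p in zip(labels, always_neg)) / len(labels)
err = weighted_error(labels, always_neg, weights)

print(f"weights: {weights}")
print(f"unweighted error: {plain_err:.2f}, weighted error: {err:.2f}")
```

With these toy labels, the always-majority classifier looks acceptable under the unweighted error rate but considerably worse once minority-class mistakes are up-weighted, which is the bias that cost-sensitive methods are meant to expose and penalize.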
Papers
Equitable Length of Stay Prediction for Patients with Learning Disabilities and Multiple Long-term Conditions Using Machine Learning
Emeka Abakasanga, Rania Kousovista, Georgina Cosma, Ashley Akbari, Francesco Zaccardi, Navjot Kaur, Danielle Fitt, Gyuchan Thomas Jun, Reza Kiani, Satheesh Gangadharan
Capsule Vision Challenge 2024: Multi-Class Abnormality Classification for Video Capsule Endoscopy
Aakarsh Bansal, Bhuvanesh Singla, Raajan Rajesh Wankhade, Nagamma Patil