Data Shift
Data shift, the discrepancy between the data a model was trained on and the data it encounters in the real world, significantly hinders the reliability and performance of machine learning models. Current research focuses on detecting and mitigating these shifts through adversarial learning, meta-analysis of existing detection scores, and robust training methods that incorporate prior knowledge or data augmentation techniques such as unproportional mosaicing. Addressing data shift is crucial for improving the generalizability and trustworthiness of machine learning models across diverse applications, particularly in high-stakes domains like healthcare and autonomous systems, where model failures can have serious consequences.
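As a rough illustration of what "detecting a shift" can mean in practice, the sketch below compares one numeric feature from the training data against the same feature observed in deployment, using a two-sample Kolmogorov-Smirnov test. This is a generic baseline chosen for illustration, not a method taken from the papers listed here. The function names `ks_statistic` and `shift_detected`, the `alpha` default, and the toy data are all assumptions made for this example.

```python
import bisect
import math


def ks_statistic(train, live):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(train), sorted(live)
    gap = 0.0
    # The largest gap is always reached at one of the observed values.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def shift_detected(train, live, alpha=0.05):
    """Flag a shift when the statistic exceeds the large-sample critical value."""
    n, m = len(train), len(live)
    # Asymptotic threshold: c(alpha) * sqrt((n + m) / (n * m)),
    # with c(alpha) = sqrt(-ln(alpha / 2) / 2). Assumes reasonably large samples.
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(train, live) > threshold


# Toy data (hypothetical): the deployment feature is the training feature moved by 0.5.
train = [i / 100 for i in range(100)]
live = [x + 0.5 for x in train]
print(shift_detected(train, live))
```

A per-feature test like this only sees changes in a single marginal distribution. Shifts in the relationship between features, or between features and labels, need other tools, which is part of what motivates the research directions summarized above.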
Papers
Shifts 2.0: Extending The Dataset of Real Distributional Shifts
Andrey Malinin, Andreas Athanasopoulos, Muhamed Barakovic, Meritxell Bach Cuadra, Mark J. F. Gales, Cristina Granziera, Mara Graziani, Nikolay Kartashev, Konstantinos Kyriakopoulos, Po-Jui Lu, Nataliia Molchanova, Antonis Nikitakis, Vatsal Raina, Francesco La Rosa, Eli Sivena, Vasileios Tsarsitalidis, Efi Tsompopoulou, Elena Volf
GSCLIP : A Framework for Explaining Distribution Shifts in Natural Language
Zhiying Zhu, Weixin Liang, James Zou