Self-Training
Self-training is a semi-supervised machine learning technique that leverages unlabeled data to improve model performance by iteratively training on pseudo-labels generated by the model itself. Current research focuses on enhancing self-training's robustness and efficiency through techniques like contrastive learning, preference optimization, and uncertainty estimation, often integrated with various model architectures including deep neural networks, transformers, and generative models. This approach is proving valuable across diverse applications, from improving fairness in machine learning to enabling more sample-efficient training in areas like 3D object detection, natural language processing, and biosignal-based robotics control. The ultimate goal is to reduce reliance on expensive and time-consuming data annotation while improving model accuracy and generalization.
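The core loop described above — train on labeled data, pseudo-label the unlabeled pool, keep only confident predictions, and retrain — can be sketched in a few lines. The example below is a minimal illustration, not any paper's method: it uses hypothetical 1-D data and a simple nearest-centroid classifier, with a margin-based confidence score standing in for the uncertainty estimates discussed above. All function and parameter names are illustrative.

```python
def fit_centroids(xs, ys):
    # "Train" the toy model: one centroid (mean) per class.
    cents = {}
    for label in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == label]
        cents[label] = sum(pts) / len(pts)
    return cents

def predict_with_confidence(cents, x):
    # Predict the nearest centroid's class; confidence is the relative
    # margin between the two nearest centroids, in (0, 1).
    dists = sorted((abs(x - c), label) for label, c in cents.items())
    (d1, label), (d2, _) = dists[0], dists[1]
    conf = (d2 - d1) / (d2 + d1 + 1e-12)
    return label, conf

def self_train(lab_x, lab_y, unlab_x, threshold=0.5, max_rounds=10):
    # Iteratively move confidently pseudo-labeled points into the
    # labeled set and retrain; stop when nothing new is added.
    lab_x, lab_y, unlab_x = list(lab_x), list(lab_y), list(unlab_x)
    for _ in range(max_rounds):
        cents = fit_centroids(lab_x, lab_y)
        remaining, added = [], False
        for x in unlab_x:
            label, conf = predict_with_confidence(cents, x)
            if conf >= threshold:
                lab_x.append(x)
                lab_y.append(label)
                added = True
            else:
                remaining.append(x)
        unlab_x = remaining
        if not added:
            break
    return fit_centroids(lab_x, lab_y)
```

For example, starting from labeled points `[0.0, 1.0]` (class 0) and `[9.0, 10.0]` (class 1), the unlabeled points `2.0` and `8.0` are confidently pseudo-labeled in the first round and fold into the final centroids. The confidence threshold is the key knob: set it too low and noisy pseudo-labels contaminate the labeled set, which is exactly the failure mode the filtering and calibration work below targets.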
Papers
On Unsupervised Uncertainty-Driven Speech Pseudo-Label Filtering and Model Calibration
Nauman Dawalatabad, Sameer Khurana, Antoine Laurent, James Glass
Self-training of Machine Learning Models for Liver Histopathology: Generalization under Clinical Shifts
Jin Li, Deepta Rajan, Chintan Shah, Dinkar Juyal, Shreya Chakraborty, Chandan Akiti, Filip Kos, Janani Iyer, Anand Sampat, Ali Behrooz