Start Active Learning

Start active learning, more commonly called cold-start active learning, addresses the problem of training machine learning models when no labeled data is initially available. Current research focuses on strategies for selecting the initial subset of data to annotate, often using clustering techniques enhanced by foundation models, or proxy tasks and size-balanced sampling to mitigate class imbalance. These methods aim to reduce the annotation effort needed to train high-performing models, which matters in fields such as medical image analysis, natural language processing, and recommendation systems, where labeled data is limited.
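As an illustration of the clustering-based selection described above, here is a minimal sketch in pure Python. It assumes that each unlabeled sample has already been turned into an embedding vector by some pretrained model (that step is not shown), clusters the embeddings with a plain k-means, and picks the sample nearest each centroid as the first batch to annotate. The function names `kmeans_centroids` and `select_initial_batch` are hypothetical, and the farthest-first initialization is a choice made for this sketch, not a method taken from any specific paper.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_centroids(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means with farthest-first initialization.

    Returns a list of k centroids. Illustrative only; a library
    implementation would normally be preferred.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Farthest-first initialization: start from a random point, then keep
    # adding the point that is farthest from all centroids chosen so far.
    centroids = [list(points[rng.randrange(len(points))])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def select_initial_batch(embeddings, budget, seed=0):
    """Return indices of `budget` distinct samples to label first.

    One sample is taken per cluster (the one nearest the centroid), so the
    initial batch is spread across the embedding space instead of being
    drawn mostly from the densest region.
    """
    centroids = kmeans_centroids(embeddings, budget, seed=seed)
    chosen = []
    for centroid in centroids:
        candidates = [i for i in range(len(embeddings)) if i not in chosen]
        best = min(
            candidates,
            key=lambda i: squared_distance(embeddings[i], centroid),
        )
        chosen.append(best)
    return chosen
```

Calling `select_initial_batch(embeddings, budget=3)` on a list of embedding vectors returns three distinct indices, one per cluster. Taking the same number of samples from every cluster is a crude form of balancing: small clusters, which may correspond to rare classes, are represented even though no labels are available yet.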

Papers