Scalar Feedback
Scalar feedback, in which annotators assign numerical ratings rather than binary correct/incorrect judgments, is gaining traction as a training signal for machine learning models, particularly where high-quality human-labeled data is scarce. Current research focuses on algorithms that make robust use of this potentially noisy signal, including self-training methods and distribution-based rescaling techniques that mitigate inconsistencies across raters. By reducing reliance on extensive human annotation, this approach holds promise for applications ranging from training language models on complex reasoning tasks to interactive reinforcement learning for robotics.
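One way to picture distribution-based rescaling is per-rater standardization: each rater's scalar scores are normalized to zero mean and unit variance before aggregation, so that a lenient and a strict rater contribute comparable signals. The sketch below is illustrative only; the function names, data layout, and the choice of z-score normalization are assumptions, not a specific method from the literature.

```python
import statistics

def rescale_ratings(ratings_by_rater):
    """Standardize each rater's scalar ratings to zero mean, unit variance.

    ratings_by_rater: dict mapping rater id -> list of (item_id, score).
    Returns a dict mapping item_id -> list of rescaled scores.
    """
    rescaled = {}
    for rater, scored in ratings_by_rater.items():
        scores = [s for _, s in scored]
        mean = statistics.fmean(scores)
        # Guard against a rater who gave the same score everywhere.
        std = statistics.pstdev(scores) or 1.0
        for item_id, s in scored:
            rescaled.setdefault(item_id, []).append((s - mean) / std)
    return rescaled

def aggregate(rescaled):
    """Average the rescaled scores per item into one training signal."""
    return {item: statistics.fmean(vals) for item, vals in rescaled.items()}

# Two raters with different scales nonetheless agree after rescaling:
ratings = {"A": [("x", 1), ("y", 5)], "B": [("x", 2), ("y", 4)]}
signal = aggregate(rescale_ratings(ratings))  # {"x": -1.0, "y": 1.0}
```

The aggregated per-item scores could then serve as reward or pseudo-label targets in a downstream training loop.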
Papers
December 11, 2023