Confidence Score
Confidence scores, which represent a model's certainty in its predictions, are crucial for building trustworthy AI systems, particularly in high-stakes applications such as healthcare and autonomous driving. Current research focuses on improving the calibration and reliability of these scores across diverse model architectures (including LLMs, transformers, and conformers) and tasks, often employing techniques such as self-consistency, multicalibration, and novel scoring functions tailored to specific data characteristics (e.g., ordinal data or long-form text). Accurate confidence estimation is vital for improving model performance, enabling selective classification (rejecting low-confidence predictions), and supporting human-in-the-loop systems where trust and transparency are paramount.
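To make two of the recurring ideas above concrete, the sketch below shows one common way to measure calibration (expected calibration error over binned confidence scores) and a simple selective-classification rule that abstains below a confidence threshold. This is a minimal illustration, not code from any of the listed papers; the function names, the 0.8 threshold, and the toy confidence values are assumptions chosen for the example.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected Calibration Error (ECE): the weighted average gap between
    mean confidence and accuracy within equal-width confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight the gap by the bin's share of samples
    return ece

def selective_classify(confidences, predictions, threshold=0.8):
    """Selective classification: answer only when confidence meets the
    threshold; otherwise abstain (represented here by None)."""
    return [pred if conf >= threshold else None
            for conf, pred in zip(confidences, predictions)]

# Toy example with made-up confidence scores, predictions, and labels.
confs = [0.95, 0.60, 0.85, 0.40, 0.99]
preds = ["A", "B", "C", "D", "A"]
labels = ["A", "C", "C", "D", "A"]
correct = [p == y for p, y in zip(preds, labels)]

print("ECE:", round(expected_calibration_error(confs, correct, n_bins=5), 3))
print("Selective predictions:", selective_classify(confs, preds, threshold=0.8))
```

Raising the abstention threshold generally trades coverage (how many inputs get an answer) for accuracy on the answered subset, which is why calibrated scores matter: the threshold is only meaningful if the reported confidence tracks the true probability of being correct.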
Papers
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning
Selectively Answering Ambiguous Questions
Jeremy R. Cole, Michael J. Q. Zhang, Daniel Gillick, Julian Martin Eisenschlos, Bhuwan Dhingra, Jacob Eisenstein