Model Trustworthiness

Model trustworthiness encompasses reliability, accuracy, and fairness, and is a critical research area that aims to ensure AI systems make dependable predictions and decisions. Current efforts focus on improving model calibration (aligning a model's stated confidence with its actual accuracy), mitigating biases and adversarial vulnerabilities, and enhancing explainability through techniques such as knowledge transfer between large language models and concept bottleneck models. These advances are crucial for building trust in AI across diverse applications, from healthcare and finance to environmental modeling and security, and ultimately support responsible and beneficial AI deployment.
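
To make the calibration point concrete, a common way to quantify the gap between confidence and accuracy is the Expected Calibration Error (ECE). The sketch below is a minimal, illustrative implementation assuming NumPy arrays of per-prediction confidences and correctness indicators; the function name and the toy data are hypothetical, not taken from any specific paper listed here.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average gap between per-bin confidence and per-bin accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            bin_acc = correct[mask].mean()        # observed accuracy in this bin
            bin_conf = confidences[mask].mean()   # average stated confidence in this bin
            ece += mask.mean() * abs(bin_acc - bin_conf)
    return ece

# Toy example: a model that is overconfident on average yields a nonzero ECE.
confs = np.array([0.95, 0.90, 0.85, 0.80, 0.60])
hits = np.array([1, 0, 1, 0, 1])
print(f"ECE = {expected_calibration_error(confs, hits):.3f}")
```

A well-calibrated model drives this value toward zero; post-hoc methods such as temperature scaling are often used to reduce it without changing the model's predicted labels.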

Papers