Answer Correctness
Answer correctness in large language models (LLMs) and vision-language models (VLMs) concerns the reliability and trustworthiness of AI-generated responses. Current work focuses on methods for assessing answer reliability, such as checking consistency across multiple model outputs or decomposing complex questions into simpler sub-questions whose answers can be verified individually. These techniques aim to mitigate hallucination and overconfidence, yielding more accurate and dependable systems across applications. Better evaluation of answer correctness is essential both for advancing the field and for deploying these models responsibly.
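As a rough illustration of the consistency-based idea mentioned above, the sketch below samples a model several times on the same question and treats agreement among the samples as a proxy for reliability. It is a minimal example, not the method of any listed paper; the sample_answer callable, the sample count, and the review threshold are hypothetical stand-ins for whatever model wrapper and policy a real system would use.

```python
from collections import Counter
from typing import Callable, Tuple


def normalize(answer: str) -> str:
    """Lightly normalize an answer string so trivially different phrasings compare equal."""
    return " ".join(answer.lower().strip().split())


def consistency_score(
    sample_answer: Callable[[str], str], question: str, n_samples: int = 10
) -> Tuple[str, float]:
    """Estimate answer reliability via agreement across repeated samples.

    sample_answer: any callable returning one model answer for the question
    (e.g. an LLM API wrapper called with temperature > 0, assumed here).
    Returns the majority answer and the fraction of samples agreeing with it,
    which can serve as a rough confidence signal for answer correctness.
    """
    answers = [normalize(sample_answer(question)) for _ in range(n_samples)]
    majority, count = Counter(answers).most_common(1)[0]
    return majority, count / n_samples


# Hypothetical usage with a user-supplied model wrapper:
# answer, confidence = consistency_score(my_llm_answer, "In what year was the treaty signed?")
# if confidence < 0.6:
#     # low agreement suggests the answer may be unreliable
#     flag_for_review(answer)
```

In practice such an agreement score is only a heuristic: a model can be consistently wrong, so consistency checks are usually combined with other signals (e.g. sub-question decomposition or external verification) rather than used alone.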
Papers
Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs
Minh-Vuong Nguyen, Linhao Luo, Fatemeh Shiri, Dinh Phung, Yuan-Fang Li, Thuy-Trang Vu, Gholamreza Haffari
PEDANTS: Cheap but Effective and Interpretable Answer Equivalence
Zongxia Li, Ishani Mondal, Yijun Liang, Huy Nghiem, Jordan Lee Boyd-Graber