Critique Ability
Research on critique ability in large language models (LLMs) evaluates their capacity to identify and correct errors in their own reasoning and generated outputs. Current work benchmarks this ability across diverse tasks, using metrics beyond simple accuracy to assess reasoning steps, constraint satisfaction, and the handling of complex instructions, often with techniques such as chain-of-thought prompting and self-critique mechanisms. This research is crucial for improving LLM reliability and trustworthiness, with applications ranging from automated reasoning and code generation to settings that require robust and explainable AI.
Papers
Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning
Kuofeng Gao, Huanqia Cai, Qingyao Shuai, Dihong Gong, Zhifeng Li
TMGBench: A Systematic Game Benchmark for Evaluating Strategic Reasoning Abilities of LLMs
Haochuan Wang, Xiachong Feng, Lei Li, Zhanyue Qin, Dianbo Sui, Lingpeng Kong
CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
Jihyung Kil, Zheda Mai, Justin Lee, Zihe Wang, Kerrie Cheng, Lemeng Wang, Ye Liu, Arpita Chowdhury, Wei-Lun Chao
APTNESS: Incorporating Appraisal Theory and Emotion Support Strategies for Empathetic Response Generation
Yuxuan Hu, Minghuan Tan, Chenwei Zhang, Zixuan Li, Xiaodan Liang, Min Yang, Chengming Li, Xiping Hu