Critique Ability
Critique ability in large language models (LLMs) refers to their capacity to identify and correct errors in their own reasoning and generated outputs. Current research emphasizes benchmarking this ability across diverse tasks, using metrics beyond simple accuracy to assess aspects such as reasoning steps, constraint satisfaction, and handling of complex instructions, often employing techniques like chain-of-thought prompting and self-critique mechanisms. This work is crucial for improving LLM reliability and trustworthiness, with impact ranging from automated reasoning and code generation to more nuanced applications that require robust and explainable AI.
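To make the self-critique idea concrete, here is a minimal sketch of a generate-critique-revise loop. The function `call_model` is a hypothetical placeholder for any LLM completion API (not part of any paper listed here); the prompt wording and round limit are illustrative assumptions, not a prescribed protocol.

```python
def call_model(prompt: str) -> str:
    """Hypothetical placeholder for an LLM completion call.

    Replace the body with a request to your model provider.
    """
    raise NotImplementedError("Wire this to your model provider.")


def solve_with_self_critique(question: str, max_rounds: int = 2) -> str:
    # Step 1: draft an answer with chain-of-thought prompting.
    answer = call_model(
        f"Question: {question}\n"
        "Think step by step, then state a final answer."
    )
    for _ in range(max_rounds):
        # Step 2: ask the model to critique its own reasoning.
        critique = call_model(
            f"Question: {question}\nProposed answer:\n{answer}\n"
            "List any reasoning errors or unmet constraints. "
            "If the answer is fully correct, reply only with 'NO ISSUES'."
        )
        if critique.strip().upper().startswith("NO ISSUES"):
            break
        # Step 3: revise the answer using the critique.
        answer = call_model(
            f"Question: {question}\nPrevious answer:\n{answer}\n"
            f"Critique:\n{critique}\n"
            "Rewrite the answer, fixing the issues above."
        )
    return answer
```

Benchmarks of critique ability typically vary where this loop can fail: whether the model detects a genuine error (the critique step) and whether it can act on that feedback without introducing new mistakes (the revision step).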
Papers
Dynamics of Instruction Tuning: Each Ability of Large Language Models Has Its Own Growth Pace
Chiyu Song, Zhanchao Zhou, Jianhao Yan, Yuejiao Fei, Zhenzhong Lan, Yue Zhang
M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models
Wai-Chung Kwan, Xingshan Zeng, Yufei Wang, Yusen Sun, Liangyou Li, Lifeng Shang, Qun Liu, Kam-Fai Wong
Beyond Text: A Deep Dive into Large Language Models' Ability on Understanding Graph Data
Yuntong Hu, Zheng Zhang, Liang Zhao
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks
Avinash Madasu, Anahita Bhiwandiwalla, Vasudev Lal
Critique Ability of Large Language Models
Liangchen Luo, Zi Lin, Yinxiao Liu, Lei Shu, Yun Zhu, Jingbo Shang, Lei Meng