Reasoning Capability
Reasoning capability in large language models (LLMs) is a central research area focused on improving their ability to solve complex problems that require multiple steps and logical inference. Current research investigates prompting and augmentation techniques, such as chain-of-thought prompting and retrieval-augmented generation (RAG), to improve reasoning performance across diverse tasks, including mathematical, logical, and commonsense reasoning, often evaluated on benchmarks like GSM8K and its variants. These efforts aim to characterize the limitations of current LLMs, which often rely on pattern matching rather than genuine logical deduction, and to develop more robust and reliable reasoning methods. The ultimate goal is to create LLMs capable of genuine reasoning, with impact on fields ranging from scientific discovery to personalized education and decision-support systems.
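To make the contrast concrete, here is a minimal sketch of chain-of-thought prompting on a GSM8K-style word problem, compared with a direct prompt. Everything in it is illustrative: query_llm is a hypothetical stand-in for whatever completion API you use, and the exemplar and problem text are made up, not drawn from GSM8K or any paper below.

# Minimal sketch of chain-of-thought (CoT) prompting on a GSM8K-style
# word problem. `query_llm` is a hypothetical placeholder for any
# text-completion API; the exemplar and problem are illustrative only.

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to an LLM completion endpoint."""
    raise NotImplementedError("Plug in your model or API client here.")

PROBLEM = (
    "A shop sells pens in packs of 12. If Maria buys 3 packs and gives "
    "away 9 pens, how many pens does she have left?"
)

# Direct prompting: ask for the answer alone.
direct_prompt = f"Q: {PROBLEM}\nA:"

# Chain-of-thought prompting: a worked exemplar elicits step-by-step
# reasoning before the final answer, which tends to help on multi-step
# arithmetic.
cot_prompt = (
    "Q: Tom has 4 boxes of 6 apples and eats 5. How many apples remain?\n"
    "A: Tom starts with 4 * 6 = 24 apples. After eating 5, he has "
    "24 - 5 = 19 apples. The answer is 19.\n\n"
    f"Q: {PROBLEM}\nA: Let's think step by step."
)

if __name__ == "__main__":
    # Inspect the prompts; swap in query_llm(cot_prompt) to run a model.
    print(direct_prompt)
    print()
    print(cot_prompt)

The design point the sketch illustrates: the two prompts pose the same question, and only the CoT variant adds a worked exemplar plus a cue to reason step by step, so any performance difference can be attributed to the elicited intermediate reasoning rather than to the task itself.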
Papers
A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners
Bowen Jiang, Yangxinyu Xie, Zhuoqun Hao, Xiaomeng Wang, Tanwi Mallick, Weijie J. Su, Camillo J. Taylor, Dan Roth
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning
Joykirat Singh, Akshay Nambi, Vibhav Vineet
Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities
Wenyue Hua, Kaijie Zhu, Lingyao Li, Lizhou Fan, Shuhang Lin, Mingyu Jin, Haochen Xue, Zelong Li, Jindong Wang, Yongfeng Zhang
mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models
Huiyuan Lai, Malvina Nissim
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Marianna Nezhurina, Lucia Cipolina-Kun, Mehdi Cherti, Jenia Jitsev
QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models
Wei Wang, Zhaowei Li, Qi Xu, Yiqing Cai, Hang Song, Qi Qi, Ran Zhou, Zhida Huang, Tao Wang, Li Xiao
Archimedes-AUEB at SemEval-2024 Task 5: LLM explains Civil Procedure
Odysseas S. Chlapanis, Ion Androutsopoulos, Dimitrios Galanis
General Purpose Verification for Chain of Thought Prompting
Robert Vacareanu, Anurag Pratik, Evangelia Spiliopoulou, Zheng Qi, Giovanni Paolini, Neha Anna John, Jie Ma, Yassine Benajiba, Miguel Ballesteros
Can Large Language Models put 2 and 2 together? Probing for Entailed Arithmetical Relationships
D. Panas, S. Seth, V. Belle
Can't say cant? Measuring and Reasoning of Dark Jargons in Large Language Models
Xu Ji, Jianyi Zhang, Ziyin Zhou, Zhangchi Zhao, Qianqian Qiao, Kaiying Han, Md Imran Hossen, Xiali Hei
Evaluating Consistency and Reasoning Capabilities of Large Language Models
Yash Saxena, Sarthak Chopra, Arunendra Mani Tripathi