Reasoning Performance
Reasoning performance in large language models (LLMs) is a central research area aimed at improving their ability to solve complex, multi-step problems. Current efforts focus on techniques such as chain-of-thought prompting, sampling diverse reasoning paths or perspectives, and using preference models and verifiers to refine those paths and filter out errors. These advances are crucial for building more reliable and robust AI systems, with implications for fields such as education, healthcare, and autonomous driving, where accurate and dependable reasoning is paramount.
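To make the verifier idea concrete, the sketch below shows a generic best-of-N loop: several chain-of-thought samples are drawn and a verifier score is used to keep the most promising one. It is only an illustration of the general pattern, not any specific paper's method; `generate_cot` and `verifier_score` are hypothetical stand-ins for an LLM sampling call and a learned (or heuristic) verifier.

```python
# Minimal sketch: verifier-filtered chain-of-thought sampling (best-of-N).
# `generate_cot` and `verifier_score` are placeholders, not a real API.
from typing import Callable, List, Tuple


def best_of_n_cot(
    question: str,
    generate_cot: Callable[[str], str],           # returns one reasoning path + answer
    verifier_score: Callable[[str, str], float],  # scores a (question, path) pair
    n_samples: int = 8,
) -> Tuple[str, float]:
    """Sample n reasoning paths and keep the one the verifier rates highest."""
    candidates: List[Tuple[str, float]] = []
    for _ in range(n_samples):
        path = generate_cot(question)            # e.g. "Let's think step by step..."
        score = verifier_score(question, path)   # higher score = more likely correct
        candidates.append((path, score))
    return max(candidates, key=lambda c: c[1])
```

In practice the verifier may be a reward/preference model or a simple answer-consistency check; the same selection loop applies either way.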
Papers
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Hritik Bansal, Arian Hosseini, Rishabh Agarwal, Vinh Q. Tran, Mehran Kazemi
Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic
Xin Zheng, Jie Lou, Boxi Cao, Xueru Wen, Yuqiu Ji, Hongyu Lin, Yaojie Lu, Xianpei Han, Debing Zhang, Le Sun
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Tian Ye, Zicheng Xu, Yuanzhi Li, Zeyuan Allen-Zhu