Reasoning Benchmark
Reasoning benchmarks are standardized tests designed to evaluate the logical reasoning capabilities of large language models (LLMs). Current research focuses on building more challenging benchmarks that go beyond simple question answering. These include tasks that require multi-step reasoning, tasks that involve long contexts, and tasks that cover diverse reasoning types (deductive, inductive, abductive, analogical). Researchers also use these benchmarks to evaluate techniques for improving LLM reasoning, such as chain-of-thought prompting, in-context learning, and model architectures that incorporate generator-discriminator networks or hybrid thinking frameworks. Robust and comprehensive reasoning benchmarks are crucial for advancing artificial intelligence because they provide objective measures of progress and identify areas that need further research.
Papers
RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios
Ruiwen Zhou, Wenyue Hua, Liangming Pan, Sitao Cheng, Xiaobao Wu, En Yu, William Yang Wang
Phi-4 Technical Report
Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harrison, Russell J. Hewett, Mojan Javaheripi, Piero Kauffmann, James R. Lee, Yin Tat Lee, Yuanzhi Li, Weishung Liu, Caio C. T. Mendes, Anh Nguyen, Eric Price, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Xin Wang, Rachel Ward, Yue Wu, Dingli Yu, Cyril Zhang, Yi Zhang
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J.H. Liu
LAR-ECHR: A New Legal Argument Reasoning Task and Dataset for Cases of the European Court of Human Rights
Odysseas S. Chlapanis, Dimitrios Galanis, Ion Androutsopoulos
Mars: Situated Inductive Reasoning in an Open-World Environment
Xiaojuan Tang, Jiaqi Li, Yitao Liang, Song-chun Zhu, Muhan Zhang, Zilong Zheng
Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning
Hyun Ryu, Gyeongman Kim, Hyemin S. Lee, Eunho Yang