Mathematical Reasoning
Mathematical reasoning in large language models (LLMs) is an active research area focused on evaluating and improving these models' ability to solve mathematical problems, covering both symbolic and numerical reasoning. Current research emphasizes more robust benchmarks that assess not only final-answer accuracy but also the reasoning process itself, including error detection and correction, and explores training methods such as reinforcement learning from human feedback and instruction tuning to improve performance. The field matters because stronger mathematical reasoning in LLMs has broad implications for applications such as education, scientific discovery, and automated problem-solving.
Papers
KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning
Wei Sun, Wen Yang, Pu Jian, Qianlong Du, Fuwei Cui, Shuo Ren, Jiajun Zhang
Chinese Academy of Sciences ● University of Chinese Academy of Sciences ● Wuhan AI Research
SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving
Yujie Hou, Ting Zhang, Mei Wang, Xuetao Ma, Hu Huang
Beijing Normal University
Can LLMs understand Math? -- Exploring the Pitfalls in Mathematical Reasoning
Tiasa Singha Roy, Aditeya Baral, Ayush Rajesh Jhaveri, Yusuf Baig
New York University
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems
Chengwei Wei, Bin Wang, Jung-jae Kim, Nancy F. Chen
A*STAR
Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision
Eric Hanchen Jiang, Haozheng Luo, Shengyuan Pang, Xiaomin Li, Zhenting Qi, Hengli Li, Cheng-Fu Yang, Zongyu Lin, Xinfeng Li, Hao Xu, +2
Los Angeles ● Northwestern University ● Zhejiang University ● Harvard University ● Peking University ● Nanyang Technological University
EasyMath: A 0-shot Math Benchmark for SLMs
Drishya Karki, Michiel Kamphuis, Angelecia Frey
Assistantslab
Beyond the First Error: Process Reward Models for Reflective Mathematical Reasoning
Zhaohui Yang, Chenghua He, Xiaowen Shi, Linjing Li, Qiyue Yin, Shihong Deng, Daxin Jiang
Institute of Automation ● StepFun ● Meituan
Let's Verify Math Questions Step by Step
Chengyu Shen, Zhen Hao Wong, Runming He, Hao Liang, Meiyi Qiang, Zimo Meng, Zhengyang Zhao, Bohan Zeng, Zhengzhou Zhu, Bin Cui, +1
Peking University
RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics
Jie Zhang, Cezara Petrui, Kristina Nikolić, Florian Tramèr
ETH Zurich
MARGE: Improving Math Reasoning for LLMs with Guided Exploration
Jingyue Gao, Runji Lin, Keming Lu, Bowen Yu, Junyang Lin, Jianyu Chen
Tsinghua University ● Alibaba Group ● Shanghai Qi Zhi Institute
NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning
Syeda Nahida Akter, Shrimai Prabhumoye, Matvei Novikov, Seungju Han, Ying Lin, Evelina Bakhturi, Eric Nyberg, Yejin Choi, Mostofa Patwary, +2
Assessment of Evolving Large Language Models in Upper Secondary Mathematics
Mika Setälä, Pieta Sikström, Ville Heilala, Tommi Kärkkäinen
University of Jyväskylä