MATH Dataset
MATH is a benchmark of competition-level mathematics problems used to evaluate the mathematical reasoning capabilities of large language models (LLMs). Current research focuses on improving LLM performance on MATH through three main approaches: instruction tuning on large, high-quality datasets (often generated by LLMs themselves), model merging strategies, and the integration of external tools such as code interpreters. The goal is to make LLMs better at solving complex mathematical problems, which would yield more capable AI assistants for problem-solving and knowledge discovery in fields such as education and scientific research.
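To make "evaluating on MATH" concrete, the following is a minimal sketch of exact-match scoring. It assumes the model's output and the reference solution each mark the final answer with `\boxed{...}`, as MATH reference solutions do. The helper names `extract_boxed` and `accuracy` are hypothetical, and published evaluations typically normalize answers further (for example, treating equivalent fractions as equal), which this sketch does not attempt.

```python
from typing import List, Optional


def extract_boxed(text: str) -> Optional[str]:
    # Return the contents of the last \boxed{...} in text, or None if absent.
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never balanced


def accuracy(predictions: List[str], references: List[str]) -> float:
    # Fraction of problems whose boxed answer matches the reference exactly.
    # Assumption: plain string equality, with no mathematical normalization.
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        answer = extract_boxed(pred)
        if answer is not None and answer == extract_boxed(ref):
            correct += 1
    return correct / len(references)
```

String equality is a deliberately strict choice here: it undercounts answers that are mathematically equal but written differently, so reported scores generally depend on the normalization rules a given evaluation applies.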
Papers
Machine Learning Clifford invariants of ADE Coxeter elements
Siqi Chen, Pierre-Philippe Dechant, Yang-Hui He, Elli Heyes, Edward Hirst, Dmitrii Riabchenko
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen