Massive Multitask Language Understanding
Massive Multitask Language Understanding (MMLU) research aims to evaluate the breadth and depth of knowledge and reasoning in large language models (LLMs) across diverse domains. Current work focuses on building more robust and challenging benchmarks, such as MMLU-Pro and its variants, and on addressing issues such as shortcut learning, answer-order bias, and data contamination so that reported performance metrics are more reliable. These efforts inform both the scientific understanding of LLM capabilities and the responsible deployment of LLMs across application domains.
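One of the issues named above, answer-order bias, can be probed by permuting the answer choices of a multiple-choice question and checking whether the model keeps selecting the same underlying option. The sketch below is a minimal illustration of that idea, not any benchmark's official harness; the function names and the always-pick-"B" toy model are assumptions introduced here for demonstration.

```python
from itertools import permutations

def permute_choices(question, choices):
    """Yield every ordering of the answer choices, relabeled A-D.

    A model free of answer-order bias should select the same underlying
    choice text regardless of which letter it appears under.
    """
    letters = "ABCD"
    for order in permutations(range(len(choices))):
        relabeled = {letters[i]: choices[j] for i, j in enumerate(order)}
        yield question, relabeled

def consistency_rate(answers):
    """Fraction of permutations on which the modal choice text was picked.

    `answers` holds the choice *texts* (not letters) the model returned,
    one per permutation; 1.0 means perfectly order-invariant behavior.
    """
    top = max(set(answers), key=answers.count)
    return answers.count(top) / len(answers)

# Toy illustration: a hypothetical degenerate model that always answers "B",
# the pathological case answer-order probes are designed to catch.
question = "Which gas is most abundant in Earth's atmosphere?"
choices = ["Oxygen", "Nitrogen", "Carbon dioxide", "Argon"]
picked = [relabeled["B"] for _, relabeled in permute_choices(question, choices)]
print(consistency_rate(picked))  # 0.25: each text lands under "B" equally often
```

An order-invariant model would score 1.0 here; the degenerate letter-picker scores only 1/4 because each of the four choice texts appears under "B" in an equal share of the 24 orderings.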
17 papers