Text Benchmark
Text benchmarks are standardized datasets and evaluation protocols used to assess how well large language models (LLMs) perform on text-related tasks. Current research focuses on benchmarks for long-form text generation, multimodal inputs (text combined with images, tables, or other data types), and complex reasoning, such as reasoning about actions and change or advanced data analysis. Because every model is scored on the same data under the same protocol, these benchmarks allow objective comparison between LLMs and reveal where they fall short, which guides improvements to both the models and their real-world applications.
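To make the idea of "a dataset plus an evaluation protocol" concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of any real benchmark: the three-item `BENCHMARK`, the exact-match metric, and the stand-in functions `model_a` and `model_b`, which take the place of actual LLM calls.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy benchmark: (prompt, reference answer) pairs.
BENCHMARK: List[Tuple[str, str]] = [
    ("2 + 2 =", "4"),
    ("Capital of France?", "Paris"),
    ("Opposite of hot?", "cold"),
]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def exact_match_accuracy(
    model: Callable[[str], str], benchmark: List[Tuple[str, str]]
) -> float:
    """Score a model as the fraction of prompts answered with the reference string."""
    if not benchmark:
        raise ValueError("benchmark must contain at least one example")
    correct = sum(
        normalize(model(prompt)) == normalize(reference)
        for prompt, reference in benchmark
    )
    return correct / len(benchmark)


def model_a(prompt: str) -> str:
    """Stand-in for an LLM call (assumption): a lookup table with messy formatting."""
    answers = {
        "2 + 2 =": "4",
        "Capital of France?": " paris ",
        "Opposite of hot?": "Cold",
    }
    return answers.get(prompt, "")


def model_b(prompt: str) -> str:
    """Stand-in for a weaker model (assumption): always gives the same answer."""
    return "4"


# Same data and same protocol for every model, so the scores are comparable.
scores: Dict[str, float] = {
    name: exact_match_accuracy(fn, BENCHMARK)
    for name, fn in {"model_a": model_a, "model_b": model_b}.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Exact match is only one possible protocol. Benchmarks for long-form generation or data analysis generally need richer scoring, but the structure stays the same: fixed inputs, fixed references, and one metric applied to every model.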