Multi-Task Benchmark
Multi-task benchmarks evaluate machine learning models across multiple related tasks simultaneously, with the aim of assessing generalization ability and identifying the strengths and weaknesses of different model architectures. Current research focuses on developing more comprehensive and diverse benchmarks spanning various domains (e.g., natural language processing, computer vision, materials science), often incorporating realistic noise and distribution shifts to better reflect real-world conditions. Such benchmarks are crucial for advancing model development, enabling fair comparisons between approaches, and ultimately improving the robustness and applicability of machine learning across fields.
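To make the evaluation recipe concrete, here is a minimal Python sketch of how such a benchmark might score a single model across several tasks and aggregate the results for comparison. The `evaluate_multitask` helper, the task names, and the toy data are all hypothetical illustrations, not taken from any benchmark listed below; real benchmarks typically use task-specific metrics rather than plain accuracy everywhere.

```python
# Minimal sketch of multi-task benchmark scoring: per-task metrics are
# computed independently, then macro-averaged so every task counts equally.
# All names and data here are hypothetical placeholders.

def accuracy(preds, labels):
    """Fraction of exact matches between predictions and gold labels."""
    correct = sum(p == g for p, g in zip(preds, labels))
    return correct / len(labels)

def evaluate_multitask(model_fn, tasks):
    """Run model_fn on each task's inputs and score against its labels.

    tasks: mapping of task name -> (inputs, gold_labels).
    Returns per-task scores plus their unweighted (macro) average.
    """
    scores = {
        name: accuracy([model_fn(x) for x in inputs], labels)
        for name, (inputs, labels) in tasks.items()
    }
    # Macro average: each task contributes equally regardless of size.
    scores["macro_avg"] = sum(scores.values()) / len(scores)
    return scores

if __name__ == "__main__":
    # Toy usage with a trivial "model" and two hypothetical tasks.
    tasks = {
        "sentiment": ([1, 2, 3], ["pos", "neg", "pos"]),
        "topic": ([4, 5], ["sci", "sci"]),
    }
    model = lambda x: "pos" if x % 2 else "sci"
    print(evaluate_multitask(model, tasks))
```

The macro average is a common design choice in such leaderboards because it prevents a few large tasks from dominating the overall score, though weighted or rank-based aggregation schemes are also used.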
Papers
SciRepEval: A Multi-Format Benchmark for Scientific Document Representations
Amanpreet Singh, Mike D'Arcy, Arman Cohan, Doug Downey, Sergey Feldman
This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish
Łukasz Augustyniak, Kamil Tagowski, Albert Sawczyn, Denis Janiak, Roman Bartusiak, Adrian Szymczak, Marcin Wątroba, Arkadiusz Janz, Piotr Szymański, Mikołaj Morzy, Tomasz Kajdanowicz, Maciej Piasecki