Multilingual Benchmark
Multilingual benchmarks are datasets designed to evaluate the performance of large language models (LLMs) across multiple languages, with the aim of assessing their cross-lingual capabilities and identifying biases. Current research focuses on building comprehensive benchmarks that cover diverse tasks (e.g., question answering, code generation, translation) and a wide range of languages, including low-resource ones, and on evaluating models built with instruction fine-tuning and transformer-based architectures. These benchmarks are crucial for advancing the development of truly multilingual LLMs, improving their fairness and reliability, and enabling broader access to AI technologies across diverse linguistic communities.
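To make the evaluation setup concrete, the sketch below shows one common pattern for cross-lingual benchmarking: run the same task in several languages, score each language separately, and macro-average the results so low-resource languages are not drowned out. It is a minimal, hypothetical example; the function and data names (evaluate_multilingual, TOY_BENCHMARK) are illustrative and not drawn from any of the benchmarks or papers listed here.

```python
"""Minimal sketch of a cross-lingual evaluation loop (hypothetical example)."""

from typing import Callable, Dict, List, Tuple

# Toy "benchmark": each language maps to (prompt, reference answer) pairs.
TOY_BENCHMARK: Dict[str, List[Tuple[str, str]]] = {
    "en": [("Capital of France?", "Paris")],
    "de": [("Hauptstadt von Frankreich?", "Paris")],
    "sw": [("Mji mkuu wa Ufaransa?", "Paris")],  # Swahili, a lower-resource language
}


def evaluate_multilingual(
    model: Callable[[str], str],
    benchmark: Dict[str, List[Tuple[str, str]]],
) -> Dict[str, float]:
    """Return per-language exact-match accuracy plus a macro average."""
    scores: Dict[str, float] = {}
    for lang, examples in benchmark.items():
        correct = sum(model(prompt).strip() == ref for prompt, ref in examples)
        scores[lang] = correct / len(examples)
    # Macro-averaging weights every language equally, which exposes
    # weaknesses on low-resource languages that a pooled average would hide.
    scores["macro_avg"] = sum(scores.values()) / len(scores)
    return scores


if __name__ == "__main__":
    dummy_model = lambda prompt: "Paris"  # stand-in for a call to an actual LLM
    print(evaluate_multilingual(dummy_model, TOY_BENCHMARK))
```

Real benchmarks differ mainly in the task format (generation vs. classification), the scoring metric (exact match, BLEU, pass@k for code), and how many languages and examples they cover, but the per-language scoring and aggregation structure is the same.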
Papers
xMEN: A Modular Toolkit for Cross-Lingual Medical Entity Normalization
Florian Borchert, Ignacio Llorca, Roland Roller, Bert Arnrich, Matthieu-P. Schapranow
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion
Yangruibo Ding, Zijian Wang, Wasi Uddin Ahmad, Hantian Ding, Ming Tan, Nihal Jain, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, Bing Xiang