Multilingual Dataset
Multilingual datasets are collections of text and/or speech data spanning multiple languages, built to improve the performance and cross-lingual capabilities of language models. Current research focuses on creating high-quality, diverse datasets for tasks including machine translation, sentiment analysis, and speech emotion recognition, and often combines them with techniques like parameter-efficient transfer learning and with pre-trained models such as BERT and Whisper. These datasets are crucial for developing more robust and inclusive language technologies: they address the limitations of English-centric models and enable applications in diverse linguistic and cultural contexts.
Papers
Comparative Study of Multilingual Idioms and Similes in Large Language Models
Paria Khoshtab, Danial Namazifard, Mostafa Masoudi, Ali Akhgary, Samin Mahdizadeh Sani, Yadollah Yaghoobzadeh
Exploring Continual Fine-Tuning for Enhancing Language Ability in Large Language Model
Divyanshu Aggarwal, Sankarshan Damle, Navin Goyal, Satya Lokam, Sunayana Sitaram
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
Jiawei Huang, Chen Zhang, Yi Ren, Ziyue Jiang, Zhenhui Ye, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao
Large Language Models for cross-language code clone detection
Micheline Bénédicte Moumoula, Abdoul Kader Kabore, Jacques Klein, Tegawendé Bissyande