Multilingual Corpus
Multilingual corpora are collections of text and speech data spanning multiple languages, and they are crucial for developing language technologies that work across linguistic boundaries. Current research focuses on building and improving these corpora, on mitigating data imbalance and detecting bias, and on efficient cross-lingual transfer learning with pretrained deep learning models (e.g., BERT, mT5) and contrastive learning. These advances help close the language gap in natural language processing: applications such as multilingual machine translation, speech recognition, and information retrieval can serve a wider global population, and research into under-resourced languages becomes more feasible.
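To make the data imbalance issue concrete, here is a minimal sketch of exponent-smoothed (temperature-style) language sampling, one commonly used way to upweight low-resource languages when drawing training examples. The overview above does not name a specific method, so the technique choice, the function name `sampling_probs`, the exponent value, and the corpus sizes are all illustrative assumptions.

```python
def sampling_probs(sizes, alpha=0.3):
    """Return per-language sampling probabilities proportional to n ** alpha.

    sizes: dict mapping a language code to its example count.
    alpha: smoothing exponent (assumed value). alpha=1.0 keeps the raw
           proportions; smaller values shift probability mass toward
           low-resource languages; alpha=0.0 gives a uniform distribution.
    """
    if not sizes:
        raise ValueError("sizes must contain at least one language")
    weights = {lang: n ** alpha for lang, n in sizes.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}


# Hypothetical sentence counts, not taken from any real corpus.
sizes = {"en": 1_000_000, "sw": 10_000, "is": 1_000}

probs = sampling_probs(sizes, alpha=0.3)
for lang, p in probs.items():
    print(f"{lang}: {p:.3f}")
```

With an exponent below 1, the largest language still dominates, but smaller languages are sampled far more often than their raw share of the corpus would allow. The exponent trades off coverage of low-resource languages against the risk of repeating their limited data too often.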