Multilingual Corpus
Multilingual corpora are collections of text and speech data spanning multiple languages, and they are essential for building language technologies that work across linguistic boundaries. Current research focuses on creating and improving these corpora: addressing data imbalance between languages, detecting bias, and making cross-lingual transfer learning more efficient with pretrained deep learning models (e.g., BERT, mT5) and contrastive learning. This work helps narrow the language gap in natural language processing. It lets applications such as multilingual machine translation, speech recognition, and information retrieval serve a wider global population, and it supports research on under-resourced languages.
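As an illustration of the data imbalance problem mentioned above, one commonly used mitigation is to sample languages with probability proportional to their corpus size raised to an exponent below 1, which boosts low-resource languages relative to plain proportional sampling. The sketch below is a minimal example under stated assumptions: the function name `smoothed_sampling_probs`, the exponent value `alpha=0.3`, and the corpus sizes are all hypothetical and are not taken from any specific paper listed on this page.

```python
def smoothed_sampling_probs(sizes: dict[str, int], alpha: float = 0.3) -> dict[str, float]:
    """Return per-language sampling probabilities proportional to size ** alpha.

    alpha = 1.0 reproduces proportional sampling; smaller values flatten the
    distribution so that low-resource languages are sampled more often.
    The default of 0.3 is an illustrative assumption, not a recommended setting.
    """
    if not sizes:
        raise ValueError("sizes must contain at least one language")
    if any(n <= 0 for n in sizes.values()):
        raise ValueError("every corpus size must be positive")

    # Raise each corpus size to the smoothing exponent.
    weights = {lang: n ** alpha for lang, n in sizes.items()}
    total = sum(weights.values())

    # Normalize so the probabilities sum to 1.
    return {lang: w / total for lang, w in weights.items()}


# Hypothetical corpus sizes (number of sentences), chosen only for illustration.
corpus_sizes = {"en": 1_000_000, "sw": 10_000}

proportional = smoothed_sampling_probs(corpus_sizes, alpha=1.0)
smoothed = smoothed_sampling_probs(corpus_sizes, alpha=0.3)
```

With these made-up sizes, the smoothed distribution assigns the smaller language a noticeably larger share than proportional sampling does, at the cost of repeating its data more often during training.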