Language Dataset
Language datasets are crucial for training and evaluating natural language processing (NLP) models, and current research focuses heavily on the scarcity of high-quality data for low-resource languages. Approaches include cross-lingual transfer learning, novel data collection methods (e.g., storyboards), and model architectures such as Mixture-of-Experts (MoE) that mitigate catastrophic forgetting during multilingual training. These advances broaden the reach of NLP applications and make the field more inclusive by enabling robust, accurate models for a wider range of languages.