Language Label
Language labeling, the process of assigning language identifiers to text or speech data, is a prerequisite for many natural language processing tasks. Current research aims to improve label accuracy through techniques such as geographically informed modeling; exploiting the linguistic relationships captured by multilingual translation models, including grouping languages into "pseudo-language families"; and self-supervised learning with pseudo-labels, applied to visual grounding and speech-to-text. These advances improve the performance of multilingual systems and enable more accurate analysis of large corpora spanning many languages.
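The basic task of assigning a language label to a piece of text can be illustrated with a classic baseline: character trigram profiles compared by cosine similarity. This is a minimal sketch and not one of the research techniques summarized above. The `SAMPLES` sentences, the function names, and the labels `"en"`, `"de"`, and `"es"` are all hypothetical. A real labeler would need far more training text per language.

```python
from collections import Counter
import math

# Hypothetical toy training data: a few sentences per language label.
SAMPLES: dict[str, list[str]] = {
    "en": ["the quick brown fox jumps over the lazy dog",
           "this is the house where the people live"],
    "de": ["der schnelle braune fuchs springt über den faulen hund",
           "das ist das haus in dem die leute wohnen"],
    "es": ["el rápido zorro marrón salta sobre el perro perezoso",
           "esta es la casa donde vive la gente"],
}


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, padding with one space at each edge."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(samples: dict[str, list[str]], n: int = 3) -> dict[str, Counter]:
    """Merge the n-gram counts of every sample sentence into one profile per label."""
    profiles: dict[str, Counter] = {}
    for label, texts in samples.items():
        profile: Counter = Counter()
        for text in texts:
            profile.update(char_ngrams(text, n))
        profiles[label] = profile
    return profiles


def label_language(text: str, profiles: dict[str, Counter], n: int = 3) -> str:
    """Return the label whose profile is most similar to the text's n-gram counts."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda label: cosine(query, profiles[label]))


if __name__ == "__main__":
    profiles = build_profiles(SAMPLES)
    for sentence in ["the dog is in the house", "das ist ein hund", "la casa es grande"]:
        print(sentence, "->", label_language(sentence, profiles))
```

Character n-grams are used here because they need no tokenizer and capture spelling patterns that differ between languages. The downside is that closely related languages, or very short inputs, can produce near-tied scores.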
Papers
March 14, 2024
December 5, 2023
May 15, 2023
April 24, 2023
March 21, 2023
June 17, 2022