Unknown Language
Research on unknown languages focuses on developing and evaluating computational methods to analyze and process text and speech across a wide range of languages, particularly those with limited digital resources. Current efforts concentrate on improving large language models (LLMs) for multilingual tasks, including translation, question answering, and toxicity detection, often employing techniques such as self-supervised learning, preference tuning, and multilingual feedback mechanisms. This work is crucial for advancing natural language processing globally, enabling more equitable access to language technology and supporting both scientific research and practical applications across cultures.
Papers
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, Shinji Watanabe
Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation
Zhi Qu, Chenchen Ding, Taro Watanabe