Unknown Language
Research on unknown languages focuses on developing and evaluating computational methods for analyzing and processing text and speech across a wide range of languages, particularly low-resource languages with limited digital footprints. Current efforts concentrate on improving large language models (LLMs) for multilingual tasks such as translation, question answering, and toxicity detection, often employing techniques like self-supervised learning, preference tuning, and multilingual feedback mechanisms. This work is crucial for advancing natural language processing globally: it enables more equitable access to language technology and supports both cross-cultural scientific research and practical applications.
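One recurring technique in this area is cross-lingual prompting for zero-shot chain-of-thought reasoning (see the paper by Qin et al. below): a non-English input is first aligned to a high-resource pivot language, and the model then reasons over the aligned restatement. The sketch below is a minimal illustration of that two-stage pattern; the `call_llm` callable, the `cross_lingual_cot` helper, and the exact prompt wording are assumptions made for illustration, not the interface or prompts of any published system.

```python
from typing import Callable


def cross_lingual_cot(question: str, source_language: str,
                      call_llm: Callable[[str], str]) -> str:
    """Two-stage cross-lingual prompting: align to a pivot language, then reason."""
    # Stage 1: ask the model to restate the non-English question in English
    # (a high-resource pivot language), so the task itself is clarified first.
    alignment_prompt = (
        f"The following question is written in {source_language}:\n"
        f"{question}\n\n"
        "Restate the question in English."
    )
    english_question = call_llm(alignment_prompt)

    # Stage 2: standard zero-shot chain-of-thought over the aligned question.
    solve_prompt = (
        f"{english_question}\n\n"
        "Let's think step by step, then give the final answer on the last line."
    )
    return call_llm(solve_prompt)


if __name__ == "__main__":
    # Toy stub so the script runs end-to-end without a model provider;
    # replace with a real completion call in practice.
    def echo_model(prompt: str) -> str:
        return f"[model output for a prompt of {len(prompt)} characters]"

    answer = cross_lingual_cot(
        "María tiene 3 manzanas y compra 5 más. ¿Cuántas tiene ahora?",
        source_language="Spanish",
        call_llm=echo_model,
    )
    print(answer)
```

Separating alignment from solving lets the second stage reuse ordinary English chain-of-thought prompts, which is why this pattern tends to help on languages that are underrepresented in pretraining data.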
Papers
Quantifying the Dialect Gap and its Correlates Across Languages
Anjali Kantharuban, Ivan Vulić, Anna Korhonen
Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages
Libo Qin, Qiguang Chen, Fuxuan Wei, Shijue Huang, Wanxiang Che
The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages
Chiyu Zhang, Khai Duy Doan, Qisheng Liao, Muhammad Abdul-Mageed