Bilingual Data
Bilingual data research focuses on building and using datasets of parallel text or speech in two languages to improve multilingual natural language processing (NLP) models. Current work emphasizes creating high-quality bilingual corpora for specific domains (e.g., finance, medicine, general knowledge), frequently in conjunction with large language models (LLMs) applied to tasks such as translation, question answering, and safety detection. This work is important for advancing multilingual NLP, particularly for low-resource languages, and supports cross-cultural communication and access to information.
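A parallel corpus pairs each sentence in one language with its translation in the other. The sketch below is a minimal illustration, not taken from any of the listed papers. It assumes a hypothetical tab-separated format (source sentence, a tab, target sentence, one pair per line), parses it into aligned pairs, and applies a simple length-ratio filter, one common heuristic for discarding likely misaligned pairs when building cleaner corpora. The names `SentencePair`, `parse_tsv`, `length_ratio_ok`, `filter_pairs` and the default threshold of 2.0 are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SentencePair:
    """One aligned example: a source sentence and its translation (hypothetical type)."""
    source: str
    target: str


def parse_tsv(lines: Iterable[str]) -> List[SentencePair]:
    """Parse 'source<TAB>target' lines (assumed format), skipping malformed or empty ones."""
    pairs = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # not exactly two columns: skip the line
        src, tgt = (p.strip() for p in parts)
        if src and tgt:
            pairs.append(SentencePair(src, tgt))
    return pairs


def length_ratio_ok(pair: SentencePair, max_ratio: float = 2.0) -> bool:
    """Keep a pair only if the longer side is at most max_ratio times the shorter (in characters)."""
    a, b = len(pair.source), len(pair.target)
    if min(a, b) == 0:
        return False  # an empty side can never be a valid translation
    return max(a, b) / min(a, b) <= max_ratio


def filter_pairs(pairs: Iterable[SentencePair], max_ratio: float = 2.0) -> List[SentencePair]:
    """Return only the pairs that pass the length-ratio heuristic."""
    return [p for p in pairs if length_ratio_ok(p, max_ratio)]


# Usage: a toy English-French sample with one good pair, one suspicious pair, one malformed line.
raw = [
    "The cat sleeps.\tLe chat dort.",
    "Hello\tBonjour tout le monde, comment allez-vous aujourd'hui ?",
    "missing tab",
]
clean = filter_pairs(parse_tsv(raw))
print(clean)  # → [SentencePair(source='The cat sleeps.', target='Le chat dort.')]
```

Character counts are only a rough proxy: for language pairs with very different scripts (e.g., Chinese and English), a token-based length or a threshold tuned on held-out data would be more appropriate.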
Papers
January 7, 2023
December 2, 2022
November 18, 2022
November 13, 2022
June 9, 2022
March 28, 2022
March 4, 2022
November 29, 2021