Bilingual Data
Bilingual data research develops and uses datasets of parallel text or speech in two languages to improve multilingual natural language processing (NLP) models. Current work emphasizes building high-quality bilingual corpora for domains such as finance, medicine, and general knowledge, often using large language models (LLMs) for tasks such as translation, question answering, and safety detection. These corpora are important for advancing multilingual NLP, particularly for low-resource languages, and for supporting cross-cultural communication and information access.