Code-Switched Data
Code-switched data, meaning text and speech in which multiple languages are interwoven within a single utterance, presents both a significant challenge and an opportunity for natural language processing. Current research focuses on mitigating data scarcity for low-resource languages through techniques such as data augmentation with large language models (e.g., GPT) and fine-tuning pre-trained multilingual models (e.g., wav2vec 2.0 XLSR) so they handle code-switched input. These efforts aim to improve performance on tasks including speech recognition, machine translation, and information retrieval, ultimately leading to more inclusive and accurate language technologies for multilingual communities.
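To make the fine-tuning route concrete, the following is a minimal sketch of adapting the multilingual wav2vec 2.0 XLSR checkpoint to code-switched speech recognition with Hugging Face `transformers`. The character vocabulary, the Hindi-English transcript, and the dummy audio batch are placeholders for illustration; a real setup would derive the vocabulary from the code-switched training transcripts and iterate over an actual corpus.

```python
# Sketch: fine-tuning wav2vec 2.0 XLSR with a CTC head on code-switched speech.
# Assumes the `transformers` and `torch` packages; vocab and data are placeholders.
import json, os, tempfile
import torch
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
    Wav2Vec2ForCTC,
)

# Placeholder character vocabulary covering both languages of the corpus
# (here romanized Hindi-English); "|" marks word boundaries for CTC.
chars = list("abcdefghijklmnopqrstuvwxyz") + ["|"]
vocab = {c: i for i, c in enumerate(chars)}
vocab["[UNK]"] = len(vocab)
vocab["[PAD]"] = len(vocab)

vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.json")
with open(vocab_path, "w") as f:
    json.dump(vocab, f)

tokenizer = Wav2Vec2CTCTokenizer(
    vocab_path, unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0,
    do_normalize=True, return_attention_mask=True,
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Load the multilingual XLSR encoder and attach a fresh CTC head sized to the
# code-switched vocabulary; the convolutional feature encoder is frozen so only
# the transformer layers and CTC head adapt to the mixed-language data.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
model.freeze_feature_encoder()

# One training step on a dummy example; real training would loop over a
# DataLoader of code-switched audio/transcript pairs with an optimizer.
audio = torch.randn(16_000).numpy()  # 1 second of fake 16 kHz audio
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
labels = processor.tokenizer("hello duniya", return_tensors="pt").input_ids
loss = model(inputs.input_values, labels=labels).loss
loss.backward()
```

The same processor and model can then be used for inference by decoding the argmax of the CTC logits with `processor.batch_decode`; augmentation-based approaches typically generate additional code-switched transcripts with an LLM before this fine-tuning step.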