Code Mixing
Code-mixing, the blending of two or more languages within a single utterance, is a widespread linguistic phenomenon that is increasingly studied in computational linguistics. Current research focuses on building robust models, often based on transformer architectures and ensemble learning, for tasks such as sentiment analysis, toxicity detection, and machine translation on code-mixed data, particularly in low-resource language settings. This work aims to make natural language processing (NLP) technologies more accessible and inclusive, to reduce digital inequalities, and to support cross-cultural communication in linguistically diverse online environments. Building new multilingual datasets is another key focus, since such datasets enable more accurate and nuanced analysis of code-mixed language.
Papers
OffMix-3L: A Novel Code-Mixed Dataset in Bangla-English-Hindi for Offensive Language Identification
Dhiman Goswami, Md Nishat Raihan, Antara Mahmud, Antonios Anastasopoulos, Marcos Zampieri
SentMix-3L: A Bangla-English-Hindi Code-Mixed Dataset for Sentiment Analysis
Md Nishat Raihan, Dhiman Goswami, Antara Mahmud, Antonios Anastasopoulos, Marcos Zampieri