Code Mixed
Code-mixing, the blending of multiple languages within a single text or conversation, is a prevalent linguistic phenomenon that is increasingly studied in natural language processing (NLP). Current research focuses on developing robust models, often built on transformer architectures such as BERT and its variants, for tasks such as sentiment analysis, hate speech detection, and machine translation on code-mixed data. This work often addresses data scarcity through techniques such as synthetic data generation and transfer learning. The research matters for improving cross-lingual communication and for building more inclusive NLP systems that can understand and generate text in diverse multilingual contexts, with applications ranging from social media monitoring to human-computer interaction.
Papers
Elevating Code-mixed Text Handling through Auditory Information of Words
Mamta, Zishan Ahmad, Asif Ekbal
OffMix-3L: A Novel Code-Mixed Dataset in Bangla-English-Hindi for Offensive Language Identification
Dhiman Goswami, Md Nishat Raihan, Antara Mahmud, Antonios Anastasopoulos, Marcos Zampieri
SentMix-3L: A Bangla-English-Hindi Code-Mixed Dataset for Sentiment Analysis
Md Nishat Raihan, Dhiman Goswami, Antara Mahmud, Antonios Anastasopoulos, Marcos Zampieri