Code-Mixed
Code-mixing, the blending of multiple languages within a single text or conversation, is a prevalent linguistic phenomenon and an increasingly active area of study in natural language processing (NLP). Current research focuses on building robust models, often based on transformer architectures such as BERT and its variants, for tasks like sentiment analysis, hate speech detection, and machine translation on code-mixed data; because labeled code-mixed corpora are scarce, these efforts frequently rely on synthetic data generation and transfer learning. This work matters for improving cross-lingual communication and for building more inclusive NLP systems that can understand and generate text in diverse multilingual contexts, with applications ranging from social media monitoring to better human-computer interaction.
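One core task in this area, word-level language identification for code-mixed text, can be illustrated with a minimal sketch. The dictionary-lookup approach and the tiny word lists below are purely illustrative stand-ins for a trained classifier (real systems use subword features or transformer token classifiers); the example only shows the shape of the task, tagging each token of a Hindi-English code-mixed sentence with a language label.

```python
# Illustrative sketch of word-level language identification for
# Hindi-English code-mixed text. The word lists are hypothetical
# stand-ins for a trained model, not a real lexicon.

EN_WORDS = {"the", "is", "movie", "good", "very", "but", "was", "ending"}
HI_WORDS = {"bahut", "accha", "hai", "par", "nahi", "tha"}  # romanized Hindi

def tag_tokens(sentence):
    """Tag each whitespace token as 'en', 'hi', or 'other' by lookup."""
    tags = []
    for token in sentence.lower().split():
        if token in EN_WORDS:
            tags.append((token, "en"))
        elif token in HI_WORDS:
            tags.append((token, "hi"))
        else:
            tags.append((token, "other"))
    return tags

print(tag_tokens("movie bahut accha hai but ending was nahi good"))
# → [('movie', 'en'), ('bahut', 'hi'), ('accha', 'hi'), ('hai', 'hi'),
#    ('but', 'en'), ('ending', 'en'), ('was', 'en'), ('nahi', 'hi'),
#    ('good', 'en')]
```

In practice, such lookups fail on ambiguous or out-of-vocabulary tokens (note that any unseen word falls through to "other"), which is precisely why the papers below turn to prompting LLMs and fine-tuning transformers for this task.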
Papers
BanglishRev: A Large-Scale Bangla-English and Code-mixed Dataset of Product Reviews in E-Commerce
Mohammad Nazmush Shamael, Sabila Nawshin, Swakkhar Shatabda, Salekul Islam
Revealing the impact of synthetic native samples and multi-tasking strategies in Hindi-English code-mixed humour and sarcasm detection
Debajyoti Mazumder, Aakash Kumar, Jasabanta Patro
YouTube Comments Decoded: Leveraging LLMs for Low Resource Language Classification
Aniket Deroy, Subhankar Maity
Prompt Engineering Using GPT for Word-Level Code-Mixed Language Identification in Low-Resource Dravidian Languages
Aniket Deroy, Subhankar Maity
Crystal: Illuminating LLM Abilities on Language and Code
Tianhua Tao, Junbo Li, Bowen Tan, Hongyi Wang, William Marshall, Bhargav M Kanakiya, Joel Hestness, Natalia Vassilieva, Zhiqiang Shen, Eric P. Xing, Zhengzhong Liu