Indonesian Language
Indonesian, a low-resource language with significant linguistic diversity, is a growing focus of natural language processing (NLP) research, which aims to develop robust and accurate language models for a range of tasks. Current research emphasizes transformer-based models such as BERT and T5, along with graph convolutional networks, to address challenges such as hate speech detection, code-mixing, and low-resource language modeling in Indonesian and its numerous regional dialects. These advances could improve access to information, facilitate cross-cultural communication, and yield NLP techniques applicable to other under-resourced languages.
Papers
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese
Khang T. Doan, Bao G. Huynh, Dung T. Hoang, Thuc D. Pham, Nhat H. Pham, Quan T. M. Nguyen, Bang Q. Vo, Suong N. Hoang
Enhancing Natural Language Inference Performance with Knowledge Graph for COVID-19 Automated Fact-Checking in Indonesian Language
Arief Purnama Muharram, Ayu Purwarianti