Urdu Language
Urdu, a low-resource language with complex phonetic and morphological features, is a growing focus of natural language processing (NLP) research. Current efforts concentrate on improving performance on Urdu NLP tasks, including keyword spotting, sentiment analysis, fake news detection, and machine translation, using transformer models such as BERT as well as classical classifiers such as support vector machines. These advances help bridge the digital divide and broaden access to NLP technologies for Urdu speakers, with applications in information retrieval, social media monitoring, and language technology development. Building large, publicly available datasets is also a key area of ongoing research.
Papers
UrduFake@FIRE2021: Shared Track on Fake News Identification in Urdu
Maaz Amjad, Sabur Butt, Hamza Imam Amjad, Grigori Sidorov, Alisa Zhila, Alexander Gelbukh
Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2021
Maaz Amjad, Sabur Butt, Hamza Imam Amjad, Alisa Zhila, Grigori Sidorov, Alexander Gelbukh
Urdu Morphology, Orthography and Lexicon Extraction
Muhammad Humayoun, Harald Hammarström, Aarne Ranta
The 2021 Urdu Fake News Detection Task using Supervised Machine Learning and Feature Combinations
Muhammad Humayoun
Abusive and Threatening Language Detection in Urdu using Supervised Machine Learning and Feature Combinations
Muhammad Humayoun