Arabic Speaker
Research on Arabic speakers focuses on developing and evaluating natural language processing (NLP) models that accurately understand and generate Arabic text and speech across the language's many dialects and cultural nuances. Current efforts concentrate on adapting pretrained language models such as BERT and large language models (LLMs) such as Llama, using techniques including retrieval augmented generation (RAG), instruction tuning, and multimodal learning to improve performance on machine translation, sentiment analysis, and question answering. This work is crucial for bridging the language gap in AI and democratizing access to advanced technologies for the hundreds of millions of Arabic speakers, while also advancing multilingual NLP more broadly. The resulting models and datasets are increasingly released publicly, fostering collaboration and accelerating progress.
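To make the RAG idea mentioned above concrete, here is a minimal toy sketch: retrieve the passage most relevant to an Arabic question, then prepend it to the prompt so a language model can answer from it. All passages, the question, and helper names are illustrative assumptions, not drawn from any specific paper or dataset, and the token-overlap scorer is a stand-in for a real dense retriever.

```python
# Minimal sketch of retrieval augmented generation (RAG) for Arabic QA.
# Passages, question, and function names are illustrative assumptions.

def tokenize(text: str) -> set[str]:
    # Naive tokenizer: strip common punctuation, split on whitespace.
    # Real systems use Arabic-aware normalization (e.g. alef variants).
    return set(text.replace("؟", "").replace(".", "").split())

def retrieve(question: str, passages: list[str]) -> str:
    # Score passages by token overlap with the question and return
    # the best match (a stand-in for dense/embedding retrieval).
    return max(passages, key=lambda p: len(tokenize(p) & tokenize(question)))

def build_prompt(question: str, context: str) -> str:
    # Prepend the retrieved context so the LLM answers from it.
    return f"السياق: {context}\nالسؤال: {question}\nالجواب:"

passages = [
    "القاهرة هي عاصمة مصر.",    # "Cairo is the capital of Egypt."
    "الرباط هي عاصمة المغرب.",  # "Rabat is the capital of Morocco."
]
question = "ما هي عاصمة مصر؟"   # "What is the capital of Egypt?"

context = retrieve(question, passages)
prompt = build_prompt(question, context)
```

In a real system the prompt would be sent to an instruction-tuned Arabic-capable LLM; the sketch only shows the retrieval-then-prompt structure that RAG adds on top of plain generation.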
Papers
Normalized Orthography for Tunisian Arabic
Houcemeddine Turki, Kawthar Ellouze, Hager Ben Ammar, Mohamed Ali Hadj Taieb, Imed Adel, Mohamed Ben Aouicha, Pier Luigi Farri, Abderrezak Bennour
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic
Fajri Koto, Haonan Li, Sara Shatnawi, Jad Doughman, Abdelrahman Boda Sadallah, Aisha Alraeesi, Khalid Almubarak, Zaid Alyafeai, Neha Sengupta, Shady Shehata, Nizar Habash, Preslav Nakov, Timothy Baldwin