Arabic Multimodal

Arabic multimodal research develops and applies artificial intelligence models that understand and generate content combining text and images, with a focus on the Arabic language and its diverse dialects. Current efforts concentrate on large language models (LLMs) that can handle multimodal tasks such as image captioning and sentiment analysis, often building on transformer architectures and techniques such as word alignment. The work matters because Arabic multimodal datasets are scarce, which hinders progress in areas such as propaganda detection; addressing that gap also helps make AI technologies more accessible across languages and cultures.

Papers