Persian Natural Language Processing

Persian Natural Language Processing (PNLP) develops computational methods for understanding and processing the Persian language. The field must account for the language's distinctive linguistic characteristics and for the relative scarcity of resources compared with high-resource languages such as English. Current research relies heavily on transformer-based architectures such as BERT, which are pre-trained or adapted on large corpora of Persian text covering both formal and informal varieties, to improve performance on tasks such as semantic similarity measurement, instruction following, and question answering. These advances help close the language technology gap and enable applications such as machine translation, chatbots, and improved information access for the large Persian-speaking population.

Papers