Brazilian Portuguese
Brazilian Portuguese (BP) is the subject of a growing body of research on developing and adapting natural language processing (NLP) tools. Current research emphasizes building and evaluating large language models (LLMs) for BP, often using Transformer-based architectures such as BERT, and applying them to tasks such as machine translation, event extraction, and hate speech detection. This work addresses the relative scarcity of resources for BP compared to English, aiming to improve NLP capabilities for the language and to broaden access to technology in Portuguese-speaking communities. The development of high-quality corpora and benchmarks is also a key focus, enabling more robust model training and evaluation.
Papers
PeLLE: Encoder-based language models for Brazilian Portuguese based on open data
Guilherme Lamartine de Mello, Marcelo Finger, Felipe Serras, Miguel de Mello Carpi, Marcos Menon Jose, Pedro Henrique Domingues, Paulo Cavalim
Advancing Generative AI for Portuguese with Open Decoder Gervásio PT*
Rodrigo Santos, João Silva, Luís Gomes, João Rodrigues, António Branco