Large Corpus
Large corpora, massive collections of text and other data, are fundamental to training advanced language models and other AI systems. Current research focuses on making training on diverse, heterogeneous corpora more efficient and effective, using techniques such as decoupled embeddings and data augmentation to mitigate problems such as the "curse of multilinguality" and domain-specific biases. This work supports the development of more robust, accurate, and versatile natural language processing systems across languages and domains, with applications ranging from question answering to knowledge graph construction.
Papers
Intelligent Learning Rate Distribution to reduce Catastrophic Forgetting in Transformers
Philip Kenneweg, Alexander Schulz, Sarah Schröder, Barbara Hammer
Neural Architecture Search for Sentence Classification with BERT
Philip Kenneweg, Sarah Schröder, Barbara Hammer
A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages
Lisa Raithel, Hui-Syuan Yeh, Shuntaro Yada, Cyril Grouin, Thomas Lavergne, Aurélie Névéol, Patrick Paroubek, Philippe Thomas, Tomohiro Nishiyama, Sebastian Möller, Eiji Aramaki, Yuji Matsumoto, Roland Roller, Pierre Zweigenbaum
Advancing Speech Translation: A Corpus of Mandarin-English Conversational Telephone Speech
Shannon Wotherspoon, William Hartmann, Matthew Snover
TEI2GO: A Multilingual Approach for Fast Temporal Expression Identification
Hugo Sousa, Ricardo Campos, Alípio Jorge
Automatic Construction of a Large-Scale Corpus for Geoparsing Using Wikipedia Hyperlinks
Keyaki Ohno, Hirotaka Kameko, Keisuke Shirai, Taichi Nishimura, Shinsuke Mori
A New Massive Multilingual Dataset for High-Performance Language Technologies
Ona de Gibert, Graeme Nail, Nikolay Arefyev, Marta Bañón, Jelmer van der Linde, Shaoxiong Ji, Jaume Zaragoza-Bernabeu, Mikko Aulamo, Gema Ramírez-Sánchez, Andrey Kutuzov, Sampo Pyysalo, Stephan Oepen, Jörg Tiedemann
How Gender Interacts with Political Values: A Case Study on Czech BERT Models
Adnan Al Ali, Jindřich Libovický