Corpus Creation
Corpus creation focuses on building large, high-quality datasets of text and/or speech for training and evaluating natural language processing (NLP) models. Current research emphasizes corpora tailored to specific tasks, such as scientific mention detection, adverse drug event identification, and the analysis of argumentative structures. Many of these efforts incorporate multimodal data (text and images) and leverage deep learning architectures such as transformers (e.g., BERT) and large language models (LLMs). These corpora are crucial for advancing NLP research, particularly for low-resource languages. They also improve applications ranging from information retrieval and machine translation to healthcare and education.
Papers
October 28, 2024
June 20, 2024
May 24, 2024
March 23, 2024
March 13, 2024
March 3, 2024
February 8, 2024
November 21, 2023
October 27, 2023
September 18, 2023
June 20, 2023
May 23, 2023
April 3, 2023
February 27, 2023
November 7, 2022
October 27, 2022
October 19, 2022
October 6, 2022
June 10, 2022
April 28, 2022