Small Corpus
Research on small corpora focuses on developing methods for effectively training and utilizing language models with limited data, addressing challenges in various NLP tasks. Current efforts involve adapting existing architectures like Transformers and BERT, employing techniques such as transfer learning, data augmentation (including hallucinated data), and novel annotation schemes to maximize performance despite data scarcity. This research is crucial for advancing NLP in low-resource languages and domains where large datasets are unavailable, enabling applications in areas like healthcare, legal tech, and accessibility.
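To make the data-augmentation idea concrete, here is a minimal sketch of word-dropout augmentation, one simple member of the augmentation family mentioned above. It expands a tiny corpus by generating variants of each sentence with words randomly removed; the function name and parameters are illustrative, not from any specific paper.

```python
import random

def augment(sentence, p_drop=0.1, n_copies=4, seed=0):
    """Generate variants of a sentence by randomly dropping words.

    A toy data-augmentation scheme for small corpora: each copy
    keeps every word with probability (1 - p_drop).
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    words = sentence.split()
    variants = []
    for _ in range(n_copies):
        kept = [w for w in words if rng.random() > p_drop]
        if not kept:  # never emit an empty sentence
            kept = [rng.choice(words)]
        variants.append(" ".join(kept))
    return variants

# Expand a one-sentence "corpus" into several training examples.
corpus = ["the model learns from limited data"]
augmented = [v for s in corpus for v in augment(s)]
```

In practice, published work uses richer operations (synonym substitution, back-translation, or model-generated "hallucinated" examples), but the workflow is the same: derive many plausible variants from each scarce example before training.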