Language Data
Language data research develops and improves methods for collecting, processing, and using text and speech data to train and improve natural language processing (NLP) models. Current research emphasizes addressing data scarcity in low-resource languages, mitigating bias and ethical concerns in data collection, and improving model performance through techniques such as multilingual fine-tuning, self-supervised learning, and data augmentation. This work is crucial for advancing NLP capabilities across diverse languages and cultures, with applications ranging from machine translation and speech recognition to sentiment analysis and hate speech detection.