Data Augmentation
Data augmentation artificially expands a dataset by creating modified versions of existing examples, primarily to improve the performance and robustness of machine learning models when training data is scarce. Current research focuses on more sophisticated augmentation methods, including those built on generative models such as GANs and diffusion models, and on combining augmentation with techniques such as contrastive learning and transfer learning, often within transformer and convolutional neural network architectures. This work matters because it addresses the constraints of small datasets in domains ranging from image classification and object detection to natural language processing and time series forecasting, with the aim of producing more accurate and generalizable models.
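As a rough illustration of "creating modified versions of existing data", the sketch below expands a tiny labeled text dataset with two simple token-level perturbations (random deletion and random swap). It is not taken from any of the papers listed here; the function names, the `n_copies` and `p` parameters, and the assumption that these perturbations leave the label unchanged are all hypothetical choices made for the example.

```python
import random


def random_delete(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, rng):
    """Swap two randomly chosen positions; the set of tokens is unchanged."""
    out = list(tokens)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def augment(dataset, n_copies=2, p=0.1, seed=0):
    """Return the original (tokens, label) pairs plus n_copies perturbed variants of each."""
    rng = random.Random(seed)  # seeded so the expansion is reproducible
    expanded = list(dataset)
    for tokens, label in dataset:
        for _ in range(n_copies):
            variant = random_swap(random_delete(tokens, p, rng), rng)
            # Assumption: small perturbations do not change the label.
            expanded.append((variant, label))
    return expanded


dataset = [("the film was great".split(), 1), ("a dull plot".split(), 0)]
expanded = augment(dataset, n_copies=2)
print(len(dataset), "->", len(expanded))
```

The originals are kept alongside the variants, so the model still sees unmodified data. Whether a perturbation truly preserves the label depends on the task: deleting a negation word, for example, can flip the polarity of a sentence, so real pipelines usually need task-specific checks.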
Papers
Nostra Domina at EvaLatin 2024: Improving Latin Polarity Detection through Data Augmentation
Stephen Bothwell, Abigail Swenor, David Chiang
Generalization Gap in Data Augmentation: Insights from Illumination
Jianqiang Xiao, Weiwen Guo, Junfeng Liu, Mengze Li
Leveraging Data Augmentation for Process Information Extraction
Julian Neuberger, Leonie Doll, Benedict Engelmann, Lars Ackermann, Stefan Jablonski
CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP
Chandra Kiran Reddy Evuru, Sreyan Ghosh, Sonal Kumar, Ramaneswaran S, Utkarsh Tyagi, Dinesh Manocha
A Comprehensive Study on NLP Data Augmentation for Hate Speech Detection: Legacy Methods, BERT, and LLMs
Md Saroar Jahan, Mourad Oussalah, Djamila Romaissa Beddia, Jhuma kabir Mim, Nabil Arhab