Data Augmentation
Data augmentation artificially expands a dataset by creating modified versions of existing examples, primarily to improve the performance and robustness of machine learning models when training data is scarce. Current research focuses on more sophisticated augmentation methods, including ones built on generative models such as GANs and diffusion models, and on combining augmentation with techniques such as contrastive learning and transfer learning, often within transformer and convolutional neural network architectures. This work matters because it mitigates the effects of small datasets across domains ranging from image classification and object detection to natural language processing and time series forecasting, with the goal of more accurate and generalizable models.
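As a rough illustration of the basic idea (label-preserving modified copies of existing examples), here is a minimal sketch in plain Python. It assumes images are small grayscale grids stored as nested lists with pixel values in [0, 1]. The function names, the choice of horizontal flips and uniform pixel noise, and the parameter defaults are all hypothetical, chosen only for illustration, and are not taken from any of the papers listed below.

```python
import random


def horizontal_flip(image):
    """Return a copy of the image with every row reversed (mirror left-right)."""
    return [row[::-1] for row in image]


def add_noise(image, scale, rng):
    """Return a copy with uniform noise in [-scale, scale] added to each pixel.

    Values are clamped to [0, 1] so the result stays a valid image.
    """
    return [
        [min(1.0, max(0.0, px + rng.uniform(-scale, scale))) for px in row]
        for row in image
    ]


def augment(dataset, copies=2, noise_scale=0.05, seed=0):
    """Expand a list of (image, label) pairs with modified copies.

    Each original example is kept, followed by `copies` variants that are
    randomly flipped and perturbed with noise. Labels are left unchanged,
    which assumes these transformations do not alter the class of an image.
    """
    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        for _ in range(copies):
            variant = horizontal_flip(image) if rng.random() < 0.5 else image
            variant = add_noise(variant, noise_scale, rng)
            augmented.append((variant, label))
    return augmented


if __name__ == "__main__":
    tiny_dataset = [
        ([[0.0, 0.5], [1.0, 0.25]], "cat"),
        ([[0.9, 0.1], [0.3, 0.7]], "dog"),
    ]
    expanded = augment(tiny_dataset, copies=2)
    print(len(tiny_dataset), "->", len(expanded), "examples")
```

Whether a given transformation is safe depends on the task: a horizontal flip usually preserves the label for natural images but not, for example, for text in images, so the set of transformations is itself a design choice.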
Papers
Re-Simulation-based Self-Supervised Learning for Pre-Training Foundation Models
Philip Harris, Michael Kagan, Jeffrey Krupa, Benedikt Maier, Nathaniel Woodward
Genetic Learning for Designing Sim-to-Real Data Augmentations
Bram Vanherle, Nick Michiels, Frank Van Reeth
Distributionally Generative Augmentation for Fair Facial Attribute Classification
Fengda Zhang, Qianpei He, Kun Kuang, Jiashuo Liu, Long Chen, Chao Wu, Jun Xiao, Hanwang Zhang
BSDA: Bayesian Random Semantic Data Augmentation for Medical Image Classification
Yaoyao Zhu, Xiuding Cai, Xueyao Wang, Xiaoqing Chen, Yu Yao, Zhongliang Fu
Large Language Models on Fine-grained Emotion Detection Dataset with Data Augmentation and Transfer Learning
Kaipeng Wang, Zhi Jing, Yongye Su, Yikun Han
Augmentations vs Algorithms: What Works in Self-Supervised Learning
Warren Morningstar, Alex Bijamov, Chris Duvarney, Luke Friedman, Neha Kalibhat, Luyang Liu, Philip Mansfield, Renan Rojas-Gomez, Karan Singhal, Bradley Green, Sushant Prakash
Cross-lingual Transfer or Machine Translation? On Data Augmentation for Monolingual Semantic Textual Similarity
Sho Hoshino, Akihiko Kato, Soichiro Murakami, Peinan Zhang
Large Language Models for Document-Level Event-Argument Data Augmentation for Challenging Role Types
Joseph Gatto, Parker Seegmiller, Omar Sharif, Sarah M. Preum
Emergent Equivariance in Deep Ensembles
Jan E. Gerken, Pan Kessel
Data Augmentation using LLMs: Data Perspectives, Learning Paradigms and Challenges
Bosheng Ding, Chengwei Qin, Ruochen Zhao, Tianze Luo, Xinze Li, Guizhen Chen, Wenhan Xia, Junjie Hu, Anh Tuan Luu, Shafiq Joty
Predicting UAV Type: An Exploration of Sampling and Data Augmentation for Time Series Classification
Tarik Crnovrsanin, Calvin Yu, Dane Hankamer, Cody Dunne
Improving Android Malware Detection Through Data Augmentation Using Wasserstein Generative Adversarial Networks
Kawana Stalin, Mikias Berhanu Mekoya
Enhancing Protein Predictive Models via Proteins Data Augmentation: A Benchmark and New Directions
Rui Sun, Lirong Wu, Haitao Lin, Yufei Huang, Stan Z. Li
Improving Socratic Question Generation using Data Augmentation and Preference Optimization
Nischal Ashok Kumar, Andrew Lan