Counterfactual Data Augmentation
Counterfactual data augmentation (CDA) is a technique for improving machine learning models by generating synthetic data points that represent "what if" scenarios: specific features are altered while all others are held fixed. Current research focuses on developing CDA methods for tasks such as bias mitigation, anomaly detection, and improving model robustness, often employing large language models, diffusion models, and contrastive learning within model-based or rationale-centric frameworks. The approach addresses limitations of existing training data, such as class imbalance, spurious correlations, and bias, leading to more accurate, fairer, and more generalizable models in natural language processing and other domains. A minimal illustration of the idea follows.
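The sketch below shows one common instance of CDA for bias mitigation in text classification: each training example is duplicated with terms tied to a protected attribute (here, gender) swapped, while the rest of the text and the label are left unchanged. The swap lexicon, example sentence, and function names are illustrative assumptions, not taken from the papers listed below; real systems use curated lexicons and handle morphology, names, and coreference.

```python
import re

# Hypothetical swap lexicon (assumption for illustration); possessive/object
# ambiguity ("her" vs. "his") is deliberately simplified here.
GENDER_SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "her": "his",
    "his": "her", "hers": "his",
    "man": "woman", "woman": "man",
}

def counterfactual(text: str) -> str:
    """Return a copy of `text` with gendered terms swapped, other tokens intact."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = GENDER_SWAPS[word.lower()]
        # Preserve the capitalization of the original token.
        return replacement.capitalize() if word[0].isupper() else replacement

    pattern = r"\b(" + "|".join(GENDER_SWAPS) + r")\b"
    return re.sub(pattern, swap, text, flags=re.IGNORECASE)

def augment(dataset: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Add a counterfactual copy of each (text, label) pair.

    The label is kept fixed: intervening on the protected attribute should not
    change the prediction target, which discourages the model from relying on
    that spurious feature.
    """
    return dataset + [(counterfactual(text), label) for text, label in dataset]

if __name__ == "__main__":
    data = [("He praised his colleague.", 1)]
    for label, text in [(y, x) for x, y in augment(data)]:
        print(label, text)
```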
Papers
Does Your Model Classify Entities Reasonably? Diagnosing and Mitigating Spurious Correlations in Entity Typing
Nan Xu, Fei Wang, Bangzheng Li, Mingtao Dong, Muhao Chen
Counterfactual Data Augmentation improves Factuality of Abstractive Summarization
Dheeraj Rajagopal, Siamak Shakeri, Cicero Nogueira dos Santos, Eduard Hovy, Chung-Ching Chang