Counterfactual Data Augmentation
Counterfactual data augmentation (CDA) improves machine learning models by generating synthetic data points that represent "what if" scenarios: a specific feature of an example is altered while the others are held fixed. Current research develops CDA methods for tasks such as bias mitigation, anomaly detection, and robustness improvement, often using large language models, diffusion models, or contrastive learning within model-based or rationale-centric frameworks. By targeting limitations of existing data, such as class imbalance, spurious correlations, and biases, CDA aims to produce more accurate, fair, and generalizable models in natural language processing and other domains.
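To make the "alter one feature, keep the rest" idea concrete, here is a minimal Python sketch of one simple form of CDA for text: swapping gendered terms while leaving the label untouched, as is sometimes done for bias mitigation. It is an illustration under stated assumptions, not a method taken from any particular paper. The word list `SWAP_PAIRS` and the helper names `counterfactual` and `augment` are hypothetical and chosen only for this example.

```python
import re

# Hypothetical, deliberately tiny swap list (an assumption for illustration);
# practical lexicons are far larger and must handle ambiguous words.
SWAP_PAIRS = [("he", "she"), ("man", "woman"), ("father", "mother")]

# Bidirectional lookup so each term maps to its counterpart.
SWAPS = {}
for a, b in SWAP_PAIRS:
    SWAPS[a] = b
    SWAPS[b] = a

# Match whole words only, so "he" inside "the" or "she" is never touched.
_WORD = re.compile(
    r"\b(" + "|".join(map(re.escape, SWAPS)) + r")\b", re.IGNORECASE
)


def counterfactual(text):
    """Return text with each listed term replaced by its counterpart."""
    def swap(match):
        word = match.group(0)
        replacement = SWAPS[word.lower()]
        # Preserve a leading capital letter from the original word.
        return replacement.capitalize() if word[0].isupper() else replacement
    return _WORD.sub(swap, text)


def augment(dataset):
    """Keep every (text, label) pair and add its counterfactual twin.

    The label is copied unchanged: the swapped attribute is assumed to be
    irrelevant to the task, which is the premise of this kind of CDA.
    """
    augmented = []
    for text, label in dataset:
        augmented.append((text, label))
        twin = counterfactual(text)
        if twin != text:  # skip examples that contain no swappable term
            augmented.append((twin, label))
    return augmented
```

Calling `augment([("The man smiled.", 1)])` keeps the original pair and adds `("The woman smiled.", 1)`. Training on both versions gives the model less reason to tie the label to the swapped attribute. Word-list substitution is only the simplest variant. Ambiguous forms such as "her" (which can correspond to "him" or "his") need part-of-speech information, and the model-based approaches mentioned above generate counterfactuals with a learned model instead of a fixed lexicon.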