Training Data Distribution

Training data distribution research studies how the characteristics of a model's training data affect its performance and robustness, particularly for out-of-distribution (OOD) generalization and federated learning. Current work explores techniques such as logit scaling and weight perturbations to improve OOD detection, along with data augmentation and knowledge distillation to strengthen in-distribution generalization and mitigate data heterogeneity in decentralized settings. These efforts are crucial for building reliable, adaptable AI systems whose performance holds up in real-world applications, where data distributions are often non-uniform and may shift over time.
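As a concrete illustration of logit-based OOD detection, the sketch below computes a temperature-scaled energy score from a model's output logits: peaked (confident) logits yield lower energy than flat ones, so thresholding the score separates likely in-distribution inputs from likely OOD ones. This is a minimal, hedged example; the function name, the temperature value, and the toy logit vectors are illustrative assumptions, not any specific paper's method.

```python
import numpy as np

def energy_score(logits, T=1.0):
    """Energy-based OOD score: -T * logsumexp(logits / T).

    Lower (more negative) energy suggests a more in-distribution
    input under this heuristic; higher energy suggests OOD.
    Computed with the max-subtraction trick for numerical stability.
    """
    z = logits / T
    m = z.max(axis=-1, keepdims=True)
    return -T * (m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1)))

# Toy logits (illustrative values, not from a trained model):
in_dist = np.array([[8.0, 0.5, 0.2]])  # peaked, confident prediction
ood = np.array([[1.1, 1.0, 0.9]])      # flat, uncertain prediction

# The confident input receives a lower energy score than the flat one.
print(energy_score(in_dist), energy_score(ood))
```

In practice, a detection threshold on this score is chosen on held-out in-distribution data (e.g., to fix a target false-positive rate), and the temperature `T` rescales the logits in the same spirit as logit-scaling approaches mentioned above.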

Papers