Proxy Dataset

Proxy datasets are simplified stand-ins for complex or full-scale data, used to make machine learning tasks more efficient or to work around limitations of the original data. Current research focuses on building effective proxy models, including ones based on kernel methods, neural networks (such as ResNets and Transformers), and autoencoders, and on applying them in areas such as causal inference, federated learning, and adversarial robustness. Robust, informative proxy datasets matter most where using the full dataset is computationally prohibitive or ethically problematic.

Papers