Optimal Split

How a dataset is split into training and evaluation subsets determines how reliable model training and evaluation are: a good split limits bias and yields an accurate estimate of generalization performance. Current research develops efficient algorithms for finding optimal splits in several settings, including streaming data, federated learning (where methods such as ESFL optimize resource allocation across devices), and datasets that contain near-duplicates. This work improves model accuracy and robustness, particularly for bias detection and out-of-distribution generalization, and so supports more trustworthy machine learning models across applications.
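To make the near-duplicate concern concrete, the following is a minimal sketch of a group-aware split, not a method taken from any of the papers listed here. It assumes near-duplicates have already been clustered and that each item carries a cluster label; the function name `group_aware_split`, the `group_ids` labels, and the greedy fill strategy are all illustrative assumptions. The idea is that every cluster lands entirely on one side, so a near-copy of a training item cannot leak into the test set and inflate the performance estimate.

```python
import random
from collections import defaultdict


def group_aware_split(group_ids, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) such that no group spans both sides.

    group_ids: one (hypothetical) near-duplicate cluster label per item.
    """
    # Collect the item indices that belong to each group.
    by_group = defaultdict(list)
    for idx, group in enumerate(group_ids):
        by_group[group].append(idx)

    # Sort first so the shuffle is reproducible for a given seed.
    groups = sorted(by_group)
    random.Random(seed).shuffle(groups)

    target = test_fraction * len(group_ids)
    train_idx, test_idx = [], []
    for group in groups:
        # Greedily assign whole groups to the test side until it reaches
        # roughly the requested size; the rest go to the training side.
        if len(test_idx) < target:
            test_idx.extend(by_group[group])
        else:
            train_idx.extend(by_group[group])
    return sorted(train_idx), sorted(test_idx)


# Illustrative labels: items 0-2 are near-duplicates of one another, and so on.
group_ids = ["a", "a", "a", "b", "b", "c", "d", "d", "e", "f"]
train_idx, test_idx = group_aware_split(group_ids, test_fraction=0.3, seed=0)
```

Because whole groups are assigned at once, the test set can come out somewhat larger than the requested fraction. Libraries such as scikit-learn offer a comparable group-based splitter (`GroupShuffleSplit`) for production use.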

Papers