Data Splitting
Data splitting, the partitioning of a dataset into training, validation, and test subsets, is crucial for developing and evaluating machine learning models. Current research focuses on splitting strategies that avoid data leakage and bias, particularly for data that are not independent and identically distributed (non-IID), data with temporal dependencies (as in time series or video), and imbalanced class distributions. These improved splitting techniques, often used alongside model architectures such as transformers and physics-informed neural networks, aim to improve model generalizability and reliability, making machine learning applications more robust and trustworthy across diverse fields.
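The three challenges named above each suggest a different splitting rule. The following is a minimal sketch in plain Python, not a reference implementation: the function names (`stratified_split`, `group_split`, `chronological_split`), the `test_fraction` parameter, and the rounding choices are all assumptions made for illustration. It shows one way to keep class proportions similar across subsets, keep related samples (for example, frames from the same video) on one side of the split, and keep test data strictly later in time than training data.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each class keeps roughly the same share in both subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Assumption: every class contributes at least one test sample,
        # so a class with a single sample ends up only in the test subset.
        n_test = max(1, round(len(idxs) * test_fraction))
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


def group_split(groups, test_fraction=0.2, seed=0):
    """Assign whole groups to one subset so related samples never straddle the split."""
    rng = random.Random(seed)
    unique = sorted(set(groups))
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    test_groups = set(unique[:n_test])
    train = [i for i, g in enumerate(groups) if g not in test_groups]
    test = [i for i, g in enumerate(groups) if g in test_groups]
    return train, test


def chronological_split(timestamps, test_fraction=0.2):
    """Train on the earliest samples and test on the latest ones (no shuffling)."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    cut = len(order) - max(1, round(len(order) * test_fraction))
    return order[:cut], order[cut:]
```

Each function returns a pair of index lists, `(train_indices, test_indices)`, which can then be used to select rows from the actual dataset. A validation subset can be carved out by applying the same function again to the training indices.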