Dataset Similarity
Dataset similarity research develops methods to quantify how closely two datasets resemble each other, a capability needed for evaluating model generalization, detecting data drift, and optimizing federated learning. Current work concentrates on dataset-agnostic metrics that are computationally efficient and privacy-preserving, often built on prototype-based representations or feature-importance analysis, and it moves beyond simple distance measures by incorporating downstream task performance. These advances improve the reliability of machine learning model evaluation and the efficiency and trustworthiness of data-driven applications.
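As a rough illustration of the prototype-based idea mentioned above, the sketch below summarizes each dataset by one mean feature vector per class and compares the two datasets through the distance between matching prototypes. It is not taken from any of the listed papers: the function names `class_prototypes` and `prototype_distance`, the use of class means as prototypes, and the choice of Euclidean distance are all assumptions made for this example.

```python
import numpy as np


def class_prototypes(X, y):
    """Return {label: mean feature vector} for every class in y.

    Using the class mean as the prototype is an assumption of this sketch.
    """
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}


def prototype_distance(X_a, y_a, X_b, y_b):
    """Mean Euclidean distance between prototypes of classes found in both datasets.

    Lower values suggest more similar datasets; 0.0 means identical prototypes.
    """
    protos_a = class_prototypes(X_a, y_a)
    protos_b = class_prototypes(X_b, y_b)
    shared = sorted(set(protos_a) & set(protos_b))
    if not shared:
        raise ValueError("datasets share no class labels")
    distances = [np.linalg.norm(protos_a[c] - protos_b[c]) for c in shared]
    return float(np.mean(distances))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = rng.integers(0, 2, size=200)

    # A dataset compared with itself, then with a copy shifted in every feature.
    print(prototype_distance(X, y, X, y))
    print(prototype_distance(X, y, X + 3.0, y))
```

Because only the prototypes are compared, two parties could exchange these summaries instead of raw records, which is one reason such representations are attractive in privacy-sensitive or federated settings. A score of this kind still ignores within-class spread and downstream task performance, so it is best read as a coarse first signal.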
Papers
September 11, 2024
April 29, 2024
April 15, 2024
March 31, 2024
August 7, 2023
March 23, 2023
September 5, 2022