Coreset Size

Research on coreset size aims to minimize the size of a coreset, a small weighted subset of the data that accurately approximates the original dataset for a given machine learning task. Current work develops coresets for regression, classification, clustering, and federated learning, typically using techniques such as sensitivity sampling, gradient matching, and greedy selection to obtain dimension-independent or near-optimal coreset sizes. Smaller coresets lower the computational cost of large-scale data analysis, so these results matter both for the theory and for the practice of machine learning.
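To make the idea of a weighted subset concrete, here is a minimal sketch of importance sampling in the spirit of sensitivity sampling, applied to a sum-of-squared-distances (k-means style) cost. It is an illustration under stated assumptions, not the method of any particular paper: the function names `lightweight_coreset` and `weighted_cost` are hypothetical, and the sampling distribution (half uniform, half proportional to squared distance from the data mean) is one simple choice among many.

```python
import random


def lightweight_coreset(points, m, seed=0):
    """Sample m weighted points from `points` (a list of equal-length tuples).

    Assumption: each point is drawn with probability
        q(x) = 0.5 / n + 0.5 * d(x, mean)^2 / sum_of_all_squared_distances,
    a simple stand-in for true sensitivity scores.
    """
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    sq_dists = [sum((p[j] - mean[j]) ** 2 for j in range(dim)) for p in points]
    total = sum(sq_dists)

    if total > 0:
        probs = [0.5 / n + 0.5 * d / total for d in sq_dists]
    else:
        # All points coincide, so fall back to uniform sampling.
        probs = [1.0 / n] * n

    rng = random.Random(seed)
    indices = rng.choices(range(n), weights=probs, k=m)

    # The weight 1 / (m * q) makes weighted sums unbiased estimates
    # of the corresponding sums over the full dataset.
    return [(points[i], 1.0 / (m * probs[i])) for i in indices]


def weighted_cost(weighted_points, center):
    """Weighted sum of squared distances from each point to `center`."""
    return sum(
        w * sum((x - c) ** 2 for x, c in zip(p, center))
        for p, w in weighted_points
    )


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
    coreset = lightweight_coreset(data, m=200)

    center = (0.5, -0.5)
    full = weighted_cost([(p, 1.0) for p in data], center)
    approx = weighted_cost(coreset, center)
    print(f"full cost: {full:.1f}, coreset cost: {approx:.1f}")
```

The coreset here has 200 points instead of 2000, so evaluating the cost for a candidate center is roughly ten times cheaper. The open question that coreset-size research addresses is how small `m` can be while keeping the approximation error within a chosen tolerance for every candidate solution, not just one.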

Papers