Large Datasets
Large datasets drive progress in machine learning, and research in this area focuses on efficiently managing, processing, and extracting insight from massive amounts of data. Current work develops scalable algorithms and model architectures, including those based on Gaussian processes, optimal transport, and hierarchical representations, to meet the computational and storage challenges these datasets pose. The aim is to improve the accuracy and generalizability of machine learning models across applications ranging from recommendation systems and natural language processing to medical image analysis and earth observation. Methods for data valuation, pruning, and distillation are also being explored to improve data quality and efficiency.
Papers
A Systematic Review of NeurIPS Dataset Management Practices
Yiwei Wu, Leah Ajmani, Shayne Longpre, Hanlin Li
Cycle-Constrained Adversarial Denoising Convolutional Network for PET Image Denoising: Multi-Dimensional Validation on Large Datasets with Reader Study and Real Low-Dose Data
Yucun Hou, Fenglin Zhan, Xin Cheng, Chenxi Li, Ziquan Yuan, Runze Liao, Haihao Wang, Jianlang Hua, Jing Wu, Jianyong Jiang