High Quality
High-quality data is essential to the success of machine learning models, which has driven research into efficient and reliable methods for creating, curating, and evaluating data. Current efforts develop novel algorithms and model architectures, such as diffusion models, generative adversarial networks (GANs), and large language models (LLMs), to improve data quality in domains including image generation, speech processing, and natural language processing. These advances improve the performance and reliability of machine learning systems and enable applications ranging from medical imaging to robotics. Robust evaluation metrics and automated quality control methods are also key areas of focus.
Papers
Transferring Knowledge from High-Quality to Low-Quality MRI for Adult Glioma Diagnosis
Yanguang Zhao, Long Bai, Zhaoxi Zhang, Yanan Wu, Mobarakol Islam, Hongliang Ren
Little Giants: Synthesizing High-Quality Embedding Data at Scale
Haonan Chen, Liang Wang, Nan Yang, Yutao Zhu, Ziliang Zhao, Furu Wei, Zhicheng Dou
CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models
Liangdong Wang, Bo-Wen Zhang, Chengwei Wu, Hanyu Zhao, Xiaofeng Shi, Shuhao Gu, Jijie Li, Quanyue Ma, TengFei Pan, Guang Liu