Diverse Datasets
Diverse datasets are crucial for training robust and unbiased machine learning models because they address the limitations of relying on homogeneous data. Current research focuses on creating and using such datasets across domains including computer vision (e.g., facial expressions, medical imaging), natural language processing (e.g., multilingual summarization, sentiment analysis), and audio processing (e.g., environmental sounds, speech), often with deep learning architectures such as convolutional, recurrent, and graph neural networks. The availability of these datasets is driving advances in model performance and fairness, with impact in fields ranging from healthcare diagnostics to social media content moderation.
Papers
Dereflection Any Image with Diffusion Priors and Diversified Data
Jichen Hu, Chen Yang, Zanwei Zhou, Jiemin Fang, Xiaokang Yang, Qi Tian, Wei Shen
Shanghai Jiao Tong University ● Huawei Inc.
TreeSynth: Synthesizing Diverse Data from Scratch via Tree-Guided Subspace Partitioning
Sheng Wang, Pengan Chen, Jingqi Zhou, Qintong Li, Jingwei Dong, Jiahui Gao, Boyang Xue, Jiyue Jiang, Lingpeng Kong, Chuan Wu
The University of Hong Kong ● Xi’an Jiaotong University ● The Chinese University of Hong Kong