Diverse Set
Diverse sets, encompassing varied data points or model outputs, are a central focus of current machine learning research, with the aim of improving model robustness, generalization, and explainability. Researchers are exploring diverse-set generation and evaluation across domains such as image generation, text generation, and audio captioning, employing techniques like diffusion models, contrastive learning, and gradient-based methods to achieve both diversity and quality in the outputs. This focus on diversity is crucial for addressing biases, enhancing model performance on underrepresented data, and improving the reliability and trustworthiness of AI systems in real-world applications. The development of new metrics and benchmarks for evaluating diversity is also a key area of ongoing work.
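As a concrete, deliberately simple illustration of what a diversity metric can look like, the sketch below scores a set of model outputs by the average pairwise cosine distance between their embeddings. This is a generic example for intuition only, not a metric proposed by any of the papers listed here, and the function and variable names are hypothetical.

```python
# Minimal sketch of a simple diversity measure: the average pairwise
# cosine distance between embeddings of a set of model outputs.
# Higher values indicate a more diverse set. Illustrative only.
import numpy as np

def average_pairwise_cosine_distance(embeddings: np.ndarray) -> float:
    """Mean cosine distance over all unordered pairs of embedding rows."""
    # Normalize each embedding to unit length.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Cosine similarity for every pair of outputs.
    sims = normed @ normed.T
    n = embeddings.shape[0]
    # Average similarity over the strict upper triangle (i < j).
    iu = np.triu_indices(n, k=1)
    mean_sim = sims[iu].mean()
    return 1.0 - mean_sim

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # e.g., 8 generated samples with hypothetical 256-dimensional embeddings
    outputs = rng.normal(size=(8, 256))
    print(f"diversity score: {average_pairwise_cosine_distance(outputs):.3f}")
```

In practice, the embedding model and the distance used matter as much as the aggregation; the papers below define their own task-specific notions of diversity and feasibility.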
Papers
RetroGFN: Diverse and Feasible Retrosynthesis using GFlowNets
Piotr Gaiński, Michał Koziarski, Krzysztof Maziarz, Marwin Segler, Jacek Tabor, Marek Śmieja
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong
G-DIG: Towards Gradient-based Diverse and High-quality Instruction Data Selection for Machine Translation
Xingyuan Pan, Luyang Huang, Liyan Kang, Zhicheng Liu, Yu Lu, Shanbo Cheng
Diverse and Effective Synthetic Data Generation for Adaptable Zero-Shot Dialogue State Tracking
James D. Finch, Jinho D. Choi