Faithful Generation
Faithful generation focuses on producing outputs (text, images, audio, code, or other data) that accurately reflect a given input or prompt, prioritizing correctness and adherence to specifications. Current research emphasizes improving the fidelity and controllability of generation across model architectures such as diffusion models, transformers, and variational autoencoders, often incorporating techniques like retrieval-augmented generation and multi-agent frameworks. The field matters for advancing AI capabilities across many domains, from improving large language model evaluation and human-computer interaction to producing more realistic synthetic data for training and analysis in the sciences.
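The retrieval-augmented generation technique mentioned above can be illustrated with a minimal sketch: passages relevant to a query are retrieved and prepended to the prompt so the generator stays grounded in source material rather than inventing content. The bag-of-words similarity, toy corpus, and prompt template below are illustrative stand-ins, not any specific system's implementation.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding': lowercase token counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=2):
    """Return the k corpus passages most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(query, corpus, k=2):
    """Ground the generator by prepending retrieved passages to the query."""
    context = "\n".join(f"- {p}" for p in retrieve(query, corpus, k))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\nQuestion: {query}")

corpus = [
    "Diffusion models generate data by iteratively denoising random noise.",
    "Transformers rely on self-attention over token sequences.",
    "Variational autoencoders learn a latent distribution over inputs.",
]
print(build_prompt("How do diffusion models generate data?", corpus, k=1))
```

In a real system the retriever would use learned dense embeddings and the assembled prompt would be passed to a generative model; the grounding pattern, however, is the same.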
Papers
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
Kun Su, Xiulong Liu, Eli Shlizerman
HardCore Generation: Generating Hard UNSAT Problems for Data Augmentation
Joseph Cotnareanu, Zhanguang Zhang, Hui-Ling Zhen, Yingxue Zhang, Mark Coates
GenesisTex2: Stable, Consistent and High-Quality Text-to-Texture Generation
Jiawei Lu, Yingpeng Zhang, Zengjun Zhao, He Wang, Kun Zhou, Tianjia Shao
Embodied-RAG: General non-parametric Embodied Memory for Retrieval and Generation
Quanting Xie, So Yeon Min, Tianyi Zhang, Aarav Bajaj, Ruslan Salakhutdinov, Matthew Johnson-Roberson, Yonatan Bisk
CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches
Sifan Wu, Amir Khasahmadi, Mor Katz, Pradeep Kumar Jayaraman, Yewen Pu, Karl Willis, Bang Liu
Simultaneous Music Separation and Generation Using Multi-Track Latent Diffusion Models
Tornike Karchkhadze, Mohammad Rasool Izadi, Shlomo Dubnov
Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models
Lorenzo Mandelli, Stefano Berretti
Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation
Haohan Guo, Fenglong Xie, Dongchao Yang, Xixin Wu, Helen Meng
VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos
Yan-Bo Lin, Yu Tian, Linjie Yang, Gedas Bertasius, Heng Wang
StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos
Sijie Zhao, Wenbo Hu, Xiaodong Cun, Yong Zhang, Xiaoyu Li, Zhe Kong, Xiangjun Gao, Muyao Niu, Ying Shan
TrialSynth: Generation of Synthetic Sequential Clinical Trial Data
Chufan Gao, Mandis Beigi, Afrah Shafquat, Jacob Aptekar, Jimeng Sun