Faithful Generation
Faithful generation focuses on creating outputs (text, images, audio, code, or other data) that accurately reflect a given input or prompt, prioritizing correctness and adherence to specifications. Current research aims to improve the fidelity and controllability of generation with model architectures such as diffusion models, transformers, and variational autoencoders, often combined with techniques like retrieval-augmented generation and multi-agent frameworks. The field matters across many domains, from improving large language model evaluation and enhancing human-computer interaction to creating more realistic synthetic data for training and analysis in scientific fields.
Papers
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
Ruiqi Li, Siqi Zheng, Xize Cheng, Ziang Zhang, Shengpeng Ji, Zhou Zhao
3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation
Dewei Zhou, Ji Xie, Zongxin Yang, Yi Yang
Retrieval-Reasoning Large Language Model-based Synthetic Clinical Trial Generation
Zerui Xu, Fang Wu, Tianfan Fu, Yue Zhao
Imagine2Servo: Intelligent Visual Servoing with Diffusion-Driven Goal Generation for Robotic Tasks
Pranjali Pathre, Gunjan Gupta, M. Nomaan Qureshi, Mandyam Brunda, Samarth Brahmbhatt, K. Madhava Krishna
Theoretical Analysis of Hierarchical Language Recognition and Generation by Transformers without Positional Encoding
Daichi Hayakawa, Issei Sato
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation
Huadai Liu, Jialei Wang, Rongjie Huang, Yang Liu, Heng Lu, Wei Xue, Zhou Zhao
EmotionCaps: Enhancing Audio Captioning Through Emotion-Augmented Data Generation
Mithun Manivannan, Vignesh Nethrapalli, Mark Cartwright (New Jersey Institute of Technology)
SEER: Self-Aligned Evidence Extraction for Retrieval-Augmented Generation
Xinping Zhao, Dongfang Li, Yan Zhong, Boren Hu, Yibin Chen, Baotian Hu, Min Zhang
ChatHouseDiffusion: Prompt-Guided Generation and Editing of Floor Plans
Sizhong Qin, Chengyu He, Qiaoyun Chen, Sen Yang, Wenjie Liao, Yi Gu, Xinzheng Lu
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
Shi Yu, Chaoyue Tang, Bokai Xu, Junbo Cui, Junhao Ran, Yukun Yan, Zhenghao Liu, Shuo Wang, Xu Han, Zhiyuan Liu, Maosong Sun
UniGEM: A Unified Approach to Generation and Property Prediction for Molecules
Shikun Feng, Yuyan Ni, Yan Lu, Zhi-Ming Ma, Wei-Ying Ma, Yanyan Lan
Skill Learning Using Process Mining for Large Language Model Plan Generation
Andrei Cosmin Redis, Mohammadreza Fani Sani, Bahram Zarrin, Andrea Burattin
Parameterize Structure with Differentiable Template for 3D Shape Generation
Changfeng Ma, Pengxiao Guo, Shuangyu Yang, Yinuo Chen, Jie Guo, Chongjun Wang, Yanwen Guo, Wenping Wang
Will the Inclusion of Generated Data Amplify Bias Across Generations in Future Image Classification Models?
Zeliang Zhang, Xin Liang, Mingqian Feng, Susan Liang, Chenliang Xu
M2M-Gen: A Multimodal Framework for Automated Background Music Generation in Japanese Manga Using Large Language Models
Megha Sharma, Muhammad Taimoor Haseeb, Gus Xia, Yoshimasa Tsuruoka
Multi class activity classification in videos using Motion History Image generation
Senthilkumar Gopal
ECIS-VQG: Generation of Entity-centric Information-seeking Questions from Videos
Arpan Phukan, Manish Gupta, Asif Ekbal
Quebec Automobile Insurance Question-Answering With Retrieval-Augmented Generation
David Beauchemin, Zachary Gagnon, Richard Khoury
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation
Guanting Dong, Xiaoshuai Song, Yutao Zhu, Runqi Qiao, Zhicheng Dou, Ji-Rong Wen
The Future of Learning in the Age of Generative AI: Automated Question Generation and Assessment with Large Language Models
Subhankar Maity, Aniket Deroy