Faithful Generation
Faithful generation focuses on creating outputs—text, images, audio, code, or other data—that accurately reflect a given input or prompt, prioritizing correctness and adherence to specifications. Current research emphasizes improving the fidelity and controllability of generation using various model architectures, including diffusion models, transformers, and variational autoencoders, often incorporating techniques like retrieval-augmented generation and multi-agent frameworks. This field is significant for advancing AI capabilities across numerous domains, from improving large language model evaluations and enhancing human-computer interaction to creating more realistic synthetic data for training and analysis in various scientific fields.
Papers
OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation
Weiqi Li, Shijie Zhao, Chong Mou, Xuhan Sheng, Zhenyu Zhang, Qian Wang, Junlin Li, Li Zhang, Jian Zhang
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
Hao Li, Changyao Tian, Jie Shao, Xizhou Zhu, Zhaokai Wang, Jinguo Zhu, Wenhan Dou, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai
On the Generation and Removal of Speaker Adversarial Perturbation for Voice-Privacy Protection
Chenyang Guo, Liping Chen, Zhuhai Li, Kong Aik Lee, Zhen-Hua Ling, Wu Guo
Greek2MathTex: A Greek Speech-to-Text Framework for LaTeX Equations Generation
Evangelia Gkritzali, Panagiotis Kaliosis, Sofia Galanaki, Elisavet Palogiannidi, Theodoros Giannakopoulos
ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation
Daniel Winter, Asaf Shul, Matan Cohen, Dana Berman, Yael Pritch, Alex Rav-Acha, Yedid Hoshen
Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation
Hongming Guo, Ruibo Fu, Yizhong Geng, Shuai Liu, Shuchen Shi, Tao Wang, Chunyu Qiang, Chenxing Li, Ya Li, Zhengqi Wen, Yukun Liu, Xuefei Liu
Learning Flow Fields in Attention for Controllable Person Image Generation
Zijian Zhou, Shikun Liu, Xiao Han, Haozhe Liu, Kam Woh Ng, Tian Xie, Yuren Cong, Hang Li, Mengmeng Xu, Juan-Manuel Pérez-Rúa, Aditya Patel, Tao Xiang, Miaojing Shi, Sen He
Multi-GraspLLM: A Multimodal LLM for Multi-Hand Semantic Guided Grasp Generation
Haosheng Li, Weixin Mao, Weipeng Deng, Chenyu Meng, Haoqiang Fan, Tiancai Wang, Ping Tan, Hongan Wang, Xiaoming Deng
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
Xiao Fu, Xian Liu, Xintao Wang, Sida Peng, Menghan Xia, Xiaoyu Shi, Ziyang Yuan, Pengfei Wan, Di Zhang, Dahua Lin
PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation
Fatemeh Nazarieh, Zhenhua Feng, Diptesh Kanojia, Muhammad Awais, Josef Kittler
TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation
Alfredo Garrachón Ruiz, Tomás de la Rosa, Daniel Borrajo
Label-Confidence-Aware Uncertainty Estimation in Natural Language Generation
Qinhong Lin, Linna Zhou, Zhongliang Yang, Yuang Cai
Motion-aware Contrastive Learning for Temporal Panoptic Scene Graph Generation
Thong Thanh Nguyen, Xiaobao Wu, Yi Bin, Cong-Duy T Nguyen, See-Kiong Ng, Anh Tuan Luu
JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM
Takuro Fujii, Satoru Katsumata
MuMu-LLaMA: Multi-modal Music Understanding and Generation via Large Language Models
Shansong Liu, Atin Sakkeer Hussain, Qilong Wu, Chenshuo Sun, Ying Shan
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations
Mingjie Xu, Mengyang Wu, Yuzhi Zhao, Jason Chun Lok Li, Weifeng Ou
VidMusician: Video-to-Music Generation with Semantic-Rhythmic Alignment via Hierarchical Visual Features
Sifei Li, Binxin Yang, Chunji Yin, Chong Sun, Yuxin Zhang, Weiming Dong, Chen Li
Towards Long Video Understanding via Fine-detailed Video Story Generation
Zeng You, Zhiquan Wen, Yaofo Chen, Xin Li, Runhao Zeng, Yaowei Wang, Mingkui Tan
Anti-Reference: Universal and Immediate Defense Against Reference-Based Generation
Yiren Song, Shengtao Lou, Xiaokang Liu, Hai Ci, Pei Yang, Jiaming Liu, Mike Zheng Shou
Language-Guided Image Tokenization for Generation
Kaiwen Zha, Lijun Yu, Alireza Fathi, David A. Ross, Cordelia Schmid, Dina Katabi, Xiuye Gu