Faithful Generation
Faithful generation focuses on creating outputs (text, images, audio, code, or other data) that accurately reflect a given input or prompt, prioritizing correctness and adherence to specifications. Current research emphasizes improving the fidelity and controllability of generation across model architectures such as diffusion models, transformers, and variational autoencoders, often incorporating techniques like retrieval-augmented generation (sketched below) and multi-agent frameworks. The field is significant for advancing AI capabilities across numerous domains, from improving large language model evaluation and enhancing human-computer interaction to creating more realistic synthetic data for training and analysis in the sciences.
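Since the summary names retrieval-augmented generation as a fidelity technique, here is a minimal sketch of the retrieve-then-prompt pattern it refers to. The toy corpus, bag-of-words retriever, and prompt template are illustrative assumptions, not the method of any paper listed here.

```python
# Minimal retrieval-augmented generation (RAG) sketch: retrieve the passages
# most similar to a query, then ground the model's prompt in that evidence.
# Corpus, scoring, and prompt format are all illustrative assumptions.
from collections import Counter
import math

CORPUS = [
    "Diffusion models iteratively denoise samples toward the data distribution.",
    "Video tokenizers compress frames into discrete tokens for language models.",
    "Layout-to-image generation conditions synthesis on bounding-box layouts.",
]

def bow(text: str) -> Counter:
    """Lowercased bag-of-words vector for a toy cosine-similarity retriever."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k corpus passages most similar to the query."""
    q = bow(query)
    return sorted(CORPUS, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Prepend retrieved evidence so generation stays faithful to sources."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("How do diffusion models generate samples?"))
```

Running this prints a prompt grounded in the best-matching passage; a real system would swap the toy retriever for a dense or TF-IDF index over a large document store and pass the prompt to a generator model.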
Papers
DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models
Yizhuo Li, Yuying Ge, Yixiao Ge, Ping Luo, Ying Shan
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
Yuying Ge, Yizhuo Li, Yixiao Ge, Ying Shan
Multi-Scale Node Embeddings for Graph Modeling and Generation
Riccardo Milocco, Fabian Jansen, Diego Garlaschelli
BodyMetric: Evaluating the Realism of Human Bodies in Text-to-Image Generation
Nefeli Andreou, Varsha Vivek, Ying Wang, Alex Vorobiov, Tiffany Deng, Raja Bala, Larry Davis, Betty Mohler Tesch
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
Yifan Lu, Xuanchi Ren, Jiawei Yang, Tianchang Shen, Zhangjie Wu, Jun Gao, Yue Wang, Siheng Chen, Mike Chen, Sanja Fidler, Jiahui Huang
DiffSign: AI-Assisted Generation of Customizable Sign Language Videos With Enhanced Realism
Sudha Krishnamurthy, Vimal Bhat, Abhinav Jain
CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
Hui Zhang, Dexiang Hong, Tingwei Gao, Yitong Wang, Jie Shao, Xinglong Wu, Zuxuan Wu, Yu-Gang Jiang
Style3D: Attention-guided Multi-view Style Transfer for 3D Object Generation
Bingjie Song, Xin Huang, Ruting Xie, Xue Wang, Qing Wang
DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
Qingdong He, Jinlong Peng, Pengcheng Xu, Boyuan Jiang, Xiaobin Hu, Donghao Luo, Yong Liu, Yabiao Wang, Chengjie Wang, Xiangtai Li, Jiangning Zhang
Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation
Gianni Franchi, Dat Nguyen Trong, Nacim Belkhir, Guoxuan Xia, Andrea Pilzer
PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
Qihan Huang, Long Chan, Jinlong Liu, Wanggui He, Hao Jiang, Mingli Song, Jie Song
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
Liao Qu, Huichao Zhang, Yiheng Liu, Xu Wang, Yi Jiang, Yiming Gao, Hu Ye, Daniel K. Du, Zehuan Yuan, Xinglong Wu
SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation
Alexey Bokhovkin, Quan Meng, Shubham Tulsiani, Angela Dai
XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
Xiang Li, Kai Qiu, Hao Chen, Jason Kuen, Jiuxiang Gu, Jindong Wang, Zhe Lin, Bhiksha Raj
Using Large Language Models in Automatic Hint Ranking and Generation Tasks
Jamshid Mozafari, Florian Gerhold, Adam Jatowt
MBA-RAG: a Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity
Xiaqiang Tang, Qiang Gao, Jian Li, Nan Du, Qi Li, Sihong Xie
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Shufan Li, Konstantinos Kallidromitis, Akash Gokul, Zichun Liao, Yusuke Kato, Kazuki Kozuka, Aditya Grover