Text Modality
Text modality research explores how textual information can be integrated with other data modalities (e.g., images, audio, video) to improve the performance and capabilities of AI models. Current research focuses on multimodal models built on transformer architectures and diffusion models, often incorporating techniques such as prompt tuning and meta-learning to enhance controllability and generalization. This work matters because it enables AI systems that can understand and generate complex information across multiple data types, with applications ranging from improved medical diagnosis to more realistic virtual environments.
Papers
Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance
Jinbo Xing, Menghan Xia, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu, Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong
FigGen: Text to Scientific Figure Generation
Juan A Rodriguez, David Vazquez, Issam Laradji, Marco Pedersoli, Pau Rodriguez
Don't Retrain, Just Rewrite: Countering Adversarial Perturbations by Rewriting Text
Ashim Gupta, Carter Wood Blum, Temma Choji, Yingjie Fei, Shalin Shah, Alakananda Vempala, Vivek Srikumar
Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models
Xingqian Xu, Jiayi Guo, Zhangyang Wang, Gao Huang, Irfan Essa, Humphrey Shi
RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting
Lei Shu, Liangchen Luo, Jayakumar Hoskere, Yun Zhu, Yinxiao Liu, Simon Tong, Jindong Chen, Lei Meng
Multi-modal Machine Learning for Vehicle Rating Predictions Using Image, Text, and Parametric Data
Hanqi Su, Binyang Song, Faez Ahmed
Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data
Petar Ivanov, Ivan Koychev, Momchil Hardalov, Preslav Nakov
Alt-Text with Context: Improving Accessibility for Images on Twitter
Nikita Srivatsan, Sofia Samaniego, Omar Florez, Taylor Berg-Kirkpatrick
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
Woojeong Jin, Subhabrata Mukherjee, Yu Cheng, Yelong Shen, Weizhu Chen, Ahmed Hassan Awadallah, Damien Jose, Xiang Ren
Enabling Large Language Models to Generate Text with Citations
Tianyu Gao, Howard Yen, Jiatong Yu, Danqi Chen
Vision + Language Applications: A Survey
Yutong Zhou, Nobutaka Shimada