Image to Text Task

Image-to-text tasks aim to automatically generate textual descriptions of images, an area of artificial intelligence that bridges computer vision and natural language processing. Current research focuses on improving model accuracy and robustness, particularly with transformer-based architectures such as VL-BART and VL-T5. It also addresses challenges such as adversarial and backdoor attacks, and checks that generated text is semantically aligned with the image content. These advances matter for applications including social media analysis, content generation, and accessibility technologies. They also motivate ongoing work on model efficiency, for example parameter-efficient transfer learning, and on model security.

Papers

October 2, 2024

Backdooring Vision-Language Models with Out-Of-Distribution Data
Weimin Lyu, Jiachen Yao, Saumya Gupta, Lu Pang, Tao Sun, Lingjie Yi, Lijie Hu, Haibin Ling, Chao Chen
Vision Language Model, Backdoor Attack, Multimodal Model, Distribution Data, Image to Text Task

September 14, 2023

Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks
Danae Sánchez Villegas, Daniel Preoţiuc-Pietro, Nikolaos Aletras
Social Media, Multimodal Model, Contrastive Language Image, Multimodal Information, Multimodal Classification, Image Text Representation, Image to Text Task

August 3, 2023

Multimodal Neurons in Pretrained Text-Only Transformers
Sarah Schwettmann, Neil Chowdhury, Samuel Klein, David Bau, Antonio Torralba
Language Model, Transformer, Megatron, Decepticons, Pre Trained, Visual Representation, Individual Neuron, Image to Text Task

June 13, 2023

I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models
Raz Lapid, Moshe Sipper
Adversarial Example, Image to Text, Gray Box, Image to Text Task

May 17, 2023

What You See is What You Read? Improving Text-Image Alignment Evaluation
Michal Yarom, Yonatan Bitton, Soravit Changpinyo, Roee Aharoni, Jonathan Herzig, Oran Lang, Eran Ofek, Idan Szpektor
Image Text Pair, Image Text Alignment, Image Alignment, Image to Text Generation, Image to Text Task

January 5, 2023

Adaptively Clustering Neighbor Elements for Image-Text Generation
Zihua Wang, Xu Yang, Hanwang Zhang, Haiyang Xu, Ming Yan, Fei Huang, Yu Zhang
Neighborhood Selection, Image to Text Generation, Image to Text Task, Phrase Alignment

October 20, 2022

Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation
Yu Zhao, Jianguo Wei, Zhichao Lin, Yueheng Sun, Meishan Zhang, Min Zhang
Image to Text, Image to Text Generation, Image to Text Task, Visual Spatial Description

December 13, 2021

VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
Yi-Lin Sung, Jaemin Cho, Mohit Bansal
Language Model, Vision Language Task, Video Text, Parameter Efficient Transfer Learning, Video Text Task, SAM2 Adapter, Parameter Efficient Adapter, Image to Text Task