Text Modality
Text modality research explores how textual information can be effectively integrated with other data modalities (e.g., images, audio, video) to improve the performance and capabilities of AI models. Current research focuses on multimodal transformer and diffusion architectures, often incorporating techniques such as prompt tuning and meta-learning to enhance controllability and generalization. This work matters because it enables AI systems that can understand and generate complex information across data types, with applications ranging from medical diagnosis to more realistic virtual environments.
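To make the prompt-tuning idea mentioned above concrete, here is a minimal sketch of soft prompt tuning in PyTorch: a small set of trainable prompt vectors is prepended to the input embeddings while the pretrained backbone stays frozen. This is an illustrative sketch, not the method of any paper listed below; the class name SoftPromptModel and all hyperparameters are assumptions.

    import torch
    import torch.nn as nn

    class SoftPromptModel(nn.Module):
        """Wraps a frozen backbone with a small set of trainable soft prompts."""

        def __init__(self, backbone: nn.Module, embed_dim: int, n_prompts: int = 8):
            super().__init__()
            self.backbone = backbone
            for p in self.backbone.parameters():
                p.requires_grad = False  # backbone stays frozen; only prompts train
            # Trainable prompt vectors, prepended to every input sequence.
            self.prompts = nn.Parameter(torch.randn(n_prompts, embed_dim) * 0.02)

        def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
            # token_embeds: (batch, seq_len, embed_dim)
            batch = token_embeds.size(0)
            prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
            return self.backbone(torch.cat([prompts, token_embeds], dim=1))

    # Toy usage with a stand-in transformer encoder as the frozen backbone.
    layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    backbone = nn.TransformerEncoder(layer, num_layers=2)
    model = SoftPromptModel(backbone, embed_dim=64)
    out = model(torch.randn(2, 10, 64))  # shape (2, 18, 64): 8 prompts + 10 tokens

Because only the prompt vectors receive gradients, adapting the model to a new task costs a few thousand parameters rather than retraining the backbone, which is what makes the technique attractive for the controllability and generalization goals described above.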
Papers
Text-guided Image Restoration and Semantic Enhancement for Text-to-Image Person Retrieval
Delong Liu, Haiwen Li, Zhicheng Zhao, Yuan Dong, Nikolaos V. Boulgouris
Multimodal Machine Learning for Extraction of Theorems and Proofs in the Scientific Literature
Shrey Mishra, Antoine Gauquier, Pierre Senellart
ActionPrompt: Action-Guided 3D Human Pose Estimation With Text and Pose Prompting
Hongwei Zheng, Han Li, Bowen Shi, Wenrui Dai, Botao Wan, Yu Sun, Min Guo, Hongkai Xiong
Text + Sketch: Image Compression at Ultra Low Rates
Eric Lei, Yiğit Berkay Uslu, Hamed Hassani, Shirin Saeedi Bidokhti
Dipping PLMs Sauce: Bridging Structure and Text for Effective Knowledge Graph Completion via Conditional Soft Prompting
Chen Chen, Yufei Wang, Aixin Sun, Bing Li, Kwok-Yan Lam
Racial Bias Trends in the Text of US Legal Opinions
Rohan Jinturkar