Text Modality
Text modality research explores how textual information can be effectively integrated with other data modalities (e.g., images, audio, video) to improve the performance and capabilities of AI models. Current research focuses on developing multimodal models using transformer architectures and diffusion models, often incorporating techniques like prompt tuning and meta-learning to enhance controllability and generalization. This work is significant because it enables more sophisticated AI systems capable of understanding and generating complex information across various data types, with applications ranging from improved medical diagnosis to more realistic virtual environments.
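Of the techniques mentioned above, prompt tuning is simple enough to sketch: a small set of learnable "soft prompt" vectors is prepended to the frozen model's input embeddings, and only those vectors are updated during training. The following is a minimal, framework-free illustration; all names, dimensions, and the stand-in embedding function are assumptions for demonstration, not taken from any of the listed papers.

```python
import random

EMBED_DIM = 8          # illustrative embedding size
NUM_PROMPT_TOKENS = 4  # length of the learnable soft prompt

# The soft-prompt vectors are the only parameters prompt tuning trains;
# the backbone model (and its embedding table) stays frozen.
random.seed(0)
soft_prompt = [[random.gauss(0.0, 0.02) for _ in range(EMBED_DIM)]
               for _ in range(NUM_PROMPT_TOKENS)]

def embed_tokens(token_ids, vocab_size=100):
    """Stand-in for a frozen embedding table: deterministic per-token vectors."""
    return [[((t * 31 + d) % vocab_size) / vocab_size for d in range(EMBED_DIM)]
            for t in token_ids]

def prepend_soft_prompt(token_ids):
    """Build the model input: [soft-prompt vectors] + [frozen token embeddings]."""
    return soft_prompt + embed_tokens(token_ids)

# A 3-token input becomes a 7-vector sequence (4 prompt + 3 token embeddings).
inputs = prepend_soft_prompt([5, 17, 42])
print(len(inputs))
```

In a real setting the concatenated sequence would be fed to a frozen transformer, and gradients would flow only into `soft_prompt`, which is what makes the method cheap to train per task.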
Papers
Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities
Chayan Tank, Sarthak Pol, Vinayak Katoch, Shaina Mehta, Avinash Anand, Rajiv Ratn Shah
Epistemological Bias As a Means for the Automated Detection of Injustices in Text
Kenya Andrews, Lamogha Chiazor
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong, Yunji Kim, Sanghyuk Chun, Jiyoung Lee