Text Modality
Text modality research explores how textual information can be effectively integrated with other data modalities (e.g., images, audio, video) to improve the performance and capabilities of AI models. Current research focuses on developing multimodal models using transformer architectures and diffusion models, often incorporating techniques like prompt tuning and meta-learning to enhance controllability and generalization. This work is significant because it enables more sophisticated AI systems capable of understanding and generating complex information across various data types, with applications ranging from improved medical diagnosis to more realistic virtual environments.
Papers
This Paper Had the Smartest Reviewers -- Flattery Detection Utilising an Audio-Textual Transformer-Based Approach
Lukas Christ, Shahin Amiriparian, Friederike Hawighorst, Ann-Kathrin Schill, Angelo Boutalikakis, Lorenz Graf-Vlachy, Andreas König, Björn W. Schuller
Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text
Xinyang Li, Zhangyu Lai, Linning Xu, Yansong Qu, Liujuan Cao, Shengchuan Zhang, Bo Dai, Rongrong Ji
Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights
Hao Yang, Lizhen Qu, Ehsan Shareghi, Gholamreza Haffari
A Text is Worth Several Tokens: Text Embedding from LLMs Secretly Aligns Well with The Key Tokens
Zhijie Nie, Richong Zhang, Zhanyu Wu
MatText: Do Language Models Need More than Text & Scale for Materials Modeling?
Nawaf Alampara, Santiago Miret, Kevin Maik Jablonka
Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods
Kathleen C. Fraser, Hillary Dawkins, Svetlana Kiritchenko
Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations
Lichao Zhang, Jia Yu, Shuai Zhang, Long Li, Yangyang Zhong, Guanbao Liang, Yuming Yan, Qing Ma, Fangsheng Weng, Fayu Pan, Jing Li, Renjun Xu, Zhenzhong Lan
Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models through Question Answering from Text to Video
Zhengbang Yang, Haotian Xia, Jingxi Li, Zezhi Chen, Zhuangdi Zhu, Weining Shen
Is This a Bad Table? A Closer Look at the Evaluation of Table Generation from Text
Pritika Ramu, Aparna Garimella, Sambaran Bandyopadhyay