Text Modality
Text modality research explores how textual information can be effectively integrated with other data modalities (e.g., images, audio, video) to improve the performance and capabilities of AI models. Current research focuses on developing multimodal models using transformer architectures and diffusion models, often incorporating techniques like prompt tuning and meta-learning to enhance controllability and generalization. This work is significant because it enables more sophisticated AI systems capable of understanding and generating complex information across various data types, with applications ranging from improved medical diagnosis to more realistic virtual environments.
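The overview above mentions transformer-based multimodal fusion and prompt tuning; the following is a minimal sketch of how those two ideas are commonly combined, assuming hypothetical pre-computed text and image features and illustrative module names and dimensions (none of them come from the specific papers listed below).

```python
# Minimal sketch (PyTorch): text and image features are projected into a
# shared width, a few learnable "soft prompt" vectors are prepended
# (prompt tuning), and a small transformer encoder fuses the modalities.
# All names and sizes are illustrative assumptions, not any paper's API.
import torch
import torch.nn as nn

class PromptedMultimodalEncoder(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, d_model=256,
                 n_prompts=8, n_layers=2, n_heads=4):
        super().__init__()
        # Per-modality projections into a shared embedding space.
        self.text_proj = nn.Linear(text_dim, d_model)
        self.image_proj = nn.Linear(image_dim, d_model)
        # Learnable soft prompts: these can be trained while the backbone
        # encoders that produced the input features stay frozen.
        self.prompts = nn.Parameter(torch.randn(n_prompts, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, text_feats, image_feats):
        # text_feats: (B, T_text, text_dim); image_feats: (B, T_img, image_dim)
        batch = text_feats.size(0)
        tokens = torch.cat([
            self.prompts.unsqueeze(0).expand(batch, -1, -1),
            self.text_proj(text_feats),
            self.image_proj(image_feats),
        ], dim=1)
        fused = self.fusion(tokens)   # (B, n_prompts + T_text + T_img, d_model)
        return fused.mean(dim=1)      # pooled joint representation

# Usage with random tensors standing in for frozen encoder outputs.
model = PromptedMultimodalEncoder()
text = torch.randn(2, 16, 768)
image = torch.randn(2, 49, 512)
print(model(text, image).shape)  # torch.Size([2, 256])
```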
Papers
Leveraging Pre-trained AudioLDM for Text to Sound Generation: A Benchmark Study
Yi Yuan, Haohe Liu, Jinhua Liang, Xubo Liu, Mark D. Plumbley, Wenwu Wang
Adaptive Knowledge Distillation between Text and Speech Pre-trained Models
Jinjie Ni, Yukun Ma, Wen Wang, Qian Chen, Dianwen Ng, Han Lei, Trung Hieu Nguyen, Chong Zhang, Bin Ma, Erik Cambria
Joint Representations of Text and Knowledge Graphs for Retrieval and Evaluation
Teven Le Scao, Claire Gardent
Automatically Classifying Emotions based on Text: A Comparative Exploration of Different Datasets
Anna Koufakou, Jairo Garciga, Adam Paul, Joseph Morelli, Christopher Frank
UniFLG: Unified Facial Landmark Generator from Text or Speech
Kentaro Mitsui, Yukiya Hono, Kei Sawada