Text Modality
Text modality research explores how textual information can be effectively integrated with other data modalities (e.g., images, audio, video) to improve the performance and capabilities of AI models. Current research focuses on developing multimodal models using transformer architectures and diffusion models, often incorporating techniques like prompt tuning and meta-learning to enhance controllability and generalization. This work is significant because it enables more sophisticated AI systems capable of understanding and generating complex information across various data types, with applications ranging from improved medical diagnosis to more realistic virtual environments.
Papers
Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François Germain, Jonathan Le Roux, Shinji Watanabe
Contextualising Levels of Language Resourcedness affecting Digital Processing of Text
C. Maria Keet, Langa Khumalo
ALLURE: Auditing and Improving LLM-based Evaluation of Text using Iterative In-Context-Learning
Hosein Hasanbeig, Hiteshi Sharma, Leo Betthauser, Felipe Vieira Frujeri, Ida Momennejad
A Text Classification-Based Approach for Evaluating and Enhancing the Machine Interpretability of Building Codes
Zhe Zheng, Yu-Cheng Zhou, Ke-Yin Chen, Xin-Zheng Lu, Zhong-Tian She, Jia-Rui Lin