Text Modality
Text modality research explores how text can be combined with other data modalities (e.g., images, audio, video) to improve the performance and capabilities of AI models. Current work focuses on multimodal models built on transformer and diffusion architectures, often using techniques such as prompt tuning and meta-learning to improve controllability and generalization. This research matters because it enables AI systems that can understand and generate complex information across data types, with applications ranging from improved medical diagnosis to more realistic virtual environments.
Papers
Towards an On-device Agent for Text Rewriting
Yun Zhu, Yinxiao Liu, Felix Stahlberg, Shankar Kumar, Yu-hui Chen, Liangchen Luo, Lei Shu, Renjie Liu, Jindong Chen, Lei Meng
Few-shot Anomaly Detection in Text with Deviation Learning
Anindya Sundar Das, Aravind Ajay, Sriparna Saha, Monowar Bhuyan
CiteTracker: Correlating Image and Text for Visual Tracking
Xin Li, Yuqing Huang, Zhenyu He, Yaowei Wang, Huchuan Lu, Ming-Hsuan Yang
Universal Defensive Underpainting Patch: Making Your Text Invisible to Optical Character Recognition
JiaCheng Deng, Li Dong, Jiahao Chen, Diqun Yan, Rangding Wang, Dengpan Ye, Lingchen Zhao, Jinyu Tian
Text2KGBench: A Benchmark for Ontology-Driven Knowledge Graph Generation from Text
Nandana Mihindukulasooriya, Sanju Tiwari, Carlos F. Enguix, Kusum Lata