Perceptual Information
Perceptual information research studies how sensory inputs are processed and interpreted to form our experience of the world, in both biological and artificial systems. Current work emphasizes integrating perceptual knowledge across modalities (vision, audio, language), using multimodal learning with vision transformers, diffusion models, and large language models to improve performance on tasks such as object recognition, image classification, and cross-modal retrieval. These advances matter both for building AI systems that understand and interact with the world in a more human-like way and for gaining deeper insight into the neural mechanisms underlying human perception.
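To make the cross-modal retrieval task concrete, here is a minimal sketch of CLIP-style retrieval using the Hugging Face transformers library: an image and several candidate captions are embedded into a shared space and ranked by similarity. The checkpoint name is a publicly available example model, and "photo.jpg" is a placeholder path; this illustrates the general technique rather than any method from the papers listed below.

```python
# Minimal sketch: score candidate captions against one image by comparing
# their embeddings in a shared vision-language space (CLIP-style retrieval).
# Assumes the `transformers` and `Pillow` packages are installed;
# "photo.jpg" is a placeholder image path.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")
captions = ["a photo of a cat", "a photo of a dog", "a street at night"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax ranks the captions.
probs = outputs.logits_per_image.softmax(dim=-1)
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.3f}  {caption}")
```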
Papers
Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding
Zirui Shao, Chuwei Luo, Zhaoqing Zhu, Hangdi Xing, Zhi Yu, Qi Zheng, Jiajun Bu
Understanding Audiovisual Deepfake Detection: Techniques, Challenges, Human Factors and Perceptual Insights
Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang
Stitching Gaps: Fusing Situated Perceptual Knowledge with Vision Transformers for High-Level Image Classification
Delfina Sol Martinez Pandiani, Nicolas Lazzari, Valentina Presutti
Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition
Boyu Chen, Siran Chen, Kunchang Li, Qinglin Xu, Yu Qiao, Yali Wang