Visual Feature
Visual features are fundamental to computer vision: they capture the meaningful information in an image that tasks such as object recognition, image retrieval, and scene understanding depend on. Current research emphasizes making feature extraction more robust, particularly to variations in illumination and viewpoint. It often employs deep learning models such as transformers and convolutional neural networks, together with techniques like self-supervised learning and multimodal fusion, which integrates information from other modalities (e.g., text, audio). This work advances applications in fields including robotics, medical imaging, and accessibility technologies by enabling more accurate, reliable, and interpretable computer vision systems.
Papers
The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023
Yurui Huang, Yang Yang, Shou Chen, Xiangyu Wu, Qingguo Chen, Jianfeng Lu
From Category to Scenery: An End-to-End Framework for Multi-Person Human-Object Interaction Recognition in Videos
Tanqiu Qiao, Ruochen Li, Frederick W. B. Li, Hubert P. H. Shum
TExplain: Explaining Learned Visual Features via Pre-trained (Frozen) Language Models
Saeid Asgari Taghanaki, Aliasghar Khani, Ali Saheb Pasand, Amir Khasahmadi, Aditya Sanghi, Karl D. D. Willis, Ali Mahdavi-Amiri
AAN: Attributes-Aware Network for Temporal Action Detection
Rui Dai, Srijan Das, Michael S. Ryoo, Francois Bremond