Computer Vision Task
Computer vision research aims to enable computers to "see" and interpret images, addressing problems such as object recognition, scene understanding, and image manipulation. Current efforts concentrate on making models robust to adverse conditions (e.g., low light, bad weather), handling out-of-distribution data, and improving efficiency through model compression techniques such as pruning and quantization. Transformer-based architectures and state-space models are prominent, alongside ongoing exploration of generative AI for data augmentation and of language models applied to visual tasks. These advances matter for applications ranging from autonomous driving and medical image analysis to industrial automation and aerospace missions.
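As a rough illustration of the two compression techniques named above, the following sketch applies magnitude pruning and symmetric 8-bit quantization to a flat list of weights. It is not taken from any of the listed papers; the function names (magnitude_prune, quantize_int8, dequantize) and the sample weights are hypothetical, and practical systems usually operate on tensors with per-layer or per-channel scales.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    # Indices ordered by absolute value, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric uniform quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q, scale):
    """Map quantized integers back to approximate float weights."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only for illustration
weights = [0.02, -1.27, 0.5, -0.01, 0.9, 0.003]
pruned = magnitude_prune(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
```

Pruning keeps only the largest-magnitude half of the weights here, and quantization then stores each surviving weight as a small integer plus one shared scale, so each restored value differs from its pruned original by at most half a quantization step.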
Papers
The Key of Understanding Vision Tasks: Explanatory Instructions
Yang Shen, Xiu-Shen Wei, Yifan Sun, Yuxin Song, Tao Yuan, Jian Jin, Heyang Xu, Yazhou Yao, Errui Ding
Semantics Disentanglement and Composition for Versatile Codec toward both Human-eye Perception and Machine Vision Task
Jinming Liu, Yuntao Wei, Junyan Lin, Shengyang Zhao, Heming Sun, Zhibo Chen, Wenjun Zeng, Xin Jin
HyperCLIP: Adapting Vision-Language models with Hypernetworks
Victor Akinwande, Mohammad Sadegh Norouzzadeh, Devin Willmott, Anna Bair, Madan Ravi Ganesh, J. Zico Kolter
ImagePiece: Content-aware Re-tokenization for Efficient Image Recognition
Seungdong Yoa, Seungjun Lee, Hyeseung Cho, Bumsoo Kim, Woohyung Lim