Vision Task
Vision tasks, which span image and video analysis across many applications, are a central focus of computer vision research. Current work concentrates on improving model efficiency and robustness, particularly through multi-task learning, novel architectures such as Vision Transformers and state-space models, and human feedback that better aligns model outputs with user preferences. These advances are driving progress in image compression for machine learning pipelines, multi-image understanding, and more robust and fair models for real-world deployment.
Papers
The Key of Understanding Vision Tasks: Explanatory Instructions
Yang Shen, Xiu-Shen Wei, Yifan Sun, Yuxin Song, Tao Yuan, Jian Jin, Heyang Xu, Yazhou Yao, Errui Ding
Semantics Disentanglement and Composition for Versatile Codec toward both Human-eye Perception and Machine Vision Task
Jinming Liu, Yuntao Wei, Junyan Lin, Shengyang Zhao, Heming Sun, Zhibo Chen, Wenjun Zeng, Xin Jin
YOLOv11: An Overview of the Key Architectural Enhancements
Rahima Khanam, Muhammad Hussain
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu, Yuhang Zang, Xiaoyi Dong, Pan Zhang, Yuhang Cao, Haodong Duan, Conghui He, Yuanjun Xiong, Dahua Lin, Jiaqi Wang