Vision Task
Vision tasks span image and video analysis across diverse applications and are a central focus of computer vision research. Current efforts concentrate on improving model efficiency and robustness, particularly through multi-task learning, novel architectures such as Vision Transformers and state-space models, and human feedback that better aligns models with user preferences. These advances are driving progress in image compression for machine learning pipelines, multi-image understanding, and more robust and fair models for real-world deployment.
Papers
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen
Parameter-Efficient Active Learning for Foundational models
Athmanarayanan Lakshmi Narayanan, Ranganath Krishnan, Amrutha Machireddy, Mahesh Subedar
Fusion of regional and sparse attention in Vision Transformers
Nabil Ibtehaz, Ning Yan, Masood Mortazavi, Daisuke Kihara
Nacala-Roof-Material: Drone Imagery for Roof Detection, Classification, and Segmentation to Support Mosquito-borne Disease Risk Assessment
Venkanna Babu Guthula, Stefan Oehmcke, Remigio Chilaule, Hui Zhang, Nico Lang, Ankit Kariryaa, Johan Mottelson, Christian Igel
MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
Xingkui Zhu, Yiran Guan, Dingkang Liang, Yuchao Chen, Yuliang Liu, Xiang Bai