Vision Paper
Vision research currently focuses on developing robust, efficient methods for processing and understanding visual information, often integrating it with other modalities such as language and touch. Key areas include improving the accuracy and efficiency of transformer-based models and exploring alternatives such as Mamba and structured state-space models, across tasks ranging from object detection and segmentation to navigation and scene understanding. This work is driven by the demands of robotics, autonomous systems, medical image analysis, and assistive technologies, with a strong emphasis on challenges such as limited data, computational cost, and generalization to unseen scenarios.
Papers
Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations
Jianfeng Chi, Ujjwal Karn, Hongyuan Zhan, Eric Smith, Javier Rando, Yiming Zhang, Kate Plawiak, Zacharie Delpierre Coudert, Kartikeya Upasani, Mahesh Pasupuleti
Fill in the blanks: Rethinking Interpretability in vision
Pathirage N. Deelaka, Tharindu Wickremasinghe, Devin Y. De Silva, Lisara N. Gajaweera
Reducing Hyperparameter Tuning Costs in ML, Vision and Language Model Training Pipelines via Memoization-Awareness
Abdelmajid Essofi, Ridwan Salahuddeen, Munachiso Nwadike, Elnura Zhalieva, Kun Zhang, Eric Xing, Willie Neiswanger, Qirong Ho
To Ask or Not to Ask? Detecting Absence of Information in Vision and Language Navigation
Savitha Sam Abraham, Sourav Garg, Feras Dayoub
An Application-Agnostic Automatic Target Recognition System Using Vision Language Models
Anthony Palladino, Dana Gajewski, Abigail Aronica, Patryk Deptula, Alexander Hamme, Seiyoung C. Lee, Jeff Muri, Todd Nelling, Michael A. Riley, Brian Wong, Margaret Duff
Monocular Event-Based Vision for Obstacle Avoidance with a Quadrotor
Anish Bhattacharya, Marco Cannici, Nishanth Rao, Yuezhan Tao, Vijay Kumar, Nikolai Matni, Davide Scaramuzza
Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective
Shenghao Xie, Wenqiang Zu, Mingyang Zhao, Duo Su, Shilong Liu, Ruohua Shi, Guoqi Li, Shanghang Zhang, Lei Ma
Vision Paper: Designing Graph Neural Networks in Compliance with the European Artificial Intelligence Act
Barbara Hoffmann, Jana Vatter, Ruben Mayer
Elucidating the design space of language models for image generation
Xuantong Liu, Shaozhe Hao, Xianbiao Qi, Tianyang Hu, Jun Wang, Rong Xiao, Yuan Yao
ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts
Xumeng Han, Longhui Wei, Zhiyang Dou, Zipeng Wang, Chenhui Qiang, Xin He, Yingfei Sun, Zhenjun Han, Qi Tian
CL-HOI: Cross-Level Human-Object Interaction Distillation from Vision Large Language Models
Jianjun Gao, Chen Cai, Ruoyu Wang, Wenyang Liu, Kim-Hui Yap, Kratika Garg, Boon-Siew Han