Vision Papers
Vision research currently focuses on developing robust, efficient methods for processing and understanding visual information, often integrating it with other modalities such as language and touch. Key areas include improving the accuracy and efficiency of transformer-based models and exploring alternatives such as Mamba and structured state space models (SSMs) for tasks ranging from object detection and segmentation to navigation and scene understanding. This work is driven by the need for better performance in robotics, autonomous systems, medical image analysis, and assistive technologies, with a strong emphasis on challenges such as limited data, computational cost, and generalization to unseen scenarios.
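For readers unfamiliar with the SSM alternatives mentioned above: at their core, these models replace attention with a discretized linear recurrence over a hidden state. The sketch below is a minimal, illustrative NumPy version of that recurrence; the names (`ssm_scan`, `A_bar`, `B_bar`, `C`) are placeholders of this digest, not the API of any listed paper, and real SSM layers (S4, Mamba) learn these matrices, make them input-dependent, and evaluate the scan in parallel.

```python
# Minimal discretized linear state space model (SSM) recurrence:
#   h_t = A_bar @ h_{t-1} + B_bar * x_t
#   y_t = C @ h_t
# Illustrative sketch only, not any specific paper's implementation.
import numpy as np

def ssm_scan(A_bar: np.ndarray, B_bar: np.ndarray, C: np.ndarray,
             x: np.ndarray) -> np.ndarray:
    """Run the SSM recurrence over a scalar input sequence x."""
    h = np.zeros(A_bar.shape[0])          # hidden state, starts at zero
    ys = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = A_bar @ h + B_bar * x_t       # state update
        ys[t] = C @ h                     # readout
    return ys

if __name__ == "__main__":
    N = 8                                 # state dimension (arbitrary for the demo)
    # Stable diagonal dynamics, as in diagonal SSM parameterizations
    A_bar = np.diag(np.exp(-np.linspace(0.1, 1.0, N)))
    rng = np.random.default_rng(0)
    B_bar, C = rng.standard_normal(N), rng.standard_normal(N)
    x = np.sin(np.linspace(0.0, 4.0 * np.pi, 64))  # toy input signal
    print(ssm_scan(A_bar, B_bar, C, x)[:5])
```

Because the recurrence is linear in the state, it can be unrolled as a convolution or computed with a parallel scan, which is what makes these models attractive as sub-quadratic alternatives to attention.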
Papers
Online Estimation of Articulated Objects with Factor Graphs using Vision and Proprioceptive Sensing
Russell Buchanan, Adrian Röfer, João Moura, Abhinav Valada, Sethu Vijayakumar
Pushing Large Language Models to the 6G Edge: Vision, Challenges, and Opportunities
Zheng Lin, Guanqiao Qu, Qiyuan Chen, Xianhao Chen, Zhe Chen, Kaibin Huang
LuViRA Dataset Validation and Discussion: Comparing Vision, Radio, and Audio Sensors for Indoor Localization
Ilayda Yaman, Guoda Tian, Erik Tegler, Jens Gulin, Nikhil Challa, Fredrik Tufvesson, Ove Edfors, Kalle Astrom, Steffen Malkowsky, Liang Liu
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima, Hadar Averbuch-Elor, Yoav Artzi