Vision Paper
Vision research currently focuses on developing robust and efficient methods for processing and understanding visual information, often integrating it with other modalities such as language and touch. Key areas include improving the accuracy and efficiency of transformer-based models and exploring alternatives such as Mamba and other structured state space models, for tasks ranging from object detection and segmentation to navigation and scene understanding. This work is driven by the need for better performance in applications such as robotics, autonomous systems, medical image analysis, and assistive technologies, with a strong emphasis on challenges such as limited data, computational cost, and generalization to unseen scenarios.
Papers
StratXplore: Strategic Novelty-seeking and Instruction-aligned Exploration for Vision and Language Navigation
Muraleekrishna Gopinathan, Jumana Abu-Khalaf, David Suter, Martin Masek
Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling
Georgios Pantazopoulos, Malvina Nikandrou, Alessandro Suglia, Oliver Lemon, Arash Eshghi
PitVis-2023 Challenge: Workflow Recognition in videos of Endoscopic Pituitary Surgery
Adrito Das, Danyal Z. Khan, Dimitrios Psychogyios, Yitong Zhang, John G. Hanrahan, Francisco Vasconcelos, You Pang, Zhen Chen, Jinlin Wu, Xiaoyang Zou, Guoyan Zheng, Abdul Qayyum, Moona Mazher, Imran Razzak, Tianbin Li, Jin Ye, Junjun He, Szymon Płotka, Joanna Kaleta, Amine Yamlahi, Antoine Jund, Patrick Godau, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, Dominik Rivoir, Alejandra Pérez, Santiago Rodriguez, Pablo Arbeláez, Danail Stoyanov, Hani J. Marcus, Sophia Bano
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeong Hun Yeo, Chae Won Kim, Hyunjun Kim, Hyeongseop Rha, Seunghee Han, Wen-Huang Cheng, Yong Man Ro