Vision Paper
Vision research currently focuses on developing robust and efficient methods for processing and understanding visual information, often integrating it with other modalities such as language and touch. Key areas include improving the accuracy and efficiency of transformer-based models and exploring alternatives such as Mamba and structured state space models, applied to tasks ranging from object detection and segmentation to navigation and scene understanding. This work is driven by the demands of applications such as robotics, autonomous systems, medical image analysis, and assistive technologies, with a strong emphasis on challenges like limited data, computational cost, and generalization to unseen scenarios.
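To ground the transformer-versus-state-space contrast mentioned above, here is a minimal, illustrative sketch of the discrete linear recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t that structured state space and Mamba-style layers build on. The function name ssm_scan, the shapes, and the toy parameters are assumptions for illustration and are not drawn from any of the listed papers; the point is only that the per-token update has constant cost, so sequence processing scales linearly in length, unlike the quadratic cost of full self-attention.

```python
# Illustrative sketch only: a minimal linear state space model (SSM) recurrence
# of the kind Mamba-style layers build on. Names and shapes are assumptions.
import numpy as np

def ssm_scan(x, A, B, C):
    """Run the discrete SSM recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t.

    x : (T, d_in)          input sequence
    A : (d_state, d_state) state transition matrix
    B : (d_state, d_in)    input projection
    C : (d_out, d_state)   output projection
    Returns y : (T, d_out). Cost is linear in sequence length T.
    """
    T = x.shape[0]
    h = np.zeros(A.shape[0])
    y = np.empty((T, C.shape[0]))
    for t in range(T):
        h = A @ h + B @ x[t]   # update hidden state from the current input
        y[t] = C @ h           # read out the observation for this step
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d_in, d_state, d_out = 16, 4, 8, 4
    x = rng.standard_normal((T, d_in))
    A = 0.9 * np.eye(d_state)                 # stable toy transition
    B = rng.standard_normal((d_state, d_in)) * 0.1
    C = rng.standard_normal((d_out, d_state)) * 0.1
    print(ssm_scan(x, A, B, C).shape)         # (16, 4)
```

In practice such recurrences are computed with parallel scans and learned, input-dependent parameters rather than the sequential loop shown here; the loop is kept only to make the linear-time structure explicit.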
Papers
Tighnari: Multi-modal Plant Species Prediction Based on Hierarchical Cross-Attention Using Graph-Based and Vision Backbone-Extracted Features
Haixu Liu, Penghao Jiang, Zerui Tao, Muyan Wan, Qiuzhuang Sun
From Language To Vision: A Case Study of Text Animation
Ping Chen, Richard Alo, Justin Rundell
Vision-Driven Prompt Optimization for Large Language Models in Multimodal Generative Tasks
Leo Franklin, Apiradee Boonmee, Kritsada Wongsuwan
Megatron: Evasive Clean-Label Backdoor Attacks against Vision Transformer
Xueluan Gong, Bowei Tian, Meng Xue, Shuike Li, Yanjiao Chen, Qian Wang
Foundation Models for Low-Resource Language Education (Vision Paper)
Zhaojun Ding, Zhengliang Liu, Hanqi Jiang, Yizhu Gao, Xiaoming Zhai, Tianming Liu, Ninghao Liu