Computer Vision
Computer vision is the field concerned with enabling computers to "see" and interpret images and video. It develops algorithms for tasks such as object detection, image classification, and scene understanding. Current research relies heavily on deep learning, particularly convolutional neural networks (CNNs) and vision transformers (ViTs), often combined with multi-modal fusion (integrating data from different sensors) and transfer learning to improve efficiency and accuracy. These advances are driving progress in applications including precision agriculture, robotics, medical image analysis, and autonomous systems, where they offer automated, efficient, and objective solutions to complex visual tasks.
Papers
An Inpainting-Infused Pipeline for Attire and Background Replacement
Felipe Rodrigues Perche-Mahlow, André Felipe-Zanella, William Alberto Cruz-Castañeda, Marcellus Amadeus
PFDM: Parser-Free Virtual Try-on via Diffusion Model
Yunfang Niu, Dong Yi, Lingxiang Wu, Zhiwei Liu, Pengxiang Cai, Jinqiao Wang
Exploring the Synergies of Hybrid CNNs and ViTs Architectures for Computer Vision: A survey
Haruna Yunusa, Shiyin Qin, Abdulrahman Hamman Adama Chukkol, Abdulganiyu Abdu Yusuf, Isah Bello, Adamu Lawan
OmniSCV: An Omnidirectional Synthetic Image Generator for Computer Vision
Bruno Berenguel-Baeta, Jesus Bermudez-Cameo, Jose J. Guerrero
Bridging Human Concepts and Computer Vision for Explainable Face Verification
Miriam Doh, Caroline Mazini Rodrigues, Nicolas Boutry, Laurent Najman, Matei Mancas, Hugues Bersini
MESA: Matching Everything by Segmenting Anything
Yesheng Zhang, Xu Zhao
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg, Timo Lüddecke, Jonathan Henrich, Sharmita Dey, Matthias Nuske, Valentin Hassler, Derek Murphy, Julia Fischer, Julia Ostner, Oliver Schülke, Peter M. Kappeler, Claudia Fichtel, Alexander Gail, Stefan Treue, Hansjörg Scherberger, Florentin Wörgötter, Alexander S. Ecker
Leveraging Positional Encoding for Robust Multi-Reference-Based Object 6D Pose Estimation
Jaewoo Park, Jaeguk Kim, Nam Ik Cho
VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large Models
Yi Zhao, Yilin Zhang, Rong Xiang, Jing Li, Hillming Li