Computer Vision
Computer vision is the field concerned with enabling computers to "see" and interpret images and videos. It aims to develop algorithms for tasks such as object detection, image classification, and scene understanding. Current research relies heavily on deep learning, particularly convolutional neural networks (CNNs) and vision transformers (ViTs), often combined with multi-modal fusion (integrating data from different sensors) and transfer learning to improve efficiency and accuracy. These advances are driving progress in applications including precision agriculture, robotics, medical imaging analysis, and autonomous systems, where they provide automated, efficient, and objective solutions to complex visual tasks.
Papers
Cycle Pixel Difference Network for Crisp Edge Detection
Changsong Liu, Wei Zhang, Yanyan Liu, Mingyang Li, Wenlin Li, Yimeng Fan, Xiangnan Bai, Liang Zhang
On Evaluation of Vision Datasets and Models using Human Competency Frameworks
Rahul Ramachandran, Tejal Kulkarni, Charchit Sharma, Deepak Vijaykeerthy, Vineeth N Balasubramanian
Unveiling Deep Shadows: A Survey on Image and Video Shadow Detection, Removal, and Generation in the Era of Deep Learning
Xiaowei Hu, Zhenghao Xing, Tianyu Wang, Chi-Wing Fu, Pheng-Ann Heng
Real-Time Indoor Object Detection based on hybrid CNN-Transformer Approach
Salah Eddine Laidoudi, Madjid Maidi, Samir Otmane
PitVis-2023 Challenge: Workflow Recognition in videos of Endoscopic Pituitary Surgery
Adrito Das, Danyal Z. Khan, Dimitrios Psychogyios, Yitong Zhang, John G. Hanrahan, Francisco Vasconcelos, You Pang, Zhen Chen, Jinlin Wu, Xiaoyang Zou, Guoyan Zheng, Abdul Qayyum, Moona Mazher, Imran Razzak, Tianbin Li, Jin Ye, Junjun He, Szymon Płotka, Joanna Kaleta, Amine Yamlahi, Antoine Jund, Patrick Godau, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, Dominik Rivoir, Alejandra Pérez, Santiago Rodriguez, Pablo Arbeláez, Danail Stoyanov, Hani J. Marcus, Sophia Bano
SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation
Alberto Bacchin, Davide Allegro, Stefano Ghidoni, Emanuele Menegatti
Upgrading Pepper Robot's Social Interaction with Advanced Hardware and Perception Enhancements
Paolo Magri, Javad Amirian, Mohamed Chetouani
A nonlinear elasticity model in computer vision
John M. Ball, Christopher L. Horner
A Survey of the Self Supervised Learning Mechanisms for Vision Transformers
Asifullah Khan, Anabia Sohail, Mustansar Fiaz, Mehdi Hassan, Tariq Habib Afridi, Sibghat Ullah Marwat, Farzeen Munir, Safdar Ali, Hannan Naseem, Muhammad Zaigham Zaheer, Kamran Ali, Tangina Sultana, Ziaurrehman Tanoli, Naeem Akhter
LV-UNet: A Lightweight and Vanilla Model for Medical Image Segmentation
Juntao Jiang, Mengmeng Wang, Huizhong Tian, Lingbo Cheng, Yong Liu
Mismatched: Evaluating the Limits of Image Matching Approaches and Benchmarks
Sierra Bonilla, Chiara Di Vece, Rema Daher, Xinwei Ju, Danail Stoyanov, Francisco Vasconcelos, Sophia Bano
PoseWatch: A Transformer-based Architecture for Human-centric Video Anomaly Detection Using Spatio-temporal Pose Tokenization
Ghazal Alinezhad Noghre, Armin Danesh Pazho, Hamed Tabkhi
A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships
Gracile Astlin Pereira, Muhammad Hussain
RAW-Adapter: Adapting Pre-trained Visual Model to Camera RAW Images
Ziteng Cui, Tatsuya Harada
Comparative Analysis: Violence Recognition from Videos using Transfer Learning
Dursun Dashdamirov
Uncertainties of Latent Representations in Computer Vision
Michael Kirchhof
Beyond Few-shot Object Detection: A Detailed Survey
Vishal Chudasama, Hiran Sarkar, Pankaj Wasnik, Vineeth N Balasubramanian, Jayateja Kalla