Computer Vision
Computer vision is the field concerned with enabling computers to "see" and interpret images and videos. It develops algorithms for tasks such as object detection, image classification, and scene understanding. Current research relies heavily on deep learning, particularly convolutional neural networks (CNNs) and vision transformers (ViTs), often combined with multi-modal fusion (integrating data from different sensors or modalities) and transfer learning to improve efficiency and accuracy. These advances are driving progress in applications such as precision agriculture, robotics, medical image analysis, and autonomous systems by offering automated and efficient solutions to complex visual tasks.
Papers
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Duy-Kien Nguyen, Mahmoud Assran, Unnat Jain, Martin R. Oswald, Cees G. M. Snoek, Xinlei Chen
Suitability of KANs for Computer Vision: A preliminary investigation
Basim Azam, Naveed Akhtar
UruBots UAV -- Air Emergency Service Indoor Team Description Paper for FIRA 2024
Hiago Sodre, Sebastian Barcelona, Anthony Scirgalea, Brandon Macedo, Gabriel Sampson, Pablo Moraes, William Moraes, Victoria Saravia, Juan Deniz, Bruna Guterres, Andre Kelbouscas, Ricardo Grando
A Sociotechnical Lens for Evaluating Computer Vision Models: A Case Study on Detecting and Reasoning about Gender and Emotion
Sha Luo, Sang Jung Kim, Zening Duan, Kaiping Chen
CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer
Hualian Sheng, Sijia Cai, Na Zhao, Bing Deng, Qiao Liang, Min-Jian Zhao, Jieping Ye
Watching Swarm Dynamics from Above: A Framework for Advanced Object Tracking in Drone Videos
Duc Pham, Matthew Hansen, Félicie Dhellemmes, Jens Krause, Pia Bideau
A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation
Riccardo Fogliato, Pratik Patil, Mathew Monfort, Pietro Perona
Which Country Is This? Automatic Country Ranking of Street View Photos
Tim Menzner, Jochen L. Leidner, Florian Mittag
Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking
Jiyao Zhang, Weiyao Huang, Bo Peng, Mingdong Wu, Fei Hu, Zijian Chen, Bo Zhao, Hao Dong
3rd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation
Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang