Computer Vision
Computer vision is the field concerned with enabling computers to "see" and interpret images and videos. It develops algorithms for tasks such as object detection, image classification, and scene understanding. Current research relies heavily on deep learning, particularly convolutional neural networks (CNNs) and vision transformers (ViTs), often combined with multi-modal fusion (integrating data from different sensors) and transfer learning to improve efficiency and accuracy. These advances are driving progress in applications including precision agriculture, robotics, medical imaging analysis, and autonomous systems by providing automated, efficient, and objective solutions to complex visual tasks.
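As a rough illustration of the operation at the core of the CNNs mentioned above, the sketch below slides a small kernel over a toy image and sums the element-wise products (no padding, stride 1). The function name `conv2d_valid`, the 4x4 image, and the hand-picked kernel are all assumptions made for this example; in a real network the kernel weights are learned rather than chosen by hand, and none of this is taken from the papers listed below.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical toy image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked kernel that takes the difference between horizontal neighbours.
kernel = [[-1, 1]]

response = conv2d_valid(image, kernel)
for row in response:
    print(row)
```

Each output row is nonzero only at the position where the kernel straddles the dark-to-bright boundary, which is the sense in which a convolutional filter "detects" a local pattern such as an edge.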
Papers
On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications
Chenjiao Tan, Qian Cao, Yiwei Li, Jielu Zhang, Xiao Yang, Huaqin Zhao, Zihao Wu, Zhengliang Liu, Hao Yang, Nemin Wu, Tao Tang, Xinyue Ye, Lilong Chai, Ninghao Liu, Changying Li, Lan Mu, Tianming Liu, Gengchen Mai
PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments
Yang You, Kai Xiong, Zhening Yang, Zhengxiang Huang, Junwei Zhou, Ruoxi Shi, Zhou Fang, Adam W. Harley, Leonidas Guibas, Cewu Lu
Perception Test 2023: A Summary of the First Challenge And Outcome
Joseph Heyward, João Carreira, Dima Damen, Andrew Zisserman, Viorica Pătrăucean
Integration and Performance Analysis of Artificial Intelligence and Computer Vision Based on Deep Learning Algorithms
Bo Liu, Liqiang Yu, Chang Che, Qunwei Lin, Hao Hu, Xinyu Zhao
Quantum Annealing for Computer Vision Minimization Problems
Shahrokh Heidari, Michael J. Dinneen, Patrice Delmas
Hierarchical Vision Transformers for Context-Aware Prostate Cancer Grading in Whole Slide Images
Clément Grisi, Geert Litjens, Jeroen van der Laak
Unveiling Spaces: Architecturally meaningful semantic descriptions from images of interior spaces
Demircan Tas, Rohit Priyadarshi Sanatani
3D-LFM: Lifting Foundation Model
Mosam Dabhi, Laszlo A. Jeni, Simon Lucey
MineObserver 2.0: A Deep Learning & In-Game Framework for Assessing Natural Language Descriptions of Minecraft Imagery
Jay Mahajan, Samuel Hum, Jack Henhapl, Diya Yunus, Matthew Gadbury, Emi Brown, Jeff Ginger, H. Chad Lane
Unified framework for diffusion generative models in SO(3): applications in computer vision and astrophysics
Yesukhei Jagvaral, Francois Lanusse, Rachel Mandelbaum
Orientation-Constrained System for Lamp Detection in Buildings Based on Computer Vision
Francisco Troncoso-Pastoriza, Pablo Eguía-Oller, Rebeca P. Díaz-Redondo, Enrique Granada-Álvarez, Aitor Erkoreka
A Powerful Face Preprocessing For Robust Kinship Verification based Tensor Analyses
Ammar Chouchane, Mohcene Bessaoudi, Abdelmalik Ouamane
SeeBel: Seeing is Believing
Sourajit Saha, Shubhashis Roy Dipta