Vision Architecture

Vision architecture research focuses on designing and improving computer vision models that accurately interpret visual information. Current efforts concentrate on hybrid models that combine convolutional neural networks (CNNs) with vision transformers (ViTs), pairing the local feature extraction of convolutions with the global context modeling of self-attention, and on multi-layer perceptrons (MLPs) for 3D object recognition and gaze estimation. These advances are driving progress in applications including medical image analysis, human-computer interaction, and urban planning by improving the accuracy, efficiency, and explainability of visual data processing.

Papers