Deep Vision Model

Deep vision models, primarily convolutional neural networks (CNNs) and vision transformers (ViTs), aim to enable computers to "see" and understand images and videos, with the goal of human-level performance on tasks such as object recognition and video analysis. Much current research emphasizes model explainability, using techniques such as class activation maps and concept-based explanations to expose how a model reaches its decisions and to address the "black box" nature of deep learning. This work is crucial for building trust in these models, particularly in high-stakes applications like autonomous driving and medical image analysis, and it also informs the design of more robust and efficient architectures.
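
As a rough illustration of the class activation map idea mentioned above, the sketch below computes a map as a weighted sum of a CNN's final convolutional feature maps, using the weights that connect each map to one class. It assumes the original CAM setting (a network ending in global average pooling followed by a linear classifier). The function names, the toy feature maps, and the weights are all hypothetical, and plain Python lists stand in for the tensors a real framework would provide.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W), one weight per map.

    feature_maps: list of K maps, each a list of H rows of W floats.
    class_weights: list of K floats, the (assumed) final-layer weights
    linking each feature map to the class being explained.
    """
    if len(feature_maps) != len(class_weights):
        raise ValueError("need exactly one weight per feature map")
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    return cam


def normalize(cam):
    """Rescale a map to [0, 1] so it can be displayed as a heatmap."""
    flat = [value for row in cam for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant map carries no spatial information; return zeros.
        return [[0.0 for _ in row] for row in cam]
    return [[(value - lo) / (hi - lo) for value in row] for row in cam]


# Toy input (hypothetical values): two 2x2 feature maps and their weights.
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],  # map 0 fires in the top-left cell
    [[0.0, 0.0], [0.0, 2.0]],  # map 1 fires in the bottom-right cell
]
class_weights = [0.5, 1.0]

cam = class_activation_map(feature_maps, class_weights)
heatmap = normalize(cam)
```

In practice the resulting map is upsampled to the input resolution and overlaid on the image, so that high values mark the regions that contributed most to the class score.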

Papers