CNN Transformer

CNN-Transformer hybrid models aim to combine the strengths of convolutional neural networks (CNNs), which extract local features efficiently, with those of transformers, which capture global context through self-attention, across a range of computer vision tasks. Current research focuses on efficient architectures that combine the two. Examples include embedding convolutions inside transformer blocks and running parallel CNN and transformer branches, often within encoder-decoder frameworks such as U-Net. These hybrids have been reported to improve performance in applications including medical image segmentation, hyperspectral image classification, and object detection, often outperforming comparable CNN-only or transformer-only models.
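To make the "parallel branches" idea concrete, here is a minimal sketch in plain Python. It assumes a toy setting that does not correspond to any specific published architecture. Features are a list of tokens, each a list of channel values. The CNN branch is a depthwise 1D convolution over neighbouring tokens. The transformer branch is single-head scaled dot-product self-attention with identity query/key/value projections, which is a simplification. The two branches are fused by summation with a residual connection. All function names (`local_conv1d`, `self_attention`, `hybrid_block`) are hypothetical and chosen for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]  # shape: tokens x channels


def local_conv1d(x: Matrix, kernel: List[float]) -> Matrix:
    """CNN branch (hypothetical): depthwise 1D convolution over the token axis
    with zero padding, so each output token only sees its nearby neighbours."""
    n, c, pad = len(x), len(x[0]), len(kernel) // 2
    out = [[0.0] * c for _ in range(n)]
    for i in range(n):
        for j, w in enumerate(kernel):
            src = i + j - pad  # neighbour index covered by kernel tap j
            if 0 <= src < n:   # positions outside the sequence count as zero
                for ch in range(c):
                    out[i][ch] += w * x[src][ch]
    return out


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Matrix:
    """Transformer branch (simplified): single-head scaled dot-product attention
    with identity Q/K/V projections, so every token can attend to every other."""
    n, c = len(x), len(x[0])
    scale = 1.0 / math.sqrt(c)
    out = []
    for i in range(n):
        scores = [scale * sum(x[i][ch] * x[j][ch] for ch in range(c)) for j in range(n)]
        weights = softmax(scores)
        out.append([sum(weights[j] * x[j][ch] for j in range(n)) for ch in range(c)])
    return out


def hybrid_block(x: Matrix, kernel: List[float]) -> Matrix:
    """Parallel fusion (assumed): residual + local CNN branch + global attention branch."""
    local = local_conv1d(x, kernel)
    glob = self_attention(x)
    return [[x[i][ch] + local[i][ch] + glob[i][ch] for ch in range(len(x[0]))]
            for i in range(len(x))]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
out = hybrid_block(tokens, [0.25, 0.5, 0.25])
print(len(out), len(out[0]))  # → 4 2
```

The design choice this illustrates is the division of labour. A signal at the first token reaches only its immediate neighbours through the convolution branch, but it reaches every token through the attention branch. Real hybrids replace these toy pieces with 2D convolutions, learned projections, multiple heads, and normalization layers.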

Papers