Convolution Augmented Transformer

Convolution-augmented transformers (CATs), known in speech processing as Conformers, combine the global context modeling of transformer self-attention with the local feature extraction of convolutional layers. Current research applies them to speech processing (recognition, enhancement, translation, speaker verification), image processing (semantic segmentation, super-resolution), and music information retrieval (cover song identification). Across these modalities the approach is the same: attention captures long-range dependencies, convolution captures short-range patterns, and the combination is intended to yield models that are more robust and efficient than either architecture alone.
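To make the combination concrete, here is a minimal sketch of one block that applies global self-attention followed by a local depthwise convolution, each with a residual connection. It is an illustration under simplifying assumptions, not the architecture of any paper listed here: the function names are hypothetical, the attention uses identity query/key/value projections and a single head, and the feed-forward modules, normalization, and gating found in published Conformer blocks are omitted.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    # Global mixing: every time step attends to every other time step.
    # Assumption: identity Q/K/V projections and one head, for brevity.
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in x:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

def depthwise_conv1d(x, kernel):
    # Local mixing: each channel is filtered along time on its own,
    # with zero padding so the sequence length is preserved.
    # Assumption: one shared odd-length kernel for all channels.
    t, d = len(x), len(x[0])
    half = len(kernel) // 2
    out = []
    for i in range(t):
        row = []
        for j in range(d):
            acc = 0.0
            for k, w in enumerate(kernel):
                idx = i + k - half
                if 0 <= idx < t:
                    acc += w * x[idx][j]
            row.append(acc)
        out.append(row)
    return out

def add(a, b):
    # Element-wise sum of two (time, channel) nested lists.
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def conv_augmented_block(x, kernel):
    # Residual + global context, then residual + local features.
    h = add(x, self_attention(x))
    return add(h, depthwise_conv1d(h, kernel))

# Usage: a sequence of 3 time steps with 2 channels each.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = conv_augmented_block(x, [0.25, 0.5, 0.25])
print(len(y), len(y[0]))
```

The ordering (attention first, convolution second) is a design choice made for this sketch; the output keeps the input's shape, so blocks of this kind can be stacked.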

Papers

July 8, 2024

On the Power of Convolution Augmented Transformer
Mingchen Li, Xuechen Zhang, Yixiao Huang, Samet Oymak
Transformer Architecture Real Power Attention Layer Convolution Augmented Transformer

September 30, 2023

Dual-Augmented Transformer Network for Weakly Supervised Semantic Segmentation
Jingliang Deng, Zonghan Li
Vision Transformer Semantic Segmentation Weakly Supervised Semantic Segmentation Convolution Augmented Transformer

June 15, 2023

CoverHunter: Cover Song Identification with Refined Attention and Alignments
Feng Liu, Deyi Tuo, Yinan Xu, Xintong Han
Convolutional Neural Network Temporal Attention Better Alignment Convolution Augmented Transformer Cover Song

June 9, 2023

Efficient Encoder-Decoder and Dual-Path Conformer for Comprehensive Feature Learning in Speech Enhancement
Junyu Wang
Speech Enhancement Encoder Decoder Channel Attention Intermediate Layer Uconv Conformer Convolution Augmented Transformer

May 18, 2023

A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe
Automatic Speech Recognition Speech Recognition View Translation Librispeech Speech Recognition One Pass Multiple Conformer Encoder Architecture Convolution Augmented Transformer

March 31, 2023

Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer
Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Lei Zhang, Ran He
Domain Adaptation Source Free Domain Adaptation Real World Image Super Resolution Convolution Augmented Transformer

March 29, 2022

MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification
Yang Zhang, Zhiqiang Lv, Haibin Wu, Shanshan Zhang, Pengfei Hu, Zhiyong Wu, Hung-yi Lee, Helen Meng
Speaker Verification Feature Aggregation Convolution Augmented Transformer

March 28, 2022

CMGAN: Conformer-based Metric GAN for Speech Enhancement
Ruizhe Cao, Sherif Abdulatif, Bin Yang
Automatic Speech Recognition Speech Enhancement Speech Signal Convolution Augmented Transformer Time Domain Speech Enhancement

November 18, 2021

Wiggling Weights to Improve the Robustness of Classifiers
Sadaf Gulshad, Ivan Sosnovik, Arnold Smeulders
Native Robustness Simple Classifier Balancing Weight General Robustness Convolution Augmented Transformer