One Pass Multiple Conformer
One-pass multiple Conformer research aims to optimize the Conformer architecture, a hybrid convolution-Transformer model, for efficient and robust speech processing across a range of tasks. Current efforts focus on improving Conformer performance in long-form speech recognition, multilingual applications, and noisy environments, using techniques such as state-space models, multiple convolution kernels, and efficient attention mechanisms. These advances improve the speed, accuracy, and resource efficiency of automatic speech recognition, speech separation, and other audio-visual processing applications, which matters for both research and practical deployment of speech technology.
Papers
Augmenting conformers with structured state-space sequence models for online speech recognition
Haozhe Shan, Albert Gu, Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara Sainath
HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods
Hyun-seo Shin, Jungwoo Heo, Ju-ho Kim, Chan-yeong Lim, Wonbin Kim, Ha-Jin Yu