Attention-Based Encoder-Decoder
Attention-based encoder-decoder models are a powerful class of neural networks for sequence-to-sequence tasks: an encoder maps the input sequence to hidden representations, and a decoder generates the output sequence while attending to those representations, improving the accuracy and efficiency of processing sequential data such as speech and text. Current research enhances these models through architectural innovations such as non-autoregressive decoding for faster inference, multimodal integration of acoustic and semantic information, and hybrid approaches that combine different decoder architectures (e.g., CTC, attention, transducer). These advances have significant implications across diverse fields, including speech recognition, machine translation, and medical image analysis, by enabling more accurate, efficient, and robust processing of complex sequential data.
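To make the core mechanism concrete, here is a minimal NumPy sketch of scaled dot-product attention for a single decoder step attending over encoder states. It is an illustrative simplification, not the method of any listed paper: the function name, shapes, and the use of raw encoder states as both keys and values are assumptions for the example.

```python
import numpy as np

def attend(query, keys, values):
    """One decoder step of scaled dot-product attention (illustrative sketch).

    query:  (d,)    current decoder hidden state
    keys:   (T, d)  encoder hidden states used as keys
    values: (T, dv) encoder hidden states used as values
    Returns the context vector (dv,) and attention weights (T,).
    """
    d = keys.shape[-1]
    scores = keys @ query / np.sqrt(d)   # similarity of query to each encoder frame
    scores -= scores.max()               # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()             # softmax: distribution over the T frames
    context = weights @ values           # weighted sum of encoder states
    return context, weights
```

In a full model the query, keys, and values would each pass through learned projection matrices, and the context vector would feed the decoder's next-token prediction; an autoregressive decoder repeats this step per output token, which is the cost that non-autoregressive decoding tries to avoid.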
Papers
A Survey of Visual Transformers
Yang Liu, Yao Zhang, Yixin Wang, Feng Hou, Jin Yuan, Jiang Tian, Yang Zhang, Zhongchao Shi, Jianping Fan, Zhiqiang He
Uformer: A Unet based dilated complex & real dual-path conformer network for simultaneous speech enhancement and dereverberation
Yihui Fu, Yun Liu, Jingdong Li, Dawei Luo, Shubo Lv, Yukai Jv, Lei Xie