Unsupervised Automatic Speech Recognition
Unsupervised automatic speech recognition (ASR) aims to build speech recognition systems without paired speech and text data, a crucial step toward enabling ASR for low-resource languages. Current research develops model architectures and training schemes, often combining self-supervised learning, reinforcement learning, and adversarial training, to learn the mapping between speech and text from unpaired corpora. Techniques such as masked token infilling, boundary segmentation, and cross-lingual pseudo-labeling are used to improve accuracy and robustness, and have driven progress in unsupervised speech-to-text and even speech-to-speech tasks. The ultimate goal is to make ASR technology accessible and applicable across diverse languages and domains.
Papers
EURO: ESPnet Unsupervised ASR Open-source Toolkit
Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, Sanjeev Khudanpur
Location analysis of players in UEFA EURO 2020 and 2022 using generalized valuation of defense by estimating probabilities
Rikuhei Umemoto, Kazushi Tsutsui, Keisuke Fujii