ESPnet ST

ESPnet is an open-source toolkit for research and development across speech processing tasks. Recent extensions include streamlined fine-tuning (ESPnet-EZ), speaker recognition (ESPnet-SPK), and spoken language translation (ESPnet-ST-v2). ESPnet-ST-v2 covers offline, simultaneous, and speech-to-speech translation, with architectures such as transducers and hybrid CTC/attention models. Together with integrations for speech enhancement (ESPnet-SE++) and spoken language understanding (ESPnet-SLU), these extensions give researchers a single platform for reproducible experiments across a wide range of speech applications.

Papers