Speech Technology

Speech technology aims to enable computers to understand, process, and generate human speech, so that people can interact with machines by voice. Current research focuses heavily on improving the accuracy and robustness of automatic speech recognition (ASR) and speech synthesis across diverse languages and speaker demographics, using deep learning models such as transformers and leveraging self-supervised learning to address data scarcity. The field is important for making information and services more accessible, particularly for speakers of low-resource languages and individuals with communication disorders, and it raises ethical concerns about bias and privacy in data collection and model development.

Papers