Large Scale Speech

Large-scale speech research focuses on developing and improving systems that process and understand vast amounts of spoken language data. Current efforts concentrate on leveraging large language models (LLMs) and self-supervised learning to improve speech recognition, synthesis, and speaker verification, typically with transformer-based architectures and convolutional neural networks handling feature extraction and classification. This work underpins applications such as voice assistants, multilingual communication tools, and digital health technologies that depend on accurate and efficient speech processing, while also confronting challenges such as data bias and robustness.
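
As a concrete illustration of the self-supervised, transformer-based pipeline described above, the sketch below uses a pretrained wav2vec 2.0 model via the Hugging Face `transformers` library to extract frame-level speech features and pool them into an utterance embedding, then scores two clips with cosine similarity in the style of speaker verification. The checkpoint name, mean-pooling step, and similarity scoring are illustrative assumptions, not a method prescribed by this section.

```python
# Minimal sketch (assumptions: `transformers` and `torch` are installed, and
# "facebook/wav2vec2-base-960h" serves only as an example checkpoint).
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

MODEL_ID = "facebook/wav2vec2-base-960h"  # example self-supervised speech model

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
model = Wav2Vec2Model.from_pretrained(MODEL_ID)
model.eval()


def utterance_embedding(waveform, sampling_rate=16000):
    """Extract frame-level wav2vec 2.0 features and mean-pool them over time."""
    inputs = feature_extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        frames = model(**inputs).last_hidden_state  # (1, num_frames, hidden_size)
    return frames.mean(dim=1).squeeze(0)            # (hidden_size,)


# Synthetic one-second clips stand in for real recordings.
clip_a = torch.randn(16000).numpy()
clip_b = torch.randn(16000).numpy()

emb_a = utterance_embedding(clip_a)
emb_b = utterance_embedding(clip_b)

# A simple speaker-verification-style score: cosine similarity between embeddings.
score = torch.nn.functional.cosine_similarity(emb_a, emb_b, dim=0)
print(f"cosine similarity: {score.item():.3f}")
```

In practice, the same pooled embeddings could feed a downstream classifier or be fine-tuned end to end; the pretrained encoder is what lets such systems scale to large, weakly labeled speech corpora.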

Papers