Speech Recording
Speech recording analysis is a rapidly evolving field focused on extracting meaningful information from audio data, with applications ranging from medical diagnostics to security and accessibility. Current research emphasizes robust models, including graph neural networks and transformer-based architectures such as wav2vec 2.0, that analyze acoustic and prosodic features for tasks such as disease detection, speaker anonymization, and abusive speech identification. This work is significant because it offers non-invasive methods for assessing health conditions, enhancing privacy protections, and improving access to information across languages and diverse populations.
Papers
Predicting EEG Responses to Attended Speech via Deep Neural Networks for Speech
Emina Alickovic, Tobias Dorszewski, Thomas U. Christiansen, Kasper Eskelund, Leonardo Gizzi, Martin A. Skoglund, Dorothea Wendt
Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video
Minsu Kim, Chae Won Kim, Yong Man Ro