Computational Paralinguistics

Computational paralinguistics aims to extract information about a speaker's emotional state, identity, or health from how they speak, beyond the literal meaning of their words. Current research relies heavily on large pre-trained models, such as transformers and wav2vec variants, and often takes multimodal approaches that combine audio with text to improve accuracy in tasks like emotion recognition, speaker profiling, and health monitoring. These advances are driving applications such as telemedicine, where they make remote diagnosis and monitoring more accessible and efficient, and they also contribute to a deeper understanding of human communication.
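
As a rough illustration of what combining audio with text can look like, the sketch below pools frame-level audio features into one utterance vector, concatenates it with a text embedding, and passes the result through a linear softmax classifier. Everything here is an assumption made for illustration: the random vectors stand in for the outputs of a pre-trained speech encoder and a text encoder, the dimensions and emotion labels are arbitrary, the weights are untrained, and the helper names (mean_pool, fuse, classify) are hypothetical. Concatenation-based fusion is only one of several possible strategies, not a method attributed to any particular paper.

```python
import math
import random


def mean_pool(frames):
    """Average a list of equal-length frame vectors into one utterance vector."""
    num_frames = len(frames)
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / num_frames for i in range(dim)]


def fuse(audio_frames, text_embedding):
    """Fuse modalities by concatenating pooled audio features with text features."""
    return mean_pool(audio_frames) + list(text_embedding)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, bias):
    """Apply a linear layer followed by softmax to the fused vector."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Placeholder inputs (assumptions): random vectors stand in for real encoder outputs.
rng = random.Random(0)
AUDIO_DIM, TEXT_DIM, NUM_FRAMES = 8, 4, 20
LABELS = ["neutral", "happy", "sad", "angry"]  # illustrative label set

audio_frames = [
    [rng.gauss(0, 1) for _ in range(AUDIO_DIM)] for _ in range(NUM_FRAMES)
]
text_embedding = [rng.gauss(0, 1) for _ in range(TEXT_DIM)]

fused = fuse(audio_frames, text_embedding)

# Untrained random weights; a real system would learn these from labeled data.
weights = [[rng.gauss(0, 0.1) for _ in range(len(fused))] for _ in LABELS]
bias = [0.0] * len(LABELS)

probs = classify(fused, weights, bias)
for label, p in zip(LABELS, probs):
    print(f"{label}: {p:.3f}")
```

In practice the placeholder vectors would be replaced by real encoder outputs and the classifier would be trained on labeled speech, but the data flow (pool, concatenate, classify) stays the same.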

Papers