Paralinguistic Feature
Paralinguistic features are the aspects of speech beyond the literal words, such as tone, emotion, and speaking style, and they are increasingly central to research in speech processing and understanding. Current efforts focus on integrating paralinguistic information into large language models (LLMs) through techniques such as hierarchical feature fusion, contrastive learning, and multimodal architectures built on transformer encoders, with the goal of improving tasks such as speech emotion recognition and spoken dialogue modeling. This research supports advances in human-computer interaction, more accurate speech-based health monitoring, and more nuanced and empathetic AI systems.
Papers
Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load
Gasser Elbanna, Alice Biryukov, Neil Scheidwasser-Clow, Lara Orlandic, Pablo Mainar, Mikolaj Kegler, Pierre Beckmann, Milos Cernak
Generative Spoken Dialogue Language Modeling
Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, Ali Elkahky, Paden Tomasello, Robin Algayres, Benoit Sagot, Abdelrahman Mohamed, Emmanuel Dupoux