Utterance Representation
Utterance representation focuses on creating effective numerical summaries of spoken or written language units, aiming to capture crucial information for downstream tasks like emotion recognition, speaker verification, and dialogue understanding. Current research emphasizes developing robust representations that account for contextual information (e.g., preceding conversational turns) and leverage techniques like contrastive learning, self-supervised learning, and transformer-based architectures to improve accuracy and disentangle relevant features. These advancements are significant for improving human-computer interaction, enabling more nuanced analysis of conversational data, and advancing fields like speech processing and natural language understanding.
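At its simplest, an utterance-level representation is obtained by pooling variable-length frame-level features into a single fixed-size vector that downstream tasks (e.g. speaker verification) can compare with a similarity score. The sketch below is illustrative only and not taken from the listed papers: it uses mean pooling with L2 normalization, one of the most basic pooling strategies, with NumPy standing in for a real speech feature extractor.

```python
import numpy as np

def utterance_embedding(frame_features: np.ndarray) -> np.ndarray:
    """Pool frame-level features of shape (T, D) into one fixed-size
    utterance vector of shape (D,) via mean pooling, then L2-normalize
    so that dot products equal cosine similarities."""
    pooled = frame_features.mean(axis=0)
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Embeddings are already unit-norm, so a dot product suffices.
    return float(np.dot(a, b))

# Two hypothetical utterances of different lengths (frames x feature dim).
rng = np.random.default_rng(0)
utt_a = rng.normal(size=(120, 64))  # 120 frames, 64-dim features
utt_b = rng.normal(size=(95, 64))   # 95 frames, same feature dim

emb_a = utterance_embedding(utt_a)
emb_b = utterance_embedding(utt_b)

print(emb_a.shape)  # (64,) -- fixed size regardless of utterance length
print(cosine_similarity(emb_a, emb_a))  # 1.0 (self-similarity)
```

In practice the papers above replace this fixed pooling with learned, self-supervised objectives, but the output contract is the same: one vector per utterance, independent of its duration.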
Papers
Non-Contrastive Self-supervised Learning for Utterance-Level Information Extraction from Speech
Jaejin Cho, Jesús Villalba, Laureano Moro-Velazquez, Najim Dehak
Non-Contrastive Self-Supervised Learning of Utterance-Level Speech Representations
Jaejin Cho, Raghavendra Pappagari, Piotr Żelasko, Laureano Moro-Velazquez, Jesús Villalba, Najim Dehak