End to End Spoken Language

End-to-end spoken language understanding (SLU) aims to map spoken audio directly to semantic representations, bypassing the traditional pipeline of separate automatic speech recognition (ASR) and natural language understanding (NLU) components. Current research emphasizes robustness to noisy audio and ASR errors, often using transformer-based architectures together with techniques such as knowledge distillation and multi-task learning to improve accuracy and efficiency, particularly in low-resource settings. The field matters for human-computer interaction because it enables more natural and effective voice-controlled interfaces, in applications ranging from virtual assistants to smart home devices.
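
As a rough illustration of one technique named in the summary, knowledge distillation, the sketch below computes a combined loss for a single utterance in an intent-classification setting. It assumes the common softened-label formulation (a temperature-scaled softmax plus a weighted sum of hard-label and soft-label terms) and is not taken from any of the papers listed here. The function names, the `temperature` and `alpha` parameters, and the example logits are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs):
    """Cross-entropy H(target, predicted); zero-probability targets are skipped."""
    return -sum(t * math.log(p)
                for t, p in zip(target_probs, predicted_probs) if t > 0)

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a hard-label loss and a soft-label (teacher) loss.

    Assumption: the teacher could be, for example, a text-based NLU model and
    the student a speech-based model; only their logits are needed here.
    """
    # Hard-label term: ordinary cross-entropy against the one-hot intent label.
    one_hot = [1.0 if i == true_label else 0.0
               for i in range(len(student_logits))]
    hard_loss = cross_entropy(one_hot, softmax(student_logits))

    # Soft-label term: match the teacher's softened distribution. This
    # cross-entropy differs from the KL divergence only by the teacher's
    # entropy, which does not depend on the student.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_loss = cross_entropy(soft_teacher, soft_student) * temperature ** 2

    return alpha * hard_loss + (1.0 - alpha) * soft_loss

# Hypothetical logits over three intents; the true intent is index 0.
teacher = [4.0, 0.0, 0.0]
agreeing_student = [4.0, 0.0, 0.0]
disagreeing_student = [0.0, 4.0, 0.0]

loss_agree = distillation_loss(agreeing_student, teacher, true_label=0)
loss_disagree = distillation_loss(disagreeing_student, teacher, true_label=0)
print(loss_agree < loss_disagree)
```

A student that agrees with both the label and the teacher receives the smaller loss. In practice the logits would come from trained models, and the weighting between the two terms is a tuning choice.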

Papers

January 13, 2025

Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
Jiliang Hu, Zuchao Li, Mengjia Shen, Haojun Ai, Sheng Li, Jun Zhang
Spoken Language Understanding Structure Learning Good Better Sequence to Sequence Task End to End Spoken Language Joint Audio Speech Enhancement Module

June 12, 2024

PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding
Trang Le, Daniel Lazar, Suyoun Kim, Shan Jiang, Duc Le, Adithya Sagar, Aleksandr Livshits, Ahmed Aly, Akshat Shrivastava
Automatic Speech Recognition Spoken Language Understanding Deliberation Model Political Deliberation End to End Spoken Language

March 22, 2024

Privacy-Preserving End-to-End Spoken Language Understanding
Yinggui Wang, Wei Huang, Le Yang
Speech Recognition Privacy Preserving Spoken Language Understanding Human Speech End to End Spoken Language

October 9, 2023

Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding
Pavel Denisov, Ngoc Thang Vu
Self Supervised Multilingual Data Multilingual Capability Language Dataset End to End Spoken Language

July 22, 2023

Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding
Suyoun Kim, Akshat Shrivastava, Duc Le, Ju Lin, Ozlem Kalinli, Michael L. Seltzer
Language Understanding Pre Trained Automatic Speech Recognition End 2 End Confidence Aware Automatic Speech Recognition Hypothesis End to End Spoken Language

May 29, 2023

Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target
Guan-Wei Wu, Guan-Ting Lin, Shang-Wen Li, Hung-yi Lee
Spoken Language Understanding Self Supervised Speech Model Pre Trained Automatic Speech Recognition End to End Spoken Language Speech Text Data Intermediate Target

May 23, 2023

Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding
Umberto Cappellazzo, Muqiao Yang, Daniele Falavigna, Alessio Brutti
Knowledge Distillation Continual Learning Sequence to Sequence Spoken Language Understanding End to End Spoken Language Sequence Level Knowledge Distillation Entity Prediction

May 2, 2023

A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge
Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe
Language Model Speech Recognition Semantic Parsing Spoken Language Understanding Pipeline System End to End Spoken Language Stop or Go Decision

October 29, 2022

End-to-end Spoken Language Understanding with Tree-constrained Pointer Generator
Guangzhi Sun, Chao Zhang, Philip C. Woodland
Speech Recognition Language Understanding Spoken Language Understanding Contextual Biasing End to End Spoken Language

October 27, 2022

Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models
Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W Black, Shinji Watanabe
Spoken Language Understanding Token Level Sequence Labeling NLU Model Compositional Model End to End Spoken Language

October 11, 2022

On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding
Gaëlle Laperrière, Valentin Pelloin, Mickaël Rouvier, Themos Stafylakis, Yannick Estève
Language Understanding Greater Public Use Spoken Language Understanding Acoustic Word Embeddings End to End Spoken Language Frame Level

July 17, 2022

End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting
Thierry Desot, François Portet, Michel Vacher
Automatic Speech Recognition Language Understanding Low Resource Spoken Language Understanding Speech Processing Speech Signal Performance Analysis End to End Spoken Language Voice Command

July 15, 2022

Low-bit Shift Network for End-to-End Spoken Language Understanding
Anderson R. Avila, Khalil Bibi, Rui Heng Yang, Xinlin Li, Chao Xing, Xiao Chen
Convolutional Neural Network Deep Neural Network Low Bit Quantization End to End Spoken Language Bit Weight Power of Two Quantization Bit Shift Network

July 14, 2022

Two-Pass Low Latency End-to-End Spoken Language Understanding
Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan Black, Shinji Watanabe
Language Model Spoken Language Understanding Acoustic Representation End to End Spoken Language End to End Speech Recognition

July 1, 2022

Toward Low-Cost End-to-End Spoken Language Understanding
Marco Dinarelli, Marco Naguib, François Portet
Language Understanding Spoken Language Understanding Self Supervised Model Speech Corpus End to End Spoken Language

June 29, 2022

STOP: A dataset for Spoken Task Oriented Semantic Parsing
Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-Chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Ahn Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed
Data Set Automatic Speech Recognition Language Understanding Semantic Parsing Unlabeled Speech End to End Spoken Language

April 7, 2022

Three-Module Modeling For End-to-End Spoken Language Understanding Using Pre-trained DNN-HMM-Based Acoustic-Phonetic Model
Nick J. C. Wang, Lu Wang, Yandan Sun, Haimei Kang, Dejun Zhang
Automatic Speech Recognition Spoken Language Understanding Pre Trained Speech Model Intent Classifier End to End Spoken Language Layered Approach

December 13, 2021

Attentive Contextual Carryover for Multi-Turn End-to-End Spoken Language Understanding
Kai Wei, Thanh Tran, Feng-Ju Chang, Kanthashree Mysore Sathyendra, Thejaswi Muniyappa, Jing Liu, Anirudh Raju, Ross McGowan, Nathan Susanj, Ariya Rastrow, Grant P. Strimel
Language Understanding Multi Turn Dialogue Task Oriented Dialogue Natural Language Understanding End to End Spoken Language
