Spoken Language Understanding

Spoken Language Understanding (SLU) aims to enable computers to extract meaning and intent from human speech. Current research emphasizes making SLU systems more robust and accurate, particularly on noisy speech, low-resource languages, and out-of-distribution data. Common approaches include end-to-end models and hybrid architectures that pair a speech encoder with a large language model (LLM), often trained with contrastive learning. Advances in SLU improve human-computer interaction in applications such as virtual assistants and extend speech technology to more languages.

Papers

April 1, 2022

Multi-task RNN-T with Semantic Decoder for Streamable Spoken Language Understanding
Xuandi Fu, Feng-Ju Chang, Martin Radfar, Kai Wei, Jing Liu, Grant P. Strimel, Kanthashree Mysore Sathyendra
Multi Task Language Understanding, Spoken Language Understanding, Natural Language Understanding, End 2 End, Semantic Decoder

March 29, 2022

WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models
Heting Gao, Junrui Ni, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson
Language Model, Shot Learning, Spoken Language Understanding, Frozen Language Model

March 22, 2022

Building Robust Spoken Language Understanding by Cross Attention between Phoneme Sequence and ASR Hypothesis
Zexun Wang, Yuquan Le, Yi Zhu, Yuming Zhao, Mingchao Feng, Meng Chen, Xiaodong He
Automatic Speech Recognition, Cross Attention, Spoken Language Understanding, Downstream NLP Task, Automatic Speech Recognition Hypothesis, Automatic Speech Recognition Error, Phoneme Sequence

February 26, 2022

Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems
Samuel Thomas, Hong-Kwang J. Kuo, Brian Kingsbury, George Saon
Training Data Speech Data Spoken Language Understanding Community Need Speech Input Text Only Training

February 23, 2022

Knowledge Augmented BERT Mutual Network in Multi-turn Spoken Dialogues
Ting-Wei Wu, Biing-Hwang Juang
BERT Model, Spoken Language Understanding, BERT Based, Multi Turn Dialogue, Dialogue Context

February 17, 2022

AISHELL-NER: Named Entity Recognition from Chinese Speech
Boli Chen, Guangwei Xu, Xiaobin Wang, Pengjun Xie, Meishan Zhang, Fei Huang
Automatic Speech Recognition, Entity Recognition, Named Entity Recognition, Chinese Character, Spoken Language Understanding

January 28, 2022

Improving End-to-End Models for Set Prediction in Spoken Language Understanding
Hong-Kwang J. Kuo, Zoltan Tuske, Samuel Thomas, Brian Kingsbury, George Saon
Spoken Language Understanding Long Sequence Prediction Set End to End Model RNN Transducer Speech Modeling

January 18, 2022

Dialog Intent Induction via Density-based Deep Clustering Ensemble
Jiashu Pu, Guandan Chen, Yongzhu Chang, Xiaoxi Mao
Spoken Language Understanding, Intent Detection, Deep Clustering

December 22, 2021

Text is no more Enough! A Benchmark for Profile-based Spoken Language Understanding
Xiao Xu, Libo Qin, Kaiji Chen, Guoxing Wu, Linlin Li, Wanxiang Che
New Benchmark, Text Modality, Spoken Language Understanding, User Utterance, Intent Learning

December 14, 2021

On the Use of External Data for Spoken Named Entity Recognition
Ankita Pasad, Felix Wu, Suwon Shon, Karen Livescu, Kyu J. Han
Speech Recognition Entity Recognition Greater Public Use Named Entity Recognition Spoken Language Understanding Self Supervised Speech Representation Data Source

December 10, 2021

Revisiting the Boundary between ASR and NLU in the Age of Conversational Dialog Systems
Manaal Faruqui, Dilek Hakkani-Tür
Automatic Speech Recognition Language Understanding Spoken Language Understanding Speech Based Age Automatic Speech Recognition Model Document Boundary Natural Language Understanding

November 29, 2021

November 19, 2021

SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech
Suwon Shon, Ankita Pasad, Felix Wu, Pablo Brusco, Yoav Artzi, Karen Livescu, Kyu J. Han
Automatic Speech Recognition, Spoken Language Understanding, Speech Processing, Natural Sounding Speech, Benchmark Task

November 12, 2021

Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR
Ondrej Klejch, Electra Wallington, Peter Bell
Automatic Speech Recognition, Cross Lingual Transfer, Cross Lingual, Spoken Language Understanding, Speech Corpus, ASR System, Semi Supervised Training

November 4, 2021

A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding
Yingzhi Wang, Abdelmoumene Boumadane, Abdelwahab Heba
Automatic Speech Recognition, New Benchmark, Speaker Verification, Speech Emotion Recognition, Spoken Language Understanding

ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

Do We Still Need Automatic Speech Recognition for Spoken Language Understanding?
