Spoken Text

Spoken text analysis is a rapidly evolving field concerned with understanding and processing human speech. It covers tasks such as automatic speech recognition (ASR), speech translation, and identifying speaker characteristics within conversations. Current research relies heavily on large language models (LLMs) and transformer architectures, and often takes multimodal approaches that combine audio with other data such as brain activity or visual cues to improve accuracy and contextual understanding. Applications include better accessibility for people with hearing impairments, public health monitoring through analysis of social media content, and advances in human-computer interaction.

Papers

December 12, 2024

Comparative Analysis of Mel-Frequency Cepstral Coefficients and Wavelet Based Audio Signal Processing for Emotion Detection and Mental Health Assessment in Spoken Speech
Idoko Agbo, Dr Hoda El-Sayed, M.D Kamruzzan Sarker
LSTM Network, Affective Computing, Wavelet Based LSTM Model, Emotion Detection, Mel Frequency Cepstral Coefficient, Spoken Text

September 29, 2024

A multimodal LLM for the non-invasive decoding of spoken text from brain recordings
Youssef Hmamouche, Ismail Chihab, Lahoucine Kdouri, Amal El Fallah Seghrouchni
Functional Magnetic Resonance Imaging, Multimodal LLM, Neural Recording, Non Invasive, fMRI Signal, Spoken Text

September 7, 2024

Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue
Junkai Wu, Xulin Fan, Bo-Ru Lu, Xilin Jiang, Nima Mesgarani, Mark Hasegawa-Johnson, Mari Ostendorf
Language Model, Medical LLM, Spoken Language Understanding, Critique Ability, Speech Language Model, Spoken Dialogue, Spoken Text

June 12, 2024

Semi-Supervised Spoken Language Glossification
Huijie Yao, Wengang Zhou, Hao Zhou, Houqiang Li
Speech Corpus, Gloss Translation, Automatic Annotation, Spoken Text

April 17, 2024

A Data-Driven Representation for Sign Language Production
Harry Walsh, Abolfazl Ravanshad, Mariam Rahmani, Richard Bowden
Sign Language, Codebook Learning, Sign Language Production, Spoken Text, Phonetic Representation

October 20, 2023

Long-Form Speech Translation through Segmentation with Finite-State Decoding Constraints on Large Language Models
Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Ke Wu
Large Language Model, Segmentation Based Approach, Speech Translation, Constrained Decoding, Spoken Text

October 10, 2023

Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration
Piyush Singh Pasi, Karthikeya Battepati, Preethi Jyothi, Ganesh Ramakrishnan, Tanmay Mahapatra, Manoj Singh
Case Study, Yes No Question, Speaker Embeddings, Automatic Speech Recognition Model, Spoken Text, Independent Phone to Audio Alignment, Multimodal Data Integration

August 21, 2023

Can Language Models Learn to Listen?
Evonne Ng, Sanjay Subramanian, Dan Klein, Angjoo Kanazawa, Trevor Darrell, Shiry Ginosar
Language Model, Facial Expression, Facial Behaviour, Response Quality, Spoken Text

June 1, 2023

Leveraging Natural Language Processing For Public Health Screening On YouTube: A COVID-19 Case Study
Ahrar Bin Aslam, Zafi Sherhan Syed, Muhammad Faiz Khan, Asghar Baloch, Muhammad Shehram Shah Syed
Natural Language Processing, Spoken Text, Public Health Surveillance, Coronavirus Disease

December 19, 2022

Improved Long-Form Spoken Language Translation with Large Language Models
Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Axel H. Ng
Large Language Model, Speech Translation, Simultaneous Speech Translation, Spoken Text

October 26, 2022

Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition
Sharman Tan, Piyush Behre, Nick Kibre, Issac Alphonso, Shuangyu Chang
Automatic Speech Recognition, Speech to Text, Punctuation Mark, Disfluent Speech, Spoken Text, Inverse Text Normalization

October 25, 2022

Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach
Xulong Zhang, Jianzong Wang, Ning Cheng, Kexin Zhu, Jing Xiao
Speech Representation, Noise Masking, Spoken Text, Masking Approach

February 16, 2022

ADIMA: Abuse Detection In Multilingual Audio
Vikram Gupta, Rini Sharon, Ramit Sawhney, Debdoot Mukherjee
Automatic Speech Recognition, Audio Datasets, Abuse Detection, Spoken Text, Multilingual Track
