Speech Signal

Speech signals are the acoustic form of spoken language, and research on them aims to improve how they are processed across a range of applications. Current efforts concentrate on robust models for speech enhancement (e.g., diffusion models and state-space models such as Mamba), source separation (e.g., attention mechanisms and spatial information), and recognition that stays accurate in noisy or otherwise challenging environments. These advances matter for human-computer interaction, assistive technologies for people with hearing impairments, healthcare (e.g., disease detection from speech biomarkers), and security (e.g., synthetic speech detection).

Papers

June 11, 2024

June 7, 2024

XANE: eXplainable Acoustic Neural Embeddings
Sri Harsha Dumpala, Dushyant Sharma, Chandramouli Shama Sastri, Stanislav Kruchinin, James Fosburgh, Patrick A. Naylor
Speech Signal Feature Embeddings Speech Detection Acoustic Word Embeddings Background Sound

June 5, 2024

Speech-based Clinical Depression Screening: An Empirical Study
Yangbin Chen, Chenyang Xu, Chunfeng Liang, Yanbao Tao, Chuan Shi
Empirical Study Speech Processing Speech Signal Speech Based Depression Detection

May 30, 2024

Fill in the Gap! Combining Self-supervised Representation Learning with Neural Audio Synthesis for Speech Inpainting
Ihab Asaad, Maxime Jacquelin, Olivier Perrotin, Laurent Girin, Thomas Hueber
Speech Analysis Speech Representation Self Supervised Representation Learning Speech Signal Speaker Recognition Neural Vocoder Speech Supervised Learning Model Neural Audio Synthesis

May 23, 2024

End-to-End User-Defined Keyword Spotting using Shifted Delta Coefficients
Kesavaraj V, Anuprabha M, Anil Kumar Vuppala
Speech Signal Keyword Spotting Spoken Language Keyword Enrollment Audio Text Pair

May 11, 2024

Diff-ETS: Learning a Diffusion Probabilistic Model for Electromyography-to-Speech Conversion
Zhao Ren, Kevin Scheck, Qinhan Hou, Stefano van Gogh, Michael Wand, Tanja Schultz
Diffusion Probabilistic Model Speech Signal High Fidelity Vocoder Synthetic Voice Automated Conversion Electrolaryngeal Speech

May 10, 2024

An Investigation of Incorporating Mamba for Speech Enhancement
Rong Chao, Wen-Huang Cheng, Moreno La Quatra, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Szu-Wei Fu, Yu Tsao
Speech Enhancement State Space Model Mamba in Mamba Speech Signal

April 17, 2024

FairSSD: Understanding Bias in Synthetic Speech Detectors
Amit Kumar Singh Yadav, Kratika Bhagtani, Davide Salvi, Paolo Bestagini, Edward J. Delp
Absolute Stance Bias Synthesized Speech Speech Signal Language Disorder Human Speech Synthetic Speech Detector

March 17, 2024

Advanced Artificial Intelligence Algorithms in Cochlear Implants: Review of Healthcare Strategies, Challenges, and Perspectives
Billel Essaid, Hamza Kheddar, Noureddine Batel, Abderrahmane Lakas, Muhammad E. H. Chowdhury
Automatic Speech Recognition Technical Challenge Speech Enhancement Synthesized View Speech Signal Artificial Intelligence Algorithm Speech Distortion Cochlear Implant

March 9, 2024

An Audio-textual Diffusion Model For Converting Speech Signals Into Ultrasound Tongue Imaging Data
Yudong Yang, Rongfeng Su, Xiaokang Liu, Nan Yan, Lan Wang
Diffusion Model Speech Signal Articulatory Inversion Ultrasound Tongue Tongue Motion

March 5, 2024

Single-Channel Robot Ego-Speech Filtering during Human-Robot Interaction
Yue Li, Koen V Hindriks, Florian Kunneman
Human Robot Interaction Speech Signal Human VOICE Room Reverberation Human Speech Target Speech Extraction Egocentric AI

March 2, 2024

REWIND Dataset: Privacy-preserving Speaking Status Segmentation from Multimodal Body Movement Signals in the Wild
Jose Vargas Quiros, Chirag Raman, Stephanie Tan, Ekin Gedik, Laura Cabrera-Quiros, Hayley Hung
Cross Modal Wild Challenge Multimodal Dataset Speech Signal Quantitative Segmentation Multimodal Signal Non Speech Audio ID Datasets

February 22, 2024

SICRN: Advancing Speech Enhancement through State Space Model and Inplace Convolution Techniques
Changjiang Zhao, Shulin He, Xueliang Zhang
Speech Enhancement State Space Model Direct Convolution Speech Signal Traditional Convolution Upsampling Layer

February 17, 2024

When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection
Xiangyu Zhang, Hexin Liu, Kaishuai Xu, Qiquan Zhang, Daijiao Liu, Beena Ahmed, Julien Epps
Large Language Model Speech Analysis Speech Signal Depression Detection Acoustic Feature Multimodal Depression Level Acoustic Information

January 22, 2024

Lightweight Protection for Privacy in Offloaded Speech Understanding
Dongqi Cai
Privacy Policy Speech Signal Lightweight High Encoder Side

January 12, 2024

Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction
Ye-Xin Lu, Yang Ai, Hui-Peng Du, Zhen-Hua Ling
Generative Adversarial Network High Quality Speech Signal Amplitude Estimation Bandwidth Extension Wideband Speech Phase Prediction Speech Bandwidth Extension

January 3, 2024

December 21, 2023

BANSpEmo: A Bangla Emotional Speech Recognition Dataset
Md Gulzar Hussain, Mahmuda Rahman, Babe Sultana, Ye Shiren
Speech Analysis Speech Signal Emotional Speech Emotion Class

Speech Signal

Papers

ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets

RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention

Multichannel blind speech source separation with a disjoint constraint source model

Independent low-rank matrix analysis based on the Sinkhorn divergence source model for blind source separation
