Speech Segment

Speech segment analysis extracts information from discrete portions of spoken audio to support speech-related applications. Current research focuses on robust models, such as transformer networks and graph convolutional networks, that cope with noise, speaker variability, and overlapping speech. These models often incorporate multimodal (audio-visual) data and self-supervised learning to improve performance. The resulting advances are driving progress in fields such as mental health assessment, speech-to-speech translation, and speaker diarization by making the processing of spoken language more accurate and efficient.
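To make "discrete portions of spoken audio" concrete, here is a minimal sketch of one classic way to cut a recording into speech segments: short-time energy thresholding. It is not a method from any of the papers listed here, which rely on learned models. The frame length (400 samples, 25 ms at 16 kHz), hop (160 samples, 10 ms), energy threshold, and the function names `frame_energies` and `speech_segments` are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def frame_energies(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Mean squared amplitude of each frame; a trailing partial frame is dropped."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def speech_segments(
    signal: Sequence[float],
    sample_rate: int,
    frame_len: int = 400,    # assumed: 25 ms at 16 kHz
    hop: int = 160,          # assumed: 10 ms at 16 kHz
    threshold: float = 1e-3, # assumed energy threshold, not tuned
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for each run of frames whose energy exceeds threshold."""
    energies = frame_energies(signal, frame_len, hop)
    runs = []
    run_start = None
    for i, energy in enumerate(energies):
        if energy > threshold and run_start is None:
            run_start = i                  # a speech run begins
        elif energy <= threshold and run_start is not None:
            runs.append((run_start, i - 1))  # the run ended at the previous frame
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(energies) - 1))
    # Frames i..j span samples i*hop up to j*hop + frame_len; convert to seconds.
    return [(i * hop / sample_rate, (j * hop + frame_len) / sample_rate) for i, j in runs]


if __name__ == "__main__":
    sr = 16000
    # Synthetic signal: 0.5 s of silence, 0.5 s of a 440 Hz tone, 0.5 s of silence.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 2)]
    signal = [0.0] * (sr // 2) + tone + [0.0] * (sr // 2)
    print(speech_segments(signal, sr))
```

On this synthetic input the sketch reports a single segment from 0.48 s to 1.015 s. It is slightly wider than the true 0.5 s to 1.0 s tone because overlapping frames straddle the boundaries. A fixed energy threshold breaks down under exactly the conditions the research above targets, such as noise and overlapping speakers, which is why learned detectors replace it in practice.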

Papers

September 22, 2022

Cross-domain Voice Activity Detection with Self-Supervised Representations
Sina Alisamir, Fabien Ringeval, Francois Portet
Self-Supervised Learning, Voice Activity Detection, Speech Segment, Filter Bank

August 11, 2022

Speech Synthesis with Mixed Emotions
Kun Zhou, Berrak Sisman, Rajib Rana, B. W. Schuller, Haizhou Li
Speech Synthesis, Speech Segment, Emotion Vector

July 25, 2022

ConceptBeam: Concept Driven Target Speech Extraction
Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada, Kunio Kashino
Speech Segment, Target Speech Extraction, Audio Caption, Speech Mixture, Modality Independent

June 9, 2022

Audio-video fusion strategies for active speaker detection in meetings
Lionel Pibre, Francisco Madrigal, Cyrille Equoy, Frédéric Lerasle, Thomas Pellegrini, Julien Pinquier, Isabelle Ferrané
Audio-Visual, Speaker Diarization, Human Voice, Active Speaker Detection, Speech Segment, Meeting Minute

May 19, 2022

Bi-LSTM Scoring Based Similarity Measurement with Agglomerative Hierarchical Clustering (AHC) for Speaker Diarization
Siddharth S. Nijhawan, Homayoon Beigi
Speaker Diarization, Hierarchical Clustering, Speech Segment, Spoken Conversation, Agglomerative Hierarchical Clustering, Deep Similarity, Audio Segmentation

May 9, 2022

Insights on Modelling Physiological, Appraisal, and Affective Indicators of Stress using Audio Features
Andreas Triantafyllopoulos, Sandra Zänkert, Alice Baird, Julian Konzok, Brigitte M. Kudielka, Björn W. Schuller
DCU Insight AQ, Speech Signal, Target Emotion, Acoustic Feature, Continuous Chronic Stress, Speech Segment, Human Stress, Audio Feature, Physiological Measurement

March 26, 2022

SpeechSplit 2.0: Unsupervised speech disentanglement for voice conversion without tuning autoencoder bottlenecks
Chak Ho Chan, Kaizhi Qian, Yang Zhang, Mark Hasegawa-Johnson
Voice Conversion, Speech Representation Disentanglement, Speech Segment, Autoencoder Bottleneck

February 15, 2022

SpeechPainter: Text-conditioned Speech Inpainting
Zalán Borsos, Matt Sharifi, Marco Tagliasacchi
Speech Analysis, Speaker Identity, Speech Segment, Adaptive Text-to-Speech

December 15, 2021

Speech frame implementation for speech analysis and recognition
A. A. Konev, V. S. Khlebnikov, A. Yu. Yakimuk
Speech Analysis, Recognition Rate, Speech Signal, Rich Attribute, Russian Language, Speech Segment, Speech Frame

December 10, 2021

Directed Speech Separation for Automatic Speech Recognition of Long Form Conversational Speech
Rohit Paturi, Sundararajan Srinivasan, Katrin Kirchhoff, Daniel Garcia-Romero
Automatic Speech Recognition, Speaker Embeddings, Speech Separation, Speech Segment