Synthesized Speech

Synthesized speech research focuses on creating realistic and natural-sounding artificial speech, primarily for applications like voice assistants, audiobooks, and accessibility tools. Current efforts concentrate on improving the naturalness and expressiveness of synthesized speech, often using deep learning models like GANs, diffusion models, and transformers, and addressing challenges such as detecting synthetic speech (deepfakes) and mitigating biases in these detection systems. This field is crucial for advancing human-computer interaction, improving accessibility technologies, and combating the malicious use of synthetic audio in fraud and disinformation.

Papers

October 26, 2022

Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection
Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari
Text to Speech Speech Data Synthesized Speech Data Selection Speech Corpus Text to Speech Model Text to Speech Synthesis

October 24, 2022

Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS
Ziqi Liang
Recurrent Neural Network Synthesized Speech Mongolian Text to Speech

October 21, 2022

Adaptive re-calibration of channel-wise features for Adversarial Audio Classification
Vardhan Dongre, Abhinav Thimma Reddy, Nikhitha Reddeddy
Synthesized Speech Deepfake Audio Channel Wise Synthetic Speech Detection Adversarial Audio Synthetic Speech Detector Adaptive Calibration

October 18, 2022

Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion
Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari
Native Robustness Synthesized Speech Subword Regularization Inappropriate Pause Spontaneous Speech Synthesis

October 14, 2022

October 11, 2022

Deep Spectro-temporal Artifacts for Detecting Synthesized Speech
Xiaohui Liu, Meng Liu, Lin Zhang, Linjuan Zhang, Chang Zeng, Kai Li, Nan Li, Kong Aik Lee, Longbiao Wang, Jianwu Dang
Synthesized Speech Fake Audio Detection Spectro Temporal

September 15, 2022

Detecting Synthetic Speech Manipulation in Real Audio Recordings
Md Hafizur Rahman, Martin Graciarena, Diego Castan, Chris Cobo-Kroenke, Mitchell McLaren, Aaron Lawson
Deep Fake Synthesized Speech Speech Generation Audio Recording Synthetic Speech Detection Synthetic Speech Detector

September 13, 2022

Automated detection of pronunciation errors in non-native English speech employing deep learning
Daniel Korzekwa
Deep Learning Data Detection Speech Synthesis Synthesized Speech Mispronunciation Detection Non Native Pronunciation Training

September 6, 2022

The Role of Vocal Persona in Natural and Synthesized Speech
Camille Noufi, Lloyd May, Jonathan Berger
Integral Role Synthesized Speech Human VOICE Synthetic Voice Context Dependent Visual Naturalness

August 21, 2022

Visualising Model Training via Vowel Space for Text-To-Speech Systems
Binu Abeysinghe, Jesin James, Catherine I. Watson, Felix Marattukalam
Speech Synthesis Synthesized Speech Text to Speech System Model Visualization

August 15, 2022

Towards Parametric Speech Synthesis Using Gaussian-Markov Model of Spectral Envelope and Wavelet-Based Decomposition of F0
Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Csaba Zainkó, Géza Németh
Synthesized Speech Neural Vocoder High Fidelity Vocoder Modern Vocoders Envelope Tracking Wavelet Decomposition Gauss Markov F0 Subband

August 2, 2022

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features
Jun Xue, Cunhang Fan, Zhao Lv, Jianhua Tao, Jiangyan Yi, Chengshi Zheng, Zhengqi Wen, Minmin Yuan, Shegang Shao
Synthesized Speech Acoustic Feature New Combination Audio Deepfake Detection F0 Subband

July 11, 2022

June 30, 2022

TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder
Eunwoo Song, Ryuichi Yamamoto, Ohsung Kwon, Chan-Ho Song, Min-Jae Hwang, Suhyeon Oh, Hyun-Wook Yoon, Jin-Seob Kim, Jae-Min Kim
Variational Autoencoder Text to Speech Synthesized Speech Neural Speech Synthesis

May 13, 2022

Talking Face Generation with Multilingual TTS
Hyoung-Kyu Song, Sang Hoon Woo, Junhyeok Lee, Seungmin Yang, Hyunjae Cho, Youseong Lee, Dongho Choi, Kang-wook Kim
Synthesized Speech Face Generation Multilingual Speech Talking Face Video Multilingual Tt

May 3, 2022

Synthesized Speech Detection Using Convolutional Transformer-Based Spectrogram Analysis
Emily R. Bartusiak, Edward J. Delp
Deep Convolutional Neural Network Synthesized Speech Speech Signal Audio Spectrogram Transformer

April 12, 2022

Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch
Hanbin Bae, Young-Sun Joo
Synthesized Speech Feature Enhancement Pitch Controllability Pitch Augmentation

April 11, 2022

Fusion of Self-supervised Learned Models for MOS Prediction
Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Sheng Li, Raj Dabre, Raphael Rubino, Yi Zhao
Automatic Speech Recognition Hybrid Fusion Synthesized Speech Self Supervised Model Wav2vec U Mean Opinion Score Prediction Challenge

Synthesized Speech

Papers

Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection

Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS

Adaptive re-calibration of channel-wise features for Adversarial Audio Classification

Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion

Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis

Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario

Deep Spectro-temporal Artifacts for Detecting Synthesized Speech

Detecting Synthetic Speech Manipulation in Real Audio Recordings

Automated detection of pronunciation errors in non-native English speech employing deep learning

The Role of Vocal Persona in Natural and Synthesized Speech

Visualising Model Training via Vowel Space for Text-To-Speech Systems

Towards Parametric Speech Synthesis Using Gaussian-Markov Model of Spectral Envelope and Wavelet-Based Decomposition of F0

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

PoeticTTS -- Controllable Poetry Reading for Literary Studies

Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data

TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder

Talking Face Generation with Multilingual TTS

Synthesized Speech Detection Using Convolutional Transformer-Based Spectrogram Analysis

Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch

Fusion of Self-supervised Learned Models for MOS Prediction