Speech Component

Speech component research studies how to analyze and manipulate the constituent elements of speech, such as content, pitch, rhythm, and timbre, for applications including speech recognition, voice conversion, and speech disorder diagnosis. Current work uses deep learning models, including attention-based encoder-decoder architectures and generative adversarial networks, to disentangle these components, often with the help of self-supervised learning and mutual information estimation. These advances are improving the accuracy and efficiency of speech processing systems and offering insight into human speech production and perception, with implications for both clinical applications and human-computer interaction.

Papers