Speaker Normalization

Speaker normalization aims to remove speaker-specific characteristics from speech signals so that only the linguistic or emotional content remains. Current research focuses on disentangling speaker and phonetic information within self-supervised speech representations, often using techniques such as principal component analysis or variational autoencoders, and on leveraging discrete speech units for improved efficiency. These methods matter for the robustness and generalization of speech processing systems across diverse speakers, with applications ranging from speech synthesis and translation to assisting individuals with dysarthria and mitigating online hate speech.
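
As a rough illustration of the principal-component idea mentioned above, the sketch below estimates a "speaker subspace" from per-speaker mean feature vectors and projects it out of every frame. It is a minimal sketch under stated assumptions, not the method of any particular paper: the function name `remove_speaker_subspace`, the synthetic 3-dimensional features, and the choice of a single component are all hypothetical, and real systems would operate on high-dimensional self-supervised representations.

```python
import numpy as np


def remove_speaker_subspace(features, speaker_ids, n_components=1):
    """Project out the top principal directions of the per-speaker means.

    Hypothetical helper for illustration only.
    features: (n_frames, dim) array; speaker_ids: (n_frames,) labels.
    """
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    speakers = np.unique(speaker_ids)

    # One mean vector per speaker, then center across speakers.
    means = np.stack([features[speaker_ids == s].mean(axis=0) for s in speakers])
    centered = means - means.mean(axis=0)

    # PCA via SVD: rows of vt are the principal directions of the speaker means.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]  # shape (n_components, dim)

    # Subtract each frame's component that lies in the speaker subspace.
    return features - features @ basis.T @ basis


def speaker_gap(features, speaker_ids):
    """Distance between the mean vectors of speaker 0 and speaker 1."""
    m0 = features[speaker_ids == 0].mean(axis=0)
    m1 = features[speaker_ids == 1].mean(axis=0)
    return float(np.linalg.norm(m1 - m0))


# Synthetic data (an assumption for the demo): shared "content" plus a
# speaker-dependent offset along the second feature dimension.
rng = np.random.default_rng(0)
speaker_ids = np.repeat([0, 1], 100)
content = rng.normal(size=(200, 3))
offset = np.array([0.0, 5.0, 0.0])
features = content + np.outer(speaker_ids * 2 - 1, offset)

normalized = remove_speaker_subspace(features, speaker_ids, n_components=1)
gap_before = speaker_gap(features, speaker_ids)
gap_after = speaker_gap(normalized, speaker_ids)
```

With two speakers and one component, the removed direction is exactly the line joining the two speaker means, so `gap_after` collapses to numerical zero while the remaining dimensions of the content are left untouched. With many speakers, the number of components becomes a trade-off between removing speaker information and preserving phonetic detail.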

Papers