Speech-Driven
Speech-driven research develops computational models that process and understand spoken language, encompassing tasks such as speech recognition, speaker identification, and emotion detection. Current work emphasizes multi-task learning frameworks, often built on transformer architectures and diffusion models, to improve robustness and efficiency across diverse scenarios and languages. The field is central to advancing human-computer interaction, improving accessibility for people with communication challenges, and enabling more sophisticated applications in areas such as personalized healthcare and virtual assistants.
Papers
The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments
Shareef Babu Kalluri, Prachi Singh, Pratik Roy Chowdhuri, Apoorva Kulkarni, Shikha Baghel, Pradyoth Hegde, Swapnil Sontakke, Deepak K T, S. R. Mahadeva Prasanna, Deepu Vijayasenan, Sriram Ganapathy
Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations
Mukhtar Mohamed, Oli Danyi Liu, Hao Tang, Sharon Goldwater
Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers
Guru Prakash Arumugam, Shuo-yiin Chang, Tara N. Sainath, Rohit Prabhavalkar, Quan Wang, Shaan Bijwadia
Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation
Hui Fu, Zeqing Wang, Ke Gong, Keze Wang, Tianshui Chen, Haojie Li, Haifeng Zeng, Wenxiong Kang