Child Adult Speaker

Child-adult speaker identification and profiling are active research areas concerned with distinguishing child voices from adult voices in audio and video recordings of dyadic interactions. Current work relies on deep learning, in particular speech foundation models such as wav2vec 2.0, often combined with multi-task learning and multimodal (audio-visual) approaches to improve accuracy and robustness in challenging conditions such as noisy home environments. These advances matter for applications that require automated analysis of child-adult interactions, including educational technology, clinical diagnostics, and more personalized conversational agents.

Papers