Child Speech Recognition

Child speech recognition (CSR) aims to build accurate automatic speech recognition (ASR) systems for children's speech, which differs from adult speech both acoustically and linguistically (for example, higher pitch and greater variability in pronunciation). Current research focuses on adapting large pretrained models such as Whisper and wav2vec 2.0, using parameter-efficient fine-tuning (PEFT) and data augmentation (including voice conversion) to compensate for the scarcity of child speech data. These advances matter for applications in education, healthcare, and human-robot interaction, where more accurate recognition makes speech-based tools more usable for children's learning and development.

Papers