Co-Speech Gesture

Co-speech gesture research focuses on understanding and generating the natural hand and body movements that accompany speech, with the aim of creating more realistic and engaging human-computer interactions. Current work relies heavily on deep learning, using diffusion models, transformers, and generative adversarial networks to synthesize gestures from audio and/or text inputs, often incorporating techniques such as contrastive learning and classifier-free guidance to improve realism and controllability. The field is significant for advancing human-robot interaction, virtual avatar creation, and the broader understanding of multimodal communication, with applications ranging from embodied conversational agents to assistive technologies for people with communication impairments.
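
As a concrete illustration of one of the techniques mentioned above, the sketch below shows how classifier-free guidance is typically applied at sampling time in an audio-conditioned gesture diffusion model. This is a minimal sketch under stated assumptions, not the method of any specific paper: the model interface (a network predicting the noise residual from a noisy gesture sequence, a timestep, and an optional audio condition), the tensor shapes, and the guidance scale are all illustrative.

```python
import torch

def cfg_denoise(model, x_t, t, audio, scale=2.5):
    """One classifier-free-guidance denoising step (illustrative sketch).

    `model` is assumed to predict the noise residual given a noisy
    gesture sequence `x_t`, a diffusion timestep `t`, and an optional
    conditioning signal; `cond=None` yields the unconditional prediction.
    """
    eps_uncond = model(x_t, t, cond=None)   # condition dropped
    eps_cond = model(x_t, t, cond=audio)    # conditioned on speech audio
    # Extrapolate from the unconditional prediction toward the conditional
    # one; scale > 1 trades diversity for tighter speech-gesture alignment.
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Minimal usage with a stand-in model: 60 gesture frames, 48-D pose,
# 128-D audio features per frame (all shapes are assumptions).
dummy_model = lambda x, t, cond: torch.zeros_like(x)
x_t = torch.randn(1, 60, 48)
eps = cfg_denoise(dummy_model, x_t, torch.tensor([500]),
                  audio=torch.randn(1, 60, 128))
```

For guidance like this to work, such models are usually trained with condition dropout, randomly replacing the audio input with a null condition on a fraction of training steps so a single network learns both the conditional and unconditional distributions.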

Papers