Cued Speech
Cued Speech (CS) is a visual communication system that combines lipreading with hand cues (handshapes placed at positions near the face) to disambiguate lip movements that look alike and so improve speech understanding for deaf and hard-of-hearing people. Current research follows two main lines. The first is automatic CS recognition (ACSR), where multi-modal fusion transformers and attention mechanisms are used to integrate lip and hand information. The second is CS video generation from audio or text using diffusion models. Both aim at more accurate and efficient systems and, ultimately, at better communication accessibility for deaf and hard-of-hearing individuals. The development of large, multi-speaker CS datasets across several languages is another key area of progress, since it supports the training and evaluation of these models.
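
To make the idea of attention-based fusion of lip and hand information more concrete, here is a minimal sketch of bidirectional cross-attention between two feature streams. It is an illustration only and does not reproduce any specific published ACSR model. The function names (`cross_attention`, `fuse_lip_hand`), the random projection matrices, the feature sizes, and the assumption that both streams have the same number of frames are all hypothetical choices made for the example.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Scaled dot-product attention: each query frame attends over all context frames."""
    q = matmul(query_feats, w_q)
    k = matmul(context_feats, w_k)
    v = matmul(context_feats, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)               # scores[i][j] = q_i . k_j
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def fuse_lip_hand(lip, hand, params):
    """Let each stream attend to the other, add a residual, then concatenate per frame.

    Assumes (for this sketch) that lip and hand have the same number of frames.
    """
    lip_ctx, _ = cross_attention(lip, hand, *params["lip_from_hand"])
    hand_ctx, _ = cross_attention(hand, lip, *params["hand_from_lip"])
    fused = []
    for l, lc, h, hc in zip(lip, lip_ctx, hand, hand_ctx):
        lip_out = [a + b for a, b in zip(l, lc)]
        hand_out = [a + b for a, b in zip(h, hc)]
        fused.append(lip_out + hand_out)
    return fused


def rand_matrix(rows, cols, rng):
    """Random matrix standing in for learned weights or extracted features."""
    return [[rng.gauss(0.0, 1.0) / math.sqrt(cols) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, D = 6, 8  # hypothetical: 6 video frames, 8-dimensional features per stream
lip = rand_matrix(T, D, rng)
hand = rand_matrix(T, D, rng)
params = {
    "lip_from_hand": tuple(rand_matrix(D, D, rng) for _ in range(3)),
    "hand_from_lip": tuple(rand_matrix(D, D, rng) for _ in range(3)),
}
fused = fuse_lip_hand(lip, hand, params)
print(len(fused), len(fused[0]))  # 6 16
```

In a real system the projection matrices would be learned, the inputs would come from lip and hand feature extractors, and the fused sequence would feed a decoder that predicts phonemes or text. The sketch only shows how attention lets each lip frame weight the hand frames, and each hand frame weight the lip frames, before the two streams are merged.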