Paper ID: 2501.13996 • Published Jan 23, 2025
Integrating Persian Lip Reading in Surena-V Humanoid Robot for Human-Robot Interaction
Ali Farshian Abbasi, Aghil Yousefi-Koma, Soheil Dehghani Firouzabadi, Parisa Rashidi, Alireza Naeini
Lip reading is vital for robots in social settings, improving their ability
to understand human communication. This skill allows them to communicate more
easily in crowded environments, especially in caregiving and customer service
roles. This study builds a Persian lip-reading dataset and integrates Persian
lip-reading technology into the Surena-V humanoid robot to improve its speech
recognition capabilities. Two complementary methods are explored: an indirect
method using facial landmark tracking and a direct method leveraging
convolutional neural networks (CNNs) and long short-term memory (LSTM)
networks. The indirect method focuses on tracking key facial landmarks,
especially around the lips, to infer movements, while the direct method
processes raw video data for action and speech recognition. The best-performing
model, an LSTM, achieved 89% accuracy and has been successfully deployed on
the Surena-V robot for real-time human-robot interaction. The study highlights
the effectiveness of these methods, particularly in environments where verbal
communication is limited.
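
As a rough illustration of the indirect method, the sketch below turns per-frame lip landmarks into a one-dimensional feature sequence of the kind a sequence model such as an LSTM could consume. The abstract does not specify the landmark detector, the landmark layout, or the features used, so the four-point lip representation, the key names (`left`, `right`, `top`, `bottom`), and the height-to-width ratio are all assumptions made for illustration, not the authors' pipeline.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def mouth_aspect_ratio(lips: Dict[str, Point]) -> float:
    """Vertical lip opening divided by mouth width for a single frame.

    `lips` uses hypothetical keys: "left"/"right" mouth corners and
    "top"/"bottom" midpoints of the upper and lower lip.
    """
    width = math.dist(lips["left"], lips["right"])
    height = math.dist(lips["top"], lips["bottom"])
    if width == 0:
        # Degenerate detection: avoid dividing by zero.
        return 0.0
    return height / width


def lip_feature_sequence(frames: List[Dict[str, Point]]) -> List[float]:
    """Map a clip (list of per-frame landmarks) to a per-frame feature sequence."""
    return [mouth_aspect_ratio(frame) for frame in frames]


# Two synthetic frames: a nearly closed mouth, then a wider opening.
frames = [
    {"left": (0.0, 0.0), "right": (4.0, 0.0), "top": (2.0, 0.5), "bottom": (2.0, -0.5)},
    {"left": (0.0, 0.0), "right": (4.0, 0.0), "top": (2.0, 1.0), "bottom": (2.0, -1.0)},
]
sequence = lip_feature_sequence(frames)
```

Dividing the opening by the mouth width makes the feature roughly insensitive to how far the speaker is from the camera, which is one plausible reason to prefer ratios over raw pixel distances in a setup like this.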