Silent Speech

Silent speech interfaces (SSIs) aim to translate articulatory movements, such as lip movements or tongue positions, into text or synthesized speech without requiring audible vocalization. Current research relies heavily on deep learning, in particular contrastive learning and spatial transformer networks, to improve the accuracy and robustness of these systems across speakers and recording conditions. The technology is promising for private communication and hands-free device control, especially in noisy environments or in situations where vocalization is undesirable. Ongoing work focuses on improving model adaptability and expanding vocabulary size to support more natural and expressive silent communication.
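To make the contrastive-learning idea concrete, here is a minimal sketch of an InfoNCE-style loss in plain Python. It is an illustration under stated assumptions, not the method of any particular paper: the function name `info_nce_loss`, the toy embeddings, and the pairing scheme (row i of one list and row i of the other are two views of the same utterance, for example from different speakers or recording sessions) are all hypothetical.

```python
import math


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss over paired embeddings.

    Assumption: anchors[i] and positives[i] are two views of the same
    utterance; every other row in `positives` acts as a negative.
    All vectors are assumed to be non-zero.
    """
    def normalize(v):
        # Scale to unit length so dot products become cosine similarities.
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]

    total = 0.0
    for i, ai in enumerate(a):
        # Temperature-scaled similarity of anchor i to every candidate.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the true pair.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy embeddings (hypothetical): each row in views_b is a slightly
# perturbed copy of the matching row in views_a.
views_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
views_b = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

aligned = info_nce_loss(views_a, views_b)
# Rotating the rows breaks every pairing, so the loss should rise.
mismatched = info_nce_loss(views_a, views_b[1:] + views_b[:1])
print(aligned < mismatched)
```

Minimizing a loss of this kind pulls embeddings of the same utterance together and pushes different utterances apart, which is one plausible route to the cross-speaker and cross-condition robustness described above.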

Papers