Paper ID: 2406.02565 • Published May 2, 2024
Sequence-to-sequence models in peer-to-peer learning: A practical application
Robert Šajina, Ivo Ipšić
TL;DR
Get AI-generated summaries with premium
Get AI-generated summaries with premium
This paper explores the applicability of sequence-to-sequence (Seq2Seq) models based on LSTM units for Automatic Speech Recognition (ASR) task within peer-to-peer learning environments. Leveraging two distinct peer-to-peer learning methods, the study simulates the learning process of agents and evaluates their performance in ASR task using two different ASR datasets. In a centralized training setting, utilizing a scaled-down variant of the Deep Speech 2 model, a single model achieved a Word Error Rate (WER) of 84\% when trained on the UserLibri dataset, and 38\% when trained on the LJ Speech dataset. Conversely, in a peer-to-peer learning scenario involving 55 agents, the WER ranged from 87\% to 92\% for the UserLibri dataset, and from 52\% to 56\% for the LJ Speech dataset. The findings demonstrate the feasibility of employing Seq2Seq models in decentralized settings, albeit with slightly higher Word Error Rates (WER) compared to centralized training methods.