Paper ID: 2211.11584

Dialogs Re-enacted Across Languages

Nigel G. Ward, Jonathan E. Avila, Emilia Rivas, Divette Marco

To support machine learning of cross-language prosodic mappings and other ways to improve speech-to-speech translation, we present a protocol for collecting closely matched pairs of utterances across languages, a description of the resulting data collection and its public release, and some observations and musings. This report is intended for people using this corpus, people extending it, and people designing similar collections of bilingual dialog data.

Submitted: Nov 18, 2022

Topics

Large Corpus
Unknown Language
Speech to Speech Translation
Speech Utterance
Bilingual Data
