Parallel English Translation Dataset
Parallel English translation datasets, which pair English sentences with their translations in another language, are crucial for training and evaluating machine translation models, particularly for low-resource languages where such data is scarce. Current research focuses on methods to generate or augment these datasets, including unsupervised multilingual paraphrasing and semi-supervised pseudo-parallel data generation, often built on transformer-based models (e.g., BERT, mT5, mBART). The quality of the available parallel data strongly affects the accuracy and robustness of machine translation systems, and therefore the cross-lingual communication and other NLP applications that depend on them.
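As a rough illustration of pseudo-parallel data generation, the sketch below pairs monolingual sentences with machine-produced English and marks them as synthetic so they can be kept apart from human-translated pairs. It is a minimal sketch under stated assumptions, not a method taken from any particular paper: the names `SentencePair`, `make_pseudo_parallel`, and `toy_translate` are hypothetical, and the word-lookup translator merely stands in for a real model such as mT5 or mBART.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class SentencePair:
    """One aligned example in a parallel corpus (hypothetical structure)."""
    source: str              # sentence in the non-English language
    english: str             # English side of the pair
    synthetic: bool = False  # True when the English side was machine-generated


def make_pseudo_parallel(
    monolingual: Iterable[str],
    translate: Callable[[str], str],
) -> List[SentencePair]:
    """Pair each non-empty monolingual sentence with a machine translation."""
    pairs = []
    for sentence in monolingual:
        text = sentence.strip()
        if not text:
            continue  # skip blank lines in the monolingual input
        output = translate(text).strip()
        if output:
            pairs.append(SentencePair(source=text, english=output, synthetic=True))
    return pairs


def toy_translate(sentence: str) -> str:
    # Assumption: a word-by-word lookup standing in for a real translation model.
    lexicon = {"hola": "hello", "mundo": "world"}
    return " ".join(lexicon.get(word, word) for word in sentence.lower().split())


# A small human-translated seed set, augmented with synthetic pairs.
gold = [SentencePair("buenos días", "good morning")]
pseudo = make_pseudo_parallel(["Hola mundo", "   "], toy_translate)
corpus = gold + pseudo
```

Keeping the `synthetic` flag on each pair makes it possible to train on the combined corpus while evaluating only on human-translated data.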