Translation Benchmark
Translation benchmarks are crucial for evaluating and advancing machine translation (MT) models. They rely on automatic metrics such as BLEU and word error rate (WER), and increasingly incorporate document-level context and discourse features. Current research emphasizes improving model efficiency and accuracy through techniques such as non-autoregressive transformers, mixture-of-experts (MoE) models, and training strategies like reinforced self-training and contrastive learning. These advances are vital for closing the gap between human and machine translation, particularly for low-resource languages and document-level tasks, with impact ranging from cross-cultural communication to software localization.
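To make the BLEU metric mentioned above concrete, here is a minimal, self-contained sketch of sentence-level BLEU with a single reference and no smoothing: clipped n-gram precisions combined by a geometric mean, scaled by a brevity penalty. This is an illustrative implementation, not the one used by any of the papers listed; benchmark evaluation in practice typically uses a standard tool such as sacreBLEU, which also handles tokenization, multiple references, and smoothing.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Count all n-grams of length n in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence BLEU against a single reference."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Modified precision: clip each candidate n-gram count
        # by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    # Without smoothing, any zero precision makes the geometric mean zero.
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(candidate) > len(reference) \
        else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
# max_n=2 here because this short pair shares no 4-grams,
# which would zero out the unsmoothed score.
score = bleu(cand, ref, max_n=2)
```

A perfect match yields 1.0, and the brevity penalty only activates when the candidate is shorter than the reference; real MT evaluation smooths the higher-order precisions so short sentences do not collapse to zero.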
Papers
Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus
Yuchen Eleanor Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Mrinmaya Sachan, Ryan Cotterell
On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation
Liang Chen, Shuming Ma, Dongdong Zhang, Furu Wei, Baobao Chang
Advancing Multilingual Pre-training: TRIP Triangular Document-level Pre-training for Multilingual Language Models
Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Wai Lam, Furu Wei
Fixing MoE Over-Fitting on Low-Resource Languages in Multilingual Machine Translation
Maha Elbayad, Anna Sun, Shruti Bhosale