Paper ID: 2203.12815

Revisiting the Effects of Leakage on Dependency Parsing

Nathaniel Krasner, Miriam Wanner, Antonios Anastasopoulos

Recent work by Søgaard (2020) showed that, treebank size aside, overlap between training and test graphs (termed leakage) explains more of the observed variation in dependency parsing performance than competing explanations. In this work we revisit this claim, testing it on more models and languages. We find that it holds only in zero-shot cross-lingual settings. We then propose a more fine-grained measure of such leakage which, unlike the original measure, not only explains but also correlates with observed performance variation. Code and data are available here: https://github.com/miriamwanner/reu-nlp-project
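To make the notion of leakage concrete, the following is a minimal sketch of one way to measure overlap between training and test graphs. It is not the authors' implementation. It assumes each dependency tree is reduced to its unlabeled structure, encoded as a tuple of head indices (with 0 marking the root), and it counts a test tree as leaked when an identical tuple occurs in the training set. The function name `leakage` and the toy data are hypothetical, and exact tuple matching is a simplification of graph overlap.

```python
from typing import Iterable, Sequence


def leakage(train: Iterable[Sequence[int]], test: Iterable[Sequence[int]]) -> float:
    """Fraction of test trees whose head-index tuple also occurs in training.

    Assumption: heads[i] is the 1-based index of the head of token i + 1,
    and 0 marks the root. Labels and word forms are ignored.
    """
    seen = {tuple(tree) for tree in train}
    test_trees = [tuple(tree) for tree in test]
    if not test_trees:
        return 0.0  # convention chosen here for an empty test set
    hits = sum(1 for tree in test_trees if tree in seen)
    return hits / len(test_trees)


# Toy data (hypothetical): three training trees, four test trees.
train = [[2, 0, 2], [0, 1], [2, 0]]
test = [[2, 0, 2], [0, 1, 2], [2, 0], [0, 1]]
print(leakage(train, test))
```

In this toy example three of the four test structures also appear in training. A finer-grained measure, such as the one the abstract proposes, would need more than a binary seen-or-unseen match per tree; this sketch does not attempt that.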

Submitted: Mar 24, 2022

Topics

Mixed Effect
Zero Shot Cross Lingual
Dependency Parsing
Intra Variability
Deep Leakage
Performance Variation
