Scientific article
Open access
English

Multi-lingual Dependency Parsing Evaluation : a Large-scale Analysis of Word Order Properties using Artificial Data

Publication date: 2016
Abstract

The growing work in multi-lingual parsing faces the challenge of fair comparative evaluation and performance analysis across languages and their treebanks. The difficulty lies in teasing apart the properties of treebanks, such as their size or average sentence length, from those of the annotation scheme, and from the linguistic properties of languages. We propose a method to evaluate the effects of word order of a language on dependency parsing performance, while controlling for confounding treebank properties. The method uses artificially-generated treebanks that are minimal permutations of actual treebanks with respect to two word order properties: word order variation and dependency lengths. Based on these artificial data on twelve languages, we show that longer dependencies and higher word order variability degrade parsing performance. Our method also extends to minimal pairs of individual sentences, leading to a finer-grained understanding of parsing errors.
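The two notions the abstract relies on, dependency length and a permutation that reorders words without changing the tree, can be illustrated with a minimal sketch. This is not the authors' generation procedure: the head-index encoding, the function names `dependency_lengths` and `permute`, and the example sentence are all assumptions made for illustration.

```python
from typing import List, Sequence


def dependency_lengths(heads: Sequence[int]) -> List[int]:
    """Return the linear distance between each word and its head.

    heads[i] is the 1-based position of the head of word i+1;
    0 marks the root, which has no incoming dependency.
    (Hypothetical encoding, chosen for this sketch.)
    """
    return [abs(h - (i + 1)) for i, h in enumerate(heads) if h != 0]


def permute(heads: Sequence[int], order: Sequence[int]) -> List[int]:
    """Reorder the words of a sentence while keeping the same tree.

    order[k] is the original 1-based position of the word that is
    placed at new position k+1. The returned list gives the head of
    each word in the new order, expressed in new positions.
    """
    new_pos = {old: new for new, old in enumerate(order, start=1)}
    return [
        0 if heads[old - 1] == 0 else new_pos[heads[old - 1]]
        for old in order
    ]


# Illustrative sentence: "She gave him a book"
# She -> gave, gave = root, him -> gave, a -> book, book -> gave
original = [2, 0, 2, 5, 2]

# Same tree, different word order: "She gave a book him"
reordered = permute(original, [1, 2, 4, 5, 3])

total_original = sum(dependency_lengths(original))
total_reordered = sum(dependency_lengths(reordered))
```

The two sentences share one dependency tree and differ only in word order, so any difference in total dependency length comes from the permutation alone. This is the sense in which a pair of this kind is minimal.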

Citation (ISO format)
GULORDAVA, Kristina, MERLO, Paola. Multi-lingual Dependency Parsing Evaluation : a Large-scale Analysis of Word Order Properties using Artificial Data. In: Transactions of the Association for Computational Linguistics, 2016, vol. 4, p. 343–356. doi: 10.1162/tacl_a_00103
Main files (1)
Article (Published version)
Identifiers
ISSN of the journal: 2307-387X

Technical information

Creation: 30/07/2020 12:16:00
First validation: 30/07/2020 12:16:00
Update time: 15/03/2023 22:24:26
Status update: 15/03/2023 22:24:26
Last indexation: 12/02/2024 11:53:59
All rights reserved by Archive ouverte UNIGE and the University of Geneva