

![]() |
A New Method for the Study of Correlations between MT Evaluation Metrics |
Authors | Paula Susana Estrella, Andréi Popescu-Belis, Margaret King |
Published in | Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation (TMI-07). Skövde (Sweden). 2007, p. 55-64 |
Abstract | This paper proposes a reliable method for measuring the correlations between the scores that different evaluation metrics assign to machine-translated texts. A series of examples from recent MT evaluation experiments is first discussed, including the recent French MT evaluation campaign CESTA, whose results and data are used in this study. To compute correlations, a set of 1,500 samples is created by bootstrapping for each system and each evaluation metric. Correlations between metrics, covering both automatic metrics and those applied by human judges, are then computed over these samples. The results confirm the previously observed correlations between some automatic metrics, but they also indicate a lack of correlation between human and automatic metrics on the CESTA data, which raises a number of questions about the validity of these metrics. In addition, the roles of corpus size and of the selection procedure for bootstrapping (low vs. high scores) are examined. |
Citation (ISO format) | ESTRELLA, Paula Susana, POPESCU-BELIS, Andréi, KING, Margaret. A New Method for the Study of Correlations between MT Evaluation Metrics. In: Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation (TMI-07). Skövde (Sweden). [s.l.] : [s.n.], 2007. p. 55-64. https://archive-ouverte.unige.ch/unige:3460 |
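The bootstrapping idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes, hypothetically, that each metric yields a per-segment score whose mean is the system-level score (which holds for averaged human judgments but not for corpus-level metrics such as BLEU, which would have to be recomputed on each resampled set). It also uses Pearson's r as the correlation measure. The names `bootstrap_scores` and `pearson` are illustrative only.

```python
import math
import random


def bootstrap_scores(seg_a, seg_b, n_samples=1500, seed=0):
    """Draw bootstrap samples of segment indices (with replacement) and
    return one (mean score under metric A, mean score under metric B)
    pair per sample. Assumes per-segment scores for the same segments."""
    if len(seg_a) != len(seg_b):
        raise ValueError("both metrics must score the same segments")
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(seg_a)
    pairs = []
    for _ in range(n_samples):
        # Resample n segment indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(seg_a[i] for i in idx) / n
        mean_b = sum(seg_b[i] for i in idx) / n
        pairs.append((mean_a, mean_b))
    return pairs


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-segment scores for one system under two metrics.
    metric_a = [0.1, 0.4, 0.35, 0.8, 0.6, 0.2, 0.9, 0.5]
    metric_b = [0.2, 0.5, 0.3, 0.7, 0.65, 0.1, 0.8, 0.55]
    xs, ys = zip(*bootstrap_scores(metric_a, metric_b))
    print(f"r over 1,500 bootstrap samples: {pearson(xs, ys):.3f}")
```

Each bootstrap sample gives one score per metric for the same system, so the 1,500 paired scores form the data over which the correlation between the two metrics is computed.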