Proceedings chapter
Open access
English

How Much Data is Needed for Reliable MT Evaluation? Using Bootstrapping to Study Human and Automatic Metrics

Presented at Copenhagen (Denmark), 10-14 September 2007
Publication date: 2007
Abstract

Evaluating the output quality of machine translation systems requires test data and quality metrics. Based on the results of the French MT evaluation campaign CESTA, this paper studies the statistical reliability of evaluation scores as a function of the amount of test data used to obtain them. Bootstrapping is used to compute the standard deviation of scores assigned by human judges (mainly adequacy) as well as of five automatic metrics. The reliability of the scores is measured using two formal criteria, and the minimal number of documents or segments needed to reach reliable scores is estimated. This number does not depend on the exact subset of documents used.
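
The sketch below illustrates, under stated assumptions, the kind of bootstrap procedure the abstract describes: resampling test segments with replacement, recomputing an aggregate score on each resample, and observing how the score's standard deviation shrinks as more test data is used. The per-segment scores, the averaging step, and all names here are hypothetical stand-ins, not the paper's actual metrics or code.

```python
# Minimal bootstrap sketch (assumed setup, not the CESTA implementation):
# estimate the standard deviation of an aggregate MT score for a given
# number of test segments.
import random
import statistics

def bootstrap_std(segment_scores, n_segments, n_resamples=1000, seed=0):
    """Resample n_segments scores with replacement n_resamples times and
    return the standard deviation of the resampled mean scores."""
    rng = random.Random(seed)
    resample_means = []
    for _ in range(n_resamples):
        sample = [rng.choice(segment_scores) for _ in range(n_segments)]
        resample_means.append(sum(sample) / len(sample))
    return statistics.stdev(resample_means)

if __name__ == "__main__":
    # Hypothetical per-segment adequacy scores in [0, 1].
    rng = random.Random(42)
    scores = [rng.random() for _ in range(800)]
    # Illustrative reliability check: variability of the score versus
    # the amount of test data resampled.
    for n in (50, 100, 200, 400, 800):
        print(f"{n:4d} segments -> std = {bootstrap_std(scores, n):.4f}")
```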

Citation (ISO format)
ESTRELLA, Paula Susana, HAMON, Olivier, POPESCU-BELIS, Andréi. How Much Data is Needed for Reliable MT Evaluation? Using Bootstrapping to Study Human and Automatic Metrics. In: Proceedings of Machine Translation Summit XI. Copenhagen (Denmark). [s.l.] : [s.n.], 2007. p. 167–174.
Main files (1)
Proceedings chapter (public access)
Identifiers
  • PID: unige:3461
475 views
199 downloads

Technical information

Creation: 10/02/2009 9:28:57 AM
First validation: 10/02/2009 9:28:57 AM
Update time: 03/14/2023 3:14:53 PM
Status update: 03/14/2023 3:14:53 PM
Last indexation: 02/12/2024 6:13:19 PM
All rights reserved by Archive ouverte UNIGE and the University of Geneva