UNIGE document: Proceedings chapter (unige:3461)
Title

How Much Data is Needed for Reliable MT Evaluation? Using Bootstrapping to Study Human and Automatic Metrics

Authors
Estrella, Paula Susana; Hamon, Olivier; Popescu-Belis, Andrei
Published in
Proceedings of Machine Translation Summit XI, Copenhagen (Denmark), 10-14 September 2007, p. 167-174
Abstract
Evaluating the output quality of a machine translation system requires test data and quality metrics to be applied. Based on the results of the French MT evaluation campaign CESTA, this paper studies the statistical reliability of the scores depending on the amount of test data used to obtain them. Bootstrapping is used to compute the standard deviation of the scores assigned by human judges (mainly adequacy) as well as of five automatic metrics. The reliability of the scores is measured using two formal criteria, and the minimal number of documents or segments needed to reach reliable scores is estimated. This number does not depend on the exact subset of documents used.
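
The bootstrapping approach described in the abstract can be illustrated with a short sketch: repeatedly draw resamples of segment-level scores of a given size and measure how much the resulting mean score varies. The Python code below is not the authors' CESTA implementation; the scores, sample sizes, and function names are hypothetical, and it only shows the general resampling idea.

    # Minimal sketch (hypothetical data, not the CESTA scores): estimate the
    # standard deviation of an MT evaluation score as a function of test-set size.
    import random
    import statistics

    def bootstrap_std(scores, sample_size, n_resamples=1000, seed=0):
        """Std. deviation of the mean score over bootstrap resamples of a given size."""
        rng = random.Random(seed)
        means = []
        for _ in range(n_resamples):
            resample = [rng.choice(scores) for _ in range(sample_size)]
            means.append(statistics.fmean(resample))
        return statistics.stdev(means)

    if __name__ == "__main__":
        # Hypothetical segment-level adequacy scores on a 1-5 scale.
        rng = random.Random(42)
        segment_scores = [rng.uniform(1, 5) for _ in range(500)]

        # As the number of segments grows, the mean score stabilises; the smallest
        # size whose deviation falls below a chosen threshold estimates how much
        # data is needed for a reliable score.
        for n in (10, 50, 100, 250, 500):
            print(n, round(bootstrap_std(segment_scores, n), 3))
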
Full text
Proceedings chapter (public document, free access)
Citation (ISO format)
ESTRELLA, Paula Susana, HAMON, Olivier, POPESCU-BELIS, Andréi. How Much Data is Needed for Reliable MT Evaluation? Using Bootstrapping to Study Human and Automatic Metrics. In: Proceedings of Machine Translation Summit XI. Copenhagen (Denmark). [s.l.] : [s.n.], 2007. p. 167-174. https://archive-ouverte.unige.ch/unige:3461

Deposited on: 2009-10-02