UNIGE document: Proceedings chapter
Title

Using Source-Language Transformations to Address Register Mismatches in SMT

Authors
Rayner, Emmanuel
Bouillon, Pierrette
Haddow, Barry
Published in Proceedings of the tenth biennial conference of the Association for Machine Translation in the Americas (AMTA-2012). San Diego (California, USA), Oct 28-Nov 1, 2012
Abstract Mismatches between training and test data are a ubiquitous problem for real SMT applications. In this paper, we examine a type of mismatch that commonly arises when translating from French and similar languages: available training data is mostly formal register, but test data may well be informal register. We consider methods for defining surface transformations that map common informal language constructions into their formal language counterparts, or vice versa; we then describe two ways to use these mappings, either to create artificial training data or to pre-process source text at run-time. An initial evaluation performed using crowd-sourced comparisons of alternate translations produced by a French-to-English SMT system suggests that both methods can improve performance, with run-time pre-processing being the more effective of the two.
Structures
Research group TIM/ISSCO
Citation
(ISO format)
RAYNER, Emmanuel, BOUILLON, Pierrette, HADDOW, Barry. Using Source-Language Transformations to Address Register Mismatches in SMT. In: Proceedings of the tenth biennial conference of the Association for Machine Translation in the Americas (AMTA-2012). San Diego (California, USA). [s.l.] : [s.n.], 2012. https://archive-ouverte.unige.ch/unige:30919


Deposited on : 2013-11-04
