

Using Source-Language Transformations to Address Register Mismatches in SMT |
Authors | RAYNER, Emmanuel; BOUILLON, Pierrette; HADDOW, Barry | |
Published in | Proceedings of the tenth biennial conference of the Association for Machine Translation in the Americas (AMTA-2012). San Diego (California, USA), Oct 28-Nov 1, 2012 | |
Abstract | Mismatches between training and test data are a ubiquitous problem for real SMT applications. In this paper, we examine a type of mismatch that commonly arises when translating from French and similar languages: available training data is mostly formal register, but test data may well be informal register. We consider methods for defining surface transformations that map common informal language constructions into their formal language counterparts, or vice versa; we then describe two ways to use these mappings, either to create artificial training data or to pre-process source text at run-time. An initial evaluation performed using crowd-sourced comparisons of alternate translations produced by a French-to-English SMT system suggests that both methods can improve performance, with run-time pre-processing being the more effective of the two. | |
Research group | TIM/ISSCO | |
Citation (ISO format) | RAYNER, Emmanuel, BOUILLON, Pierrette, HADDOW, Barry. Using Source-Language Transformations to Address Register Mismatches in SMT. In: Proceedings of the tenth biennial conference of the Association for Machine Translation in the Americas (AMTA-2012). San Diego (California, USA). [s.l.] : [s.n.], 2012. https://archive-ouverte.unige.ch/unige:30919 |