Proceedings chapter
Open access

Using Source-Language Transformations to Address Register Mismatches in SMT

Presented at San Diego (California, USA), Oct 28-Nov 1
Publication date: 2012

Mismatches between training and test data are a ubiquitous problem for real SMT applications. In this paper, we examine a type of mismatch that commonly arises when translating from French and similar languages: available training data is mostly formal register, but test data may well be informal register. We consider methods for defining surface transformations that map common informal language constructions into their formal language counterparts, or vice versa; we then describe two ways to use these mappings, either to create artificial training data or to pre-process source text at run-time. An initial evaluation performed using crowd-sourced comparisons of alternate translations produced by a French-to-English SMT system suggests that both methods can improve performance, with run-time pre-processing being the more effective of the two.
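The two uses of register mappings described in the abstract can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the rules, the function names (formalize, informalize, augment) and the example sentences are hypothetical and are not taken from the paper, whose actual transformations are not given on this page. The rules cover only two well-known features of informal French: the elided subject pronoun ("t'as" for "tu as") and the dropped negative particle ("je sais pas" for "je ne sais pas").

```python
import re

# Hypothetical rules for illustration only; not the paper's rule set.
VOWELS = "aeiouéèê"


def _restore_ne(match):
    """Re-insert the negative particle, eliding it before a vowel."""
    subject, verb = match.group(1), match.group(2)
    particle = "n'" if verb[0].lower() in VOWELS else "ne "
    return f"{subject} {particle}{verb} pas"


def formalize(sentence):
    """Map informal source text to a more formal variant (run-time use)."""
    # "t'as" -> "tu as", "t'es" -> "tu es"
    sentence = re.sub(r"\bt'(as|es)\b", r"tu \1", sentence, flags=re.IGNORECASE)
    # "je sais pas" -> "je ne sais pas"; already-formal negation is untouched
    return re.sub(r"\b(je|tu|il|elle|on) (\w+) pas\b", _restore_ne,
                  sentence, flags=re.IGNORECASE)


def informalize(sentence):
    """Inverse direction: drop "ne"/"n'" to imitate informal register."""
    return re.sub(r"\b(?:ne |n')(\w+) pas\b", r"\1 pas", sentence,
                  flags=re.IGNORECASE)


def augment(corpus):
    """Add artificial informal-source pairs to a formal parallel corpus."""
    extra = [(informalize(src), tgt) for src, tgt in corpus
             if informalize(src) != src]
    return corpus + extra
```

In this sketch, formalize would be applied to each incoming source sentence before it reaches the translation system, while augment would be applied once to the training corpus, pairing each artificially informal source sentence with the unchanged target sentence.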

Citation (ISO format)
RAYNER, Emmanuel, BOUILLON, Pierrette, HADDOW, Barry. Using Source-Language Transformations to Address Register Mismatches in SMT. In: Proceedings of the tenth biennial conference of the Association for Machine Translation in the Americas (AMTA-2012). San Diego (California, USA). [s.l.] : [s.n.], 2012.
Main files (1)
Proceedings chapter (Accepted version)
  • PID : unige:30919

Technical information

Creation: 11/04/2013 6:56:00 PM
First validation: 11/04/2013 6:56:00 PM
Update time: 03/14/2023 8:35:52 PM
Status update: 03/14/2023 8:35:52 PM
Last indexation: 01/16/2024 8:06:05 AM
All rights reserved by Archive ouverte UNIGE and the University of Geneva