Open access

The Effect of Simple English Wikipedia on Machine Translation Output: An Evaluation with Cloze Procedure

Contributors: Fitzroy, Kirsty
Master program title: Maîtrise universitaire en traduction et technologies, mention localisation et traduction automatique
Defense date: 2022

Simplified text is known to improve the quality of machine translation output. Wikipedia and its simplified counterpart, Simple English Wikipedia, have been used extensively in datasets designed to capitalize on this. However, Simple English Wikipedia has previously been found to be unreliable as a simplified corpus. In this work, passages from the original Wikipedia and Simple English Wikipedia were machine translated by two neural systems, Google Translate and DeepL, and the output was evaluated by human judges: first for quality, and second for comprehensibility using cloze procedure, an uncommon method. No differences were found between the output for the two source corpora. A corpus analysis of the source texts yielded few signs of simplification. These findings support previous studies showing that Simple English Wikipedia cannot be relied upon as a simplified corpus. Results also showed DeepL to be the better-performing system and cloze procedure to be a suitable evaluation protocol for comprehensibility.

  • Neural machine translation
  • Simple English Wikipedia
  • Cloze procedure
  • Text simplification
  • Google Translate
  • DeepL
  • Gap-filling
  • Traduction automatique neuronale
  • Simplification de textes
Citation (ISO format)
FITZROY, Kirsty. The Effect of Simple English Wikipedia on Machine Translation Output: An Evaluation with Cloze Procedure. 2022.
Main files (1)
Master thesis
  • PID : unige:164439

Technical information

Creation: 10/26/2022 9:45:00 AM
First validation: 10/26/2022 9:45:00 AM
Update time: 03/16/2023 8:03:44 AM
Status update: 03/16/2023 8:03:43 AM
Last indexation: 10/19/2023 7:05:19 PM
All rights reserved by Archive ouverte UNIGE and the University of Geneva