Master
English

Automatic post-editing of subtitles - Rule-based post-editing of subtitles from Swiss German to Standard German

Master program title: Maîtrise universitaire en traitement informatique multilingue
Defense date: 2022
Abstract

This master's thesis is part of the PASSAGE project, which aims to develop a system to automatically generate Standard German subtitles for spoken Swiss German in collaboration with « Schweizer Radio und Fernsehen » (SRF). For this task, a normalized transcription from speech recognition needs to be post-edited for the training of an automatic post-editing system. The goal of the thesis is to develop and evaluate a rule-based automatic post-editing system, which is intended to assist post-editors in generating corrected segments. The system exploits regular expressions and rules using the natural language processing library spaCy. The developed rules cover errors due to spoken language and specific Swiss German phenomena. The system detected errors in approximately 5% of the normalized transcription input and performed modifications in approximately 4%. A human evaluation confirmed that the modifications were necessary in 97% of the cases and correct in 88%.
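
To illustrate the general idea of regex-based post-editing mentioned above, the following is a minimal sketch and not the thesis's actual rule set. The two rules (removing hesitation markers and collapsing immediately repeated words), the `RULES` list, and the `post_edit` function are hypothetical examples of spoken-language clean-up. The spaCy-based rules are left out so that the sketch runs with the Python standard library alone.

```python
import re

# Hypothetical rules for illustration only; they are not taken from the thesis.
# Each rule pairs a compiled pattern with its replacement string.
RULES = [
    # Drop hesitation markers such as "äh", "ähm" or "hm" (with an optional comma).
    (re.compile(r"\b(?:äh|ähm|hm)\b,?\s*", re.IGNORECASE), ""),
    # Collapse an immediately repeated word ("das das" -> "das").
    # Simplification: legitimate repetitions would also be collapsed.
    (re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE), r"\1"),
]


def post_edit(segment):
    """Apply every rule in order; return the edited segment and a 'modified' flag."""
    edited = segment
    for pattern, replacement in RULES:
        edited = pattern.sub(replacement, edited)
    # Tidy up whitespace left behind by removals.
    edited = re.sub(r"\s{2,}", " ", edited).strip()
    return edited, edited != segment


print(post_edit("Wir haben äh das das Problem gelöst."))
```

Returning a flag alongside the edited text makes it easy to count how many segments a rule set actually modifies, which is the kind of figure the abstract reports.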

Keywords
  • Rule-based post-editing
  • Automatic post-editing
  • APE
  • Swiss German
  • Regular expressions
  • Regex
  • spaCy
  • Natural language processing
  • NLP
Citation (ISO format)
HABERKORN, Veronika Christine. Automatic post-editing of subtitles - Rule-based post-editing of subtitles from Swiss German to Standard German. Master, 2022.
Main files (1)
Master thesis
Access level: Public
Identifiers
  • PID : unige:160778
345 views
419 downloads

Technical information

Creation: 26/04/2022 14:37:00
First validation: 26/04/2022 14:37:00
Update time: 16/03/2023 06:31:58
Status update: 16/03/2023 06:31:57
Last indexation: 17/12/2024 15:37:46
All rights reserved by Archive ouverte UNIGE and the University of Geneva