Proceedings chapter
OA Policy
English

Automatic normalisation of the Swiss German ArchiMob corpus using character-level machine translation

Presented at Bochum (Germany), September 19–21, 2016
Published in Dipper, S., Neubarth, F. & Zinsmeister, H. (Eds.), Proceedings of the 13th Conference on Natural Language Processing (KONVENS), pp. 248–255
Publisher Bochum: Ruhr-Universität Bochum
Collection
  • Bochumer Linguistische Arbeitsberichte; 16
Publication date 2016
Abstract

The Swiss German dialect corpus ArchiMob poses great challenges for NLP and corpus linguistic research due to the massive amount of variation found in the transcriptions: dialectal variation is combined with intra-speaker variation and with transcriber inconsistencies. This variation is reduced through the addition of a normalisation layer. In this paper, we propose to use character-level machine translation to learn the normalisation process. We show that a character-level machine translation system trained on pairs of segments (not pairs of words) and including multiple language models achieves a word normalisation accuracy of up to 90.46%, an error reduction of 45% over a strong baseline and of 34% over a heterogeneous system proposed by Samardzic et al. (2015).
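To make the abstract's setup more concrete, here is a minimal Python sketch (not the authors' implementation) of two steps it implies: turning a transcribed segment into character tokens that a character-level MT system can consume, and scoring the output by word normalisation accuracy. Assumptions: words are separated by whitespace, the underscore `_` is a hypothetical word-boundary symbol (the paper's actual convention may differ), and "error reduction" is read as the relative reduction in word error rate. All function names are hypothetical.

```python
# Hypothetical helpers; not taken from the paper.
# Assumption: "_" marks word boundaries and never occurs inside a word.
BOUNDARY = "_"


def to_char_tokens(segment: str) -> str:
    """Split a whitespace-tokenised segment into space-separated characters,
    replacing each word boundary with BOUNDARY so it survives as a token."""
    return " ".join(BOUNDARY.join(segment.split()))


def from_char_tokens(tokens: str) -> str:
    """Invert to_char_tokens: drop separating spaces, restore word boundaries."""
    return "".join(tokens.split()).replace(BOUNDARY, " ")


def word_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Share of word positions whose predicted normalisation equals the gold one.
    Assumes the two lists are already aligned word by word."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and aligned")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def relative_error_reduction(acc_baseline: float, acc_system: float) -> float:
    """Fraction of the baseline's word errors removed by the system
    (assumed reading of "error reduction")."""
    err_baseline = 1.0 - acc_baseline
    err_system = 1.0 - acc_system
    return (err_baseline - err_system) / err_baseline


if __name__ == "__main__":
    print(to_char_tokens("das isch guet"))  # d a s _ i s c h _ g u e t
    print(word_accuracy(["a", "b", "c", "d"], ["a", "b", "x", "d"]))
```

Because the boundary symbol is an ordinary token, a system trained on whole segments can also model character sequences across word boundaries, which a word-by-word setup cannot. The toy words in the usage lines are placeholders, not corpus data.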

Citation (ISO format)
SCHERRER, Yves, LJUBEŠIĆ, Nikola. Automatic normalisation of the Swiss German ArchiMob corpus using character-level machine translation. In: Proceedings of the 13th Conference on Natural Language Processing (KONVENS). Dipper, S., Neubarth, F. & Zinsmeister, H. (Ed.). Bochum (Germany). Bochum : Ruhr-Universität Bochum, 2016. p. 248–255. (Bochumer Linguistische Arbeitsberichte)
Main file
  • Proceedings chapter (accepted version), public access
Identifiers
  • PID : unige:90846

Technical information

Creation 03/01/2017 12:03:00
First validation 03/01/2017 12:03:00
Update time 12/11/2024 13:13:21
Status update 12/11/2024 13:13:21
Last indexation 12/11/2024 13:13:23
All rights reserved by Archive ouverte UNIGE and the University of Geneva