Proceedings chapter
Open access

Normalising orthographic and dialectal variants for the automatic processing of Swiss German

Presented at Poznan, 27-29 Nov 2015
Publication date: 2015

Swiss dialects of German are, unlike most dialects of well-standardised languages, widely used in everyday communication. Despite this, they lack tools and resources for natural language processing. The main reason is that the dialects are mostly spoken, and the written resources that do exist are small and highly inconsistent. This paper addresses the great variability in writing, which poses a problem for automatic processing. We propose an automatic approach to normalising the variants to a single representation intended for the internal use of processing tools (not shown to human users). We manually create a sample of transcribed and normalised texts, which we use to train and test three methods based on machine translation: word-by-word mappings, character-based machine translation, and language modelling. We show that an optimal combination of the three approaches gives better results than any of them separately.
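As a rough illustration of the simplest of the three methods, word-by-word mapping, the sketch below learns the most frequent normalisation for each written variant from aligned training pairs and passes unseen words to a fallback. It is not the authors' implementation: the function names `train_word_map` and `normalise`, the identity fallback, and the toy word pairs are all assumptions made for this example, and the pairs are invented rather than taken from the paper's corpus. In a combined system, the fallback is where a character-based model could plausibly be plugged in.

```python
from collections import Counter, defaultdict


def train_word_map(pairs):
    """Map each source variant to its most frequent normalisation.

    `pairs` is an iterable of (written_variant, normalised_form) tuples.
    """
    counts = defaultdict(Counter)
    for source, normalised in pairs:
        counts[source][normalised] += 1
    # Keep only the single most frequent normalisation per variant.
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}


def normalise(tokens, word_map, fallback=lambda word: word):
    """Normalise known tokens by lookup; send unseen tokens to `fallback`.

    The default fallback leaves the token unchanged (an assumption of
    this sketch, not something stated in the abstract).
    """
    return [word_map[t] if t in word_map else fallback(t) for t in tokens]


# Toy, hand-written pairs for illustration only (not from the paper's data).
toy_pairs = [
    ("nöd", "nicht"),
    ("nid", "nicht"),
    ("nit", "nicht"),
    ("gsi", "gewesen"),
    ("gsii", "gewesen"),
    ("gsi", "gewesen"),
]

word_map = train_word_map(toy_pairs)
result = normalise(["nöd", "gsi", "xyz"], word_map)
print(result)
```

Several spellings collapse onto one normalised form, which is the behaviour the abstract describes; the unseen token `xyz` is returned as-is because the lookup table alone cannot handle it.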

Citation (ISO format)
SAMARDZIC, Tanja, SCHERRER, Yves, GLASER, Elvira. Normalising orthographic and dialectal variants for the automatic processing of Swiss German. In: Proceedings of the 7th Language and Technology Conference. Poznan. [s.l.] : [s.n.], 2015.
Main files (1)
Proceedings chapter (Accepted version)
  • PID : unige:82397

Technical information

Creation: 04/01/2016 12:32:00 PM
First validation: 04/01/2016 12:32:00 PM
Update time: 03/15/2023 12:15:23 AM
Status update: 03/15/2023 12:15:22 AM
Last indexation: 01/16/2024 8:33:30 PM
All rights reserved by Archive ouverte UNIGE and the University of Geneva