Proceedings chapter
English

Normalising orthographic and dialectal variants for the automatic processing of Swiss German

Presented at: Poznan, 27-29 Nov 2015
Publication date: 2015
Abstract

Swiss dialects of German are, unlike most dialects of well-standardised languages, widely used in everyday communication. Despite this, they lack tools and resources for natural language processing, mainly because the dialects are primarily spoken and the written resources that do exist are small and highly inconsistent. This paper addresses the great variability in how the dialects are written, which poses a problem for automatic processing. We propose an automatic approach to normalising the variants to a single representation intended for the internal use of processing tools (not shown to human users). We manually create a sample of transcribed and normalised texts, which we use to train and test three methods based on machine translation: word-by-word mappings, character-based machine translation, and language modelling. We show that an optimal combination of the three approaches gives better results than any of them used separately.
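The word-by-word mapping method named in the abstract can be illustrated with a minimal sketch, assuming a training set of aligned (transcribed word, normalised word) pairs such as the manually normalised sample described above. This is not the authors' implementation: the function names `train_word_mapping` and `normalise`, the majority-vote rule, the identity fallback for unseen words, and the toy word pairs are all hypothetical choices made for illustration only.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def train_word_mapping(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each transcribed word to its most frequent normalised form (hypothetical helper)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    # most_common(1) returns the majority target; ties go to the form seen first.
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}


def normalise(tokens: List[str], mapping: Dict[str, str],
              fallback: Callable[[str], str] = lambda w: w) -> List[str]:
    """Look up seen words in the table; pass unseen words to `fallback` (identity by default)."""
    return [mapping[t] if t in mapping else fallback(t) for t in tokens]


# Invented toy pairs, not taken from the paper's data.
toy_pairs = [("ish", "isch"), ("isch", "isch"), ("ischt", "isch"),
             ("de", "de"), ("de", "de"), ("de", "dr")]
mapping = train_word_mapping(toy_pairs)
print(normalise(["ish", "de", "hus"], mapping))  # ['isch', 'de', 'hus']
```

A pure lookup table cannot normalise words it has never seen, which is why the sketch leaves a `fallback` slot: in a combined system, that is one place where a second method, such as character-based translation, could plausibly take over.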

Citation (ISO format)
SAMARDZIC, Tanja, SCHERRER, Yves, GLASER, Elvira. Normalising orthographic and dialectal variants for the automatic processing of Swiss German. In: Proceedings of the 7th Language and Technology Conference. Poznan. [s.l.] : [s.n.], 2015.
Main files (1)
Proceedings chapter (Accepted version)
Access level: Public
Identifiers
  • PID : unige:82397

All rights reserved by Archive ouverte UNIGE and the University of Geneva