Proceedings chapter
Open access
English

Word-Based Dialect Identification with Georeferenced Rules

Published in: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, edited by Hang Li and Lluís Màrquez, pp. 1151–1161
Presented at: Boston (USA), 9–11 October 2010
Publisher: Stroudsburg, PA (USA): The Association for Computational Linguistics
Publication date: 2010
Abstract

We present a novel approach for (written) dialect identification based on the discriminative potential of entire words. We generate Swiss German dialect words from a Standard German lexicon with the help of hand-crafted phonetic/graphemic rules that are associated with occurrence maps extracted from a linguistic atlas created through extensive empirical fieldwork. In comparison with a character n-gram approach to dialect identification, our model is more robust to individual spelling differences, which are frequent in non-standardized dialect writing. Moreover, it covers the whole Swiss German dialect continuum, something that models trained on dialect text struggle to achieve because training data are sparse.
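The abstract describes generating dialect word forms from Standard German words with rules tied to geographic occurrence maps. The following is only a toy sketch of that general idea, not the authors' implementation. It makes several assumptions that are not from the paper: occurrence maps are simplified to crisp sets of four hypothetical regions, rules are plain substring replacements, and a text is scored by counting, for each region, how many of its tokens are generated forms predicted for that region. The words and region assignments are illustrative only and are not taken from the atlas.

```python
from collections import Counter

# Hypothetical region inventory (the paper uses atlas maps, not four labels).
REGIONS = frozenset({"Basel", "Bern", "Zurich", "Chur"})

# Hypothetical rules: (Standard German substring, dialect substring,
# regions where the dialect variant applies). Elsewhere the standard
# substring is kept. Region assignments here are illustrative only.
RULES = [
    ("au", "uu", REGIONS),                          # e.g. haus -> huus
    ("k", "ch", frozenset({"Bern", "Zurich"})),     # e.g. kind -> chind
]


def generate(word):
    """Map each generated dialect form of `word` to the regions predicting it."""
    forms = {word.lower(): REGIONS}
    for std, dia, rule_regions in RULES:
        new_forms = {}
        for form, regs in forms.items():
            if std not in form:
                # Rule does not apply: keep the form and its regions.
                new_forms[form] = new_forms.get(form, frozenset()) | regs
                continue
            # Split the current region set into "rule applies" / "rule does not apply".
            for candidate, cand_regs in (
                (form.replace(std, dia), regs & rule_regions),
                (form, regs - rule_regions),
            ):
                if cand_regs:  # drop forms predicted for no region
                    new_forms[candidate] = new_forms.get(candidate, frozenset()) | cand_regs
        forms = new_forms
    return forms


def build_lexicon(standard_words):
    """Collect all generated forms into one form -> regions lexicon."""
    lexicon = {}
    for word in standard_words:
        for form, regs in generate(word).items():
            lexicon.setdefault(form, set()).update(regs)
    return lexicon


def identify(text, lexicon):
    """Count, per region, the tokens whose form is predicted for that region."""
    scores = Counter()
    for token in text.lower().split():
        for region in lexicon.get(token, ()):
            scores[region] += 1
    return scores


lexicon = build_lexicon(["Haus", "Kind"])
print(identify("s Chind im Huus", lexicon))
```

Because a whole word form is matched against region sets rather than character statistics learned from a corpus, every region receives predictions as long as the rules cover it, which mirrors the coverage argument made in the abstract.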

Citation (ISO format)
SCHERRER, Yves, RAMBOW, Owen. Word-Based Dialect Identification with Georeferenced Rules. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Boston (USA). Stroudsburg, PA (USA) : The Association for Computational Linguistics, 2010. p. 1151–1161.
Identifiers
  • PID: unige:22821
  • ISBN: 978-1-932432-86-2
All rights reserved by Archive ouverte UNIGE and the University of Geneva