UNIGE document Proceedings chapter
Title

Lexicon Induction for Spoken Rusyn – Challenges and Results

Authors
Rabus, Achim
Scherrer, Yves
Published in Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing. Valencia (Spain), 4 April 2017.
Abstract This paper reports on challenges and results in developing NLP resources for spoken Rusyn. Being a Slavic minority language, Rusyn does not have any resources to make use of. We propose to build a morphosyntactic dictionary for Rusyn, combining existing resources from the etymologically close Slavic languages Russian, Ukrainian, Slovak, and Polish. We adapt these resources to Rusyn by using vowel-sensitive Levenshtein distance, hand-written language-specific transformation rules, and combinations of the two. Compared to an exact match baseline, we increase the coverage of the resulting morphological dictionary by up to 77.4% relative (42.9% absolute), which results in a tagging recall increased by 11.6% relative (9.1% absolute). Our research confirms and expands the results of previous studies showing the efficiency of using NLP resources from neighboring languages for low-resourced languages.
Structures
Research group Laboratoire d'Analyse et de Traitement du Langage (LATL)
Citation
(ISO format)
RABUS, Achim, SCHERRER, Yves. Lexicon Induction for Spoken Rusyn – Challenges and Results. In: Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing. Valencia (Spain). [s.l.] : [s.n.], 2017. https://archive-ouverte.unige.ch/unige:94611

Deposited on: 2017-06-02
