Proceedings chapter
Open access

Lexicon Induction for Spoken Rusyn – Challenges and Results

Presented at Valencia (Spain), 4 April 2017
Publication date: 2017

This paper reports on challenges and results in developing NLP resources for spoken Rusyn. As a Slavic minority language, Rusyn has no existing NLP resources to draw on. We propose to build a morphosyntactic dictionary for Rusyn by combining existing resources from the etymologically close Slavic languages Russian, Ukrainian, Slovak, and Polish. We adapt these resources to Rusyn using vowel-sensitive Levenshtein distance, hand-written language-specific transformation rules, and combinations of the two. Compared to an exact-match baseline, we increase the coverage of the resulting morphological dictionary by up to 77.4% relative (42.9% absolute), which in turn increases tagging recall by 11.6% relative (9.1% absolute). Our research confirms and extends the results of previous studies showing the effectiveness of using NLP resources from neighboring languages for low-resourced languages.
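The abstract does not spell out how the vowel-sensitive Levenshtein distance is defined. The following is a minimal sketch of one plausible reading, in which substituting one vowel for another costs less than any other edit. The vowel inventory, the cost of 0.5, the `max_distance` threshold, and the function names are all assumptions made for illustration and are not taken from the paper.

```python
# Illustrative sketch only: the vowel set and all cost values are assumptions,
# not the scheme used in the paper.
VOWELS = set("аеиіоуыэюяєїё" + "aeiouyáéíóúýäô")


def sub_cost(a, b, vowel_cost=0.5):
    """Substitution cost: free for identical characters, cheaper for vowel pairs."""
    if a == b:
        return 0.0
    if a in VOWELS and b in VOWELS:
        return vowel_cost  # assumed discount for vowel-to-vowel substitution
    return 1.0


def vowel_sensitive_levenshtein(s, t, vowel_cost=0.5):
    """Weighted edit distance; insertions and deletions always cost 1."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        curr = [float(i)]
        for j, b in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1.0,                          # delete a
                curr[j - 1] + 1.0,                      # insert b
                prev[j - 1] + sub_cost(a, b, vowel_cost),  # substitute a -> b
            ))
        prev = curr
    return prev[-1]


def closest_entries(word, lexicon, max_distance=1.0):
    """Return the lexicon entries nearest to `word`, if within max_distance."""
    scored = [(vowel_sensitive_levenshtein(word, entry), entry) for entry in lexicon]
    if not scored:
        return []
    best = min(d for d, _ in scored)
    if best > max_distance:
        return []
    return [entry for d, entry in scored if d == best]


if __name__ == "__main__":
    # Hypothetical word pair differing only in one vowel.
    print(vowel_sensitive_levenshtein("кінь", "конь"))
    print(closest_entries("кінь", ["конь", "кость"]))
```

Under this reading, a word form that differs from a donor-language entry only in its vowels stays close to that entry, while consonant changes are penalized at the full cost. Matched entries could then pass their morphosyntactic tags on to the unseen word form.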

Citation (ISO format)
RABUS, Achim, SCHERRER, Yves. Lexicon Induction for Spoken Rusyn – Challenges and Results. In: Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing. Valencia (Spain). [s.l.] : [s.n.], 2017.
Main files (1)
Proceedings chapter (Published version)
  • PID : unige:94611

Technical information

Creation: 05/25/2017 3:13:00 PM
First validation: 05/25/2017 3:13:00 PM
Update time: 03/15/2023 1:44:28 AM
Status update: 03/15/2023 1:44:28 AM
Last indexation: 01/17/2024 12:03:52 AM
All rights reserved by Archive ouverte UNIGE and the University of Geneva