Proceedings chapter
English

Multi-source morphosyntactic tagging for Spoken Rusyn

Presented at: Valencia (Spain), 3 April 2017
Publication date: 2017
Abstract

This paper deals with the development of morphosyntactic taggers for spoken varieties of the Slavic minority language Rusyn. As neither annotated corpora nor parallel corpora are electronically available for Rusyn, we propose to combine existing resources from the etymologically close Slavic languages Russian, Ukrainian, Slovak, and Polish and adapt them to Rusyn. Using MarMoT as the tagging toolkit, we show that a tagger trained on a balanced set of the four source languages outperforms single-language taggers by about 9%, and that additional automatically induced morphosyntactic lexicons lead to further improvements. The best observed accuracies for Rusyn are 82.4% for part-of-speech tagging and 75.5% for full morphological tagging.
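
The idea of a balanced multi-source training set can be illustrated with a minimal Python sketch. The abstract does not say how the balancing was done, so the rule used here (cap every language at the size of the smallest corpus) is an assumption, and the function name `balanced_training_set` and the toy sentences are hypothetical.

```python
import random
from collections import Counter


def balanced_training_set(corpora, seed=0):
    """Draw the same number of tagged sentences from each source language.

    `corpora` maps a language code to a list of sentences, where each
    sentence is a list of (token, tag) pairs.
    """
    rng = random.Random(seed)
    # Assumed balancing rule: cap every language at the smallest corpus size.
    n = min(len(sentences) for sentences in corpora.values())
    mixed = []
    for lang in sorted(corpora):
        # Sample without replacement so no sentence is duplicated.
        mixed.extend((lang, sent) for sent in rng.sample(corpora[lang], n))
    # Shuffle so the languages are interleaved in the training data.
    rng.shuffle(mixed)
    return mixed


# Toy data for illustration only; not taken from the paper's corpora.
toy = {
    "ru": [[("дом", "NOUN")], [("он", "PRON")], [("и", "CCONJ")]],
    "uk": [[("хата", "NOUN")], [("він", "PRON")]],
    "sk": [[("dom", "NOUN")], [("on", "PRON")], [("a", "CCONJ")], [("je", "AUX")]],
    "pl": [[("dom", "NOUN")], [("on", "PRON")]],
}

mixed = balanced_training_set(toy)
counts = Counter(lang for lang, _ in mixed)
```

In this sketch each language contributes the same number of sentences regardless of how large its original corpus is, so no single source language dominates the training data.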

Citation (ISO format)
SCHERRER, Yves, RABUS, Achim. Multi-source morphosyntactic tagging for Spoken Rusyn. In: Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects. Valencia (Spain). [s.l.] : [s.n.], 2017.
Main files (1)
Proceedings chapter (Published version)
Access level: Public
Identifiers
  • PID : unige:94610

Technical information

Creation: 25/05/2017 17:10:00
First validation: 25/05/2017 17:10:00
Update time: 15/03/2023 02:44:28
Status update: 15/03/2023 02:44:28
Last indexation: 31/10/2024 08:04:28
All rights reserved by Archive ouverte UNIGE and the University of Geneva