UNIGE document: Proceedings chapter
Title

Multi-source morphosyntactic tagging for Spoken Rusyn

Authors
Scherrer, Yves
Rabus, Achim
Published in Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects. Valencia (Spain), 3 April 2017.
Abstract This paper deals with the development of morphosyntactic taggers for spoken varieties of the Slavic minority language Rusyn. As neither annotated corpora nor parallel corpora are electronically available for Rusyn, we propose to combine existing resources from the etymologically close Slavic languages Russian, Ukrainian, Slovak, and Polish and adapt them to Rusyn. Using MarMoT as the tagging toolkit, we show that a tagger trained on a balanced set of the four source languages outperforms single-language taggers by about 9%, and that additional automatically induced morphosyntactic lexicons lead to further improvements. The best observed accuracies for Rusyn are 82.4% for part-of-speech tagging and 75.5% for full morphological tagging.
Structures
Research group Laboratoire d'Analyse et de Traitement du Langage (LATL)
Citation
(ISO format)
SCHERRER, Yves, RABUS, Achim. Multi-source morphosyntactic tagging for Spoken Rusyn. In: Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects. Valencia (Spain). [s.l.] : [s.n.], 2017. https://archive-ouverte.unige.ch/unige:94610

Deposited on: 2017-06-02
