Proceedings chapter
Open access
English

Unsupervised adaptation of supervised part-of-speech taggers for closely related languages

Contributors: Scherrer, Yves
Presented at Dublin (Ireland), 23 Aug 2014
Publisher: Association for Computational Linguistics and Dublin City University
Publication date: 2014
Abstract

When developing NLP tools for low-resource languages, one is often confronted with the lack of annotated data. We propose to circumvent this bottleneck by training a supervised HMM tagger on a closely related language for which annotated data are available, and translating the words in the tagger parameter files into the low-resource language. The translation dictionaries are created with unsupervised lexicon induction techniques that rely only on raw textual data. We obtain a tagging accuracy of up to 89.08% using a Spanish tagger adapted to Catalan, which is 30.66% above the performance of an unadapted Spanish tagger, and 8.88% below the performance of a supervised tagger trained on annotated Catalan data. Furthermore, we evaluate our model on several Romance, Germanic and Slavic languages and obtain tagging accuracies of up to 92%.
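The adaptation step described in the abstract can be illustrated with a minimal sketch. It assumes the tagger's lexical parameters are available as a nested mapping from tag to word to emission probability, and that the induced lexicon is a plain source-to-target word dictionary. Both layouts, the function name `adapt_emissions`, and the toy Spanish and Catalan entries are hypothetical and are not taken from the paper. Transition probabilities between tags do not mention words, so in this sketch they would be left unchanged.

```python
def adapt_emissions(emissions, dictionary):
    """Re-key emission probabilities P(word | tag) with target-language words.

    emissions:  {tag: {source_word: probability}}  (assumed layout)
    dictionary: {source_word: target_word}         (assumed induced lexicon)
    """
    adapted = {}
    for tag, word_probs in emissions.items():
        new_probs = {}
        for src_word, prob in word_probs.items():
            # Words missing from the dictionary are kept unchanged, which
            # covers words spelled identically in both languages.
            tgt_word = dictionary.get(src_word, src_word)
            # If several source words map to one target word, sum their mass
            # so each tag's distribution still totals the same amount.
            new_probs[tgt_word] = new_probs.get(tgt_word, 0.0) + prob
        adapted[tag] = new_probs
    return adapted


# Toy example (hypothetical values): a Spanish lexicon adapted to Catalan.
spanish_emissions = {
    "NOUN": {"casa": 0.5, "ciudad": 0.5},
    "DET": {"la": 1.0},
}
es_to_ca = {"ciudad": "ciutat"}

adapted = adapt_emissions(spanish_emissions, es_to_ca)
```

Here only `ciudad` is rewritten (to `ciutat`), while `casa` and `la` pass through unchanged because they have no dictionary entry.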

Citation (ISO format)
SCHERRER, Yves. Unsupervised adaptation of supervised part-of-speech taggers for closely related languages. In: Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects (VarDial). Dublin (Ireland). [s.l.] : Association for Computational Linguistics and Dublin City University, 2014. p. 30–38.
Main files (1)
Proceedings chapter (Published version)
Access level: Public
Identifiers
  • PID : unige:39954
813 views
613 downloads

Technical information

Creation: 09/02/2014 3:45:00 PM
First validation: 09/02/2014 3:45:00 PM
Update time: 03/14/2023 9:34:38 PM
Status update: 03/14/2023 9:34:38 PM
Last indexation: 01/16/2024 11:45:57 AM
All rights reserved by Archive ouverte UNIGE and the University of Geneva