Proceedings chapter
English

Unsupervised adaptation of supervised part-of-speech taggers for closely related languages

Contributors: Scherrer, Yves
Presented at: Dublin (Ireland), 23 Aug 2014
Publisher: Association for Computational Linguistics and Dublin City University
Publication date: 2014
Abstract

When developing NLP tools for low-resource languages, one is often confronted with the lack of annotated data. We propose to circumvent this bottleneck by training a supervised HMM tagger on a closely related language for which annotated data are available, and translating the words in the tagger parameter files into the low-resource language. The translation dictionaries are created with unsupervised lexicon induction techniques that rely only on raw textual data. We obtain a tagging accuracy of up to 89.08% using a Spanish tagger adapted to Catalan, which is 30.66% above the performance of an unadapted Spanish tagger, and 8.88% below the performance of a supervised tagger trained on annotated Catalan data. Furthermore, we evaluate our model on several Romance, Germanic and Slavic languages and obtain tagging accuracies of up to 92%.
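The adaptation idea in the abstract is to keep the source-language tagger and translate only the words in its parameter files. The tag-transition parameters contain no words, so they would carry over unchanged. The minimal Python sketch below illustrates this under stated assumptions. The lexical parameters are assumed to be a mapping from word to tag counts. A simple normalized-edit-distance cognate matcher stands in for the paper's unsupervised lexicon induction; it is a swapped-in technique, not the author's method. The function names, the 0.35 threshold, and the toy Spanish/Catalan entries are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def induce_lexicon(src_vocab, tgt_vocab, max_norm_dist=0.35):
    """Hypothetical stand-in for unsupervised lexicon induction:
    map each target word to its closest source word by normalized
    edit distance, keeping the pair only if it is close enough."""
    lexicon = {}
    for t in tgt_vocab:
        best, best_d = None, None
        for s in src_vocab:
            d = levenshtein(s, t) / max(len(s), len(t), 1)
            if best_d is None or d < best_d:
                best, best_d = s, d
        if best_d is not None and best_d <= max_norm_dist:
            lexicon[t] = best
    return lexicon


def adapt_lexical_params(src_lex, lexicon):
    """Copy the tag counts of each source word onto its target translation.
    src_lex: source word -> {tag: count}; lexicon: target word -> source word.
    Tag-transition parameters are not touched here; they would be reused as-is."""
    return {t: dict(src_lex[s]) for t, s in lexicon.items() if s in src_lex}


# Toy Spanish lexical parameters (hypothetical counts).
spanish_lex = {
    "la": {"DET": 50, "PRON": 5},
    "casa": {"NOUN": 20},
    "ciudad": {"NOUN": 12},
    "noche": {"NOUN": 10},
}
catalan_vocab = ["la", "casa", "ciutat", "nit"]

lexicon = induce_lexicon(spanish_lex.keys(), catalan_vocab)
catalan_lex = adapt_lexical_params(spanish_lex, lexicon)
print(lexicon)
print(catalan_lex)
```

Here the Catalan "ciutat" is paired with the Spanish "ciudad", since only two substitutions separate them. The non-cognate "nit" ("noche") stays unmapped and would be handled as an unknown word by the tagger. This shows why string similarity alone is a rough approximation of the lexicon induction the abstract describes.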

Citation (ISO format)
SCHERRER, Yves. Unsupervised adaptation of supervised part-of-speech taggers for closely related languages. In: Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects (VarDial). Dublin (Ireland). [s.l.] : Association for Computational Linguistics and Dublin City University, 2014. p. 30–38.
Main files (1)
Proceedings chapter (Published version)
Access level: Public
Identifiers
  • PID : unige:39954
Additional URL for this publication: http://www.aclweb.org/anthology/W/W14/W14-5304.pdf

Technical information

Creation: 02/09/2014 17:45:00
First validation: 02/09/2014 17:45:00
Update time: 14/03/2023 22:34:38
Status update: 14/03/2023 22:34:38
Last indexation: 30/10/2024 20:56:03
All rights reserved by Archive ouverte UNIGE and the University of Geneva