UNIGE document Proceedings chapter
Title

Unsupervised adaptation of supervised part-of-speech taggers for closely related languages

Author Scherrer, Yves
Published in Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects (VarDial). Dublin (Ireland) - 23 Aug 2014 - Association for Computational Linguistics and Dublin City University. 2014, p. 30-38
Abstract When developing NLP tools for low-resource languages, one is often confronted with the lack of annotated data. We propose to circumvent this bottleneck by training a supervised HMM tagger on a closely related language for which annotated data are available, and translating the words in the tagger parameter files into the low-resource language. The translation dictionaries are created with unsupervised lexicon induction techniques that rely only on raw textual data. We obtain a tagging accuracy of up to 89.08% using a Spanish tagger adapted to Catalan, which is 30.66% above the performance of an unadapted Spanish tagger, and 8.88% below the performance of a supervised tagger trained on annotated Catalan data. Furthermore, we evaluate our model on several Romance, Germanic and Slavic languages and obtain tagging accuracies of up to 92%.
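The adaptation step described in the abstract, translating the words in the tagger parameter files, can be illustrated with a minimal sketch. It assumes the word-dependent parameters are stored as per-word tag counts and that the induced dictionary maps each source word to a single target word. The function name `adapt_lexicon`, the data layout, and the toy counts are hypothetical and not taken from the paper. Parameters that contain no words, such as tag transitions, would presumably be left untouched.

```python
from collections import defaultdict


def adapt_lexicon(emissions, dictionary):
    """Re-key per-word tag counts of a source-language tagger with target words.

    emissions:  {source_word: {tag: count}}  (hypothetical parameter layout)
    dictionary: {source_word: target_word}   (hypothetical induced lexicon)
    Words without a translation are kept unchanged; this is an assumption
    of the sketch, not a claim about the paper's method.
    """
    adapted = defaultdict(lambda: defaultdict(float))
    for source_word, tag_counts in emissions.items():
        target_word = dictionary.get(source_word, source_word)
        for tag, count in tag_counts.items():
            # Several source words may map to one target word, so sum counts.
            adapted[target_word][tag] += count
    return {word: dict(tags) for word, tags in adapted.items()}


# Toy Spanish-to-Catalan example with made-up counts.
spanish_emissions = {
    "con": {"PREP": 10.0},
    "casa": {"NOUN": 7.0, "VERB": 1.0},
    "y": {"CONJ": 12.0},
}
induced_dictionary = {"con": "amb", "y": "i"}

catalan_emissions = adapt_lexicon(spanish_emissions, induced_dictionary)
print(catalan_emissions)
```

Summing counts when two source words share a translation keeps the adapted lexicon a valid count table. Other merging strategies are possible.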
Structures
Research group Laboratoire d'Analyse et de Traitement du Langage (LATL)
Citation
(ISO format)
SCHERRER, Yves. Unsupervised adaptation of supervised part-of-speech taggers for closely related languages. In: Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects (VarDial). Dublin (Ireland). [s.l.] : Association for Computational Linguistics and Dublin City University, 2014. p. 30-38. https://archive-ouverte.unige.ch/unige:39954

Deposited on: 2014-09-03
