UNIGE document Proceedings chapter
unige:38812
Title

A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages

Authors
Scherrer, Yves
Sagot, Benoît
Published in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014). Reykjavik (Iceland) - 26-31 May 2014 - ELRA. 2014
Abstract In this paper, we describe our generic approach for transferring part-of-speech annotations from a resourced language towards an etymologically closely related non-resourced language, without using any bilingual (i.e., parallel) data. We first induce a translation lexicon from monolingual corpora, based on cognate detection followed by cross-lingual contextual similarity. Second, POS information is transferred from the resourced language along translation pairs to the non-resourced language and used for tagging the corpus. We evaluate our methods on three language families, consisting of five Romance languages, three Germanic languages and five Slavic languages. We obtain tagging accuracies of up to 91.6%.
Keywords Part-of-speech tagging; Lexicon induction; Closely related languages
Structures
Research group Laboratoire d'Analyse et de Traitement du Langage (LATL)
Citation
(ISO format)
SCHERRER, Yves, SAGOT, Benoît. A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014). Reykjavik (Iceland). [s.l.] : ELRA, 2014. https://archive-ouverte.unige.ch/unige:38812

Deposited on: 2014-07-22
