Proceedings chapter
Open access
English

A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages

Presented at Reykjavik (Iceland), 26-31 May 2014
Publisher: ELRA
Publication date: 2014
Abstract

In this paper, we describe our generic approach for transferring part-of-speech (POS) annotations from a resourced language to an etymologically closely related non-resourced language, without using any bilingual (i.e., parallel) data. First, we induce a translation lexicon from monolingual corpora, using cognate detection followed by cross-lingual contextual similarity. Second, we transfer POS information from the resourced language to the non-resourced language along the induced translation pairs and use it to tag the corpus of the non-resourced language. We evaluate our methods on three language families: five Romance, three Germanic and five Slavic languages. We obtain tagging accuracies of up to 91.6%.
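
The three-step pipeline summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made for illustration only: cognate detection uses plain normalised Levenshtein distance with a hypothetical threshold `max_dist=0.5`; target-language contexts are mapped into the source language only through identically spelled words (a simplifying seed lexicon); and each target word inherits the most frequent tag of its induced translation. All function names and the toy Spanish/Portuguese data are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def norm_dist(a: str, b: str) -> float:
    """Edit distance divided by the longer length, so 0 means identical."""
    return levenshtein(a, b) / max(len(a), len(b), 1)


def context_vectors(sentences, window=2):
    """Co-occurrence counts of each word with neighbours within +/- window tokens."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vecs[w][sent[j]] += 1
    return vecs


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu, nv = sqrt(sum(x * x for x in u.values())), sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def induce_lexicon(src_sents, tgt_sents, max_dist=0.5):
    """Map each target word to at most one source word (hypothetical stand-in).

    Step 1 (cognates): candidates are source words within max_dist.
    Step 2 (context): candidates are ranked by cosine similarity of context
    vectors; target context words are kept only if the identical string
    occurs in the source vocabulary (assumed seed lexicon).
    Ties go to the closer spelling.
    """
    src_vocab = sorted({w for s in src_sents for w in s})
    tgt_vocab = sorted({w for s in tgt_sents for w in s})
    src_set = set(src_vocab)
    src_vec, tgt_vec = context_vectors(src_sents), context_vectors(tgt_sents)
    lexicon = {}
    for t in tgt_vocab:
        cands = [s for s in src_vocab if norm_dist(s, t) <= max_dist]
        if not cands:
            continue  # no cognate found: the word stays untranslated
        mapped = Counter({w: c for w, c in tgt_vec[t].items() if w in src_set})
        lexicon[t] = max(cands, key=lambda s: (cosine(mapped, src_vec[s]), -norm_dist(s, t)))
    return lexicon


def transfer_tags(lexicon, src_tagged):
    """Step 3: each target word inherits the most frequent tag of its translation."""
    tag_counts = defaultdict(Counter)
    for sent in src_tagged:
        for word, pos in sent:
            tag_counts[word][pos] += 1
    return {t: tag_counts[s].most_common(1)[0][0]
            for t, s in lexicon.items() if tag_counts[s]}


def tag(sentence, tag_lexicon, default="UNK"):
    return [(w, tag_lexicon.get(w, default)) for w in sentence]


# Toy data: tagged Spanish (resourced) and raw Portuguese (non-resourced).
src_tagged = [[("la", "DET"), ("casa", "NOUN"), ("es", "VERB"), ("blanca", "ADJ")],
              [("la", "DET"), ("noche", "NOUN"), ("es", "VERB"), ("oscura", "ADJ")]]
tgt_sents = [["a", "casa", "é", "branca"], ["a", "noite", "é", "escura"]]

lexicon = induce_lexicon([[w for w, _ in s] for s in src_tagged], tgt_sents)
print(tag(tgt_sents[1], transfer_tags(lexicon, src_tagged)))
# [('a', 'DET'), ('noite', 'NOUN'), ('é', 'UNK'), ('escura', 'ADJ')]
```

In the toy run, the accented copula "é" receives no tag because its spelling is too far from Spanish "es" under unit-cost edit distance; this illustrates why purely orthographic matching is combined with contextual evidence, and why a real system would need a better-tuned similarity measure than the one assumed here.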

Keywords
  • Part-of-speech tagging
  • Lexicon induction
  • Closely related languages
Citation (ISO format)
SCHERRER, Yves, SAGOT, Benoît. A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014). Reykjavik (Iceland). [s.l.] : ELRA, 2014.
Main files (1)
Proceedings chapter (Accepted version)
Access level: Public
Identifiers
  • PID: unige:38812

Technical information

Creation: 06/30/2014 1:56:00 PM
First validation: 06/30/2014 1:56:00 PM
Update time: 03/14/2023 9:27:39 PM
Status update: 03/14/2023 9:27:38 PM
Last indexation: 01/16/2024 11:22:59 AM
All rights reserved by Archive ouverte UNIGE and the University of Geneva