Proceedings chapter
English

The Trilingual ALLEGRA Corpus: Presentation and Possible Use for Lexicon Induction

Presented at: Istanbul (Turkey), 23-24 May
Published in: Calzolari, N. ; Choukri, K. ; Declerck, T. ; Uğur Doğan, M. ; Maegaard, B. ; Mariani, J. ; Odijk, J. & Piperidis, S. (Ed.), Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC'12), p. 2890-2896
Publisher: European Language Resources Association (ELRA)
Publication date: 2012
Abstract

In this paper, we present a trilingual parallel corpus for German, Italian and Romansh, a Swiss minority language spoken in the canton of Grisons. The corpus, called ALLEGRA, contains press releases automatically gathered from the website of the cantonal administration of Grisons. Texts have been preprocessed and aligned with a current state-of-the-art sentence aligner. The corpus is one of the first of its kind and can be of great interest, particularly for the creation of natural language processing resources and tools for Romansh. We illustrate the use of such a trilingual resource for the automatic induction of bilingual lexicons, which is a real challenge for under-represented languages. We induce a bilingual German-Romansh lexicon by phrase alignment and evaluate the resulting entries with the help of a reference lexicon. We then show that using the third language of the corpus (Italian) as a pivot language can improve the precision of the induced lexicon, without loss in terms of quality of the extracted pairs.
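
The abstract does not spell out how the pivot language is applied. As a minimal sketch, one plausible reading is a filter that keeps a German-Romansh candidate pair only when both words share at least one Italian translation, with precision then measured against a reference lexicon. The function names `pivot_filter` and `precision`, the dictionary layout, and the toy entries below are illustrative assumptions, not the authors' implementation or data.

```python
def pivot_filter(candidates, de_it, rm_it):
    """Keep (German, Romansh) pairs whose words share an Italian translation.

    candidates: set of (german, romansh) tuples (hypothetical induced lexicon)
    de_it:      dict mapping a German word to a set of Italian translations
    rm_it:      dict mapping a Romansh word to a set of Italian translations
    """
    kept = set()
    for de, rm in candidates:
        # A pair survives only if at least one Italian word links both sides.
        if de_it.get(de, set()) & rm_it.get(rm, set()):
            kept.add((de, rm))
    return kept


def precision(pairs, reference):
    """Fraction of proposed pairs that also appear in the reference lexicon."""
    if not pairs:
        return 0.0
    return len(pairs & reference) / len(pairs)


# Toy data for illustration only (not taken from the ALLEGRA corpus).
candidates = {("Haus", "chasa"), ("Jahr", "onn"), ("Haus", "onn")}
de_it = {"Haus": {"casa"}, "Jahr": {"anno"}}
rm_it = {"chasa": {"casa"}, "onn": {"anno"}}
reference = {("Haus", "chasa"), ("Jahr", "onn")}

filtered = pivot_filter(candidates, de_it, rm_it)
print(precision(candidates, reference), precision(filtered, reference))
```

In this toy case the spurious pair ("Haus", "onn") has no shared Italian translation and is dropped, so precision rises while both correct pairs are retained. Whether that holds on real data depends on the coverage of the two pivot lexicons.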

Keywords
  • Corpus (creation-annotation-etc.)
  • Endangered languages
  • Lexicon
  • Lexical database
Citation (ISO format)
SCHERRER, Yves, CARTONI, Bruno. The Trilingual ALLEGRA Corpus: Presentation and Possible Use for Lexicon Induction. In: Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC′12). Calzolari, N. ; Choukri, K. ; Declerck, T. ; Uğur Doğan, M. ; Maegaard, B. ; Mariani, J. ; Odijk, J. & Piperidis, S. (Ed.). Istanbul (Turkey). [s.l.] : European Language Resources Association (ELRA), 2012. p. 2890–2896.
Main files (1)
Proceedings chapter
Access level: Public
Identifiers
  • PID: unige:22781
  • ISBN: 978-2-9517408-7-7

Technical information

Creation: 29/08/2012 16:14:00
First validation: 29/08/2012 16:14:00
Update time: 14/03/2023 18:40:17
Status update: 14/03/2023 18:40:17
Last indexation: 29/10/2024 21:32:53
All rights reserved by Archive ouverte UNIGE and the University of Geneva