Scientific article
Open access
English

Syntactic concordancing and multi-word expression detection

Published in: Int. J. Data Mining, Modelling and Management, vol. 5, no. 2, p. 158-181
Publication date: 2013
Abstract

Concordancers are tools that display the contexts of a given word in a corpus. Also called key word in context (KWIC), these tools are nowadays indispensable in the work of lexicographers, linguists, and translators. We present an enhanced type of concordancer that integrates syntactic information on sentence structure as well as statistical information on word cooccurrence in order to detect and display those words from the context that are most strongly related to the word under investigation. This tool considerably alleviates the users' task, by highlighting syntactically well-formed word combinations that are likely to form complex lexical units, i.e., multi-word expressions. One of the key distinctive features of the tool is its multilingualism, as syntax-based multi-word expression detection is available for multiple languages and parallel concordancing enables users to consult the version of a source context in another language, when multilingual parallel corpora are available. In this article, we describe the underlying methodology and resources used by the system, its architecture, and its recently developed online version. We also provide relevant performance evaluation results for the main system components, focusing on the comparison between syntax-based and syntax-free approaches.

Keywords
  • Collocation extraction
  • Collocations
  • Concordancers
  • Key word in context
  • KWIC
  • Lexical acquisition
  • Lexical resources
  • Linguistic analysis
  • Multilingualism
  • Multi-word expression detection
  • Multi-word expressions
  • MWE
  • Natural language processing
  • NLP
  • Parallel concordancing tool
  • Syntactic analysis
  • Syntactic concordancing
  • Syntax
  • Translation
  • Word cooccurrence
Citation (ISO format)
SERETAN, Violeta, WEHRLI, Eric. Syntactic concordancing and multi-word expression detection. In: Int. J. Data Mining, Modelling and Management, 2013, vol. 5, n° 2, p. 158–181. doi: 10.1504/ijdmmm.2013.053694
Main files (1)
Article (Published version)
Access level: Public

Technical information

Creation: 06/26/2014 11:21:00 AM
First validation: 06/26/2014 11:21:00 AM
Update time: 03/14/2023 9:23:49 PM
Status update: 03/14/2023 9:23:49 PM
Last indexation: 01/16/2024 11:10:34 AM
All rights reserved by Archive ouverte UNIGE and the University of Geneva