Scientific article
English

Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction

Published in: BMC bioinformatics, vol. 9 Suppl 3, S9
Publication date: 2008
Abstract

BACKGROUND: This paper describes and evaluates a sentence selection engine that extracts a GeneRiF (Gene Reference Into Function), as defined in ENTREZ-Gene, from a MEDLINE record. The inputs for this task are a gene and a pointer to a MEDLINE reference. The suggested approach merges two independent sentence extraction strategies. The first strategy (LASt) uses argumentative features inspired by discourse-analysis models. The second (GOEx) uses an automatic text categorizer to estimate the density of Gene Ontology categories in every sentence, thus providing a full ranking of all candidate GeneRiFs. A combination of the two approaches is proposed, which also aims to reduce the size of the selected segment by filtering out non-content-bearing rhetorical phrases.

RESULTS: Based on the TREC-2003 Genomics collection for GeneRiF identification, the LASt extraction strategy is already competitive (52.78%). When used in a combined approach, the extraction task clearly shows improvement, achieving a Dice score of over 57% (+10%).

CONCLUSIONS: Argumentative representation levels and conceptual density estimation using Gene Ontology contents appear complementary for functional annotation in proteomics.
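To make the reported Dice scores concrete, the following is a minimal sketch of a classic set-based Dice coefficient between a candidate sentence and a reference GeneRiF. It is an illustration only, not the evaluation code used in the paper: the tokenization (lowercased alphanumeric tokens) and the function names tokenize, dice and best_sentence are assumptions, and the official TREC-2003 Genomics measure may apply additional normalization such as stopword removal or stemming.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens (assumed scheme)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def dice(candidate, reference):
    """Classic Dice coefficient: 2 * |X & Y| / (|X| + |Y|) over token sets."""
    x, y = tokenize(candidate), tokenize(reference)
    if not x and not y:
        # Two empty texts are treated as identical (a convention, not from the paper).
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


def best_sentence(sentences, reference):
    """Return the candidate sentence with the highest Dice overlap with the reference."""
    return max(sentences, key=lambda s: dice(s, reference))
```

Under these assumptions, a score of 1.0 means the candidate and the reference share exactly the same token set, and 0.0 means they share no tokens.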

Keywords
  • Algorithms
  • Artificial Intelligence
  • Genes/genetics
  • MEDLINE
  • Natural Language Processing
  • Pattern Recognition, Automated/methods
  • Proteins/classification/genetics
  • Sensitivity and Specificity
  • Terminology as Topic
  • Vocabulary, Controlled
Citation (ISO format)
GOBEILL, Julien et al. Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction. In: BMC bioinformatics, 2008, vol. 9 Suppl 3, p. S9. doi: 10.1186/1471-2105-9-S3-S9
Main files (1)
Article (Accepted version)
Access level: Public
Identifiers
Additional URL for this publication: http://www.ncbi.nlm.nih.gov/sites/entrez
Journal ISSN: 1471-2105
