UNIGE document Scientific Article
Title

Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction

Authors
Tbahriti, Imad
Published in BMC Bioinformatics. 2008, vol. 9 Suppl 3, p. S9
Abstract
BACKGROUND: This paper describes and evaluates a sentence selection engine that extracts a GeneRiF (Gene Reference into Functions), as defined in ENTREZ-Gene, from a MEDLINE record. The inputs for this task are a gene and a pointer to a MEDLINE reference. The suggested approach merges two independent sentence extraction strategies. The first strategy (LASt) uses argumentative features inspired by discourse-analysis models. The second strategy (GOEx) uses an automatic text categorizer to estimate the density of Gene Ontology categories in every sentence, thus providing a full ranking of all candidate GeneRiFs. A combination of the two approaches is proposed, which also aims to reduce the size of the selected segment by filtering out non-content-bearing rhetorical phrases.
RESULTS: On the TREC-2003 Genomics collection for GeneRiF identification, the LASt extraction strategy alone is already competitive (Dice score of 52.78%). The combined approach clearly improves the extraction task, achieving a Dice score of over 57% (+10%).
CONCLUSIONS: Argumentative representation levels and conceptual density estimation using Gene Ontology contents appear complementary for functional annotation in proteomics.
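For readers unfamiliar with the evaluation measure named in the results, the following is a minimal sketch of a Dice score between a candidate sentence and a reference GeneRiF, computed over sets of lowercased word tokens. The tokenization and the function names (tokenize, dice, best_sentence) are assumptions made for illustration; the record does not specify the exact Dice variant or preprocessing used in the paper.

```python
import re


def tokenize(text):
    """Return the set of lowercased alphanumeric tokens (an assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def dice(candidate, reference):
    """Dice coefficient between the token sets of two strings: 2|A∩B| / (|A|+|B|)."""
    a, b = tokenize(candidate), tokenize(reference)
    if not a and not b:
        return 1.0  # two empty strings are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def best_sentence(sentences, reference):
    """Pick the candidate sentence with the highest Dice score against the reference."""
    return max(sentences, key=lambda s: dice(s, reference))
```

In this sketch, a score of 1.0 means the candidate and the reference share exactly the same tokens, and 0.0 means they share none.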
Keywords Algorithms; Artificial Intelligence; Genes/genetics; MEDLINE; Natural Language Processing; Pattern Recognition, Automated/methods; Proteins/classification/genetics; Sensitivity and Specificity; Terminology as Topic; Vocabulary, Controlled
Stable URL http://archive-ouverte.unige.ch/unige:1065
Identifiers
PMID: 18426554
Structures
Research group Swiss-Prot Research Group