UNIGE document Scientific Article

Overview of BioCreative II gene normalization

Morgan, Alexander A.
Lu, Zhiyong
Sun, Chengjie
Liu, Heng-hui
Torres, Rafael
Krauthammer, Michael
Lau, William W.
Liu, Hongfang
Published in GenomeBiology.com. 2008, vol. 9, no. Suppl 2, p. S3-S19
Abstract

BACKGROUND: The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name may refer to different genes (often from different organisms). For BioCreative II, the task was to list the Entrez Gene identifiers for human genes or gene products mentioned in PubMed/MEDLINE abstracts. We selected abstracts associated with articles previously curated for human genes. We provided 281 expert-annotated abstracts containing 684 gene identifiers for training, and a blind test set of 262 documents containing 785 identifiers, with a gold standard created by expert annotators. Inter-annotator agreement was measured at over 90%.

RESULTS: Twenty groups submitted one to three runs each, for a total of 54 runs. Three systems achieved F-measures (balanced precision and recall) between 0.80 and 0.81. Combining the system outputs using simple voting schemes and classifiers obtained improved results; the best composite system achieved an F-measure of 0.92 with 10-fold cross-validation. A 'maximum recall' system based on the pooled responses of all participants gave a recall of 0.97 (with precision 0.23), identifying 763 out of 785 identifiers.

CONCLUSION: Major advances for the BioCreative II gene normalization task include broader participation (20 versus 8 teams) and a pooled system performance comparable to human experts, at over 90% agreement. These results show promise as tools to link the literature with biological databases.
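
The scoring and pooling ideas mentioned in the abstract can be illustrated with a minimal sketch. It assumes the F-measure is the usual balanced harmonic mean of precision and recall, and it shows one simple form of voting (keep an identifier if at least `min_votes` systems returned it). The function names and the toy identifiers `G1`, `G2`, ... are hypothetical and are not taken from the article; the article's actual voting schemes and classifiers may differ.

```python
from collections import Counter


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def score(predicted, gold):
    """Return (precision, recall, F-measure) for two collections of identifiers."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


def vote(system_outputs, min_votes):
    """Keep identifiers returned by at least `min_votes` systems (assumed scheme)."""
    counts = Counter(gid for output in system_outputs for gid in set(output))
    return {gid for gid, n in counts.items() if n >= min_votes}


# Figures quoted in the abstract: 763 of 785 identifiers found, precision 0.23.
pooled_recall = 763 / 785
pooled_f = f_measure(0.23, 0.97)

# Toy data (hypothetical identifiers, not from the test set).
gold = {"G1", "G2", "G3"}
systems = [{"G1", "G2", "G9"}, {"G1", "G3"}, {"G1", "G2", "G3", "G8"}]
pooled = vote(systems, min_votes=1)    # union of all responses
majority = vote(systems, min_votes=2)  # at least two systems agree

print(round(pooled_recall, 2), round(pooled_f, 2))
print(score(pooled, gold))
print(score(majority, gold))
```

With `min_votes=1` the vote reduces to the union of all responses, which maximizes recall at the cost of precision, as in the 'maximum recall' system; raising the threshold trades recall for precision. Under the balanced definition assumed here, the quoted precision of 0.23 and recall of 0.97 correspond to an F-measure of about 0.37.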
Keywords: Abstracting and Indexing as Topic; Animals; Computational Biology/methods; Databases, Genetic; Genes; Humans; MEDLINE; PubMed; Reproducibility of Results; Societies, Scientific
PMID: 18834494
Full text
Article (Accepted version) (368 Kb) - public document Free access
Supplemental data - I (29 Kb) - public document Free access
Supplemental data - II (81 Kb) - public document Free access
Supplemental data - III (29 Kb) - public document Free access
Supplemental data - IV (24 Kb) - public document Free access
Other version: http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=18834494
(ISO format)
MORGAN, Alexander A. et al. Overview of BioCreative II gene normalization. In: GenomeBiology.com, 2008, vol. 9, n° Suppl 2, p. S3-S19. doi: 10.1186/gb-2008-9-s2-s3 https://archive-ouverte.unige.ch/unige:2728

Deposited on: 2009-09-21
