Doctoral thesis
Open access
English

Modular text mining for protein-protein interactions extraction

Contributors: Ehrler, Frédéric
Defense date: 2009-06-30
Abstract

Since researchers discovered that proteins do not function in isolation in a cell but act in multi-protein complexes, the number of publications about protein-protein interactions (PPI) has increased significantly. This large amount of unstructured textual information is difficult for humans to exploit, as they have trouble locating the information of interest efficiently. Therefore, it is necessary to develop techniques to automate the extraction of protein-protein interactions from free text. In this thesis, we explore PPI extraction from the point of view of database curators and study the dependencies between the different steps of the PPI extraction process. The process starts with the recognition of articles containing a PPI. Next, the proteins are located in the selected documents. These proteins must then be unambiguously identified, and finally the interactions are extracted. These different steps allow us to study various data mining techniques exhaustively. The outcomes of this thesis confirm the crucial importance of consistent performance across the tasks involved in a process over the individual performance of any single task. More specifically, the results reveal that each time an error occurs at a given step, it influences all the steps downstream and ultimately strongly reduces the precision and recall of the generated interactions.
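
The effect of error propagation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each of the four steps keeps a fixed fraction of the correct items it receives and that the steps fail independently; the function name and the recall values are hypothetical and are not taken from the thesis.

```python
from math import prod

# The four steps named in the abstract, in pipeline order.
STEPS = [
    "article recognition",
    "protein location",
    "protein identification",
    "interaction extraction",
]


def end_to_end_recall(step_recalls):
    """Estimate the recall of a sequential pipeline.

    Assumption (not from the thesis): every step only sees what the
    previous step retained, and steps fail independently, so the
    per-step recalls multiply.
    """
    if len(step_recalls) != len(STEPS):
        raise ValueError("expected one recall value per pipeline step")
    if any(not 0.0 <= r <= 1.0 for r in step_recalls):
        raise ValueError("recall values must lie in [0, 1]")
    return prod(step_recalls)


# Hypothetical values: four uniformly good steps versus three
# excellent steps followed by one weak step.
balanced = end_to_end_recall([0.9, 0.9, 0.9, 0.9])
one_weak = end_to_end_recall([0.99, 0.99, 0.99, 0.5])
print(f"balanced pipeline: {balanced:.3f}")
print(f"one weak step:     {one_weak:.3f}")
```

Under these assumptions a single weak step caps the result of the whole pipeline, which is consistent with the abstract's point that consistency across steps matters more than the individual performance of any one step.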

Citation (ISO format)
EHRLER, Frédéric. Modular text mining for protein-protein interactions extraction. 2009. doi: 10.13097/archive-ouverte/unige:12936

All rights reserved by Archive ouverte UNIGE and the University of Geneva