Doctoral thesis
English

Understanding signal sequences with machine learning

Contributors: Falcone, Jean-Luc
Defense date: 2008-02-18
Abstract

Proteins synthesized in the cell must be transported to the correct cellular compartment so that they can perform their function. This process is a fundamental aspect of cellular protein metabolism. All proteins that must be secreted carry a particular region of conserved function, the signal sequence (SS) or signal peptide, located at the N-terminal extremity. To address the problem of correctly discriminating secreted proteins from the others (cytosolic), artificial intelligence techniques were considered. The training set was composed of E. coli proteins whose location was determined experimentally. We used a set of wild-type proteins complemented by two mutant sets: (i) 15 SS that have lost their function and (ii) 240 proteins that gained SS function. We used evolutionary computing to generate new features able to better predict secretion. The idea here was to extend existing theory. To reach this goal, we designed a generic framework to describe physico-chemical requirements. To reduce the huge number of amino-acid properties to a tractable amount, we proposed a clustering method based on a novel correlation-based distance. The resulting performance is higher than that of preceding attempts on wild-type proteins (95.8% cross-validated accuracy), and also on the mutant collections. Furthermore, the new properties give new insights into the signal sequence requirements. An analysis of these new features, along with their usage in the decision trees, allowed us to explain some apparently contradictory experiments about signal sequence secondary structure.
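The property-reduction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the thesis's novel distance, so the code below substitutes a common stand-in, d = 1 - |r| with r the Pearson correlation, and groups properties by simple single-linkage thresholding. The function names, the threshold, and the toy property values are all hypothetical and are not taken from the thesis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the variance is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def correlation_distance(x, y):
    """Stand-in distance (an assumption, not the thesis's definition):
    strongly correlated or anti-correlated scales are treated as close."""
    return 1.0 - abs(pearson(x, y))


def cluster_properties(scales, threshold):
    """Single-linkage grouping: two properties end up in the same cluster
    if a chain of pairs with distance below `threshold` connects them."""
    names = list(scales)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links up to the cluster representative.
        while parent[name] != name:
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if correlation_distance(scales[a], scales[b]) < threshold:
                parent[find(b)] = find(a)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy, made-up property scales over five residues (illustrative values only).
toy_scales = {
    "hydrophobicity_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "hydrophobicity_b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "polarity": [5.0, 4.0, 3.0, 2.0, 1.0],
    "volume": [1.0, -1.0, 0.0, -1.0, 1.0],
}

groups = cluster_properties(toy_scales, threshold=0.1)
```

With these toy values, the two hydrophobicity scales and the anti-correlated polarity scale fall into one group, while the uncorrelated volume scale stays alone. One representative per group could then stand in for the whole group, which is one plausible way to shrink a large property collection to a tractable set.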

Keywords
  • Signal sequences
  • Secretion
  • Proteomics
  • Machine learning
  • Decision tree
  • Clustering
  • Genetic algorithms
Citation (ISO format)
FALCONE, Jean-Luc. Understanding signal sequences with machine learning. Doctoral Thesis, 2008. doi: 10.13097/archive-ouverte/unige:18221
Main files (1)
Thesis
Access level: Public
Identifiers
1238 views
1184 downloads

Technical information

Creation: 25/01/2012 09:57:00
First validation: 25/01/2012 09:57:00
Update time: 14/03/2023 17:07:32
Status update: 14/03/2023 17:07:32
Last indexation: 13/05/2025 15:58:48
All rights reserved by Archive ouverte UNIGE and the University of Geneva