UNIGE document Doctoral Thesis

Understanding signal sequences with machine learning

Defense: Doctoral thesis, Univ. Genève, 2008 - Sc. 4073 - 2008/02/18
Abstract: Proteins synthesized in the cell must be transported to the correct cellular compartment in order to perform their function. This process is a fundamental aspect of cellular protein metabolism. All proteins that must be secreted carry a particular region of conserved function, the signal sequence (SS) or signal peptide, located at their N-terminal extremity. To address the problem of correctly discriminating secreted proteins from the others (cytosolic ones), artificial intelligence techniques were considered. The training set was composed of E. coli proteins whose location had been determined experimentally. We used a set of wild-type proteins complemented by two mutant sets: (i) 15 SS that have lost their function and (ii) 240 proteins that gained SS function. We used evolutionary computing to generate new features able to better predict secretion, the idea being to extend existing theory. To reach this goal, we designed a generic framework to describe physico-chemical requirements. To reduce the huge number of amino-acid properties to a tractable amount, we proposed a clustering method based on a novel correlation-based distance. The resulting performance is higher than that of previous attempts, both on wild-type proteins (95.8% cross-validated accuracy) and on the mutant collections. Furthermore, the new properties give new insights into signal sequence requirements. An analysis of these new features, along with their usage in the decision trees, allowed us to explain some apparently contradictory experiments about signal sequence secondary structure.
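The abstract mentions reducing a large set of amino-acid property scales by clustering them with a correlation-based distance. The thesis's own distance is described only as "novel" here, so the sketch below is an assumption throughout. It substitutes the generic distance d = 1 - |Pearson r|, so that strongly correlated and strongly anti-correlated scales both count as redundant. It then groups scales by single-linkage clustering under a distance threshold and keeps one medoid per group. The scale names, the five residues, and all values are invented for illustration.

```python
import math
from typing import Dict, List

# Invented toy property scales (hypothetical names and values), each mapping
# an amino acid to a number. Real scales would cover all 20 amino acids.
SCALES: Dict[str, Dict[str, float]] = {
    "hydro_a":  {"A": 1.0, "L": 2.0, "K": -2.0, "D": -1.8, "S": -0.4},
    "hydro_b":  {"A": 0.9, "L": 2.1, "K": -1.9, "D": -2.0, "S": -0.5},
    "polarity": {"A": -1.0, "L": -2.0, "K": 2.0, "D": 1.9, "S": 0.5},
    "size":     {"A": 1.0, "L": 3.0, "K": 3.1, "D": 2.0, "S": 1.2},
}


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant scale has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def distance(s: Dict[str, float], t: Dict[str, float]) -> float:
    """Stand-in distance 1 - |r| (NOT the thesis's own definition)."""
    keys = sorted(s)  # compare the two scales residue by residue
    return 1.0 - abs(pearson([s[k] for k in keys], [t[k] for k in keys]))


def cluster(scales: Dict[str, Dict[str, float]], threshold: float) -> List[List[str]]:
    """Single-linkage grouping: merge two groups while any pair of their
    members is closer than `threshold`."""
    groups = [[name] for name in sorted(scales)]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = min(distance(scales[a], scales[b])
                        for a in groups[i] for b in groups[j])
                if d < threshold:
                    other = groups.pop(j)  # j > i, so index i stays valid
                    groups[i].extend(other)
                    merged = True
                    break
            if merged:
                break
    return groups


def representatives(scales: Dict[str, Dict[str, float]],
                    groups: List[List[str]]) -> List[str]:
    """Keep one scale per group: the medoid (smallest summed distance)."""
    return [min(g, key=lambda n: sum(distance(scales[n], scales[m]) for m in g))
            for g in groups]


if __name__ == "__main__":
    groups = cluster(SCALES, threshold=0.2)
    print(groups)
    print(representatives(SCALES, groups))
```

With these invented values, the two hydrophobicity-like scales and the mirror-image polarity scale fall into one group, and the unrelated size scale stays on its own. The choice of |r| rather than r is what treats a scale and its sign-flipped counterpart as carrying the same information.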
Keywords: Signal sequences; Secretion; Proteomics; Machine learning; Decision tree; Clustering; Genetic algorithms
URN: urn:nbn:ch:unige-182215
Full text: Thesis (7.7 MB), public document, free access
Research group: Translocation des protéines à travers les membranes [Protein translocation across membranes] (169)
Citation (ISO format):
FALCONE, Jean-Luc. Understanding signal sequences with machine learning. Université de Genève. Thèse, 2008. doi: 10.13097/archive-ouverte/unige:18221 https://archive-ouverte.unige.ch/unige:18221

Deposited on: 2012-01-30
