Glycans are carbohydrate molecules that were historically considered to have only energetic (starch and glycogen) and mechanical (cellulose and chitin) roles, but for several decades they have been known to fulfil more complex structural and informational functions.
Glycosylation is one of the most common and important Post-Translational Modifications (PTMs), being crucial both for the correct folding of proteins and for the regulation of protein activity. It is generally accepted that at least 50% of proteins in most organisms bear a collection of carbohydrates on their surface that acts as an interface with their surroundings. These protein-sugar molecular complexes can assume a great range of structures and belong to the set of molecules known as glycoconjugates, typically formed by a lipid or a protein bound to a carbohydrate. Glycans can thus be involved in protein-protein, protein-ligand and protein-matrix interactions, as well as in cell-cell, cell-matrix and host-pathogen interactions.
The collection of all glycans in a cell at a given time under given micro-environmental conditions is the glycome. Its high variability and versatility are made possible by non-template-driven glycan synthesis and constitute one of the major challenges in glycobiology and bioinformatics, for two reasons: predicting the function of a glycan from its sequence, and vice versa, is almost impossible, and a glycome can change rapidly in response to different types of stimuli. The result is a deluge of information that makes computers essential to biological research, because they can handle large amounts of data, from storage to analysis. The integration of computer science and biology gave rise to bioinformatics, which is used to organise and visualise information and to develop computational tools for better understanding biological questions.
It is therefore crucial to develop tools and methodologies capable of interpreting the large amount of data on glycans, in order to expand the organisational, structural and functional understanding of glycomes. The Swiss Institute of Bioinformatics (SIB) invests great effort in this challenge through its GlyConnect platform, which hosts data and tools that help characterise protein glycosylation; one example is the Compozitor web tool for the visualisation and comparison of glycomes. One of the main challenges in bioinformatics research has always been the comparison of entities, through the development of algorithms for similarity scoring and for data clustering according to biologically relevant aspects. Glycoinformatics shares this challenge with other fields, although the comparison of glycosylation networks is still unexplored territory. The aim of the work presented here is therefore to take the first steps in the comparison of glycomes and to start a discussion on their similarity, by reflecting on their main characteristics and on the methods best suited to measuring them. The approach put forward in this Master's thesis to achieve these goals is the Doppelganger.java application, developed specifically for this purpose and proposed as a possible implementation, within Compozitor, of glycome similarity scoring. During this work, special attention was given to the implications of the virtual node feature of Compozitor, which led to valuable insights that will be presented thoroughly in a dedicated publication and of which an overview is given in this thesis.