Scientific article
Open access

What is a consistent glycan composition dataset?

Published in: Frontiers in analytical science, vol. 3, 1073540
Publication date: 2023-06-07
First online date: 2023-06-07

Introduction: One of the main challenges in bioinformatics has been, and still is, the comparison of entities through the development of algorithms for similarity scoring and data clustering according to biologically relevant aspects. Glycoinformatics also faces this challenge, in particular regarding the automated comparison of protein and/or tissue glycomes, which remains relatively uncharted territory.

Methods: Low- and high-throughput experimental glycomic and glycoproteomic results were collected, revealing a bias toward N-linked glycomes. N-glycomes were then represented as networks of related glycan compositions rather than as lists of glycans. They were processed and compared using one Java application that generates graphs and another that produces a similarity matrix based on graph content. Several scoring schemes (e.g., Jaccard index or cosine) were tested and evaluated using the Matthews Correlation Coefficient, in order to capture meaningful similarity between protein and tissue N-glycomes.
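As an illustration of the scoring schemes named above, the following minimal Python sketch computes the Jaccard index and cosine similarity between two glycomes treated as sets of composition labels, together with the Matthews Correlation Coefficient from a confusion matrix. It is not the authors' Java implementation. The composition labels (e.g., "H5N4F1"), the function names, and the choice of binary presence/absence vectors for the cosine score are assumptions made for the example; the abstract does not state how the graphs were encoded for scoring.

```python
from math import sqrt


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; taken as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cosine(a: set, b: set) -> float:
    """Cosine similarity of binary presence/absence vectors (an assumption)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient; returns 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical glycomes, each a set of composition labels.
glycome_a = {"H5N4", "H5N4F1", "H5N4S1", "H6N5"}
glycome_b = {"H5N4", "H5N4F1", "H4N4"}

print(jaccard(glycome_a, glycome_b))  # 2 shared out of 5 distinct compositions
print(cosine(glycome_a, glycome_b))
print(mcc(tp=5, tn=5, fp=0, fn=0))    # perfect agreement
```

One plausible way to use the MCC here, again an assumption, is to threshold a similarity score into a binary "same group / different group" prediction and compare it with known annotations.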

Results: Assuming that a glycome corresponds to a well-connected graph of glycan compositions, graph comparison has revealed gaps that can be interpreted as inconsistencies. The outcome of systematic graph comparison is both formal and practical. In principle, it is shown that the idiosyncrasy of current glycome data limits the definition of appropriate estimates for systematically comparing N-glycomes. Yet, several potentially interesting criteria could be identified in a series of use cases detailed in the study.
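The idea of a gap in a glycome graph can be sketched as follows. This Python example assumes that two compositions are linked when they differ by exactly one monosaccharide residue, and that a composition left outside the main connected component signals a possible inconsistency. Both the linking rule and the label syntax (letter codes followed by counts, such as "H5N4F1") are assumptions for illustration and may differ from the definitions used in the study.

```python
import re
from collections import Counter


def parse(comp: str) -> Counter:
    """Parse a label such as 'H5N4F1' into residue counts (hypothetical syntax)."""
    return Counter({res: int(n) for res, n in re.findall(r"([A-Za-z]+)(\d+)", comp)})


def differ_by_one(a: str, b: str) -> bool:
    """True when the two compositions differ by exactly one residue in total."""
    ca, cb = parse(a), parse(b)
    return sum(abs(ca[k] - cb[k]) for k in set(ca) | set(cb)) == 1


def components(comps):
    """Group compositions into connected components of the one-residue graph."""
    comps = list(comps)
    seen, result = set(), []
    for start in comps:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], set()
        while stack:  # depth-first traversal from the unvisited start node
            node = stack.pop()
            group.add(node)
            for other in comps:
                if other not in seen and differ_by_one(node, other):
                    seen.add(other)
                    stack.append(other)
        result.append(group)
    return result


# Hypothetical glycome: H7N6 has no one-residue neighbour in the set.
parts = components(["H5N4", "H5N4F1", "H5N4F1S1", "H7N6"])
print(parts)
```

With this input the traversal yields two components, and the isolated composition is the kind of gap that would call for a closer look at the dataset.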

Discussion: Differentially expressed glycomes are usually compared manually, but the resulting work tends to remain in publications due to the lack of dedicated tools. Even manually, cross-comparison is challenging, mostly because different sets of features are used from one study to another. The work presented here makes it possible to lay down guidelines for developing a software tool that compares glycomes based on appropriate definitions of similarity and suitable methods for its evaluation and implementation.

  • Glycan composition
  • Glycome
  • Web interface
  • Glycoprotein
  • Graph
  • Similarity measure
  • Data visualisation
Citation (ISO format)
SABA, Federico, MARIETHOZ, Julien, LISACEK, Frédérique. What is a consistent glycan composition dataset? In: Frontiers in analytical science, 2023, vol. 3, p. 1073540. doi: 10.3389/frans.2023.1073540
ISSN of the journal: 2673-9283

Technical information

Creation: 01/24/2024 12:04:12 PM
First validation: 02/29/2024 7:35:52 AM
Update time: 02/29/2024 7:35:52 AM
Status update: 02/29/2024 7:35:52 AM
Last indexation: 05/06/2024 6:03:21 PM