Doctoral thesis

Development of a Prioritization Strategy to Efficiently Select Natural Extracts from Large Biodiverse Datasets with a High Structural Novelty Potential

Number of pages: 629
Imprimatur date: 2023
Defense date: 2023

Efficient prioritization of samples in natural extract (NE) libraries has become a critical aspect of discovering new specialized metabolites in natural products (NP) research. However, prioritizing NEs from a given collection remains challenging owing to the absence of workflows that integrate multiple sources of information to facilitate comprehensive data interpretation. To achieve optimal decision-making, the results from metabolite profiling techniques and literature data must be organized, processed, and interpreted. This task becomes particularly complex when dealing with large NE collections (hundreds to thousands of samples), which require an intelligent selection of samples to create manageable subsets tailored to each study. This challenge is the focus of this thesis project: the development of a prioritization strategy to efficiently select NEs with a high structural novelty potential from large datasets. Ideally, the approach should allow faster and more rational sample selection and speed up the discovery of novel NPs.

To address this challenge, a bioinformatic tool called Inventa was developed to assist in the selection of extracts based on their potential for structural novelty. Inventa generates a combined score considering untargeted UHPLC-HRMS2 data, spectral annotations, literature reports, and chemically informed sample comparisons. Applying this tool to a set of 76 plant extracts from the Celastraceae family led to the discovery of 13 new β-agarofuran compounds, including 5 with a new base scaffold, thus providing proof of concept for Inventa's effectiveness.
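As an illustration of how a combined score over several evidence sources can rank extracts, the following minimal Python sketch sums four per-extract components and sorts the extracts by the total. It is not Inventa's actual formula: the component names, the 0 to 1 scaling, and the equal default weights are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ExtractEvidence:
    """Hypothetical per-extract evidence, each component scaled to [0, 1].

    Higher values mean stronger support for structural novelty.
    The field names are illustrative and do not come from Inventa.
    """
    name: str
    ms_feature_specificity: float   # derived from untargeted MS data
    annotation_gap: float           # share of features without a spectral annotation
    literature_gap: float           # 1.0 = few or no compounds reported
    sample_dissimilarity: float     # 1.0 = chemically unlike the other extracts


def combined_score(e: ExtractEvidence,
                   weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the four components (equal weights are an assumption)."""
    components = (e.ms_feature_specificity, e.annotation_gap,
                  e.literature_gap, e.sample_dissimilarity)
    for value in components:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"component out of range for {e.name}: {value}")
    return sum(w * c for w, c in zip(weights, components))


def rank_extracts(extracts: List[ExtractEvidence]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from highest to lowest score."""
    scored = [(e.name, combined_score(e)) for e in extracts]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    demo = [
        ExtractEvidence("extract_A", 0.9, 0.8, 1.0, 0.7),
        ExtractEvidence("extract_B", 0.2, 0.1, 0.0, 0.3),
    ]
    for name, score in rank_extracts(demo):
        print(f"{name}: {score:.2f}")
```

Keeping each component on a common scale makes the total easy to read: an extract that scores high on every source of evidence rises to the top, and the weights can be adjusted if one source is judged more reliable than the others.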

This workflow can be applied to both aligned and unaligned UHPLC-HRMS2 datasets; the unaligned variant does not require a retention time (RT)-aligned feature table. The unaligned workflow allows the processing of large datasets (up to thousands of samples) and the addition of samples over time. This latter approach was applied to a diverse collection of 1600 plant extracts, and several plant species were highlighted for their structural novelty potential. Isolation work on the extracts of Entandrophragma candollei (Q5834167) and Entandrophragma utile (Q835089) yielded a series of novel limonoids and ergostanes.

With the aim of combining bioactivity and structural novelty results to obtain innovative bioactive NPs, the set of 1600 plant extracts was screened for bioactivity in a Wnt triple-negative breast cancer bioassay. The bioactivity results significantly reduced the number of extracts of interest. Inventa's results were then used as supplementary information to narrow the selection further, retaining only spectrally dissimilar extracts with a low annotation rate and a low number of reported compounds, thereby ensuring a high probability of discovering structurally novel compounds. Isolation efforts focused on the leaf extract of Hymenocardia punctata (Q15514019), from which a series of structurally novel bicyclo[3.3.1]non-3-ene-2,9-diones with high bioactive potential was isolated.
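The two-stage narrowing described above (bioactivity first, novelty indicators second) can be sketched as a simple filter. This is only a hedged illustration: the record fields and every threshold value below are hypothetical and are not taken from the thesis.

```python
from typing import List, NamedTuple


class ExtractRecord(NamedTuple):
    """Hypothetical summary of one extract; field names are illustrative."""
    name: str
    is_active: bool                   # hit in the bioassay
    annotation_rate: float            # fraction of annotated features, 0..1
    reported_compounds: int           # compounds reported in the literature
    max_similarity_to_others: float   # highest spectral similarity to any other extract, 0..1


def select_candidates(records: List[ExtractRecord],
                      max_annotation_rate: float = 0.3,
                      max_reported: int = 10,
                      max_similarity: float = 0.5) -> List[str]:
    """Keep active extracts, then keep those with low annotation rate,
    few reported compounds, and low similarity to the other extracts.

    All three thresholds are assumptions chosen for this example.
    """
    # Stage 1: bioactivity reduces the set of extracts of interest.
    actives = [r for r in records if r.is_active]
    # Stage 2: novelty indicators narrow the selection further.
    return [
        r.name for r in actives
        if r.annotation_rate <= max_annotation_rate
        and r.reported_compounds <= max_reported
        and r.max_similarity_to_others <= max_similarity
    ]


if __name__ == "__main__":
    demo = [
        ExtractRecord("active_novel", True, 0.10, 2, 0.20),
        ExtractRecord("active_known", True, 0.80, 150, 0.90),
        ExtractRecord("inactive_novel", False, 0.05, 0, 0.10),
    ]
    print(select_candidates(demo))
```

Applying the bioactivity filter first keeps the second stage cheap, since the novelty criteria are only evaluated on the extracts that remain of interest.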

The tool and integrative approaches developed in this thesis project could be used to focus phytochemical investigations on a smaller number of extracts with distinct chemical spaces, leading to the identification of novel compounds and potentially significant bioactivities.

  • Natural Products
  • Bioinformatics
  • Isolation
  • Prioritization
Citation (ISO format)
QUIROS GUERRERO, Luis Manuel. Development of a Prioritization Strategy to Efficiently Select Natural Extracts from Large Biodiverse Datasets with a High Structural Novelty Potential. 2023. doi: 10.13097/archive-ouverte/unige:171177
Main files (1)
Access level: Restricted; Public 09/04/2025

Technical information

Creation: 07/08/2023 2:04:47 PM
First validation: 09/04/2023 12:14:35 PM
Update time: 09/04/2023 12:14:35 PM
Status update: 09/04/2023 12:14:35 PM
Last indexation: 05/06/2024 4:57:23 PM
All rights reserved by Archive ouverte UNIGE and the University of Geneva