en
Doctoral thesis
Open access
English

Contributions to the Statistical Analysis of Networks and Graphs

ContributorsMiglioli, Cesareorcid
Imprimatur date2024-02-01
Defense date2024-02-01
Abstract

This thesis contributes to the field of network and graph data analysis. Over the course of three chapters, we propose novel statistical methods that address the challenges of hypothesis testing for network data, approximate graph mining and constructing an interpretable network of predictive variables.

The first chapter introduces a novel class of two-sample Monte Carlo permutation tests for network data. These tests are, under weak assumptions, asymptotically valid and at the same time retain the exact rejection probability α, in finite samples, when the underlying distributions of the two samples are identical. A particular specification of our general testing framework, provides evidence for a different fetal brain functional connectivity in high risk of preterm birth pregnancies compared to low risk pregnancies.

The second chapter displays a novel graph representation that enables simple and fast approximate parallel graph mining with strong theoretical guarantees on work, depth, and result accuracy. The key idea is to represent sets of vertices using "probabilistic set representations" (PSR) such as Bloom filters. These representations are much faster to process than the original vertex sets, thanks to vectorizability and small size. Moreover, we define novel estimators, based on these representations, and derive their concentration inequalities to prove that PSR entail only a minor loss in terms of accuracy. On the applications side, we use PSR as building blocks in important parallel graph mining algorithms such as Clique Counting or Clustering. These enhanced algorithms, significantly outperform tuned parallel exact baselines (up to nearly 50 times on 32 cores) while ensuring accuracy of more than 90% for many graph datasets.

Finally, in the third chapter, we study the role of non-coding micro RNAs (miRNAs) dysregulation in breast cancer with the “Sparse Wrapper AlGorithm” (SWAG). This novel procedure combines screening and wrapper approaches to deliver a low-dimensional network of predictive variables that can easily be interpreted. Moreover, the SWAG increases the potential replicability of results, given the diversity of variable combinations that characterize the network. With this novel algorithm, we analyse the Ahus breast cancer dataset and we discover a set of 112 indistinguishable models each with 4 or 5 miRNAs. Within this set, by comparing the model coefficients, we are able to identify oncogenic, protective and undefined miRNAs which can play both an oncogenic and a protective role according to the network with which they interact. These results may contribute to explain why, in some cases, different studies attribute opposite functions to the same miRNA.

engfre
Citation (ISO format)
MIGLIOLI, Cesare. Contributions to the Statistical Analysis of Networks and Graphs. 2024. doi: 10.13097/archive-ouverte/unige:175512
Main files (1)
Thesis
Secondary files (1)
Identifiers
148views
149downloads

Technical informations

Creation03/07/2024 11:56:09 AM
First validation03/11/2024 9:28:07 AM
Update time03/11/2024 9:28:07 AM
Status update03/11/2024 9:28:07 AM
Last indexation05/06/2024 6:08:06 PM
All rights reserved by Archive ouverte UNIGE and the University of GenevaunigeBlack