UNIGE document Report

Fast Robust Model Selection in Large Datasets

Dupuis, Debbie
Publication Montreal, 2010
Collection Cahiers du GERAD
Abstract Large datasets upon which classical statistical analysis cannot be performed because of the curse of dimensionality are increasingly common in many research fields. In particular, in the linear regression context, it is often the case that a huge number of potential covariates are available to explain a response variable, and the first step of a reasonable statistical analysis is to reduce the number of covariates using appropriate statistical criteria. Alternative fast methods that alleviate the computational burden of classical procedures have recently been proposed in the literature. However, these fast methods are based on classical statistical theory and are not robust to extreme observations. Moreover, simply replacing the classical statistical criteria with robust ones is not possible, because the complexity of the robust estimators and testing procedures leads to infeasible computations. In this paper, we propose alternative robust estimators, selection criteria and testing procedures for the linear regression model that are fast to compute and hence can be used in a fast model selection procedure. The robust estimator is a one-step weighted M-estimator that can be biased if the covariates are not orthogonal. We show that the bias is relatively small and can be made smaller by iterating the M-estimator one or more steps further. In the variable selection process, we propose a simplified robust criterion based on a robust t-statistic for significance. We propose a complete algorithm for fast robust model selection, including considerations for huge sample sizes, and show the performance of our method in a simulation study. We also analyze two datasets and show that the results obtained by our method outperform those from robust LARS and random forests.
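To make the idea of a one-step weighted M-estimator with a robust t-statistic concrete, here is a minimal Python sketch for a single covariate. It is not the estimator or the test statistic defined in the report. The function names (`mad_scale`, `huber_weight`, `one_step_slope`), the Huber weights with tuning constant 1.345, the median-of-ratios starting value, the sandwich-type variance, and the assumption that `x` and `y` are already centered (no intercept) are all illustrative choices made here.

```python
from math import sqrt
from statistics import median


def mad_scale(values):
    """Median absolute deviation, rescaled to be consistent at the normal."""
    m = median(values)
    return 1.4826 * median([abs(v - m) for v in values])


def huber_weight(r, c=1.345):
    """Huber weight: 1 for small standardized residuals, c/|r| otherwise."""
    a = abs(r)
    return 1.0 if a <= c else c / a


def one_step_slope(x, y, c=1.345, steps=1):
    """Weighted one-step slope estimate and a t-type statistic.

    Assumes x and y are already centered (illustrative simplification).
    steps > 1 repeats the reweighting, in the spirit of iterating further.
    """
    # Hypothetical robust starting value: median of the ratios y_i / x_i.
    beta = median([yi / xi for xi, yi in zip(x, y) if xi != 0])
    for _ in range(steps):
        resid = [yi - beta * xi for xi, yi in zip(x, y)]
        scale = mad_scale(resid) or 1.0  # guard against a zero scale
        w = [huber_weight(r / scale, c) for r in resid]
        sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        beta = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) / sxx
    # Sandwich-type variance for weighted least squares with fixed weights.
    resid = [yi - beta * xi for xi, yi in zip(x, y)]
    var = sum((wi * xi * ei) ** 2 for wi, xi, ei in zip(w, x, resid)) / sxx ** 2
    t_stat = beta / sqrt(var) if var > 0 else float("inf")
    return beta, t_stat
```

On data that follow a slope of 2 except for one gross outlier, the downweighting keeps the estimate near 2, whereas ordinary least squares is pulled far away by the single extreme observation.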
Keywords Linear regression; Multicollinearity; M-estimator; Robust t-test; Partial correlation; LARS; Random forests
Full text
Report (234 Kb) - public document Free access
(ISO format)
DUPUIS, Debbie, VICTORIA-FESER, Maria-Pia. Fast Robust Model Selection in Large Datasets. 2010 https://archive-ouverte.unige.ch/unige:8591

Deposited on : 2010-07-11