Scientific article
Open access
English

Robust VIF Regression with Application to Variable Selection in Large Datasets

Published in: The Annals of Applied Statistics, vol. 7, p. 319-341
Publication date: 2013
Abstract

The sophisticated and automated means of data collection used by an increasing number of institutions and companies leads to extremely large datasets. Subset selection in regression is essential when a huge number of covariates can potentially explain a response variable of interest. The recent statistical literature has seen an emergence of new selection methods that provide some type of compromise between implementation (computational speed) and statistical optimality (e.g. prediction error minimization). Global methods such as Mallows' Cp have been supplanted by sequential methods such as stepwise regression. More recently, streamwise regression, faster than the former, has emerged. A recently proposed streamwise regression approach based on the variance inflation factor (VIF) is promising but its least-squares based implementation makes it susceptible to the outliers inevitable in such large datasets. This lack of robustness can lead to poor and suboptimal feature selection. In our case, we seek to predict an individual's educational attainment using economic and demographic variables. We show how classical VIF performs this task poorly and a robust procedure is necessary for policy makers. This article proposes a robust VIF regression, based on fast robust estimators, that inherits all the good properties of classical VIF in the absence of outliers, but also continues to perform well in their presence where the classical approach fails.

Keywords
  • Variable selection
  • Linear regression
  • Multicollinearity
  • M-estimator
  • College data
Citation (ISO format)
DUPUIS, Debbie J., VICTORIA-FESER, Maria-Pia. Robust VIF Regression with Application to Variable Selection in Large Datasets. In: The annals of applied statistics, 2013, vol. 7, p. 319–341. doi: 10.1214/12-AOAS584
Main files (1)
Article (Accepted version)
Access level: Public
Identifiers
ISSN of the journal: 1932-6157

Technical information

Creation: 08/02/2012 3:19:00 PM
First validation: 08/02/2012 3:19:00 PM
Update time: 03/14/2023 5:39:32 PM
Status update: 03/14/2023 5:39:32 PM
Last indexation: 01/16/2024 12:08:17 AM
All rights reserved by Archive ouverte UNIGE and the University of Geneva