Fast Robust Model Selection

Abstract

Large datasets on which classical statistical analysis cannot be performed because of the curse of dimensionality have become increasingly common in many research fields. In particular, in the linear regression context, a huge number of potential covariates are often available to explain a response variable, and the first step of a reasonable statistical analysis is to reduce the number of these covariates using appropriate statistical criteria. Alternative fast methods that alleviate the computational burden of classical procedures have recently been proposed in the literature. However, these methods are based on classical statistical theory and are not robust to extreme observations. Simply replacing the classical statistical criteria with robust ones in the fast methods is not possible, because the complexity of the robust estimators and testing procedures leads to infeasible computations. Instead, we propose alternative robust estimators and testing procedures for the linear regression model that are fast to compute and hence can be used in a fast search method for model selection. The robust estimator is a one-step $MM$-estimator. It can be biased if the covariates are not orthogonal; however, we show that the bias is relatively small and can be made smaller by iterating the $MM$-estimator one or more steps further. For the variable selection process, we propose a simplified robust criterion based on a robust $t$-statistic for significance. We propose a complete algorithm for fast robust model selection, including considerations for huge sample sizes, and assess the performance of our method in an extensive simulation study. We also analyse two datasets and show that the results obtained by our method outperform those of other methods.
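To illustrate the general idea behind a one-step estimator, the sketch below applies a single reweighted least-squares update to an initial coefficient vector, with optional further steps. It is not the estimator defined in the paper: the Tukey bisquare weight function, the tuning constant `c = 4.685`, the MAD-based scale held fixed across steps, and the function names `tukey_weights` and `k_step_estimate` are all assumptions made for illustration, and the initial estimate `beta0` is assumed to be supplied by the user.

```python
import numpy as np


def tukey_weights(u, c=4.685):
    """Tukey bisquare weights: (1 - (u/c)^2)^2 if |u| <= c, else 0.

    The weight function and the constant c are illustrative assumptions.
    """
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= c, (1.0 - (u / c) ** 2) ** 2, 0.0)


def k_step_estimate(X, y, beta0, steps=1, c=4.685):
    """Run `steps` reweighted least-squares updates starting from beta0.

    Assumptions (not taken from the paper): the residual scale is a
    normalised MAD computed once from the initial residuals and kept
    fixed, and beta0 is a reasonable starting value chosen by the user.
    """
    beta = np.asarray(beta0, dtype=float)
    r0 = y - X @ beta
    # Normalised median absolute deviation; assumed to be strictly positive.
    scale = 1.4826 * np.median(np.abs(r0 - np.median(r0)))
    for _ in range(steps):
        w = tukey_weights((y - X @ beta) / scale, c)
        XtW = X.T * w  # each column of X.T is scaled by its observation weight
        beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta
```

With `steps=1` this performs one update; larger values of `steps` iterate further, in the spirit of the iteration mentioned above. Observations whose standardised residuals exceed `c` in absolute value receive zero weight and so do not influence the update.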