Multivariate outlier detection and imputation in survey data

Abstract

Detection of multivariate outliers in survey data with missing values has been treated in the literature and some experience with applications exists. Many of these methods are based on the Mahalanobis distance. After the detection of outliers and influential observations, these suspicious observations may be revised interactively and/or an imputation considering their special status may be carried out. Multivariate robust imputation has not been extensively discussed yet.
Some strategies for detecting multivariate outliers and for imputing them are discussed. The Epidemic Algorithm is based on data depth. It may be used for detection only, or the epidemic may be run backwards to impute missing values and/or outlying observations. The TRC algorithm and the BACON-EEM algorithm are based on the Mahalanobis distance and can be combined with a robust multivariate linear imputation under the assumption that the bulk of the data follows a multivariate normal distribution. Special attention must be paid to the zero-inflated distribution of income components. The methods are tested on the public-use data set of the Austrian SILC survey for 2004.
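
For readers unfamiliar with the Mahalanobis distance mentioned above, the following is a minimal sketch of distance-based outlier flagging. It is not the TRC or BACON-EEM algorithm: it assumes complete data without missing values and uses the classical (non-robust) sample mean and covariance, whereas the methods in the abstract rely on robust estimates. The function names `mahalanobis_sq` and `flag_outliers`, the simulated data, and the cutoff are illustrative assumptions.

```python
import numpy as np


def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of each row of X from `center`."""
    diff = X - center
    # Solve cov @ z = diff.T rather than inverting cov explicitly.
    z = np.linalg.solve(cov, diff.T)
    # Row-wise quadratic form diff_i^T cov^{-1} diff_i.
    return np.einsum("ij,ji->i", diff, z)


def flag_outliers(X, cutoff):
    """Flag rows whose squared distance exceeds `cutoff`.

    Assumption: classical mean and covariance are used here for brevity;
    they are themselves sensitive to outliers.
    """
    center = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    d2 = mahalanobis_sq(X, center, cov)
    return d2, d2 > cutoff


# Simulated data (hypothetical): 200 trivariate normal points, one planted outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[0] = [10.0, 10.0, 10.0]

# 16.27 is approximately the 0.999 quantile of a chi-square with 3 degrees of freedom.
d2, flags = flag_outliers(X, cutoff=16.27)
```

Under multivariate normality the squared distances are approximately chi-square distributed with as many degrees of freedom as there are variables, which motivates the cutoff. With several outliers the classical estimates can be distorted enough to mask them, which is one reason robust estimates are preferred in practice.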