Outliers

This document explains how outliers are defined in the Exploratory Data Analysis (EDA) framework (John Tukey). In a more classical setting, outliers are often defined as values lying more than c standard deviations from the mean (often c = 2 or 3).
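The classical rule can be sketched in a few lines of Python. This is only an illustration: the function name classical_outliers, the sample data and the choice of the sample standard deviation (n - 1 denominator) are assumptions, not part of the original text.

```python
from statistics import mean, stdev

def classical_outliers(values, c=3.0):
    """Return the values lying more than c standard deviations from the mean.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    m = mean(values)
    s = stdev(values)
    return [x for x in values if abs(x - m) > c * s]

# Hypothetical data with one suspicious value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 50]
print(classical_outliers(data, c=2.0))  # [50]
print(classical_outliers(data, c=3.0))  # []
```

With c = 3 the value 50 is not flagged, because it inflates the standard deviation it is being compared against. The hinge-based fences used in the EDA framework are less sensitive to this effect.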

Some introductory comments
Defining outliers (EDA framework)

All observations outside the inner fences are considered outliers. Observations between the inner and outer fences are often called "out" values; those outside the outer fences are called "far out" values, "extremes" or "extreme outliers", and are drawn with a different marker on a boxplot.

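The step is defined as m×IQR, where IQR is the interquartile range; m is often 1.5, sometimes 2 or 2.5. The inner fences lie one step beyond the hinges and the outer fences two steps beyond them.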

The interquartile range is defined as

IQR = H3 - H1

where H3 and H1 are the third (Q3) and first (Q1) quartiles, defined according to Tukey's hinges.
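As a minimal sketch, the hinges, the IQR and the step could be computed as follows. It assumes the usual definition of Tukey's hinges (the medians of the lower and upper halves of the sorted data, with the overall median included in both halves when the number of observations is odd). The function name tukey_hinges and the sample data are hypothetical.

```python
from statistics import median

def tukey_hinges(values):
    """Return (H1, H3), the lower and upper hinges of the data."""
    xs = sorted(values)
    n = len(xs)
    # Size of each half; the median belongs to both halves when n is odd.
    half = (n + 1) // 2
    return median(xs[:half]), median(xs[n - half:])

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
h1, h3 = tukey_hinges(data)
iqr = h3 - h1
step = 1.5 * iqr  # m = 1.5
print(h1, h3, iqr, step)  # 3 7 4 6.0
```

Quartiles computed by other conventions (for example interpolated percentiles) can differ slightly from the hinges, especially for small samples.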

Outliers
H3 + step ≤ x_i < H3 + 2×step
H1 - 2×step < x_i ≤ H1 - step
Extreme outliers
x_i ≥ H3 + 2×step
x_i ≤ H1 - 2×step
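The classification can be sketched in Python as below, assuming inner fences at H1 - step and H3 + step, outer fences at H1 - 2×step and H3 + 2×step, and hinges computed as the medians of the two halves of the sorted data. The function name classify_outliers, the label "regular" and the sample data are hypothetical.

```python
from statistics import median

def classify_outliers(values, m=1.5):
    """Label each value as 'regular', 'outlier' or 'extreme outlier'."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2  # the median belongs to both halves when n is odd
    h1, h3 = median(xs[:half]), median(xs[n - half:])
    step = m * (h3 - h1)

    labels = []
    for x in values:
        if x >= h3 + 2 * step or x <= h1 - 2 * step:
            label = "extreme outlier"   # on or beyond an outer fence
        elif x >= h3 + step or x <= h1 - step:
            label = "outlier"           # between inner and outer fences
        else:
            label = "regular"
        labels.append((x, label))
    return labels

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 17, 30]
for value, label in classify_outliers(data):
    if label != "regular":
        print(value, label)
# 17 outlier
# 30 extreme outlier
```

Here H1 = 3.5 and H3 = 8.5, so the step is 7.5, the upper inner fence is 16 and the upper outer fence is 23.5.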
Related documents