Boxplots (box and whisker plots)

Below you will find a series of examples showing how to produce boxplots, using the boxplot() function.

boxplot()

- boxplot(urb) Simple boxplot for a single variable
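- boxplot(urb, notch=TRUE) "notched boxplot". The notches mark a rough confidence interval for the median, not the mean: if the notches of two boxes do not overlap, this is strong evidence that the two medians differ. Make sure to use uppercase letters for TRUE!
- boxplot(urb, range = 2) range (default 1.5) defines where to place the inner fences, i.e. which points count as outliers. The whiskers extend to the most extreme data point that lies no more than range times the interquartile range (the box length) beyond the box; points further out are drawn as outliers. If you set range=0, the whiskers extend to the minimum and maximum (no outliers possible).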
- boxplot(urb, horizontal=TRUE) Draw the boxplot horizontally.
- boxplot(world[,20:22]) produces a parallel boxplot for columns 20 to 22 of our data frame (the three economic sectors); alternatively we could also write: boxplot(world[,c("gnpagr","gnpind","gnpserv")]). You could also execute boxplot(world), but the resulting chart would be quite messy, as the variables have very different scales.
- boxplot(infmor ~continent) Boxplot by continent. boxplot() accepts a formula (values ~ groups) as its argument to specify the data. If you wish to transform the variable first, you can write: boxplot(log10(infmor) ~continent)
- boxplot(infmor ~continent, varwidth=TRUE) also produces a boxplot by continent, but the width of each box is proportional to the square root of the number of observations in that continent. More generally, you can supply the width argument, a vector giving the relative width of each box, based on any criterion you have defined beforehand.
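To make the range and notch options concrete, the sketch below recomputes what boxplot.stats() reports for one variable: hinges from fivenum(), fences at coef (the range argument) times the hinge spread, whiskers at the most extreme non-outlying points, and notches at the median plus or minus 1.58 IQR/sqrt(n), as described in the boxplot.stats() documentation. The helper name my_box_stats and the small vector x are hypothetical stand-ins for urb; the result is checked against boxplot.stats() itself.

```r
# Hypothetical stand-in data: a small vector with one extreme value
x <- c(2, 3, 4, 4, 5, 6, 7, 8, 30)

# Hypothetical helper re-deriving boxplot.stats() for illustration
my_box_stats <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  hinges <- fivenum(x)               # min, lower hinge, median, upper hinge, max
  iqr <- hinges[4] - hinges[2]       # hinge spread (box length)
  lower_fence <- hinges[2] - coef * iqr
  upper_fence <- hinges[4] + coef * iqr
  # coef = 0 (range=0) means no outliers: whiskers go to min and max
  is_out <- if (coef > 0) x < lower_fence | x > upper_fence else rep(FALSE, length(x))
  whiskers <- range(x[!is_out])      # most extreme points inside the fences
  notch <- hinges[3] + c(-1.58, 1.58) * iqr / sqrt(length(x))
  list(stats = c(whiskers[1], hinges[2:4], whiskers[2]),
       out = x[is_out],
       conf = notch)
}

s <- my_box_stats(x)
s$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out     # the value 30 lies beyond the upper fence
boxplot.stats(x)$stats   # same five numbers
```

Raising coef (as with range = 2) widens the fences, so fewer points are flagged; coef = 0 reproduces the "whiskers to min and max" behaviour.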

boxplot() computes its statistics with boxplot.stats() and draws them with bxp(). If you can program in R, these functions let you produce almost any kind of boxplot you want. If you are curious, examine the documentation and examples for these functions.
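As a small illustration of that split between computing and drawing, the sketch below asks boxplot() for its statistics only (plot = FALSE), inspects them, and then draws them with bxp(), adding group means on top. The built-in InsectSprays data stands in for world; the fill colour and the mean markers are arbitrary choices.

```r
# InsectSprays (built into R) stands in for the tutorial's world data
b <- boxplot(count ~ spray, data = InsectSprays, plot = FALSE)
b$stats   # 5 x 6 matrix: lower whisker, lower hinge, median, upper hinge, upper whisker
b$n       # number of observations in each group
b$out     # outlying values; b$group gives the box each one belongs to

# Draw the precomputed statistics, then overlay the group means as diamonds
bxp(b, boxfill = "lightgray")
group_means <- tapply(InsectSprays$count, InsectSprays$spray, mean)
points(seq_along(group_means), group_means, pch = 23, bg = "blue")
```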

boxplot() draws outliers but does not label them. Labelling them is easy to program, because boxplot.stats() returns the outlying values in its $out component.
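A minimal sketch of that idea: take the outlying values from boxplot.stats()$out, find which observations they belong to, and write their names next to the box with text(). The built-in mtcars data (hp, with car names as row names) stands in for a world variable with country abbreviations as row names; the label position is an arbitrary choice.

```r
# mtcars stands in for world: row names play the role of country abbreviations
hp <- mtcars$hp
out_vals <- boxplot.stats(hp)$out     # values beyond the fences
is_out <- hp %in% out_vals            # which observations are outliers
outlier_names <- rownames(mtcars)[is_out]
print(outlier_names)

boxplot(hp, ylab = "Gross horsepower")
# A single boxplot is drawn at x = 1; pos = 4 puts each label to the right of its point
text(x = rep(1, sum(is_out)), y = hp[is_out], labels = outlier_names,
     pos = 4, cex = 0.8)
```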

Adding elements to a boxplot

You can add a rug plot (a "barcode" of tick marks, one per observation) to the boxplot:

boxplot(urb,horizontal=TRUE)
rug(urb)

Identify outliers interactively

The identify() function can be used to interactively identify observations on a boxplot using the mouse. The function, which is most useful for scatterplots (see this document), requires coordinates in the x and y direction. A single boxplot is drawn at x = 1, so the example below supplies a vector of x-coordinates that are all equal to 1: rep(1, length(area)) (the value 1 repeated once per observation in the variable).

boxplot(area, ylab="Area of the country")
identify(rep(1, length(area)), area, rownames(world))

To stop the identification, use the stop button of the graphics window or its context menu.

Boxplot() in package car

Boxplot() (uppercase B!) is built on the base boxplot() function but has more options, notably the ability to label outliers.

- Boxplot(gnpind, data=world,labels=rownames(world)) identifies outliers and labels them with the row names of world (the country abbreviations).
- Boxplot(gnpind, data=world,labels=rownames(world),id.method="identify") lets you identify outliers on the graph, using the mouse (click on an outlier to show its label).
- Boxplot(area~continent, data=world,labels=rownames(world)) Boxplot by continent.

Boxplot() returns the labels of the outliers it identifies. By default, however, at most 10 outliers are labelled; in the example below, id.n=Inf (Inf = infinity) has been added to label and return all outliers:

> Boxplot(area, data=world,labels=rownames(world),id.n=Inf)
 [1] "ALGE" "ARAB" "ARG"  "AUS"  "BRES" "CAN"  "CHIN" "USA"  "GROE" "INDE"
[11] "INDO" "IRAN" "LIBY" "MEXI" "MONG" "NIGR" "PERO" "SOUD" "TCHA" "ZAIR"

You can use this result to analyze the outliers further, for instance to show the values of urb for these outlying countries or to display their full rows of the data frame:

outarea=Boxplot(area, labels=rownames(world),id.n=Inf)
world[outarea,"urb"]
world[outarea,]

Boxplots from ggplot

A few examples, assuming that you are familiar with ggplot2 (load it first with library(ggplot2)):

- p <- ggplot(world,aes(x=continent,y=infmor)) Start by creating an object with the data (by continent).
- p + geom_boxplot() Add the boxplot layer.
- p + geom_boxplot(outlier.size=2,outlier.shape=21,width=0.5) Same, but change the outlier size and shape and the box width.
- p + geom_boxplot(notch=TRUE) Same, but with notched boxplots.
- p + geom_boxplot() + stat_summary(fun="mean",geom="point",shape=23,size=3,fill="blue") Add the mean as a diamond to each box (in ggplot2 versions before 3.3.0, write fun.y instead of fun).
- ggplot(world,aes(x=1,y=infmor)) + geom_boxplot() Simple boxplot. geom_boxplot() is designed around an x variable (a factor); a constant x value forces a single boxplot of y.
- p + geom_violin() Violin plot.

Note that geom_boxplot() does not label outliers. A workaround is to create a label variable that holds the label (e.g. the country name) for the outliers and NA for all other observations, then add geom_text(aes(label=labelvar)).
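A minimal sketch of that workaround, with mtcars standing in for world (cyl as the grouping factor, mpg as the variable, car names as labels). The helper is_outlier and the column names are hypothetical. One assumption worth checking against your ggplot2 version: ggplot2 computes its box statistics with quantile() rather than the fivenum() hinges used by boxplot.stats(), so the helper uses quantile() to flag the same points geom_boxplot() draws as outliers. The plotting part only runs if ggplot2 is installed.

```r
# Hypothetical helper: TRUE for values beyond coef * IQR from the quartiles,
# computed with quantile() to mirror geom_boxplot()'s statistics
is_outlier <- function(y, coef = 1.5) {
  q <- unname(quantile(y, c(0.25, 0.75), na.rm = TRUE))
  iqr <- q[2] - q[1]
  y < q[1] - coef * iqr | y > q[2] + coef * iqr
}

d <- mtcars
d$cyl <- factor(d$cyl)
d$name <- rownames(d)
# Flag outliers separately within each group, as the boxes are drawn per group
d$out <- as.logical(ave(d$mpg, d$cyl, FUN = is_outlier))
# Label variable: the name for outliers, NA for every other observation
d$labelvar <- ifelse(d$out, d$name, NA)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x = cyl, y = mpg)) +
    geom_boxplot() +
    # na.rm = TRUE silently drops the rows whose label is NA
    geom_text(aes(label = labelvar), na.rm = TRUE, hjust = -0.1, size = 3)
  print(p)
}
```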

See also

- bwplot() in package lattice
- Boxplot variations
- Creating groups: learn to create groups (bins) from a continuous variable, for instance a categorical variable for urbanization (low, average and high urbanization), which lets you produce a boxplot of infant mortality for the three levels of urbanization.
- plot(continent,urb) also displays a boxplot by continent: plot() is a generic function, and it produces a boxplot here because continent is a factor (categorical variable).
- compmeans() in package descr: means of a numerical vector according to a factor, with boxplots.
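The binning idea from the Creating groups entry can be sketched with cut(). Here mtcars stands in for world (hp plays the role of urb, mpg the role of infmor), and splitting at the tertiles with quantile() is just one possible choice of breaks.

```r
# mtcars stands in for world: hp plays the role of urb, mpg of infmor
breaks <- quantile(mtcars$hp, probs = c(0, 1/3, 2/3, 1))
# include.lowest = TRUE keeps the minimum value in the first group
hp_group <- cut(mtcars$hp, breaks = breaks, include.lowest = TRUE,
                labels = c("Low", "Average", "High"))
table(hp_group)   # number of cars in each group

boxplot(mtcars$mpg ~ hp_group, xlab = "Horsepower group",
        ylab = "Miles per gallon")
```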