Apply a function to several variables or groups

It is frequently useful to produce the same charts or analyses for a series of variables or groups of observations defined by a categorical variable (factor). Some functions have options to do this, but many others do not.

Here we use the stem() function as an example. It produces a stem-and-leaf plot for a single variable and has no option for producing stemplots for several variables at the same time.

Produce a stem-and-leaf plot for several variables

Assume that we need to produce stem-and-leaf plots for all variables (columns) in the data frame world. To do this we can use the R function apply(data,dim,function). Here data is the data matrix, function is the function to apply, and dim tells R along which dimension of the data matrix the function is applied: 1 for rows, 2 for columns. (R's own documentation calls these arguments X, MARGIN and FUN.) As we want to apply the function to each column of the data matrix, dim will be 2.
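To see what dim does, here is a minimal sketch on a small hypothetical matrix m. It uses sum() in place of stem() because the results are easy to check by hand:

```r
# A small 2 x 3 matrix; R fills matrices column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)  # dim = 1: one result per row    (9 12)
col_sums <- apply(m, 2, sum)  # dim = 2: one result per column (3 7 11)
print(row_sums)
print(col_sums)
```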

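There is, however, a small problem. We cannot write apply(world,2,stem), because the data frame contains a non-numerical variable (continent). apply() first converts the data frame to a matrix, and a matrix can hold only one type of value. Every column therefore becomes a character string, and stem() stops with an error. We therefore need to include only the numerical variables. Here these are columns 2 to 23, so apply(world[,2:23],2,stem) will do the job.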
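The world data frame is not available here, so the following sketch uses R's built-in iris data set as an assumed stand-in. Columns 1 to 4 of iris are numeric, and column 5 (Species) is a factor that plays the role of continent. Instead of typing column numbers, the sketch selects the numeric columns with sapply(..., is.numeric). The name num is purely illustrative:

```r
# iris (built into R) stands in for world: Species plays the role of continent.

# apply() turns the whole data frame into a character matrix because of the
# factor column, so stem() fails on the first column; capture the error message.
res <- tryCatch(apply(iris, 2, stem), error = function(e) conditionMessage(e))
print(res)

# Logical vector marking which columns are numeric.
num <- sapply(iris, is.numeric)
print(names(iris)[num])

# One stem-and-leaf plot per numeric column. stem() is called only for its
# printed output; invisible() stops R from also printing apply()'s return value.
invisible(apply(iris[, num], 2, stem))
```

Selecting columns by type rather than by position keeps the call correct if columns are later added or reordered.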

Produce a stem-and-leaf plot for all groups

Assume that we would like to produce a stem-and-leaf plot of urb for each continent.

by(urb,continent,stem)
will do exactly this: the stem function is applied to each group defined by the continent variable. (This assumes that urb and continent can be referred to by name, e.g. after attach(world).)
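The following is a self-contained sketch of per-group stemplots. It again uses iris as an assumed stand-in for world, with Species in place of continent and Sepal.Length in place of urb. In this variant the whole data frame is passed to by(). Each group then reaches the function as a data frame, so the column is picked out explicitly, and the result does not depend on how by() wraps a plain vector. tapply(), also part of base R, is shown as an alternative: it splits a vector by a factor and passes each piece on as a plain vector.

```r
# Each group (one Species) arrives as a data frame d; take the column needed.
invisible(by(iris, iris$Species, function(d) stem(d$Sepal.Length)))

# Alternative: tapply() hands stem() a numeric vector for each Species.
invisible(tapply(iris$Sepal.Length, iris$Species, stem))
```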

Graphical functions

If you use a graphical function such as hist with apply or by, you should first prepare a graphics window that can hold several charts. Otherwise each new chart replaces the previous one, and you will see only the last graph. For example, par(mfrow=c(3,3)) divides the graphics window into 3 rows and 3 columns, so it can hold 9 charts.
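The following is a minimal sketch of this pattern. It uses a 2 by 2 layout for the four numeric columns of the built-in iris data, an assumed stand-in for world. par() returns the settings it changed, so saving them in a variable (here the illustrative name old) lets you restore the single-chart layout afterwards:

```r
# Split the plotting area into 2 rows x 2 columns; keep the old settings.
old <- par(mfrow = c(2, 2))

# One histogram per numeric column. apply() passes unnamed vectors, so the
# chart titles and axis labels do not show the variable names.
invisible(apply(iris[, 1:4], 2, hist))

# Restore the previous layout (one chart per window).
par(old)
```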

(More advanced) A practical example

Assume that we would like to compute the difference between the median and the mean, as a simple measure of symmetry, for all variables in a data matrix. apply(x,2,function) expects a function as its third argument, and R has no ready-made function for this difference, so we need to define our own first: function(x) median(x)-mean(x). Then we can either assign it to a name:

medmean <- function(x) median(x) - mean(x)
apply(world[,3:23],2,medmean)

or add it directly as an argument

apply(world[,3:23],2,function(x) median(x) - mean(x))

This solution is very general and works with functions of any complexity. To compute only the difference between the median and the mean, the following command will also do the job:

apply(world[,3:23],2,median)-apply(world[,3:23],2,mean)
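As a check that the three variants above agree, the following sketch runs all of them on the numeric columns of the built-in iris data, an assumed stand-in for world[,3:23]. The names num_vars, d1, d2 and d3 are illustrative only. Values close to zero indicate that the median and the mean nearly coincide.

```r
# Numeric columns of iris stand in for the numeric columns of world.
num_vars <- iris[, 1:4]

# 1. Named function passed to apply().
medmean <- function(x) median(x) - mean(x)
d1 <- apply(num_vars, 2, medmean)

# 2. Anonymous function written directly in the call.
d2 <- apply(num_vars, 2, function(x) median(x) - mean(x))

# 3. Two apply() calls combined by element-wise subtraction.
d3 <- apply(num_vars, 2, median) - apply(num_vars, 2, mean)

print(round(d1, 3))
```

The third form is shortest for this particular statistic. The first two carry over unchanged to any function that returns a single number per column.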