Transformations/Reexpressions

New variables are computed using simple expressions of the kind:

newvar <- var1 + var2 | Add two variables. |

newvar1 <- log(var4) | Take the (natural) log of var4. |

newvar2 <- ifelse(var > 20, log(var4), var4) | Where var is > 20, take the log of var4; otherwise copy var4 unchanged. |
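The three expressions above can be tried on a small data frame. The following is only a minimal sketch: the data frame d and its values are invented for illustration, and the column names simply mirror the placeholders var, var1, var2 and var4 used above.

```r
# Toy data; names mirror the placeholders above, values are invented.
d <- data.frame(var  = c(10, 25, 30),
                var1 = c(1, 2, 3),
                var2 = c(4, 5, 6),
                var4 = c(1, 10, 100))

d$newvar  <- d$var1 + d$var2                          # element-wise sum
d$newvar1 <- log(d$var4)                              # natural logarithm
d$newvar2 <- ifelse(d$var > 20, log(d$var4), d$var4)  # log only where var > 20

print(d)
```

ifelse() works element by element, so the first row (var = 10) keeps its original var4 value while the other two rows are logged.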

Because R is a programming environment for statistics that is built on functions, any transformation that can be written as an expression can be used either to create a new variable or to pass a transformed variable directly to a function:

plot(log(x), sqrt(y))

boxplot(log(x))

Transform groups

It is often useful to apply a different transformation to each group (defined by a factor) of a variable; the function ddply (package plyr) simplifies this greatly. For instance, to subtract from the urbanization of each country the median of its continent:

w1 <- ddply(world, "continent", transform, urb1 = urb - median(urb)) | Copies data frame world to w1 and adds urb1: the urbanization of each country minus the median of its continent. |

w1 <- ddply(w1, "continent", transform, urb2 = urb1 / IQR(urb)) | Creates urb2 by dividing each value of urb1 by the interquartile range of the continent the country belongs to. |

ddply requires a data frame as its first argument, i.e. it cannot be used to transform a single variable (a single vector). After the two command lines, data frame w1 contains all variables from world plus two new columns. urb1 is the centered version of urb: from each country's value the median of its continent has been subtracted. urb2 is urb1 divided by the continent's interquartile range, i.e. urb2 is the standardized version of the original urb: centered around the median and divided by the IQR of the continent.
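When only a single vector needs a group-wise transformation, base R's ave() is one possible alternative. The sketch below uses ave(), not ddply, and the small data frame is a hypothetical stand-in for world with invented values; it is meant only to show the centering and scaling described above.

```r
# Hypothetical stand-in for 'world'; countries and values are invented.
world <- data.frame(
  country   = c("A", "B", "C", "D", "E", "F"),
  continent = c("X", "X", "X", "Y", "Y", "Y"),
  urb       = c(20, 40, 60, 50, 70, 90)
)

# ave() applies FUN within each group and returns a vector
# aligned with the original rows (FUN must be passed by name).
grp_median <- ave(world$urb, world$continent, FUN = median)
grp_iqr    <- ave(world$urb, world$continent, FUN = IQR)

world$urb1 <- world$urb - grp_median  # centered on the continent median
world$urb2 <- world$urb1 / grp_iqr    # scaled by the continent IQR

print(world)
```

Unlike ddply, ave() keeps the rows in their original order, which can be convenient when the result is written back into the same data frame.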

See also:

- Re-expressions (Tukey's ladder of powers)
- Creating groups (examples for recoding etc)
- Create dummy variables