This document discusses the most important family of transformations used in statistics to promote symmetry and linearity: Tukey's ladder of powers (a particular form of the Box-Cox transformations).
Direction | Power | Algebraic expression | R expression |
---|---|---|---|
↑ | 3 | x³ | x^3 |
↑ | 2 | x² | x^2 |
… | 1 | x | x |
↓ | ½ | √x | x^0.5 or sqrt(x) |
↓ | 0 | log(x) or ln(x) | log10(x) or log(x) |
↓ | -½ | -1/√x | -1/sqrt(x) |
↓ | -1 | -1/x (reciprocal) | -1/x |
↓ | -2 | -1/x² | -1/x^2 |
Lower, higher, and intermediate powers are also consistent with the ladder of powers and share the properties of these transformations.
Because R is a programming language, these transformations can be used anywhere an expression is allowed, for example as arguments to functions or to create derived variables.
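As a minimal sketch of what this looks like in practice, the base R snippet below applies a few rungs of the ladder directly. The helper ladder_power() and the vector infmor_demo are hypothetical names with made-up values, introduced only for illustration. Negative powers are negated, as in the table, so that the order of the values is preserved.

```r
# Hypothetical helper: re-express x using power p from the ladder of powers.
# Power 0 stands for the logarithm; negative powers are negated to keep order.
ladder_power <- function(x, p) {
  if (p == 0) return(log(x))
  if (p < 0) -(x^p) else x^p
}

# Made-up positive values standing in for a variable such as infmor.
infmor_demo <- c(4, 9, 16, 25, 100)

# Create derived variables.
sqrt_infmor  <- ladder_power(infmor_demo, 0.5)
recip_infmor <- ladder_power(infmor_demo, -1)

# Use a transformation directly as an argument to a function.
median_log10 <- median(log10(infmor_demo))
```

Writing the transformation inline, as in the last line, is usually enough for a one-off summary; a derived variable is more convenient when the re-expressed values are reused in several plots or models.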
The car package contains many functions for finding and graphically representing transformations.
symbox(infmor, powers=c(3,2,1,0.5,0,-0.5,-1,-2))
shows boxplots for the common
powers of the ladder of powers; by default (omitting the powers argument), only
the powers -1, -0.5, 0, 0.5, and 1 are shown.
Instead of the Box-Cox family of power transformations, you can get Yeo-Johnson power transformations:
symbox(infmor, powers=c(3,2,1,0.5,0,-0.5,-1,-2), trans=yjPower)
For variables with zero or negative values, you might need to add a constant using the start= option.
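To make these two options concrete, here is a hedged base R sketch of both ideas on made-up data. The function yeo_johnson() is a hypothetical helper written from the standard Yeo-Johnson definition; it is not the car implementation (yjPower), and the choice of shift below is just one possible assumption about how a start constant might be picked.

```r
# Made-up data containing zero and negative values.
x <- c(-2, 0, 3, 10)

# Option 1: add a constant (the idea behind start=) so that every value is
# positive before taking logs. This shift makes the minimum equal to 1.
start <- 1 - min(x)
log_shifted <- log(x + start)

# Option 2: Yeo-Johnson transformation, defined for any real value.
yeo_johnson <- function(y, lambda) {
  out <- numeric(length(y))
  pos <- y >= 0
  # Non-negative values: Box-Cox applied to y + 1.
  if (lambda != 0) {
    out[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  } else {
    out[pos] <- log(y[pos] + 1)
  }
  # Negative values: mirrored Box-Cox with power 2 - lambda.
  if (lambda != 2) {
    out[!pos] <- -((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda)
  } else {
    out[!pos] <- -log(1 - y[!pos])
  }
  out
}

yj_half <- yeo_johnson(x, 0.5)
```

With lambda = 1 the Yeo-Johnson transformation leaves the data unchanged, which is a quick sanity check on the helper.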
The car package also has many tools oriented toward power transformations, several of them non-graphical. For instance, powerTransform uses maximum likelihood to estimate the transformation parameters that bring the data closest to multivariate normality; it also works for single variables.
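The idea of estimating a power by maximum likelihood can be sketched in base R for a single variable. The code below is an illustration of the principle under a normal model for the transformed data, not the car implementation; box_cox() and profile_loglik() are hypothetical helper names, and the data are simulated.

```r
# Box-Cox transform of a strictly positive vector for a given power lambda.
box_cox <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Profile log-likelihood of lambda (up to a constant), assuming the
# transformed values are normally distributed. The last term is the
# log-Jacobian of the transformation.
profile_loglik <- function(lambda, x) {
  y <- box_cox(x, lambda)
  n <- length(x)
  sigma2 <- mean((y - mean(y))^2)
  -n / 2 * log(sigma2) + (lambda - 1) * sum(log(x))
}

# Simulated log-normal data, for which the log (power 0) is the right choice.
set.seed(1)
x <- exp(rnorm(500))

# Maximize the profile log-likelihood over a plausible range of powers.
fit <- optimize(profile_loglik, interval = c(-3, 3), x = x, maximum = TRUE)
lambda_hat <- fit$maximum
```

For these simulated data the estimate should land close to 0, pointing to the log rung of the ladder. In practice the estimate is usually rounded to the nearest convenient power.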
The HH package offers, among many other functions, a ladder() function that displays a scatterplot matrix showing all transformations using the common powers of the ladder of powers.
require(HH)
ladder(urb~infmor, data=world)
All values must be positive and non-missing. For an unknown reason, an error message is issued if the data= argument is not present.
Using one of the ancillary functions of the HH package you can also construct a chart that contains boxplots of a variable and its re-expressions.
require(HH)
par(mfrow=c(1,6))
apply(ladder.f(urb),2,boxplot)
These are in fact six separate boxplots showing, from left to right, powers of -1, -0.5, 0, 0.5, 1, and 2. Using apply() this way does not allow for adding legends or a global title. Some more programming effort is needed to build a nicely documented sequence of boxplots or, for that matter, any other chart or summary statistic.
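One possible way to spend that extra effort, using base R only, is sketched below. It rebuilds the re-expressions by hand rather than calling ladder.f, so reexpress() and urb_demo are hypothetical names with made-up values. Negating the negative powers follows the table at the top of this document and is an assumption here, not a statement about what HH does.

```r
# Powers to display, from left to right.
powers <- c(-1, -0.5, 0, 0.5, 1, 2)

# Made-up positive values standing in for a variable such as urb.
urb_demo <- c(12, 18, 25, 33, 47, 58, 71, 86, 94)

# Hypothetical helper: one rung of the ladder of powers.
reexpress <- function(x, p) if (p == 0) log(x) else if (p < 0) -(x^p) else x^p

# One re-expressed vector per power, named for use as panel titles.
panels <- lapply(powers, function(p) reexpress(urb_demo, p))
names(panels) <- paste("power", powers)

# One panel per power, small margins, and room at the top for a global title.
old_par <- par(mfrow = c(1, length(powers)), mar = c(2, 2, 3, 1),
               oma = c(0, 0, 2, 0))
for (name in names(panels)) {
  boxplot(panels[[name]], main = name)
}
mtext("Re-expressions of urb_demo", outer = TRUE, cex = 1.2)
par(old_par)  # restore the previous graphical settings
```

Looping over a named list instead of calling apply() keeps each panel's label available when it is drawn, and the outer margin (oma) leaves space for the global title added with mtext().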
The LearnEDA package has several functions for working with re-expressions, including two functions that simplify the re-expression of variables (taking care of things like negative numbers and rescaling).
It also has a spread.level.plot function.
The LearnEDA package also has several functions to transform a variable or a relationship interactively, using a slider that controls the power, namely:
Function | Description |
---|---|
slider.straighten(infmor, urb) | Straighten a relationship (shows a scatterplot with a line, the transformation power, as well as a residual plot) |
slider.compare(infmor,continent) | Groupwise boxplot |
slider.power | Histogram with a single variable |
slider.match | Same, but uses matched re-expressions |