Linear regression

Because regression is of central importance in statistics, R offers a large number of conventional and modern regression methods. For a general introduction to these techniques, see the Tukey Line page and the general document on the analysis of Residuals.

lsfit()

lsfit(x,y): classical least squares regression. The function has been around since the beginning of S and is still sufficient for simple regressions; more recent procedures are more powerful because they offer many more options.

y is the dependent variable (a single vector) and x is a matrix containing the independent variables.
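A minimal sketch of lsfit() on simulated data. The data and the column names x1 and x2 are hypothetical, invented only for illustration. lsfit() adds an intercept column by default (intercept = TRUE), ls.print() prints a coefficient table, and the same fit via lm() is shown for comparison.

```r
# Simulated (hypothetical) data: the response y depends on two predictors
set.seed(1)
n <- 50
x <- cbind(x1 = rnorm(n), x2 = rnorm(n))                       # independent variables
y <- 2 + 3 * x[, "x1"] - 1.5 * x[, "x2"] + rnorm(n, sd = 0.5)  # dependent variable

# Classical least squares fit; an intercept is added by default
fit <- lsfit(x, y)
fit$coefficients   # named "Intercept", "x1", "x2"

# Coefficient table with standard errors and t values
ls.print(fit)

# The same regression with lm(), for comparison
coef(lm(y ~ x))
```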

lm()

lm() ("linear model") is more general and more modern than lsfit(). Its formula interface lets you write both simple and complex regression equations.

lm(infmor ~ urb, data=world) | Simple bivariate regression |
lm(infmor ~ urb + gnpserv, data=world) | Two independents |
lm(infmor ~ urb * gnpserv, data=world) | Include an interaction term (same as urb + gnpserv + urb:gnpserv) |
lm(infmor ~ urb + gnpserv + continent, data=world) | Include a factor variable, see note below |

The formula interface lets you write model equations naturally, including interaction terms, as in the third example. See help(lm) for more details.
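Since the world data frame is not available here, this sketch uses R's built-in mtcars data as a stand-in (mpg as the dependent variable, wt and hp as independents) to mirror the formula variants in the table. In R's formula syntax, wt * hp expands to wt + hp + wt:hp, where wt:hp denotes the interaction term alone.

```r
# mtcars ships with R; it stands in for the 'world' data frame
m1  <- lm(mpg ~ wt, data = mtcars)               # simple bivariate regression
m2  <- lm(mpg ~ wt + hp, data = mtcars)          # two independent variables
m3  <- lm(mpg ~ wt * hp, data = mtcars)          # main effects plus interaction
m3b <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)  # the same model, written out

coef(m3)      # "(Intercept)", "wt", "hp", "wt:hp"
summary(m3)   # standard errors, t values, R-squared
```

Writing a term twice (for example wt + hp + wt * hp) is harmless: R drops duplicate terms, so it fits the same model as wt * hp.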

Regression with dummy variables

There is no need to create dummy variables to include a categorical variable (factor) in a regression; R generates them automatically. In the formula infmor ~ urb + gnpserv + continent, continent is a factor variable. levels(world$continent) shows that there are 6 continents; if you use the factor in a regression, R automatically creates a dummy for each continent except the first, so in this case Asia is the reference category. In the output you will see coefficient names such as continentEurope.

Have a look at "Create dummy variables" if you need to create your own dummies.
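A sketch of the automatic dummy coding, using the built-in iris data as a stand-in for world (Species plays the role of continent). With R's default treatment contrasts the first factor level is the reference category; model.matrix() shows the dummy columns R builds, and relevel() is one way to pick a different reference.

```r
# iris ships with R; Species is a factor with three levels
levels(iris$Species)   # "setosa" "versicolor" "virginica"

fit <- lm(Sepal.Length ~ Petal.Width + Species, data = iris)
coef(fit)   # one dummy per level except the first (setosa is the reference)

# The dummy columns R built behind the scenes
head(model.matrix(fit))

# Choosing a different reference category with relevel()
iris2 <- transform(iris, Species = relevel(Species, ref = "virginica"))
fit2  <- lm(Sepal.Length ~ Petal.Width + Species, data = iris2)
coef(fit2)  # now Speciessetosa and Speciesversicolor
```

Changing the reference category only reparameterizes the model: the coefficients change, but the fitted values stay the same.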

Standardized regression coefficients

lm() has no option to produce standardized (beta) coefficients; you have to standardize the variables yourself. For example, you might write:

lm(scale(infmor) ~ scale(urb) + scale(gnpserv), data=world) | Standardize all variables using scale() |
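A sketch using the built-in mtcars data as a stand-in for world. Standardizing the dependent variable as well as the independents yields beta coefficients; each one equals the raw slope multiplied by sd(x)/sd(y), which the last lines compute for comparison.

```r
# Raw (unstandardized) regression
raw <- lm(mpg ~ wt + hp, data = mtcars)

# Beta coefficients: standardize the dependent and the independent variables
beta <- lm(scale(mpg) ~ scale(wt) + scale(hp), data = mtcars)
coef(beta)   # the intercept is zero up to rounding error

# Equivalent computation from the raw slopes: b * sd(x) / sd(y)
sds <- sapply(mtcars[, c("wt", "hp")], sd)
coef(raw)[-1] * sds / sd(mtcars$mpg)
```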

Regression diagnostics, residual analysis

- Regression and residuals: simple residual analysis
- Regression diagnostics, residual analysis: lm() methods for further inspection and diagnosis of the regression
- Analysis of Residuals and regression diagnostics
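As a quick orientation before the documents listed above, here is a sketch of standard extractor functions for fitted lm objects, again using the built-in mtcars data as a stand-in for world. Which diagnostics matter depends on the analysis; these are common starting points.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # mtcars stands in for world

res  <- residuals(fit)        # raw residuals
fits <- fitted(fit)           # fitted values
rstd <- rstandard(fit)        # standardized residuals
lev  <- hatvalues(fit)        # leverage of each observation
cook <- cooks.distance(fit)   # influence of each observation

# The three most influential cars by Cook's distance
head(sort(cook, decreasing = TRUE), 3)

# Default diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```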

Outlook

R has become a standard platform for publishing new statistical algorithms. In recent years robustness and robust regression have been an important research theme, and the number of new, alternative regression methods has grown accordingly; examples include the packages lasso, leaps, lqs, locfit, lpridge, modreg, polymars and quantreg, to name a few. The MNM package (Nordhausen/Oja, Multivariate L1 Methods) also offers alternative methods.
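To illustrate what robust regression buys you, here is a sketch using the lqs() function from the MASS package (a recommended package distributed with standard R installations) with least trimmed squares. The data are simulated and hypothetical: a straight line with five gross outliers placed at the largest x values, where they pull ordinary least squares away from the true slope of 2.

```r
library(MASS)   # provides lqs() for resistant regression

# Simulated (hypothetical) data: a straight line plus five gross outliers
set.seed(42)
x <- seq(0, 10, length.out = 40)
y <- 1 + 2 * x + rnorm(40, sd = 0.5)
y[36:40] <- y[36:40] - 15          # corrupt the five largest-x points

ols <- lm(y ~ x)                   # least squares is pulled toward the outliers
lts <- lqs(y ~ x, method = "lts")  # least trimmed squares largely ignores them

rbind(OLS = coef(ols), LTS = coef(lts))
```

The trimmed fit keeps only the best-fitting half of the observations, which is why it tolerates a sizable share of contaminated points at the cost of some efficiency on clean data.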

See also

- Resistant Line (Tukey line)
- Residuals