Data frames

R has many different types of data structures, and further structures can be defined for specific uses. For ordinary statistical analysis the data frame (an object of class "data.frame", to use R-speak) is the most common one. It is closest to the idea of a rectangular data matrix in other statistical packages, i.e. variables are found in the columns and observations/cases in the rows. It is used to store data tables; for R it is a list of vectors of equal length, and these vectors may be of different types.
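
As a minimal sketch of the "list of vectors of equal length" idea, the following builds a small stand-in for the example data frame. Only the names world, urb, area and continent come from the text; the country codes and all values are invented placeholders, not the contents of the real data set.

```r
# Hypothetical miniature of the example data frame; all values are invented.
world <- data.frame(
  urb       = c(74, 43, 35),                         # numeric variable
  area      = c(41, 1001, 3287),                     # numeric variable
  continent = factor(c("Europe", "Africa", "Asia")), # factor
  row.names = c("CH", "EG", "IN")                    # row names identify the countries
)

class(world)           # "data.frame"
is.list(world)         # TRUE: a data frame is a list of columns
sapply(world, length)  # every column has the same length
dim(world)             # number of rows, number of columns
```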

Rows and columns have names that can be accessed using the following commands:

rownames(world) or row.names(world) (row names; in the example, the country identifiers)
names(world) or colnames(world) (variable names)
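
The following sketch shows that the paired commands return the same result. It uses a two-row stand-in for world with invented values, since the real data set is not reproduced here.

```r
# Hypothetical two-row stand-in for the example data frame (invented values).
world <- data.frame(urb = c(74, 43), area = c(41, 1001),
                    row.names = c("CH", "EG"))

rownames(world)    # "CH" "EG"
row.names(world)   # same result
names(world)       # "urb" "area"
colnames(world)    # same result
```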

The example data frame contains two types of variables (vectors):

Numerical variables
Factors (often string variables). continent is a factor. R has two types of factors: unordered factors (nominal) and ordered factors (ordinal). Factors have levels, in this case the names of the continents; levels(world$continent) displays them. Arithmetic is not meaningful for this variable type.
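
A short sketch of both factor types, using invented values. The vector size and its labels are hypothetical and serve only to illustrate an ordered factor; they are not part of the example data.

```r
# Unordered (nominal) factor; the values are invented for illustration.
continent <- factor(c("Europe", "Africa", "Asia", "Africa"))
levels(continent)    # the distinct values, sorted alphabetically by default
table(continent)     # number of observations per level

# Ordered factor: the levels carry an order, so comparisons are allowed.
# 'size' is a hypothetical variable, not part of the example data.
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"), ordered = TRUE)
is.ordered(size)     # TRUE
size[1] < size[2]    # TRUE: "small" comes before "large"

# Arithmetic on a factor only yields NA (R also issues a warning).
suppressWarnings(continent + 1)
```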

Referring to the data in the data frame (some illustrative examples). An introduction to using the contents of a data frame can be found in this document.

Some examples
world	Full contents of the data frame
world[1,2]	Displays the value in the 1st row and the 2nd column
world[,"urb"]	Displays the column labelled "urb"
world$urb	(same)
world$area1=log(world$area)	Adds a new column "area1" to the data frame (the natural log of column "area")
world["CH","urb"]	Displays the value found in the row named "CH" (*) and the column "urb"
world[world$continent=="Africa","urb"]	Displays contents of urb only for Africa (*)
attach(world)	Makes the names of the columns (variables) directly accessible: instead of writing world$urb or world[,"urb"], simply use urb
detach(world)	Names are no longer directly accessible.
levels(world$continent)=c("Asie", "Afrique", "Europe", "Am.N&C", "AmSud", "AusOcéa")	Replaces the English continent names by their French versions (the new names must be given in the order of the existing levels)
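
The examples in the table can be tried out on a small stand-in, sketched below. The three countries and all values are invented; only the column names follow the text.

```r
# Hypothetical stand-in for the example data frame (invented values).
world <- data.frame(
  urb       = c(74, 43, 22),
  area      = c(41, 1001, 1104),
  continent = factor(c("Europe", "Africa", "Africa")),
  row.names = c("CH", "EG", "ET")
)

world[1, 2]                                 # 1st row, 2nd column (area of "CH")
world[, "urb"]                              # the whole column "urb"
world$urb                                   # same
world["CH", "urb"]                          # row "CH", column "urb"
world[world$continent == "Africa", "urb"]   # urb for African countries only

world$area1 <- log(world$area)              # new column: natural log of area

# Renaming levels: the new names must follow the order of levels(world$continent),
# which here is "Africa", "Europe".
levels(world$continent) <- c("Afrique", "Europe")
```

Checking levels(world$continent) before assigning new names avoids attaching a label to the wrong category.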

(*) Make sure that you include trailing blanks in the quoted string if they are present. When you import data into R, e.g. from an SPSS file, the names contain trailing blanks if the string is shorter than the length declared in SPSS; if unsure, check the names using rownames(world). In the example data file the strings have been trimmed.
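
A minimal sketch of the trailing-blank problem and one possible remedy, using invented padded strings. It relies on the base R function trimws(), which removes leading and trailing whitespace; the objects padded, df and the hit_* variables are hypothetical names used only for this illustration.

```r
# Invented example of strings padded with trailing blanks, as after an import.
padded <- c("Africa  ", "Europe  ", "Africa  ")

hit_before <- padded == "Africa"   # all FALSE: the blanks are part of the value
hit_exact  <- padded == "Africa  " # TRUE FALSE TRUE when the blanks are included

trimmed   <- trimws(padded)        # strip leading and trailing whitespace
hit_after <- trimmed == "Africa"   # TRUE FALSE TRUE

# The same remedy applied to padded row names of a hypothetical data frame.
df <- data.frame(urb = c(74, 43), row.names = c("CH  ", "EG  "))
rownames(df) <- trimws(rownames(df))
rownames(df)                       # "CH" "EG"
```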