Tools for categorical variables
Data
The examples below use a different data set, minarets. Download it, or load it directly from
within R:
load(url("http://www.unige.ch/ses/sococ/cl/edat/minarets.Rdata"))
- names(minarets) Shows the names of the variables in the data set
- str(minarets) Shows the structure of the data set. Note that there are several factor variables (categorical variables), one of which (PolInter) is ordered.
- attach(minarets) Makes the variable names directly accessible
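A minimal sketch of these inspection steps, using a small invented data frame (called toy here; it is not the minarets data) so that it runs without a download. It also shows with() as one possible alternative to attach(), which avoids placing the variables on the search path.

```r
# Hypothetical toy data standing in for minarets (all values are invented)
toy <- data.frame(
  PolInter = factor(c("--", "-", "+", "++", "+", "-", "+", "++", "-", "+"),
                    levels = c("--", "-", "+", "++"), ordered = TRUE),
  gender   = factor(c("f", "m", "f", "m", "m", "f", "f", "m", "f", "m"))
)

names(toy)                           # variable names
str(toy)                             # PolInter is reported as an ordered factor
is_ord <- sapply(toy, is.ordered)    # which variables are ordered factors?
lev    <- levels(toy$PolInter)       # category labels in their defined order

# with() evaluates an expression inside the data frame, no attach() needed
freq <- with(toy, table(PolInter))
freq
```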
Frequency tables and graphics depicting frequency tables
- table(PolInter) Frequency table showing counts
- xtabs(~gender) Same, but uses formula interface
- freq1<-table(PolInter) Create a table for further use
- prop.table(freq1) Table with proportions
- barplot(freq1,main="Count of Political Interest") Barchart of the table
- mosaicplot(freq1) Mosaic plot of the table
- mosaic(freq1) Mosaic plot of the table (requires package {vcd})
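The one-way commands above can be tried on a small invented factor, as in the following sketch. The data are made up, and pdf(NULL) is used here only so that the plot call also runs in a non-interactive session; in an interactive session it can be left out.

```r
# Invented ordered factor standing in for PolInter
PolInter <- factor(c("--", "-", "+", "++", "+", "-", "+", "++", "-", "+"),
                   levels = c("--", "-", "+", "++"), ordered = TRUE)

freq1 <- table(PolInter)          # counts per category
props <- prop.table(freq1)        # proportions (sum to 1)
pct   <- round(100 * props, 1)    # percentages with one decimal

pdf(NULL)                         # null graphics device: nothing is written to disk
bar_mid <- barplot(freq1, main = "Count of Political Interest", ylab = "Count")
dev.off()

freq1
pct
```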
Crosstabulations
Several functions produce simple crosstabulations (contingency tables): table, ftable, and xtabs.
Producing tables containing only frequencies (counts):
- table(PolInter,gender) Bivariate table
- ftable(language~PolInter) Same using formula interface
- xtabs(~gender+language) Same using formula interface
- xtabs(~gender+language+vote) Three variables
- ftable(xtabs(~gender+language+vote)) Same, but alternative presentation
- tab1<- table(PolInter,gender) Store the table as an object for further use
- addmargins(tab1) Add margins (row and column sums) to the table
- margin.table(tab1,1) Display row margins
- margin.table(tab1,2) Display column margins
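A short sketch of the count tables and their margins, run on invented data (the data frame toy is hypothetical). Passing data= to xtabs(), or wrapping table() in with(), is one way to work without attach().

```r
# Hypothetical toy data (values invented for illustration)
toy <- data.frame(
  PolInter = factor(c("--", "-", "+", "++", "+", "-", "+", "++", "-", "+"),
                    levels = c("--", "-", "+", "++"), ordered = TRUE),
  gender   = factor(c("f", "m", "f", "m", "m", "f", "f", "m", "f", "m"))
)

tab1  <- with(toy, table(PolInter, gender))      # bivariate table of counts
tab_x <- xtabs(~ PolInter + gender, data = toy)  # same counts, formula interface
ftable(tab1)                                     # flat presentation of the table

with_sums <- addmargins(tab1)        # adds a "Sum" row and a "Sum" column
row_tot   <- margin.table(tab1, 1)   # row totals (PolInter)
col_tot   <- margin.table(tab1, 2)   # column totals (gender)
with_sums
```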
Proportions computed from a table of counts:
- tab1<- table(PolInter,gender) Store the table of counts as an object for further use
- prop.table(tab1) Total proportions
- prop.table(tab1,2) Column proportions (each column sums to 1)
- prop.table(tab1,1) Row proportions (each row sums to 1)
- ftable(prop.table(tab1)) Alternative presentation
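The difference between the three kinds of proportions is easiest to see by checking which sums equal 1. The following sketch does this on an invented table; the counts are made up for illustration.

```r
# Invented 4x2 table of counts standing in for table(PolInter, gender)
tab1 <- as.table(matrix(c(1, 0,
                          2, 1,
                          2, 2,
                          0, 2),
                        ncol = 2, byrow = TRUE,
                        dimnames = list(PolInter = c("--", "-", "+", "++"),
                                        gender   = c("f", "m"))))

total_p <- prop.table(tab1)       # each cell divided by the grand total
col_p   <- prop.table(tab1, 2)    # each cell divided by its column total
row_p   <- prop.table(tab1, 1)    # each cell divided by its row total

sum(total_p)                      # the whole table sums to 1
colSums(col_p)                    # every column sums to 1
rowSums(row_p)                    # every row sums to 1
round(100 * col_p, 1)             # column percentages
```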
CrossTable (package {gmodels})
CrossTable produces crosstabs similar to the ones produced in the past by SPSS or SAS.
By default the table cells show counts, chi-square contributions, and row, column and total
proportions (the default, SAS format) or percentages (SPSS format).
- library(gmodels)
- CrossTable(gender,language) crosstabulations with total, row and column proportions
- CrossTable(gender,language, format="SPSS") crosstabulations with total, row and column percentages
- CrossTable(PolInter,language,digits=1,prop.r=F,prop.t=F,prop.chisq=F,format="SPSS")
Only column percentages with a single decimal digit.
- CrossTable(vote1,language,missing.include=TRUE) Missing values are included in the table.
- CrossTable(PolInter,language,chisq=TRUE) Add chi-square test
- CrossTable(PolInter,language,fisher=TRUE, mcnemar=TRUE) Add Fisher and McNemar tests
The following values can be displayed in the cells (shown with their default values):
prop.r=TRUE | row proportions/percentages |
prop.c=TRUE | column proportions |
prop.t=TRUE | total proportions |
prop.chisq=TRUE | contribution to chi-square |
expected=FALSE | expected value |
resid=FALSE | residual |
sresid=FALSE | standardized residual |
asresid=FALSE | adjusted standardized residual |
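To see what the optional cell statistics mean, they can also be computed in base R from the components returned by chisq.test(). The sketch below uses an invented 2x2 table. The mapping of these quantities to the CrossTable options (resid, sresid, asresid, prop.chisq) is my reading of the option names, so treat it as an assumption and check ?CrossTable.

```r
# Invented 2x2 table of counts (hypothetical data)
tab <- as.table(matrix(c(20, 10,
                         15, 25),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(gender = c("f", "m"),
                                       vote   = c("no", "yes"))))

test <- chisq.test(tab, correct = FALSE)   # no continuity correction

expected  <- test$expected       # expected counts under independence
resid_raw <- tab - expected      # raw residual: observed - expected
pearson   <- test$residuals      # (observed - expected) / sqrt(expected)
adjusted  <- test$stdres         # residuals adjusted for row and column totals
contrib   <- pearson^2           # each cell's contribution to chi-square

expected
sum(contrib)                     # equals the chi-square statistic
```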
Summary statistics: tests and association coefficients
- summary(tab1) Chi-square test
- assocstats(tab1) Association coefficients; package {vcd}
- chisq.test(tab1) Chi-square test
- fisher.test(tab1, alternative="greater") Fisher's exact test (the alternative argument is only used for 2x2 tables)
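The base R tests above return objects whose components can be extracted for further use. A minimal sketch on an invented table of counts (the numbers are made up):

```r
# Invented 4x2 table standing in for table(PolInter, gender)
tab1 <- as.table(matrix(c(12,  8,
                          20, 15,
                          25, 30,
                          10, 22),
                        ncol = 2, byrow = TRUE,
                        dimnames = list(PolInter = c("--", "-", "+", "++"),
                                        gender   = c("f", "m"))))

chi <- chisq.test(tab1)     # Pearson chi-square test of independence
chi$statistic               # test statistic
chi$parameter               # degrees of freedom: (4 - 1) * (2 - 1)
chi$p.value                 # p-value

s <- summary(tab1)          # summary() of a table reports the same chi-square test
s$statistic

fis <- fisher.test(tab1)    # Fisher's exact test (two-sided for tables larger than 2x2)
fis$p.value
```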
For 2x2 tables:
- polinter1 <-recode(PolInter,"c('--','-')='low';c('+','++')='high';else=NA")
Recode PolInter into two categories;
requires package {car}
- tab2<-table(polinter1,gender) Produce a 2x2 table
- oddsratio(tab2, log=FALSE) Odds ratio; package {vcd}
- summary(oddsratio(tab2)) More on the odds ratio
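If the {car} and {vcd} packages are not available, the same 2x2 steps can be sketched in base R. The recoding below uses ifelse() in place of recode(), and the odds ratio is the plain cross-product ratio computed by hand; this is a substitute technique, and its result may differ from what oddsratio() reports (for example when a cell is zero). The data are invented.

```r
# Invented data standing in for PolInter and gender
PolInter <- factor(c("--", "-", "+", "++", "+", "-", "+", "++", "-", "+"),
                   levels = c("--", "-", "+", "++"), ordered = TRUE)
gender   <- factor(c("f", "m", "f", "m", "m", "f", "f", "m", "f", "m"))

# Collapse the four categories into two
low_hi <- ifelse(PolInter %in% c("--", "-"), "low", "high")
low_hi[is.na(PolInter)] <- NA                 # keep missing values missing
polinter1 <- factor(low_hi, levels = c("low", "high"))

tab2 <- table(polinter1, gender)              # 2x2 table of counts

# Sample odds ratio as the cross-product ratio: (a * d) / (b * c)
or <- (tab2[1, 1] * tab2[2, 2]) / (tab2[1, 2] * tab2[2, 1])
tab2
or
```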
More coefficients:
- Packages {polycor}, {epitools}, and {rms} have functions to produce other association coefficients, namely
polychoric and polyserial correlations, Kendall's tau, γ, Somers' D and others.
- See also the {vcd} package
descr package
The descr package provides similar functions but with some
additional options.
- CrossTable(language, gender) nearly identical to the
CrossTable function described above.
- crosstab(language,gender) wrapper function, produces by default
a mosaic plot; has a weighting option
- freq(language) Frequency table with a barchart
Note for SPSS users: descr has several functions that
help you to read/write SPSS label and missing value commands.
Graphics for categorical variables
- tab1<-table(PolInter,gender) Create a table of counts (object of class table)
- barplot(tab1) Stacked barchart
- barplot(tab1,beside=T) Barchart (not stacked)
- barplot(tab1,horiz=T) Bars are shown horizontally
- dotchart(tab1) Cleveland's dot chart
- mosaicplot(tab1) from the {graphics} package
- mosaic(tab1) in the {vcd} package
- Association plots: assoc() in the {vcd} package
See the online documentation for titles and legends.
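As a starting point for titles and legends, the sketch below adds them to a grouped barchart and a mosaic plot using standard base graphics arguments. The table is invented, and pdf(NULL) is used only so the code also runs without a screen device.

```r
# Invented 4x2 table standing in for table(PolInter, gender)
tab1 <- as.table(matrix(c(1, 0,
                          2, 1,
                          2, 2,
                          0, 2),
                        ncol = 2, byrow = TRUE,
                        dimnames = list(PolInter = c("--", "-", "+", "++"),
                                        gender   = c("f", "m"))))

pdf(NULL)                                   # null device; omit in interactive use
mids <- barplot(tab1, beside = TRUE,
                main = "Political interest by gender",
                xlab = "Gender", ylab = "Count",
                legend.text = TRUE,         # legend entries taken from the row names
                args.legend = list(x = "topleft", title = "PolInter"))
mosaicplot(tab1, main = "Political interest by gender", color = TRUE)
dev.off()
```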
You will find most of these graphics in the packages lattice and ggplot2, with many more options
and more control over how charts are produced and displayed. See the documentation for
details.
Graphics for categorical variables with lattice
- library(lattice)
- barchart(vote~gender+PolInter, data=minarets)
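One possible variant, sketched here on invented data, is to convert a table of counts to a data frame first and plot the counts with a grouping variable. The names counts and p are chosen for this illustration only.

```r
library(lattice)

# Invented data standing in for PolInter and gender
PolInter <- factor(c("--", "-", "+", "++", "+", "-", "+", "++", "-", "+"),
                   levels = c("--", "-", "+", "++"), ordered = TRUE)
gender   <- factor(c("f", "m", "f", "m", "m", "f", "f", "m", "f", "m"))

# One row per cell of the table, with columns PolInter, gender and Freq
counts <- as.data.frame(table(PolInter, gender))

# Grouped barchart of counts; auto.key adds a legend for the groups
p <- barchart(Freq ~ PolInter, groups = gender, data = counts,
              auto.key = list(columns = 2), ylab = "Count")
# In a script the chart is drawn with print(p); at the console, typing p is enough
```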
Graphics for categorical variables with package ggplot2
- library(ggplot2)
- ggplot(minarets,aes(x=PolInter)) + geom_bar()
- ggplot(minarets,aes(x=PolInter,fill=gender)) + geom_bar()
- ggplot(na.omit(minarets),aes(x=PolInter,fill=gender)) + geom_bar(position="dodge")
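The same kind of chart with axis and legend labels added, as a sketch on invented data (the data frame toy is hypothetical and assumes ggplot2 is installed):

```r
library(ggplot2)

# Hypothetical toy data (values invented for illustration)
toy <- data.frame(
  PolInter = factor(c("--", "-", "+", "++", "+", "-", "+", "++", "-", "+"),
                    levels = c("--", "-", "+", "++"), ordered = TRUE),
  gender   = factor(c("f", "m", "f", "m", "m", "f", "f", "m", "f", "m"))
)

# Side-by-side bars of counts, one bar per gender within each category
p <- ggplot(toy, aes(x = PolInter, fill = gender)) +
  geom_bar(position = "dodge") +
  labs(title = "Political interest by gender",
       x = "Political interest", y = "Count", fill = "Gender")
# In a script the chart is drawn with print(p); at the console, typing p is enough
```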
More...
- xtable package: prepare tables for LaTeX or HTML
Related documents