Dummy variables in Stata

Dummy (logical) variables in Stata take values of 0, 1 and missing.

Indicator variables in variable lists

The most common use of dummy variables is in modelling, for instance in regression (used as the running example below). For this purpose you do not need to create dummy variables yourself: the variable list of most estimation commands can contain factor variables and factor-variable operators, which generate the indicator (dummy) variables on the fly.

When you generate indicator variables (dummy variables, contrasts) from a categorical variable such as continent, one of the categories has to be omitted (the base or reference category). In all regression examples below one continent is omitted, i.e. the regression output shows five of the six continents. By default the first (smallest) value is used as the reference category; the ib. operator selects a different base value.

regress infmor urb i.continent          // 5 indicator variables, the first continent is the base category
regress infmor urb ib2.continent        // continent 2 is the base
regress infmor urb ib(first).continent  // first continent is the base, same as i.continent
regress infmor urb ib(last).continent   // last continent is the base
regress infmor urb ib(freq).continent   // the continent with the highest frequency count is the base

If you wish to contrast one specific continent, e.g. Asia, against all others, you can write (both forms are equivalent):

regress infmor 1.continent
regress infmor i1.continent

See the documentation (help fvvarlist) for further variations.
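The operators above can be tried without the country data. The following is a minimal sketch that assumes the auto dataset shipped with Stata; price, mpg and rep78 are stand-ins for infmor, urb and continent and are not part of the original example.

```stata
* Sketch only: rep78 (5 categories) stands in for the continent variable
sysuse auto, clear

* Default: the lowest category of rep78 is the base
regress price mpg i.rep78

* Category 3 as the base
regress price mpg ib3.rep78

* Most frequent category as the base
regress price mpg ib(freq).rep78

* Contrast one category (rep78 == 5) against all others
regress price mpg 5.rep78
```

Comparing the coefficient tables shows that only the omitted category changes between the first three models, while the last model contains a single indicator.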

Use the generate command

Generate a dummy variable that is 0 for countries with an urbanization of 50% or less and 1 for countries above 50%:

generate urbdum = 0 
replace urbdum= 1 if urb>50

Or shorter

generate urbdum= (urb>50)

This works because Stata evaluates a logical expression to 1 when it is true and to 0 when it is false, so urbdum is 1 wherever urb>50 holds and 0 otherwise.

There is, however, a problem with both versions when urb has missing values. Stata treats a missing value as larger than any number, so urb>50 is true for missing values and the dummy is set to 1. To avoid this, you need to handle missing values explicitly, namely

generate urbdum=0 
replace urbdum=1 if urb>50
replace urbdum= . if missing(urb)

or

generate urbdum1= urb>50 if !missing(urb) 
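The effect of missing values can be checked on a tiny made-up dataset. This is only a sketch: the three urb values and the variable names naive and safe are invented for illustration.

```stata
* Three hypothetical observations: below 50, above 50, and missing
clear
input urb
30
75
.
end

* Unguarded version: the missing value counts as "greater than 50"
generate naive = (urb > 50)

* Guarded version: the dummy stays missing where urb is missing
generate safe = (urb > 50) if !missing(urb)

list
```

In the listing the third observation should differ between the two variables, which is exactly the case the guard is meant to catch.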

Examples showing how to create a dummy variable from a categorical variable, continent here:

generate Asia=continent==1
generate America=continent==4 | continent==5

This creates a variable with a value of 1 if the condition is true and 0 if it is false (in Stata, logical values are represented by 0/1 for false/true).

Use tabulate

The tabulate command has an option that automatically generates dummy variables from a categorical variable:

tabulate continent, generate(cont)

This produces one dummy variable per category of continent, named cont1, cont2, and so on.
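To see what the generate() option creates, here is a short sketch on the auto dataset shipped with Stata; rep78 and the stub rep are assumptions standing in for continent and cont.

```stata
* Sketch only: rep78 has five observed categories (1 to 5)
sysuse auto, clear

* Creates rep1 ... rep5, one indicator per category of rep78
tabulate rep78, generate(rep)

describe rep1-rep5

* Each indicator is 1 for its own category and 0 for the other categories
list rep78 rep1-rep5 in 1/5
```

The new variables are numbered in the order of the categories in the table, so rep3 marks the third category rather than necessarily the value 3 (here the two coincide).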

Using the recode command

recode can also be used as shown here:

recode v3 (min/20=1 Rich) (else=0 Not_rich), generate(d6) label(Dummy_richcountry)

Value labels can be specified directly in the rules. The above example creates a new variable d6 from v3: values up to and including 20 are set to 1 (labelled "Rich" in the new variable), all other values to 0 (labelled "Not_rich").
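Note that else in recode also matches missing values, so a country with missing v3 would be coded 0. If that is not wanted, the nonmissing keyword can be used instead. The sketch below assumes four invented values for v3.

```stata
* Hypothetical values of v3, including one missing
clear
input v3
5
20
35
.
end

* nonmissing (instead of else) leaves a missing v3 missing in d6
recode v3 (min/20 = 1 Rich) (nonmissing = 0 Not_rich), generate(d6) label(Dummy_richcountry)

list, nolabel
```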

Using factor/dummy variables outside modelling

As dummy variables are logical variables you can use them with if to simplify the use of filters. Assuming that you have created: generate America=continent==4 | continent==5 you can simply write

list urb infmor country if America

to list only American countries.
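One caveat applies when a dummy can be missing: if America is read as if America != 0, and a missing value is not 0, so observations with a missing dummy pass the filter as well. The sketch below uses four invented observations and a version of America that is missing where continent is missing.

```stata
* Hypothetical data: the last observation has no continent
clear
input continent urb
1 40
4 60
5 80
. 55
end

* Dummy that stays missing where continent is missing
generate America = (continent == 4 | continent == 5) if !missing(continent)

* Counts the two American countries plus the observation with missing America
count if America

* Comparing with 1 keeps only the cases where the dummy is true
count if America == 1
list continent urb if America == 1
```

With a dummy that is never missing, such as the one created by generate America=continent==4 | continent==5, both filters select the same observations.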
