Recode variables
Creating group variables from continuous variables

Groups are defined by categorical variables. It is often useful, for instance, to compare infant mortality in countries with low, average and high urbanisation. Because urbanisation is a continuous variable, it first has to be broken into a categorical variable with, for example, three groups.

generate urbcat = autocode(urb,4,0,100)     // break urb into four evenly spaced categories from 0 to 100
generate cat1 = recode(urb,21,38,64,100)    // four groups (≤ 21, ≤ 38, ≤ 64 and ≤ 100)
xtile urbcat = urb, nquantiles(3)           // three groups with roughly the same number of observations (default: 2 groups)
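table urbcat, contents(min urb max urb)     // show the min/max of urb in each group (Stata 16 and earlier; in Stata 17+ use statistic(min urb) statistic(max urb) instead of contents())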
egen urbcat1 = cut(urb), at(0,34,68,101)    // three groups, based on specified limits
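
Unlike xtile, which returns a group number, autocode() and recode() return the upper bound of the interval that contains the value. A quick check from a do-file, using an arbitrary illustrative value of 30 in place of urb:

```stata
// 0-100 split into four intervals of width 25; 30 falls in the 25-50 interval
display autocode(30, 4, 0, 100)        // 50

// recode() returns the first listed cutpoint that is >= the value
display recode(30, 21, 38, 64, 100)    // 38
```

The values stored in the new variable are therefore 25, 50, 75, 100 for autocode() and 21, 38, 64, 100 for recode(), not 1 to 4.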

The cut function available in egen lets you specify bin boundaries. In the example:

group   Boundaries (breaks)
1       From 0 up to (but not including) 34
2       From 34 up to (but not including) 68
3       From 68 up to (but not including) 101

Note that if observations are found outside the specified boundaries, egen will generate missing values for them (message displayed).
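
The boundary behaviour can be checked on a handful of made-up values. This is a minimal do-file sketch; the five urb values are invented for illustration and are not from any real dataset:

```stata
clear
input urb
10
34
67.9
101
120
end

egen urbcat1 = cut(urb), at(0,34,68,101)

// 10 falls in the bin starting at 0; 34 and 67.9 fall in the bin starting at 34;
// 101 and 120 are at or above the last boundary and become missing
list urb urbcat1
count if missing(urbcat1)
```

Here the last boundary (101) is itself excluded, which is why the example in the text uses 101 rather than 100 to keep a value of exactly 100 inside the third group.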

By default, the urbcat1 generated in the example above takes three values corresponding to the lower boundary of each bin, i.e. 0, 34 and 68. cut has three useful options:

egen urbcat1 = cut(urb), at(0,34,68,101) icodes    // urbcat1 will have values 0, 1, 2
egen urbcat2 = cut(urb), at(0,34,68,101) label     // same, but in addition defines labels "0-", "34-" and "68-"

A third option, group(), defines groups that contain roughly the same number of observations, i.e. group(3) will create three groups corresponding to the thirds of the distribution:

egen urbcat3 = cut(urb), group(3) label
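
To see which numeric codes and value ranges group(3) produces, the following do-file sketch can be run on simulated data. The uniform urb variable, the seed and the sample size are assumptions made only for this illustration:

```stata
clear
set seed 12345                       // arbitrary seed, for reproducibility only
set obs 300
generate urb = runiform(0, 100)      // synthetic stand-in for the real urb variable

egen urbcat3 = cut(urb), group(3) label

tabulate urbcat3, nolabel            // underlying numeric codes of the groups
tabstat urb, by(urbcat3) statistics(n min max)   // size and range of each group
```

With group(), the codes are expected to start at 0 (0, 1, 2 for three groups), which matters when value labels are defined by hand afterwards.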
Defining labels for the groups
label variable urbcat3 "Urbanization in 3 categories"    // label the newly created variable
label define urblab 0 "Low urb." 1 "Average" 2 "High"    // define value label set urblab (group(3) codes the groups 0, 1, 2)
label values urbcat3 urblab                              // attach labels urblab to variable urbcat3
tab urbcat3                                              // show the frequency table with the newly defined labels
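
A short way to confirm that the labels ended up on the intended codes, assuming urbcat3 and urblab already exist in memory as defined above:

```stata
describe urbcat3       // shows the variable label and the name of the attached value label
label list urblab      // the code-to-text mapping stored in urblab
codebook urbcat3       // frequencies with numeric codes and their labels side by side
```

Any code that appears in the codebook output without a label has no entry in urblab and will be shown as a bare number by tab urbcat3.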
Related commands