Create groups

Recode variables

Creating group variables from continuous variables

Groups are defined by categorical variables. It is often useful, for instance, to compare infant mortality in countries with low, average and high urbanisation. Because urbanisation is a continuous variable, it first has to be broken into a categorical variable with, for example, three groups.

generate urbcat0=autocode(urb,4,0,100)	break urb into four evenly spaced categories from 0 to 100 (each value is the upper bound of its interval)
generate cat1=recode(urb,21,38,64,100)	4 groups (≤ 21, ≤ 38, ≤ 64 and ≤ 100)
xtile urbcat = urb, nquantiles(3)	Three groups with roughly the same number of observations (default 2 groups)
table urbcat, contents(min urb max urb)	Show the min/max of the groups (Stata 16 and earlier; from Stata 17: table urbcat, statistic(min urb) statistic(max urb))
egen urbcat1 = cut(urb), at(0,34,68,101)	Three groups, based on specified limits
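
The commands above can be tried side by side on made-up data. The following is a minimal sketch, not the original dataset: urb is simulated as a uniform percentage, and the variable names urb_auto, urb_rec and urb_tert are chosen here for illustration only.

```stata
* Synthetic stand-in for the urbanisation data (assumption: urb is a percentage)
clear
set seed 12345
set obs 200
generate urb = 100 * runiform()

* Four equal-width bins; each value is the upper bound of its bin (25, 50, 75, 100)
generate urb_auto = autocode(urb, 4, 0, 100)

* Custom upper bounds; each value is the smallest listed bound that is >= urb
generate urb_rec = recode(urb, 21, 38, 64, 100)

* Three quantile groups, coded 1, 2, 3
xtile urb_tert = urb, nquantiles(3)

* Number of observations and range of urb within each grouping
tabstat urb, by(urb_auto) statistics(n min max)
tabstat urb, by(urb_rec) statistics(n min max)
tabstat urb, by(urb_tert) statistics(n min max)
```

tabstat is used here instead of table because its syntax is the same across Stata versions.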

The cut function available in egen lets you specify bin boundaries. In the example:

group	Boundaries (breaks)
1	From 0 up to (but not including) 34
2	From 34 up to (but not including) 68
3	From 68 up to (but not including) 101

Note that if observations are found outside the specified boundaries, egen will generate missing values for them (message displayed).
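
A small sketch of this behaviour, using five hand-picked values (an assumption for illustration, not part of the original data), two of which fall outside the breaks:

```stata
* Five illustrative values; -5 and 101 lie outside the interval [0, 101)
clear
input urb
-5
10
34
67.9
101
end

egen urbcat1 = cut(urb), at(0,34,68,101)

* -5 is below the first break and 101 is not below the last break,
* so both should be set to missing
list urb urbcat1
count if missing(urbcat1) & !missing(urb)
```

Because the upper boundary is exclusive, the last break should be chosen slightly above the largest value that can occur, which is why the example uses 101 rather than 100.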

By default, the urbcat1 generated in the example above takes three values, each the lower boundary of its bin, i.e. 0, 34 and 68. cut has three useful options that change this coding: icodes, label and group.

egen urbcat1 = cut(urb), at(0,34,68,101) icodes	urbcat1 will have values 0, 1, 2
egen urbcat2 = cut(urb), at(0,34,68,101) label	same, but in addition defines labels "0-", "34-" and "68-"
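
To see how the three codings relate, they can be generated next to each other and cross-tabulated. This is a sketch on simulated data; the names cut_default, cut_icodes and cut_label are hypothetical.

```stata
* Synthetic data (assumption: urb is a percentage between 0 and 100)
clear
set seed 99
set obs 150
generate urb = 100 * runiform()

egen cut_default = cut(urb), at(0,34,68,101)
egen cut_icodes  = cut(urb), at(0,34,68,101) icodes
egen cut_label   = cut(urb), at(0,34,68,101) label

* How the lower boundaries 0/34/68 map onto the integer codes 0/1/2
tabulate cut_default cut_icodes

* The labelled version: first with labels, then the underlying codes
tabulate cut_label
tabulate cut_label, nolabel
```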

A third option, group, defines groups that contain roughly the same number of observations, e.g. group(3) creates three groups corresponding to the thirds of the distribution. Because group implies icodes, the groups are coded 0, 1 and 2:

egen urbcat3 = cut(urb), group(3) label
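
Whether the groups really are of similar size, and which codes they receive, is easy to check. A minimal sketch on simulated data (the uniform urb is an assumption for illustration):

```stata
* Synthetic data (assumption: urb is a percentage between 0 and 100)
clear
set seed 2024
set obs 300
generate urb = 100 * runiform()

egen urbcat3 = cut(urb), group(3)

* Group sizes and the codes assigned to the groups
tabulate urbcat3

* Range of urb covered by each group
tabstat urb, by(urbcat3) statistics(n min max)
```

With many tied values in urb the groups can differ noticeably in size, since tied observations are not split across groups.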

Defining labels for the groups

label variable urbcat3 "Urbanization in 3 categories"	Label the newly created variable
label define urblab 0 "Low urb." 1 "Average" 2 "High"	Define value label set urblab (the codes from group(3) are 0, 1, 2)
label values urbcat3 urblab	Attach labels urblab to variable urbcat3
tab urbcat3	Show frequency tables with the newly defined labels
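
After attaching labels it is worth confirming that every code actually has a label. The sketch below rebuilds urbcat3 on simulated data and assumes the codes are 0, 1 and 2, which is what egen cut produces with group(3); if the variable is coded differently, the numbers in label define have to be adjusted accordingly.

```stata
* Synthetic data (assumption: urb is a percentage between 0 and 100)
clear
set seed 2024
set obs 300
generate urb = 100 * runiform()
egen urbcat3 = cut(urb), group(3)

label variable urbcat3 "Urbanization in 3 categories"
label define urblab 0 "Low urb." 1 "Average" 2 "High"
label values urbcat3 urblab

* Which value label is attached, and what it contains
describe urbcat3
label list urblab

* Same table with labels and with the underlying codes
tabulate urbcat3
tabulate urbcat3, nolabel
```

A code that shows up as a bare number in the first tabulate has no matching entry in urblab.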

Related commands