Regression: Methods

Introduction

Methods control the way variables are included in the regression. Quite often you will just want to compute a regression model you have specified yourself, i.e. a dependent variable explained by several independent variables.

Example

The model can be specified in the Regression dialog or with the equivalent syntax:

REGRESSION /Dependent= InfantMortality
    /ENTER=GDP_PCap UrbanPop illiteracy.

This example in fact uses the ENTER method (the default in the menu system).

If you only need to compute regression equations you have specified in advance and want to include all variables, there is no need to read any further.

Including variables in the equation

In the straightforward example above, all variables are introduced in one step (forced entry), in order of decreasing tolerance. To be included, independent variables must pass both the tolerance and the minimum tolerance tests. Tolerance is the proportion of the variance of a variable in the equation that is not accounted for by the other independent variables in the equation. The minimum tolerance of a variable not yet included in the equation is the smallest tolerance any variable already included in the equation would have if the variable being considered were included in the analysis. Some variables may therefore not be included because they do not satisfy the minimum tolerance (which defaults to 0.0001).

Sometimes it is useful to control the sequence of inclusion by specifying blocks of variables, the first block being considered before the second and so on, and to obtain coefficients and statistics for each block.

In the Regression dialog, when you specify independents, you specify a (model) block. The first model you specify is the first block, and Next lets you specify the next block. Once blocks are defined, the Previous and Next buttons let you navigate from one block to another.

The corresponding syntax is:

REGRESSION
  /DEPENDENT <dependent-variable>
  /METHOD=ENTER <independents-block1>
  /METHOD=ENTER <independents-block2>.
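
As an illustration of the template above, here is a minimal sketch using the variables from the introductory example. The split into two blocks is an assumption made only for the example, and the CHANGE keyword on /STATISTICS is added to request the R² change for each block.

```spss
* Block 1: GDP_PCap alone. Block 2 adds UrbanPop and illiteracy.
* The choice of blocks is hypothetical, for illustration only.
REGRESSION
  /STATISTICS=COEFF R ANOVA CHANGE
  /DEPENDENT=InfantMortality
  /METHOD=ENTER GDP_PCap
  /METHOD=ENTER UrbanPop illiteracy.
```

The output should then contain one model per block, so the contribution of the second block can be read from the change statistics.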

Specifying regression methods

Besides ENTER, several other methods are available to build models; they control how variables are included in a model, and several methods can be combined. The main goal of these methods is to determine the best subset of variables for explaining a dependent variable.

Specify a method: Regression menu

In the Regression dialog, choose one of the methods from the Method list (Enter, Stepwise, Remove, Backward, Forward).

Specify a method: Syntax

/METHOD= | STEPWISE varlist [...] [/...]
         | FORWARD varlist 
         | BACKWARD varlist 
         | ENTER varlist
         | REMOVE varlist 
         | TEST(varlist)(varlist)...

Note that the /METHOD keyword itself is optional.

There are basically two types of methods: methods that handle blocks of variables and stepwise methods.

For all methods variables must pass the tolerance criterion to be entered in the equation. The default tolerance level is 0.0001. Note that a variable is not entered if it would cause the tolerance of another variable already in the model to drop below that level.
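
The tolerance criterion can be changed with TOLERANCE on /CRITERIA. The following is a minimal sketch; the value 0.001 is an arbitrary illustrative choice, not a recommendation, and the TOL keyword on /STATISTICS is added so that the tolerance of each variable appears in the output.

```spss
* Raise the tolerance limit from the default 0.0001 to 0.001 (illustrative value).
* TOL requests tolerance and VIF in the coefficients table.
REGRESSION
  /CRITERIA=TOLERANCE(0.001)
  /STATISTICS=COEFF R ANOVA TOL
  /DEPENDENT=InfantMortality
  /METHOD=ENTER GDP_PCap UrbanPop illiteracy.
```

Note that /CRITERIA and /STATISTICS are placed before /DEPENDENT so that they apply to the equation that follows.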

Methods handling blocks of variables

Enter (default): All independent variables in the block are entered into the equation in one step; this is also called "forced entry".
Remove: All variables in a block are removed simultaneously.
Test [available only with syntax]: This method, based on the R² change and its significance, starts by adding all specified variables and then, in turn, removes each test subset specified in parentheses. Note that a variable can appear in several subsets.

These methods use only the tolerance criterion.
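
A minimal sketch of the REMOVE and TEST methods, reusing the variables from the introductory example. The choice of which variables to remove and how to group the test subsets is an assumption made only for illustration.

```spss
* Enter all three predictors, then remove illiteracy in a second step.
REGRESSION
  /DEPENDENT=InfantMortality
  /METHOD=ENTER GDP_PCap UrbanPop illiteracy
  /METHOD=REMOVE illiteracy.

* Test two subsets: GDP_PCap alone, then UrbanPop together with illiteracy.
REGRESSION
  /DEPENDENT=InfantMortality
  /METHOD=TEST(GDP_PCap)(UrbanPop illiteracy).
```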

Stepwise methods

Stepwise methods include or remove one independent variable at each step, based by default on the probability of F (p-value); alternatively, the value of F can be used. The limits controlling variable inclusion or removal are set either as probabilities of F-to-enter and F-to-remove or as values of F-to-enter and F-to-remove.

If you use the dialog, you can change these criteria in the Options dialog.

If you are using syntax, these values can be set with the /CRITERIA subcommand.

Probability of F (default): /CRITERIA can be used to change the default probabilities of F to enter or remove a variable. The defaults are PIN(0.05) and POUT(0.10).
Value of F: specify FIN or FOUT on /CRITERIA. F-to-enter has a default value of FIN(3.84); F-to-remove has a default value of FOUT(2.71).
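
The two ways of setting the criteria can be sketched as follows. All numeric limits here are hypothetical values chosen for illustration; the sketch keeps PIN smaller than POUT and FIN larger than FOUT, mirroring the relationship between the defaults.

```spss
* Stricter probability limits than the defaults (illustrative values).
REGRESSION
  /CRITERIA=PIN(0.01) POUT(0.05)
  /DEPENDENT=InfantMortality
  /METHOD=STEPWISE GDP_PCap UrbanPop illiteracy.

* The same model, using values of F instead of probabilities.
REGRESSION
  /CRITERIA=FIN(4.0) FOUT(3.0)
  /DEPENDENT=InfantMortality
  /METHOD=STEPWISE GDP_PCap UrbanPop illiteracy.
```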

The following three stepwise methods are available.

Stepwise: Based on the p-value of F (probability of F), SPSS starts by entering the variable with the smallest p-value, provided it is smaller than PIN; at the next step it again enters the variable (from those not yet in the equation) with the smallest p-value for F, and so on. Variables already in the equation are removed if their p-value becomes larger than the removal limit (POUT) due to the inclusion of another variable. The method terminates when no more variables are eligible for inclusion or removal. This method is based on both probability-to-enter (PIN) and probability-to-remove (POUT), or alternatively on FIN and FOUT.
Backward (backward elimination): All variables are first entered into the equation and then removed sequentially. For each step SPSS provides statistics, namely R². At each step, the variable with the largest probability of F is removed, provided that value is larger than POUT. Alternatively, FOUT can be specified as the criterion.
Forward (forward selection): At each step, the variable not yet in the equation with the smallest probability of F is entered, as long as that value is smaller than PIN. Alternatively, you can use the value of F by specifying FIN on /CRITERIA. The procedure stops when no variables meet the entry criterion.

Examples

Regression  /Dependent= InfantMortality   /stepwise=GDP_PCap UrbanPop illiteracy.

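Each step adds a variable to the equation until no remaining variable meets the entry criterion or all specified variables have been entered; a variable already in the equation can be removed again if it meets the removal criterion.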

Regression  /criteria pout(0.1) /Dependent= InfantMortality
          /method=backward GDP_PCap UrbanPop illiteracy.

All variables are entered first, then considered for elimination based on the probability of F-to-remove, specified here as POUT(0.1).
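
Two further sketches with the same variables: forward selection with an explicit entry limit, and a combination of methods in which one block is forced in before stepwise selection is applied to the rest. The PIN value and the choice of which variable to force in are assumptions made only for illustration.

```spss
* Forward selection with a stricter entry criterion (illustrative value).
REGRESSION
  /CRITERIA=PIN(0.01)
  /DEPENDENT=InfantMortality
  /METHOD=FORWARD GDP_PCap UrbanPop illiteracy.

* Forced entry of GDP_PCap, then stepwise selection among the other predictors.
REGRESSION
  /DEPENDENT=InfantMortality
  /METHOD=ENTER GDP_PCap
  /METHOD=STEPWISE UrbanPop illiteracy.
```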