Multidimensional analysis

Introduction

Overview

This section describes all EDA commands pertaining to multidimensional analysis. It introduces several important concepts that are essential for what follows, so read this introductory section carefully.

Note that several of the commands described here are also useful in other contexts; in particular, all commands that compute coefficients into MATRIX may be used in connection with the VHIERARCHY command.

The following commands are documented in this section:

    ANACOR      correspondence analysis (Benzecri)
    BASSOC      compute binary association measures
    C1          analyse and manipulate the C1 matrix
    C2          analyse and manipulate the C2 matrix
    CANONICAL   Canonical analysis
    CFIX        fit two configurations
    CFIT        configuration comparison
    CONFIGUR    configuration manipulation
    CORRELATE   compute a correlation matrix
    DISTANCE    compute distance matrix
    FACTOR      Factor analysis
    MDS         multidimensional scaling (non-metric)
    MINISSA     smallest space analysis
    ROTATE      configuration rotation
    SCORES      factor scoring
    TSCALE      Torgerson scale

MATRIX, C1 and C2

The multidimensional commands define and use several additional matrices, namely MATRIX, C1 and C2. These matrices are separate from the work area and are manipulated with a specific set of commands. They also may be referenced in arithmetic expressions, and other commands have options using these matrices.

MATRIX: contains the dissimilarity or similarity measures used with the dimensional techniques and usually computed on variables from the WA.

C1 (configuration 1) contains the first result matrix from a dimensional analysis and usually defines the variable space.

C2 (configuration 2) contains the second result matrix from a dimensional analysis and is usually related to the observation space.

Let us consider principal component analysis (default operation):

   compute correlations on the variables
   perform an eigenvalue/eigenvector decomposition
   compute factor loadings
   compute factor scores
   interpret results

This sequence, which is similar to most other dimensional analyses, translates into EDA as follows:

Principal components is invoked with the FACTOR command:

(1) Then EDA picks the variables from the WA as specified (see below) and computes correlations and stores them into MATRIX.

(1a) Examine the correlation matrix using the MATRIX command.

(2) Eigenvalue/vector analysis is performed and diagnostic information shown; the user then selects the number of dimensions.

(3) Compute factor loadings and store them into C1.

(4) Compute factor scores and store them into C2.

(5) Examine (list, plot etc.) C1 and C2 using the C1/C2 commands.

(6) Use further commands to study the results, e.g. perform a hierarchical cluster analysis on C2 (factor scores).
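The sequence above can be sketched outside EDA in a few lines of Python. This is only an illustration under stated assumptions, not EDA's implementation: the function names are hypothetical, only the first component is extracted (by power iteration), loadings are taken as the eigenvector multiplied by the square root of the eigenvalue, and the scores are left unnormalized.

```python
import math

def correlation_matrix(columns):
    """Return (R, Z): Pearson correlations and the standardized columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        z.append([(x - mean) / sd for x in col])
    p = len(z)
    r = [[sum(z[i][k] * z[j][k] for k in range(n)) / n for j in range(p)]
         for i in range(p)]
    return r, z

def first_component(r, iterations=200):
    """Leading eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    p = len(r)
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary starting vector (assumption)
    for _ in range(iterations):
        w = [sum(r[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * r[i][j] * v[j] for i in range(p) for j in range(p))
    return eigenvalue, v

# Hypothetical work area: two variables, five cases.
wa = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 5.0]]
matrix, z = correlation_matrix(wa)                    # plays the role of MATRIX
eigenvalue, vector = first_component(matrix)
c1 = [x * math.sqrt(eigenvalue) for x in vector]      # loadings, one per variable
c2 = [sum(vector[j] * z[j][k] for j in range(len(z)))  # scores, one per case
      for k in range(len(z[0]))]
```

Here `matrix`, `c1` and `c2` merely mimic the roles of MATRIX, C1 and C2; a full analysis would extract further components and let the user choose how many to keep.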

Note that this is the default way of doing things; many options are provided for the more sophisticated user in order to modify these defaults, e.g. FACTOR may be directed to pick the values it finds in MATRIX, instead of computing correlations.

As in other instances (CASID, GVAR) C1, C2 and MATRIX designate a program concept (matrices here), as well as the commands used to manipulate them. C1, C2 manipulate the configuration matrices, and MATRIX may be used to perform operations on MATRIX.

Furthermore if the specialized facilities for C1, C2 and MATRIX are not sufficient it is always possible to load these matrices into the work area and treat the data the same way as any other variable.

If you are an advanced user you might even write a macro command that uses the configurations produced by FACTOR directly.

Variable lists

With the commands mentioned below EDA works differently with respect to variables in the WA, depending upon the setting of the ALLVARS switch (See the SET command). Default is ALLVARS ON.

Let us take the FACTOR command as an example:

                 FACTOR                 FACTOR vlist

   ALLVARS ON    all variables in WA    variables in list

   ALLVARS OFF   previous list          variables in list

When a variable list is specified on the command line, the variables in the list are analyzed, no matter how ALLVARS is set. However, if no list is present, the behaviour is different. With ALLVARS ON all variables in the WA are taken (of course the WA then has to be rectangular); with ALLVARS set to OFF the previous list is used, i.e. EDA works the same way as with the descriptive commands.
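The selection rule can be summarized as a small decision function. The following Python sketch is purely illustrative; the function and argument names are hypothetical and do not correspond to anything inside EDA.

```python
def resolve_variables(vlist, allvars_on, wa_variables, previous_list):
    """Return the variables an ALLVARS-sensitive command would analyze."""
    if vlist:                      # an explicit list always wins
        return list(vlist)
    if allvars_on:                 # ALLVARS ON: the whole work area
        return list(wa_variables)
    return list(previous_list)     # ALLVARS OFF: reuse the previous list

wa = ["v1", "v2", "v3"]
print(resolve_variables([], True, wa, ["v2"]))    # whole WA
print(resolve_variables([], False, wa, ["v2"]))   # previous list
```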

Analysis commands

ANACOR

  ANACOR  [vlist] [NDIM=ndim] [EIGEN{=v#}] [NOSCORE]
                  [CWEIGHT] [VWEIGHT]
                  [CDIST2] [VDIST2]

  Subcommands:

  ANACOR  [vlist]  SUPVARS

  ANACOR  d1,d2    PLOT | [BOTH]    |  <opt>
                        | CASES     |
                        | VARIABLES |

  ANACOR  VCONTributions  <copt>
  ANACOR  CCONTributions  <copt>

     <copt>  :==  [RELATIVE] [CONFIG | COPY_to_WA]

  ANACOR NEW

<opt> -->  see C1/C2 PLOT for details

Performs a simple correspondence analysis (Benzecri) on the whole WA or on the variables in the vlist (the command is ALLVARS sensitive). Anacor allows for simultaneous representation of cases and variables in the same space using a chi-square distance (stored into MATRIX) and produces a C1 (variables) and a C2 (observations) configuration.

Whenever a row or a column of the matrix analyzed has a sum of zero, a message is issued, telling which row, resp. column is concerned and that the sum has been replaced by a very small value.

The basic workings of this command are analogous to the FACTOR command, i.e. the main command performs the initial computational tasks and produces the requested configurations. Further information can be requested with sub-commands, i.e. commands issued immediately after the main command.

The NOSCORES option inhibits the creation of the C2 matrix (individual scores); in this case the subcommands are disabled. The NOSCORE option is useful when comparing configurations and you do not wish to overwrite the previous C2 matrix.

NDIM=nd lets you specify the number of dimensions to compute (if this option is not present you will be asked).

Subcommands

PLOT lets you plot selected dimensions: CASES plots observations only, VARIABLES plots variables only, and BOTH produces a simultaneous representation of variables and observations. Note that the same plots can be produced with the C1 PLOT or C2 PLOT commands. (Simultaneous plots are produced with the C1/C2 DPLOT command).

SUPVARS adds variables to the current configuration, i.e. it projects variables not used in the computation into the computed configuration. Supplementary variables appear on lists and plots with a + sign in the first label position.

The NEW option allows you to initiate a new ANACOR analysis when the program is expecting a subcommand. If you type ANACOR without a subcommand immediately after another ANACOR command, EDA issues an error message. This prevents the auto-repeat mechanism from starting the whole computation process all over again whenever a subcommand is mistyped.

V/CCONTRIBUTION produce absolute and relative contributions for the variables (VCONT) or the cases (CCONT). If RELATIVE is not present, absolute contributions are produced. Default is to display a table; options are provided to copy the result either into the WA as variables (as many as dimensions): COPY or into C1 (variables) or C2 (cases): CONFIG. This last possibility destroys the coordinates; it is only provided for refined analysis with the C1/C2 LIST command. For these reasons subcommands using a modified C1 or C2 are disabled whenever the CONFIG option has been used. Therefore if you intend to use several sub-commands use CONFIG last.

The names of the variables generated with the COPY option are as follows:

aaCbbDnn

where  aa = Ab for absolute contribution
            Rl for relative contribution
       bb = Vr for variable space
            Cs for observation space
       nn = the number of the dimension

Therefore AbCCsD02 for example means Absolute contributions of the cases to the second dimension.
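The naming scheme can be reproduced with a short helper. This Python sketch is only an illustration of the pattern described above; the function name and its argument values are hypothetical, and the two-digit padding of the dimension is inferred from the example AbCCsD02.

```python
def contribution_name(kind, space, dimension):
    """Build a name of the form aaCbbDnn for a contribution variable."""
    aa = {"absolute": "Ab", "relative": "Rl"}[kind]
    bb = {"variables": "Vr", "cases": "Cs"}[space]
    return f"{aa}C{bb}D{dimension:02d}"

print(contribution_name("absolute", "cases", 2))
```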

Advanced options (*)

EIGEN=var# copies the eigenvalues to a variable.

CWEIGHTS and VWEIGHTS copy the weights of the cases and variables into a variable in the WA. This is useful for tricky computations with the results. [These weights are the relative marginal frequencies].
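As an illustration of what "relative marginal frequencies" means, the following Python sketch computes row (case) and column (variable) weights for a small table. The function name and the data are hypothetical; EDA's own handling of zero sums (replacement by a very small value) is not reproduced here.

```python
def marginal_weights(table):
    """Return (case_weights, variable_weights) as relative marginal frequencies."""
    total = sum(sum(row) for row in table)
    case_weights = [sum(row) / total for row in table]
    n_columns = len(table[0])
    variable_weights = [sum(row[j] for row in table) / total
                        for j in range(n_columns)]
    return case_weights, variable_weights

# Hypothetical table: three cases (rows) by two variables (columns).
cases, variables = marginal_weights([[1, 3], [2, 2], [1, 1]])
```

Both weight vectors sum to one by construction.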

VDIST2 or CDIST2 copy the distances of each variable (case) to the gravity center of the variable (case) space. Like CWEIGHT/VWEIGHT, these options are used with macros or for special analysis needs.

Reference

Lebart 1973. Credit: Initial program built on Lebart 1973, heavily modified.

CANONICAL

CANONICAL vlist1&vlist2 [<corr>] [SCORES={var}]

CANONICAL  NOCOMPUTE [NCASE=nc]

Performs a canonical correlation analysis.

There are two modes of operation: (1) The default mode computes first a Pearsonian correlation matrix (or as specified by <corr>; see CORRELATE for a full explanation) from the variables specified and stores it in MATRIX. This mode requires two variable lists, i.e. the first list corresponds to the variables in the left set and the second list to the variables in the right set. Specify these variables in the vlist field and separate the two lists with the "&" symbol.

(2) The NOCOMPUTE option inhibits the automatic calculation of a MATRIX and takes up the stored MATRIX. The NCASE= option is required if the number of cases on which the MATRIX is based is not known. This is only the case when a user defined MATRIX has been stored (e.g. using the MATRIX command or direct computations).

CANON then requests the variables in the left set and in the right set, which are obtained from the user as two separate lists. You should then enter the numbers of the variables (i.e. the position they occupy in MATRIX). It is not possible to specify variable names.

Besides the displayed results, the configuration of the left set is stored in C1 and the configuration of the right set in C2 (canonical correlations). Note that this command destroys the initially computed MATRIX.

The SCORES option computes individual scores and copies them as new variables into the WA. If SCORES is used the new variables are stored in free locations of the WA; SCORES=var# directs EDA to copy them into variable locations starting with variable number var#. The generated variable names are of the form csorl<xxx> and csorr<xxx> for the scores of the left set and the right set respectively, where <xxx> is the number of the factor. Furthermore the scores for each set are tied to a different list; the number of the list is reported.

SCORES is not available with the NOCOMPUTE option. Limitations: CANON is internally limited to analyzing sets of up to NVAR/2 in size each.

Reference

Cooley & Lohnes 1971. Credit: Adapted from Cooley & Lohnes by Dominique Joye.

FACTOR

  FACTOR [vlist <corr> | NOCOMP] [NF=nfac] [NOSCORES]
                 [MAXROW] [EIGEN{=v#}] [NNORM]
                 [GABRIEL {EUCLID} {VSCORES}]

   Sub-commands:

  FACTOR                | VSCORE C1 |  *

  FACTOR dim1,dim2 PLOT | [BOTH]  | <opt> *
                        | [CASES]   |
                        |VARIABLES  | *

 <opt>   see C1/C2 PLOT for details


  FACTOR  NEW


     *) applies to GABRIEL option only

Performs a factor analysis on the variables in the WA (whole or vlist, depending on the setting of the ALLVARS mode) or on the MATRIX stored using the MATRIX STORE command. The factor loadings are stored in C1, the scores in C2. <corr> are options controlling the computation of the matrix to be analyzed. For a complete description of <corr> refer to the CORRELATE command; all options apply here also.

Number of factors

NFACT=number_of_factors: Number of factors desired. If NFACT is not present, the user decides the number of factors (s)he desires after the display of the eigenvalues.

EIGEN

EIGEN=var#: Copies all eigenvalues into a variable. This option may be used in two ways: either you specify a target variable, or you let the program search itself for an empty location. Note that the second form overwrites the previous variable labeled 'eigen'; therefore you should rename the variable if you intend to keep it beyond a second copy of an eigen-variable (this can occur from other commands, like TSCALE or ANACOR).

MAXROW

MAXROW replaces the diagonal of the correlation matrix by the maximum correlation in a row (communality estimates) in order to perform a common factor analysis. Other elements can be placed on the diagonal using the MATRIX DIAGONAL command or an expression with the MATRIX area as target.
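The diagonal replacement can be illustrated as follows. This Python sketch is an assumption about the rule, not EDA's code: it takes the largest absolute off-diagonal correlation of each row as the communality estimate, and the function name is hypothetical.

```python
def maxrow_diagonal(r):
    """Return a copy of r with each diagonal element replaced by the
    largest absolute off-diagonal correlation in its row."""
    p = len(r)
    result = [row[:] for row in r]
    for i in range(p):
        result[i][i] = max(abs(r[i][j]) for j in range(p) if j != i)
    return result

r = [[1.0, 0.5, -0.7],
     [0.5, 1.0, 0.2],
     [-0.7, 0.2, 1.0]]
reduced = maxrow_diagonal(r)
```

Whether negative correlations are compared by absolute value is the assumption made here; the off-diagonal elements are left unchanged.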

NNORM

inhibits normalizing of the factor scores on unit length (eigenvalues). This option is not available for GABRIEL or any option where scores are not computed.

NOCOMPUTE

inhibits automatic computing of the correlation matrix. This is useful if the user performs different analyses on the same matrix (using the matrix subsetting facilities etc.), or if a matrix has been stored using the MATRIX command or the MUTILITY tool. Note that with this option the factor scores cannot be computed, and in some instances the factor loadings cannot be correctly normalized (a message is then given, indicating that case). As long as the measures used in MATRIX are normalized no problem occurs. In other cases, say you have computed a variance-covariance MATRIX with the MUTIL tool, the factor loadings should be divided by the standard deviation of each variable; as these are not available in this case (or in any case where you store some non-normalized measure in MATRIX), the factor loadings will not be as you want them. You may then divide the factor loadings (i.e. the C1 matrix) by the appropriate unit yourself, using expressions with C1 as target (i.e. C[] type target expressions).

GABRIEL

GABRIEL is a special factorization for computing scores for cases and variables in the same space, allowing for a simultaneous representation (biplot). EUCL uses euclidean distances rather than Mahalanobis distances (default) for the projections (scoring). When GABRIEL is specified, three result matrices are in fact computed: the "normal" loadings, the scores on cases and the scores on variables. C1 contains the loadings as usual, unless you specify the VSCORES option, which stores the variable scores in C1 instead. In order to copy the variable scores after the FACTOR command you may use the subcommand VSCORES explained below.

NOSCORES

Inhibits the computation of the factor scores. This might be useful if you wish to keep the configuration stored in C2 for comparison purposes.

VSCORES (subcommand)

VSCORES copies the scores for the variables (Gabriel only) into free variable locations in the WA (they are tied together as a list: #nvar). If the C1 option is given the scores are not copied but stored in C1 (overwriting the loadings). [See also the VSCORES option on the main command].

PLOT (subcommand)

plots the factor scores. In the case of GABRIEL it allows you to display the biplot (cases and variables in the same space), depending on the second option field. Factor loadings are plotted using the C1 command. [Note that the same functionality may be achieved with the CONFIG DPLOT command, if you ask to store the variable scores into C1].

NEW (subcommand)

This sub-command initiates a new FACTOR analysis when the program is expecting a sub-command or another EDA command, in order to avoid an "accidental" start of a new factor analysis.

Reference

Gabriel 1971. Credit: Original Principal components program by Dominique Joye, adapted for EDA and heavily modified. A few lines left from the original program.

MDS



   MDS  [vlist] <dist>  | NOCOMPUTE {SIMIL | DISSIMIL}
               NDIM=ndim | ELBOW [KEEP=ndim]
               [ICONFIG] [PRIMARY] [STR2] [CBLOCK]
               [STRMIN=val] [MAXIT=max.iter]
               [SILENT | {DHIST=freq}]
               [SHEPARD {DISTANCES}]

Performs Kruskal-Shepard's multidimensional scaling (Guttman 1968). The program normally computes a distance matrix on all variables in the WA or the variables specified by <vlist> depending upon the setting of the ALLVARS mode.

<dist> : all options allowed on the DISTANCE command, namely R=metric and CASEWISE (see there for details). Note that the default distance depends upon the setting of the SET POWER command (default: euclidean distances).

If the NOCOMPUTE option is used, MDS picks up the MATRIX stored by the user. If the matrix is of unknown type the SIMIL or DISSIMIL option is needed to specify whether MATRIX contains similarities or dissimilarities (e.g. distances).

Either the number of dimensions to compute is specified, or the user wishes an elbow diagram. In this case all solutions are computed and the solution with dimensions specified by the KEEP option (default 2) retained. In any case the resulting configuration is stored in CONFIG.

The user might specify his own initial configuration by storing it in C2 and specifying the ICONFIG option (default: program generates initial configuration).

CBLOCK specifies city block metric, instead of euclidean metric. PRIMARY asks for primary approach for ties, instead of the default secondary approach. MAXITER specifies the maximum number of iterations (default 50), STRMIN the minimal stress to be attained by the program (default 0.01).
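To make the two termination criteria concrete, here is a minimal Python sketch of Kruskal's stress (formula 1) together with a stopping test. It is an illustration under assumptions: the function names are hypothetical, and the exact stress variant and stopping logic used by EDA are not stated in this manual; only the defaults 0.01 and 50 are taken from the text above.

```python
import math

def kruskal_stress1(distances, dhats):
    """Kruskal's stress formula 1: sqrt(sum (d - dhat)^2 / sum d^2)."""
    numerator = sum((d - h) ** 2 for d, h in zip(distances, dhats))
    denominator = sum(d * d for d in distances)
    return math.sqrt(numerator / denominator)

def should_stop(stress, iteration, strmin=0.01, maxit=50):
    """Stop once the stress is small enough or the iteration limit is reached."""
    return stress <= strmin or iteration >= maxit

# Hypothetical configuration distances and fitted disparities (d-hats).
stress = kruskal_stress1([3.0, 4.0], [3.0, 3.0])
print(stress, should_stop(stress, iteration=10))
```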

SHEPARD displays the Shepard diagram, plotting the initial data versus the d-hats; if the DISTANCE option is also present an additional diagram is plotted with the distances.

MDS is an iterative procedure with some possible pitfalls (local minima, degenerate configurations etc.), where it is useful to consult the computation history. Default is to display each second iteration. The DISP= option can be used to specify more or less frequent display. SILENT suppresses the computation history completely. NODISPLAY suppresses the history on the terminal.

References

Kruskal 1964a, 1964b, Shepard 1962. Credit: Adapted from a version contained in the FASCALE program: Scott, B, Guthery, Spaeth, Stuart Thomas, FASCALE, Technical Report No 29, Computer Institute for Social Science Research, Michigan State University.

Note: All Results are for the final solution stored in C1, i.e. with ELBOW the solution kept (default 2 dimensional, or as set by the KEEP option ).

MINISSA

      MINISSA NDIM=dim | ELBOW [KEEP=dim#] | CHECK
              [SIMILARITIES | DISSIMILARITIES]
              [G-L] | KRUSKAL
              [LOCAL] [CBLOCK] [MAXIT=max.steps]
              [RDIST] [ICONF] [INLRCONF] [TOL=min]
              [SILENT | {DISP=freq}]
              [CUT=val {LARGE}] [STRMIN=min]
              [SHEPARD]

NOTE: This is an experimental implementation; it is not yet complete, and not all options described in Lingoes 1973 are available (yet).

MINISSA: Michigan Israel Netherlands Integrated Smallest Space Analysis is an adapted version from the Guttman-Lingoes Non metric program series. (Monotone Distance Analyses, Unconditional: MDA-U).

This command works on the (dis)similarity matrix stored in MATRIX (a MATRIX must be stored by the user) and produces the requested number of dimensions into C1. The SIMI/DISSI options are used to tell the system whether the coefficients in MATRIX are similarities (e.g. correlations) or dissimilarities (e.g. distances), if the matrix type is not known. (This is usually only the case if the matrix stored is a user defined matrix stored by CONFIG store). These options may NOT be used to override the matrix type; if the type is known they are meaningless.

Unless the Kruskal option is specified, the program uses the Guttman-Lingoes "soft-squeeze", double phase, followed by single phase (rank-images for semi-strong monotonicity).

The KRUSKAL option uses Kruskal's monotone regression (single-phase) and minimizes Kruskal's stress. [To use this option the same way as described by G-L you should do first a MINISSA without this option, use then the result as input configuration for a KRUSKAL run (ICONF option, after CONFIG EXCHANGing).]

The other options have the following meaning: LOCAL uses local monotonicity instead of the default global monotonicity. CBLOCK specifies City-block instead of Euclidean metric.

RDISTANCE computes the derived coefficients (relative distances) and stores them in MATRIX, i.e. replaces the original coefficients.

The MAXIT option controls the maximum number of iterations (default 50).

ICONF takes up C2 as initial configuration, i.e. it allows you to use any arbitrary user defined configuration as starting configuration. Otherwise the starting configuration is program generated.

INLR generates Lingoes-Roskam initial configuration.

The TOL=value option sets the tolerance for convergence (termination criterion) to a value different from the default 1.0e-6 (if ELBOW is used, TOL defaults to 1.0e-4 to speed up computations).

The CUT=val option is used to specify the treatment of ties. The val indicates the value at or above/below which all input coefficients will be considered as tied. Unless the LARGE option is specified, small distances are tied (which achieves clustering). LARGE ties large values (providing yet another way of achieving local monotonicity).

The ELBOW option produces an elbow (scree) diagram, computing all solutions from 1 to 8 dimensions. This option is rather time consuming. The KEEP=dim# option is used to keep the solution with dim# dimensions as result. If it is not given, the two dimensional solution is stored.

The STRMIN=val option specifies the minimal stress to be achieved by the procedure (convergence criterion).

The SHEPARD option displays the Shepard diagram, plotting proximities (d-hat or d* for G-L, resp. KRUSKAL) versus distances. The SILENT option inhibits display of each second iteration at the terminal (which is default) (see MDS for the remaining options).

References

Lingoes 1973, Guttman 1968; Kruskal 1964a, Kruskal 1964b; Credits: Adapted from Lingoes 1973, the eigenval/vector program is from Lebart et al. 1977.

LIMITATIONS

MINISSA has some internal limitations, depending upon the maximal WA size. If the WA is square only NVAR/2 variables/item may be analyzed. If MCAS is at least twice as large as NVAR up to NVAR variables/items may be analyzed.

SEE ?MINISS to check whether there is a limitation (if no message talks about limits, there are none).

TSCALE

TSCALE [vlist] [<dist>] [BMATRIX] [E=v#]
               [C=const] [N=nfact]
               [NOCOMPUTE {SIMILAR | DISSIMILAR}]

Computes a Torgerson Scale (Metric MDS without iteration). The program attempts to define a euclidean space for the MATRIX stored. All the variables in the WA or the vlist, depending on the setting of the ALLVARS mode, are included in the analysis. Unless the NOCOMPUTE option is present, the program automatically computes a distance matrix into MATRIX (distance, metric=2, i.e. R=2); otherwise the MATRIX already stored is used.

If the N=Nfact option is not present, the user is asked to specify the number of dimensions to be computed and put into C1. This configuration may then be analyzed with the C1 command.

<dist> are all options allowed on the DISTANCE command, i.e. R=metric and CASEWISE (see there for more details). Note that the default metric is controlled by the SET POWER command.

If the matrix has been produced by an EDA command, e.g. CORREL or DISTANCE, the SIMIL/DISSIMIL options need not be specified; the program determines the matrix type itself. Otherwise SIMIL or DISSIMIL must be specified to indicate whether the analysis is done on similarity or dissimilarity measures; e.g. if the matrix contains similarities instead of distances, SIMILAR should be specified.

BMATRIX copies the B-matrix (see reference) into MATRIX. The user is asked to specify the number of dimensions to be computed and put into C1. These dimensions are then analyzed using the C1 command.
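For orientation, the following Python sketch computes a B-matrix by double centering the squared distances, which is the usual construction in Torgerson's classical scaling. That EDA's BMATRIX holds exactly this matrix is an assumption, and the function name is hypothetical.

```python
def b_matrix(d):
    """Double-center the squared distances:
    b(i,j) = -1/2 * (d(i,j)^2 - rowmean(i) - rowmean(j) + grandmean)."""
    n = len(d)
    squared = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(squared[i]) / n for i in range(n)]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (squared[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

# Hypothetical example: three points on a line at positions 0, 1 and 3.
distances = [[0.0, 1.0, 3.0],
             [1.0, 0.0, 2.0],
             [3.0, 2.0, 0.0]]
b = b_matrix(distances)
```

For exact euclidean distances the result equals the matrix of scalar products of the centered coordinates, which is why its eigenvectors yield the configuration.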

The C option adds the constant specified to the distances stored. (Additive constant problem). DMATRIX puts the final distances (computed distances) into MATRIX.

The N= and E= option have the same meaning as on the FACTOR command.

Reference

Torgerson 1958. Credit: Sorry, don't know where the original pieces came from. I found a dusty card-deck, punched in by someone (who?) in the Department a long time ago. Probably a piece from Mark Franklin's and David Handley's DAEDAL package.

Similarity and dissimilarity measures

This section describes a series of commands used to compute similarity or dissimilarity coefficients into MATRIX.

NOTE that for standard use of the analysis commands (except for MINISSA) these commands are not needed, i.e. the default values always call directly the appropriate procedure (e.g. Pearsonian correlations for FACTOR or euclidean distances for MDS or TSCALE). In order to use one of the commands described here you will need to specify the NOCOMPUTE option on the main analysis commands. The main options of CORRELATE are directly available with the FACTOR command; the same is true with MDS or TSCALE with respect to the DISTANCE options.

BASSOCIATION

   BASS [vlist] <coefficient> [CASEWISE] [DIVISION=div]

Compute binary association measures into MATRIX. The command is ALLVARS mode sensitive.

Binary association measures are appropriate for dummy variables. EDA does not require that your variables have only 0-1 values; it simply considers positive non-zero values as ones and values equal to zero or less as zeros. If DIVISION is present, values smaller than or equal to div are considered 0 (false) and values above div 1 (true).

CASEWISE computes association measures between CASES instead of variables, i.e. behaves as if the WA were transposed (see DISTANCE for more details).

For some coefficients there are situations where the coefficient is not defined; this means in terms of the formulas below that a division by zero would occur. To avoid this EDA adds a very small value, causing therefore the corresponding coefficient to become rather large.

Note also that most of the coefficients are similarity measures, however several are dissimilarities (distances). When using these coefficients as input to commands like MDS, MINISSA or VHIERARCHY you need not worry about the problem; for other uses however you need possibly a transformation.

 1  ACP  S  Average conditional probability of a 1-1 match
 2  COS  S  Cosine between vectors (Ochiai)
 3  ACM  S  Average cond probability match, also called Sokal and Sneath 4
 4  GAC  S  Geometric average of cosines
 5  PH2  S  Phi squared
 6  LAM  S  Goodman and Kruskal's Lambda
 7  YYU  S  Yule's Y
 8  RUS  S  Russel and Rao's prog. 1-1M
 9  AND  S  Anderberg's D
10  SMA  S  Simple match-probability
11  JAC  S  Jaccard coefficient
12  ROG  S  Rogers and Tanimoto
13  DIC  S  Dice coefficient
14  MDI  D  Metric distance
15  SS1  S  Sokal & Sneath 1
16  SS2  S  Sokal & Sneath 2
17  SS3  S  Sokal & Sneath 3
18  SS5  S  Sokal & Sneath 5
19  KU1  S  Kulczynski 1
20  KU2  S  Kulczynski 2
21  HAM  S  Hamann
22  QYU  S  Yule's Q
23  PHI  S
24  BEU  D  Binary Euclidean
25  SBE  D  Squared BEU
26  SIZ  D  Size difference
27  PAT  D  Pattern difference
28  BSH  D  Binary shape
29  DIS  D  Dispersion similarity
30  VAR  D  Variance dissimilarity
31  BLW  D  Binary Lance and Williams non-metric dissimilarity

D=distance or dissimilarity S=similarity

    Formulas

            Y
          1     0
        -----------
 X   1  | a     b
     0  | c     d     n=a+b+c+d



1  ACP  (a/(a+b)+a/(a+c))/2
2  COS  sqrt(a*a/((a+b)(a+c)))
3  ACM  (a/(a+b)+a/(a+c)+d/(b+d)+d/(c+d))/4
4  GAC  sqrt(a*a*d*d/((a+b)(a+c)(b+d)(c+d)))
5  PH2  (ad-bc)**2/((a+b)(c+d)(a+c)(b+d))
6  LAM  t1=max(a,b)+max(c,d)+max(a,c)+max(b,d)
        t2=max(a+c,b+d)+max(a+b,c+d)
        LAM= (t1-t2) / (2*(a+b+c+d)-t2)
7  YYU  (sqrt(ad)-sqrt(bc))/(sqrt(ad)+sqrt(bc))
8  RUS  a/n
9  AND  (t1+t2) / (2*(a+b+c+d))   ; t1,t2 see LAM
10 SMA  (a+d)/n
11 JAC  a/(a+b+c)
12 ROG  (a+d)/((a+d)+2(b+c))
13 DIC  2a/(2a+b+c)
14 MDI  (b+c)/n
15 SS1  2(a+d) / (2(a+d)+b+c)
16 SS2  a / (a+2(b+c))
17 SS3  (a+d) / (b+c)
18 SS5  ad / sqrt((a+b)(a+c)(b+d)(c+d))
19 KU1  a / (b+c)
20 KU2  (a/(a+b) + a/(a+c)) / 2
21 HAM  ((a+d) - (b+c)) / (a+b+c+d)
22 QYU  (ad-bc)/(ad+bc)
23 PHI  (ad-bc)/sqrt((a+b)(a+c)(b+d)(c+d))
24 BEU  sqrt(b+c)
25 SBE  b+c
26 SIZ  (b-c)**2 / (a+b+c+d)**2
27 PAT  bc / (a+b+c+d)**2
28 BSH  ((a+b+c+d)(b+c)-(b-c)**2) / (a+b+c+d)**2
29 DIS  (ad-bc) / (a+b+c+d)**2
30 VAR  (b+c) / (4*(a+b+c+d))
31 BLW  (b+c) / (2a+b+c)
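The way the frequencies a, b, c and d arise from two variables, and how a few of the coefficients follow from them, can be sketched in Python. This is an illustration only: the function names are hypothetical, the threshold argument mimics the DIVISION option, and the small constant that EDA adds to avoid division by zero is deliberately left out.

```python
def binary_counts(x, y, division=0.0):
    """Cross-tabulate two variables after dichotomizing at `division`:
    values above the threshold count as 1, all others as 0."""
    a = b = c = d = 0
    for xv, yv in zip(x, y):
        x_true, y_true = xv > division, yv > division
        if x_true and y_true:
            a += 1          # X=1, Y=1
        elif x_true:
            b += 1          # X=1, Y=0
        elif y_true:
            c += 1          # X=0, Y=1
        else:
            d += 1          # X=0, Y=0
    return a, b, c, d

def jaccard(a, b, c, d):            # coefficient 11 (JAC)
    return a / (a + b + c)

def simple_matching(a, b, c, d):    # coefficient 10 (SMA)
    return (a + d) / (a + b + c + d)

def dice(a, b, c, d):               # coefficient 13 (DIC)
    return 2 * a / (2 * a + b + c)

# Hypothetical variables; non-binary positive values are treated as ones.
counts = binary_counts([1, 1, 0, 0, 2], [1, 0, 1, 0, 5])
print(counts, jaccard(*counts), simple_matching(*counts), dice(*counts))
```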

Credit

About half of the coefficients are adapted from [Anderberg 1973]. Binary storage has not been used; to avoid zero division, which is a problem with some coefficients, a small constant has been added to the frequency counts. Macro programmers: if you need a different coefficient, it is rather easy to implement it as a macro using the LET command. The CMPT command will assist you in such computations, as it provides you with the a/b/c and d frequencies, which will be stored as result variables.

CORRELATE

 CORRELATE [vl] [VARCO] [CENTER]
                [RANK {NOROB}]
                [JACKNIFE {DIAG_ONLY{=diff}} {GRP=ng}]

Computes a correlation MATRIX between all the variables in the WA or the vlist depending on the setting of the ALLVARS mode and stores it as MATRIX, where it is retrieved by various multivariate analyses. For FACTOR this command is not necessary, the same options may be specified on the FACT command.

CENTER uses the center estimate stored with each variable as estimator of location instead of the mean. This is usually the median, but it can be replaced by another estimator or estimate (Mosteller & Tukey 1977).

VARCOV stores the variance/covariance matrix, instead of the Pearson-correlation matrix.

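The RANK option transforms the data into ranks before computing the correlations. The ranks are then transformed using the inverse of the gaussian distribution function to obtain robust estimates:

y(k) = inv(F) (k/(n+1))

where k is the rank of a case, n the number of cases and inv(F) the inverse of the gaussian distribution function F.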

If NOROBUST is present this last transformation is suppressed.
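The rank transformation can be tried out with Python's standard library. This sketch reads the formula as inv(F)(k/(n+1)) with F the standard normal distribution function; the function name is hypothetical, and tied values simply receive consecutive ranks here, which may differ from what EDA does.

```python
from statistics import NormalDist

def rank_scores(values, robust=True):
    """Replace values by their ranks; unless robust is False, map each
    rank k to the normal quantile at k/(n+1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    if not robust:                       # corresponds to NOROBUST
        return [float(rank) for rank in ranks]
    normal = NormalDist()                # standard normal, mean 0, sd 1
    return [normal.inv_cdf(rank / (n + 1)) for rank in ranks]

plain_ranks = rank_scores([10.0, 30.0, 20.0], robust=False)
scores = rank_scores([10.0, 30.0, 20.0])
```

Correlations computed on `scores` instead of the raw values are less sensitive to outlying observations, since only the ordering of the data enters.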

The JACKNIFE option without the DIAG_ONLY option computes the jackknifed Pearsonian correlation coefficient into MATRIX. The DIAGNOSTIC_ONLY option computes the coefficient as usual but computes the jackknifed values for diagnostic purposes. The program then displays correlations where the difference between the ordinary coefficient and the jackknifed version is larger than 0.05 (default value) or the value set by DIAG=val.

The idea of the jackknife is to omit a case from the computation (here of the correlation coefficient) and to repeat this with all cases; a more robust correlation coefficient is then computed from the series of coefficients (pseudo-values).

In practice it is often desirable, especially when n is rather large, to omit more than one case for each calculation, thereby reducing the number of pseudo-values to compute. EDA does this under control of the groups option (defaults to n/4). In an example with 24 cases and GRP=4, this means that 4 different calculations are done (4 pseudo-values). In order to form 4 separate sets of cases, n/4, i.e. 6 cases, are omitted each time: the first pseudo-value is computed without cases 1-6, the second without 7-12, the third without 13-18 and the last without 19-24.
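The grouping and the pseudo-values can be sketched in Python. This is an illustration under assumptions, not EDA's algorithm: the function names are hypothetical, the number of cases is assumed to be divisible by the number of groups, and the pseudo-values are formed on the raw correlation in the textbook way (g times the full estimate minus g-1 times the estimate without one group).

```python
import math

def jackknife_omissions(n_cases, n_groups):
    """Case numbers (1-based) omitted for each pseudo-value."""
    size = n_cases // n_groups
    return [list(range(g * size + 1, (g + 1) * size + 1))
            for g in range(n_groups)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def jackknife_correlation(x, y, n_groups):
    """Return (jackknifed estimate, list of pseudo-values)."""
    n = len(x)
    r_all = pearson(x, y)
    pseudo = []
    for omitted in jackknife_omissions(n, n_groups):
        keep = [i for i in range(n) if i + 1 not in omitted]
        r_part = pearson([x[i] for i in keep], [y[i] for i in keep])
        pseudo.append(n_groups * r_all - (n_groups - 1) * r_part)
    return sum(pseudo) / n_groups, pseudo

groups = jackknife_omissions(24, 4)      # the example from the text
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0 * v for v in x]                 # hypothetical, perfectly correlated
estimate, pseudo_values = jackknife_correlation(x, y, n_groups=4)
```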

Reference

Lebart/Fenelon 1979

DISTANCE

  DISTANCE [vlist]  [R=metric] [CASEWISE]

Computes a distance matrix into the MATRIX from all variables in the WA or the vlist depending on the setting of the ALLVARS mode. This matrix may then be used by multidimensional or cluster analyses (NOCOMPUTE options).

Note that in most cases you will not use the DISTANCE command in order to produce a distance matrix, as the various commands working on distance matrices do it automatically and offer the same options as the DISTANCE command.

R option (metric)

The R= option is used to override the default metric setting for computing distance matrices. Initially the default is set to 2, i.e. the default distance is a standard euclidean distance. Setting R=1 yields a city block (Manhattan) metric. Other settings define any Minkowski metric. The SET POWER command is used to control the default values. Note that in order to use any power metric you will need to define it with the SET POWER command (see there for details).

CASEWISE computes distances between cases instead of variables, i.e. behaves as if the WA were transposed. Note that the MATRIX area may contain at most an NVAR by NVAR matrix; therefore, if the length of the variables (max MCAS) is larger than NVAR, this feature may not be used.
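The effect of the metric parameter and of CASEWISE can be illustrated with a Minkowski distance in Python. The function names are hypothetical and the sketch ignores missing values and any scaling EDA may apply.

```python
def minkowski(x, y, r=2.0):
    """Minkowski distance: r=2 is euclidean, r=1 is city block."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

def distance_matrix(columns, r=2.0, casewise=False):
    """Distances between variables (columns) or, if casewise, between cases."""
    items = [list(row) for row in zip(*columns)] if casewise else columns
    return [[minkowski(p, q, r) for q in items] for p in items]

# Hypothetical work area: two variables observed on two cases.
wa = [[0.0, 3.0], [0.0, 4.0]]
between_variables = distance_matrix(wa)
between_cases = distance_matrix(wa, casewise=True)
print(minkowski([0.0, 0.0], [3.0, 4.0], r=1.0))   # city block distance
```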

Interpretation of configurations: the C1,C2 commands

CONFIGURATION

  CONFIGURATION <option> [SCORES | SECOND]
  C1            <option>  same as CONFIGURATION
  C2            <option>  same as CONFIGURATION SCORES

<option>  | INFO or ?
          | [d1,d2] PLOT [XVAR][YVAR] [GVAR | TIES] <plot-opt>
          | LIST [NOSORT | {ASC|DESC} {KEY=dim#}]
          |      [LIMIT=cut | LIMIT=(min,max)] [SHORT | STAT]
          | LIST CODED [NOSORT | {ASC|DESC} {KEY=dim#}]
          | [d] SHOW [CODED {Divide=val}] [STAT] [NCOORD=n]
          | [d1,d2] DPLOT [ALPHA | CONFIG] <plot-opt>
          | HIGHLIGHT [CUT=cut-val]
          | [dlist] PROFIL [SELECT] [UNIT] [UNIT=val]
          | LOAD [TIE=tie#] [AT=var#] [NOMODCASID]
          | vlist STORE ["nam"] [CASLABEL | VARLABEL]
          | DROP
          | SETDIM=(ncoord,ndim)  ["name"]
          | EXCHANGE
          | REMOVE [DIM=ndim]
          | CASLABEL
          | VARLABEL

 C1 ROTATE [VARIMAX | QUARTIMAX]   [NOSCORES | COEFFICIENTS]


<plot-opt>  [NUMBER] [LABELS] [FULL] [BIG] [NOLEGEND]
          [DETAILS {OVERPRINT}]

This command is used to manipulate the two configuration matrices generated by various multidimensional techniques: C1 and C2. Multidimensional methods typically produce some coordinates in a defined space (loadings and scores in the case of the FACTOR command).

CONFIGURATION is a synonym for C1; CONFIG SECOND and CONFIG SCORES are synonyms for C2. The synonyms have been introduced to clarify terminology in the context of analyses with two different configurations.

C1 and C2 are the matrices where these coordinates (factor loadings and scores) are placed by commands like FACTOR, ANACOR etc. The C1 and C2 commands are used to list, plot and manipulate these result matrices. Other specialized commands like CFIT or CFIX take up the information in the matrices to produce additional results.

Default (no option)

If no option or INFO is specified, C1 and C2 show the names of the currently stored configurations, or report that no such matrix has been stored.

LIST

LIST lists the configuration on the terminal and performs some computations specific to the analysis which produced the configuration (e.g. communalities, percent of variation explained etc.). These computations are performed only on variable-related information (for techniques where it makes sense). The computation of this information may be suppressed with the SHORT option or requested, for configurations where it would not be computed by default, with the STATISTIC option.

Whenever there is space on the screen (and always in the print file) the descriptor (in full, or its beginning) is shown, as well as the sequence number of the variable/case (facilitating identification on a PLOT). Descriptors are only shown if the corresponding variable exists in the WA, since C1/C2 keep only labels. [Beware: if several variables in the WA have the same name (possible, but not recommended) the descriptor will be that of the first variable found in the WA.] Labels preceded by a + sign have been added to a configuration, i.e. they do not enter into the computation of communalities and other summary information.

By default the configuration is sorted on the first dimension. The sort order depends upon the setting of the SET SORT switch (default ascending).

The DESCENDING or ASCENDING options override the default sort order; the KEY option asks for sorting on dimension dim# instead of the first dimension. The NOSORT option inhibits automatic sorting on the first dimension.

By default, values smaller than 0.5 in absolute value are not shown on the display and appear as a blank field. The user may change this: LIMIT=(min,max) shows only values outside the interval from min to max, and LIMIT=cut shows only values larger than cut in absolute value.
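The blanking rule can be pictured with a small Python sketch. The helper names are hypothetical, and the treatment of a value lying exactly on the limit is an assumption, since the text does not say whether such a value is shown.

```python
def visible(value, limit=0.5):
    """True if a coordinate would be printed rather than left blank.

    limit is either a single cut (absolute value test, like LIMIT=cut)
    or a (min, max) tuple (only values outside the interval are shown).
    """
    if isinstance(limit, tuple):
        low, high = limit
        return value < low or value > high
    # Assumption: a value exactly equal to the cut is still shown.
    return abs(value) >= limit

def format_row(values, limit=0.5, width=7):
    """Format one row of a configuration, blanking suppressed values."""
    return "".join(
        f"{v:{width}.2f}" if visible(v, limit) else " " * width
        for v in values
    )

row = [0.82, -0.10, -0.61]
default_view = format_row(row)                 # -0.10 is blanked
interval_view = format_row(row, (-0.7, 0.5))   # only 0.82 remains
```

Loading the configuration into the WA remains the way to see every value regardless of size.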

The CODED option displays + and - symbols for intervals on the coordinates. For example, in the case of factor loadings (i.e. correlations), the range from 0 to 1 (absolute value) is divided into segments of width 1/D (D defaults to 8), and for each segment a + or a - symbol is displayed. The LIST on the print file also includes variable descriptors if variables in C1/C2 correspond to variables in the WA. Explanation ratios and communalities are not shown with CODED.

SHOW

SHOW is similar to LIST (regarding the output) but concentrates on a specific dimension (default the first). It shows the four most important positive values and the four most important negative values on that dimension, as well as the values of the same variables/cases on the other dimensions. N= is used to ask for more or fewer cases/variables to be displayed (default 4, i.e. four high values and four low values). Contrary to LIST, the default is never to show communalities and explained variances; STAT overrides this. SHOW also uses the SHORT format. Most options of the LIST command also apply, but are not mentioned as they are not really useful with this command (sorting options), except possibly the LIMIT options.
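The selection performed by SHOW can be sketched as follows in Python. The function name and the data layout (one list of coordinates per variable/case) are assumptions for illustration only.

```python
def show(config, labels, dim=0, n=4):
    """Return (high, low) for dimension `dim` (0-based).

    high: up to n rows with the largest positive coordinates,
    low:  up to n rows with the most negative coordinates.
    Each row is (label, coordinates), so the values on the other
    dimensions travel along with the selected rows.
    """
    rows = sorted(zip(labels, config), key=lambda item: item[1][dim], reverse=True)
    high = [row for row in rows if row[1][dim] > 0][:n]
    low = [row for row in reversed(rows) if row[1][dim] < 0][:n]
    return high, low

loadings = [[0.9, 0.1], [-0.8, 0.2], [0.3, 0.7], [-0.2, -0.5], [0.6, 0.0]]
names = ["A", "B", "C", "D", "E"]
high, low = show(loadings, names, dim=0, n=2)
```

Here high holds A and E, and low holds B and D, each with its coordinates on both dimensions.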

HIGHLIGHT

HIGHLIGHT displays a short profile of the requested configuration and, for each dimension, the top three values with labels (top three positive and top three negative values). This is useful only if the dimensions are centered around the mean. The default cutting value (i.e. the value beyond which a '+', resp. a '-', is displayed) is 0.5, which is suitable for correlations etc. A different cutting value may be specified with the optional CUT=value option; specify a positive value, the cutting value at the low end being the same value with the sign reversed.

PROFILE

PROFIL displays profiles for all or selected dimensions of the specified configuration. The position on each dimension is marked with the number of the corresponding dimension. A section of the display indicates the top 9 coordinates (by absolute value) for each dimension, i.e. coordinates ranked 1..9; ranks beyond 9 are left blank. On the print file the numeric values of the coordinates are also output. The default is to show all dimensions; the SELECT option restricts the display to selected dimensions. Other options deal with the scaling: the default is to scale to the real min/max in the matrix. UNIT specifies that the display should range from -1 to +1, UNIT=val

PLOT

PLOT plots d1 against d2. If d1,d2 are not present the first and second dimensions are plotted.

If nothing else is specified, case-wise data (e.g. factor scores), which are usually stored in C2, are plotted using the (case) identifiers; variable-oriented data (e.g. factor loadings), which are normally stored in C1, are numbered, and a legend is displayed on the left side of the plot. These default plotting modes may be modified: the NUMBER option plots case-oriented data with sequence numbers instead of case identifiers, and LABELS uses the variable-oriented C1 labels instead of numbers.

Besides the symbols requested, several other symbols might be shown. For positions where more than one point should be plotted, two different overprint symbols are used: $ for variables and @ for observations. (Note that these symbols might be different in your EDA version, and you may change them yourself using the SET GRAPH command.) If no information exists, as may happen for a case that is not a member of any group (GVAR PLOT), a ? is shown. Finally, for numbers (GVARs, ties) that do not fit into the allocated positions (say 120 in a two-character position), a # symbol appears.

The BIG option turns the terminal display of the PLOT off and produces a plot having the size of a printer page to the print file.

FULL uses four-character labels instead of two-character labels. Note that four-character labels are always used whenever the width of the plot area is larger than 100 positions, i.e. normally in the print file, but not on the screen. However, more and more screens have those capabilities and many versions of EDA take advantage of that fact. When using two-character labels, it might be useful to alter casids and/or labels. This can be done with the CASID command and/or the ELABEL command within the EDITor. If other display forms are desired, load the dimensions into the WA and use the PLOT commands.

PLOT NOLEGEND : suppresses the legend shown to the right of the plot area. (This legend only appears with plots using numbers referring to variables).

GVAR causes (for case-related configurations) the group memberships to be displayed, instead of the case identifications. Note that they are limited to two characters, any membership greater than 99 is displayed as a '#'; an undefined case (group 0) is shown as '?'.

The TIES option does the same for variable related configurations, but plots table ties, instead of group memberships.

The XVAR and YVAR options tell the program that the X variable (d1), resp. the Y variable (d2), is not to be taken from the corresponding configuration but is to be interpreted as a variable in the WA. Note that C2 1 2 PLOT XVAR YVAR in fact plots variable 1 against variable 2, like PLOT 1,2 CASID; differences occur because of a different treatment of the origin (the origin is always represented) and small differences in the scaling algorithm.

PLOT DETAILS: produces a list of all points and their coordinates to the print file (requires an active print file). The list contains the label shown, the symbol actually plotted, and the coordinates (true x,y values as well as device coordinates, i.e. rows and columns).

PLOT DETAILS OVERPRINT produces the same list, but only for points not shown completely (overprinted).

DPLOT

DPLOT (for Double_PLOT) plots C1 and C2 simultaneously, i.e. in the same space. This is useful for correspondence analysis, configuration comparisons and many other applications; the meaning of such plots, however, has to be considered very carefully. If no option is present, case-oriented data are plotted with identifiers and variable-oriented data as sequence numbers. If both areas are variable-oriented, which often occurs in configuration comparison, the plotted points are numbered sequentially. The displayed text explains the system used. The CONF option is used to contrast two configurations: one configuration is shown as "A" and the other as "B"; individual points are not distinguished. The ALPHA option displays one configuration using A..Z legends, and the other using a..z (lower-case) legends. If more than 26 points are plotted the remaining points are shown as "#". When using these two options the user must make sure that the configuration areas do indeed contain comparable data.

For an explanation of the <plot-opt> options refer to C1 PLOT. The options have the same meaning.

LOAD

LOAD loads the specified configuration as variables into the free positions in the WA. These new variables are tied together as table #<nvar> for C2 and #<nvar>-1 for C1, where <nvar> is an implementation parameter (max number of variables). The TIE=#list option changes the default table tie. The AT= option copies the configuration to location var# and following (in this case existing non-protected variables are overwritten). LOAD normally modifies the case identifiers of the WA, i.e. it replaces them with the configuration-related ids. This may be suppressed with the NOMOD option.

STORE

STORE stores vlist as a configuration into one of the configuration areas. "name" may be used to specify a configuration label. The CASLABEL, resp. VARLABEL, options indicate that the configuration stored is to be considered case-oriented or variable-oriented. Both configuration areas have labels for the coordinates stored, which are used for displaying and plotting. The default is to take C1 as variable-oriented (using the CLABELS) and C2 as case-oriented. The two options on the STORE command are used to modify this default mechanism.

CASLABEL and VARLABEL

These two commands are used to reset the labels: if CASLABEL is specified the matrix is set to be case-oriented, if VARLABEL is present the matrix is set to be variable-oriented. Beware: these options have to be used carefully, because they make it possible to attach the wrong labels to the wrong variables.

SETDIM

SETDIM=(ncoord,ndim) sets the size of the C1 or C2 matrix. SETDIM is needed before filling the C1 or C2 matrix by means of transformation commands (LET, IF), which need to know the size of the matrix in advance. The "name" field is used to set the name (origin) of the configuration.

EXCHANGE

EXCHANGE exchanges C1 and C2. This option is useful when manipulating configurations for instance to prepare a target rotation.

Consider a situation where you want to compare results from a principal components analysis and the TSCALE procedure, i.e. you want the results from TSCALE in C1 and the results (loadings) from FACTOR in C2:

    >FACTOR NDIM=4
    >CONFIG EXCHANGE
    >TSCALE NDIM=4
    >CFIT KAISER

FACTOR produces a C1 and C2. CONFIG EXCHANGE copies C1 into C2 and vice-versa. TSCALE produces a new C1, overwriting the current contents. CFIT then compares the contents of C1 (TSCALE results) and C2 (factor loadings).

DROP

DROP drops a configuration.

REMOVE

REMOVE is used to remove one or more dimensions from a given configuration. Without DIM=, one dimension (the last) is removed. DIM=2 would remove the last two dimensions (compare with DROP).

C1 ROTATE

ROTATE rotates the contents of the C1 matrix using by default the varimax method. Alternatively QUARTIMAX may be specified.

If a C2 matrix related to C1 is present (factor scores), the scores are recomputed automatically unless the NOSCORES option is present. You may also specify COEFFICIENTS if you want the factor score coefficients displayed.

Note that C1 ROTATE does the same as the ROTATE command followed by the SCORES command. Use this form in regular situations, i.e. working with principal components or similar when a ROTATION should be followed by a recomputation of the factor scores.

Alternatives (*)

These matrices may also be used with expressions (see the section on expressions for complete details). As an example you might wish to look at specific rows or columns. This might be done by

        SET EXPRESS DISPLAY
        LET K2
        LET C3

The first command just says that you wish the result of expressions displayed (used for vector-valued results); if you don't do that, EDA will report that vector results are not displayed. The next two lines are expression examples (where the target part is missing and just one term is used; many more could be used). C and K designate the C1, resp. C2, configuration. C3 refers to the third dimension of the first configuration, K2 refers to the second dimension of the C2 configuration.

Note that the Cn/Kn reference is a short form of the matrix references C[] and K[]. C[1,1] means first row (variable or case) and first dimension. Omitting one of the indices designates a complete row or column. Therefore K[1,] will display the values of the first case on all dimensions; C[,1] shows all variables on the first dimension. References to dimensions, e.g. C[,1], K[,4], may be written as C1 and K4. (P.S. Please distinguish clearly between the C1 and C2 configurations, i.e. matrix concepts, and C1, C2 written in an expression context, where they mean the first and the second dimension of the FIRST configuration, i.e. C1; K1 means the first dimension of the second configuration, i.e. C2. See the section on expressions for additional details.)

Rotation and factor scoring

ROTATE

 ROTATE     | {VARIMAX}
            | QUARTIMAX

Rotates the stored C1 (FACTOR, ANACOR, CONFIG) using the VARIMAX, resp. the QUARTIMAX, criterion to achieve simple structure. Note that the C2 area is not modified. Use the SCORES command if you wish to recompute the factor scores.

As an alternative you might want to use the C1 ROTATE command performing, by default, a rotation followed by an automatic recomputation of the factor scores.

SCORES

    SCORES  [COEFFICIENTS]

Computes factor scores into C2. This command is used to recompute factor scores after a ROTATion; unrotated scores are directly available. SCORES requires that C1 and MATRIX be stored and that the data related to them be in the WA; otherwise scores cannot be computed. The scores are computed using:

(inverse R) * S

where R is the correlation matrix and S the factor structure matrix. If the matrices have been produced by principal components, the following formula applies:

     S * inverse(S'S)
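The two expressions can be compared numerically. The Python sketch below reads them as inverse(R) * S and S * inverse(S'S) (this reading of the second formula is an assumption) and uses a toy 2 by 2 correlation matrix with one retained principal component. The helper functions are written out only to keep the example self-contained; multiplying standardized data by the coefficients to obtain scores is the usual regression-style convention, assumed rather than taken from the EDA sources.

```python
from math import sqrt

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse(a):
    """Gauss-Jordan inverse of a small, well-conditioned square matrix."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Toy correlation matrix of two variables with r = 0.6.
R = [[1.0, 0.6], [0.6, 1.0]]
# First principal component: eigenvalue 1.6, eigenvector (1, 1) / sqrt(2),
# so both loadings equal sqrt(1.6 / 2) = sqrt(0.8).
S = [[sqrt(0.8)], [sqrt(0.8)]]

coef_general = matmul(inverse(R), S)                   # inverse(R) * S
coef_pc = matmul(S, inverse(matmul(transpose(S), S)))  # S * inverse(S'S)

def factor_scores(z, coef):
    """Scores = standardized data (cases x variables) times the coefficients."""
    return matmul(z, coef)
```

For loadings obtained from principal components both expressions give the same coefficient matrix, which is why the shorter second form can be used in that case.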

The COEFFICIENT option displays the factor score coefficient matrix.

Compare with C1 ROTATE.

Credit

Based on a program by Dominique Joye.

The MATRIX command

MATRIX

  MATRIX  |{INFO   or  ?}

  MATRIX  | SUMMARY [CUTOFF=val] [NOADJUST]
          | LIST <opt>
          | LIST CODED ["altsym"] <opt>
          | LIST VALUE=val [FUZZ=val] <opt>
          | LIST VALUE=(low,high) <opt>

  <opt>   [SORT{=sortvar#} {ASCENDING | DESCENDING} ]
          [NOROUND] [CUTOFF=val] [NOADJUST]
          [WIDTH=nchars]

MATRIX    | DROP
          | LOAD [TIE=list#]
          | vlist STORE
          | CHECK
          | SET | [DIS]SIMILARITY
          |     | SETDIMENSION=ndim
          |     | "name"
          | elist SUBSET [NOLAB]
          | DIAGONAL{=const}
          | SET [DIS]SIMILARITY
          | EIGENANALYSIS [NORMAL] [EIGVAL{=var#}]
          | SYMMETRIZE LOWHALF | UPHALF

The MATRIX command manipulates MATRIX (see chapter introduction for more information). Options are offered to inspect and manage the MATRIX.

Without option (default) the MATRIX command displays the information attached to the current MATRIX: number of variables, number of cases on which the matrix is based (if available) and descriptive information on the contents of MATRIX.

MATRIX LIST

MATRIX LIST in its various forms is intended to give an overall view of a MATRIX, without going into too much (numerical) detail. All forms show the information related to the size of the values in the lower triangle of the matrix (MATRIX is considered symmetrical), whereas the upper triangle shows the sign of the coefficients. If all coefficients are positive a message is given and the upper triangle is blank. Coefficients are supposed to start at 0 (absolute value). Values not shown because they are too small always appear as a blank character; this is also true for the sign. Specifying CUTOFF=val defines a limit: coefficients below that value (absolute value) are considered 0 and appear as blanks.

The first form (default) displays a matrix showing the size of each coefficient by a single digit 1,2,3,4 .. 9 (0 is blank) and a star (*) for "10" in the lower triangle of the matrix. If the coefficients in the matrix are correlations, i.e. having an absolute value between 0 and 1, 2 corresponds to a correlation of 0.2, 9 of 0.9 and a star of 1.0. If the coefficients are not normalized the result is slightly different: the range of the coefficients (determined from the current MATRIX) is divided into ten intervals and the same digit-symbols are used to show the size of the coefficient.

The digit-symbol is obtained by rounding the current coefficient. Therefore a value of 0.04 will be considered as 0 and 0.051 as 0.1. If you prefer truncation to rounding, specify NOROUND. The CUTOFF option is described above.
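For coefficients in the range -1 to 1, the digit coding can be sketched in Python as below. The function name is hypothetical, and the handling of values exactly halfway between two digits (rounded upwards here) is an assumption.

```python
from math import floor

def digit_symbol(coef, noround=False, cutoff=0.0):
    """One-character size symbol for a coefficient between -1 and 1.

    Returns ' ' for (near) zero, '1'..'9' for tenths and '*' for 1.0.
    Only the absolute value matters; the sign is shown elsewhere.
    """
    size = abs(coef)
    if size < cutoff:          # below CUTOFF: treated as zero
        return " "
    scaled = size * 10
    # NOROUND truncates; otherwise round to the nearest tenth (half up).
    level = int(scaled) if noround else floor(scaled + 0.5)
    level = min(level, 10)
    if level == 0:
        return " "
    return "*" if level == 10 else str(level)

row = "".join(digit_symbol(v) for v in [1.0, 0.62, -0.29, 0.04])
```

The example row comes out as a star, a 6, a 3 and a blank.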

MATRIX LIST CODED: represents in the lower triangle of the displayed matrix a symbol for each interval of the coefficient. The default is to use 5 intervals, where the first interval is represented by a space character and the other 4 by the standard EDA symbols representing growing "density". Again only the absolute value is considered. Alternatively the user may specify other symbols using "alt_symbols". If, for instance, "ABCDFG" (6 symbols) is specified, seven intervals will be used (first interval = space) and the other intervals will be shown as A, B etc.

The NOROUND and CUTOFF options are the same as explained above, except that cutoff here defines the "first" interval represented by a space.

MATRIX LIST VALUE=val is used to mark a specified value in the displayed matrix (absolute value). The value marked is within the precision limits defined by the EDA fuzz value (see SET FUZZ or SHOW), i.e. the range within which val is matched is val-fuzz to val+fuzz. The FUZZ=val option may be used to specify a fuzz value different from the current one.

MATRIX LIST VALUE=(low,high) marks all coefficients in the specified range (absolute values). If the interval limits lie within the cutoff range, the values within the cutoff limit do not appear. The symbol used is the standard mark symbol.

Besides the common options already mentioned (NOROUND, CUTOFF), the LIST forms share other options.

SORT=varnum# sorts the data matrix on varnum# instead of showing the matrix in natural order, i.e. the first column of the MATRIX will contain varnum# as the first row, followed by the other variables in ascending or descending order. The ASCENDING and DESCENDING options are used to select a sort order different from the one specified by the current SET SORTORDER setting.

WIDTH=nchar controls the number of letters from the matrix labels shown vertically. Matrix labels are 8 letters long and therefore use a lot of vertical space on the screen when shown fully; this option (defaults to 4) lets you optimize the screen display for your purpose.

NOADJUST: EDA guesses whether the coefficients lie in a range of -1 through 1 or 0 through 1 by checking all coefficients: if all coefficients are smaller than or equal to 1 then if all values are positive 0 to 1 is assumed, if values are in the range -1 to +1 this range is assumed. This is ok for many situations; however if values are e.g. all smaller than 0.2 the result might not be what you want. The NOADJUST option inhibits these "assumptions" and takes the true minimum and maximum as a basis for scaling.

CUTOFF=val: All coefficients smaller than abs(val) are considered as zero, i.e. are counted in the first column.

MATRIX SUMMARY

MATRIX SUMMARY displays a table of counts of coefficients in MATRIX: the number of coefficients at various levels (5 intervals) is shown. For correlation coefficients, for example, the table shows how many coefficients (absolute values) lie between (near) 0 and 0.2, 0.2 and 0.4, and so forth up to 0.8 to 1. Values close to zero are shown in a separate column. In each interval two numbers might appear: the number of positive values and the number of negative values in that interval. Furthermore a last column with numbers in parentheses may appear, showing coefficients that fall outside the range.

A note on precision: the allocation of the coefficients to each interval is determined by rounding, e.g. a value of 0.04 will be considered as zero (shown in the first column) and a value of 0.06 will be shown in the first interval.

If the theoretical range of the coefficient is not known and cannot be determined, the true maximal coefficient is searched and the intervals determined accordingly. Coefficients are supposed starting at 0 (absolute value).

MATRIX STORE

MATRIX vlist STORE: Stores the variables in the <vlist> as the new MATRIX. The user has to place appropriate information into the work area. This command is used to put alternative distance or similarity coefficients into MATRIX. Note that the matrix formed by the <vlist> needs to be square, i.e. contain the same number of variables and cases. Furthermore you need to know that the maximal size of MATRIX is NVAR.

See below the section on expressions for other ways of modifying the contents of MATRIX.

MATRIX LOAD

Loads the correlation matrix into the work area as variables. Each row of MATRIX will become a variable in the WA, stored into free locations. You may use the TIE=listnum option to tie these new variables to a specific group of variables. Note that with expressions, MATRIX elements and lines or columns may be copied or modified using the M[] reference.

MATRIX DROP

DROP drops a stored matrix, i.e. the MATRIX will no longer be available. Macro programmers: Note that the matrix is not set to zeroes, only the matrix size is set to 0.

MATRIX SUBSET

Subsets the MATRIX, using the elist to indicate which matrix elements to select (rows and columns at the same time: remember the MATRIX is square and symmetric). This command is useful, for instance, to perform several factor analyses on different subsets of variables without recomputing the correlations each time. The NOLAB option inhibits the modification of the labels of the matrix (note that these labels are shared with any variable-oriented configuration stored in C1 and/or C2).
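Selecting rows and columns at the same time amounts to the following, shown here as a Python sketch. The function name is hypothetical and the element list is taken to be 1-based positions, which is an assumption about how an elist is numbered.

```python
def subset(matrix, labels, elist):
    """Keep the rows and columns whose 1-based positions are in elist.

    Returns the reduced square matrix and the matching labels.
    """
    idx = [e - 1 for e in elist]
    reduced = [[matrix[i][j] for j in idx] for i in idx]
    return reduced, [labels[i] for i in idx]

corr = [[1.0, 0.2, 0.5],
        [0.2, 1.0, 0.3],
        [0.5, 0.3, 1.0]]
reduced, kept = subset(corr, ["AGE", "INCOME", "EDUC"], [1, 3])
```

The reduced matrix stays square and symmetric, so it can be analysed again without recomputing the coefficients.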

MATRIX DIAGONAL

DIAGONAL reads in new diagonal elements for a stored matrix. This might be used, e.g. to enter communality estimates for the FACTOR command. The form DIAG=constant stores <constant> in all diagonal elements of the MATRIX.

MATRIX EIGENANALYSIS

EIGENANALYSIS computes eigenvalues and eigenvectors of the symmetrical MATRIX. MATRIX is replaced by the matrix of eigenvectors. NORMALIZE requests the normalization of eigenvalues. EIGVAL copies the eigenvalues into a variable. NOTE: MATRIX must be symmetrical.

MATRIX SYMMETRIZE

SYMMETRIZE allows for symmetrization of a triangular matrix. LOWHALF stores the values in the lower triangle in the upper, regardless of the values found there. UPHALF does the same, but the upper half triangle is copied into the lower.
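The two directions of copying can be sketched in Python as follows; the function name and the string values used for the option are illustrative choices, not EDA syntax.

```python
def symmetrize(matrix, half="LOWHALF"):
    """Return a symmetric copy of a square matrix.

    LOWHALF copies the lower triangle into the upper one,
    UPHALF copies the upper triangle into the lower one.
    Values already present in the target triangle are overwritten.
    """
    n = len(matrix)
    out = [row[:] for row in matrix]
    for i in range(n):
        for j in range(i):          # (i, j) is below the diagonal
            if half == "LOWHALF":
                out[j][i] = matrix[i][j]
            else:
                out[i][j] = matrix[j][i]
    return out

tri = [[1, 9, 9],
       [2, 1, 9],
       [3, 4, 1]]
from_lower = symmetrize(tri, "LOWHALF")
from_upper = symmetrize(tri, "UPHALF")
```

The diagonal is left untouched in both cases.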

MATRIX CHECK

Checks the current MATRIX and reports the following: whether it is symmetrical or not; the number of negative, positive and zero values in the matrix, as well as the minimum and maximum coefficient found.
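A Python sketch of such a report is given below. The function name and the dictionary keys are hypothetical, and symmetry is tested by exact equality here, whereas EDA may well apply its fuzz value.

```python
def check(matrix):
    """Summarize a square matrix: symmetry, sign counts, extremes."""
    n = len(matrix)
    flat = [v for row in matrix for v in row]
    return {
        # Exact comparison; a tolerance could be used instead.
        "symmetric": all(matrix[i][j] == matrix[j][i]
                         for i in range(n) for j in range(i)),
        "negative": sum(1 for v in flat if v < 0),
        "positive": sum(1 for v in flat if v > 0),
        "zero": sum(1 for v in flat if v == 0),
        "min": min(flat),
        "max": max(flat),
    }

report = check([[1.0, -0.3, 0.0],
                [-0.3, 1.0, 0.5],
                [0.0, 0.5, 1.0]])
```

Counts refer to all elements of the matrix, diagonal included.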

MATRIX SET

SETDIM=ndim. Sets dimensions of MATRIX for calculations using matrix references. Remember that the MATRIX is square. Note that this command is needed when storing information into MATRIX using expressions.

SET [DIS]SIMILARITY sets the similarity or dissimilarity attribute for the currently stored matrix. Some multivariate techniques need to know whether the coefficients are similarities or dissimilarities. Usually this attribute is set by the creation command or may be specified upon calling the analysis command (e.g. MDS). The SET command is useful to avoid unnecessary typing of options.

"name" sets the MATRIX descriptor to "name".

Related commands

With the LET/IF family of commands it is possible to manipulate the matrix using the M[] reference. Each element may be accessed, as well as single rows and columns. Let us consider an example:

Let the WA contain 3 variables having values:

 v#1   1 2 3
 v#2   4 5 6
 v#3   7 8 9

Store the data with a MATRIX STORE; then using the LET command:

  >SET EXPRESS DISPLAY
  >LET M[,1]

  will display 1 4 7

  >LET M[1,]
    1 2 3

Note that SET EXPRESS DISPLAY is required in order to display the result of a non scalar computation; if you are in default mode an error message tells you that vector results may not be displayed (only assigned to some variable).

As a second example, let us consider a problem where you want to make sure that MATRIX contains only positive values, i.e. taking the absolute value of all elements.

The command LET M[3,2]=ABS(M[3,2]) takes the absolute value of one element (3rd row, 2nd column). In order to perform the same operation for all elements you need to write the following command:

 >EXECUTE FOR R END=$MDIM "LET M[R,C]=ABS(M[R,C]) \ FOR C END=$MDIM"

LET M[R,C]=ABS(M[R,C]) is the command to be executed repeatedly, R being an index to the current row and C to the current column. EXECUTE controls the row loop, starting at row 1 (default, no need to specify it) and terminating after $MDIM iterations. $MDIM is a system constant, indicating the size of the current MATRIX. EXECUTE executes the command found in between the quotes, i.e. the LET command, controlled by a second loop, taking variable C from 1 to $MDIM.
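For readers more familiar with conventional programming languages, the nested loop expressed by the EXECUTE command corresponds to the Python sketch below. The function name is hypothetical; mdim plays the role of $MDIM, and indices start at 0 here instead of 1.

```python
def abs_all(matrix):
    """Replace every element of a square matrix by its absolute value."""
    mdim = len(matrix)            # role of $MDIM
    for r in range(mdim):         # outer loop: FOR R END=$MDIM
        for c in range(mdim):     # inner loop: FOR C END=$MDIM
            matrix[r][c] = abs(matrix[r][c])
    return matrix

m = abs_all([[1.0, -0.4], [-0.4, 1.0]])
```

The matrix is modified in place, just as the LET command modifies MATRIX itself.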

Configuration fitting

CFIT

    CFIT <method> [FIT] [RESID]

Fits (compares) two configurations stored in C1 and C2 respectively (target rotation).

The following methods are available:

    SCHONEMANN&CARROL
    KAISER
    AHAMVAARA&RUMMEL [FULL]

Summary statistics and the transformation matrix are displayed on the terminal. Optionally the FITted matrix can be placed in C2 and the RESIdual matrix into C1.

The third technique (AHAM) has an additional option, FULL, which writes additional matrices to the print file (the print file should be open, otherwise FULL is ignored).

References

Lingoes 1973; Rummel 1970; Schonemann & Carroll; Veldman 1967. Credits: Program sources from Lingoes 1973 and Veldman 1967, with help from Dominique Joye.

CFIX

    CFIX [RESID] [CONFIG]

Fits the configuration in C1 to the stored MATRIX. This command is useful for imposing dimensional structures and for factor comparisons. It requires that a MATRIX and a C1 be stored.

This command makes it possible to describe, using hypothesis vectors, the sequential arbitrary factors to be approximated by orthogonal factors. This extraction of arbitrary factors lets the researcher specify the loadings (s)he would like to have on each factor. This prescription is respected as far as possible, subject to the requirement that each factor be orthogonal to the preceding factor.

The RESID option copies the residual matrix (R-Matrix) into MATRIX. As one of the uses of CFIX is often to remove control factors, this residual matrix may be used as input to a principal component analysis: FACTOR NOCOMPUTE would take up this residual matrix and perform a principal component analysis on it.

CONFIG copies the resulting factor pattern into C2.

If the CONFIG option is present the display of the factor pattern matrix is suppressed (use the C2 command).

Reference

Cooley & Lohnes 1971, chapter 5.2

Credit

Adapted from Cooley and Lohnes by Dominique Joye