Note that several of the commands described here are useful in other contexts; in particular, all commands that compute coefficients into MATRIX may be used in connection with the VHIERARCHY command.
The following commands are documented in this section:
ANACOR      correspondence analysis (Benzecri)
BASSOC      compute binary association measures
C1          analyse and manipulate the C1 matrix
C2          analyse and manipulate the C2 matrix
CANONICAL   Canonical analysis
CFIX        fit two configurations
CFIT        configuration comparison
CONFIGUR    configuration manipulation
CORRELATE   compute a correlation matrix
DISTANCE    compute distance matrix
FACTOR      Factor analysis
MDS         multidimensional scaling (non-metric)
MINISSA     smallest space analysis
ROTATE      configuration rotation
SCORES      factor scoring
TSCALE      Torgerson scale
MATRIX: contains the dissimilarity or similarity measures used with the dimensional techniques and usually computed on variables from the WA.
C1 (configuration 1) contains the first result matrix from a dimensional analysis and usually defines the variable space.
C2 (configuration 2) contains the second result matrix from a dimensional analysis and is usually related to the observation space.
Let us consider principal component analysis (default operation):
  compute correlations on the variables
  perform an eigenvalue/eigenvector decomposition
  compute factor loadings
  compute factor scores
  interpret results

This sequence, which is similar to most other dimensional analyses, translates into EDA as follows:
Principal components is invoked with the FACTOR command:
(1) Then EDA picks the variables from the WA as specified (see below) and computes correlations and stores them into MATRIX.
(1a) Examine the correlation matrix using the MATRIX command.
(2) Eigenvalue/vector analysis is performed and diagnostic information shown; the user then selects the number of dimensions.
(3) Compute factor loadings and store them into C1.
(4) Compute factor scores and store them into C2.
(5) Examine (list, plot etc.) C1 and C2 using the C1/C2 commands.
(6) Use further commands to study the results, e.g. perform a hierarchical cluster analysis on C2 (factor scores).
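The numerical steps (1) to (4) above can be illustrated outside EDA with a minimal Python sketch. It is not EDA's implementation: the function names are hypothetical, only the first component is extracted (by power iteration), and variances use the divisor n. The correlation matrix plays the role of MATRIX, the loadings that of C1 and the scores that of C2.

```python
import math

def standardize(column):
    """Center a variable and scale it to unit variance (divisor n)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / sd for x in column]

def correlation_matrix(z_columns):
    """Step (1): correlations between standardized variables."""
    n = len(z_columns[0])
    return [[sum(a * b for a, b in zip(u, v)) / n for v in z_columns]
            for u in z_columns]

def dominant_eigenpair(matrix, iterations=200):
    """Step (2): largest eigenvalue and its eigenvector by power iteration.

    The start vector (1, 2, 3, ...) is an arbitrary choice; the method
    fails if it happens to be orthogonal to the dominant eigenvector.
    """
    p = len(matrix)
    vector = [float(i + 1) for i in range(p)]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(p))
                   for i in range(p)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    eigenvalue = sum(vector[i] * sum(matrix[i][j] * vector[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, vector

def first_component(columns):
    """Steps (1)-(4) for the first principal component only."""
    z = [standardize(c) for c in columns]
    r = correlation_matrix(z)
    eigenvalue, vector = dominant_eigenpair(r)
    root = math.sqrt(eigenvalue)
    loadings = [v * root for v in vector]                    # step (3)
    scores = [sum(zc[k] * v for zc, v in zip(z, vector)) / root
              for k in range(len(columns[0]))]               # step (4)
    return r, eigenvalue, loadings, scores

# Three made-up variables observed on six cases.
data = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        [2.1, 3.9, 6.2, 7.8, 10.1, 12.2],
        [5.0, 3.0, 4.0, 1.0, 2.0, 0.5]]
r, eigenvalue, loadings, scores = first_component(data)
print("eigenvalue:", round(eigenvalue, 3))
print("loadings:  ", [round(x, 3) for x in loadings])
print("scores:    ", [round(x, 3) for x in scores])
```

The scores are scaled to unit variance here; whether EDA scales them the same way is not stated in this section.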
Note that this is the default way of doing things; many options are provided for the more sophisticated user in order to modify these defaults, e.g. FACTOR may be directed to pick the values it finds in MATRIX, instead of computing correlations.
As in other instances (CASID, GVAR) C1, C2 and MATRIX designate a program concept (matrices here), as well as the commands used to manipulate them. C1, C2 manipulate the configuration matrices, and MATRIX may be used to perform operations on MATRIX.
Furthermore if the specialized facilities for C1, C2 and MATRIX are not sufficient it is always possible to load these matrices into the work area and treat the data the same way as any other variable.
If you are an advanced user you might even write macro commands that use the configurations produced by FACTOR directly.
Let us take the FACTOR command as an example:
FACTOR
FACTOR vlist

When a variable list is specified on the command line, the variables in the list are analyzed, no matter how ALLVARS is set. However, if no list is present, the behaviour is different. With ALLVARS ON all variables in the WA are taken (of course the WA then has to be rectangular); with ALLVARS set to OFF the previous list is used, i.e. EDA works the same way as with the descriptive commands.

              FACTOR                 FACTOR vlist
ALLVARS ON    all variables in WA    variables in list
ALLVARS OFF   previous list          variables in list
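The selection rule described above can be restated as a small decision function. This is only a Python sketch of the rule as documented here; the function and parameter names are hypothetical and nothing in it is EDA code.

```python
def resolve_vlist(command_list, allvars_on, wa_variables, previous_list):
    """Return the variables an ALLVARS-sensitive command would analyze.

    command_list   -- variables typed on the command line (may be empty)
    allvars_on     -- True for ALLVARS ON, False for ALLVARS OFF
    wa_variables   -- all variables currently in the work area (WA)
    previous_list  -- the list used by the previous command
    """
    if command_list:
        # An explicit list always wins, whatever ALLVARS says.
        return list(command_list)
    if allvars_on:
        return list(wa_variables)
    return list(previous_list)

wa = ["age", "income", "educ", "score"]
print(resolve_vlist([], True, wa, ["age", "educ"]))          # FACTOR, ALLVARS ON
print(resolve_vlist([], False, wa, ["age", "educ"]))         # FACTOR, ALLVARS OFF
print(resolve_vlist(["income"], False, wa, ["age", "educ"])) # FACTOR vlist
```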
ANACOR [vlist] [NDIM=ndim] [EIGEN{=v#}] [NOSCORE] [CWEIGHT] [VWEIGHT] [CDIST2] [VDIST2]

Performs a simple correspondence analysis (Benzecri) on the whole WA or on the variables in the vlist (the command is ALLVARS sensitive). ANACOR allows for simultaneous representation of cases and variables in the same space using a chi-square distance (stored into MATRIX) and produces a C1 (variables) and a C2 (observations) configuration.

Subcommands:
ANACOR [vlist] SUPVARS
ANACOR d1,d2 PLOT | [BOTH]    | <opt>
                  | CASES     |
                  | VARIABLES |
ANACOR VCONTributions <copt>
ANACOR CCONTributions <copt>
<copt> :== [RELATIVE] [CONFIG | COPY_to_WA]
ANACOR NEW
<opt> --> see C1/C2 PLOT for details
Whenever a row or a column of the matrix analyzed has a sum of zero, a message is issued, telling which row, resp. column is concerned and that the sum has been replaced by a very small value.
The basic workings of this command are analogous to those of the FACTOR command, i.e. the main command performs the initial computational tasks and produces the requested configurations. Further information can be requested with sub-commands, i.e. commands issued immediately after the main command.
The NOSCORES option inhibits the creation of the C2 matrix (individual scores); in this case the subcommands are disabled. The NOSCORE option is useful when comparing configurations and you do not wish to overwrite the previous C2 matrix.
NDIM=nd lets you specify the number of dimensions to compute (if this option is not present you will be asked).
SUPVARS adds variables to the current configuration (projection of variables not used in the computation into the computed configuration). Supplementary variables added appear on lists and plots with a + sign in the first label position.
The NEW option lets you start a new ANACOR analysis when the program is expecting a subcommand. If you type ANACOR without a subcommand immediately after another ANACOR command, EDA issues an error message. This prevents the auto-repeat mechanism from starting the whole computation all over again whenever a subcommand is mistyped.
V/CCONTRIBUTION produce absolute and relative contributions for the variables (VCONT) or the cases (CCONT). If RELATIVE is not present, absolute contributions are produced. Default is to display a table; options are provided to copy the result either into the WA as variables (as many as dimensions): COPY or into C1 (variables) or C2 (cases): CONFIG. This last possibility destroys the coordinates; it is only provided for refined analysis with the C1/C2 LIST command. For these reasons subcommands using a modified C1 or C2 are disabled whenever the CONFIG option has been used. Therefore if you intend to use several sub-commands use CONFIG last.
The names of the variables generated with the COPY option are as follows:
aaCbbDnn

where aa = Ab for absolute contribution
           Rl for relative contribution
      bb = Vr for variable space
           Cs for observation space
      nn = the number of the dimension

Therefore AbCCsD02, for example, means Absolute contributions of the cases to the second dimension.
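The naming pattern can be checked with a short Python sketch. The helper and its argument names are hypothetical, and the two-digit zero padding of the dimension number is inferred from the single example AbCCsD02.

```python
def contribution_name(kind, space, dimension):
    """Build a variable name following the aaCbbDnn pattern."""
    prefixes = {"absolute": "Ab", "relative": "Rl"}
    spaces = {"variables": "Vr", "cases": "Cs"}
    # "C" and "D" are the fixed letters of the pattern.
    return f"{prefixes[kind]}C{spaces[space]}D{dimension:02d}"

print(contribution_name("absolute", "cases", 2))
print(contribution_name("relative", "variables", 1))
```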
CWEIGHTS and VWEIGHTS copy the weights of the cases and variables into a variable in the WA. This is useful for tricky computations with the results. [These weights are the relative marginal frequencies].
VDIST2 or CDIST2 copy the distances of each variable (case) to the gravity center of the variable (case) space. Like CWEIGHT/VWEIGHT, these options are used in macros or for special analysis needs.
CANONICAL vlist1&vlist2 [<corr>] [SCORES={var}]
CANONICAL NOCOMPUTE [NCASE=nc]
Performs a canonical correlation analysis.
There are two modes of operation: (1) The default mode computes first a Pearsonian correlation matrix (or as specified by <corr>; see CORRELATE for a full explanation) from the variables specified and stores it in MATRIX. This mode requires two variable lists, i.e. the first list corresponds to the variables in the left set and the second list to the variables in the right set. Specify these variables in the vlist field and separate the two lists with the "&" symbol.
(2) The NOCOMPUTE option inhibits the automatic calculation of a MATRIX and takes up the stored MATRIX. The NCASE= option is required if the number of cases on which the MATRIX is based is not known. This is only the case when a user defined MATRIX has been stored (e.g. using the MATRIX command or direct computations).
CANON then requests the variables in the left set and in the right set, which are obtained from the user as two separate lists. You should then enter the numbers of the variables (i.e. the positions they occupy in MATRIX). It is not possible to specify variable names.
Besides the displayed results, the configuration of the left set is stored in C1 and the configuration of the right set in C2 (canonical correlations). Note that this command destroys the initially computed MATRIX.
The SCORES option computes individual scores and copies them as new variables into the WA. If SCORES is used the new variables are stored in free locations of the WA; SCORES=var# directs EDA to copy them into variable locations starting with variable number var#. The generated variable names are of the form csorl<xxx> and csorr<xxx> for the scores for the left set, resp. the right set, where <xxx> is a number indicating the number of the factor. Furthermore the scores for each set are tied to a different list; the number of the list is reported.
SCORES is not available with the NOCOMPUTE option. Limitations: CANON is internally limited to analyzed sets of up to NVAR/2 in size each.
FACTOR [vlist <corr> | NOCOMP] [NF=nfac] [NOSCORES] [MAXROW] [EIGEN{=v#}] [NNORM] [GABRIEL {EUCLID} {VSCORES}]

Computes a factor analysis for the variables in the WA (whole or vlist, depending on the setting of the ALLVARS mode) or for the MATRIX stored using the MATRIX STORE command. The factor loadings are stored in C1, the scores in C2. <corr> are options controlling the computation of the matrix to be analyzed. For a complete description of <corr> refer to the CORRELATE command; all options apply here also.

Sub-commands:
FACTOR | VSCORE C1 | *
FACTOR dim1,dim2 PLOT | [BOTH] | <opt> * | [CASES] | |VARIABLES | *
<opt> see C1/C2 PLOT for details
FACTOR NEW
*) applies to GABRIEL option only
MDS [vlist] <dist> | NOCOMPUTE {SIMIL | DISSIMIL} NDIM=ndim | ELBOW [KEEP=ndim] [ICONFIG] [PRIMARY] [STR2] [CBLOCK] [STRMIN=val] [MAXIT=max.iter] [SILENT | {DHIST=freq} [SHEPARD {DISTANCES}]

Performs Kruskal-Shepard multidimensional scaling (Guttman 1968). The program normally computes a distance matrix on all variables in the WA or the variables specified by <vlist> depending upon the setting of the ALLVARS mode.
<dist> : all options allowed on the DISTANCE command, namely R=metric and CASEWISE (see there for details). Note that the default distance depends upon the setting of the SET POWER command (default: euclidean distances).
If the NOCOMPUTE option is used, MDS picks up the MATRIX stored by the user. If the matrix is of unknown type the SIMIL, resp. DISSIMIL option is needed to specify whether MATRIX contains similarities or dissimilarities (e.g. distances).
Either the number of dimensions to compute is specified, or the user wishes an elbow diagram. In this case all solutions are computed and the solution with dimensions specified by the KEEP option (default 2) retained. In any case the resulting configuration is stored in CONFIG.
The user might specify his own initial configuration by storing it in C2 and specifying the ICONFIG option (default: program generates initial configuration).
CBLOCK specifies city block metric, instead of euclidean metric. PRIMARY asks for primary approach for ties, instead of the default secondary approach. MAXITER specifies the maximum number of iterations (default 50), STRMIN the minimal stress to be attained by the program (default 0.01).
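How STRMIN and MAXITER interact as stopping rules can be sketched in Python. The stress measure below is Kruskal's stress formula 1, which is an assumption: this section does not say which formula MDS reports (the STR2 option suggests that more than one exists). All names are hypothetical; only the defaults 0.01 and 50 are taken from the text above.

```python
import math

def kruskal_stress(distances, dhats):
    """Stress formula 1: sqrt(sum((d - dhat)^2) / sum(d^2))."""
    numerator = sum((d - h) ** 2 for d, h in zip(distances, dhats))
    denominator = sum(d * d for d in distances)
    return math.sqrt(numerator / denominator)

def should_stop(stress, iteration, strmin=0.01, maxit=50):
    """Stop when the stress target is reached or the iterations run out."""
    return stress <= strmin or iteration >= maxit

# Configuration distances and their monotone fit (d-hats) for four pairs.
d = [1.0, 2.0, 2.5, 4.0]
dhat = [1.0, 2.25, 2.25, 4.0]
s = kruskal_stress(d, dhat)
print(round(s, 4), should_stop(s, iteration=12))
```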
SHEPARD displays the Shepard diagram, plotting the initial data versus the d-hats; if the DISTANCE option is also present an additional diagram is plotted with the distances.
MDS is an iterative procedure with some possible pitfalls (local minima, degenerate configurations etc.), where it is useful to consult the computation history. Default is to display each second iteration. The DISP= option can be used to specify more or less frequent display. SILENT suppresses the computation history completely. NODISPLAY suppresses the history on the terminal.
Note: All Results are for the final solution stored in C1, i.e. with ELBOW the solution kept (default 2 dimensional, or as set by the KEEP option ).
MINISSA
MINISSA NDIM=dim | ELBOW [KEEP=dim#] | CHECK [SIMILARITIES | DISSIMILARITIES] [G-L] | KRUSKAL [LOCAL] [CBLOCK] [MAXIT=max.steps] [RDIST] [ICONF] [INLRCONF] [TOL=min] [SILENT | {DISP=freq} [CUT=val {LARGE}] [STRMIN=min] [SHEPARD]

NOTE: This is an experimental implementation; it is not yet complete, and not all options described in Lingoes 1973 are available yet.
MINISSA: Michigan Israel Netherlands Integrated Smallest Space Analysis is an adapted version from the Guttman-Lingoes Non metric program series. (Monotone Distance Analyses, Unconditional: MDA-U).
This command works on the (dis)similarity matrix stored in MATRIX (a MATRIX must be stored by the user) and produces the requested number of dimensions into C1. The SIMI/DISSI options are used to tell the system whether the coefficients in MATRIX are similarities (e.g. correlations) or dissimilarities (e.g. distances), if the matrix type is not known. (This is usually only the case if the matrix stored is a user defined matrix stored by CONFIG store). These options may NOT be used to override the matrix type; if the type is known they are meaningless.
Unless the Kruskal option is specified, the program uses the Guttman-Lingoes "soft-squeeze", double phase, followed by single phase (rank-images for semi-strong monotonicity).
The KRUSKAL option uses Kruskal's monotone regression (single-phase) and minimizes Kruskal's stress. [To use this option the same way as described by G-L you should do first a MINISSA without this option, use then the result as input configuration for a KRUSKAL run (ICONF option, after CONFIG EXCHANGing).]
The other options have the following meaning: LOCAL uses local monotonicity instead of the default global monotonicity. CBLOCK specifies City-block instead of Euclidean metric.
RDISTANCE computes the derived coefficients (relative distances) and stores them in MATRIX, i.e. replaces the original coefficients.
The MAXIT option controls the maximum number of iterations (default 50).
ICONF takes up C2 as initial configuration, i.e. this allows any arbitrary user defined configuration to be used as starting configuration. Otherwise the starting configuration is program generated.
INLR generates Lingoes-Roskam initial configuration.
The TOL=value option sets the tolerance for convergence (termination criterion) to a value other than the default 1.0e-6 (if ELBOW is used, TOL defaults to 1.0e-4 to speed up computations).
The CUT=val option is used to specify the treatment of ties. The val indicates the value at or above/below which all input coefficients will be considered as tied. Unless the LARGE option is specified, small distances are tied (which achieves clustering). LARGE ties large values (providing yet another way of achieving local monotonicity).
The ELBOW option produces an elbow (scree) diagram, computing all solutions from 1 to 8 dimensions. This option is rather time consuming. The KEEP=dim# option is used to keep the dim#th dimension as result. If it is not given, the two dimensional solution is stored.
The STRMIN=val option specifies the minimal stress to be achieved by the procedure (convergence criterion).
The SHEPARD option displays the Shepard diagram, plotting proximities (d-hat or d* for G-L, resp. KRUSKAL) versus distances. The SILENT option inhibits display of each second iteration at the terminal (which is default) (see MDS for the remaining options).
SEE ?MINISS to check whether there is a limitation (if no
message talks about limits, there are none).
TSCALE
TSCALE [vlist] [<dist>] [BMATRIX] [E=v#] [C=const] [N=nfact] [NOCOMPUTE {SIMILAR | DISSIMILAR}]

Computes a Torgerson Scale (metric MDS without iteration). The program attempts to define a Euclidean space for the MATRIX stored. All the variables in the WA or the vlist, depending on the setting of the ALLVARS mode, are included in the analysis. Unless the NOCOMPUTE option is present, the program automatically computes a distance matrix into MATRIX (distance, metric=2, i.e. R=2); otherwise the MATRIX already stored is used.
If the N=nfact option is not present, the user is asked to specify the number of dimensions to be computed and put into C1. This configuration may then be analyzed with the C1 command.
<dist> are all options allowed on the DISTANCE command, i.e. R=metric and CASEWISE (see there for more details). Note that the default metric is controlled by the SET POWER command.
If the matrix has been produced by an EDA command, e.g. CORREL or DISTANCE, the SIMIL/DISSIMIL options need not be specified; the program determines the value of the option itself. Otherwise SIMIL or DISSIMIL must be specified to tell whether the analysis is done on similarity or dissimilarity measures; for similarities instead of distances, SIMILAR should be specified.
BMATRIX copies the B-matrix (see reference) into MATRIX. The user is asked to specify the number of dimensions to be computed and put into C1. These dimensions are then analyzed using the C1 command.
The C option adds the constant specified to the distances stored. (Additive constant problem). DMATRIX puts the final distances (computed distances) into MATRIX.
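For readers without the reference at hand, the following Python sketch shows the textbook construction of Torgerson's B-matrix: the distances are squared and double-centered. That EDA's BMATRIX holds exactly this matrix, and that the additive constant is applied to the off-diagonal distances only, are assumptions. The function name is hypothetical.

```python
def b_matrix(dist, constant=0.0):
    """Double-center the squared distances: B = -1/2 * J * D^2 * J.

    dist     -- symmetric distance matrix (list of rows)
    constant -- additive constant applied to off-diagonal distances
                (assumed to correspond to the C option)
    """
    n = len(dist)
    squared = [[0.0 if i == j else (dist[i][j] + constant) ** 2
                for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in squared]  # equal to column means
    grand_mean = sum(row_means) / n
    return [[-0.5 * (squared[i][j] - row_means[i] - row_means[j] + grand_mean)
             for j in range(n)] for i in range(n)]

# Three points on a line at positions 0, 1 and 2.
distances = [[0.0, 1.0, 2.0],
             [1.0, 0.0, 1.0],
             [2.0, 1.0, 0.0]]
for row in b_matrix(distances):
    print([round(value, 6) for value in row])
```

The eigenvectors of B, scaled by the square roots of their eigenvalues, give the coordinates; that step is left out of the sketch.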
The N= and E= options have the same meaning as on the FACTOR command.
NOTE that for standard use of the analysis commands (except for MINISSA)
these commands are not needed, i.e. the default values always call
directly the appropriate procedure (e.g. Pearsonian correlations for
FACTOR or euclidean distances for MDS or TSCALE).
In order to use one of the commands described here you will need to
specify the NOCOMPUTE option on the main analysis commands.
The main options of CORRELATE are directly available with the FACTOR
command; the same is true with MDS or TSCALE with respect to the
DISTANCE options.
BASSOCIATION
BASS [vlist] <coefficient> [CASEWISE] [DIVISION=div]

Computes binary association measures into MATRIX. The command is ALLVARS mode sensitive.
Binary association measures are appropriate for dummy variables. EDA does not require that your variables have only 0-1 values; it simply considers positive non-zero values as ones and values equal to zero or less as zeros. If DIVISION is present, values smaller than or equal to div are considered 0 (false) and values above div 1 (true).
CASEWISE computes association measures between CASES instead of variables, i.e. behaves as if the WA were transposed (see DISTANCE for more details).
For some coefficients there are situations where the coefficient is not defined; this means in terms of the formulas below that a division by zero would occur. To avoid this EDA adds a very small value, causing therefore the corresponding coefficient to become rather large.
Note also that most of the coefficients are similarity measures, however several are dissimilarities (distances). When using these coefficients as input to commands like MDS, MINISSA or VHIERARCHY you need not worry about the problem; for other uses however you need possibly a transformation.
 No Code Type Description
  1 ACP  S    Average conditional probability of a 1-1 match
  2 COS  S    Cosine between vectors (Ochiai)
  3 ACM  S    Average cond. probability match, also called Sokal and Sneath 4
  4 GAC  S    Geometric average of cosines
  5 PH2  S    Phi squared
  6 LAM  S    Goodman and Kruskal's Lambda
  7 YYU  S    Yule's Y
  8 RUS  S    Russel and Rao's probability of a 1-1 match
  9 AND  S    Anderberg's D
 10 SMA  S    Simple match probability
 11 JAC  S    Jaccard coefficient
 12 ROG  S    Rogers and Tanimoto
 13 DIC  S    Dice coefficient
 14 MDI  D    Metric distance
 15 SS1  S    Sokal & Sneath 1
 16 SS2  S    Sokal & Sneath 2
 17 SS3  S    Sokal & Sneath 3
 18 SS5  S    Sokal & Sneath 5
 19 KU1  S    Kulczynski 1
 20 KU2  S    Kulczynski 2
 21 HAM  S    Hamann
 22 QYU  S    Yule's Q
 23 PHI  S    Phi
 24 BEU  D    Binary Euclidean
 25 SBE  D    Squared BEU
 26 SIZ  D    Size difference
 27 PAT  D    Pattern difference
 28 BSH  D    Binary shape
 29 DIS  D    Dispersion similarity
 30 VAR  D    Variance dissimilarity
 31 BLW  D    Binary Lance and Williams non-metric dissimilarity

D = distance or dissimilarity, S = similarity
Formulas

              Y
            1   0
          ---------
 X    1  |  a   b
      0  |  c   d          n = a+b+c+d
  1 ACP  (a/(a+b) + a/(a+c)) / 2
  2 COS  sqrt(a*a / ((a+b)(a+c)))
  3 ACM  (a/(a+b) + a/(a+c) + d/(b+d) + d/(c+d)) / 4
  4 GAC  sqrt(a*a*d*d / ((a+b)(a+c)(b+d)(c+d)))
  5 PH2  (ad-bc)**2 / ((a+b)(c+d)(a+c)(b+d))
  6 LAM  t1 = max(a,b) + max(c,d) + max(a,c) + max(b,d)
         t2 = max(a+c,b+d) + max(a+b,c+d)
         LAM = (t1-t2) / (2*(a+b+c+d) - t2)
  7 YYU  (sqrt(ad) - sqrt(bc)) / (sqrt(ad) + sqrt(bc))
  8 RUS  a / n
  9 AND  (t1+t2) / (2*(a+b+c+d))          t1, t2: see LAM
 10 SMA  (a+d) / n
 11 JAC  a / (a+b+c)
 12 ROG  (a+d) / ((a+d) + 2(b+c))
 13 DIC  2a / (2a+b+c)
 14 MDI  (b+c) / n
 15 SS1  2(a+d) / (2*(a+d) + b + c)
 16 SS2  a / (a + 2(b+c))
 17 SS3  (a+d) / (b+c)
 18 SS5  ad / sqrt((a+b)(a+c)(b+d)(c+d))
 19 KU1  a / (b+c)
 20 KU2  (a/(a+b) + a/(a+c)) / 2
 21 HAM  ((a+d) - (b+c)) / (a+b+c+d)
 22 QYU  (ad-bc) / (ad+bc)
 23 PHI  (ad-bc) / sqrt((a+b)(a+c)(b+d)(c+d))
 24 BEU  sqrt(b+c)
 25 SBE  b+c
 26 SIZ  (b-c)**2 / (a+b+c+d)**2
 27 PAT  bc / (a+b+c+d)**2
 28 BSH  ((a+b+c+d)(b+c) - (b-c)**2) / (a+b+c+d)**2
 29 DIS  (ad-bc) / (a+b+c+d)**2
 30 VAR  (b+c) / (4*(a+b+c+d))
 31 BLW  (b+c) / (2a+b+c)
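A few of the coefficients can be reproduced by hand with the following Python sketch. It applies the recoding rule described for BASS (values above zero, or above DIVISION, count as 1), builds the a, b, c, d table and evaluates six of the formulas. Function names are hypothetical, and the sketch does not imitate EDA's guard against division by zero.

```python
import math

def binarize(values, division=None):
    """Recode to 0/1: positive values are 1, or values above `division`."""
    cut = 0 if division is None else division
    return [1 if v > cut else 0 for v in values]

def fourfold(x, y):
    """Count the cells of the 2x2 table for two 0/1 variables."""
    pairs = list(zip(x, y))
    a = pairs.count((1, 1))
    b = pairs.count((1, 0))
    c = pairs.count((0, 1))
    d = pairs.count((0, 0))
    return a, b, c, d

def coefficients(a, b, c, d):
    """A selection of the measures listed above."""
    n = a + b + c + d
    return {
        "RUS": a / n,
        "SMA": (a + d) / n,
        "JAC": a / (a + b + c),
        "DIC": 2 * a / (2 * a + b + c),
        "MDI": (b + c) / n,
        "PHI": (a * d - b * c) / math.sqrt((a + b) * (a + c) * (b + d) * (c + d)),
    }

x = binarize([3, 0, 1, 0, 2, -1, 5, 0])
y = binarize([1, 0, 0, 0, 4, 2, 1, 0])
a, b, c, d = fourfold(x, y)
print("a, b, c, d =", (a, b, c, d))
for code, value in coefficients(a, b, c, d).items():
    print(code, round(value, 3))
```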
CORRELATE [vl] [VARCO] [CENTER] [RANK {NOROB}] [JACKNIFE {DIAG_ONLY{=diff}} {GRP=ng}]

Computes a correlation matrix between all the variables in the WA or the vlist, depending on the setting of the ALLVARS mode, and stores it as MATRIX, where it is retrieved by various multivariate analyses. For FACTOR this command is not necessary; the same options may be specified on the FACTOR command.
CENTER uses the center estimate stored with each variable as estimator of location instead of the mean. This is usually the median, but can be replaced by the estimator or another estimate (Mosteller & Tukey 1977).
VARCOV stores the variance/covariance matrix, instead of the Pearson-correlation matrix.
The RANK option transforms the data into ranks before computing the correlations; the ranks are then transformed using the inverse of the gaussian distribution function F to obtain robust estimates:
y(k) = inv(F)(k/(n+1))
where k is the rank of a case and n the number of cases.
If NOROBUST is present this last transformation is suppressed.
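The rank transformation can be sketched in Python with the standard library's NormalDist. The helper name is hypothetical and ties are not treated specially (tied values simply receive consecutive ranks), which may differ from what EDA does.

```python
from statistics import NormalDist

def normal_scores(values):
    """Replace each value by inv(F)(k/(n+1)), k being its rank (1..n)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [standard_normal.inv_cdf(k / (n + 1)) for k in ranks]

# The outlier 250.0 is pulled in: only its rank matters.
print([round(s, 3) for s in normal_scores([12.0, 15.0, 11.0, 250.0, 14.0])])
```

Correlations computed on these scores no longer depend on the size of extreme values, which is what makes the estimate robust.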
The JACKNIFE option without the DIAG_ONLY option computes the jackknifed Pearsonian correlation coefficient into MATRIX. The DIAG_ONLY option computes the coefficient as usual but computes the jackknifed values for diagnostic purposes. The program then displays correlations where the difference between the ordinary coefficient and the jackknifed version is larger than 0.05 (default value) or the value set by DIAG_ONLY=diff.
The idea of the jackknife is to omit a case from the computation (here) of the correlation coefficient and repeat this with all cases; then a more robust correlation coefficient is computed from the series of coefficients (pseudo-values).
In practice it is often desirable, especially when n is rather large, to omit more than one case for each calculation, therefore reducing the number of pseudo-values to compute. EDA does this under control of the GRP option (defaults to n/4). In an example with 24 cases and GRP=4 this means that 4 different calculations are done (4 pseudo-values). In order to form 4 separate sets of cases, n/4, i.e. 6 cases are omitted each time: the first pseudo-value is computed without cases 1-6, the second without 7-12, the third without 13-18 and the last without 19-24.
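The grouped jackknife can be sketched in Python as follows. The block layout follows the 24-case example above; the pseudo-value formula (g times the full coefficient minus g-1 times the reduced coefficient, then averaged) is the usual Tukey definition and is an assumption about what EDA computes. Function names are hypothetical.

```python
import math

def pearson(x, y):
    """Ordinary Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def omitted_blocks(n, groups):
    """Consecutive blocks of n // groups case indices (0-based).

    Left-over cases (when n is not a multiple of groups) are never omitted.
    """
    size = n // groups
    return [range(g * size, (g + 1) * size) for g in range(groups)]

def jackknife_correlation(x, y, groups):
    """Return the jackknifed coefficient and the list of pseudo-values."""
    n = len(x)
    full = pearson(x, y)
    pseudo = []
    for block in omitted_blocks(n, groups):
        keep = [i for i in range(n) if i not in block]
        reduced = pearson([x[i] for i in keep], [y[i] for i in keep])
        pseudo.append(groups * full - (groups - 1) * reduced)
    return sum(pseudo) / groups, pseudo

# 24 made-up cases; the last one is an outlier.
x = [float(i) for i in range(24)]
y = [2.0 * v + (v % 3) for v in x]
y[23] = 0.0
estimate, pseudo_values = jackknife_correlation(x, y, groups=4)
print("ordinary:  ", round(pearson(x, y), 3))
print("jackknifed:", round(estimate, 3))
```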
DISTANCE [vlist] [R=metric] [CASEWISE]

Computes a distance matrix into MATRIX from all variables in the WA or the vlist depending on the setting of the ALLVARS mode. This matrix may then be used by multidimensional or cluster analyses (NOCOMPUTE options).
Note that in most cases you will not use the DISTANCE command in order to produce a distance matrix, as the various commands working on distance matrices do it automatically and offer the same options as the DISTANCE command.
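What R=metric and CASEWISE change can be sketched in Python. The sketch assumes that R is the exponent of a Minkowski distance (R=2 euclidean, R=1 city block), which fits the descriptions in this section but is not spelled out here. Function names are hypothetical.

```python
def minkowski(u, v, r=2.0):
    """Minkowski distance of order r between two equally long vectors."""
    return sum(abs(a - b) ** r for a, b in zip(u, v)) ** (1.0 / r)

def distance_matrix(columns, r=2.0, casewise=False):
    """Distances between variables, or between cases if casewise is True.

    columns -- one list of case values per variable (a rectangular WA)
    """
    # CASEWISE behaves as if the WA were transposed.
    vectors = [list(row) for row in zip(*columns)] if casewise else columns
    return [[minkowski(u, v, r) for v in vectors] for u in vectors]

wa = [[0.0, 0.0],   # variable 1, two cases
      [3.0, 4.0]]   # variable 2, two cases
print(distance_matrix(wa))                 # between variables, R=2
print(distance_matrix(wa, r=1.0))          # between variables, R=1
print(distance_matrix(wa, casewise=True))  # between cases
```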
CONFIGURATION <option> [SCORES | SECOND]
C1 <option>     same as CONFIGURATION
C2 <option>     same as CONFIGURATION SCORES

<option> | INFO or ?
         | [d1,d2] PLOT [XVAR][YVAR] [GVAR | TIES] <plot-opt>
         | LIST [NOSORT | {ASC|DESC} {KEY=dim#}]
         |      [LIMIT=cut | LIMIT=(min,max)] [SHORT | STAT]
         | LIST CODED [NOSORT | {ASC|DESC} {KEY=dim#}]
         | [d] SHOW [CODED {Divide=val} [STAT] [NCOORD=n]
         | [d1,d2] DPLOT [ALPHA | CONFIG] <plot-opt>
         | HIGHLIGHT [CUT=cut-val]
         | [dlist] PROFIL [SELECT] [UNIT] [UNIT=val]
         | LOAD [TIE=tie#] [AT=var#] [NOMODCASID]
         | vlist STORE ["nam"] [CASLABEL | VARLABEL]
         | DROP
         | SETDIM=(ncoord,ndim) ["name"]
         | EXCHANGE
         | REMOVE [DIM=ndim]
         | CASLABEL
         | VARLABEL
C1 ROTATE [VARIMAX] | QUARTIMAX [NOSCORES | COEFFICIENTS]
<plot-opt> [NUMBER] [LABELS] [FULL] [BIG] [NOLEGEND] [DETAILS {OVERPRINT}]
This command is used to manipulate the two configuration matrices generated by various multidimensional techniques: C1 and C2. Multidimensional methods typically produce some coordinates in a defined space (loadings and scores in the case of the FACTOR command).
CONFIGURATION is a synonym for C1 and CONFIG SECOND and CONFIG SCORES are synonyms for C2. The synonyms have been introduced to clarify terminology in the context of analysis with two different configurations.
C1 and C2 are the matrices where these coordinates (factor loadings and scores) are placed by commands like FACTOR, ANACOR etc. The C1 and C2 commands are used to list, plot and manipulate these result matrices. Other specialized commands like CFIT or CFIX take up the information in the matrices to produce additional results.
Whenever there is space on the screen (and always in the print file) the full descriptor (or the beginning of it) is shown, as well as the sequence number of the variable/case shown (facilitating identification on a PLOT). Descriptors will only be shown if the corresponding variable exists in the WA, since C1/C2 only keep labels. [Beware: If several variables in the WA have the same name (possible, but not recommended) the descriptor will be that of the first variable found in the WA]. Labels preceded by a + sign have been added to a configuration, i.e. they will not enter into the computation of communalities and other summary information.
By default the configuration is sorted on the first dimension. The sort order depends upon the setting of the SET SORT switch (default ascending).
The DESCENDING or ASCENDING options are used to override the default sort order; the KEY option asks for sorting on dimension dim# instead of the first dimension. The NOSORT option inhibits automatic sorting on the first dimension.
Values smaller than abs(0.5) are not shown on the display and appear as a blank field. The user may change this. LIMIT=(min,max) shows only values outside the interval min <-> max. The Limit=cut form shows only values larger than abs(cut).
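The two forms of LIMIT can be sketched in Python as a display filter. The function name and the field width are hypothetical, and the treatment of values lying exactly on the limit is an assumption.

```python
def show_value(value, limit=0.5, width=7):
    """Return the printed field for one coordinate, or blanks if hidden.

    limit -- a single cut (hide values with abs(value) < cut), or a
             (min, max) tuple (hide values inside the interval)
    """
    if isinstance(limit, tuple):
        low, high = limit
        visible = value < low or value > high
    else:
        visible = abs(value) >= limit
    return f"{value:{width}.3f}" if visible else " " * width

loadings = [0.82, -0.31, 0.47, -0.66]
print("|".join(show_value(v) for v in loadings))                 # default
print("|".join(show_value(v, limit=0.4) for v in loadings))      # LIMIT=0.4
print("|".join(show_value(v, limit=(-0.5, 0.4)) for v in loadings))
```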
The CODED option displays + and - symbols for intervals on the coordinates. For example, in the case of factor loadings (i.e. correlations), the range of abs(0 to 1) is divided into D segments of width 1/D (D defaults to 8) and for each segment a +, resp. a - symbol is displayed. The LIST on the print file also includes variable descriptors, if variables in C1/C2 correspond to variables in the WA. Explanation ratios and communalities are not shown with CODED.
If nothing else is specified case-wise data (e.g. factor scores), which are usually stored in C2 are plotted using the (case-) identifiers, variable oriented data (e.g. factor loadings), which are normally stored in C1 are numbered, and a legend is displayed on the left side of the plot. These default plotting modes may be modified: the NUMBER option plots case oriented data with sequence number, instead of case identifiers, and LABELS uses the variable oriented C1 labels instead of numbers.
Besides the symbols requested, several other symbols might be shown. For positions where more than one point should be plotted two different overprint symbols are used ($ for variables and the @ symbol for observations; note that these symbols might be different in your EDA version, and you may change them yourself using the SET GRAPH command). If no information exists, as might happen with a case that is not a member of any group (GVAR PLOT), then a ? will be shown. Finally, for numbers (Gvars, ties) not fitting into the allocated positions (say a 120 into a two character position) a # symbol will appear.
The BIG option turns the terminal display of the PLOT off and produces a plot having the size of a printer page to the print file.
FULL uses four character labels instead of 2 character labels. Note that 4 character labels are always used, whenever the width of the plot area is larger than 100 positions, i.e normally the print file, but not the screen. However more and more screens have those capabilities and many versions of EDA take advantage of that fact. When using 2 character labels, it might be useful to alter casids and/or labels. This can be done with the CASID command and/or the ELABEL command within the EDITor. If other display forms are desired, load the dimensions into the WA and use the PLOT commands.
PLOT NOLEGEND : suppresses the legend shown to the right of the plot area. (This legend only appears with plots using numbers referring to variables).
GVAR causes (for case-related configurations) the group memberships to be displayed, instead of the case identifications. Note that they are limited to two characters, any membership greater than 99 is displayed as a '#'; an undefined case (group 0) is shown as '?'.
The TIES option does the same for variable related configurations, but plots table ties, instead of group memberships.
The XVAR, resp. YVAR options are used to tell the program that the X variable (d1), resp. the Y variable (d2) is not to be taken from the corresponding configuration, but is to be interpreted as a variable in the WA. Note that C2 1 2 PLOT XVAR YVAR in fact plots variable 1 against variable 2, as PLOT 1,2 CASID does; small differences occur because of a different treatment of the origin and in the scaling algorithm (namely, the origin is always represented).
PLOT DETAILS: produces a list of all points and their coordinates to the print file (requires active print file): the list contains the label shown, the symbol actually plotted, the coordinates (true x,y values, as well as device coordinates (rows and columns).
PLOT DETAILS OVERPRINT produces the same list, but only for points not shown completely (overprinted).
For an explanation of the <opt>ions refer to C1 PLOT. The options have the same meaning.
Consider a situation where you want to compare results from a principal components analysis and the TSCALE procedure, i.e. you want the results from TSCALE in C1 and the results (loadings) from FACTOR in C2:
>FACTOR NDIM=4
>CONFIG EXCHANGE
>TSCALE NDIM=4
>CFIT KAISER

FACTOR produces a C1 and C2. CONFIG EXCHANGE copies C1 into C2 and vice-versa. TSCALE produces a new C1, overwriting the current contents. CFIT then compares the contents of C1 (TSCALE results) and C2 (factor loadings).
If a C2 matrix related to C1 is present (factor scores), the scores are recomputed automatically, unless the NOSCORES option is present. You may also specify COEFFICIENTS if you want the factor score coefficients displayed.
Note that C1 ROTATE does the same as the ROTATE command followed by the SCORES command. Use this form in regular situations, i.e. working with principal components or similar when a ROTATION should be followed by a recomputation of the factor scores.
SET EXPRESS DISPLAY
LET K2
LET C3

The first command just says that you wish the result of expressions displayed (used for vector valued results); if you don't do that EDA will tell you that vector results are not displayed. The next two lines are expression examples (where the target part is missing and just one term is used ... and many more could be used...). C and K designate the C1, resp. C2 configuration. C3 refers to the third dimension in the first configuration, K2 refers to the second dimension of the C2 configuration.
Note that the Cn/Kn reference is a short form of the matrix references C[] and K[]. C[1,1] means first row (variable or case) and first dimension. Omitting one of the indices designates a complete row or column. Therefore K[1,] will display the values of the first case on all dimensions; C[,1] shows all variables on the first dimension. References to dimensions, e.g. C[,1], K[,4] may be written as C1 and K4. (P.S. Please distinguish clearly between the C1 and C2 configurations (i.e. matrix concepts) and, written in an expression context, C1, C2 meaning the first and the second dimension of the FIRST configuration, i.e. C1; K1 means the first dimension of the second configuration, i.e. C2. See the section on expressions for additional details.)
ROTATE | {VARIMAX} | QUARTIMAX
Rotates the stored C1 (FACTOR, ANACOR, CONFIG) using the VARIMAX or the QUARTIMAX criterion to achieve simple structure. Note that the C2 area is not modified. Use the SCORES command if you wish to recompute the factor scores.
As an alternative you might want to use the C1 ROTATE command
performing, by default, a rotation followed by an automatic
recomputation of the factor scores.
SCORES
SCORES [COEFFICIENTS]
Computes factor scores into C2. This command is used to recompute factor scores after a rotation (ROTATE); unrotated scores are directly available. SCORES requires that C1 and MATRIX be stored and that the data related to them be in the WA, otherwise scores cannot be computed. The scores are computed using:
(inverse R) * S
where R is the correlation matrix and S the factor structure matrix. If the matrices have been produced by principal components, the following formula applies:
inverse S (S'S)
The COEFFICIENT option displays the factor score coefficient matrix.
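The relation between the two formulas can be checked numerically. The following is only a sketch in plain Python, not EDA code: it assumes that the second formula is to be read as S (S'S)^-1, and it uses a hypothetical 2 x 2 correlation matrix whose principal components are known in closed form. All helper names (matmul, transpose, inverse2) are illustrative.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inverse2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical correlation matrix R for two variables.
r = 0.6
R = [[1.0, r], [r, 1.0]]

# Principal components of R: eigenvalues 1+r and 1-r,
# eigenvectors (1,1)/sqrt(2) and (1,-1)/sqrt(2).
# Loadings S = eigenvectors scaled by sqrt(eigenvalue).
h = 1.0 / math.sqrt(2.0)
S = [[h * math.sqrt(1 + r),  h * math.sqrt(1 - r)],
     [h * math.sqrt(1 + r), -h * math.sqrt(1 - r)]]

# General formula: (inverse R) * S
coef_general = matmul(inverse2(R), S)

# Principal components form, read here as S * inverse(S'S) (assumption)
coef_pc = matmul(S, inverse2(matmul(transpose(S), S)))

max_diff = max(abs(coef_general[i][j] - coef_pc[i][j])
               for i in range(2) for j in range(2))
print(max_diff)
```

Under these assumptions both expressions yield the same score coefficient matrix up to rounding error, which is why the shorter form can be used for principal components.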
Compare with C1 ROTATE.
MATRIX | {INFO or ?}
The MATRIX command manipulates MATRIX (see chapter introduction for more information). Options are offered to inspect and manage the MATRIX.
MATRIX | SUMMARY [CUTOFF=val] [NOADJUST]
       | LIST <opt>
       | LIST CODED ["altsym"] <opt>
       | LIST VALUE=val [FUZZ=val] <opt>
       | LIST VALUE=(low,high) <opt>
<opt> [SORT{=sortvar#} {ASCENDING | DESCENDING} ] [NOROUND] [CUTOFF=val] [NOADJUST] [WIDTH=nchars]
MATRIX | DROP
       | LOAD [TIE=list#]
       | vlist STORE
       | CHECK
       | SET | [DIS]SIMILARITY
       |     | SETDIMENSION=ndim
       |     | "name"
       | elis SUBSET [NOLAB]
       | DIAGONAL{=const}
       | SET [DIS]SIMILARITY
       | EIGENANALYSIS [NORMAL] [EIGVAL{=var#}]
       | SYMMETRIZE LOWHALF | UPHALF
Without option (default) the MATRIX command displays the information attached to the current MATRIX: number of variables, number of cases on which the matrix is based (if available) and descriptive information on the contents of MATRIX.
The first form (default) displays a matrix showing the size of each coefficient by a single digit 1,2,3,4 .. 9 (0 is blank) and a star (*) for "10" in the lower triangle of the matrix. If the coefficients in the matrix are correlations, i.e. having an absolute value between 0 and 1, 2 corresponds to a correlation of 0.2, 9 of 0.9 and a star of 1.0. If the coefficients are not normalized the result is slightly different: the range of the coefficients (determined from the current MATRIX) is divided into ten intervals and the same digit-symbols are used to show the size of the coefficient.
The digit-symbol is obtained by rounding the current coefficient. Therefore a value of 0.04 will be considered as 0 and 0.051 as 0.1. If you prefer truncation to rounding specify NOROUND. The cutoff option is described below.
MATRIX LIST CODED: represents in the lower triangle of the displayed matrix a symbol for each interval of the coefficient. Default is to use 5 intervals, where the first interval is represented by a space character and the other 4 using the standard EDA symbols representing growing "density". Again only the absolute value is considered. Alternatively the user may specify other symbols using "alt_symbols". If, for instance, "ABCDFG" is specified (6 symbols), seven intervals will be used (first interval = space) and the other intervals will be shown as A, B etc.
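The digit coding for normalized coefficients can be sketched as follows. This is an illustration in Python, not EDA's own code; the function name digit_symbol and the noround argument are hypothetical, and the sketch assumes that only the absolute value of the coefficient is coded.

```python
import math

def digit_symbol(coef, noround=False):
    """Return the one-character code for a coefficient in -1..1.

    Assumptions: the sign is ignored, 0 is shown as a blank,
    and 10 is shown as a star.
    """
    scaled = abs(coef) * 10
    if noround:
        level = math.floor(scaled)        # truncation (NOROUND)
    else:
        level = math.floor(scaled + 0.5)  # rounding (default)
    level = min(level, 10)
    if level == 0:
        return " "
    if level == 10:
        return "*"
    return str(level)

for value in (0.04, 0.051, 0.2, -0.9, 1.0):
    print(value, repr(digit_symbol(value)))
```

With truncation instead of rounding, a value such as 0.051 falls back into the blank class.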
The NOROUND and CUTOFF options are the same as explained below, except that cutoff here defines the "first" interval represented by a space.
MATRIX LIST VALUE=val is used to mark a specified value in the displayed matrix (absolute value). The value marked is within the precision limits defined by the EDA fuzz value (see SET FUZZ or SHOW), i.e. the range in which val is matched is val-fuzz to val+fuzz. The FUZZ=val option may be used to specify a fuzz value different from the current one.
MATRIX LIST VALUE=(low,high) marks all coefficients in the specified range (absolute values). If the interval limits lie within the cutoff range, the values within the cutoff limit do not appear. The symbol used is the standard mark symbol.
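The two marking rules can be written down as simple predicates. The sketch below is in Python and purely illustrative: the function names are hypothetical, and treating the interval limits as inclusive is an assumption that the text does not confirm.

```python
def matches_value(coef, val, fuzz):
    """VALUE=val: mark if |coef| lies in val-fuzz .. val+fuzz."""
    return val - fuzz <= abs(coef) <= val + fuzz

def in_value_range(coef, low, high):
    """VALUE=(low,high): mark if |coef| lies in low .. high."""
    return low <= abs(coef) <= high

print(matches_value(-0.5004, 0.5, 0.001))  # within the fuzz band
print(matches_value(0.51, 0.5, 0.001))     # outside the fuzz band
print(in_value_range(-0.35, 0.3, 0.4))     # sign is ignored
```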
Besides the already mentioned common options (NOROUND, CUTOFF) the LIST forms share other options.
SORT=varnum# sorts the matrix on varnum#, instead of showing the matrix in natural order, i.e. the first column of the MATRIX will contain varnum# as the first row, followed by the other variables in ascending or descending order. The ASCENDING and DESCENDING options are used to select a different sort order than that specified by the current SET SORTORDER setting.
WIDTH=nchar controls the number of letters from the matrix labels shown vertically. Matrix labels are 8 letters long and therefore use a lot of vertical space on the screen when shown fully; this option (defaults to 4) lets you optimize the screen display for your purpose.
NOADJUST: EDA guesses whether the coefficients lie in a range of -1 through 1 or 0 through 1 by checking all coefficients: if all coefficients are smaller than or equal to 1 then if all values are positive 0 to 1 is assumed, if values are in the range -1 to +1 this range is assumed. This is ok for many situations; however if values are e.g. all smaller than 0.2 the result might not be what you want. The NOADJUST option inhibits these "assumptions" and takes the true minimum and maximum as a basis for scaling.
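The guessing rule can be summarized in a few lines. The following Python sketch is only an interpretation of the description above, not EDA's implementation; the function name scaling_range is hypothetical, and the fallback for coefficients outside -1 to +1 (0 up to the largest absolute value) is taken from the note on unknown ranges in this section.

```python
def scaling_range(coefs, noadjust=False):
    """Return the (low, high) range used as a basis for scaling."""
    lo, hi = min(coefs), max(coefs)
    if noadjust:
        return (lo, hi)            # true minimum and maximum
    if hi <= 1 and lo >= 0:
        return (0.0, 1.0)          # all values positive, none above 1
    if hi <= 1 and lo >= -1:
        return (-1.0, 1.0)         # values within -1 to +1
    # Range unknown: start at 0 and use the largest absolute value.
    return (0.0, max(abs(lo), abs(hi)))

small = [0.05, 0.10, 0.15]
print(scaling_range(small))                 # assumed range 0 to 1
print(scaling_range(small, noadjust=True))  # true minimum and maximum
```

The second call shows the situation mentioned above: when all coefficients are small, NOADJUST spreads the intervals over the values actually present.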
CUTOFF=val: All coefficients smaller than abs(val) are considered as zero, i.e. are counted in the first column.
A note on precision: the allocation of the coefficients to each interval is determined by rounding, e.g. a value of 0.04 will be considered as zero (shown in the first column) and a value of 0.06 will be shown in the first interval.
If the theoretical range of the coefficient is not known and cannot be determined, the true maximal coefficient is searched and the intervals determined accordingly. Coefficients are assumed to start at 0 (absolute value).
See below the section on expressions for other ways of modifying the contents of MATRIX.
SET [DIS]SIMILARITY sets the similarity or dissimilarity attribute for the currently stored matrix. Some multivariate techniques need to know whether the coefficients are similarities or dissimilarities. Usually this attribute is set by the creation command or may be specified upon calling the analysis command (e.g. MDS). The SET command is useful to avoid unnecessary typing of options.
"name" sets the MATRIX descriptor to "name".
Let the WA contain 3 variables having values:
v#1 1 2 3
v#2 4 5 6
v#3 7 8 9
Store the data with a MATRIX STORE; then using the LET command:
>SET EXPRESS DISPLAY
>LET M[,1]
will display 1 4 7
Note that SET EXPRESS DISPLAY is required in order to display the result of a non scalar computation; if you are in default mode an error message tells you that vector results may not be displayed (only assigned to some variable).
>LET M[1,]
will display 1 2 3
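For readers more familiar with general-purpose languages, the same row and column references can be mimicked in Python. This is only an analogy, not EDA syntax; the helper names column and row are hypothetical, and 1-based indices are used to match the M[,1] and M[1,] notation.

```python
# MATRIX as stored in the example: one row per variable.
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

def column(m, j):
    """Counterpart of M[,j]: all rows, column j (1-based)."""
    return [r[j - 1] for r in m]

def row(m, i):
    """Counterpart of M[i,]: row i (1-based), all columns."""
    return list(m[i - 1])

first_column = column(M, 1)
first_row = row(M, 1)
print(first_column)
print(first_row)
```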
As a second example, let us consider a problem where you want to make sure that MATRIX contains only positive values, i.e. taking the absolute value of all elements.
The command LET M[3,2]=ABS(M[3,2]) takes the absolute value of one element (3rd row, 2nd column). In order to perform the same operation for all elements you need to write the following command:
>EXECUTE FOR R END=$MDIM "LET M[R,C]=ABS(M[R,C]) \ FOR C END=$MDIM"
LET M[R,C]=ABS(M[R,C]) is the command to be executed repeatedly, R being an index to the current row and C to the current column. EXECUTE controls the row loop, starting at row 1 (default, no need to specify it) and terminating after $MDIM iterations. $MDIM is a system constant, indicating the size of the current MATRIX. EXECUTE executes the command found in between the quotes, i.e. the LET command, controlled by a second loop, taking variable C from 1 to $MDIM.
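The structure of this double loop may be easier to see in a conventional language. The Python sketch below is an analogy only: the outer loop plays the role of EXECUTE FOR R, the inner loop that of FOR C, and the variable mdim stands in for $MDIM. The matrix values are made up for illustration.

```python
def abs_all(m):
    """Replace every element of a square matrix by its absolute value."""
    mdim = len(m)                      # stands in for $MDIM
    for r in range(1, mdim + 1):       # EXECUTE FOR R END=$MDIM
        for c in range(1, mdim + 1):   # FOR C END=$MDIM
            # LET M[R,C]=ABS(M[R,C])
            m[r - 1][c - 1] = abs(m[r - 1][c - 1])
    return m

# Hypothetical matrix with some negative coefficients.
M = [[1.0, -0.4, 0.2],
     [-0.4, 1.0, -0.7],
     [0.2, -0.7, 1.0]]
abs_all(M)
print(M)
```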
CFIT <method> FIT RESID
Fits (compares) two configurations stored in C1 and C2 respectively (target rotation).
The following methods are available:
SCHONEMANN&CARROL
KAISER
AHAMVAARA&RUMMEL [FULL]
Summary statistics and the transformation matrix are displayed on the terminal. Optionally the FITted matrix can be placed in C2 and the RESIdual matrix into C1.
The third technique (AHAM) has an additional option, FULL, which writes additional matrices to the print file (the print file should be open, otherwise FULL is ignored).
CFIX [RESID] [CONFIG]
Fits the configuration in C1 to the stored MATRIX. This command is useful for imposing dimensional structures and for factor comparisons. It requires that a MATRIX and a C1 be stored.
This command lets you describe, using hypothesis vectors, the sequential arbitrary factors to be approximated by orthogonal factors. This extraction of arbitrary factors permits the researcher to specify the loadings (s)he would like to have on each factor. This prescription is respected as far as possible, subject to the requirement that each factor be orthogonal to the preceding factor.
The RESID option copies the residual matrix (R-Matrix) into MATRIX. As one of the uses of CFIX is often to remove control factors, this residual matrix may be used as input to a principal component analysis: FACTOR NOCOMPUTE would take up this residual matrix and perform a principal component analysis on it.
CONFIG copies the resulting factor pattern into C2.
If the CONFIG option is present the display of the factor pattern matrix is suppressed (use the C2 command).