Let us start with a first example.
>GET DEMO
>BOXPLOT 1
>BOXPLOT 1-4 PARALLEL

The > symbol IS NOT part of the command, but represents the prompt symbol you will see on the screen. It will be used in all examples to show that the line is an actual EDA command and to show exactly where a command starts.
GET DEMO reads an EDA dataset into the current work area (WA). EDA datasets are often simply called work areas (WA for short). GET is a command. A command always starts in column 1 of a command line. DEMO is the name of the dataset to read. Here upper and lower case can be used interchangeably. At least one space is required to separate the command from the name.
BOXPLOT 1 asks for a boxplot of the first variable. BOXPLOT is the command. Only the first four (sometimes fewer) letters are meaningful; we could have written e.g. BOXP 1. The second field, i.e. after the space, is the variable list (vlist); here the list is a single variable number, whereas the next command specifies a full list. In EDA you may use variable numbers (e.g. 1) or variable names. The numbers indicate the position of the variable in the current work area (positional numbers).
BOXPLOT 1-4 PARALLEL is similar, except that there is an option on the command line. Options are specified after the variable list, separated by a blank. The variable list is often omitted; this means that the same list as on the previous command will be used. The option then directly follows the command, separated of course by a blank.
>LIST 1-5
>LIST SORTED DESCENDING KEY=4 DECIMALS=0

The first command produces a numerical list of variables 1 through 5. The second command works on the same variables. The variable list is omitted, therefore the same list applies. There are two options SORTED and DESCENDING and two more containing an equal sign. SORTED is a simple keyword option. A keyword can always be abbreviated (at most four letters, to distinguish the option from other options of the same command). DECIMALS=0 is a different kind of option, called named values. Here it means that we do not want to see decimals (hence 0 decimals). Instead of writing DECIMALS=0 we could have written D=0, i.e. the first letter and the equal sign. This is always the case for these options; only the first letter is meaningful.
These few concepts will allow you to start exploring the EDA program. Note that this basic syntax holds for most commands, except for commands using logical or algebraic expressions, as well as commands available in special modules like TED (the EDA text editor) or TOOLBOX, which have their own specific syntax. However, most of the concepts still apply. On normal EDA commands there are other kinds of options, and the syntax of named values can be more complex (in particular you can use names of cases and variables).
EDA commands are written on a single line, called a command line. It is possible to specify several commands on the same line (separating them with a semicolon), but a command line is limited to 80 characters. No continuation lines are possible. There is no difference between lower and upper case letters, except in strings, case identifiers and variable names.
[<command>] [<vlist>] [<options>] [!<comment>]

<command> is the command name; <vlist> a variable list and <options> a list of one or more options of the <command>. All fields are explained below in detail.
All four fields may be optional (repeat feature), but whenever a field is present it should appear in order, e.g. the variable list <vlist> is always found as the second field of a command. The separator between fields is the blank character. At least one blank character is needed to separate different fields and items.
The <command> field always starts in the very first position of a command line, i.e. the first character after the prompt or, if several commands are specified on the same command line, the first character after the ; separator. A blank at the start of a command means an empty <command> field, i.e. the previous command is repeated.
The <vlist> field (always the second field of a command) always starts with a numerical character (the # is considered a numerical character in EDA), whereas options always start with an alphabetical character. If no <vlist> field is present the current <vlist> is assumed, i.e. the <vlist> from a previous command.
The third field contains zero or more options. See below for the various forms options can take.
Finally, the sequence <blank> ! terminates syntax analysis; the remaining part of the command is then treated as a comment.
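The field rules above can be sketched in a few lines of Python. This is only an illustration, not part of EDA: the function name split_fields and the returned dictionary are hypothetical, a leading + or - is also accepted as the start of a <vlist> (list editing is described further on), and strings in double quotes that contain blanks are not handled.

```python
def split_fields(command: str) -> dict:
    """Split one EDA-style command into its four fields (illustrative sketch only)."""
    # The sequence <blank>! ends syntax analysis; the remainder is a comment.
    body, marker, comment = command.partition(" !")
    comment = comment.strip() if marker else None

    # A blank in the first position means an empty <command> field (repeat).
    repeat = body.startswith(" ")
    tokens = body.split()

    name = None
    if not repeat and tokens:
        name = tokens.pop(0)

    # A <vlist> starts with a digit or '#'; '+' and '-' are assumed to edit the list.
    vlist = None
    if tokens and (tokens[0][0].isdigit() or tokens[0][0] in "#+-"):
        vlist = tokens.pop(0)

    # Whatever is left is taken to be the option field.
    return {"command": name, "vlist": vlist, "options": tokens, "comment": comment}
```

For example, split_fields("BOXPLOT 1-4 PARALLEL") separates the command BOXPLOT, the list 1-4 and the single option PARALLEL, while split_fields(" -4") yields an empty command field and the list -4.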
If several commands are specified on the same command line, the <command> field starts immediately after the ; symbol. (If you insert a space after the ; command repetition will be activated.)
Whenever an error occurs, the remaining commands on a command line are discarded.
A command line, even one containing several commands, may never exceed 80 characters.
Note that the ; symbol is a special symbol which might be different in your EDA version. The multiple command separator may be changed in the profile or with the SET SPECIAL command. Type STAT SPECIAL to see what symbol is currently used.
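As a rough illustration of the rules for multiple commands, the following Python sketch splits a command line at the separator and enforces the 80 character limit. The names split_commands and repeats_previous are hypothetical and not part of EDA; the sketch assumes that the separator never occurs inside a quoted string.

```python
MAX_LINE = 80  # a command line may never exceed 80 characters


def split_commands(line: str, sep: str = ";") -> list[str]:
    """Split a command line into its commands (illustrative sketch only)."""
    if len(line) > MAX_LINE:
        raise ValueError("command line exceeds 80 characters")
    # Blanks are kept on purpose: a blank directly after the separator matters.
    return line.split(sep)


def repeats_previous(command: str) -> bool:
    """A command whose first position is blank repeats the previous command."""
    return command.startswith(" ")
```

With this sketch, "BOXPLOT 1;STEMLEAF 2" yields two distinct commands, whereas in "BOXPLOT 1; STEMLEAF 2" the second part starts with a blank and would therefore be read as a repetition of BOXPLOT. The sep parameter reflects the fact that the separator symbol may be changed.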
[<command>] [<target=>]<expression> {\ [<vlist>] [<opt>] [! <c>]

where only <expression> is a new definition, i.e. all fields after the \ character correspond to a normal EDA command.
The syntax in modules like EDIT, TED and the TOOLBOX differs slightly from the general syntax; however, the basic concepts are the same. Details are explained with each module (see there).
A number of commands (PLOT, SMOOTH, CODE) have special "play" or "inspect" modes, i.e. highly specialized modules used to explore the details. Those modes use a simplified command language, but still the basic rules will hold.
If the first position is left blank the preceding command is repeated (normally with a different variable list or different options). When more than one command is specified on the same command line (i.e. separated by a semicolon), you should make sure that the next <command> starts in the first character position after the semicolon; otherwise you will repeat the previous command.
Normal command lines are:
>PLOT 1,2 CASID X=10 Y=20
>*READ RAWDATA "infile" VARS=25
>Status
>STATUS_CURRENT
>BOXPLOT 1;STEMLEAF 2;DLINE 3

where <command> corresponds to PLOT, *READ and STATUS. Note that Status and STATUS_CURRENT are identical as only the first four letters are meaningful. (The > is not part of the command). The last example contains several EDA commands, i.e. BOXPLOT, STEMLEAF and DLINE. Note that there is no space between the ; and the next command.
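The abbreviation rule for command names can be illustrated with a short Python sketch. It assumes that exactly the first four characters are compared and that case is ignored; the function name command_key is hypothetical and not part of EDA, and commands for which fewer than four letters suffice are not modelled.

```python
def command_key(name: str, significant: int = 4) -> str:
    """Reduce a command name to its meaningful leading characters (sketch only)."""
    # Case is ignored in command names, so compare in upper case.
    return name[:significant].upper()
```

Under these assumptions Status, STATUS_CURRENT and STAT all reduce to the same key, as do BOXPLOT and BOXP.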
There are several instances, where <command> is not a simple command name:
<vlist> may contain the following elements:
Variables in the list are specified using either integer numbers or some other reference preceded by the # symbol. [Note that the # sign might have been replaced by a different symbol in your EDA installation; refer to local documentation for details, or use the STAT SPECIAL command to find out what symbol is currently in use.]
In a range <v1>-<v2>, <v2> must be larger than <v1>. If this is not the case the range specification will simply be ignored (beware!): e.g. a range of 6-3 will result in an empty variable list.
A <vlist> in fact defines the current variable list, which remains in use not only for the current command but also for the following commands until a new <vlist> is specified on a command line; the new list then becomes the current variable list. (See below for exceptions.)
With multivariate commands the absence of a <vlist> is treated depending upon the setting of the ALLVARS switch (see ALLVARS, and multivariate commands for details). If ALLVARS is ON and no <vlist> is present, EDA builds a new variable list containing all variables and this list becomes the current variable list.
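The following Python sketch illustrates how a list of integers and ranges might be expanded, including the rule that an invalid range is silently ignored. It is only an illustration under stated assumptions: the function name expand_numeric_vlist is hypothetical, and only plain integers and <v1>-<v2> ranges separated by commas are handled (no # references, no leading sign).

```python
def expand_numeric_vlist(spec: str) -> list[int]:
    """Expand e.g. '1,2,5-8' into variable numbers (illustrative sketch only)."""
    variables = []
    for item in spec.split(","):
        if "-" in item:
            first, last = (int(part) for part in item.split("-", 1))
            # <v2> must be larger than <v1>; otherwise the range is ignored.
            if last > first:
                variables.extend(range(first, last + 1))
        else:
            variables.append(int(item))
    return variables
```

Here expand_numeric_vlist("6-3") returns an empty list without any error, which mirrors the warning given above.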
There are a number of specific situations, where the current variable list is used in a different way, especially with the commands used to build a new variable list, like the DS/DESCRIBE/VARS commands. These commands offer options to build a <vlist> using search criteria (e.g. in labels) or statistical criteria.
For obvious reasons some commands like the KEEP and DELETE (variable deletion) commands do not accept an implied variable list, i.e. no <vlist> field on the current command.
(1) A predefined list (tie) may be designated by a scalar variable instead of an integer number: #A refers to the letter variable A (i.e. a single variable), whereas the reference ##A means the list number designated by the letter variable A.
(2) There is a possible confusion between variable names and other valid elements which may be part of a <vlist>. Specifically: (a) variable names starting with a numeric character are interpreted as list references (the use of names starting with a numeric character should be avoided); (b) single letter variable names are interpreted as letter (scalar) variables.
It is best to avoid these sources of problems by not using such variable names, but if you must use them you should enclose the names in quotes (only the opening quote is mandatory).
<vlist1>&<vlist2>

where the & character separates the first from the second list. (cf. for example the CANONICAL command). Note that when specifying more than one list for a command requiring only one, the & is disregarded, i.e. treated like a ",", i.e. the two lists are in fact taken as a single list.
Whenever the variable list is preceded by a + sign, the variables following the sign will be added to the current list. If a - sign precedes the list, the variables following it are removed from the current list.
This way of editing the current list comes in handy when you are exploring a multivariate command and would like to check the consequence of adding or removing one or several variables, e.g. in the case of a multiple regression you might ask whether the removal of a particular variable changes the coefficients. Then you might issue the following sequence:
>REGRESS 1-7 YVAR=8          >REGRESS 1-7 YVAR=8
>REGRESS -4 YVAR=8           > -4

(The left and right examples are identical; the second shows that you may, if you want, save quite a lot of typing).
Let us consider that you would like to know, after looking at the results of the first command line, how the regression changes without variable 4, i.e. you would like the same regression, but without that variable.
This can be done with the second command line: The first command line defines a variable list containing variables 1 through 7 (the current variable list). As the second variable list (-4) starts with a minus sign, EDA keeps the current list, but removes variable 4 from it.
You should clearly understand the difference (second command line) between a REGRESS -4 as above or a REGRESS 4 (without the minus sign); in the second case a new current list is defined, containing only variable 4.
When adding variables, no check is made as to whether the variables are already in the current list (except for commands like the REGRESS command, which perform these checks in any case). If you want this kind of verification you will need to use the UNIQUE option on the VARS/DS/DESCRIBE commands. When removing variables (minus sign) you may specify non-existing variables, i.e. EDA only checks whether a variable occurs in the current list and removes it. No other checks are performed. If variables are actually removed, a message tells you how many variables have been removed.
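The editing of the current list with + and - can be sketched in Python as follows. This is an illustration only: the names parse_items and edit_current_list are hypothetical, only integers and simple ranges are handled, and neither the NVAR limit nor the message about removed variables is modelled.

```python
def parse_items(spec: str) -> list[int]:
    """Expand integers and <v1>-<v2> ranges separated by commas (sketch only)."""
    items = []
    for item in spec.split(","):
        first, dash, last = item.partition("-")
        if dash:
            items.extend(range(int(first), int(last) + 1))
        else:
            items.append(int(first))
    return items


def edit_current_list(current: list[int], spec: str) -> list[int]:
    """Return the new current variable list after applying spec (sketch only)."""
    if spec.startswith("+"):
        # Added variables are appended without checking for duplicates.
        return current + parse_items(spec[1:])
    if spec.startswith("-"):
        # Variables not in the current list are simply ignored.
        removed = set(parse_items(spec[1:]))
        return [v for v in current if v not in removed]
    # Without a sign, the specification defines a new current list.
    return parse_items(spec)
```

Starting from the list 1-7, the specification -4 leaves variables 1, 2, 3, 5, 6 and 7, whereas the specification 4 replaces the whole list by variable 4 alone.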
The normal EDA limitations apply, i.e. you may never specify more than NVAR variables, i.e. the current variable list and the list to be added may not produce a new list exceeding NVAR variables.
See the description of the VARS/DS/DESCRIBE commands for information on how adding and removing variables works with these particular commands (i.e. commands creating a new variable list).
>VARS 1-20 SORT RANGE
>LIST

The first line creates a new variable list by sorting the variables 1 through 20 by the range of each variable. The LIST command (as no vlist is present) then picks up that new list to produce the desired numerical list.
1,2,3,4       list of integers
1 - 10        range
-10           the same
#1            all variables tied to list 1 (members of list #1)
#VAR77        variable name
#VAR*         all variables starting with VAR followed by any character.
#A,#B         variable numbers as defined by the letter variables A and B.
#'A,#'B'      variable names, quotes are needed to avoid interpretation as letter vars.
10,#1,#RAD    any mixture is allowed
#0 or #*      all variables in the WA
#$NVAR        using a system constant
Options are always found in the third position on a command line, although none of the preceding fields is mandatory (command names may be omitted if the same command is to be issued; variable lists may be absent if you intend to use the current list).
form-1 : keyword
A keyword is a string of any length, where only the first characters (at most 4) are meaningful. In this manual long keywords are used to show the meaning of the keyword explicitly.
form-2: Named values: <a>=<val>
<a> is an alphanumeric string of any length, where only the first letter is meaningful, and <val> is a number, case identification, defined (letter) variable, constant reference, a variable name preceded by the # sign or substitution value, i.e. a simple expression. No blank characters are allowed; the option is identified by the equal sign. Lower case <a> is treated as upper case, except in names.
In its simplest form <val> is a number. In place of a simple number a valid <simple expression> may be specified: a defined variable (letter variable), a constant reference, a case identifier and/or any operation allowed in <simple expressions>.
Where a case reference is needed, the corresponding alphanumeric case-id may be used instead of the sequential number of the case. If the casid starts with a numeric character (e.g. with the default casids), the case-id should be enclosed in ' ' to indicate that an alphanumeric item follows and not a (sequence) number (which might be different after filtering or sorting). In fact the first ' symbol is sufficient to signify that the succeeding item is to be taken as alphanumeric and not as numeric.
The same problem arises when casids are single letters, as they may be confused with letter variables. In all these cases, if you wish to specify the case identifier, you should enclose it in quotes. Examples are: LIST CASE=BE (same as LIST C='BE'), LIST C=1, LIST C='56'. Note also that blanks are not allowed, even within quotes, i.e. C='56 ' is not correct, as the blank terminates the C= option (in fact the second ' will be analyzed as an additional option).
Note that for names (case identifiers, variable names) upper and lower case are different.
A $ sign after = invokes the substitution of a system constant.
The '=' sign may also be followed by a simple expression (see the exact definition in the "glossary"). In fact the preceding items are a special case of the simple expression, where no arithmetic operation is involved (only one argument). Compare also to $A$: substitution of string variable A$.
<val> may also be a variable name; it is then preceded by the # sign and simply returns the index of the variable given. This feature is useful whenever you need to specify a variable number as an option and wish to give its name instead, e.g. when specifying a target variable for the copy of residuals with the regression programs.
There is a second form of this option type, allowing for the specification of more than one value for a specific option. Then the format is as follows:
<a>=(<val1>,<val2>,<val3>....,<valn>)
where the <val>s follow the same definition as above; they are enclosed in a set of parentheses and separated from each other by a comma. Note again that blanks are not permitted within an option. The maximum number of <val>s depends upon an implementation parameter (MAXC, usually 8).
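The two forms of named values can be illustrated with a small Python sketch. It is not part of EDA: the function name parse_named_value is hypothetical, the values are returned as uninterpreted strings (no expression evaluation, no case or variable lookup), and the MAXC limit is not checked.

```python
def parse_named_value(option: str) -> tuple[str, list[str]]:
    """Split <a>=<val> or <a>=(<val1>,...,<valn>) into key and values (sketch only)."""
    name, equal_sign, value = option.partition("=")
    if not equal_sign or not name:
        raise ValueError("a named value needs a name followed by '='")
    # Only the first letter of <a> is meaningful; lower case is treated as upper case.
    key = name[0].upper()
    if value.startswith("(") and value.endswith(")"):
        # Second form: several values in parentheses, separated by commas.
        values = value[1:-1].split(",")
    else:
        values = [value]
    return key, values
```

With this sketch DECIMALS=0 and D=0 both give the key D with the value 0, and an illustrative option such as X=(1,2,3) gives the key X with three values.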
form-3: "<names>"
Namestrings are character strings of up to 60 characters enclosed in " (double quotes). (The DEFMAC command allows a string of up to 80 characters). They are used, e.g. to specify file names for input/output commands. If the closing " is missing, the remainder of the command line is treated as a string (and possibly truncated if the resulting string exceeds 60 characters). Upper and lower case letters are considered different within strings. Only one option of this type may be present on a command line. To specify a " within a string, enter two consecutive "".
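The rules for namestrings (doubled quotes, a missing closing quote, truncation) can be sketched in Python as follows. The function name read_namestring is hypothetical and not part of EDA; the sketch assumes that the text passed in starts at the opening " and uses the general limit of 60 characters.

```python
def read_namestring(text: str, limit: int = 60) -> str:
    """Extract the string that starts at the opening quote of text (sketch only)."""
    if not text.startswith('"'):
        raise ValueError("a namestring must start with a double quote")
    chars = []
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # Two consecutive quotes stand for one quote inside the string.
            if i + 1 < len(text) and text[i + 1] == '"':
                chars.append('"')
                i += 2
                continue
            break  # a single quote closes the string
        chars.append(ch)
        i += 1
    # If no closing quote was found, the remainder of the line was collected.
    return "".join(chars)[:limit]
```

For instance, read_namestring('"myfile"') returns myfile, and a string without its closing quote extends to the end of the line.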
All forms of options may be present on a command line in any order. Options are separated by blanks. If the same command is used again (i.e. the previous command is the same) without options, the same options as on the preceding command will be used. Options not modified remain in use. (Note that any other command intervening between two invocations of the same command will disable this.) For some commands this feature is disabled to ensure correct operation or to avoid usually unwanted results. In this case the user manual states explicitly that options have to be re-specified, and usually this is also clear from the context and the function of the command. E.g. it is unlikely that you would read in a file and then read in the same file again on the very next command.
To restore the default value of a type-2 option, <a>= (or <a>=$ to maintain compatibility with older versions) is used. For keyword options the default values can be restored using any character string which is not a valid option for the particular command. Options not used by a command are not detected, i.e. options conforming to the syntactical rules, but not used by the command, are not diagnosed.
The option field is optional for most commands, except where noted. Repeating the same command uses the same options as on the previous command.
Some examples:

<form-1> : ALL NOCONSTANT N varimax
<form-2> : C=3 F=0.03 Q=-0.023
           C=ZH case=ZH Case_I_want_to_list=ZH (all these mean exactly the same)
           Nvars=B CASE=B-12 CASE=$MCAS LIMIT=$MIN.VAR10
<form-3> : I "myfile"
There are two categories of special characters. A first category are symbols having only a special meaning when used in the first character position of a command; a second category has a special meaning in all locations.
**********************************************************
*                       IMPORTANT                        *
**********************************************************

Check the local EDA documentation for possible differences in special symbols; on some systems they might be unavailable for EDA, because the system makes special use of that character, e.g. on Multics # and \ are used for control of keyboard input.
Note also that the SET SPECIAL command may be used to change the most important special characters to fit your needs, tastes or preferences. The manual will always use the default characters; therefore you will have to be careful when changing these characters.
Special symbols may also be set from the user's profile. Check the special symbols using the STAT SPECIAL command when you are not sure what symbols are used.
Within expressions other special symbols are used with their traditional meaning: +, -, *, /, ^ (exponentiation), & (and), | (or), ~ (not), > and <, as well as % (percent operation). Note that the * (wildcards) and / (immediate mode for editor) or ) (same for the toolbox) symbols are used in a different context; therefore no confusion is possible.
(1) If column 1 is left blank the same command as the previous command is assumed.
(2) If the variable list is not altered from one command to the next the same variable list is used (if this is appropriate for the next command).
(3) If the same command is used (by leaving column 1 blank or by retyping the same command) the options set by the first command remain valid, unless explicitly altered (see below).
Regarding (1): The PRINT command does not change the current command.
Regarding (3): This is true for most analysis commands; however, it does not extend to commands where this makes no real sense (e.g. reading the same file twice with exactly the same options). The rules applied should be clear from the context (hopefully). There is one systematic exception to this: if a command terminates in error the previous options are cleared.
Furthermore there is a SET OPTIONS switch you might use to turn the automatic repetition of options off.
WA            Work area; variables used in the analysis must reside in the WA.
MATRIX        a matrix stored with the WA (e.g. correlation or distance matrix)
CONFIG        configuration
C1            Configuration 1 (usually variable space)
C2            Configuration 2 (usually observation space)
TIE           group of variables (bundle)
CASID         case identification.
GVAR          Grouping variable
vlist         variable list.
var# v#       variable number or variable reference.
v1            first variable in list
cas# c#       case number or case reference
g#            group number (defined by GVAR)
d#,dim#       dimension number, dimension ref. (CONFIG)
l#,list#      List reference (ties)
MCAS          Maximum of cases allowed in a specific EDA implementation.
NVAR          Maximum of variables allowed in a specific implementation
"name"        name string enclosed in " (any character string) maximum 60 characters
"filename"    a valid external file name
"str"         a character string
<nn>          Error number.
#1            variable bundle number 1 (#1=tie)
A..Z          scalar (letter) variable
#A            defined variable A used on vlist
#VAR01        variable reference using a variable label
format        a standard fortran format enclosed in ()
I/O           Input Output (commands)
Dec=0         Specifies a named value
CASE=UK       A named value referring to a case id
$NVAR         system constant reference
$0            Result variable (ResVar) reference
A$            String variable (A$..Z$)
$A$           String variable substitution
#target       target variable (store result)
waname        work area name reference

Names like RAWIN, EDAOUT are generic names for files, i.e. they stand for the file type. If casids are displayed, they are usually followed by the group membership of the case separated by a "/".