Transformations II: Expressions

This chapter introduces all EDA commands where logical or arithmetic expressions are used, i.e. commands where the syntax differs from the normal EDA syntax: LET/CALCULATE are used for computations, IF for conditional transformation and OUT is the user formatted output facility.

Overview: The LET, CALCULATE and IF commands.

These commands have the following syntax:

 Computations:

 LET       <arith-exp> [ \ <options>]
 CALCULATE <arith-exp> [ \ <options> ]

Conditional computations


 IF <log-exp> THEN <arith-exp> [ELSE <arith-exp>] [ \ <options>]

Selection commands

The selection commands are also listed below as they use expressions. However the details are found in the chapter on selections.

 INCLUDE <log-exp>
 EXCLUDE <log-exp>

 AND <log-exp>
 OR <log-exp>
 BUTNOT <log-exp>
 REMOVEIF <log-exp>
 IF <log-exp> INCLUDE   (obsolete for INCLUDE)
 IF <log-exp> EXCLUDE   (obsolete for EXCLUDE)

 IF <log-exp> DO <command-line>

 The following definitions apply:

 <arith-exp> ::= [<target>=] <expression>
 <log-exp>   ::= <expression>
 <options>   ::= [<vlist>] <option-list>

LET, CALCULATE

LET and CALCULATE perform unconditional calculations. In fact these two commands are identical, except that the LET command generates variable labels in a different manner than CALCULATE, as will be explained below. The different command names are only used for the sake of clarity in order to show that these commands may be used to generate new variables, but also as a pocket calculator.

If <target> or <expression> starts with a numeric character (0..9, - or #), LET/CALC may be omitted. The command must then start in column 1 of the command line. The same is true if the second column contains an '=' sign, i.e. the line defines a letter variable. As a consequence CALC 12*24 or LET A=20+SIN(12) may be written as 12*24 and A=20+SIN(12), whereas in LET SIN(12)+20 the LET may not be omitted.

IF

The IF command comes in several forms: either it is used to perform conditional computations on variables or to specify a selection. The second possibility, as well as IF .. DO are explained in the following sections.

When used to perform conditional calculations, IF may take the full form

IF <log-exp> THEN [<target>=]<expression> ELSE [<target>=]<expression>

 or the simpler form

IF <log-exp> THEN [<target>=]<expression>

where the ELSE part is omitted. See below for the definitions.

Restriction: if a selection is active when IF is used, it is turned off before the IF command is performed.

INCLUDE, EXCLUDE, AND, OR, BUTNOT, REMOVEIF

Logical expressions are also used with a number of commands used to activate selections, namely INCLUDE, EXCLUDE, AND, OR, BUTNOT and REMOVEIF.

Refer to the chapter on selections for detailed information on these commands. You will, however, need to know how to specify logical expressions, which are explained in detail below.

IF... DO

The IF .. DO <command> form of IF is used to execute a single EDA command on the cases satisfying the logical condition.

IF ... DO is a special selection command. Refer to the chapter on selections for more details. (You will need to know how to specify logical expressions).

Definitions

This section explains the syntactical constructs used with the transformation commands and lists all elements you may specify on any transformation command.

A very important distinction should be made first: The evaluation of an expression may either yield a scalar result or a vector result; these result types are referred to as scalar expressions and vector expressions. It is also important here to recall that EDA uses two kinds of variables (1) variables, i.e. data vectors residing in the WA or some other matrix and (2) single letter variables A..Z which are scalar.

A reference to a variable in the WA is always preceded by the # sign and may be (1) a reference number (position), (2) a variable name or (3) a letter variable or a constant ($).

Expressions

The following elements are used in expressions: (1) numbers (integer or decimal numbers, - sign) (2) variable references (3) letter variables and constants (4) operators, (5) functions and (6) matrix references. Blanks are not meaningful, they may be inserted for readability, except within names.

Note that within EDA (see glossary) often simple expressions (S-expressions) are used; these should be clearly distinguished from the expressions defined here, as simple expressions are defined much more restrictively.

Logical expressions

<logical-expression>

Logical expressions follow the same rules as arithmetic expressions. They (usually) yield 0 for false and 1 for true as the result of an expression evaluation (logical functions and operators return 0/1 results). If the result is not 1/0, all values below or equal to zero are treated as false, values greater or equal 1 as true.

Target

<target> may be a letter variable, a variable (vector) or a matrix reference. Variable references are either variable names or an integer number preceded by the # sign. Letter variables are the scalar letter variables A..Z. However you may not specify a $<const> as a target ($ items may however be part of a #reference, e.g. you might use LET #$NVAR=#myvar/100 to store some variable into the highest possible variable (NVAR is the max. number of variables)).

When specifying a variable reference as the target two situations may arise: (1) either the variable does not exist or (2) it is already present in the WA. In the second case the variable is overwritten, unless the variable is protected or another security feature is turned on. A message will tell you that a variable has been overwritten (unless you have asked not to show those informational messages).

Remember also that EDA does not require that all variables be of the same length (same number of cases). Therefore, in the case where <target> is a #variable and the expression evaluates to a scalar result, the problem of determining the number of cases for the resulting variable arises. It is determined as follows: if the <options> field contains an N=ncas, that value is used; otherwise the N of the WA is taken if the WA is rectangular. If the WA is not rectangular, the N is taken from the target variable if it already exists; otherwise an error occurs.

This problem occurs for instance in the following situation: suppose that the current WA contains several variables which do not have the same number of cases; then a command like LET #20=0 does not know how many cases should be set to 0 on this variable. Even if the variable does exist, it is not clear whether it is intended to replace it by a variable of the same length or just to place some other variable into that location. This is why the rules above are needed to determine the number of cases of that variable.
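
The order in which the number of cases is looked up can be sketched in Python. This is only an illustration of the rule described above, not EDA code; the function name and its three parameters are hypothetical, and `None` stands for "not available".

```python
from typing import Optional


def resolve_ncas(option_n: Optional[int],
                 rectangular_wa_n: Optional[int],
                 target_len: Optional[int]) -> int:
    """Return the number of cases for a scalar result stored in a #variable.

    option_n         -- value of N=ncas in the <options> field, if given
    rectangular_wa_n -- N of the WA, only if the WA is rectangular
    target_len       -- length of the target variable, if it already exists
    """
    if option_n is not None:          # 1. an explicit N=ncas wins
        return option_n
    if rectangular_wa_n is not None:  # 2. rectangular WA: use its N
        return rectangular_wa_n
    if target_len is not None:        # 3. reuse the length of the existing target
        return target_len
    raise ValueError("number of cases cannot be determined")


# LET #20=0 \ N=50 in a rectangular WA with 100 cases: N=50 is used.
print(resolve_ncas(50, 100, 30))
```

With this ordering, LET #20=0 on a non-rectangular WA succeeds only if #20 already exists or N= is given.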

If the result of the evaluation is a scalar and no <target> is specified, the result is displayed. If the evaluation yields a vector result, <target> should be specified, unless SET EXPRESSION is set to display or copy the result. If neither DISPLAY nor COPY is set (the default), a vector result from an expression without a target specification produces an error message stating that the target is missing.

SET EXPRESSION offers the following possibilities:

   SET EXPRESSION      no target specified
                       (and vector result)
   ------------------------------------------
   default settings    error message
   COPY                copy the result into a free location
                       in the WA (as a new variable)
   DISPLAY             display the result on the screen
   COPY DISPLAY        both actions at the same time
   DEFAULTS            reset switches to default

Note that for the COPY option the evaluation for a full IF command like

IF #pop > 10000 then #vara-100 else #varb+100

creates two new variables.

Options

The <options> section follows the standard EDA syntax and is separated from the expression section by the backslash \. <Options> is used to specify additional control options, especially for repetitive executions.

The following options are used:

 N=ncas      determines number of cases see <target>
 F=fuzz      comparison tolerance, see NEAR()
 CHECK       see % operation

More options are used with output expressions and repetitive executions, as well as with scalar variable definitions. These options will be explained later.

Variable descriptors and names

As transformations altering variables or producing new variables often profoundly change their sense, special attention has to be paid to the documentation (labels and descriptors) of the new variables. As always EDA attempts to use all available information to create new labels and descriptors, but usually this will not be sufficient.

The default action is as follows, consider:

   LET #1=#20*#21
   CALC #1=#20*#21
   #1=#20*#21

In the first case the user is asked to enter labels and descriptors on the screen. In the second and third case (which are in fact identical, because CALC may be omitted whenever the first character in an expression is numeric) a label "trans" will be created and the descriptor will be "CALC #1=#20*#21", i.e. the command line is encoded into the descriptor.

   LET #vara=#20*#21
   CALC #vara=#20*#21
   #vara=#20*#21

In this second example a variable name is specified; therefore EDA does not generate a label but takes it from the command, whereas the descriptor is generated or requested from the user as before. Note that two cases may arise: (1) if vara exists it is overwritten (unless vara is protected, in which case an error occurs) or (2) vara does not yet exist, in which case it is created in a free location of the WA. It is possible that this solution does not satisfy some specific demands; therefore the SET EXPRESSION command allows for some modification of the variable information generation process.

SET EXPRESS SAME: labels and descriptors are never changed, i.e. all automatic generation is suppressed and the variables keep the information they had attached before modification. If a variable did not exist before, the default label is used.

SET EXPRESS QUERY forces EDA to ask the user for label and descriptor in all instances, whereas SET EXPRESS NOQUERY never asks the user for label and descriptor (always generate it automatically).

SET EXPRESS APPEND appends the automatically generated descriptor to the existing descriptor.

Finally SET EXPRESS DEFAULT restores the default method.

The descriptor contains a modification stamp *t* in its last three positions, unless the user supplies the descriptor or SET EXPRESSION is set to no modification, i.e. SAME.

Matrix references

An <expression> as well as <target> may contain matrix references, i.e. refer to single elements or row or columns of one of the EDA matrices, i.e. the data matrix, the MATRIX, C1 and C2, the GVAR or the TABLE definition.

They take the following forms:

n[i,j] or n[i]

where <n> is the name of the matrix and <i>, resp. <j>, are the indices.

Some very often used rows or columns can be abbreviated, namely the reference to the full GVAR or to one of the dimensions found in C1 or C2. See below for additional information.

The names <n> are the following:

    D[i,j]  refers to the data matrix
            the D may be omitted, i.e. [i,j] only
    M[i,j]  MATRIX reference
    C[i,j]  refers to C1
    K[i,j]  refers to C2
    G[i]    refers to the GVAR
    S[i]    case sequence number
    T[i]    refers to the TABLE
    Z[i]    refers to CENTER

The indices may be present or absent (null). In the first case they refer to an element within the matrix (scalar result), in the second case to a row or column of the matrix. For example D[10,1] refers to the 1st case of the 10th variable; D[,1] refers to the first case of all variables. Note that the comma must be present and that you may not omit both indices (matrix operations are not (yet) supported).

The indices take the form of simple expressions (without case identifier references). Additionally they may contain specific strings, as described for each matrix below.

Short forms

For frequently used references short forms are available, namely C1 for C[,1], K1 for K[,1], GVAR for G[].

D[i,j] or [i,j]

<i> may be a variable name (the # is not needed but may be specified). <j> may be a case reference.

M[i,j]

both <i> and <j> may be row and column names of the MATRIX (usually variable names).

C[i,j] and K[i,j]

<i> refers to the variables and <j> to the dimensions. <i> may be the name of the coordinate (variable names or case identifiers), depending upon the contents of the C1 and C2 matrix. Note that C stands for C1 and K for C2. Note also (see below) that Z stands for Center.

As references like C[,1], referring to the first dimension (all variables), are very often used, you may also specify a simple form, namely C1, referring to the first dimension of C1, or K2, referring to the second dimension found in C2. Note that this syntax leads to a notational problem which might be confusing: you may use a C2 reference, meaning the second dimension in the FIRST configuration (C1). This is not a syntactical problem in this context, as single letter names are used: C for C1 and K for C2. You may specify Cn or Kn to refer to one of the dimensions, where <n> stands for the requested dimension. See the C1/C2 commands for examples where these references might be useful.

G[i]

<i> may also be a case identifier. Instead of specifying G[], i.e. the full GVAR, you may also use GVAR (all four letters specified). See the GVAR command for examples where this might be useful.

S[i]

<i> may also be a case identifier. S[i] may not be used as target. The result of S[] is different if the WA is rectangular or not. In the first case it returns a vector corresponding to the length of the variables, otherwise a variable 1..MCAS. S[i] represents only the sequence number of the cases and is most useful in situations where some action on specific cases is needed and you wish to use the case identifiers, as in IF K>S[USA] THEN <do something>. Beware with non-rectangular WAs: S[] returns a MCAS long vector, therefore results are not always interesting, especially with IF's, e.g. an IF not(S[]=3) then ... would yield a MCAS-1 long vector. In case of doubt you should prefer to use either a variable generated e.g. with GENERATE or use the IDX function instead, where you are sure what is created.

T[i]

<i> may be specified as a variable name (with or without the # sign).

Z[i]

<i> may be specified as a variable name (with or without the # sign).

These matrix references may be used on both sides of an expression. If they are specified on the target side the matrix MUST be defined with correct dimensions, i.e. the indices must lie within predefined limits.

Therefore if you wish to define a new GVAR with an expression, you should first declare the dimension of the GVAR with the GVAR SET=dim command. Use C1/C2 SET or MATRIX SET to declare the dimensions of the C1,C2, resp. MATRIX. Of course if there is already a matrix or GVAR you need not set its dimensions if they correspond to your purpose.

In order to define new tables the corresponding variables must exist. LET T[]=#tabdef would be used to transfer the variable #tabdef into the table definition. If the length of #tabdef does not correspond to the number of variables in the WA no message is given.

When using the D[i,j] or [i,j] reference in a target expression the corresponding variables must exist (this is the only difference between [#var1,]= and #var1=). For a non-rectangular WA it may happen that an expression like [,cas1]= assigns a value to a position outside a specific variable; then a warning is displayed.

Active selection

As a general rule selection may be active for the LET command, but NOT for the IF command. The latter command turns selection off.

Error messages

During the evaluation of expressions errors may occur. As a general rule errors are always diagnosed and the evaluation of the expression is terminated, i.e. the program will not ask you e.g. to enter a replacement for an incorrectly specified case id, as would be the case in a normal EDA command line (while working in interactive mode).

Due to the mechanism of evaluation (postfix expression evaluated on a stack) EDA is not always able to clearly locate the error; this is especially the case for parenthesis errors and wrong number of argument errors. If parentheses do not balance the program will tell you so; if you specify something like LET #1=SIN(#1,#2-#3) the program will only tell you "illegal use of function".

Operators and functions

The following algebraic operators are available:

 +   add
 -   subtract
 *   multiply
 /   divide
 %   percent **)
 ^   exponentiation ***)
 -   unary minus (see below)

Logical operators:

 <   less than
 >   greater than
 &   logical AND
 |   logical OR
 =   equal (see below)

**) The percent operator may be used with the CHECK option causing percentages less than 0 and greater than 100 to be diagnosed, without CHECK no check is done. Small errors due to rounding are tolerated (a value greater than 100 but less than 100.9 is set to 100.00).

***) exponentiation is not defined for negative arguments with a non-integer exponent; as all numbers are taken to be real in an expression, EDA decides by itself whether a value is to be considered as integer (very small fractional part); therefore in some instances EDA might treat a real number with a very small fractional part as integer.

If a division or a percent operation encounters a zero division the result assigned is the value defined by ASSUME D=<val> (default -1). The event is reported. This feature works for vectors only, otherwise the error is reported and evaluation suspended.
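
The vector behaviour described above can be modelled with a short Python sketch. It is not EDA code: `divide_vectors` and its `assume_d` parameter are hypothetical names, with `assume_d` standing in for the value set by ASSUME D=<val>, and the list of reported positions is an assumption about what "the event is reported" could look like.

```python
def divide_vectors(numerators, denominators, assume_d=-1.0):
    """Divide case by case; a zero divisor yields assume_d instead of failing.

    Returns the result vector and the 1-based case numbers where a
    zero division was replaced.
    """
    result = []
    replaced_cases = []
    for case_no, (num, den) in enumerate(zip(numerators, denominators), start=1):
        if den == 0:
            result.append(assume_d)      # substitute the ASSUME D value
            replaced_cases.append(case_no)
        else:
            result.append(num / den)
    return result, replaced_cases


values, replaced = divide_vectors([10, 5, 8], [2, 0, 4])
print(values, "zero division in cases", replaced)
```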

Unary -1 (minus) is treated as 0-1, i.e. in some cases (*,/,%) this results in an incorrect evaluation if the negative number is not specified in parentheses.

 8+-9  -> result is o.k.
 8/-1  -> is treated like 8/0-1 -> zero divide -> use 8/(-1)
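
The effect of treating a unary minus as 0-... can be made visible with a purely textual Python model. This sketch only rewrites the expression string; it makes no claim about how EDA parses expressions internally, and the function name is hypothetical. It assumes blanks have already been removed.

```python
import re


def rewrite_unary_minus(expr: str) -> str:
    """Insert a 0 in front of every unary minus, purely as text."""
    # A minus is taken as unary when it starts the expression or directly
    # follows an operator or an opening parenthesis.
    return re.sub(r"(^|[+\-*/%^(])-", r"\g<1>0-", expr)


for text in ("8+-9", "8/-1", "8*-2", "8/(-1)"):
    print(text, "is read as", rewrite_unary_minus(text))
```

Under normal operator precedence the rewritten form of 8/-1 divides by zero first and the rewritten form of 8*-2 multiplies by zero first, whereas the parenthesized form keeps the negative number together.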

As for the logical operators, they are exactly the same as the functions described below. The = operator must be distinguished from the assignment operator on arithmetic expressions. Except on the logical expression part of the IF command, the first = sign is taken as assignment operator, i.e. the evaluation result has to be stored in <target>.

The table below gives the functions, which may be part of an expression. Only the first three characters are meaningful. A function name may not exceed 6 characters. <varg> means that the function requires a vector expression as argument, <sarg> requires a scalar expression and for <arg> the type of the argument(s) does not matter (however in functions like MEAN() a scalar argument does not make much sense, but the operation is permitted).

NOTE: Commands like ADDVARS, PERCENT, MIN etc. may be used to simplify often used operations on the data, like adding a large number of variables or computing percentages with large numbers of variables. Therefore, when you want to perform a common operation on a number of variables, you might find another EDA command that does it much more easily.

 -----------------------------------------------------
 | Name          purpose                examples/note|
 |---------------------------------------------------|
 | SQRT(arg)     square root            SQRT(#1)     |
 | ABS(arg)      absolute value         ABS(10-B)    |
 | TRUNC(arg)    truncate               TRUNC(2.12)  |
 | ROUND(arg)    nearest whole number                |
 | LN(arg)       natural logarithm                   |
 | LOG(arg)      log base 10                         |
 | EXP(arg)      exponential e**arg                  |
 | POWER(arg,pow)exponentiation arg**pow Note 35     |
 | MOD(arg,arg1) remainder of arg/arg1               |
 | SIGN(arg1,arg2) sign transfer         Note 32     |
 |---------------------------------------------------|
 | Random numbers                                    |
 |---------------------------------------------------|
 | RAND(arg)     random numbers          Note 2 20   |
 | ROLL(sarg,arg)draw a sample of size   Note 29 20  |
 |               sarg from arg (pop.)                |
 | DRAW(sarg,arg)draw a sample of size   Note 29 20  |
 |               sarg from arg (pop.)                |

 |---------------------------------------------------|
 |          Trigonometric functions                  |
 |---------------------------------------------------|
 | SIN(arg)      sine (radians)         SIN(3.14/4)  |
 | COS(arg)      cosine                              |
 | TAN(arg)      tangent                             |
 | ASIN(arg)     arc sine                            |
 | ACOS(arg)     arc cosine                          |
 | ATAN(arg)     arc tangent                         |
 | SNH(arg)      hyperbolic sine                     |
 | CSH(arg)      hyperbolic cosine                   |
 | TNH(arg)      hyperbolic tangent                  |
 |---------------------------------------------------|

 | RNK(arg)      rank transformation    RNK(#RAD)    |
 | CUM(arg)      varg is cumulated                   |
 | FOLD(arg,sarg,sarg) folded           Note 3       |
 |               reexpressions                       |
 | LAG(varg,sarg)lags varg (sarg lags)  Notes 19     |

 |---------------------------------------------------|
 | MEAN(arg)     mean value             Note 1       |
 | SDV(arg)      standard deviation          1       |
 | MEDIAN(arg)   median                      1       |
 | MID(arg)      midspread                   1       |
 | MIN(arg)      minimum                Note 1,4     |
 | MAX(arg)      maximum                Note 1,4     |
 | LHI(arg)      Lower hinges           Note 1       |
 | UHI(arg)      Upper hinges           Note 1       |
 | LAD(arg)      adjacent value (low)   Note 16 1    |
 | HAD(arg)      adjacent value (high)  Note 16 1    |
 | LIF(arg)      Lower inner fence      Note 16 1    |
 | HIF(arg)      inner fence (high)     Note 16 1    |
 | LOF(arg)      outer fence (low)      Note 16 1    |
 | HOF(arg)      outer fence (high)     Note 16 1    |
 | SUM(arg)      sum                    Note 1       |
 | SSQ(arg)      sum of squares         Note 1       |
 | PERC(varg,arg)arg-th percentile           1       |
 | DIS(varg,varg) Distance              Note 27 28 1 |
 | COR(varg,varg) Pearson corr          Note 28 1    |

 |---------------------------------------------------|
 | Logical functions                                 |
 |---------------------------------------------------|

 | BOOL(arg)     sets logical value     Note 5       |
 | NOT(arg)      sets to neg. log. val  Note 5       |
 | AND(arg1,arg2)logical and            Notes 5,6    |
 | OR(arg1,arg2) or (non exclusive)     Notes 5,6    |
 | XOR(arg1,arg2)exclusive or           Notes 5,6    |
 | GT(arg1,arg2) arg1 greater than arg2 Note 6       |
 | LT(arg1,arg2) arg1 less than arg2    Note 6       |
 | EQ(arg1,arg2) arg1 equal to  arg2    Note 6       |
 | NEAR(arg1,arg2)arg1 approximately                 |
 |               equal to arg2          Notes 7      |
 | BTW(arg,sarg,sarg) interval check    Notes 22     |
 | GRP(sarg)     group membership       Note  9      |
 | GRM(varg,arg) group membership       Note  40     |
 | OUT(arg)      out-values             Notes 8,16   |
 | FAR(arg)      far-out values         Notes 8,16   |
 | EXT(arg)      far out or out values  Notes 8,16   |
 | ADJ(arg)      adjacent values        Notes 8,16   |
 | WIA(arg)      within adj. values     Notes 8,16   |
 | WHI(arg)      within hinges          Notes 8      |

 |---------------------------------------------------|
 |  Distributional functions                         |
 |---------------------------------------------------|
 | GAUS(arg)     gauss (normal)         Note 14      |
 | CHI(arg,sarg) chi-2, sarg=df         Note 14      |
 | STU(arg,sarg) student, sarg=df       Note 14      |
 | FISH(arg,s1,s2) Fisher , s1,s2=df    Note 14      |
 | GIN(arg)      gauss inverse          Note 15      |

 |---------------------------------------------------|
 | Lookup functions                                  |
 |---------------------------------------------------|
 | VNUM("s")     Variable lookup        Note 36      |
 | CNUM("s")     Case name lookup       Note 36      |
 |---------------------------------------------------|

 | Input functions                                   |
 |---------------------------------------------------|
 | QRY("s")      queries for a value    Note 10      |
 | QRA("s")      query yes or no        Note 10      |
 | QRL(sarg1,sarg2,"s") query  w/limit  Note 10      |
 | RDV("s")      reads a vector         Note 10 20   |

 |---------------------------------------------------|
 | Sorting functions                                 |
 |---------------------------------------------------|
 | UGRD(arg)     Ascending sort         Note 17      |
 | UGI(arg)      idem, return indices   Note 17      |
 | DGRD(arg)     Descending sort        Note 17      |
 | DGI(arg)      idem, return indices   Note 17      |

 |---------------------------------------------------|
 | Categorical data/groups                           |
 |---------------------------------------------------|
 | FDG(varg)     Find diff. groups      Note 38      |
 | FRQ(varg,arg) Counts arg in varg     Note 39      |
 | GRP(sarg)     -> logical functions   Note 9       |
 | GRM(sarg,arg) -> logical functions   Note 41      |
 | MEM(varg,arg) Membership indices     Note 40      |
 | MKG(varg,arg) create categories      Note 37      |

 |---------------------------------------------------|
 | Miscellaneous functions                           |
 |---------------------------------------------------|
 | CC(arg,arg)   concatenate            Note 23 20   |
 | LNG(arg)      length of arg          Note 42      |
 | SUB(varg,arg,arg) subrange           Note 24 20   |
 | SEL(varg,varg) selection             Note 33 20   |
 | SLI(varg)     selection indices      Note 34 20   |
 | TRM(arg,sarg) trim sarg% cases       Note 31 20   |
 | DRP(varg,sarg)  drop sargth element  Note 30 20   |
 |---------------------------------------------------|
 | IDX(sarg,sarg,sarg) index generation  Note 26 20  |
 | TCN(arg)      counts true values     Notes 9a,1   |
 | ALL(vnam)     include ALL values     Note 25      |
 | OTH(vnam)     incl. all other values Note 25      |
 |---------------------------------------------------|

Note that <arg>, <varg> and <sarg> may be expressions (not <var>).

Notes:

(1) These functions produce a scalar result.

(2) <arg> is a dummy argument: if <arg> is a scalar the result is a scalar value (i.e. a constant in an expression containing # variables), if <arg> is a # reference, the result is a vector.

(3) The first <sarg> is p and the second <sarg> is ceil in the formulas below. FOLD reexpresses <varg> (written x) as follows:

y = x^p - (ceil - x)^p , if p <> 0

y = ln(x/(ceil - x)) , if p = 0

if p=0, FOLD yields a logit transformation, if p=0.14 a probit transformation is closely approximated. ceil, if specified as 0 is defaulted to 100 (maximum value for percent data). Note that <varg> must be positive.
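
A minimal Python sketch of the folded reexpression for a single value follows. It assumes the usual folded-power form y = x^p - (ceil - x)^p for p <> 0 and the logit ln(x/(ceil - x)) for p = 0; the function name is hypothetical and the sketch is not taken from EDA itself.

```python
import math


def fold(x: float, p: float, ceil: float = 100.0) -> float:
    """Folded reexpression of one positive value x with 0 < x < ceil."""
    if ceil == 0:
        ceil = 100.0                     # ceil given as 0 defaults to 100
    if p == 0:
        return math.log(x / (ceil - x))  # logit transformation
    return x ** p - (ceil - x) ** p      # folded power


# A value in the middle of the range folds to zero for any p.
print(fold(50, 0), fold(50, 0.5), fold(75, 1))
```

Values equally far from the middle of the range, such as 25 and 75 with ceil = 100, receive results of equal size and opposite sign, which is the point of folding.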

(4) MIN and MAX compute the true extreme values for <varg>. (Compare also to the $MIN and $MAX constant reference, where you may use the minimum and maximum stored with each variable, i.e. these are not affected by any selection).

(5) Values of 0 or less are considered as false, positive values as true, if the argument is not already 0/1, which is usually the case in logical expressions. BOOL and NOT might be used to produce binary variables.

      AND(10,20) -> 1
      AND(0,20)  -> 0
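
The truth rule of note (5) can be written down as a small Python model. The function names are hypothetical and the sketch follows the wording of this note only (0 or less is false, anything positive is true); it is not EDA code.

```python
def bool_value(x: float) -> int:
    """Model of BOOL: 1 for positive values, 0 for values of 0 or less."""
    return 1 if x > 0 else 0


def not_value(x: float) -> int:
    """Model of NOT: the negated logical value."""
    return 1 - bool_value(x)


def and_value(a: float, b: float) -> int:
    """Model of AND: true only if both arguments are true."""
    return 1 if bool_value(a) and bool_value(b) else 0


print(and_value(10, 20), and_value(0, 20), not_value(-3))
```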

(6) These functions are identical to the logical operators & | < > =. (Note that the first = in an expression containing a <target> is interpreted as assignment operator, therefore the EQ function must be used in such an event).

(7) NEAR is a logical function to specify "approximately equal to". The "close to" criterion is specified by the system fuzz value (see SET FUZZ). You may also specify a FUZZ=<fuzz> value in the <options> field of an expression command. (Note that this fuzz setting overrides the system setting for the current command only.)

(8) Return a true value if a case is an OUTlier, a FAR-outlier or an ADJacent value, otherwise the result is false. The WIA function returns true for all cases lying within the upper and lower adjacent values. WHI is true if a value is within the hinges. EXT combines FAR and OUT. Note that EXT(#var) is the same as NOT(WIA(#var)) or FAR(#var) | OUT(#var).

(9) The scalar argument is a group reference number, the result is a logical vector, where cases belonging to group <sarg> are true values.

(9a) TCN produces a scalar result containing the number of true values, i.e. the expression TCN(GRP(1)) returns the number of cases in group 1, and TCN(#1=#2) gives the number of cases in #1 and #2 which are equal.

(10) The input functions are used to obtain a value or a vector (RDV) from the user. Various forms exist. QRY asks for a value; any numeric value is acceptable. QRL does the same, but the value entered by the user must lie between sarg1 and sarg2. QRA asks the user to enter Yes or No; a no answer (No) returns 0, a yes answer (Yes, Ok) a value of 1 (only the first letter of the answer is analyzed, i.e. EDA is looking for upper or lower case Y, O or N). Finally RDV asks the user to enter a vector. Normal rules of data entry from the keyboard apply (see e.g. the NEWVAR command for additional information). "s" is the prompt string used to tell the user what to enter. If the first letter is a *, the value is to be entered on the same line as the prompt, otherwise on the next line. The '*' is not shown.

(14) Computes the distribution function at value <arg> (probability to be smaller than <arg>) for four distribution functions related to the normal distribution. For CHI, STU and FISH the associated degrees of freedom (df) must be specified.

(15) <arg> must be in the range: 0.0 < arg < 1.0.

(16) The definition of the inner and outer fences depends on the setting of the SET DEFOUT options.

(17) Sorting functions: UGRD/DGRD sort varg into ascending or descending order. These functions do not change any attributes (GVAR, case identifiers etc.), i.e. only the vector argument is sorted. Therefore be very careful when using these functions, because if the result is stored into the WA as a new variable case ids and GVARs will no longer be correct for this variable. If you need to sort the WA, use the SORT command.

UGI/DGI: the results of these functions are indices to the original vector, i.e. unlike UGRD(#1), which would return the sorted values of #1, UGI(#1) returns the sorted index numbers to #1, i.e. the first value points to the smallest value.
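
The difference between sorted values and sorted indices is illustrated by the following Python sketch. The function names mirror the EDA functions but are plain Python models; the use of 1-based index numbers is an assumption.

```python
def ugrd(values):
    """Model of UGRD: the values in ascending order."""
    return sorted(values)


def ugi(values):
    """Model of UGI: 1-based positions of the values in ascending order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    return [i + 1 for i in order]


def dgi(values):
    """Model of DGI: 1-based positions of the values in descending order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return [i + 1 for i in order]


data = [30, 10, 20]
print(ugrd(data), ugi(data), dgi(data))
```

Here the first element of the index vector is 2 because the smallest value, 10, sits in position 2 of the original vector.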

(19) The replacement value (default -1) is used to replace the inserted cases. <sarg> indicates the number of lags, where sarg may also be negative: (lead and lag)

      #1   lag(#1,2)   lag(#1,-1)

       1      -1          2
       2      -1          3
       3       1          4
       4       2         -1
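
The table above can be reproduced with a short Python model of LAG. The function and parameter names are hypothetical; `fill` stands for the replacement value (default -1).

```python
def lag(values, n, fill=-1):
    """Shift values down by n cases (n > 0, lag) or up by -n cases (n < 0, lead).

    Positions that have no source case receive the replacement value.
    The result always has the same length as the input.
    """
    size = len(values)
    if n >= 0:
        shifted = [fill] * n + list(values[:max(size - n, 0)])
    else:
        shifted = list(values[-n:]) + [fill] * (-n)
    return shifted[:size]


var1 = [1, 2, 3, 4]
print(lag(var1, 2))   # two lags
print(lag(var1, -1))  # one lead
```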

(20) These functions evaluate to a vector with a length depending upon the additional arguments. The length will usually neither be 1 nor correspond to the length of any other variable. Therefore the case identifiers usually do not have any meaning for these new variables and you should be very careful when using these variables with other variables in the WA, i.e. you will have to make sure, e.g. when plotting two variables, that the cases match in both variables.

(21) asks the user to supply a single value (may be a simple expression). Input is ALWAYS from the terminal, even while in macro mode or command-file-mode; this is useful to give more flexibility to macros etc.

(22) The BTW function returns true (1) if <arg> lies in the interval <sarg1> - <sarg2>: sarg1 <= arg <= sarg2. Note that only the first argument may be a vector argument.

(23) Concatenation : two variables are combined to form a single variable. Of course the new variable may not exceed mcas. With conditional expressions results are not always correct.

(24) Subrange: SUB(#1,1,10) means that the result consists of cases 1 through 10 of variable 1 (subrange of cases). Note that you may also specify SUB(#1,10,1), meaning that the cases are copied in reversed order; therefore SUB(#1,NC(#1),1) means that the whole variable is reversed. With conditional expressions results may not always be correct.
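
As a rough illustration of the subrange rule, the following Python sketch mimics the described behaviour. The function name sub is a hypothetical stand-in, and case numbers are assumed to be 1-based and inclusive.

```python
def sub(values, first, last):
    """Return cases first..last (1-based, inclusive); reversed if first > last."""
    if first <= last:
        return values[first - 1:last]
    return values[last - 1:first][::-1]


v1 = [10, 20, 30, 40]
print(sub(v1, 2, 3))         # [20, 30]
print(sub(v1, len(v1), 1))   # [40, 30, 20, 10], the whole variable reversed
```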

(25) ALL and OTH functions. Users of these functions should clearly understand how they work and therefore read carefully this note.

These functions are only useful in connection with the IF command. Consider

      >IF EXT(#1) THEN #1=MEDI(#1)

It means that the outliers on variable #1 are replaced by the median of the outliers of this variable. But sometimes it is useful to replace these values by the overall median of #1. Then you should write:

      >IF OUT(#1) THEN #1=MEDI(ALL(#1))

Then all values are used to compute the median (note that this is equivalent to using MEDI(D[#1,])). A third possibility is to replace the outliers by the median of the non-outlying values only, i.e. the cases not satisfying the condition; this is done by:

      >IF EXT(#1) THEN #1=MEDI(OTH(#1))

It is important to understand that ALL and OTH should only be used with a single argument, i.e. a variable reference. It is syntactically correct to say ALL(#1+#2), but this DOES NOT mean that all values of #1 and #2 are added: the addition is performed first, i.e. only on the selected values. Then the result is replaced by the full variable #1, i.e. it works as if ALL(#1) had been specified. This is because the ALL function simply REPLACES the top of the execution stack by the contents of #1, ignoring what is there (the result of the priority operation #1+#2). Currently there is no way of diagnosing this as an error, as ALL and OTH are treated the same way as all other functions, but have a special effect on the evaluation process.
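
The three scopes discussed in this note (selected cases only, all cases, other cases) can be compared in a small Python sketch. It is an illustration of the idea and not of EDA's internal evaluation; the function replace_selected and its mode names are hypothetical, and the mask is supplied by hand instead of by an outlier function.

```python
from statistics import median


def replace_selected(values, mask, mode):
    """Replace the selected cases by a median taken over the chosen scope."""
    if mode == "selected":   # plain reference: only cases meeting the condition
        pool = [v for v, m in zip(values, mask) if m]
    elif mode == "all":      # reference wrapped in ALL(): every case
        pool = list(values)
    else:                    # reference wrapped in OTH(): the remaining cases
        pool = [v for v, m in zip(values, mask) if not m]
    centre = median(pool)
    return [centre if m else v for v, m in zip(values, mask)]


values = [1, 2, 3, 4, 100]
mask = [False, False, False, False, True]   # pretend only 100 is an outlier

print(replace_selected(values, mask, "selected"))  # [1, 2, 3, 4, 100]
print(replace_selected(values, mask, "all"))       # [1, 2, 3, 4, 3]
print(replace_selected(values, mask, "others"))    # [1, 2, 3, 4, 2.5]
```

With a single outlier the first form changes nothing, since the median of the selected cases is the outlier itself.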

(26) IDX(start,end,increment) generates a variable, where the cases have values start, start+incr, start+ 2*incr etc. until end is reached. Note that the last case generated is equal or smaller than end, depending on the increment. If start>end the increment is from start down to end with negative increments. The number of cases for the resulting vector is determined by the arguments. An error occurs if an attempt is made to generate more than MCAS cases. Compare to the GENERATE command.
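
A minimal Python sketch of the sequence rule follows. It is not EDA code: the name idx is a stand-in, the sign of the increment is assumed to be ignored when start>end, and the MCAS limit is not modelled.

```python
def idx(start, end, increment=1):
    """Generate start, start+incr, ... without passing end."""
    step = abs(increment)
    if step == 0:
        raise ValueError("increment must be non-zero")
    if start > end:
        step = -step              # count downwards from start to end
    result = []
    value = start
    while (step > 0 and value <= end) or (step < 0 and value >= end):
        result.append(value)
        value += step
    return result


print(idx(1, 10, 3))   # [1, 4, 7, 10]
print(idx(1, 9, 3))    # [1, 4, 7], the last case is smaller than end
print(idx(10, 1, 4))   # [10, 6, 2]
```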

(27) DIS(varg,varg) computes the distance between the first and second variable. The distance computed depends upon the setting of the SET POWER option (default euclidean distance). See SET POWER for details.

(28) The result is a scalar.

(29) ROLL and DRAW: ROLL draws a sample with replacement (rolling a die), whereas DRAW draws a sample without replacement. These functions come in two forms:

ROLL(sarg,varg) or DRAW(sarg,varg): If the second argument is a vector, then a sample of size <sarg> is drawn from the values in <varg>. For instance ROLL(20,#test) will draw 20 values from all values found in variable #test (#test will of course need to have at least 20 values).

ROLL(sarg1,sarg2) or DRAW(sarg1,sarg2): If second argument is a scalar then a sample of size sarg1 is drawn from a population 1 ... sarg2.
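
The two sampling schemes correspond to what Python's standard random module offers as choices (with replacement) and sample (without replacement). The sketch below only illustrates the distinction; the function names roll and draw and the rng parameter are hypothetical and say nothing about EDA's random number generator.

```python
import random


def _pool(population):
    """A scalar n stands for the population 1..n; a vector is used as is."""
    if isinstance(population, int):
        return list(range(1, population + 1))
    return list(population)


def roll(size, population, rng=random):
    """Sample with replacement: the same value may be drawn repeatedly."""
    return rng.choices(_pool(population), k=size)


def draw(size, population, rng=random):
    """Sample without replacement: each element is drawn at most once."""
    return rng.sample(_pool(population), size)


rng = random.Random(1)          # fixed seed so that runs are repeatable
print(roll(5, 6, rng))          # five throws of a die, repeats possible
print(draw(5, 6, rng))          # five distinct values out of 1..6
```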

(30) DRP(varg,sarg) drops a single element from vector varg, i.e. the result is a vector with one less observation. Note that this should be used with caution when generating new variables, as CASIDs are never affected by functions. Use DELETE CASE if you need to remove cases from the WA. This function is mainly provided for experimenting, e.g. for jackknifing.

(31) TRM(arg,sarg) trims off sarg% of the cases from arg (sarg% from both tails of the variable!). E.g. Trim(#1,5) removes the 5% smallest and the 5% highest values, i.e. removes a total of 10% of the cases. The resulting variable will be shorter than the initial variable and the result will be sorted in ascending order. Warning: case identifiers are not meaningful for the trimmed variable.
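
The trimming rule can be sketched in Python as follows. The name trm is a stand-in, and the way the number of trimmed cases is rounded (truncation here) is an assumption; the source does not say how EDA rounds.

```python
def trm(values, percent):
    """Sort ascending and drop percent% of the cases from each tail."""
    ordered = sorted(values)
    cut = int(len(ordered) * percent / 100)   # assumed: truncate, not round
    return ordered[cut:len(ordered) - cut]


data = list(range(20, 0, -1))     # 20 cases: 20, 19, ..., 1
trimmed = trm(data, 5)
print(len(trimmed))               # 18: one case removed from each tail
print(trimmed[0], trimmed[-1])    # 2 19
```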

(32) SIGN(arg1,arg2) sign transfer: the sign of arg2 is transferred to the absolute value of arg1.

(33) SEL(varg,mask) produces a new variable containing only the values of varg, where <mask> is true. Both <varg> and <mask> must have the same length. <mask> is usually a logical expression; if it is a variable the usual rules apply, i.e. values larger than 0 are considered true, all other values false. If <mask> causes no value to be selected an error message is issued and expression evaluation terminates. In the example

   >LET MEDIAN(SEL(#1,(#2>MED(#2))))

we compute the median for variable #1, but only for the cases with a value larger than the median in variable #2. Note that this produces the same result as:

   >IF #2>MED(#2) THEN MED(#1)

(34) SLI(varg) Selection_index function, where varg is usually a logical variable: returns the indices to the selected (true) values, for instance

   >LET #3=SLI(#10>100)

produces a variable containing indices pointing to the cases where variable #10 exceeds 100. Note that the function requires that SLI() select at least one case, otherwise an error message is issued (No cases selected).
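
Notes (33) and (34) describe two views of the same selection: SEL returns the selected values, SLI the positions of the selected cases. A Python sketch of both, with hypothetical function names and 1-based case numbers assumed:

```python
def sel(values, mask):
    """Return the values whose mask entry is true (larger than 0)."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    picked = [v for v, m in zip(values, mask) if m > 0]
    if not picked:
        raise ValueError("No cases selected")
    return picked


def sli(mask):
    """Return the 1-based case numbers where the mask is true."""
    indices = [i for i, m in enumerate(mask, start=1) if m > 0]
    if not indices:
        raise ValueError("No cases selected")
    return indices


v10 = [50, 150, 99, 300]
mask = [v > 100 for v in v10]
print(sel(v10, mask))   # [150, 300]
print(sli(mask))        # [2, 4]
```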

(35) POWER(arg,power) This function is identical to the exponentiation operator (^). It is provided as - on foreign keyboards - you might have trouble entering the operator.

(36) VNUM/CNUM are variable and case (observation) lookup functions. The "s" (string) argument specifies a variable (case) name; the function returns the variable (case) number of the corresponding variable (case); if the variable (case) cannot be found the function returns 0. Note that upper and lower case characters are not distinguished (if you need this, check the STATUS LOOKUP command for details).

(37) MKG(varg1,arg2) creates categories ("make groups"). arg2 (usually a vector argument) specifies cutpoints defining the categories to create from the variable specified by varg1. Observations with values less than the first cutpoint are assigned a value of 1; values smaller than the second cutpoint, but greater than or equal to the first cutpoint, fall into category 2, etc. Note that MKG() sorts arg2 before proceeding. You might use the CODE command (SAVECUTPOINT option) to create a variable suitable as an arg2 specification. The result of MKG() is often used in conjunction with the SCD() output procedure (coding).
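
The cutpoint rule amounts to counting how many cutpoints are less than or equal to a value. A minimal Python sketch using the standard bisect module; the name mkg is a stand-in and the sketch is not EDA code:

```python
from bisect import bisect_right


def mkg(values, cutpoints):
    """Assign group 1 below the first cutpoint, 2 from the first up to the second, etc."""
    cuts = sorted(cutpoints)                 # cutpoints are sorted first
    return [bisect_right(cuts, v) + 1 for v in values]


print(mkg([1, 5, 7, 10, 15], [10, 5]))   # [1, 2, 2, 3, 3]
```

A value equal to a cutpoint falls into the higher category, matching the "greater than or equal" wording above.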

(38) FDG(varg) ("Find different groups/categories") Returns a vector containing all different group numbers (different categories, codes) in varg (treated as an integer variable). The resulting vector is sorted in ascending order of the codes. This function is useful e.g. when you want to loop over all groups defined by a variable or as input to the FRQ function.

(39) FRQ(varg,arg) [varg, arg are integers] Returns counts (frequencies) for all codes (categories, group ids) found in arg. The resulting vector has the same length as arg. If you want to produce a frequency table from a variable (varg is treated as integer variable), you might want to use FRQ(varg,FDG(varg)).
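
The combination FRQ(varg,FDG(varg)) can be pictured with the following Python sketch. The names fdg and frq are stand-ins; the sketch assumes that a code which does not occur in varg gets a count of 0.

```python
from collections import Counter


def fdg(codes):
    """Return the distinct integer codes in ascending order."""
    return sorted({int(c) for c in codes})


def frq(codes, lookup):
    """Return, for each code in lookup, how often it occurs in codes."""
    counts = Counter(int(c) for c in codes)
    return [counts[int(code)] for code in lookup]


groups = [2, 1, 2, 3, 2]
print(fdg(groups))                # [1, 2, 3]
print(frq(groups, fdg(groups)))   # [1, 3, 1], a frequency table
```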

(40) MEM(varg,arg) [varg and arg are integers] (Membership indices) Arg contains a list of integer codes to be looked up in varg; the function returns the indices into varg. See also the GRP() and GRM() functions.

(41) GRM(varg,arg) [varg, arg are integers] group membership: This function is similar to the GRP() function, except that instead of looking up the GVAR any integer variable in the WA may be looked up. The result is a logical variable.

(42) LNG(arg) Returns the number of elements in arg. The result is always a scalar value.

Hierarchy of operations

The table below shows the hierarchy of evaluation in expressions:

     evaluated
     ----------------------------------------------
     first             matrix references [] and constants
     2nd               parenthesized expressions
     3rd               functions
     4th               exponentiation
     5th               multiplication, division
     6th               addition, subtraction
     7th               relationals  < > =
     8th               & (and)
     9th               | (or)

Result variables defined

Commands using full expressions define result variables, depending upon the type of the expression: If the result of the evaluation is a scalar, then

   $0 is equal to 1
   $5 contains the result (scalar)

If the result is a vector then

   $0 contains the target variable
   $1 the number of cases
   $4 the result type
      0 means  display
      1        copy into variable ($0 is that v#)
      2        target is a letter variable
      3        otherwise

If the expression is an IF THEN ELSE expression, the above ResVars are set for the expression after ELSE; for the first expression (after THEN) the following are set:

   $2 is the target variable
   $3 the number of cases

Scalar variables (letter variables) (*)

Scalar variables are single letter variables (A..Z). They may be used in expressions, in <vlists>, in named values (options) and everywhere within simple expressions.

Scalar variables are defined with a LET, CALC or IF command. LET and CALC may be omitted, as long as the = sign is specified in column 2 of a command line.

   >A=20.5
   >LET A=20.5
   >CALC A=20.5

are identical commands. Each of them defines 'A' as a letter variable having value 20.5. Initially letter values are undefined, and reference to an undefined variable produces an error.

Scalar variables (called letter-variables) are usually modified explicitly by the user via a new value assignment.

(*) Alternatively letter values may be defined as auto incrementing variables, i.e. after each reference to the variable it is automatically incremented by a specified value. See the section on macros for more details on this.

Repetitive evaluation

Expressions may be evaluated repetitively, controlled by an index (scalar) variable. This applies to the LET, CALC and IF commands as well as to the OUT command explained in the next section.

Repetitive execution of a transformation command is controlled by specifications in the <options> field of an expression command line. The <options> field is separated from the expression part by the \ character (Note that this special character might be different in your EDA version).

There are three different forms:

  ... \ FOR <letter> [START=first] END=last [INCREMENT=incr]
  ... \ <vlist> INDEX <letter>
  ... \ INDEX <letter> VARIABLE=index.var#

The first format defines an index <letter>, which varies on each step from <first> (default value 1) to <last> (END= is required) by increments of <incr> (default 1). <letter> is a letter-variable A..Z, which becomes undefined after termination.

The second format defines an index <letter> with the values taken from the <vlist>, i.e. <letter> takes the successive values from the variable list. Note that this command form uses temporarily a free location in the WA to store the current variable list, i.e. if the WA is filled to its full capacity, this command will terminate in error, telling you that the WA is full and no variables available.

The third format takes the index values from a variable in the WA specified with the VARIABLE=var# option: at the first iteration the index value is the first case of var#, at the second iteration the second case, and so on until the last case of the variable is reached. The number of cases (length) of the variable thus determines the number of iterations.

Any error encountered during the repetitive evaluation terminates repetition.

   >LET #A=#A/100. \ FOR A START=20 END=40
   >LET #A=#A/100. \ 20-40 INDEX A

Both examples divide variables 20 to 40 by 100.
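
In conventional programming terms the control field simply supplies the successive values of the index letter. The Python sketch below mimics the two examples above on a toy workspace; the dictionary workspace and the function divide_variables are hypothetical and only illustrate the looping, not EDA's implementation.

```python
# Toy workspace: variable number -> list of case values (3 cases each).
workspace = {number: [float(number)] * 3 for number in range(1, 41)}


def divide_variables(ws, index_values, divisor=100.0):
    """Apply '#A=#A/divisor' once for every value taken by the index A."""
    for a in index_values:
        ws[a] = [value / divisor for value in ws[a]]


# FOR A START=20 END=40: A runs from 20 to 40 in steps of 1.
# The <vlist> form '20-40 INDEX A' supplies the same values of A.
divide_variables(workspace, range(20, 41))

print(workspace[19])   # [19.0, 19.0, 19.0], untouched
print(workspace[20])   # [0.2, 0.2, 0.2]
```

With the third form the index values would come from the cases of a variable instead of a range.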

The EXECUTE command as well as the LOOP command explained in the chapter on macros use the same syntactical conventions to control execution of macros.

Output expressions

Output expressions, specified with the OUT command, are a special form of the general expression syntax. The purpose of the OUT command is to let an advanced user produce and format output in a customized way. Although you can type in OUT commands like any other EDA command, they will usually only appear in macros.

Overview

<expression> may be specified in a second form as:

  <expression>::= <out-exp1>[,<out-exp2>, .. ]

  where:

    <out-exp>::= <output-procedure>(<expression>)

The output-procedures are shown in the table below.

Output expressions are used either on the OUT command or on the IF command. Note however that in fact the OUT command is identical to LET or CALC; it is the presence of an output expression that alters the way an expression is handled.

Output expressions are used to control the way data are shown on the screen, i.e. they provide a means of displaying, printing or writing the data the way you need/like it and not the way EDA normally produces it. Output expressions are useful in macros.

Output expressions are lists of output procedures. Output procedures look like functions (SIN(), BTW() etc.), except that they do not return value(s) used in computations but perform an output action. For this reason output procedures cannot be nested. An argument to an output procedure can be a simple string or a value, as well as an algebraic expression of any complexity. The following example

   >OUT LABEL(1),PRT(" Median=",med(#1))

would produce something like

   myvar    Median=  74.2

The example uses two procedures: LABEL(1) displays the label of variable 1; PRT has here two arguments: a string argument and an expression MED(#1), computing the median of variable 1. (This is an illustration of what output expressions look like; in most situations such an expression would appear in a macro with the reference to a specific variable replaced by a letter variable.)

Normally output procedures produce results on the screen and write them to the print file if it is active. It is also possible to send the results to the print file only (no screen display) or to the RAWOUT file. The various destinations depend upon the purpose of your macro. In most cases you will use the default (normal behaviour), but if your macro has been created to produce a specific output file you intend to use later as an input to some other software you might want to write directly to the RAWOUT file without filling the screen with unwanted information.

The destination of the output produced by the output procedures is controlled with the SET OUTPUT command, which offers three main options: SCREEN (default), RAWOUT and PF_ONLY. Refer to SET OUTPUT for more information.

The output buffer

OUT maintains an output buffer; each procedure appends to the buffer until the buffer is full, then it is displayed or printed and then emptied (this operation is called flushing the buffer). The length of the buffer depends upon the destination (screen, print file only or rawout file): screens are usually 80 characters wide, the print file width is normally 132 characters and the rawout file contains 80 characters per line.

Normally the buffer is flushed when the buffer is full or the OUT command terminates.

    >OUT LABEL(1),DESCR(1),PRT(MEAN(#1))
    >OUT LABEL(A),DESCR(A),PRT(MEAN(#A))  \ FOR A END=20

With the first example, the buffer is flushed once at the end of the operations. The second example flushes the buffer each time the buffer is full and after all the iterations have been executed. This means that the buffer is not flushed after each iteration, so in this example you will not see each variable on a separate output line. To avoid this unwanted effect the FLUSH() procedure has been supplied; the following example produces the desired effect (each variable on a separate output line):

 >OUT LABEL(A),DESCR(A),PRT(MEAN(#A)),FLUSH(0)  \ FOR A END=20

FLUSH(0) flushes the buffer by writing its contents. Note that the same effect may be obtained with the TAB() procedure, if you TAB beyond the length of the buffer (e.g. TAB(91) with a buffer width of 80).
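
The effect of flushing inside or outside the iteration can be reproduced with a toy buffer in Python. The class OutputBuffer and the function run are hypothetical; in particular the sketch assumes that an item which does not fit triggers a flush before it is appended, which may differ from what EDA does at the buffer boundary.

```python
class OutputBuffer:
    """Toy model of an output line buffer with a fixed width."""

    def __init__(self, width=80):
        self.width = width
        self.text = ""
        self.lines = []           # stands in for the screen or output file

    def append(self, item):
        if len(self.text) + len(item) > self.width:
            self.flush()          # buffer full: write it out first
        self.text += item

    def flush(self):
        if self.text:
            self.lines.append(self.text)
        self.text = ""


def run(flush_each_iteration):
    buf = OutputBuffer()
    for a in range(1, 4):         # like "\ FOR A END=3"
        buf.append(f"var{a} ")
        if flush_each_iteration:
            buf.flush()           # like FLUSH(0) inside the expression
    buf.flush()                   # automatic flush when the command ends
    return buf.lines


print(run(False))   # ['var1 var2 var3 ']
print(run(True))    # ['var1 ', 'var2 ', 'var3 ']
```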

(**) Note that you can avoid flushing the buffer using TAB(0), i.e. setting tabulation to 0. This is useful when several OUT commands should write to the same output line. The first OUT command should be closed with a TAB(0) and the next OUT command should start with a TAB(n), i.e. tabbing to the last position used by the previous command. Your macro needs to keep track of these positions.

Output procedures

The following output procedures are available:

 |---------------------------------------------------------------------|
 | Name                 Description                 Q/S  Buff   Notes  |
 |---------------------------------------------------------------------|
 | PRT(arg1[,arg2]..)   Output values or strings    Y/N  Yes    Note 1 |
 | LAB(arg)             Variable label              Yes  Yes    Note 2 |
 | DESC(arg)            Variable descriptor         Yes  Yes    Note 2 |
 | CASI(arg)            Case identification         Yes  Yes    Note 2 |
 | CLB(arg)             C1 labels                   Yes  Yes    Note 3 |
 | KLB(arg)             C2 labels                   Yes  Yes    Note 3 |
 | NAM(sarg1,sarg2)     various names               No   Yes    Note 13|
 |---------------------------------------------------------------------|
 | CHAR(c[,c1,..])      Special characters          No   Yes    Note 6 |
 | FMT(len[,dec])       Format specification        --   --     Note 4 |
 | FMT(len,"fmt")                                                      |
 | SPC(ns)              Produces ns spaces          No   Yes           |
 | TAB(pos)             Tabulation                  --   Yes    Note 5 |
 | FLUSH(nop[,cnd])     Flushes buffer (with        No   Yes    Note 7 |
 |                      optional string operations)                    |
 |---------------------------------------------------------------------|
 | BXP(varg[,sarg])     Boxplot                     No   Yes    Note 8 |
 | LST(arg,...)         Listing variables/values    No   No     Note 12|
 | IFY(mask[,"msg"])    Identify cases              No   Yes    Note 11|
 | PLT(varg[,varg])     Plot of variables           No   No     Note 9 |
 | SCD(varg,"codes")    Show codes for varg         No   Yes    Note 14|
 |---------------------------------------------------------------------|

General notes

A majority of procedures accept scalar and vector arguments; some, like the PLT() or BXP() procedures, require vector arguments (varg).

String arguments: You may specify string arguments either by typing a string constant, delimited with double quotes or specify a string variable (no quotes). OUT PRT("hello") displays "hello" and PRT(A$) displays the contents of the string variable A$. Note that with string variables PRT() will always display the trimmed string (no blanks after the last non-blank character).

The format of numerical output is determined either by the default setting (format used for "normal data" in EDA) or as set by the FMT() procedure (see below for more information).

The Q/S column indicates whether the procedure is affected by the SEPARATOR/QUOTE settings for output written to RAWOUT (other destinations are not affected). SEPARATOR/QUOTE is controlled by the SET OUTFILE option (see there for directions) and defaults to no separators and no quotes. If separators and/or quotes are active the procedures with a "Yes" in the Q/S column will be affected, i.e. after each item a separator will be inserted and string items will be enclosed within the quote character. Note that trailing blanks will be removed from strings before inserting the ending quote character (in normal operation mode trailing blanks are not removed!)

The "Buff" column indicates whether a procedure uses the output buffer or not. The PLT() and LST() procedures do not use it, i.e. are not affected by operations modifying or flushing it.

Notes

Note 1: PRT()

PRT accepts a list of arguments separated by commas. All types of arguments are accepted: scalar values, vector arguments (variable or matrix references) and string arguments. All argument types may be mixed without any restriction.

Note that strings written to RAWOUT are never quoted with the PRT() procedure, even if QUOTE has been set.

Note 2

The argument to these functions is either a scalar or a vector. LAB(1) displays the label of variable 1, whereas LAB(#1) takes the contents of variable #1 as an index variable, containing the indices to the variable labels to be displayed. Make sure to understand this difference. LAB, DESC and CAS are put into the print buffer using their maximum length (8, 48 resp. 4 characters). LAB(0), resp. DESC(0) means the WA label, resp. descriptor.

Note 3: CLB, KLB

CLB() and KLB() accept scalar or vector arguments using the same rules as the LAB(), DES() and CAS() procedures (note 2). CLB() returns the labels of variable-oriented configurations, as well as the labels of the variables stored in MATRIX. Normally CLB() shows the labels of the configuration stored in C1, but this might not always be the case (e.g. when EXCHanging configurations etc.). KLB() on the other hand shows case-oriented labels (see CONFIGUR for more details). CLB returns 8 characters, KLB 4.

Note 4: FMT(len), FMT(len,dec), FMT(len,"fmt")

FMT() controls the output format for numerical values. FMT affects the format of the numbers displayed by the PRT() and the LST() procedure. If no FMT() has been used during an EDA session, EDA uses the current default display format for data values (automatically set when a WA is read in and initially set to a default value). FMT settings remain in use until they are changed by another FMT specification or until the EDA session ends.

<len> indicates the total field length; FMT(len,dec): <dec> the number of decimal places. If only FMT(len) is specified, the number of decimal places is set to the SET DECIMALS value, i.e. uses the same number of decimal places as other EDA displayed numbers. If the number of decimals is 0, the value is displayed without decimal point, i.e. as an integer.

FMT(0): With <len>=0 each value is examined individually to create a format sufficiently large to display that value (flexible format). This should not be used to output numbers in tabular form, as only a fixed format produces aligned columns in all cases. With this format the number of decimals displayed is always the number of decimals set with SET DECIMALS.

FMT(-1) is used to reset the numeric format used by OUT to the current default display format for numbers (as set by a GET command or set by default when calling EDA). Make sure to understand that FMT(-1) uses the current default display format and not a general default; default display formats are changed dynamically.

FMT(len,"fmt") (**) The last form is used to set a Fortran format to be used on subsequent numerical output. <len> should be specified correctly and contain the width the format produces. Note that in this case len does not affect the way the number is shown, but only the width it occupies, i.e. when setting a much larger len than the number output, you will produce a number of trailing blanks. Note that this is an advanced option: make sure to use only format elements referring to real numbers and to enclose the format elements in parentheses. Up to 12 characters (including the parentheses) may be used. No checks are performed by EDA, i.e. check carefully.

Note 5: TAB()

Each time a string or a number is inserted into the output buffer the current character pointer is set to the last character in the buffer. The TAB procedure may be used to manipulate the pointer. TAB() sets the current character pointer forward or backward in the buffer, i.e. the next output operation will add characters at that location. This is useful if you need to produce tabular output or start output at fixed locations. Backward tabbing is permitted.

Setting the tab value beyond the maximal buffer length causes the buffer to be flushed (displayed and emptied).

Any TAB operation sets the $3 ResVar to the current pointer, BEFORE setting it to the new value specified by TAB.

(**) TAB(0) may be used to avoid automatic flushing of the buffer (after an OUT command has finished). This means that the pointer is set to 0, i.e. the next OUT will start at location 1, unless you TAB to a different location before. TAB(0) is useful when building an output line interferes with automatic flushing of the output buffer. Note also that a SET A$=OBUFFER command retrieves the current output buffer and stores it into a string variable. This of course makes only sense before flushing as flushing empties the buffer.

Note 6: CHAR()

CHAR() accepts one or more scalar or vector arguments. Each value produces a single character on output. A positive value produces the character corresponding to its position (specified by the argument) in the system's collating sequence (usually the ASCII sequence). This may be used to obtain non-printable control characters (e.g. form feed, bell etc.). Assuming your system uses the ASCII character set, OUT CHAR(7) rings the bell and OUT CHAR(7,7,7) would ring it three times. Refer to an ASCII table for a list of possible values. Note that the argument is specified as a decimal value. Take care to pick the right column from that table: many ASCII tables show columns containing the decimal, the octal and the hexadecimal value of a character (sometimes decimal values are not shown!).

If a value is negative an EDA character used in the semi-graphical displays is returned. The documentation of the SET GRAPH command shows a table with the different characters. <c> refers to the number found in the first column, but with a minus sign added, e.g. CHAR(-5) will return the character used on plots to indicate two observations with the same coordinates.

Note 7: FLUSH(op), FLUSH(0,cond)

FLUSH flushes the buffer, i.e. displays or writes it out and empties it. See the section on the output buffer above on how the buffer works and when it is normally flushed. The flush procedure is used to flush the buffer whenever you need to do so and optionally provides a number of operations performed before flushing. The possible arguments (op) have the following meaning:

     0   Flush the buffer [see below for FLUSH(0,cond)]
     1   Right justify the buffer before flushing it.
     2   Center the buffer and flush
     3   Left justify and flush.
     4   Remove spaces and flush.
     5   Remove last separator and flush.
     6   Blank beyond current pointer and flush (1)
    -1   Empty the buffer (no output).

(1) FLUSH(6) is only useful in conjunction with TAB() in situations where you are building an output line by tabbing forth and back and you want to make sure that the buffer does not contain any characters beyond the current pointer position. If no backward TAB() has been used, FLUSH(6) is the same as FLUSH(0).

FLUSH(0,cond): If a second argument is present with FLUSH(0), the buffer is flushed conditionally. <cond> is a positive or negative integer. FLUSH(0,60) means that the buffer is flushed if the current character pointer is at position 60 or beyond, otherwise nothing is done. <cond> may also be negative; then the remaining space is checked, i.e. FLUSH(0,-20) checks whether 20 positions remain in the current buffer. If this is the case nothing is done; if fewer than 20 positions are available beyond the current position the buffer is flushed.
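
The two conditions can be written down as a small decision function. This Python sketch is an illustration only; the name should_flush is hypothetical, and the remaining space is assumed to be the buffer width minus the current pointer.

```python
def should_flush(pointer, cond, width=80):
    """Decide whether FLUSH(0,cond) would flush, as described above."""
    if cond >= 0:
        return pointer >= cond        # pointer at position cond or beyond
    remaining = width - pointer       # assumed definition of remaining space
    return remaining < -cond          # fewer than |cond| positions left


print(should_flush(59, 60))    # False: pointer has not reached position 60
print(should_flush(60, 60))    # True
print(should_flush(60, -20))   # False: exactly 20 positions remain
print(should_flush(61, -20))   # True: only 19 positions remain
```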

Note 8: BXP(varg,width) or BXP(varg)

BXP(varg,width) produces a simple form of a boxplot with an overall width of <width>. If the width is either too small or too large it is adjusted. BXP(varg) will produce a boxplot of width 20.

You may modify the characters used to draw the boxplot with the SET GRAPH command.

Note 9: PLT

PLT(varg) and PLT(varg,varg) produce simple plots. With only one argument <varg> is plotted on the Y axis, the X-axis being the sequence of the cases. With two arguments the first variable is plotted on X, the second on Y. The numbers of vertical and horizontal units are by default those set by the SET PLOT SIZE command (or preset by EDA). With a single argument, the dimensions set for time series plots are used. STAT PLOT SIZE will show the current sizes. The default dimensions may be overridden using a PLOT=(xx,yy) option in the control field of the OUT command (separated by a \ from the command): xx and yy indicate the number of units to use on x and y.

Note 11: IFY()

    IFY(maskvar#)
    IFY(maskvar#,"message")

The IFY procedure is helpful when you need to list cases (case identifiers) according to some criterion. For instance

>OUT IFY(FAR(#1))

lists the case identifiers of the far out values of variable 1.

The first argument to IFY is a logical variable (mask) or logical expression, telling IFY whether a case has to be listed or not. If no case satisfies the condition a "No cases." message is issued, unless you use the second form of the procedure, i.e. the second argument contains the message to be issued if no case matches the condition. Therefore >OUT IFY(FAR(#1),"No far out values") issues "No far out values" if no far out values are found in variable 1.

Note 12: LST()

LST(arg,...) produces a list of the items included in parentheses. Only numerical lists are possible. The format used is set with the FMT() procedure (or the default values set at the time of the LST() invocation). Note that all items are displayed with the same format. It is possible to display arguments with different lengths, the display will occupy as many lines as the largest <arg> has values, for shorter <args> the positions above their respective length will be filled with a quotation mark.

<arg> may also be a string in double quotes. Unless the string starts with a $ sign, a string argument is constant, i.e. will be repeated on each line, the number of lines being determined by the longest <arg> (if all non string <args> are scalars only a single output line will be produced). For more details see the special section below on the LST function. The list proc may contain numeric and string arguments in any order. String arguments are displayed as is, unless the first character is a $ symbol, i.e.

      >OUT  LST("|",A,"|") \ FOR A END=3

    displays

       | 1. |
       | 2. |
       | 3. |

Customized lists may contain case ids, labels and the like. This is done with a special $tag at the beginning of a quoted string. A $ followed by a letter tells LST() to display one of the following:

  $L  Variable labels
  $D  Variable descriptors
  $I  Case identifiers
  $C  C1 labels
  $K  C2 labels

Normally the information is found by simply taking the current index:

IF #1 > 10 THEN LST (#1,"$I")

will produce a list with the values of variable number 1 and the case ids corresponding to them. In some instances, however, there is no index (no variable reference with the LST() function) or the information is not correct (e.g. when using the DGR() or UGR() functions). For these circumstances and special needs the information can be indexed by a variable argument following the $ request:

      OUT LST("$K*",#1)

would use variable 1 as index to the K's (C2 labels) to show. Note that the * requests that the next argument should be taken as index, and that #1 will not be displayed.

Note 13: NAM(sarg1,sarg2)

The NAM() procedure displays various names and other information as directed by the arguments. [It is in fact an interface to a procedure EDA uses to display that information.] This procedure duplicates some functions of other procedures. You should use the other procedures whenever possible. The following table shows the possible arguments to NAM.

  sarg1       Produces               sarg2
 ----------------------------------------------------------
  1           <space> rownam   \
  2           <space> colnam    |    0 singular+space
  3           xname             |    1 plural
  4           yname             /
  5           GVAR:label             not used
 10           label <space>          variable ref.
 11           descr <space>          variable ref.
 12           casid <space>          case ref.
 13           casid (group) <space>  case ref. (*1)
 14           label(v#) <space>      variable ref.
 15           label(v#) descr <sp>   variable ref.
 16           label(#tie) <sp>       variable ref.
 17           label(#tie)+des <sp>   variable ref.
 18           group name             group reference
 20           casid   (*2)           case ref.
 21           label   (*3)           variable ref.

 *1)  same as 12 if no GVAR exists
 *2)  as set by SET CASID
 *3)  as set by SET LABEL

For codes>10, a positive sarg1 pushes the string at its trimmed length to the output buffer, whereas a negative value pushes it at full length, including trailing blanks (e.g. a descriptor is always 48 positions wide).

Note 14: SCD(varg,"codes")

(Show codes) Each symbol specified in the codes string represents a category (group) found in varg. The first symbol is shown for all values of 1 found in varg, the second symbol for values (group, category) of 2, and so on. If a category value is larger than the number of codes supplied, a ? (question mark) is shown. If a value is 0 or less, a blank is shown.

This procedure is supplied to help macro authors create customized coded lists or other coded displays. (See the section on the "Art of Coding" in the glossary for an overview of what is available with various commands.)

SCD() will often be used in conjunction with the MKG() function, i.e. a function creating groups. The expression

 
>OUT SCD(MKG(#5,MEDIAN(#5))-1,"*")

will produce a display for #5 where values below the median appear as spaces and values above as "*". MKG(#5,MEDIAN(#5)) creates a vector where all values below the median (MEDIAN(#5) specifies a single cutpoint) are assigned 1 and values above it 2. Subtracting 1 causes SCD to show values below the median as spaces (0 is shown as a space). For this reason only a single code ("*") is specified in the example, i.e. the code for the 1s.
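
The coding rule and the median example can be mimicked in Python. This is a sketch rather than EDA's implementation: `scd` and `split_at` are hypothetical stand-ins for SCD() and a single-cutpoint MKG(), the treatment of values exactly equal to the cutpoint is an assumption, and joining the symbols into one string is done only to keep the output compact.

```python
from statistics import median


def scd(values, codes):
    """Map category numbers to symbols: 1 -> codes[0], 2 -> codes[1], ..."""
    out = []
    for v in values:
        k = int(v)
        if k <= 0:
            out.append(" ")           # 0 or less is shown as a blank
        elif k > len(codes):
            out.append("?")           # more categories than codes supplied
        else:
            out.append(codes[k - 1])
    return "".join(out)


def split_at(values, cut):
    """Stand-in for MKG with one cutpoint: 1 below the cut, 2 otherwise."""
    # Assumption: a value equal to the cutpoint goes to the upper group.
    return [1 if v < cut else 2 for v in values]


data = [3.0, 9.0, 1.0, 7.0, 5.5, 8.0]
groups = split_at(data, median(data))
shown = scd([g - 1 for g in groups], "*")   # 0 -> blank, 1 -> "*"
print(repr(shown))
```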

Some examples

As OUT procedures are usually used in macros, the macro library provided with EDA contains a number of examples illustrating the use of the OUT command. There are two main areas where the OUT command is useful: creating your own analysis commands that produce output to the screen exactly the way you want it, and writing data or other information to an output file in a format that the standard EDA commands for exporting data do not supply.

The following examples intend to show a number of possible uses of output procedures. Note that these examples are sketches illustrating various aspects of output procedures, which usually are part of a more general macro.

Symmetry measure

The difference between the mean and the median is a simple measure of symmetry. The difference is easily displayed by typing (for variable 1):

   >LET MEAN(#1)-MEDIAN(#1)

This command, however, just displays a number on the screen, without any documentation.

   >OUT PRT("Symmetry=",MEAN(#1)-MEDIAN(#1))

uses the PRT procedure to display the string "Symmetry=" followed by the difference between the mean and the median. PRT accepts any number of arguments, here a string argument followed by an expression which is evaluated to a scalar value.

   >OUT PRT("Symmetry for variable "),LABEL(1),PRT(MEAN(#1)-MEDIAN(#1))

adds the label of variable 1, making a more readable display. Note that we write LABEL(1). [LABEL(#1) would have a different meaning, i.e. producing a list of labels for all variables whose indices are found in #1.]

 >OUT PRT("Symmetry for "),LABEL(A),PRT(MEAN(#A)-MEDI(#A)) \ #* INDEX A

produces the same output, but for all variables in the WA: the direct variable reference has been replaced by a letter variable controlled by #* INDEX A, where #* means all variables. There is a problem, however. After displaying the first variable, EDA continues to fill the current output line, so the symmetry measures do not appear on a separate line for each variable. To achieve this, EDA has to be told to flush (write out) the output buffer after each variable:

 >OUT PRT("Symmetry for "),LABEL(A),PRT(MEAN(#A)-MEDI(#A)),FLUSH(0) \ #* INDEX A
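
For readers who want to check the numbers outside EDA, the same per-variable display can be sketched in Python. The variable names and data are hypothetical, and the three-decimal formatting is a choice of this sketch, not EDA's output format.

```python
from statistics import mean, median

# Hypothetical work area: label -> values.
work_area = {
    "income": [10, 12, 14, 40],
    "age": [20, 30, 40],
}


def symmetry_lines(variables):
    """Return one line per variable, like flushing the buffer after each one."""
    lines = []
    for label, values in variables.items():
        diff = mean(values) - median(values)
        lines.append(f"Symmetry for {label} {diff:.3f}")
    return lines


lines = symmetry_lines(work_area)
for line in lines:
    print(line)
```

A positive difference, as for the hypothetical income variable, points to a longer upper tail.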

Boxplots

You need a simple way of checking what a log-transformation of a sequence variable does to the variable.

  >OUT LABEL(A),BXP(#A),PRT("  "),BXP(LOG(#A)),FLUSH(0) \ #* INDEX A

The BXP procedure accepts a vector argument (#A and LOG(#A) in our example) and produces a simple boxplot. As BXP is used with a single argument, the width of the boxplot is the default width (20). The PRT procedure separates the boxplots with a few spaces. Note again the use of FLUSH(0), which is needed in order to produce an output line for each variable.
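
What such a pair of boxplots would reveal can be approximated numerically. The Python sketch below compares the two halves of the box before and after taking logs. The data are hypothetical, and the quartile rule used here (`method="inclusive"`) is not necessarily the one BXP uses.

```python
import math
from statistics import quantiles


def box_halves(values):
    """Return (median - lower quartile, upper quartile - median)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q2 - q1, q3 - q2


raw = [1, 2, 4, 8, 16, 32, 64]          # hypothetical right-skewed variable
logged = [math.log(v) for v in raw]

raw_lower, raw_upper = box_halves(raw)
log_lower, log_upper = box_halves(logged)

print("raw box halves:", raw_lower, raw_upper)
print("log box halves:", log_lower, log_upper)
```

For these values the upper half of the raw box is wider than the lower half, whereas the two halves are equal after the log-transformation.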

Comma delimited output

Many software packages can read raw data with data items separated by commas and strings appearing in quotes. Note that *WRITE DELIMITED provides a number of common output forms, i.e. there is often no need to use a macro or the OUT procedures. There are, however, situations where the standard EDA facilities do not provide what you need. Let us consider an example where, for each variable in the WA, you need the following output:

     "label","descriptor",value1,value2,value3,...,value-n

      followed by a blank line

The following sequence of commands will produce that output file:

1.   >SET RAWOUT OPEN "myfile" OUTFILE
2.   >SET OUT SEPARATE
3.   >SET OUT QUOTE DOUBLE
4.   >OUT LABEL(A),DESCR(A),PRT(#A),FLUSH(5),FLUSH(0) \ FOR A END=$NVR
5.   >SET RAWOUT CLOSE

Line 1 opens a RAWOUT file called "myfile". The OUTFILE option additionally tells EDA that the output from the OUT command has to go to that file. Without this option an additional command would be needed to tell EDA that the output has to go to a file, i.e. >SET OUTFILE RAWOUT.

Line 2 tells EDA that the items written to the output file have to be separated by commas. The comma is the default separator; you could specify an alternative character, e.g. SET OUT SEPARATE ";". Note that SET OUT SEPARATE is only valid for output to a file (and not to the screen or the print file).

Line 3 instructs EDA to quote string items using DOUBLE quotes (the default is single quotes; an alternative character could be specified).

Line 4 produces the actual output. It contains a control expression that executes the command line for each variable in the current WA. LABEL() and DESCR() output the label and the descriptor of the current variable; these items are quoted and separated by commas, and trailing blanks are removed from the strings. PRT(#A) writes out all values of the current variable, separating them with commas and creating as many output lines as needed to fit all values. FLUSH(5) has a double purpose: it removes the last separator and then flushes (writes out) the last line. Finally, FLUSH(0) creates an empty line before going to the next variable.

Line 5 closes the output file. This means not only that the RAWOUT file is closed, but also that the OUTFILE is reset to its default, i.e. output to the screen with no separators and quotes.
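
To check that another program will accept such a file, it can help to know exactly what the target layout looks like. The Python sketch below produces the same layout from hypothetical data. It is not a replacement for the EDA commands, and unlike PRT(#A) it writes all values of a variable on one line instead of wrapping them.

```python
import csv
import io

# Hypothetical variables; labels and descriptors may carry trailing blanks.
variables = [
    {"label": "height  ", "descr": "Height in cm   ", "values": [170, 182, 165]},
    {"label": "weight", "descr": "Weight in kg", "values": [65.5, 80, 58]},
]


def write_delimited(variables, out):
    """Write "label","descriptor",v1,...,vn plus a blank line per variable."""
    # QUOTE_NONNUMERIC puts double quotes around strings only.
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for var in variables:
        row = [var["label"].rstrip(), var["descr"].rstrip(), *var["values"]]
        writer.writerow(row)
        out.write("\n")  # empty line before the next variable


buffer = io.StringIO()
write_delimited(variables, buffer)
text = buffer.getvalue()
print(text, end="")
```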