Documents

This section describes the commands used in connection with documents. In addition to the commands documented here, TED, the EDA text editor, offers further commands for creating and editing documents. Furthermore, the toolbox describes a script language that may be used to build document files.

Below you will find two main sections: one giving detailed information on the document feature, and a reference section explaining a number of document related commands.

Introduction

EDA documents are texts of any length used in connection with variables, WAs and user defined concepts. Related commands are:
   DOC       Main document handling command
   EXTRACT   Extracts numerical information from documents
   SEARCH    Search within documents; produce search reports
   GET/PUT   Manage documents implicitly
   TED       The EDA text editor, used to create and edit documents

Documents are an optional feature and may or may not be used in a specific application. We shall also see that the feature is very flexible: you might create minimal documentation, such as a few lines of comment describing the contents of a particular WA, or you might design a complex documentation structure with different levels of documentation, systematic embedding of numeric information, etc. The document feature is designed so that these different needs can be satisfied in a simple, straightforward way. Complex structures are designed and implemented by the user; the document remains readable as ordinary text (a sequence of text lines).

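There are several types of documents: the HEADER, displayed whenever the WA is accessed; the NOTE; documents attached to variables; and documents pertaining to user defined concepts, whose names start with "#". In addition, case documents may be embedded within these documents.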

Document display, document limits

Up to now we have defined a document as a text of any length referring to some unit (variable, concept, special document). More precisely, a document is a number of text LINES, each of which may belong to a document level: the first character of a line is taken as level information. If no level is desired, the first character is left blank. When documents are displayed these levels are not visible, but when creating documents they are important. Levels may be used for selective access to a document or a series of documents; this selection is known as "limiting" (DOC LIMIT command). Use only A..Z, a..z and 0..9 as level-ids. You should not use the # sign or the % sign, which have a specific meaning: # identifies a new document and % signals a case related document line. You should also avoid other special symbols in the level-id, because future extensions of the document facility will use special symbols for internal purposes, meaning that older document files might cause trouble.
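The following Python sketch shows how a stored line can be split into its level-id and its displayed text under the conventions just described. It is an illustration only, not EDA's implementation; the function name `classify_line` and the category names are hypothetical.

```python
# Hypothetical sketch: split a stored document line into its level-id
# (column 1) and the text that DOC actually displays.
LEVEL_IDS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ" "abcdefghijklmnopqrstuvwxyz" "0123456789")


def classify_line(line: str) -> tuple:
    """Return (kind, level_id, text) for one line of a document file."""
    level = line[:1] or " "      # an empty line behaves like a blank level
    text = line[1:]              # column 1 is never displayed
    if level == "#":
        return ("document_id", level, text)   # starts a new document
    if level == "%":
        return ("case", level, text)          # case related document line
    if level == " " or level in LEVEL_IDS:
        return ("text", level, text)
    # Other special symbols may be reserved by future versions.
    return ("discouraged", level, text)
```

For example, `classify_line("3P:28.4")` yields `("text", "3", "P:28.4")`: the `3` selects the level, and only `P:28.4` would be shown.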

There is no predefined meaning assigned to the levels; the user may attach any meaning to them to build a coherent documentation fitting his specific needs. This simple approach allows a very flexible use of the facility (nothing forces the use of limits if they are not desired).

Let us now turn to some practical examples. First we assume that we have already created a fully documented file (an EDA file):

   >GET VOTATION
    WA read: Votations 1975-1977
    GVAR Stored
    24 Variables read from file: VOTATION
    >>> Documents available.
    Bloc de votations federales 1975- 1977
    La NOTE du bloc 1 contient des informations
    quant a la notation utilisee dans les documents.
Like in previous examples, we access a file called "VOTATION", which contains documents. Therefore, unless the NODOC option is present, the documents are available for inspection and manipulation. The three text lines following the ">>> Documents available." message are the HEADER document, i.e. a text always displayed when this WA is accessed (there is of course an option to inhibit this display). The header tells the user what data is stored in that WA and that the NOTE contains more information on the notation used and the organization of the documentation.

The first thing we would like to know could be which documents are available. One way to find out whether there are documents is to look at the output of the DESCRIBE command, where a star ("*") in the dc field indicates that there is a document for that particular variable. Another way is:

   >DOC LIST
   Documents for var#: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
      15 16 17 18 19 20 21 22 23 24
   there is a NOTE
   DOC : #HIST1, #HIST2
One of the document handling commands is the DOC command, which is used here to obtain the list of available documents for the current WA: there are documents for variables 1 through 24, there is a NOTE and two documents pertaining to user defined concepts #HIST1, #HIST2 containing historical background for the variables in the WA.

The main purpose of the DOC command is displaying/printing documents; some of its facilities are more oriented towards report production than display. As for many EDA commands, the simplest form of the command is DOC <v1>: it causes the document related to <v1> to be displayed:


>DOC 1
O751254  Politique conjoncturelle
T:Article de la constitution sur la politique conjoncturelle
T:Konjunkturartikel der Bundesverfassung
D:2.3.1975
P:28.4
A:11  ZH BE UR GL SO BS BL SG GR TG TI VS
Introduction dans la Cst. un art. donnant a la Confederation des
competences dans le domaine de la politique conjoncturelle.
Debat important de type federaliste et "centraliste".
L'article conjoncturel a ete accepte par le peuple, mais il n'y a pas
eu majorite des etats.

The document contains some very structured information (title in French and German, date, turnout, etc.) and some explanatory text. The meaning of the various symbols has been explained in the NOTE document, e.g. P: stands for "participation", i.e. turnout. In fact each of these specific lines also corresponds to a level. Before turning to the levels of documentation, let's consider the case documents, i.e. documents pertaining to specific cases (columns of the data matrix). These are documents embedded within other documents; they are not displayed by the normal DOC command, but may be displayed with the DOC CASE command, which has three different forms:
>DOC 1 CASE
Avril mai, elections cantonales a BL GR LU TI ZH {DOC}
BL  election cantonale, voir document
>DOC 1 CASE=BL
BL  election cantonale, voir document
The first form asks to display all case related documents for variable #1; the second only for information on case 'BL'.

This document type is specific to a particular variable, i.e. these documents are stored within the document of that variable. Sometimes you need to say something about a specific case, but for all variables in the WA. The corresponding text is then embedded within the NOTE document. These documents are accessed by:

>DOC CASE GLOBAL
25 cantons sans Jura
As long as documents are simple explanatory texts, i.e. a few lines containing information that the label/descriptor could not hold, the features described so far are sufficient. For larger amounts of documentation and for more systematic documentation, however, we need some criteria to limit display and/or access. These criteria are specified through a level-id, which is the first character of each document line; level-ids may be used to LIMIT the display, as in:
>DOC LIMIT INCLUDE "1234"
>DOC 1
T:Konjunkturartikel der Bundesverfassung
D:2.3.1975
P:28.4
A:11  ZH BE UR GL SO BS BL SG GR TG TI VS
The DOC LIMIT INCLUDE command asks to display only the lines having level 1, 2, 3 or 4. This limit remains in effect until another limit is set or the DOC LIMIT DROP command is given. To include the document header, i.e. the document name and descriptor, in the display you should also include a "#", which identifies this header. Note that at most four different levels can be specified this way, and that lines with no level id (i.e. a blank) are always displayed.

Besides DOC LIMIT INCLUDE, there are three more limits: EXCLUDE, UPTO and DOWNTO. EXCLUDE is used exactly like INCLUDE, whereas the two others specify ranges: DOC LIMIT UPTO "9" specifies that all document lines having level ids #, 0..9 are included. For the range specifications the collating sequence is blank, #, 0-9, A-Z and a-z. No levels are provided for case documents, therefore case documents cannot be limited (in fact a case document line has the record-id %, which identifies it as a case oriented document).
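A minimal Python sketch of how the four limit types could be evaluated, using the collating sequence given above. The names `line_passes` and `apply_limit` are hypothetical, and treating blank-level lines as always visible for all four modes (not only INCLUDE) is an assumption.

```python
# Hypothetical sketch of DOC LIMIT; not EDA's actual implementation.
COLLATING = " #" + "0123456789" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "abcdefghijklmnopqrstuvwxyz"


def line_passes(level: str, mode: str, spec: str) -> bool:
    """Decide whether a line with this level-id is shown under the given limit."""
    if level == " ":
        return True                      # assumption: blank-level lines always shown
    if mode == "INCLUDE":
        return level in spec[:4]         # at most four levels
    if mode == "EXCLUDE":
        return level not in spec[:4]
    # Range limits; a level-id outside COLLATING raises ValueError here.
    if mode == "UPTO":
        return COLLATING.index(level) <= COLLATING.index(spec)
    if mode == "DOWNTO":
        return COLLATING.index(level) >= COLLATING.index(spec)
    raise ValueError(f"unknown limit mode: {mode}")


def apply_limit(lines, mode, spec):
    """Return the displayed text (column 1 stripped) of the lines that pass."""
    return [
        line[1:]
        for line in lines
        # Case lines (%) are shown only by DOC CASE and are never limited.
        if not line.startswith("%") and line_passes(line[:1] or " ", mode, spec)
    ]
```

Applied to the stored form of document O751254, `apply_limit(doc, "INCLUDE", "1234")` keeps exactly the four lines T:, D:, P: and A: of the German title block, while `"UPTO", "9"` would also keep the header (`#`) and the level 0 line.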

Finally the DOC LIMIT command alone displays the limits currently in use.

When limits are used to produce e.g. a report on the data residing in a file, it is not always clear from the printout to which variable etc. a piece of documentation belongs, because the document header is not included unless it is specified. Therefore the DOC command has a facility to add the label, the descriptor or a user defined string in front of each text line printed/displayed. Together with other facilities (TED, TEXT) and combined with the macro facilities, this makes it possible to produce highly readable printouts.

>DOC 1 LABEL
O751254  : T:Konjunkturartikel der Bundesverfassung
O751254  : D:2.3.1975
O751254  : P:28.4
O751254  : A:11  ZH BE UR GL SO BS BL SG GR TG TI VS

>DOC 1 "*** "
*** T:Konjunkturartikel der Bundesverfassung
*** D:2.3.1975
*** P:28.4
*** A:11  ZH BE UR GL SO BS BL SG GR TG TI VS

In the first example the variable label is used as a prefix for each displayed line; in the second, the string "*** " is used instead.

To conclude these examples, we should say that the power and usefulness of these facilities depend greatly upon the user's design of the documents, especially the use of limits for specific information and careful documentation (e.g. in the NOTE) of the conventions used. It is good practice to document the meaning of the record-ids in the NOTE document.

Creating and editing documents

Documents are stored with the EDA system files; they may also be added later. This section discusses how to create documents. There are three basic ways of creating documents: with the EDA text editor TED, by writing a file with another text editor and loading it with DOC FILE, or by entering them from the terminal (the third possibility is a minor one). As long as a document file exists for the current WA, it is saved when you do a *WRITE EDA (unless you specify the NODOC parameter). At that point, documents attached to variables that no longer exist in the WA are dropped.

The structure of a document file is fully described in the technical appendix; let's just recall the most important aspects. The first line(s) before the first document are considered the header (there should be at least one header line, even an empty one). After the header, the documents are stored in any order. Each document starts with a document id "#" in column 1, followed by the document name, i.e. an eight character string corresponding either to a variable label in the WA or, when it starts with an additional "#" (counted in the 8 characters), to a user defined concept. Furthermore, one document may be called NOTE. The rest of the id line may contain any descriptive text. After the id you may insert any number of text lines; remember that the first character is the level id, so put a level there or, if you do not wish to use levels, leave it blank (the first character is never displayed). To conclude this introduction, we show the text of the document used above as it is stored and as it appears in TED or another text editor:

#O751254  Politique conjoncturelle
0T:Article de la constitution sur la politique conjoncturelle
1T:Konjunkturartikel der Bundesverfassung
2D:2.3.1975
3P:28.4
4A:11  ZH BE UR GL SO BS BL SG GR TG TI VS
EIntroduction dans la Cst. un article donnant a la Confederation des
Ecompetences dans le domaine de la politique conjoncturelle.
Adebat important de type federaliste et "centraliste".
AL'article conjoncturel a ete accepte par le peuple, mais il n'y a pas
Aeu majorite des etats.
%    Avril mai, elections cantonales a BL GR LU TI ZH {DOC}
%BL  election cantonale, voir document
The last two lines are case level documents. The first one is not linked to any case (i.e. it is general), the second is specific to case BL. Remember that case-ids are at most 4 characters long; therefore the four characters following the % sign, which identifies a case level document, are used for the case-id.
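Under the layout just described, a document file can be read with a few lines of code. The Python sketch below is hypothetical (the technical appendix remains the authoritative description); it assumes the name occupies columns 2 to 9 and that everything after it on the id line is the descriptor.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str                                        # label, "#concept" or NOTE
    descriptor: str                                  # rest of the id line
    lines: list = field(default_factory=list)        # (level_id, text) pairs
    case_lines: list = field(default_factory=list)   # (case_id, text); "" = general


def parse_docfile(text: str):
    """Split a document file into header lines and a list of Document objects."""
    header, docs, current = [], [], None
    for line in text.splitlines():
        if line.startswith("#"):
            # '#' in column 1, then an 8-character name (columns 2-9).
            current = Document(line[1:9].rstrip(), line[9:].strip())
            docs.append(current)
        elif current is None:
            header.append(line)          # everything before the first '#'
        elif line.startswith("%"):
            # '%' followed by a 4-character case-id, then the text.
            current.case_lines.append((line[1:5].strip(), line[5:].strip()))
        else:
            current.lines.append((line[:1] or " ", line[1:]))
    return header, docs
```

Note that a user defined concept such as `##HIST1` comes out with the name `#HIST1`, because the second `#` counts as part of the eight-character name.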

Following that structure, you might write the documents with your text editor and store them in a file named, say, "mydoc". Assuming you have already loaded the variables into the WA, the following sequence makes the documents available to EDA and stores them on a system file:

  >DOC FILE "mydoc"
  >PUT myfile
The first command makes your external file available to EDA; you may then use the DOC facility as if you had loaded the documents from an EDA system file. A subsequent *WRITE EDA (here the PUT command) then saves the data together with your documents to the EDA file "myfile".

Note that DOC FILE may also be used to keep documents in a separate file for data files that cannot store documents, such as the EDA workfile.

The third possibility, entering documents from the interactive terminal, is suitable only for short documents; its use has been demonstrated in the section on EDA files (*WRITE EDA).

Let's now turn to the first possibility of creating documents and document files: the EDA text editor TED. This section discusses TED only regarding its use with documents; you should also read the chapter on TED as a text editor, not dealing specifically with documents.

TED may be used not only to correct documents but also to create (insert) new documents and to create a non-existing document file from scratch. The document file is the (temporary) file accessed by the various commands using documents, i.e. the documents belonging to the currently loaded WA; creating it means initializing such a file for a WA that had no documents at all. The text unit treated by TED is the single document. As we have already said, the EDA system file is a sequential file; this need not bother you greatly, but it has some consequences for editing such files regarding access and security. Basically the steps are the same as for editing data with EDIT or other facilities: (1) read file "A" into the WA, (2) correct/edit, (3) write the WA to a file, say "B", (4) delete file "A".

For documents, step (2) makes use of TED, and you edit document by document. To guarantee security, TED allows modifications to be undone; in fact it makes a copy of the original document. For this reason (several copies) you have to specify explicitly whether you wish to correct the documents, i.e. save the corrections in the EDA file, or edit the document for output, i.e. write it to the print file or another file. This second mode is known as read-only mode. Of course, documents cannot be created in this mode.

Another implication of the sequential nature of the document file is that we may speak, with respect to a particular document, of a "next" or a "preceding" document, i.e. it is possible to go through the documents from the first to the last by telling the system to edit the next document. This is of course not the only way: the user is free to edit the documents in any sequence, but it is surely the most efficient one.

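What is important to retain from these remarks is that documents are edited one at a time, that corrections are kept only when you work in EDIT mode rather than in read-only mode, and that going through the file from the first to the last document is the most efficient way to work.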

The length of a document is not limited, but TED has a maximum number of lines, depending upon the size of your specific implementation. This should not be a major problem: with e.g. a 120x120 WA implementation you may load nearly 500 text lines into TED, and a single document will rarely have more. Should it happen, you might use your system text editor (on a symbolic EDA file) or create a second document containing the rest of the text.

For documents TED is invoked by:

   >TED DOC [<docname>] [EDIT]
when editing an already existing document. The default working mode is read-only, i.e. the editor may not be left via a simple W, but you may edit the document and write it to a file or to the print file to produce a report. If you wish to replace documents and/or add new ones, specify the EDIT parameter. <docname> may be HEADER, NOTE or the name of a document enclosed in "". If it is not specified, the first document in the file is assumed (i.e. the "next" document when starting at the top of the file).

If no document file exists, TED is used to create it:

   >TED DOC CREATE
    New document file created
    One blank header line written
    Use CD to create documents
    TED, EDA text editor
    Remember: col 1 is document level
    1 lines from Header
The document file is created with a single blank line in it; because the first line(s) of a document file are considered the HEADER, TED then accesses this file as if TED DOC HEADER EDIT had been specified. You are in fact now editing the HEADER, i.e. you may use all TED tools to create the header, or create any other document with TED's CD command. When creating a document from scratch, full edit mode is assumed.

Because the single document is the basic text unit being edited, there are facilities to move on to a document other than the current one while continuing to edit the same document file. Exiting from TED and calling it again would start editing all over again, which causes considerable overhead (because of the sequential nature of the file discussed above).

We shall now have a look at the TED commands, which are specific for documents (in fact these commands are disabled for other types of text).

    ND  [HEAD | "doc" | NOTE] [NOAPPLY]
ND stands for Next_Document or New_Document. This causes a new document to be loaded into TED. In EDIT mode, the previously loaded document is written to the document file, unless the NOAPPLY parameter has been specified. (Note that quitting TED discards all modifications made in the same TED session; editing is in fact done on a copy of the original document file.)

If no document (header, note or named document) is specified, the next document is loaded, "next" being the document physically stored after the previous one; a sequence of ND commands thus goes through the document file from the first to the last document. This also works when calling TED: specifying only TED DOC loads the next document in the file (initially the first document).

    CD  | "doc " | [NOAPPLY]
        | NOTE   |
CD: Create_document creates a new document. The NOAPPLY parameter causes the modifications done to the current document not to be applied before creating the NEW document.

To create a new document EDIT mode is required. The user is asked to enter a document descriptor, i.e. the 70 character description used on the document header besides the document name.

Existing documents cannot be created again. To create a variable related document, the corresponding variable must exist in the WA; other documents (i.e. those whose names start with the # symbol) may of course be created freely.

A HEADER cannot be created, because there always is a header in a file: even when the file is created by TED DOC CREATE, a single blank header line is written.


   DD  "doc" | NOTE
Delete_Document: deletes the document given in the parameter field. Note that document deletion is NOT undone by a TED Q command, i.e. the document is lost once a *WRITE EDA or a PD (see below) is done. To undo a DD you therefore have to re-*READ EDA the document.

  PD
Pack_docfile (requires EDIT mode). This command is provided to assist the user in editing large files with many documents; it will rarely be needed.

When documents are deleted (either by deleting a variable or by an explicit DD), they are deleted only logically, not physically: they cannot be accessed, but they still exist and occupy space in the document address table. Normally only *WRITE EDA removes these deleted documents physically.

When creating new documents etc., the document address table may become full even though the maximum number of documents has not been reached (because there are many logically deleted documents). In such an instance the PD command helps: it removes the logically deleted documents physically, thus freeing space for new documents. Note that the maximum number of documents is an implementation constant.
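The difference between logical and physical deletion can be modelled as follows. This Python sketch is hypothetical: `DocAddressTable` and its methods are invented for illustration, and `capacity` stands in for EDA's implementation constant.

```python
class DocAddressTable:
    """Hypothetical model of a fixed-size document address table."""

    def __init__(self, capacity: int):
        self.capacity = capacity       # stands in for the implementation constant
        self.entries = []              # [name, deleted_flag] pairs

    def add(self, name: str) -> None:
        if any(n == name and not d for n, d in self.entries):
            raise ValueError(f"document {name!r} already exists")
        if len(self.entries) >= self.capacity:
            raise RuntimeError("address table full: try PD")
        self.entries.append([name, False])

    def delete(self, name: str) -> None:
        for entry in self.entries:
            if entry[0] == name and not entry[1]:
                entry[1] = True        # logical deletion: the slot stays occupied
                return
        raise KeyError(name)

    def live(self) -> list:
        return [n for n, d in self.entries if not d]

    def pack(self) -> int:
        """Physically drop logically deleted entries, as PD is described to do."""
        before = len(self.entries)
        self.entries = [e for e in self.entries if not e[1]]
        return before - len(self.entries)
```

With a capacity of 3, deleting one of three documents still leaves the table full; only after `pack()` can a new document be added.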

     EDescriptor ["newname" {NOVAR}]

Edit_Document-descriptor: this command allows modification of a document descriptor (requires EDIT mode) as well as of the document name. Without any parameter, the program asks for a new descriptor. If "newname" is present, the document name is replaced by "newname". Because the document name is used to link variables to documents, the variable label is also modified, unless the NOVAR parameter is present, in which case the variable label is not altered. These two possibilities solve two different problems: (1) the first allows documents to be renamed; (2) the second, with NOVAR, makes it possible to transfer a document from one variable to another when a document is attached to the wrong variable.

This command should be used very carefully. To allow more flexibility, checks for duplicate document names are made only partially; the drawback is that duplicate names may occur and cause one document to be lost.

For all other TED commands, not specific to documents, you should consult the chapter on TED.

Searching documents

One of the useful things you might do with documents is to search them for the occurrence of specific text passages or key words. This is done in EDA with the SEARCH command, which currently does not allow very complex searches: only one string can be searched at a time. Together with the DOC LIMIT facility (SEARCH is sensitive to the limits set), however, it has proved very useful. The simplest form of the SEARCH command is:
   >SEARCH "string"
The search command is sensitive to the ALLVARS mode setting, i.e. while in that mode all variables having documents are searched; when the mode is off only the variables in the list are searched. During the search each line containing "string" is displayed or otherwise processed and the variable label is put in front of the text line.

Display can be obtained in two formats: normal format and KWIC format:

Assume that we would like to search for the string "politique"; we choose to search for "olitique" so that both "politique" and "Politique" match.

 >SEARCH "olitique"

 24 variables
 O751254 # O751254  Politique conjoncturelle
 O751254 0 T:Article de la constitution sur la politique conjoncturelle
 O751254 E Competence dans le domaine de la politique conjoncturelle
 O764273 0 T:Politique de l'argent et du credit/

>SEARCH "olitique" KWIC

 24 variables
 O751254               O751254  P olitique conjoncturelle
 O751254 la constitution sur la p olitique conjoncturelle/
 O751254  dans le domaine de la p olitique conjoncturelle.
 O764273                      T:P olitique de l'argent et du credit

The first format displays the text line together with the label and the document level; the second (KWIC) format puts the searched string in a middle column and displays its context on the line.
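A KWIC line can be produced by right-aligning the text preceding each match in a fixed-width column. The Python sketch below is an assumption about the layout: the 24-character context width and the function name `kwic` are hypothetical, and the match is case-sensitive, consistent with the choice of "olitique" above.

```python
def kwic(label: str, text: str, key: str, width: int = 24) -> list:
    """Return one KWIC line per (case-sensitive) occurrence of key in text."""
    out = []
    start = text.find(key)
    while start != -1:
        left = text[max(0, start - width):start]      # context before the match
        # Right-align the left context so every match starts in the same column.
        out.append(f"{label} {left:>{width}} {text[start:]}")
        start = text.find(key, start + 1)
    return out
```

For instance, `kwic("O751254", "T:Article de la constitution sur la politique conjoncturelle", "olitique")` returns the single line `O751254 la constitution sur la p olitique conjoncturelle`.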

The output written to the print file contains longer text lines; the label and the level are moved to the end of the line, and the relative location (line number) of that particular line within the document is also given.

When the SEARCH command is used to select variables for analysis, the VARS parameter puts the variables in which the string was found into the current variable list; a subsequent DESCRIBE therefore shows all variables where the string was found.

As this facility is likely to be used (especially with the KWIC option) to produce index lists, an option allows the result to be written to the RAWOUT file, where it can be processed later (e.g. sorted). As the SEARCH command cannot connect the RAWOUT file itself, this has to be done first by issuing SET RAWOUT "filename" before doing a SEARCH FILE.

Data from text: EXTRACT

In many instances, e.g. when collecting data on referenda, we get information on specific events, but also on the position of parties or other groups on particular issues. From the point of view of usage, this type of data is often semi-textual: when analyzing single variables it is part of the information on a particular referendum, but when considering structures we can easily imagine wanting a variable telling whether the socialist party was in favour of or against each of the referenda considered for analysis.

In a more general perspective, we would then have the choice either to create "position of the socialist party" as a separate variable AND add that information to the document, or NOT to add it to the document (with the drawback that we might lose sight of this information when analyzing the variables separately again). The first solution also has the disadvantage of making the data structure more complex (the basic structure in our example is referenda by case, whereas the position of a party is a global attribute of a referendum, i.e. variable level data).

The EDA solution has the advantage of keeping the document readable without losing that piece of information for numeric analysis. The only constraint is that the user has to think hard before building the documents: the advantages of the flexible structure are fully exploited only if the structure is designed and conceptualized from the start.

In conclusion, the document facility may be used to store less frequently used data within highly readable text.

The EXTRACT command produces numeric variables from documents: for each variable a single numerical value is extracted. If there is a document associated with a variable, EXTRACT searches for either a tag or sentinel (':', '(', '[') or an arbitrary string. If the search criterion is satisfied, the data item (numeric or alphanumeric) following the search tag is extracted as the value for that variable; otherwise the missing value code, i.e. the system replacement value, is assigned. Note that only the first occurrence of the criterion is used; the rest of the document is not searched. The DOC LIMIT facility also applies to the EXTRACT command, which both minimizes search time and allows more precise search criteria.

The EXTRACT command has two basic command formats: one deals directly with extraction, the other manages the memory stack used to convert non-numeric data.

 EXTRACT [vlist] [<crit>] V=targetv# [MISS=val]
         [CONTEXT]

where <crit> may be:

| [COLON {"string"} ] | PARENTHESIS | BRACKETS | "string"

If no variable list is specified, all documents are searched for a value to extract; if a vlist is present, only the variables in the list are treated. Usually results are stored into a new variable in the WA; you may however choose to store them as the CENTER or reference value of each variable. The MISS=val specification overrides the globally defined replacement value assigned when no data item can be extracted. Finally, the CONTEXT option displays every line from which a data item is extracted. The different possibilities for <crit> and their use will become clear from the examples.

With the : sentinel it is also possible to require that a specific string (of up to 30 letters) precede the sentinel. As said above, data items (i.e. the character string immediately following the sentinel or search string) may be numeric or alphanumeric. Numeric data items are delimited by any non-numeric character. If the first character is not numeric, the data item is considered alphanumeric: only its first four characters are considered for conversion (a blank or a closing sentinel, ")" resp. "]", terminates the string earlier).
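The rules above can be sketched in Python as follows. This is a hypothetical illustration, not EDA code: the function names are invented, skipping blanks right after the sentinel is an assumption (the turnout example below requires it), and a return value of `None` stands for "no item found", where EDA would assign the missing value code.

```python
import re

NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")   # numeric item: digits, optional decimals
CLOSERS = {"(": ")", "[": "]"}                # closing sentinels end an alphanumeric item


def word_before(line: str, pos: int) -> str:
    """Non-blank string preceding position pos (one blank is tolerated)."""
    before = line[:pos]
    if before.endswith(" "):
        before = before[:-1]
    return before.split(" ")[-1]


def read_item(rest: str, closer=None):
    """Numeric item as float, otherwise at most 4 alphanumeric characters."""
    rest = rest.lstrip(" ")                   # assumption: leading blanks are skipped
    match = NUMBER.match(rest)
    if match:
        return float(match.group())
    item = rest[:4]
    for stop in (" ", closer):                # blank or closing sentinel ends it early
        if stop and stop in item:
            item = item[:item.index(stop)]
    return item


def extract_item(lines, sentinel=":", prefix=None):
    """Return the item after the first occurrence of the criterion, else None."""
    for line in lines:
        pos = line.find(sentinel)
        while pos != -1:
            if prefix is None or word_before(line, pos).startswith(prefix):
                return read_item(line[pos + len(sentinel):], CLOSERS.get(sentinel))
            pos = line.find(sentinel, pos + 1)
    return None
```

The `prefix` argument mimics COLON "string": only as many characters as given are compared, so `"Turn"` matches `Turnout :`.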

For non-numeric strings, the same memory mechanism is used as for the LABEL EXTRACT command; it is described in the section on "data within labels and descriptors". Note that the stack is shared, i.e. a value defined with LABEL EXTRACT is also applied by an EXTRACT command (details below). The second command format manages this memory stack, shared with the LABEL EXTRACT command, which remembers conversions you have already done.

           | STACK
 EXTRACT   | LEARN
           | RESET
Suppose we intend to extract a variable from the documents giving the position of the socialist party on the different referenda. Within the text, this position is given in parentheses following the party name, e.g. Socialist (against), meaning that the socialist party recommended voting against that particular referendum. The values within parentheses may be 'for', 'against', 'free' (meaning that the party declared that it has no particular attitude), 'unknown' or '?' (meaning that the position is not known).
     >EXTRACT LEARN
      Enter up to 10 search strings (4 char long)
      1   2   3   4   5   6   7   8   9   0
      for agaifreeunkn?
      Enter   5 replacement values
      1 2 3 4 4
     >EXTRACT STACK
      Memory stack contains:
      for  replaced by                 1.00
      agai replaced by                 2.00
      free replaced by                 3.00
      unkn replaced by                 4.00
      ?    replaced by                 4.00
The LEARN mechanism allows the replacement values to be defined before the actual extraction: afterwards, any time the program finds 'against' (only the first 4 characters are meaningful), it "knows" that it has to convert it to a numeric value of 2.00. The EXTRACT STACK command displays the values currently stored in the stack, and the third parameter, RESET, empties the stack. Note that the stack has a limited capacity; once it is full, the program asks for a replacement value each time it encounters a new string.
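The behaviour of the stack can be modelled by a small lookup table keyed on the first four characters of each string. The Python sketch below is hypothetical: `ReplacementStack` is an invented name, the `ask` callback stands in for EDA's interactive prompt, and `capacity` is an assumed parameter.

```python
class ReplacementStack:
    """Hypothetical model of the memory stack shared by EXTRACT and LABEL EXTRACT."""

    def __init__(self, capacity: int = 10):
        self.capacity = capacity       # assumed limit; EDA's real size may differ
        self.table = {}                # 4-character key -> numeric value

    @staticmethod
    def key(s: str) -> str:
        return s[:4]                   # only the first 4 characters are meaningful

    def remember(self, s: str, value) -> bool:
        k = self.key(s)
        if k in self.table or len(self.table) < self.capacity:
            self.table[k] = value
            return True
        return False                   # "Taken, but not remembered (stack full)"

    def learn(self, strings, values) -> None:      # like EXTRACT LEARN
        for s, v in zip(strings, values):
            self.remember(s, v)

    def convert(self, s: str, ask):
        k = self.key(s)
        if k in self.table:
            return self.table[k]
        value = ask(k)                 # e.g. prompt the user for a value
        self.remember(k, value)
        return value

    def reset(self) -> None:                       # like EXTRACT RESET
        self.table.clear()
```

After `learn(["for", "against", "free", "unknown", "?"], [1, 2, 3, 4, 4])`, any string starting with `agai` converts to 2 without a prompt, matching the EXTRACT STACK listing above.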

Let's now turn to some other examples:

    Examples of data items within a document

    Turnout : 25.2%
    the last election took place in :1920
    Socialist party (against)
    liberal position against

A command

>EXTRACT VAR=<target>

would extract a value of 25.2 and store it as a case of the target variable, provided that the colon following "Turnout" is the very first colon found in the document (note that the search may be limited to a particular level on which you have always specified the turnout). The default search sentinel is the colon.

As we are in ALLVARS mode, this is done for all variables in the WA; the resulting variable therefore has as many cases as there are variables in the WA. The new case ids are the first four characters of the corresponding variable labels.
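In ALLVARS mode, building the target variable amounts to one case per variable. The Python sketch below is hypothetical: `build_target`, the `extract` callback and the stand-in value `-999.0` for the system replacement value are all assumptions; the `miss` argument plays the role of MISS=val.

```python
MISSING = -999.0   # hypothetical stand-in for the system replacement value


def build_target(docs: dict, extract, miss: float = MISSING):
    """docs maps variable label -> document lines ([] if there is no document)."""
    case_ids, values = [], []
    for label, lines in docs.items():          # one case per variable
        case_ids.append(label[:4])             # case id = first 4 chars of the label
        value = extract(lines) if lines else None
        values.append(miss if value is None else value)
    return case_ids, values
```

With two variables labelled O751254 and O764273, the resulting variable has the two cases O751 and O764.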

With the colon there is another way to get the turnout figures out of the documents, even when these data appear anywhere in a document: as long as the colon is preceded by "Turnout", it is found by the following command:

    >EXTRACT VAR=20 COLON "Turnout"
You might specify "Turnout" or "Turn" etc., because the program simply locates the non-blank string preceding the : character and checks only as many characters as are specified in the parameter field. Note that a blank may be inserted between the string and the colon.

The second example is analogous to the first; the third example shows an alphanumeric string that could be searched with:

   >EXTRACT V=20 PARENTHESIS
then the memory mechanism described above would enter into action to supply a numerical value for the string 'against'.

The fourth example can still be used to extract information, because on the EXTRACT command you may specify a "string", which then is used as search sentinel, i.e.

    >EXTRACT V=20 "liberal position"
will extract the value or string immediately following that string.

As a sample dialog, let us assume that, in our example on the position of the Socialist party, the documents contain a series of party positions that do not conform to our predefined codes:

    >EXTRACT "Socialists" PARENTHESIS

    Warning: Var# 20 overwritten
    24 variables
    Enter value for "??": 4
    Enter value for "yes": 1
    Enter value for "no": 2
    >DESCR 20
        (type,n,table) label     description
    #20 ( 1  24  0)    Socialis  Var extracted from documents

Instead of supplying a number you may just press <return>; a missing value is then assigned to the string (a message tells you so).

Note that the number of strings EDA remembers is limited; if more strings must be replaced, you are asked to enter the corresponding value for each occurrence, and EDA replies with "Taken, but not remembered (stack full)".

Note the automatically constructed label and descriptor for the new variable #20.

Commands



DOC
            | vlist [ALL] [<txt>] |  [NODISPLAY]
  DOC       | NOTE        [<txt>] |  [WRITE | PRINT]
            | HEAD                |
            | "name"      [<txt>] |

            | var | CASE  [<txt>] |  [NODI]
  DOC       |     | C=cas [<txt>] |  [FILE]
            | CASE GLOBAL [<txt>] |

<txt>::= LABEL | DESCR

DOC CREATE ["fname"]

DOC LIST [ "name" | NOVAR]

            | [DISPLAY] |
  DOC LIMIT | DROP      |
            | "level"   | [INCLUDE]
            |           | EXCLUDE
            |           | UPTO
            |           | DOWNTO

The EDA documentation facility (see the preceding sections for more information) provides a means of documenting and describing data and other items.
Displaying documents
Without options, the DOC command displays the document(s) for the specified variable(s). DOC HEADER and DOC NOTE display the header and the NOTE document, respectively. The "name" form displays the corresponding document, which is either a document related to a variable in the WA (same as DOC <vlist>) or any other named document related to the current WA. "name" may contain wild cards.

PRINT and NODISPLAY control printing and displaying. By default, output from the DOC command is displayed on the screen and, if PDOC is ON (the default) and a print file is active at that time, also written to the print file. NODISPLAY suppresses display on the terminal (the /D global option has the same effect). PRINT requests printing when PDOC is OFF; however, a print file must be active, otherwise PRINT has no effect at all. See SET PDOC for additional details.

WRITE writes the output to the RAWOUT file rather than displaying it on the screen. The RAWOUT file must be connected before this option is used (see the SET RAW command).

The text selected by <txt> (the LABEL or the DESCR) is inserted in front of each line displayed or printed. This is useful, for example, when LIMITing the documents, to show which document each output line belongs to.

ALL also displays the documents normally excluded, i.e. the case-related documents, which are otherwise shown only with DOC CASE (described in the next paragraph).

Case oriented documents
The second command format displays case-oriented documents. The CASE option displays all case documents for a variable, whereas the C=cas option displays only the document for case cas. The GLOBAL option displays the case documents for the whole WA. The remaining options are described with the preceding format. Note that the LIMIT feature does not apply to case documents (in fact, case documents are program-"LIMITed" documents).
LIST available documents
DOC LIST displays the currently available documents. The default form lists the variables for which a document exists (by variable number), as well as documents not connected to a variable (by name). NOVAR shows only documents not related to a variable. "name" shows the document(s) matching name; wild cards may be used.
LIMIT document display
DOC LIMIT selects specific records (lines) in the document file. On each document line, the first character may be used to qualify the documentation found on that line; this character is never displayed. The simple form without additional options (default DISPLAY) displays the currently active limits, if any (default is none).

DROP is used to drop the active limit.

The "level" option lets you specify up to 4 levels for INCLusion (default) or EXCLusion. The UPTO and DOWNTO options specify ranges of levels, bounded by the single character given in the "level" parameter: UPTO "A" means that all document lines with level "A" or lower are displayed. See the introduction for more details.
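
The following Python sketch illustrates how such level limits could select document lines. It is an illustration only: the function limit_lines and its mode strings are hypothetical, and the ordering used for UPTO and DOWNTO (plain character codes, so that a blank "no level" character sorts below "0", and digits below "A") is an assumption, since the exact ordering is not specified here.

```python
def limit_lines(lines, levels=None, mode="INCLUDE"):
    """Return the text of the document lines that pass the limit.

    levels: up to 4 level characters (one bounding character for UPTO/DOWNTO).
    mode: "INCLUDE", "EXCLUDE", "UPTO" or "DOWNTO".
    """
    if levels is not None and len(levels) > 4:
        raise ValueError("at most 4 levels may be specified")
    kept = []
    for line in lines:
        level, text = (line[0], line[1:]) if line else (" ", "")
        if levels is None:            # no limit active: keep everything
            keep = True
        elif mode == "INCLUDE":
            keep = level in levels
        elif mode == "EXCLUDE":
            keep = level not in levels
        elif mode == "UPTO":          # assumption: ordered by character code
            keep = level <= levels[0]
        elif mode == "DOWNTO":
            keep = level >= levels[0]
        else:
            raise ValueError(f"unknown mode {mode!r}")
        if keep:
            kept.append(text)         # the level character itself is never shown
    return kept


doc = ["1Title", "2Turnout : 67%", "3Details", " plain"]
print(limit_lines(doc, "2"))          # ['Turnout : 67%']
```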

Create document files
DOC CREATE creates documents for the current WA from an external file (generic name EDADOC). See the introductory sections of this chapter for additional information on the structure of these files. Documents are usually created and updated using the EDA text editor TED, which has a series of commands designed especially to handle EDA documents (see TED). For more details refer to the applications manual.

EXTRACT
EXTRACT [vlist] <crit> [VARIABLE=target# | CENTER {NOMISS}] [CONTEXT]
                       [NEWCASID {LABEL=(st,end)}] [M=miss]

           | [COLON {"crit" [ANYCASE]}] |
           | BRACKETS                   |
<crit> ::= | PARENTHESIS                |
           | "crit" [ANYCASE] [ASIS]    |
           | USER {"crit" {ANYCASE}}    |

Other functions

          | STACK |
EXTRACT   | LEARN |
          | RESET |

Produces numeric data from documents. The information is extracted using search sentinels (tags) or search strings, as specified by the remaining options.

EXTRACT lets you create either a variable in the WA (default, or VARIABLE option) or centers (CENTER option). For each variable (all variables or, if present, the variables in the vlist), a single value is extracted from the attached document. If there is no document, or no data item is found, a missing value is assigned (default -1, or as set with the SET REPVAL command, unless the M=miss option overrides the system value). If no target variable is specified, the result is copied into the first free variable location in the WA.

If the CENTER option is used, EDA replaces the current center (reference) values with the extracted information. If no value is found in the corresponding document, a missing value is used, unless the NOMISSING option is specified, which leaves the current value unchanged. A message tells you how many centers have not been changed.

The command uses a search criterion to find the value for each variable: a sentinel character and/or a string identifies the information to extract. The default sentinel is the ":" character, i.e. EXTRACT searches for the first occurrence of ":" and extracts the information immediately following that character. This is useful with structured documents. Like the SEARCH and DOC commands, EXTRACT may be LIMITed to some document levels as set by the DOC LIMIT command; therefore, if each document contains, at document level "2", information of the form

Turnout : 67%

(voting turnout) the numeric information may be easily put into a variable in the WA (using DOC LIMIT to look only at level 2 and no other level).

If the document is not organized in levels, a ":" alone will not reliably identify the item searched for in larger documents; you can therefore use COLON "crit", e.g. COLON "Turnout", meaning that the colon must be preceded by "Turnout" to extract the same information. The ANYCASE option ignores upper/lower case; if it is not present, an exact search is performed.

If the first character of the information is numeric, EXTRACT takes all numeric characters up to the next non-numeric character (the decimal point counts as numeric, as the 25.2% example above shows); in the 67% example, the % sign terminates the information. If the first character is not numeric, the first four characters are used, via the mechanism described below. EDA looks for the first non-blank character after the sentinel, so blanks may be inserted between the sentinel and the value.
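
A minimal Python sketch of this rule, assuming the item text is everything after the sentinel. parse_item is a hypothetical helper; it treats digits and a single decimal point as numeric characters (consistent with the 25.2% example) and otherwise returns the four-character key used to look up a replacement value.

```python
DIGITS = "0123456789"


def parse_item(text):
    """Return (number, None) for a numeric item or (None, key) otherwise."""
    text = text.lstrip()                # first non-blank character after the sentinel
    if not text:
        return None, None
    if text[0] not in DIGITS:
        return None, text[:4]           # alphanumeric: first four characters are the key
    end, seen_dot = 0, False
    while end < len(text):
        ch = text[end]
        if ch == "." and not seen_dot:  # assumption: at most one decimal point
            seen_dot = True
        elif ch not in DIGITS:
            break                       # e.g. the % in "67%" ends the item
        end += 1
    return float(text[:end]), None


print(parse_item(" 67%"))     # (67.0, None)
print(parse_item("against"))  # (None, 'agai')
```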

Other search criteria are available: BRACKETS and PARENTHESIS look for information enclosed in [] or (), respectively. "crit" searches for the specified string. By default, EDA finds the word beginning with the search string and extracts the item following that word, even if the word is longer than the search string; i.e. EXTRACT "Turn" extracts the number in "Turnout 67". If you specify ASIS, EDA takes the text immediately following the search string, i.e. "out" in this example.
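
The difference between the default "crit" search and ASIS can be sketched in Python as follows. This is an assumption-based illustration, not EDA code: after_string is a hypothetical name, a "word" is taken to be a run of non-blank characters, and ANYCASE is modelled by lower-casing both strings, which is adequate for plain ASCII text.

```python
def after_string(line, crit, asis=False, anycase=False):
    """Return the text following the search string crit, or None if absent."""
    hay, needle = (line.lower(), crit.lower()) if anycase else (line, crit)
    pos = hay.find(needle)
    if pos < 0:
        return None
    end = pos + len(crit)
    if not asis:
        # Default: skip the rest of the word that starts with crit ("Turn" -> "Turnout").
        while end < len(line) and not line[end].isspace():
            end += 1
    return line[end:].lstrip()


print(after_string("Turnout 67", "Turn"))             # 67
print(after_string("Turnout 67", "Turn", asis=True))  # out 67
```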

Finally, USER asks you for two user-defined sentinels. If the second sentinel is entered as a blank, the first one acts like the colon and "crit" may be specified as with the COLON option; otherwise the two characters act as begin and end sentinels, as with () or []. You might use, for example, {} or <> pairs to mark specific information. Note that the ~ character may not be used as a search sentinel.
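
Brackets, parentheses and user-defined begin/end pairs all follow the same pattern, sketched below in Python. The function between is hypothetical; it returns only the first enclosed item and rejects ~ as the text requires.

```python
def between(line, begin="(", end=")"):
    """Return the text enclosed by the first begin/end sentinel pair, or None."""
    if "~" in (begin, end):
        raise ValueError("~ may not be used as a search sentinel")
    start = line.find(begin)
    if start < 0:
        return None
    stop = line.find(end, start + len(begin))
    if stop < 0:
        return None                     # opening sentinel without a closing one
    return line[start + len(begin):stop].strip()


print(between("Socialist party (against)"))       # against   (like PARENTHESIS)
print(between("Seats [12] won", "[", "]"))        # 12        (like BRACKETS)
print(between("Position { 3 } noted", "{", "}"))  # 3         (like USER with {})
```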

If the information is not numeric, the first four characters starting at the first non-blank character are used to determine a numeric value: the program asks you for a value for that string (replacement value). This value is remembered (placed on a memory stack) for subsequent occurrences. The number of strings EDA can record is limited; once the limit is reached, you have to enter a value for each occurrence of a string not recorded. The same mechanism is also used by the commands extracting information from labels and descriptors. Below you will find information on how to manage the memory stack outside an actual extraction procedure.

Normally EXTRACT does not modify the case identifiers (remember that when EXTRACTing you are creating "cases" from "variables", so the existing case identifiers are not appropriate, except for the default numeric case ids). The NEWCASID option creates new case identifiers from the variable labels (column names are set to unknown). The additional option LABEL=(start,end) lets you specify which portion of the label and descriptor is used as the new casid. For this purpose the label and descriptor are treated as a single concatenated string of 56 characters, the first 8 being the label and the remaining 48 the descriptor of the current variable.
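
A small Python sketch of how LABEL=(start,end) could map onto the 56-character label/descriptor string. It assumes 1-based, inclusive positions, blank padding of short labels and descriptors, and trimming of trailing blanks from the result; all three are assumptions, as is the default of positions 1 to 4. new_casid is a hypothetical name.

```python
def new_casid(label, descr, start=1, end=4):
    """Return characters start..end (1-based, inclusive) of label + descriptor."""
    if not 1 <= start <= end <= 56:
        raise ValueError("positions must satisfy 1 <= start <= end <= 56")
    combined = label[:8].ljust(8) + descr[:48].ljust(48)  # 8 + 48 = 56 characters
    return combined[start - 1:end].rstrip()               # trailing blanks dropped


print(new_casid("Socialis", "Var extracted from documents"))  # Soci
print(new_casid("Turn", "Voting turnout", start=9, end=14))   # Voting
```

Positions 9 and beyond always address the descriptor, because the label is padded to its full 8 characters first.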

The extracted variable is labelled "Extract", unless you give a "string" on the command line; this string is then used as the variable label. As always, it is nevertheless recommended to document the new variable more precisely.

The second command format manages the memory stack, which is also used by the LABEL command. Unknown alphanumeric strings are placed on the stack when encountered during conversion, or you can put them on the stack before converting, using the LEARN parameter. The STACK parameter displays the current strings and replacement values on the stack; the RESET parameter empties it. The stack has a limited depth: if it is full and unknown values are encountered, their replacement values are not memorized.
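
The following Python class sketches the memory stack behaviour described in this section (LEARN, STACK, RESET, and lookups during an extraction). Everything here is hypothetical: the class and method names are illustrative, and the capacity must be chosen by the caller because EDA's actual stack depth is not given. Keys are the first four characters of a string, as described above.

```python
class ReplacementStack:
    """Remembers replacement values for alphanumeric items, keyed on 4 characters."""

    def __init__(self, capacity):
        self.capacity = capacity        # hypothetical depth; not EDA's real value
        self.values = {}

    def _remember(self, key, value):
        if key in self.values or len(self.values) < self.capacity:
            self.values[key] = value
            return True
        return False                    # stack full: "Taken, but not remembered"

    def learn(self, string, value):
        """LEARN: store a replacement value before extraction."""
        return self._remember(string[:4], value)

    def lookup(self, string, ask):
        """Return the value for string; call ask(key) when the key is unknown."""
        key = string[:4]
        if key in self.values:
            return self.values[key]
        value = ask(key)                # query the user (None could stand for missing)
        self._remember(key, value)
        return value

    def stack(self):
        """STACK: list the stored strings and replacement values."""
        return list(self.values.items())

    def reset(self):
        """RESET: empty the stack."""
        self.values.clear()


s = ReplacementStack(capacity=2)
s.learn("against", 2.0)
print(s.stack())                        # [('agai', 2.0)]
```

Once the stack is full, lookup still returns the value the user supplies but does not store it, so the same unknown string is asked for again on its next occurrence, as the text describes.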

For more details see the user application manual (AM).

ResVars set by EXTRACT
    $0  Target variable.
    $1  Number of character items replaced (i.e. non-numeric
        items found and replaced by numbers)
    $2  Number of variables for which no item has been found
        in the corresponding document (or no document existed
        for that variable).
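
To show how the missing-value default and the $1/$2 counters relate, here is a Python sketch of an extraction pass using the default colon sentinel. It is not EDA's code: extract and its replace callback (standing in for the interactive replacement dialog) are hypothetical, no level limits are applied, and a regular expression stands in for the "numeric characters up to the next non-numeric character" rule.

```python
import re

MISSING = -1.0  # default missing value; in EDA, SET REPVAL or M=miss would change it


def extract(documents, variables, replace, missing=MISSING):
    """Extract one value per variable from the text after the first ':'.

    documents maps a variable number to its document lines; replace maps a
    four-character key to a number, or to None for a missing value.
    Returns (values, n_replaced, n_not_found), mirroring $1 and $2.
    """
    values, n_replaced, n_not_found = [], 0, 0
    for var in variables:
        item = None
        for line in documents.get(var, []):
            if ":" in line:
                item = line.split(":", 1)[1].strip()
                break
        if not item:                    # no document, or no data item found
            values.append(missing)
            n_not_found += 1
            continue
        number = re.match(r"\d+(?:\.\d*)?", item)
        if number:                      # numeric: digits up to the first non-numeric
            values.append(float(number.group()))
        else:                           # alphanumeric: use a replacement value
            value = replace(item[:4])
            values.append(missing if value is None else value)
            n_replaced += 1
    return values, n_replaced, n_not_found


docs = {1: ["Turnout : 67%"], 2: ["Position: against"], 3: ["no colon here"]}
values, n1, n2 = extract(docs, [1, 2, 3, 4], lambda key: 2.0 if key == "agai" else None)
print(values, n1, n2)  # [67.0, 2.0, -1.0, -1.0] 1 2
```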


RENAME
See the chapter on "General utility commands".

SEARCH
SEARCH [vlist] "string" [NOEXACT] [KWIC]
                [WRITE | {NODISPLAY}] [VARLIST]
Searches all variable-related documents, or selected documents (if a vlist is present), for "string" and displays all records containing an occurrence of the string together with the variable label.

NOEXACT: by default an exact search is performed (i.e. the string must match exactly, including case); NOEXACT performs a case-insensitive search.

KWIC displays the result in an unsorted KWIC index format.

WRITE writes the result of the search to the RAWOUT file, which must be open at that time (see SET RAWOUT). If WRITE is used, the results are NOT written to the print file and no screen output is produced.

NODISPLAY suppresses display of the result on the screen (same effect as the /D global option).

VARLIST puts the variables for which "string" has been found into the current variable list, i.e. a subsequent DESCRIBE or VARS (or any command using the current vlist) shows or analyses these variables.
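
The exact versus NOEXACT distinction and the VARLIST side effect can be sketched as follows. This Python fragment is illustrative only: search is a hypothetical function, documents are assumed to be a mapping from variable number to lines whose first character is the level, and lines are numbered from 1 because 0 is reserved for the document header.

```python
def search(documents, string, exact=True):
    """Return (hits, varlist) for all document lines containing string.

    hits holds (variable, line_no, text) tuples; varlist lists the variables
    that a VARLIST option would put into the current variable list.
    """
    needle = string if exact else string.lower()
    hits = []
    for var, lines in documents.items():
        for line_no, line in enumerate(lines, start=1):  # line 0 is the header
            text = line[1:]                               # drop the level character
            haystack = text if exact else text.lower()
            if needle in haystack:
                hits.append((var, line_no, text))
    varlist = sorted({var for var, _, _ in hits})
    return hits, varlist


docs = {20: ["2Turnout : 67%", "1Election notes"], 21: ["2turnout: 54%"]}
print(search(docs, "Turnout")[1])               # [20]
print(search(docs, "turnout", exact=False)[1])  # [20, 21]
```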
Limiting the search
The SEARCH command is also sensitive to document level limits, as set by the DOC LIMIT command; this allows somewhat more complex search criteria, provided the document file is well structured.
Hints and other information
In the output file (print file or RAWOUT file), output is 132 characters wide instead of 80 as on the screen, and additional information is provided (variable label, document level, and line number relative to the beginning of the document, where 0 = document header).

A sorted KWIC index can be produced by sorting the output file on the appropriate column(s).
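
The exact layout of EDA's 132-character output lines is not described here, so the following Python sketch uses its own hypothetical layout: an 8-character label, a blank, a fixed-width left context, then the keyword and the rest of the line. Because the keyword always starts in the same column, sorting on the text from that column onward gives a sorted KWIC index, which is the idea behind sorting the output file on the appropriate columns. kwic_rows and sort_kwic are hypothetical names.

```python
def kwic_rows(hits, keyword, width=30):
    """Format (label, text) pairs as KWIC rows with the keyword in a fixed column."""
    rows = []
    for label, text in hits:
        pos = text.lower().find(keyword.lower())
        if pos < 0:
            continue                                  # keyword not on this line
        left = text[:pos][-width:].rjust(width)       # left context, right-aligned
        rows.append(f"{label[:8]:<8} {left}{text[pos:]}")
    return rows


def sort_kwic(rows, width=30):
    """Sort rows on the columns that start at the keyword (label field is 9 columns)."""
    return sorted(rows, key=lambda row: row[9 + width:].lower())


hits = [("V20", "Voting turnout was high"), ("V21", "Low turnout again")]
for row in sort_kwic(kwic_rows(hits, "turnout", width=10), width=10):
    print(row)
```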