The p-code system

Introduction

The idea behind p-code is very elegant: it is an "ideal" assembly language to be run on a virtual microprocessor. Concretely, this means that it requires an interpreter, the p-machine emulator or PME, to emulate this virtual microprocessor. The big advantage is that you can write p-code "assembly" programs (or compile higher-level languages into p-code) and run them on many different machines. The language was optimized so as to be easy to emulate and to run smoothly on any machine. Apart from the TI-99/4A, there were emulators written for the Z80, Z8000, 6502, 6509, Motorola 6800 and 68000 (Apple computers), HP-87, PDP-11, Intel 8080 and 8086 (PCs), and GA440 microprocessors (and what on earth is the latter?).

This article is split into two pages. The present one discusses the p-system operation; the other describes the TI-99/4A p-code card and accompanying software.

p-code is a fully relocatable language: all addressing is done relative to the current instruction or to the beginning of a segment. This allows for a very flexible operating system, that can load program segments in memory, move them as needed, reload them from disk at a different location, etc. The main drawback is that it is slower than real assembly language since it is interpreted. On the other hand, it is fairly compact and lends itself easily to compilation of high-level languages (mainly Pascal).

A typical p-system implementation consists of several layers:

  • The PME (pseudo-machine emulator) translates p-code into assembly, handles the stack and registers, etc. All parts of the PME that are not dedicated to directly emulating p-code form the RSP, or run-time service package. The section of the RSP that communicates with the BIOS is called RSP/IO.
  • The BIOS (Basic Input/Output Subsystem) interfaces the PME with computer-specific hardware. Generally, this will be the ROM BIOS (on the TI-99/4A: the cards' DSRs) with a thin wrapping layer to implement a few niceties specific to the p-system.
  • The operating system loads p-code programs, manages segments and the memory heap, deals with run-time errors, etc.
  • Programs compiled in p-code run within the operating system. These can be system-level utilities, such as an Editor or a Compiler, or user-written programs. It is also possible to run 9900 assembly-language programs within the p-code system.
  • The p-machine emulator (PME)
    Run-time service package (RSP)
    BIOS
    Operating system
    File formats


    The p-machine emulator

    Registers
    Stack
    Procedure nesting
    Instructions
    _Calls
    _Data access
    _Stack manipulation
    _Arithmetic & logic
    _Comparisons
    _Flow control
    _Arrays and Strings
    _Sets
    _Multitasking
    _Miscellaneous
    Standard procedures
    _Summary
    _Unit-related
    _String-related
    _Code pool management
    _Concurrency procedures
    _Miscellaneous procedures
    _Compiler usage
    Faults and execution errors
    _Faults
    _Execution errors
    Data representation
    _Words (byte sex)
    _Long integers
    _Real numbers


    Registers

    Every microprocessor contains registers, i.e. a small amount of on-chip memory. These can be general purpose or dedicated registers. For instance, the PC register on a TMS9900 microprocessor points to the next instruction to be executed. By contrast, general-purpose registers can be used by the programmer for any purpose he fancies.

    The p-machine emulates only special-purpose registers. To perform any calculation you must use the memory stack: push a number on the stack, push another number, perform an addition, and pop the result from the stack. This is a bit of a pain, unless you are a fan of Hewlett-Packard calculators and their reverse Polish notation. On the other hand, it sure makes the p-machine easy to implement on any kind of system.

    The dedicated registers are implemented in the host system memory and hold various pointers and data required for proper operation of the p-machine. They are normally used only by the p-machine emulator itself, however there are two p-codes that let you fool around with them: LPR and SPR.

    IPC
    This register contains a pointer to the next p-code to be executed.

    SP
    Points to the word at the top of the stack. This pointer changes after execution of almost any p-code, whenever something is pushed on or popped off the stack.

    CURPROC
    This register contains the number of the current procedure (0-255) in the current segment.

    MP
    Contains a pointer to the current activation record (MSCW). This is required to access local variables that are allocated on the stack when a procedure is called. Global variables are accessed via BASE, or indirectly via EREC.

    CURTASK
    Points to the current task information block (TIB).

    READYQ
    Points to the TIB at the top of the list of tasks ready to run.

    EREC
    Points to the current environment record (E_Rec), which changes every time a call is made to a procedure in a different segment. The E_Rec contains many useful values that may also be copied into a dedicated register. This makes the implementation of such registers optional, since a given p-machine emulator (a.k.a. PME) can access them indirectly via EREC.

    EVEC
    Points to the current environment vectors. This is an example of a redundant register, since its content can be found in the E_Rec. However, it should be implemented on any PME.

    IORESULT
    Contains the completion code resulting from the last I/O operation (0-255). This is the only register that can be accessed directly by the system, as it is located in the SYSCOM memory area.

    BASE
    Contains a pointer to the global variables. This register is optional since its value can be found via EREC.

    SIB
    Contains a pointer to the current segment information block. Optional, since such a pointer can be found in the E_Rec.

    CURSEG
    Points to the current segment in memory. Optional since such a pointer is found in the SIB.


    Stack

    As the p-machine virtual processor does not have any general-purpose registers, all data manipulation must be performed on the stack. This is why basically any p-code affects the stack. For instance, to add two numbers you would push each of them on the stack (e.g. with LDCB, load constant byte), add them (with the ADI p-code), and retrieve the result from the stack (here, using the SRO p-code to store it in global variable #5):

                                        <-SP
                          <-SP  |  34  |              <-SP
            <-SP  |  10  |      |  10  |      |  44  |              <-SP
    |//////|      |//////|      |//////|      |//////|      |//////|
           LDCB 10       LDCB 34         ADI          SRO 5
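
    The sequence above can be sketched as a tiny stack interpreter. This is only an illustration, not the PME's actual code: the run function and the (mnemonic, operand) tuples are hypothetical, and the load-constant-byte p-code is spelled LDCB, as in the instruction list further down.

```python
def run(program, global_vars):
    """Run (mnemonic, operand) pairs on a word stack; return the final stack."""
    stack = []
    for mnemonic, operand in program:
        if mnemonic == "LDCB":      # push a constant byte (0..255)
            stack.append(operand & 0xFF)
        elif mnemonic == "ADI":     # pop two integers, push their 16-bit sum
            top = stack.pop()
            below = stack.pop()
            stack.append((below + top) & 0xFFFF)
        elif mnemonic == "SRO":     # pop the top word into a global variable
            global_vars[operand] = stack.pop()
        else:
            raise ValueError(f"unsupported p-code: {mnemonic}")
    return stack


global_vars = {}
leftover = run([("LDCB", 10), ("LDCB", 34), ("ADI", None), ("SRO", 5)], global_vars)
print(global_vars[5], leftover)  # prints: 44 []
```

    Note that the stack ends up empty again, just like the last column of the diagram.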

    The stack is used by the p-machine to implement local variables when a procedure is entered. The caller must also place the procedure parameters on the stack prior to the call, and reserve room for the return value if the procedure is a function.


    Procedure nesting

    From the structure of the p-code language, you can tell it was invented by Pascal programmers: provision is made for procedures that are local to other procedures (i.e. nested into them) and a lot of p-codes are devoted to accessing these procedures or their variables. Let me illustrate this with an example:

    segment SEG1
      int VARG1
      proc GLOB1
        proc LOC1
          int VAR1
        end-of-LOC1
        proc LOC2
        end-of-LOC2
      end-of-GLOB1
      proc GLOB2
        proc LOC3
          int VAR3
          proc LOC31
            int VAR31
            proc LOC311
            end-of-LOC311
            <<you are here>>
          end-of-LOC31
        end-of-LOC3
        proc LOC4
          int VAR4
        end-of-LOC4
      end-of-GLOB2
    end-of-SEG1

    In this example, GLOB1 and GLOB2 are global procedures, VARG1 is a global variable. LOC1 and LOC2 are procedures local to GLOB1, whereas LOC3 and LOC4 are local to GLOB2. LOC31 is a procedure local to LOC3 and LOC311 is local to LOC31. Note that at each level, there can be variables (in this case, integers) local to the procedure.
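
    To make the nesting concrete, here is a small sketch that computes how many lexical levels separate a procedure from the owner of a variable, assuming ordinary Pascal scoping rules. The PARENT and OWNER tables and the static_distance function are illustrative names, not part of the p-system.

```python
# Lexical parent of each procedure in the SEG1 example (None = global level).
PARENT = {
    "GLOB1": None, "GLOB2": None,
    "LOC1": "GLOB1", "LOC2": "GLOB1",
    "LOC3": "GLOB2", "LOC4": "GLOB2",
    "LOC31": "LOC3", "LOC311": "LOC31",
}

# Procedure that declares each variable (None = global variable of the segment).
OWNER = {"VARG1": None, "VAR1": "LOC1", "VAR3": "LOC3", "VAR31": "LOC31", "VAR4": "LOC4"}


def static_distance(current, variable):
    """Return "global", the number of levels up to the owner, or None if not visible."""
    owner = OWNER[variable]
    if owner is None:
        return "global"
    hops, proc = 0, current
    while proc is not None:
        if proc == owner:
            return hops
        proc = PARENT[proc]
        hops += 1
    return None  # the owner is not a lexical ancestor of `current`


for name in ("VAR31", "VAR3", "VARG1", "VAR1", "VAR4"):
    print(name, static_distance("LOC31", name))
```

    Taking LOC31 as the current procedure, VAR31 is local (0 levels), VAR3 is one level up, VARG1 is global, and VAR1 and VAR4 are out of reach.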

    Assume the program has reached the point indicated by <<you are here>>, in procedure LOC31. From there, you could:

    Similarly you can access the following variables:

    OK, I hope you got the picture, let's proceed with p-code instructions.


    Instructions

    A p-code (pseudo-opcode) is always 1 byte long, but it can be followed by one or more arguments. There are several types of arguments:

    For your convenience I have arranged the p-codes in several families:

    Calls

    These p-codes are used to call a procedure. Before executing such an instruction, you should prepare the stack with the adequate parameters, as illustrated below. It is the responsibility of the called procedure to pop each parameter from the stack and to leave the adequate result on it (if any). Note the practice of reserving space for the result before the first parameter. This generally will be one or two words, but may be up to four words, depending on the format used for real numbers.

                                                 <                 <
                                        | MSCW  |         | MSCW  |
                               <        | data  |         | data  |
                      | param |         | param |   ...   | param |                  <
             <  ...   |   0   |         |   0   |         | result|         | result|
    |///////|         |///////|         |///////|         |///////|         |///////|
    prepare           CXG 2,31          called proc       RPU               caller can
    for call          calls a proc      does its job      return            use result

    Within the program stream, each procedure entry point consists of two data words that immediately precede the first p-code to be executed. These are exit_ic, which points to some (optional) exit code to be executed upon returning from the procedure, and data_size, which indicates how many bytes must be reserved on the stack to store the procedure's local variables. This space is marked as "data" in the above illustration.

    You may have wondered what MSCW stands for. That's "Mark Stack Control Word", also known as "activation record". A mark stack consists of five words:

    MSSTAT
    MSDYN
    MSIPC
    MSENV
    MSPROC

    MSPROC holds the procedure number of the caller, MSIPC is a segment-relative pointer to the point of call, and MSENV contains a pointer to the E_Rec of the caller (i.e. its segment). MSDYN points at the MSCW of the caller and MSSTAT points at the MSCW of the lexical parent (i.e. the procedure into which the current procedure is nested, if any). The latter will be needed to access variables that are not local to the procedure.

    When a call is performed, the PME allocates space on the stack and initialises the MSCW. It sets IPC to point at the first p-code in the procedure, and updates the CURPROC, EREC and EVEC registers. If there aren't 40 words free on the stack after data and MSCW have been reserved, a stack fault is issued that will be handled by the system.

    There are several p-code calls, depending on where the procedure is located:

    CGP <UB>
    Call Global Procedure. The procedure number is passed with UB, an unsigned data byte included in the code stream, just after the CGP p-code itself.

    CLP <UB>
    Call Local Procedure, i.e. a procedure nested into the calling procedure. Again, UB is the procedure number and follows CLP in memory.

    CIP <DB><UB>
    Call Intermediary Procedure, i.e. a procedure which is a lexical parent of the current one. DB is a value from 1 to 127 that indicates how many nesting levels should be walked back to the target procedure, from MSCW to MSCW. UB is the procedure number.

    SCIPn <UB>
    Where n is 1 or 2. This is a short form of CIP that calls procedure <UB> and sets the MSSTAT field of the MSCW to the lexical parent (for SCIP1) or grandparent (for SCIP2) of the calling procedure. It is shorter than CIP since the nesting level is part of the opcode, so it uses two bytes instead of three.

    CXG <UB1>,<UB2>
    Call eXternal Global procedure, same as CGP but targets another segment. UB1 is the segment number. If it's not in memory yet, a segment fault is issued so that the system has a chance to load the segment. UB2 is the procedure number.

    SCXGn <UB>
    Where n is in the range 1 to 8. This is a short form of CXG that calls global procedure UB in the segment indicated by n.

    CXL <UB1>,<UB2>
    Call eXternal Local procedure. UB1 is the segment number, UB2 the procedure number.

    CXI <UB1>,<DB>,<UB2>
    Call eXternal Intermediary procedure. UB1 is the segment number, UB2 the procedure number. DB is the number of nesting levels to be walked backwards through the MSCWs to find the procedure.

    CFP <UB>
    Call Formal Procedure. This allows you to code a procedure call "on the fly" rather than including it in the code stream. Three data words must be placed on the stack prior to execution:

               
    |proc-num |
    |e-rec-ptr|
    |stat-link|
    |/////////|

    Proc_num is the number of the procedure to be called, it will be placed in the CURPROC register.
    E_Rec_Ptr points to the environment record of the segment to be used.
    Stat_link is the value to be placed in the MSSTAT field of the MSCW.

    In other words, procedure <proc_num> in the segment indicated by <e_rec_ptr> is called. I'm not sure what <UB> is used for, it may be a typo in the manual...

    LSL <DB>
    Load Static Link. DB indicates the number of static links to travel. A pointer to the MSCW that is DB static links above the current one (1=parent, 2=grandparent, etc.) is placed on the stack. This comes in handy to prepare the stack for a CFP instruction.

    RPU
    Return from ProcedUre. This is the instruction used to return from any procedure call. The various fields in the MSCW are used to restore the caller's environment:


    Data access

    Just as you can call global, local and intermediary procedures, you can access global, local or intermediary-level variables. You can also access global variables in another segment. For constants, you can either access constants located in the constant pool of the current segment or include a constant value within the code stream.


                   Global      Local       Intermediary  Extended  Constant
    Load address   LAO         LLA, SLLA   LDA           LAE       LCO
    Load value     LDO, SLDO   LDL, SLDL   LOD, SLOD     LDE       LDC, LDCRL,
                                                                   LDCB, LDCI, SLDC, LDCN
    Store value    SRO         STL, SSTL   STR           STE

    Let's start with global variables:

    LAO <B>
    Load glObal Address. Places the address of the global variable with offset B (in the global MSCW pointed to by BASE) on the stack.

    LDO <B>
    LoaD glObal. Places the value of the global variable B on the stack.

    SLDOn
    Where n is a number from 1 to 16. This is a short form of LDO, used to load one of the first 16 global variables. The number of the variable is part of the p-code, and thus does not require an extra byte (or two) in the code stream.

    SRO <B>
    StoRe glObal. Stores the word currently on the stack into global variable B.


    Now let's see local variables. They are created on the stack when a procedure is entered, between the calling parameters and the MSCW. They are specified by their offset with respect to the MSCW.

    LLA <B>
    Load Local Address. Places on the stack the address of the local variable with offset B in the data area of the current procedure (i.e. on the stack, below the MSCW).

    SLLAn
    Where n is a number from 1 to 8. This is a short form of LLA, that loads the address of the first eight local variables. It is shorter than LLA because the number of the variable is part of the p-code instead of requiring an extra byte (or two).

    LDL <B>
    LoaD Local. Places on the stack the value of the local variable with offset B below the current MSCW.

    SLDLn
    Where n is a number from 1 to 16. This is a short form of LDL, that loads one of the first sixteen local variables.

    STL <B>
    STore Local. The word at the top of the stack is popped and placed into the variable with offset <B> below the current MSCW.

    SSTLn
    Where n is a number from 1 to 8. Short form of STL where the number of the local variable to modify is part of the p-code.


    Intermediary variables are local to an ancestor of the current procedure and are therefore present on the stack, below the MSCW of the ancestor. They are accessed by walking up the chain of MSSTAT words in the MSCW of the respective procedures.

    LDA <DB>,<B>
    Load intermeDiary Address. Places on the stack the address of variable B below the MSCW found <DB> levels above the current one. DB=0 means the current procedure (i.e. local variables), DB=1 means the lexical parent in which the current procedure is nested, etc.

    LOD <DB>,<B>
    LOad intermeDiate. Places on the stack the value of the variable with offset B below the MSCW found <DB> levels above the current one.

    SLODn <B>
    Where n is 1 or 2. Short form of LOD to access variables in the lexical parent (SLOD1) or grandparent (SLOD2) of the current procedure.

    STR <DB>,<B>
    SToRe intermediate. DB indicates the number of static links to travel to find the proper MSCW (0=current, 1=parent, etc). The variable with offset <B> in the target procedure will receive the word currently on top of the stack.
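
    Walking the static links can be sketched as follows. The Frame class is a hypothetical stand-in for an MSCW plus its data area (only the MSSTAT and MSDYN links are modelled), and local variables are kept in a dictionary keyed by offset rather than on a real stack.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    """Hypothetical activation record with its local data area."""
    name: str
    msstat: Optional["Frame"]   # static link: frame of the lexical parent
    msdyn: Optional["Frame"]    # dynamic link: frame of the caller
    data: dict = field(default_factory=dict)  # local variables, keyed by offset


def follow_static_links(frame, levels):
    """Walk `levels` MSSTAT links up from `frame` (0 = the frame itself)."""
    for _ in range(levels):
        if frame.msstat is None:
            raise RuntimeError("no lexical parent at this level")
        frame = frame.msstat
    return frame


def lod(frame, db, offset):
    """Load the variable at `offset`, `db` static links above `frame`."""
    return follow_static_links(frame, db).data[offset]


def store(frame, db, offset, value):
    """Store `value` into the variable at `offset`, `db` static links up."""
    follow_static_links(frame, db).data[offset] = value


# GLOB2 calls LOC3, which calls LOC31, which calls LOC311.
glob2 = Frame("GLOB2", None, None)
loc3 = Frame("LOC3", glob2, glob2, {1: 7})      # VAR3 = 7
loc31 = Frame("LOC31", loc3, loc3, {1: 99})     # VAR31 = 99
loc311 = Frame("LOC311", loc31, loc31)

print(lod(loc311, 1, 1))    # VAR31, one level up: 99
print(lod(loc311, 2, 1))    # VAR3, two levels up: 7
store(loc311, 2, 1, 8)      # what STR 2,1 would do from inside LOC311
print(loc3.data[1])         # 8
```

    In this simple call chain the static and dynamic links coincide. They differ as soon as a procedure is called from somewhere other than its lexical parent, for instance during recursion.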


    It is also possible to access global variables in another segment.

    LAE <UB>,<B>
    Load Address Extended. The address of the global variable with offset B in segment UB is placed on the stack.

    LDE <UB>,<B>
    LoaD Extended. Places the value of global variable B in segment UB on the stack.

    STE <UB>,<B>
    STore Extended. Pops the word on top of the stack and stores it in global variable B in segment UB.


    Constant values may be placed in a dedicated memory area called the constant pool, or included within the p-code stream.

    LCO <B>
    Load Constant Offset. B is the offset of a constant in the constant pool of the current segment (i.e. the constant "index"). The instruction places the address of this constant on the stack, in the form of a segment-relative offset.

    LDC <UB1>,<B>,<UB2>
    LoaD Constant. B is the offset of a constant in the constant pool of the current segment. The instruction pushes UB2 words on the stack, starting from this offset. UB1 is the mode: if it is 2 and the current segment has the opposite byte sex from the host processor, bytes will be swapped upon copy.

    LDCRL <B>
    LoaD Constant ReaL. The real constant at offset B in the real sub-pool (a specialized portion of the constant pool) is loaded on stack. Depending on the format used by the code file, a real number uses up two or four words on the stack.

    With the following p-codes, the constant is part of the code:

    LDCB <UB>
    LoaD Constant Byte. Places the value <UB> on stack.

    LDCI <W>
    LoaD Constant Integer. Places the value W on stack.

    LDCN
    LoaD Constant Nil. Places the "nil" value on the stack. The value depends on the microprocessor running the PME. With the TI-99/4A it is >0000.

    SLDCn
    Short LoaD Constant. Where n is a number from 0 to 31. These p-codes place a number from 0 (SLDC0) to 31 (SLDC31) on the stack.


    Stack manipulation

    Now that we have placed variables on the stack, we can manipulate it.

    DUP1
    Duplicate 1 word. Pushes on top of the stack an extra copy of the word at top of stack.

    DUPR
    Duplicate real. Makes an extra copy of the real at top of stack (two or four words depending on the real number format).

    SWAP
    Swaps the two words at the top of the stack.


    Real <--> integer conversions:

    FLT
    FLoaT. Replaces the integer at top of stack by the corresponding real number.

    TNC
    TruNCate real. Assuming the stack contains a real number, it will be replaced with the corresponding integer after truncation of the decimal part. If the real is not in the range -32768 to 32767 a floating point execution error is issued (and that's bad news!).

    RND
    RouND. Replaces the real number on top of the stack with the corresponding integer rounded to the nearest integer. If the number is not in the range -32768 to +32767, an execution error occurs.
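
    The difference between TNC and RND can be sketched as below. This is an illustration only: the ExecutionError class is a hypothetical stand-in for the p-machine's execution error, the range check is applied to the converted integer, and rounding ties away from zero is an assumption, since the text above does not say how ties are handled.

```python
import math


class ExecutionError(Exception):
    """Stand-in for a p-machine execution error."""


def _to_integer(value):
    """Check that `value` fits in a signed 16-bit word."""
    if not -32768 <= value <= 32767:
        raise ExecutionError(f"{value} does not fit in a 16-bit integer")
    return value


def tnc(real):
    """TNC: drop the fractional part (truncate toward zero)."""
    return _to_integer(math.trunc(real))


def rnd(real):
    """RND: round to the nearest integer (ties away from zero: an assumption)."""
    magnitude = math.floor(abs(real) + 0.5)
    return _to_integer(magnitude if real >= 0 else -magnitude)


print(tnc(3.9), tnc(-3.9))   # 3 -3
print(rnd(3.4), rnd(-3.6))   # 3 -4
```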


    Address <--> value conversions.

    LDB
    LoaD Byte. Assuming a byte-pointer is at the top of the stack, it is replaced with the value of this byte (in the form of a word whose high byte is >00).

    STB
    STore Byte. Assuming the stack contains a byte pointer topped with a byte value (i.e. a word with value 0 to 255), this value is assigned to the pointed byte. Both are removed from the stack afterwards.

    IND <B>
    INDex. Assuming the word at top of stack is the address of a record, it is replaced by the value found at this address plus offset <B>. IND 0 is thus the equivalent of LDB for words.

    SINDn
    Where n is a number from 0 to 7. This is a short form of IND, used to access the first 8 words of a record.

    INC <B>
    INCrement. Assuming the word on top of the stack is an address, it is indexed by B (i.e. B is added to it).

    STO
    STOre. Assuming the stack contains an address topped with an integer value, this value is stored at the indicated address. Both are subsequently removed from the stack.

    LDRL
    LoaD Real. Assuming the top of the stack contains the address of a real number, it is replaced with the value found at this address (two or four words depending on the real number format).

    STRL
    STore ReaL. Assuming the stack contains an address topped with a real number, this number will be stored at the indicated address. Both the number and the address are then removed from the stack.

    LDM <UB>
    LoaD Multiple. Assuming the stack contains an address, it is replaced with a block of UB words found at this address. A stack fault is issued if fewer than UB+20 words are available on the stack.

    STM <UB>
    STore Multiple. Assuming the stack contains an address, topped with a block of UB words, these words will be stored at the indicated address. The stack pointer is then adjusted to remove the block and the address from the stack.

    LDP
    LoaD Packed. Assuming a packed-field pointer is at the top of the stack, it is replaced with the value of the field it points at. This value will be in the form of a right-justified word, padded on the left with "0" bits.

    STP
    STore Packed. Assuming the stack contains a packed-field pointer, topped with an integer value, the value is stored into the pointed field. Value and pointer are removed from the stack.

    MOV <UB>,<B>

    MOVe. The word on top of the stack is the address of a source block of words. The word below it is the address of the destination. B is the number of words to be copied from the source to the destination. If UB=0 the source is a variable, otherwise it is a constant and the word on top of the stack is its offset in the current segment. If UB=2 and the current segment has the opposite byte sex from the host processor, bytes will be swapped upon copy.
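
    The byte-swapping rule used by MOV (and by LDC) can be sketched like this. The copy_words function and its same_byte_sex flag are hypothetical; a real PME would read the byte sex from the segment rather than take it as a parameter.

```python
def swap_bytes(word):
    """Exchange the high and low bytes of a 16-bit word."""
    return ((word & 0x00FF) << 8) | ((word >> 8) & 0x00FF)


def copy_words(source, mode, same_byte_sex):
    """Copy a block of words, swapping bytes only when mode is 2
    and the segment's byte sex differs from the host's."""
    if mode == 2 and not same_byte_sex:
        return [swap_bytes(word) for word in source]
    return list(source)


block = [0x1234, 0x00FF]
print([hex(w) for w in copy_words(block, 2, same_byte_sex=False)])  # ['0x3412', '0xff00']
print([hex(w) for w in copy_words(block, 2, same_byte_sex=True)])   # ['0x1234', '0xff']
```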


    Arithmetic and logic instructions

    OK, now that we have data placed on stack, we can perform math on them. Let's begin with integers:

    ABI
    ABsolute value Integer. Takes the absolute value of the integer found on top of the stack. Exception: -32768 stays as it is (since +32768 cannot be represented with 16 bits).

    NGI
    NeGate Integer. Negates the integer on top of the stack. -32768 remains unchanged.

    DECI
    DECrement Integer. Decrements by one the integer on top of the stack. Wraps around from -32768 to +32767.

    INCI
    INCrement Integer. Increments by one the integer on top of the stack. Wraps around from +32767 to -32768.

    ADI
    ADd Integer. Adds up the two integers on top of the stack and replaces them with the result. Overflow and underflow wrap around to the opposite sign.

    SBI
    SuBtract Integer. Subtracts the integer on top of the stack from the integer below it. Overflow and underflow wrap around.

    MPI
    MultiPly Integer. Multiply the two integers on the top of the stack and replace them with the result. The result is calculated as a 32-bit number, but only the least significant 16 bits will appear on stack (i.e. overflows result in crazy values).

    DVI
    DiVide Integer. Divides the second integer from the top by the integer on the top of the stack, and replaces them with the result. Truncation is always toward zero. Divisions by zero cause an execution error.

    MODI
    MODulo Integer. Adds or subtracts the word on top of the stack from the word below it until the remainder is in the desired range. Replaces both integers with the remainder. If the top word is zero an execution error occurs; if it is negative the result is undefined but no error occurs.
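
    The wrap-around behaviour described above can be checked with a few lines of Python. This is a sketch of the arithmetic only, with function names borrowed from the p-code mnemonics; in each function, a is the word below the top of the stack and b is the word on top. Python's ZeroDivisionError stands in for the execution error.

```python
def to_signed(word):
    """Interpret the low 16 bits of `word` as a signed integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def adi(a, b): return to_signed(a + b)
def sbi(a, b): return to_signed(a - b)
def mpi(a, b): return to_signed(a * b)     # only the low 16 bits survive
def ngi(a):    return to_signed(-a)
def abi(a):    return to_signed(abs(a))


def dvi(a, b):
    """Integer division, truncated toward zero (unlike Python's // operator)."""
    if b == 0:
        raise ZeroDivisionError("execution error: division by zero")
    quotient = abs(a) // abs(b)
    return to_signed(quotient if (a < 0) == (b < 0) else -quotient)


print(adi(32767, 1))            # -32768
print(ngi(-32768), abi(-32768)) # -32768 -32768
print(mpi(300, 300))            # 24464 (90000 modulo 65536)
print(dvi(-7, 2))               # -3
```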


    And now, let's do the same for real numbers. Remember, depending on the adopted format, a real number can be two or four words long:

    ABR
    ABsolute Real. The real number on top of the stack is replaced with its absolute value.

    NGR
    NeGate Real. Negates the real number on top of the stack.

    ADR
    ADd Real. Adds the two reals on top of stack and replaces them with the result. Underflow defaults to zero, overflow causes an execution error.

    SBR
    SuBtract Real. Subtracts the real number on top of the stack from the real number under it and replaces them with the result. Underflow defaults to zero, overflow causes an execution error.

    MPR
    MultiPly Real. Multiplies the two reals on top of the stack and replaces them with the result. Underflow defaults to zero, overflow causes an execution error.

    DVR
    DiVide Real. Divides the second real from the top by the real above it and replaces them with the result. Underflow defaults to zero, overflow causes an execution error.


    Note that there is no p-code for modulo real. Let's conclude this section with a few logical operations:

    LNOT
    Inverts the word on top of the stack, bitwise (i.e. "0" bits are replaced with "1" and conversely).

    BNOT
    Same as LNOT, but only the least significant bit is kept. Thus the result will always be >0000 (false) or >0001 (true) no matter how many bits were set in the word.

    LAND
    The two words on top of the stack are "anded" bitwise and replaced with the result. Bitwise: 0 and 0 = 0, 0 and 1 = 0, 1 and 0 = 0, 1 and 1 = 1.

    LOR
    The two words on top of the stack are "ored" bitwise and replaced with the result. Bitwise: 0 or 0 = 0, 0 or 1 = 1, 1 or 0 = 1, 1 or 1 = 1.

    NB. Sadly enough, there is no p-code for XOR operations (exclusive or).
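
    An exclusive or can still be built from the operations that do exist, since a XOR b = (a OR b) AND NOT (a AND b). The sketch below models the four p-codes on 16-bit words and combines them; it shows the arithmetic only, not an actual p-code sequence.

```python
MASK = 0xFFFF  # p-machine words are 16 bits wide


def lnot(w):    return ~w & MASK
def bnot(w):    return lnot(w) & 1          # keep only the least significant bit
def land(a, b): return a & b & MASK
def lor(a, b):  return (a | b) & MASK


def xor(a, b):
    """Exclusive or built from LOR, LAND and LNOT."""
    return land(lor(a, b), lnot(land(a, b)))


print(hex(xor(0x0F0F, 0x00FF)))  # 0xff0
print(bnot(0x0001), bnot(0x0000))  # 0 1
```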


    Comparisons

    The following p-codes are used to compare values placed on the stack. The result of the comparison is generally a boolean word (i.e. >0001 for "true" and >0000 for "false") that will replace these values on the stack. The boolean can subsequently be used for a conditional jump (see next section).

    EQUI
    EQUal Integer. Places "true" on the stack if the two integers on top of the stack are identical, "false" otherwise.

    NEQI
    Not EQual Integer. Places "false" on the stack if the two integers on top of the stack are identical, "true" otherwise.

    GEQI
    Greater or EQual Integer. Places "true" on the stack if the second integer from the top of the stack is greater than or equal to the integer on top of the stack. This is a signed comparison, i.e. >8002 (a negative number) is smaller than >7030.

    LEQI
    Less Than or Equal Integer. Places "true" on the stack if the second integer from the top of the stack is smaller than or equal to the integer on top of the stack. This is a signed comparison, i.e. >FFFF (minus one) is smaller than >0001.

    GEUSW
    Greater or Equal UnSigned Word. Places "true" on the stack if the second integer from the top of the stack is greater than or equal to the integer on top of the stack. This is an unsigned comparison, i.e. >8002 is greater than >7030.

    LEUSW
    Lower or Equal UnSigned Word. Places "true" on the stack if the second integer from the top of the stack is smaller than or equal to the integer on top of the stack. This is an unsigned comparison, i.e. >FFFF is greater than >0001.
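
    The signed and unsigned flavours differ only in how the same 16 bits are interpreted, as this short sketch shows. Function names mirror the mnemonics; a is the second word from the top of the stack, b is the word on top, and the result is 1 for "true" or 0 for "false".

```python
def to_signed(word):
    """Interpret the low 16 bits of `word` as a signed integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def geqi(a, b):  return int(to_signed(a) >= to_signed(b))
def leqi(a, b):  return int(to_signed(a) <= to_signed(b))
def geusw(a, b): return int((a & 0xFFFF) >= (b & 0xFFFF))
def leusw(a, b): return int((a & 0xFFFF) <= (b & 0xFFFF))


print(geqi(0x8002, 0x7030), geusw(0x8002, 0x7030))  # 0 1
print(leqi(0xFFFF, 0x0001), leusw(0xFFFF, 0x0001))  # 1 0
```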

    EQREAL
    EQual REALs. Places "true" on the stack if the two reals on top of the stack are identical.

    GEREAL
    Greater or Equal REAL. Places "true" on the stack if the second real number from the top of the stack is greater than or equal to the real number on top of the stack. This is a signed comparison, by definition.

    LEREAL
    Less than or Equal REAL. Places "true" on the stack if the second real number from the top of the stack is lower than or equal to the real number on top of the stack. This is a signed comparison, by definition.


    The following operations deal with arrays of bytes. The stack can contain either the address of an array or its offset within the segment. Extra parameters in the code stream decide which it is.

    EQBYT <UB1>,<UB2>,<B>
    The two words on top of the stack should contain either the addresses of two byte arrays, or their offset in the current segment. UB1 and UB2 determine whether the address (UBx = 0) or the offset (UBx <> 0) is used: UB1 refers to the word on top of the stack, UB2 to the word below. B is the size of the array. The arrays are compared byte by byte in the natural order. If they are identical, "true" is placed on the stack.

    GEBYT <UB1>,<UB2>,<B>
    Greater or Equal BYTe array. Same as above, but the comparison stops as soon as a mismatch is encountered. It returns "true" if the mismatched byte in the "bottom" array (pointed to by the second word from the top of the stack) is greater than the corresponding byte in the "top" array, or if both arrays are identical.

    LEBYT <UB1>,<UB2>,<B>
    Less or Equal BYTe array. Same as above, but the comparison returns "true" if the first mismatched byte in the bottom array is lower than the corresponding byte in the top array (or if there is no mismatch).


    The following p-codes deal with strings. A string is essentially the same as an array of bytes, except that it begins with a length byte. Thus string comparisons begin with the second byte. If both strings are identical (up to the end of the shorter one), the size bytes are compared.

    EQSTR <UB1>,<UB2>
    EQual STRing. The two words on top of the stack should contain either the addresses of two strings, or their offset in the current segment. UB1 and UB2 determine whether the address (UBx = 0) or the offset (UBx <> 0) is used: UB1 refers to the word on top of the stack, UB2 to the word below. The strings are compared byte by byte from the second byte to the end of the shorter string. If they are identical, their first bytes (the size) are compared. If these are identical, "true" is placed on the stack.

    GESTR <UB1>,<UB2>
    Greater or Equal STRing. Same as above, but the comparison stops as soon as a mismatch is encountered. It returns "true" if the mismatched byte in the "bottom" string (pointed to by the second word from the top of the stack) is greater than the corresponding byte in the "top" string, or if both strings are identical.

    LESTR <UB1>,<UB2>
    Less or Equal STRing. Same as above, but the comparison returns "true" if the first mismatched byte in the bottom string is lower than the corresponding byte in the top string (or if there is no mismatch).
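
    The comparison rule for length-prefixed strings can be sketched as follows. The helper names are hypothetical, strings are passed directly as bytes objects rather than as addresses or offsets, and I assume that the length-byte tie-break described for EQSTR applies to GESTR and LESTR as well.

```python
def pstr(text):
    """Build a length-prefixed string from ASCII text."""
    data = text.encode("ascii")
    return bytes([len(data)]) + data


def compare_strings(bottom, top):
    """Return -1, 0 or 1 as `bottom` is lower than, equal to or greater than `top`."""
    shorter = min(bottom[0], top[0])
    for i in range(1, shorter + 1):          # skip the length byte
        if bottom[i] != top[i]:
            return -1 if bottom[i] < top[i] else 1
    # Identical up to the end of the shorter string: compare the length bytes.
    return (bottom[0] > top[0]) - (bottom[0] < top[0])


def eqstr(bottom, top): return int(compare_strings(bottom, top) == 0)
def gestr(bottom, top): return int(compare_strings(bottom, top) >= 0)
def lestr(bottom, top): return int(compare_strings(bottom, top) <= 0)


print(eqstr(pstr("abc"), pstr("abc")))  # 1
print(gestr(pstr("abd"), pstr("abc")))  # 1
print(lestr(pstr("ab"), pstr("abc")))   # 1
```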


    The following p-codes deal with packed sets, and don't ask me why they are abbreviated PWR (packed words? power?). A set consists of up to 255 words to be accessed on a bitwise basis, preceded by a length word (that contains the number of words in the set).

    EQPWR
    The two sets on top of the stack are compared up to the end of the shorter one. "True" is pushed on the stack if no mismatch is found (even if the sets are not the same size).

    GEPWR
    "True" is pushed on the stack if the set on top of the stack is a subset of the set below it. Which means that all the "1" bits in the top set are also "1" in the bottom set (I think).

    LEPWR
    "True" is pushed on the stack if the set on top of the stack is a superset of the set below it, "false" is pushed otherwise.

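    The subset test behind GEPWR and LEPWR can be sketched with plain bit operations. Sets are modelled as Python lists of 16-bit words without the length word, and words missing from the shorter set are treated as zero; both choices are assumptions made for illustration.

```python
def is_subset(small, big):
    """True if every bit set in `small` is also set in `big`."""
    for i, word in enumerate(small):
        other = big[i] if i < len(big) else 0   # assumption: missing words are empty
        if word & ~other & 0xFFFF:
            return False
    return True


def gepwr(bottom, top): return int(is_subset(top, bottom))   # top is a subset of bottom
def lepwr(bottom, top): return int(is_subset(bottom, top))   # top is a superset of bottom


bottom, top = [0b1011], [0b0011]
print(gepwr(bottom, top), lepwr(bottom, top))  # 1 0
```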

    Flow control

    Now that we have placed a boolean on top of the stack with a comparison, we can use the following p-codes to perform conditional jumps. The jump target is expressed as an offset in bytes, relative to the next instruction. This nifty feature allows a p-code program to be shifted in memory during execution without messing up all the addresses.

    TJP <SB>
    True JumP. Jumps by SB bytes (relative to the next instruction) if the boolean word on top of the stack is "true" (i.e. its rightmost bit is 1).

    FJP <SB>
    False JumP. Jumps by SB bytes (relative to the next instruction) if the boolean word on top of the stack is "false" (i.e. its rightmost bit is 0).

    FJPL <W>
    False JumP Long. Same as above, but the offset is expressed as a word, which means that you can jump further away than -128 to +127 bytes. (And why on earth is there no "true jump long"?).

    UJP <SB>
    Unconditional JumP. Jumps by SB bytes (-128 to +127 relative to the next instruction). The stack is ignored and not affected.

    UJPL <W>
    Unconditional JumP Long. Same as above, but in the range -32768 to +32767 bytes relative to the next instruction.
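
    To make the relative addressing concrete, here is a small Python sketch of how an emulator might execute FJP. It assumes the instruction is one opcode byte followed by the SB operand, and that IPC is a plain index into a byte array. The opcode value used in the usage note is a placeholder, not the real FJP opcode.

```python
def signed_byte(value: int) -> int:
    """Interpret a raw byte (0..255) as a signed offset (-128..+127)."""
    return value - 0x100 if value & 0x80 else value


def signed_word(value: int) -> int:
    """Interpret a raw word (0..65535) as a signed offset (-32768..+32767)."""
    return value - 0x10000 if value & 0x8000 else value


def fjp(code: bytes, ipc: int, top_of_stack: int) -> int:
    """Execute the FJP found at code[ipc] and return the new IPC."""
    next_ipc = ipc + 2                     # opcode byte + SB operand
    if top_of_stack & 1 == 0:              # "false": rightmost bit is 0
        return next_ipc + signed_byte(code[ipc + 1])
    return next_ipc                        # "true": fall through
```

    With code = bytes(10) + bytes([0x00, 0xFC]), calling fjp(code, 10, 0) returns 8 (the offset >FC is -4, counted from the next instruction at 12), whereas fjp(code, 10, 1) just returns 12. Since only offsets are stored, moving the whole program in memory does not change the bytes of the instruction.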


    Alternatively, we can compare integers and perform the jump with a single instruction:

    EFJ <SB>
    Equal False Jump. Jumps by SB bytes (relative to the next instruction) if the two integers on top of the stack are different.

    NFJ <SB>
    Not equal False Jump. These guys like double negations! Jumps by SB bytes if it is false that the two words on top of the stack are not equal. In layman's terms this means that the jump is taken if the words are identical.


    Here is a more sophisticated form of jump:

    XJP <B>
    case JumP (what does the X stand for in "case"?). B is the offset of a jump table in the constant pool. The integer on top of the stack should be an index within that table. The corresponding word in the table is used as a jump offset, relative to the next instruction. A jump table begins with two words that indicate the minimum and the maximum index; if the integer on top of the stack is not in this range, no jump is taken.

    Example of a jump table created by the Pascal instruction:

    case I of       >0002  Minimum index (table begins at 2)
      2: xxx ;      >0005  Maximum index (table ends at 5)
      5: xxx ;      >FFFA  Jump offset for I=2
      3: xxx ;      >FFFE  Jump offset for I=3
    end;            >0000  No jump for 4 (as it does not exist)
                    >FFFC  Jump offset for I=5
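
    The lookup can be sketched in a few lines of Python. The table below is the one from the example, held as a list of raw 16-bit words; the function name and the way IPC is passed around are my own assumptions.

```python
def signed_word(value: int) -> int:
    """Interpret a raw word (0..65535) as a signed number."""
    return value - 0x10000 if value & 0x8000 else value


def xjp(table: list, index: int, next_ipc: int) -> int:
    """Return the new IPC for a case jump through the given table."""
    low, high = signed_word(table[0]), signed_word(table[1])
    if not low <= index <= high:
        return next_ipc                    # out of range: no jump is taken
    offset = signed_word(table[2 + index - low])
    return next_ipc + offset               # relative to the next instruction


# Jump table of the "case I of" example above
TABLE = [0x0002, 0x0005, 0xFFFA, 0xFFFE, 0x0000, 0xFFFC]
```

    With the next instruction at address 100, I=2 lands at 94 (offset -6), I=3 at 98, I=5 at 96, while I=4 and any value outside 2..5 simply continue at 100.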


    Two more p-codes that can fall in the category of flow controlling instructions:

    BPT
    BreakPoinT. A breakpoint execution error (number 16) is issued unconditionally. This will result in entering the debugger, if it is running. NB The debugger is not implemented on the TI-99/4A (rats!).

    CHK
    CHecK subrange bounds. The stack contains three integers, from top to bottom: the upper limit, the lower limit and the test value. If the test value does not fall within the specified range (as judged by signed comparisons) a value range execution error is issued. After the instruction, the two limits are removed but the test value remains on the stack.

    Examples:
    Upper limit (top of stack)    44     32     44
    Lower limit                   12    -21     12
    Test value                    18      0   1233
    Result                        OK     OK  Error
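
    The same check, as a Python sketch. The stack is modelled as a list whose last item is the top of the stack, and the exception class is a stand-in for the "value range" execution error.

```python
class ValueRangeError(Exception):
    """Stand-in for the p-system "value range" execution error."""


def chk(stack: list) -> None:
    """Pop both limits, leave the test value, raise if it is out of range."""
    upper = stack.pop()        # top of the stack
    lower = stack.pop()        # second word
    value = stack[-1]          # the test value stays on the stack
    if not lower <= value <= upper:
        raise ValueRangeError(f"{value} not in {lower}..{upper}")
```

    Starting with stack = [18, 12, 44], chk(stack) leaves [18]. Starting with [1233, 12, 44], it raises the error.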



    Arrays and strings

    Arrays can contain elements of any size, but all elements in the array must have the same size. If the size is not a multiple of a byte (e.g. if it is 10 bits, or 3 bits) elements can be squeezed together, regardless of byte boundaries. This is called a packed array, just like in Pascal.

    A string is an array of characters terminated with a nul byte (>00).

    ASTR <UB1>,<UB2>

    Assign STRing. Copies the string whose address is at the top of the stack, into another string whose address is just below on the stack. UB2 is the size of the string. UB1 determines whether the source string is a variable (UB1=0) or a constant (in which case the word at top of stack is an offset in the constant pool). If the source string happens to be greater than UB2 bytes, a "string overflow" execution error occurs.

    CAP <B>

    Copy Array Parameters. Pretty much like CSP, but the parameter copied is a packed array of characters (i.e. is not necessarily nul-terminated). Thus, there can be no check for size, and <B> characters are always copied to the buffer.

    CSP <UB>

    Copy String Parameter. This instruction is useful to copy a function parameter into a local variable:

    procedure MyFunction(S1: string);
    var S2: string;
    begin
    end;

    For obvious reasons, string parameters are passed by pointers. The word on top of the stack is the address of a two-word parameter descriptor. If the bottom word of the descriptor is NIL, the second word is the address of the string parameter (S1 in the above example). Alternatively, the bottom word can be a pointer to an E_Rec (environment record), in which case the second word is the offset of the string in the segment corresponding to this E_Rec. If this segment is not resident, a segment fault occurs.

    The second word from the top of the stack is the address of a local string buffer (S2 in the example) into which the string parameter will be copied. UB indicates the size of this buffer. If the parameter string happens to be larger, a "string overflow" execution error occurs.

    CSTR

    Check STRing index. The integer on top of the stack is an index inside a string whose address is in the word below it. If this index is less than 1, or greater than the dynamic length of the string, a "value range" execution error occurs.

    IXA <B>

    IndeX Array. The integer on top of the stack is an index inside an array whose address is in the word below it. B is the size in words of an array element. These two words are removed from the stack and replaced with the address of the corresponding element. The algorithm is: Result = Address + (Index * <B> * atype), where atype is 1 for word-addressed computers and 2 for byte-addressed computers.

    IXP <UB1>,<UB2>

    IndeX Packed array. The integer on top of the stack is an index inside a packed array (i.e. an array of bit-fields) whose address is in the word below it. UB1 is the number of elements per word, UB2 is the size in bits of an element. These two words are removed from the stack and replaced with the corresponding element.
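
    Both indexing schemes boil down to a little arithmetic, sketched here in Python. The IXA formula is the one given above. For IXP, I assume that elements are packed starting from the rightmost bits of each word, which is not stated in the description; the function names and the list-of-words model of memory are also mine.

```python
def ixa(address: int, index: int, elem_words: int, atype: int = 2) -> int:
    """Address of element `index`; atype is 2 on byte-addressed machines."""
    return address + index * elem_words * atype


def ixp(words: list, index: int, per_word: int, bits: int) -> int:
    """Fetch element `index` of a packed array held as 16-bit words."""
    word = words[index // per_word]          # which word holds the element
    shift = (index % per_word) * bits        # assumption: packed from the right
    return (word >> shift) & ((1 << bits) - 1)
```

    For example ixa(0xA000, 3, 2) gives >A00C on a byte-addressed machine, and with 2-bit elements packed 8 per word, ixp([0x00E4], 2, 8, 2) extracts the value 2 from the word >00E4 (binary 11 10 01 00).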


    Sets

    A set consists of 0 to 255 words of bit flags, preceded by a word that indicates the total number of words in the set. Bits are numbered from 0, starting with the rightmost bit of the first word. A bit set to "1" indicates that the corresponding number is present in the set, a "0" that it is not a member of the set.

    Example:

    The set [0,3,5,7] is represented as:

    00000000 00000001  00000000 10101001
    Size in words (=1) Bits: 7 5 3 0
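
    This encoding is easy to reproduce. The Python sketch below builds the word list (length word first) for a given collection of members and decodes it back; the helper names are mine.

```python
def make_set(members) -> list:
    """Encode members as [length word, word 0, word 1, ...]."""
    members = list(members)
    if not members:
        return [0]                              # empty set: length word only
    nwords = max(members) // 16 + 1
    words = [0] * nwords
    for m in members:
        words[m // 16] |= 1 << (m % 16)         # bit 0 is the rightmost bit
    return [nwords] + words


def members(pset: list) -> list:
    """Decode a set built by make_set back into a sorted list of members."""
    nwords, words = pset[0], pset[1:]
    return [16 * i + bit
            for i in range(nwords)
            for bit in range(16)
            if words[i] >> bit & 1]
```

    make_set([0, 3, 5, 7]) returns [1, 0x00A9], i.e. the two words shown in the example above.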


    ADJ <UB>

    The set on top of the stack is stripped of its length word, then expanded or compressed until it is UB words in length. Compression removes the high words of the set (even if they do not contain zero!), expansion adds zero-words on top of the stack.

    If, after the operation, less than 20 words remain free for the stack to grow, a stack fault is issued.

    SRS

    SubRangeSet. Two integers in the range 0 to 4097 should be on top of the stack (if not, a "value range" execution error occurs). If the top word is the smaller of the two, the empty set is placed on the stack. Otherwise, a set is placed on the stack in which all bits between these two integers (inclusive) are set to "1".

    If less than 20 words remain available on the stack after execution, a stack fault is issued.

    INN

    INcluded iN. Checks for set membership. The word on top of the stack is the index of a bit in the set below it. If this bit is "1", the stack will contain "true" after the operation, otherwise it will contain "false". Note that bit numbering starts from zero, with the rightmost bit of a word.

    DIF

    DIFference. Removes the two sets on top of the stack and replaces them with a set corresponding to their bitwise difference. The difference is computed as: Set1 AND NOT Set2 (0 DIF 0 = 0, 0 DIF 1 = 0, 1 DIF 0 = 1, 1 DIF 1 = 0).

    INT

    INTersection. The intersection of the two sets on top of the stack is computed as Set1 AND Set2 and the resulting set replaces these two on the stack.

    UNI

    UNIon. The two sets on top of the stack are replaced with their bitwise union. The union is computed as: Set1 OR Set2.
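
    Since these are plain bitwise operations, they are easy to sketch in Python. Here sets are lists of 16-bit words without their length word, and I assume that the shorter set is padded with zero words before combining (the text does not say how sets of different sizes are handled). Which of the two stack entries plays the role of Set1 is not stated either, so the functions simply take Set1 and Set2 as arguments.

```python
def _pad(a: list, b: list):
    """Assumption: the shorter set is extended with zero words."""
    n = max(len(a), len(b))
    return a + [0] * (n - len(a)), b + [0] * (n - len(b))


def dif(set1: list, set2: list) -> list:
    a, b = _pad(set1, set2)
    return [x & ~y & 0xFFFF for x, y in zip(a, b)]   # Set1 AND NOT Set2


def int_(set1: list, set2: list) -> list:
    a, b = _pad(set1, set2)
    return [x & y for x, y in zip(a, b)]             # Set1 AND Set2


def uni(set1: list, set2: list) -> list:
    a, b = _pad(set1, set2)
    return [x | y for x, y in zip(a, b)]             # Set1 OR Set2


def inn(index: int, words: list) -> bool:
    """Set membership: bit 0 is the rightmost bit of the first word."""
    word, bit = divmod(index, 16)
    return 0 <= word < len(words) and bool(words[word] >> bit & 1)
```

    With the set [0,3,5,7] (>00A9) and the set [0,3] (>0009), dif gives >00A0, int_ gives >0009, and inn(7, [0x00A9]) is true.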


    Multitasking

    WAIT

    Waits for a semaphore. The word on top of the stack is the address of this semaphore. If the semaphore value is greater than zero it is decremented and execution continues. If it is zero, the current task must wait for the semaphore to become available. To this end, the current TIB is placed in the semaphore's queue (and Hang_Ptr in the TIB will point to the semaphore). Then the p-system switches to another task.

    SIGNAL

    Signals a semaphore. The word on top of the stack is the address of this semaphore. If the semaphore is less than zero (or if its queue is empty) it is incremented and execution continues. Otherwise, the first task waiting for this semaphore is removed from the semaphore queue and placed in the queue of tasks ready to run (and the Hang_Ptr in its TIB is set to NIL). If this task has a higher priority than the current one, the p-system switches to it.
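
    The bookkeeping done by these two p-codes can be sketched in Python as follows. This is only a model: the Task class stands for a TIB, the ready queue is a plain list, and the actual task switch (including the priority comparison made by SIGNAL) is left out. The sketch only uses the "queue is empty" test to decide whether SIGNAL increments the semaphore.

```python
from collections import deque


class Task:
    """Minimal stand-in for a task information block (TIB)."""
    def __init__(self, name: str, priority: int = 0):
        self.name = name
        self.priority = priority
        self.hang_ptr = None          # semaphore this task is waiting on


class Semaphore:
    def __init__(self, count: int = 0):
        self.count = count
        self.queue = deque()          # tasks waiting for this semaphore


def wait(sem: Semaphore, task: Task) -> bool:
    """Return True if the task may continue, False if it is now blocked."""
    if sem.count > 0:
        sem.count -= 1
        return True
    sem.queue.append(task)
    task.hang_ptr = sem
    return False                      # the p-system would switch tasks here


def signal(sem: Semaphore, ready_queue: list):
    """Return the task that was woken up, or None if nobody was waiting."""
    if not sem.queue:
        sem.count += 1
        return None
    task = sem.queue.popleft()
    task.hang_ptr = None
    ready_queue.append(task)          # a higher priority would trigger a switch
    return task
```

    A semaphore created with a count of 1 lets a first task through, blocks the second one, and a later signal moves that second task to the ready queue.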


    Other p-codes

    These do not fit in any of the above categories.

    LPR

    Load Processor Register. The word on top of the stack is a register number. It will be replaced on the stack by the content of this register. The register numbers are:

    -3: READYQ
    -2: EVEC
    -1: CURTASK
    0: Wait-Q
    1: Prior | Flags
    2: SP_Low
    3: SP_Up
    4: SP
    5: MP
    6: (reserved)
    7: IPC
    8: Env
    9: Proc_Num | TIB_IOresult
    10: Hang_Ptr
    11: M_Depend

    SPR

    Store Processor Register. The word on top of the stack is stored in the register whose number is in the word below it. See LPR for the register numbers. If the number is 0 or greater, all registers are first saved in the TIB, then the new value is placed in the TIB, and finally all registers are restored from the TIB.

    NAT

    Enter NATive code. Starts executing assembly language right after this p-code. It may be necessary to increment IPC to a word boundary on word-oriented processors (such as the TMS9900).

    NAT-INFO <B>

    Same as NAT, but IPC will be incremented by B before entering assembly language (i.e. it acts as a jump). If B=0, execution begins with the byte right after B in the code stream. This allows information for the native-code generators to be placed upstream of a stretch of assembly language.

    NOP

    No OPeration. Execution continues.

    RESERVEn

    Where n is a number from 1 to 6. These p-codes are reserved for internal use by compilers. Trying to execute them generates an "unimplemented" execution error.



    Standard procedures

    A number of standard procedures have been included within the PME, either for speed, or because they contain some assembly language. These procedures should be called with CXG or SCXG1. Most of them expect parameters on the stack, and some return a result on the stack. In any case, the initial parameters will be removed from the stack by the procedure itself. The standard procedures differ from user-written procedures in that they do not return with RPU. In a sense, they can be viewed as extra p-codes...

    For consistency I listed them in their Pascal form, but you can easily convert the declarations into a stack description. For instance, with the following procedure

    DUMMY(UNIT: integer, PTR:addr): Boolean

    the stack would look like this

     ______  <
    | addr |
    | int  |    ______  <
    | 0    |   | bool |
    |//////|   |//////|
     Before     After

    To call a procedure from assembly or p-code (when using a p-code assembler), you should push zero on the stack if the function is returning a value, then push the parameters in the order they appear in the Pascal declaration, and finally call the procedure. When it returns, make sure you pop the return value from the stack, if there is one.
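
    As an illustration of this calling convention, here is a Python model in which the stack is a list (top of the stack at the end). DUMMY is the made-up procedure from the example above and its behaviour is arbitrary; call_standard is a hypothetical helper, not a p-system routine.

```python
def call_standard(stack: list, proc, params, returns_value: bool):
    """Push a result slot and the parameters, call proc, pop the result."""
    if returns_value:
        stack.append(0)            # placeholder that will receive the result
    for p in params:               # same order as in the Pascal declaration
        stack.append(p)
    proc(stack)                    # the procedure removes its own parameters
    return stack.pop() if returns_value else None


def dummy(stack: list) -> None:
    """DUMMY(UNIT: integer, PTR: addr): Boolean, with an arbitrary body."""
    ptr = stack.pop()              # last parameter is on top of the stack
    unit = stack.pop()
    stack[-1] = 1 if unit > 0 and ptr != 0 else 0   # overwrite the zero
```

    After result = call_standard(stack, dummy, [4, 0xA000], True), the stack is back to what it was before the call and result holds the Boolean.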

    Number  Name          Parameters          Returns
    4       RELOCSEG      EREC: E_Rec         -
    14      MOVESEG       SIB: addr           -
                          POOL: addr
                          OFFSET: integer
    15      MOVELEFT      SOURCE: byte_ptr    -
                          DEST: byte_ptr
                          LEN: integer
    16      MOVERIGHT     SOURCE: byte_ptr    -
                          DEST: byte_ptr
                          LEN: integer
    18      UNITREAD      UNIT: integer       -
                          BUFFER: byte_ptr
                          LEN: integer
                          BLOCK: integer
                          CTRL: integer
    19      UNITWRITE     UNIT: integer       -
                          BUFFER: byte_ptr
                          LEN: integer
                          BLOCK: integer
                          CTRL: integer
    20      TIME          HIWORD: addr        -
                          LOWORD: addr
    21      FILLCHAR      DEST: byte_ptr      -
                          LEN: integer
                          CHAR: integer
    22      SCAN          LEN: integer        integer
                          EXP: Boolean
                          CHAR: byte
                          SOURCE: byte_ptr
                          MASK: word
    23      IOCHECK       -                   -
    24      GETPOOLBYTES  DEST: addr          -
                          POOL: addr
                          OFFSET: integer
                          NBYTES: integer
    25      PUTPOOLBYTES  SOURCE: addr        -
                          POOL: addr
                          OFFSET: integer
                          NBYTES: integer
    26      FLIPSEGBYTES  EREC: addr          -
                          OFFSET: integer
                          NWORDS: integer
    27      QUIET         -                   -
    28      ENABLE        -                   -
    29      ATTACH        SEM: addr           -
                          VECTOR: integer
    30      IORESULT      -                   integer
    31      UNITBUSY      UNIT: integer       Boolean
    32      POWEROFTEN    POWER: integer      real
    33      UNITWAIT      UNIT: integer       -
    34      UNITCLEAR     UNIT: integer       -
    36      UNITSTATUS    UNIT: integer       -
                          STAT_REC: addr
                          CTRL: integer
    37      IDSEARCH      SYMREC: addr        integer
                          BUFFER: addr
    38      TREESEARCH    ROOT: addr          integer
                          FOUNDP: addr
                          TARGET: addr
    39      READSEG       EREC: addr          integer


    Unit-related procedures

    All these procedures place status information in IORESULT, upon return. See below for details.

    UNITCLEAR(UNIT: integer)

    UNIT is the unit number. The procedure initializes the unit and returns its status into the p-machine register IORESULT.


    UNITSTATUS(UNIT: integer; STAT_REC:addr; CTRL:integer)

    UNIT is the unit number.
    STAT_REC is the address of a status record.
    If CTRL is 0 status is written, if CTRL is 1 status is read.


    UNITREAD(UNIT:integer; BUFFER:byte_ptr; LEN:integer; BLOCK:integer; CTRL:integer)

    This procedure reads a number of bytes from a unit into a buffer.

    UNIT is the unit number
    BUFFER is a pointer to the destination buffer
    LEN is the number of bytes to read
    BLOCK is the number of the desired data block within the unit
    CTRL selects the input mode (see BIOS for details)


    UNITWRITE(UNIT:integer; BUFFER:byte_ptr; LEN:integer; BLOCK:integer; CTRL:integer)

    This procedure writes a number of bytes from a buffer into a unit.

    UNIT is the unit number
    BUFFER is a pointer to the source buffer
    LEN is the number of bytes to write
    BLOCK is the number of the target data block within the unit
    CTRL selects the output mode (see BIOS for details)


    UNITWAIT(UNIT: integer)

    This procedure waits until all I/O on this unit is completed.


    UNITBUSY(UNIT:integer) :boolean

    This procedure returns TRUE if there are any pending I/O operations in the unit, and FALSE otherwise.


    IORESULT(): integer

    This procedure returns the value of the IORESULT register.


    IOCHECK()

    This procedure verifies that the IORESULT register contains zero. If not, an I/O execution error is issued.


    String-related procedures

    MOVELEFT(SOURCE:byte_ptr; DEST:byte_ptr; LEN:integer)

    This procedure moves a number of bytes from the source string to the destination string, starting from the left (low-order) byte. If LEN is 0 or less, nothing happens.


    MOVERIGHT(SOURCE:byte_ptr; DEST:byte_ptr; LEN:integer)

    This procedure moves a number of bytes from the source string to the destination string, starting from the right (high-order) byte. If LEN is 0 or less, nothing happens.
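
    The direction of the copy only matters when the source and destination overlap. The Python sketch below models memory as a bytearray and the pointers as indexes into it (an assumption made for the sake of the example).

```python
def moveleft(mem: bytearray, source: int, dest: int, length: int) -> None:
    """Copy `length` bytes, starting with the leftmost (lowest) byte."""
    for i in range(length):                   # empty when length <= 0
        mem[dest + i] = mem[source + i]


def moveright(mem: bytearray, source: int, dest: int, length: int) -> None:
    """Copy `length` bytes, starting with the rightmost (highest) byte."""
    for i in range(length - 1, -1, -1):       # empty when length <= 0
        mem[dest + i] = mem[source + i]
```

    Shifting the first four bytes of "ABCDE" one position to the right with moveleft(mem, 0, 1, 4) smears the first byte over the whole buffer ("AAAAA"), whereas moveright(mem, 0, 1, 4) gives the expected "AABCD".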


    FILLCHAR(DEST:byte_ptr; LEN:integer; CHAR:integer)

    The destination string will be filled with LEN copies of character CHAR. If LEN is 0 or less, nothing happens.


    SCAN(LEN:integer; EXP:Boolean; CHAR:byte; SOURCE:byte_ptr; MASK:word): integer

    This procedure looks for a character in a string.

    If EXP is 0, the procedure scans the string SOURCE for LEN bytes, until it finds the byte CHAR.
    If EXP is 1, the procedure scans SOURCE for a byte different from CHAR.
    LEN is the maximum number of bytes to scan. If it is negative, scanning proceeds from right to left.
    MASK is not used.

    The function returns the offset of CHAR within SOURCE (or LEN if no match was found).
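
    Here is how I picture SCAN, as a Python sketch. SOURCE is modelled as a bytes object plus a starting index, and I assume that a backward scan (negative LEN) reports a negative offset, which the description above does not confirm. MASK is left out since it is not used.

```python
def scan(length: int, exp: int, char: int, source: bytes, start: int = 0) -> int:
    """Return the offset where the scan stopped, or `length` if it never did."""
    step = 1 if length >= 0 else -1
    for offset in range(0, length, step):
        is_char = source[start + offset] == char
        # EXP = 0: stop on a byte equal to CHAR
        # EXP = 1: stop on a byte different from CHAR
        if is_char != bool(exp):
            return offset
    return length
```

    scan(5, 0, ord("C"), b"ABCDE") returns 2, and scan(5, 1, ord("A"), b"AAABC") returns 3, the offset of the first byte that is not an "A". Scanning backwards from the last byte, scan(-5, 0, ord("B"), b"ABCDE", start=4) returns -3 under the assumption stated above.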


    Code pool management procedures

    RELOCSEG(EREC:E_Rec)

    Relocates the segment pointed at by the environment record EREC. All types of relocations are performed.


    MOVESEG(SIB:addr; POOL:addr; OFFSET:integer)

    SIB is a pointer to a segment information block that should contain the address where the segment should be moved.
    OFFSET is the offset within the code pool of the segment to be moved.
    POOL points to a code pool descriptor, the first two words of which are a pointer to the base of the code pool.

    The segment found at the specified offset within the code pool will be moved to the location specified in the SIB and relocated. Only segment-relative relocation is performed (as other types are not affected by the move).

    GETPOOLBYTES(DEST:addr; POOL:addr; OFFSET:integer; NBYTES:integer)

    This procedure copies a number of bytes from the code pool into a buffer.

    DEST is the address of the destination buffer.
    POOL is the address of a code pool descriptor, the first two words of which point to the base of the pool.
    OFFSET is the offset of the first byte to transfer, relative to the base of the pool.
    NBYTES is the number of bytes to transfer.

    PUTPOOLBYTES(SOURCE:addr; POOL:addr; OFFSET:integer; NBYTES:integer)

    This procedure copies NBYTES bytes from buffer SOURCE into a code pool described in POOL, starting at offset OFFSET. It's the opposite of the above.

    FLIPSEGBYTES(EREC:addr; OFFSET:integer; NWORDS:integer)

    This procedure flips the bytes within a number of words of a segment.

    EREC is the address of an environment record describing the segment.
    OFFSET is the word offset where byte flipping should start.
    NWORDS is the number of words whose bytes should be flipped.

    READSEG(EREC:addr): integer

    This procedure reads a segment into memory.

    EREC is the address of an environment record describing the segment. Inside the E_Rec is a pointer to the SIB (segment information block) that contains the address where the segment should be read.

    The procedure returns the contents of the IORESULT register.


    Concurrency procedures

    ATTACH(SEM:addr; VECTOR:integer)

    This procedure associates a semaphore with a p-machine event. The semaphore will be signaled each time the desired event occurs.

    SEM is the address of a semaphore. If it is NIL, the event will be detached from any existing semaphore.
    VECTOR is the number of a p-machine event vector. It should be in the range 0 to 63, otherwise nothing happens. I have no idea what the events are...

    QUIET()

    This procedure disables all p-machine events such that no attached semaphore is signalled until ENABLE is called.

    ENABLE()

    This procedure re-enables the p-machine events that were disabled by QUIET.


    Miscellaneous procedures

    TIME(HIWORD:addr; LOWORD:addr)

    Saves the value of the p-system clock into the indicated words. This clock is a 32-bit number, incremented 60 times per second.

    POWEROFTEN(POWER:integer): real

    Returns a real number equal to 10 to the power of POWER. If POWER is negative or too big, a floating point execution error is issued.


    Compiler usage procedures

    These two procedures are used by the Pascal compiler. They have nothing to do in the PME, and were only placed here to speed up compilation.

    TREESEARCH(ROOT:addr; FOUNDP:addr; TARGET:addr): integer

    This procedure searches the symbol table tree for a target string. Each tree node is expected to have the following structure:

     RECORD
    Name: PACKED ARRAY[0..7] of CHAR;
    Right_link: pointer;
    Left_link: pointer;
    END

    ROOT is the address of the (sub)tree in the symbol table.
    TARGET is the address of the string to look for, a packed array of 8 characters.
    FOUNDP is the address where to put the result of the search, i.e. where the string was found.

    If the target string was found, the function returns 0 and places a pointer to the leaf node that contains it into FOUNDP.
    If the target was not found, FOUNDP will point to the last node that was scanned and the function returns 1 if the target should be placed to the right of this node, -1 if it should go to the left.
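
    The search can be sketched in Python as follows. Nodes are small objects instead of Pascal records, the result that would go into FOUNDP is returned as the second member of a tuple, and I assume that names are ordered by plain character comparison with the larger names to the right. The root is assumed not to be NIL.

```python
class Node:
    """Symbol table node: a name and two links."""
    def __init__(self, name: str, left=None, right=None):
        self.name = name
        self.left = left
        self.right = right


def treesearch(root: Node, target: str):
    """Return (result, node), node being what FOUNDP would point to."""
    node = root
    while True:
        if target == node.name:
            return 0, node                    # found
        go_right = target > node.name         # assumption: byte-wise ordering
        child = node.right if go_right else node.left
        if child is None:                     # dead end: last node scanned
            return (1 if go_right else -1), node
        node = child
```

    With a root "M" that has "C" on its left and "T" on its right, looking for "T" returns 0 and the "T" node, whereas looking for "A" returns -1 and the "C" node, meaning that "A" should be attached to the left of "C".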

    IDSEARCH(SYMREC:addr; BUFFER:addr)

    This procedure scans a line of text for a Pascal keyword.

    BUFFER is the address of the buffer to scan
    SYMREC is the address of a record with the following structure:

     RECORD
    SYMCURSOR: integer;
    SY: integer;
    OP: integer;
    ID: PACKED ARRAY[0..7] OF CHAR;
    END

    The procedure scans BUFFER, starting at position SYMCURSOR, looking for an identifier. An identifier is a string containing only letters, digits and underscores, that begins with a letter. Upon return the first 8 characters of the identifier will be placed in ID, using padding spaces if needed, and SYMCURSOR will be updated to point after the identifier. Finally, the function looks up a table of reserved words and places the corresponding values in SY and OP.

    Here are the reserved words recognised by the procedure, and the return values. The table is laid out in two columns for convenience.

    ID         SY  OP      ID         SY  OP
    AND        39   2      NOT        38  15
    ARRAY      44  15      OF         11  15
    BEGIN      19  15      OR         40   7
    CASE       21  15      PACKED     43  15
    CONST      28  15      PROCEDUR   31  15
    DIV        39   3      PROCESS    56  15
    DO          6  15      PROGRAM    33  15
    DOWNTO      8  15      REPEAT     22  15
    ELSE        3  15      RECORD     45  15
    END         9  15      SET        42  15
    EXTERNAL   53  15      SEGMENT    33  15
    FILE       46  15      SEPARATE   54  15
    FOR         2  15      THEN       12  15
    FORWARD    34  15      TO          7  15
    FUNCTION   32  15      TYPE       29  15
    GOTO       26  15      UNIT       50  15
    IF         20  15      UNTIL      10  15
    IMPLEMENT  52  15      USES       49  15
    IN         41  14      VAR        30  15
    INTERFAC   51  15      WHILE      23  15
    LABEL      27  15      WITH       25  15
    MOD        39   4      unknown     0  15



    Faults and execution errors

    Two types of problems can occur while executing p-codes: faults and execution errors. A fault is a benign condition that requires the assistance of the p-system to be fixed. After this is done, execution can continue. By contrast, an execution error is generally fatal and will result in terminating the program that caused it and resetting the p-system.

    Faults

    The PME only generates two types of faults: segment faults and stack faults.

    Segment faults occur when a p-code tries to access a segment that is not (yet) in memory. These p-codes are: CAP, CSP, CXL, SCXGn, CXG, CXI, CFP, RPU, SIGNAL and WAIT. The p-machine signals the semaphore Real_Sem to alert the p-system and the fault condition is handled by the high priority task Fault_Handler, which is part of the KERNEL unit. Typically, this will result in loading the required segment into memory.

    Stack faults occur when there is not enough room on the stack. Only p-codes that place multiple words on the stack check it for room (except for real numbers). These p-codes are: ADJ, CFP, CGP, SCIPn, CIP, CLP, CXL, SCXGn, CXG, LDC, and LDM. Here, Fault_Handler will try to compact the memory so as to free some more room for the stack.

    Before signalling Real_Sem, the PME saves some values into the SYSCOM area for Fault_Handler to use. These are:

    Offset  Name         Usage
    14      Real_Sem     Semaphore that Fault_Handler is waiting for
    18      Fault_TIB    Task info block of the faulting task
    20      Fault_EREC   E_Rec of the segment to be loaded (current E_Rec for stack faults)
    22      Fault_Words  Number of words needed on stack (0 with segment faults)
    24      Fault_Type   >80 = PME segment fault, >81 = PME stack fault

    Once the problem has been taken care of, control is returned to the p-machine emulator and the p-code that caused the fault is executed again.


    Execution errors

    When an execution error occurs, the PME calls the procedure Exec_Error, which is also part of the Kernel. This call is performed with CXG 1,2 after placing two zeros on the stack, followed by the error number.

    Normally, Exec_Error will not return but rather reset the p-system, although it is possible to cause it to return (for instance from a debugger program).

    The possible execution errors are:

    Number  Name                  Culprit                           Description
    1       Value range error     CHK, CSTR                         Index out of array bounds.
    2       No proc in seg table  CLP, CFP, CGP, SCIPn, CIP,        Call to a proc whose dictionary entry
                                  CXL, SCXGn, CXG, CXI              is 0 (probably because it wasn't linked).
    5       Integer overflow      Long integer routines             Long integer cannot be converted to int,
                                                                    or too large to be popped from stack.
    6       Divide by Zero        DVI, MODI, DVR, long integer      Division by 0.
    8       Interrupted by user   You!                              <Break> key pressed.
    10      I/O error             IOCHECK                           IORESULT was not 0.
    11      Unimplemented         Any unimplemented p-code          Illegal or unknown p-code.
    12      Floating point error  LDCRL, LDRL, STRL, FLT, TNC,      Floating point overflow (i.e. result
                                  RND, ASBR, NGR, ADR, SBR,         cannot be expressed as a float).
                                  MPR, DVR, EQREAL, LEREAL,
                                  GEREAL, POWEROFTEN
    13      String overflow       CSP, ASTR, long integer routines  Destination too small to hold the string.
    16      Break point           BPT                               This will enter the Debugger, if any.
    18      Set too large         SRS                               Set larger than maximum legal size.
    19      Segment too large     READSEG                           Segment too large to be read
                                                                    (this should never happen).



    Data representations

    The p-system was designed to run on any computer system, provided the proper PME is written. However, different computers handle numbers in a different way. Everybody agrees on the definition of a byte, but things become more complicated when it comes to larger numbers.

    Words

    Almost everybody agrees that a word is a 16-bit value, i.e. is made of two bytes. However, there are two conventions on how to order the bytes within a word. For instance, suppose you want to place the number >1234 in memory, at address >A000-A001. There are two ways to do it:

    +-----+               +-----+
    | >34 |               | >12 |
    +-----+ >A001         +-----+ >A001
    | >12 |               | >34 |
    +-----+ >A000         +-----+ >A000
    Big-endian            Little-endian
    TI, Motorola          Intel, PCs

    With the first convention, the bytes are stored in "natural" order, i.e. the most significant byte comes first, when moving ahead in memory. With the second, the most significant byte goes into the highest address. TI uses the first convention, so does Motorola (and consequently Apple computers), whereas Intel uses the second and so do all PC clones.

    As p-code is to be ported to any machine, a considerable effort was put into it so that it is independent of this so-called "byte sex". Since all p-codes are bytes, that part was easy. But trouble began with constants and addresses, which are word-oriented. The following solution was adopted: 1) The p-system always uses the byte sex of the machine it runs on. 2) Each segment in a code file contains a flag indicating the byte sex used by the compiler that created it. If it does not match the current byte sex, the PME will flip bytes in the relevant portions of this segment.
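
    Flipping bytes is trivial, as this Python sketch shows. It swaps the two bytes of each word in a stretch of a segment held in a bytearray, in the spirit of the FLIPSEGBYTES standard procedure; the function name and the parameter layout are mine.

```python
def flip_words(data: bytearray, offset_words: int, nwords: int) -> None:
    """Swap the two bytes of `nwords` words, starting at word `offset_words`."""
    for w in range(offset_words, offset_words + nwords):
        lo, hi = 2 * w, 2 * w + 1
        data[lo], data[hi] = data[hi], data[lo]
```

    The word >1234 stored big-endian (>12 >34) becomes >34 >12 after flipping, which a little-endian machine reads back as >1234. Words outside the requested range are left alone, which is why only the "relevant portions" of a segment (the words, not the p-code bytes) should be flipped.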


    Long integers

    Again, the definition of a long integer can vary from compiler to compiler. Therefore, in the PME, integers can have any size from 2 to 10 words (that's 160 bits, about 36 decimal digits!). When it's on the stack, an integer is always preceded by a length word, i.e. a number from 2 to 10, indicating the number of words that follow.

    +-----+
    |>0003| Size word
    +-----+
    |>1234|
    +-----+
    |>6635| 3-word Long
    +-----+
    |>8701|
    +-----+
    |/////|

    However, when a long integer is assigned to a variable, or stored on disk (e.g. as a constant embedded in a program), the length word is stripped and the long integer is forced into the default "long" size on the current processor. If this conversion is not possible, an "integer overflow" execution error is issued.

    To avoid that kind of problem as much as possible, the routine CVT should be used to incorporate long constants into a program. It will convert an integer (or any combination thereof) into a long integer constant of the size required by the target processor. Therefore the code file will not contain any long integer, which makes it fully portable. Long integers will be created on the fly, when the program is run on the target machine.

    Examples:

    To code      Use
    12           CVT(12)
    225543       CVT(22554)*CVT(10) + CVT(3)
    -8733442     -(CVT(8733)*CVT(1000) + CVT(442))

    As you can see, all the parameters of CVT are integers (-32768 to +32767) and can be coded as words. CVT does the dirty work in converting them to a long integer of the appropriate size.
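
    Such expressions can be generated mechanically. The Python sketch below splits a constant into chunks of four decimal digits, so that every CVT parameter (including the multiplier 10000) fits in a word. This is just one possible decomposition, not necessarily the one a compiler would choose, and the function name is mine.

```python
def cvt_expression(value: int) -> str:
    """Build a CVT expression whose operands all fit in -32768..+32767."""
    sign = "-" if value < 0 else ""
    n = abs(value)
    chunks = []
    while True:
        n, chunk = divmod(n, 10000)     # peel off four decimal digits
        chunks.append(chunk)
        if n == 0:
            break
    expr = f"CVT({chunks.pop()})"       # most significant chunk first
    while chunks:
        expr = f"({expr}*CVT(10000) + CVT({chunks.pop()}))"
    return sign + expr
```

    For instance, cvt_expression(225543) produces (CVT(22)*CVT(10000) + CVT(5543)), and a small constant such as 12 simply gives CVT(12).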

    The CVT routine actually calls a package of routines that perform long integer math. These are part of a Pascal unit called LONGOPS, that contains three procedures:

    FREADDEC reads a long integer
    FWRITEDEC writes a long integer
    DECOPS performs several math operations.

    A peculiarity of DECOPS is that it takes a variable number of parameters, depending on the operation to be performed. This is not correct Pascal, but it's sure more convenient this way! This is possible because DECOPS is actually an assembly routine, embedded into the Pascal unit. When calling DECOPS, the required parameters are pushed on the stack, followed by OP, the number of the operation to be executed (and by the return address, obviously).

    Valid DECOPS operations are:

    OP  Name            Parameters    Return        Description
    0   Adjust          SIZE: word    fixedlong     Strips the size word of LINT and converts it into SIZE words.
                        LINT: long                  If not possible: causes integer overflow.
    2   Add             LINT1: long   long          Returns LINT1 + LINT2.
                        LINT2: long                 Can cause overflow if the result is more than 10 words.
    4   Subtract        LINT1: long   long          Returns LINT1 - LINT2.
                        LINT2: long                 Can cause overflow if the result is more than 10 words.
    6   Negate          LINT: long    long          Returns -LINT
    8   Multiply        LINT1: long   long          Returns LINT1 * LINT2.
                        LINT2: long                 Can cause overflow if the result is more than 10 words.
    10  Divide          LINT1: long   long          Returns LINT1 / LINT2.
                        LINT2: long                 Can cause overflow or divide-by-zero errors.
    12  Long to String  SIZE: word    -             Converts LINT into a string and places it at address ADDR.
                        ADDR: word                  SIZE is the maximum number of characters in the string; if it is
                        LINT: long                  exceeded, a string-overflow error occurs.
    14  TOS-1 to Long   TOS: long     TOS: long     Converts integer INT into a long integer, leaving another long
                        INT: word     RESULT: long  intact on the top of the stack.
    16  Compare         TYPE: word    boolean       If TYPE = 0, returns LINT1 < LINT2
                        LINT1: long                 If TYPE = 1, returns LINT1 <= LINT2
                        LINT2: long                 If TYPE = 2, returns LINT1 >= LINT2
                                                    If TYPE = 3, returns LINT1 > LINT2
                                                    If TYPE = 4, returns LINT1 <> LINT2
                                                    If TYPE = 5, returns LINT1 = LINT2
    18  Int to Long     INT: word     long          Converts integer INT into a long integer.
    20  Long to Int     LINT: long    word          Converts long integer LINT into an integer.
                                                    Can cause overflow if the result doesn't fit in 16 bits.



    Real numbers

    Encoding real numbers in floating point format so they can be handled by a digital computer is quite a challenge. Not surprisingly, different people came up with widely different schemes. (Texas Instruments did a pretty good job with its 8-byte radix 100 format). But p-code is supposed to run on any machine, so how can it standardize real numbers?

    Well, actually it doesn't! Real numbers calculations are always performed in the format used by the machine on which the program runs. The appropriate floating point math routines are coded inside the PME.

    However, there is still a problem when it comes to program files as there may be a need to include real constants within a program. To solve this problem, the p-system defines two "canonical" formats for real constants. Whithin a program file, all real-number constants are grouped together in a so-called "constant sub-pool". When loading the program into memory, the p-system converts canonical real numbers into the format required by the target computer.

    For some reasons, there are two canonical formats in p-code: 3-words reals, and 3-to-6-words reals. They are unfortunately incompatible and all compilation units in a p-code program must use the same format. There is a word in each segment that indicates the real size being used by this segment.

    The 3-words format is quite simple:

  • The first word is a signed integer containing the exponent (-32768 to +32767)
  • The second word is a signed integer from -9999 to +9999 containing the first 4 mantissa digits, including its sign.
  • The third word is an integer from 1 to +9999 containing the last 4 mantissa digits, left justified. If this word is negative, it will be ignored.
  • The decimal point is assumed to be at the end of the last word in use (i.e. the second word if the third is negative, otherwise the third word).
  • The digits in the last mantissa word in use are left justified, i.e. the word is padded with zeros on the right (e.g. 11 is coded as 1100, which is >044C).

    The second format is just an extension of the first one: extra words can be added, up to a total of 5 mantissa words, to accommodate more decimal figures. Just as above, a negative number in words 3 to 5 serves as a terminator, and all the following words are omitted. The decimal point is always assumed to be at the end of the last word in use, and the digits in this word are left justified.

    Examples:

    Real number         Exponent    Mantissa                           
    1234.0 0 1234 -1
    -1234.0 0 -1234 -1
    12345678.0 0 1234 5678
    12345678.0 0 1234 5678 -1
    1.234 -3 1234 -1
    12.0 -2 1200 -1
    12345678.2233 -4 1234 5678 2233 -1
    111122223333.4444555 -8 1111 2222 3333 4444 5550
    1.0 E+70 73 1000 -1

    Note the two different ways to code the number 12345678: the first one is the 3-word format, the second is the 3-to-6-word format (in this case, it is 4 words long).
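
    To make the rules above concrete, here is a small Python sketch that turns a list of canonical words back into a number. It is only an illustration of the format as described in this section, not the loader's actual code; the function name decode_canonical_real is invented, and Fraction is used so that the result is exact.

```python
from fractions import Fraction


def decode_canonical_real(words):
    """Decode [exponent, mantissa1, mantissa2, ...] into an exact Fraction.

    Hypothetical helper: the first word is the exponent, the second word
    carries the sign and the first 4 digits, and a negative word in any
    later position ends the mantissa.
    """
    exponent = words[0]
    sign = -1 if words[1] < 0 else 1
    digits = abs(words[1])
    for word in words[2:]:
        if word < 0:          # terminator: ignore this word and the rest
            break
        digits = digits * 10000 + word   # each word holds 4 decimal digits
    # The decimal point sits after the last word in use.
    return sign * digits * Fraction(10) ** exponent
```

    For instance, decode_canonical_real([-3, 1234, -1]) yields the fraction 1234/1000, and both codings of 12345678 shown in the examples decode to the same value.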


    The RSP

    RSP stands for Run-time Service Package. It is the part of the PME that does not deal directly with interpreting p-codes, but rather handles various run-time chores. An important part of it is the RSP/IO that deals with I/O operations and provides a standardized interface with the underlying BIOS.

    The RSP/IO implements the concept of "device units". A device is a peripheral that should be accessed in a standard manner, through a few predefined operations. Note how this resembles the concept of DSRs used by TI.

    Devices
    Procedures
    _Ioresult
    _Unitread
    _Unitwrite
    _Unitclear
    _Unitstatus
    _Unitbusy
    _Unitwait
    Special characters
    Subsidiary volumes


    Device numbers

    Each device is referred to in the p-system by a unique number, from 0 to 255. Some device numbers are pre-assigned by the system:

    0: (reserved)
    1: CONSOLE
    2: SYSTERM
    3: (reserved)
    4: DSK1
    5: DSK2
    6: PRINTER
    7: REMIN
    8: REMOUT
    9: DSK3
    10: DSK4
    11: DSK5
    12: DSK6
    13-127: additional disks, subsidiary volumes, user-defined serial devices
    128-255: user-defined devices.

    Extra units defined in the TI-99/4A p-system implementation:

    14: OS    Card GROMs; they contain the system files for the main menu.
    31: TAPE  Cassette tape recorder.
    32: TP    Thermal printer.


    Procedures

    Ioresult

    Most of the procedures described below cause a completion code to be placed in the SYSCOM area, which is an area of memory shared by the p-system and the PME. You can retrieve this completion code with the procedure IORESULT. The codes are the following:

    0: No error
    1: Bad block, CRC error
    2: Bad device number
    3: Illegal I/O request
    4: Data-com timeout
    5: Volume went off-line
    6: File is no longer in directory
    7: Illegal filename
    8: No room on volume
    9: Volume not found
    10: File not found
    11: Duplicate file
    12: Attempt to open an already opened file
    13: Attempt to close a file that isn't open
    14: Bad format on reading a real or an integer
    15: Ring buffer overflow
    16: Disk is write-protected
    17: Illegal block number
    18: Illegal buffer address
    19: Bad text file size
    20-127: (reserved)


    Unitread

    This procedure reads a given number of bytes from a device into a memory buffer.

    In Pascal, its declaration would look like this:

    PROCEDURE UNITREAD(UNITNUMBER: INTEGER;
                       VAR DATA_AREA: PACKED ARRAY[0..NBYTES-1] OF 0..255;
                       NBYTES: INTEGER;
                       [LBLOCK: INTEGER;]
                       [CTRL: INTEGER]);

    Ok, this is not correct syntax, since Pascal allows neither optional parameters nor variable-length arrays, but you get the idea. The parameters are the following:

    UNITNUMBER: the number of the device unit as defined above.
    DATA_AREA: A pointer to the buffer where data is to be stored. This buffer is an array of bytes of length NBYTES. The pointer is actually made up of two words: a word base and a byte offset. Byte-oriented machines just add the two numbers to get the buffer address. Word-oriented machines compute the buffer address by indexing byte-wise from the word base.
    NBYTES: Number of bytes to transfer.
    LBLOCK: Logical block. This is an optional parameter used with disk devices. Each disk is considered as an array of logical blocks, each 512 bytes in length (if this is not the real format of the disk, the BIOS must do the conversion, so that it appears so to the RSP). If this parameter is not specified, it will be passed as 0.
    CTRL: Optional control word parameter, will be 0 if omitted. Out of 16 bits, only 7 are currently defined. Note that a given unit may use only a few of them.

    UUUr rrrr rrrr CSPA
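
    The bit pattern above can be picked apart mechanically. The Python sketch below assumes the usual convention that the rightmost letter is bit 0 (weight >0001), which agrees with the weight >02 given for the physical-sector bit in the disk section. Matching the letters C, S and P with the NOCRLF, NOSPEC and PHYSSECT bits named elsewhere in this article is inferred from their initials; the A and U bits are simply reported under their letters. All names in the code are invented for the example.

```python
# Bit weights read off the pattern "UUUr rrrr rrrr CSPA" (rightmost = bit 0).
CTRL_A = 0x0001          # "A" bit (meaning not spelled out here)
CTRL_PHYSSECT = 0x0002   # "P" bit, assumed to be PHYSSECT
CTRL_NOSPEC = 0x0004     # "S" bit, assumed to be NOSPEC
CTRL_NOCRLF = 0x0008     # "C" bit, assumed to be NOCRLF
CTRL_RESERVED = 0x1FF0   # the nine "r" bits
CTRL_U = 0xE000          # the three "U" bits


def decode_ctrl(ctrl):
    """Split a 16-bit control word into its fields (hypothetical helper)."""
    ctrl &= 0xFFFF
    return {
        "u": (ctrl & CTRL_U) >> 13,
        "reserved_clear": (ctrl & CTRL_RESERVED) == 0,
        "nocrlf": bool(ctrl & CTRL_NOCRLF),
        "nospec": bool(ctrl & CTRL_NOSPEC),
        "physsect": bool(ctrl & CTRL_PHYSSECT),
        "a": bool(ctrl & CTRL_A),
    }
```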


    In summary, the stack would look like this before the call:

    |  Ctrl word  |  <-- top of stack
    |  Log Block  |
    |  Num bytes  |
    | Byte offset |
    |  Word base  |
    | Unit number |
    |/////////////|

    After the call, all parameters will have been popped from the stack by UNITREAD itself.


    Unitwrite

    This procedure writes a given number of bytes to a device. Its parameters are identical to those of UNITREAD.



    Unitclear

    This procedure is used to reset a device. At the RSP level, this means clearing any state flag. The device can then be reset to a known state by the BIOS, but this is a device-dependent operation. The Pascal declaration is:

    PROCEDURE UNITCLEAR(UNITNUMBER: INTEGER);


    Unitstatus

    This procedure lets a device report its status into a buffer that can be up to 30 words in length. The meaning of these words depends on the device. The Pascal declaration of UNITSTATUS is:

    PROCEDURE UNITSTATUS(UNITNUMBER: INTEGER;
                         VAR STATUS_WORDS: ARRAY[0..29] OF INTEGER;
                         CTRL: INTEGER);

    UNITNUMBER is the unit number as defined above.
    STATUS_WORDS is a pointer to a memory buffer that can accommodate up to 30 integers.
    CTRL is a control word. Only 1 bit is defined:

    UUUr rrrr rrrr rrrD


    Unitbusy

    This function is intended for asynchronous environments and therefore always returns "false" on the TI-99/4A system. Its Pascal declaration is:

    FUNCTION UNITBUSY(UNITNUMBER: INTEGER): BOOLEAN;


    Unitwait

    This procedure is intended for asynchronous environments. On the TI-99/4A system, it returns immediately and can thus be considered as a NOP. Its Pascal declaration is:

    PROCEDURE UNITWAIT(UNITNUMBER: INTEGER);


    Special characters

    As already stated, the main responsibility of the RSP/IO is to handle communications with the BIOS in a standardized manner. In addition, the RSP/IO is also in charge of processing a few special characters: two during output (CR/LF and DLE), and two during input (EOF and ALPHALOCK).

    CR / LF

    With the p-system, lines in a text file end with a single CR character (>0D), but this has the meaning of "carriage return, plus line feed". Many devices consider these two functions as distinct: CR brings the cursor left, while LF (ascii >0A) moves one line down. The RSP/IO will thus append a LF to every CR while outputting a text file. This behaviour can be suppressed by setting the NOCRLF bit to 1 in the control word.

    DLE blank compression

    When a text file contains a long stretch of spaces, such as an indented line, they are compressed into two bytes: the first one is the DLE character (ascii >10), the second is the number of spaces plus 32 (so as to make it a printable character). For instance, a line indented by 8 would start with >10 >28. The RSP/IO must decompress such lines and send the appropriate number of spaces to the BIOS, instead of the DLE code.
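
    The two output transformations (LF appended to CR, and DLE expansion) can be sketched in a few lines of Python. This is only an illustration of the rules given above, not the RSP/IO code: the function name and the nocrlf argument are invented, and a DLE that happens to be the very last byte is simply passed through, which is a simplification.

```python
CR, LF, DLE, SPACE = 0x0D, 0x0A, 0x10, 0x20


def expand_text_output(data, nocrlf=False):
    """Expand DLE-compressed blanks and append LF to each CR (hypothetical helper)."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == DLE and i + 1 < len(data):
            count = data[i + 1] - 32          # second byte is count + 32
            out.extend([SPACE] * max(count, 0))
            i += 2
            continue
        out.append(byte)
        if byte == CR and not nocrlf:         # NOCRLF suppresses the extra LF
            out.append(LF)
        i += 1
    return bytes(out)
```

    With the example from the text, a line starting with >10 >28 comes out with eight leading spaces.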

    EOF

    Several devices, such as CONSOLE, PRINTER or REMIN send a special character, called EOF, to signal that the end-of-file has been reached. In the p-system, EOF doesn't have a fixed ascii value. Rather, it is a variable stored in the SYSCOM area (a portion of memory shared by the system and the PME). The RSP/IO must therefore compare each incoming character to the one in SYSCOM and, if they match, process it as an EOF. With the TI-99/4A, the value of EOF is generally >83, which corresponds to Ctrl-C.

    The meaning of an EOF depends on the device: for CONSOLE the rest of the input buffer will be filled with zeros, for the printer and remote devices, the EOF character is copied into the buffer. In any case, input is terminated immediately.

    ALPHALOCK

    Apart from the hardware "alpha-lock" key on the TI-99/4A keyboard, the p-system also provides a software-based alpha-lock. When the special ALPHALOCK character is received, all following lower-case characters ('a' through 'z') will be converted to upper-case. This will continue until another ALPHALOCK character is received. As with EOF, there is no fixed ascii value for this character; rather, it is defined by a variable in the SYSCOM area (with the TI-99/4A, the ascii value of this variable is generally >0C, i.e. FCTN-6).

    As mentioned above, it is possible to disable the handling of EOF and ALPHALOCK by setting the NOSPEC bit to 1 in the control word. Note however that this will also disable the handling by the BIOS of the other special characters (break, start/stop, flush and character masking).
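
    The input-side processing can be sketched along the same lines. The Python function below is a simplified model, not the actual RSP/IO: its name and arguments are invented, the EOF and ALPHALOCK values are passed in because they come from SYSCOM, and the device-specific treatment of EOF (zero fill or copying the character) is left out.

```python
def filter_input(chars, eof_char, alphalock_char, nospec=False):
    """Return (accepted_bytes, eof_seen) for a stream of incoming bytes.

    Hypothetical helper: stops at EOF, toggles a software alpha-lock on
    each ALPHALOCK character, and does neither when nospec is set.
    """
    out = bytearray()
    alpha_lock = False
    for c in chars:
        if not nospec:
            if c == eof_char:
                return bytes(out), True       # input ends immediately
            if c == alphalock_char:
                alpha_lock = not alpha_lock   # toggle, character not stored
                continue
            if alpha_lock and ord("a") <= c <= ord("z"):
                c -= 32                       # convert to upper case
        out.append(c)
    return bytes(out), False
```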


    Subsidiary volumes

    The p-system offers the possibility of creating virtual drives that are physically implemented as a partition of an actual drive. These are called subsidiary volumes. Another responsibility of the RSP/IO is to intercept calls to a subsidiary volume and to translate them into the appropriate call to the physical unit on which the subsidiary volume is implemented. Concretely, this means changing the unit number and adjusting the block address.

    The SYSCOM memory area contains a unit table with an entry for each unit in the system. For storage devices, this entry contains a physical disk unit and a block offset. The RSP/IO looks up the proper unit number in this table and adds the block offset to the LBLOCK parameter passed by the caller. After some sanity checks for absent devices or illegal block numbers, it calls the BIOS with the new values.
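
    The translation step could look like the Python sketch below. The layout of a unit table entry is an assumption: the text only mentions a physical unit and a block offset, so the num_blocks field used for the range check is invented, as are the names and the choice of completion codes 9 and 17 for the two failure cases.

```python
from collections import namedtuple

# Hypothetical unit table entry; num_blocks is an assumed field.
UnitEntry = namedtuple("UnitEntry", "physical_unit block_offset num_blocks")


def translate_unit(unit_table, unit, lblock):
    """Map (unit, logical block) to (physical unit, physical block, code)."""
    entry = unit_table.get(unit)
    if entry is None:
        return None, None, 9      # assumed: "volume not found"
    if not 0 <= lblock < entry.num_blocks:
        return None, None, 17     # assumed: "illegal block number"
    return entry.physical_unit, lblock + entry.block_offset, 0
```

    For example, with unit 13 defined as a 50-block subsidiary volume starting at block 100 of unit 4, a request for block 7 of unit 13 becomes a request for block 107 of unit 4.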



    The BIOS

    The BIOS (Basic Input Output Subsystem) is of course machine-specific. It is in charge of interfacing the computer hardware with the PME via the RSP/IO and with p-code programs via the RSP/IO or directly. It retains the basic definitions of units, but does not have to handle them in a standardized manner.

    In general, each device driver in the BIOS will handle four operations: read, write, control and status. However, the parameters required for the corresponding function may vary according to the device.

    All BIOS functions should return a completion code to the RSP/IO, so that it knows if anything went wrong. These codes are a subset of those that can be returned by the IORESULT routine of the RSP/IO. The following codes can be returned by the BIOS:

    0: No error
    1: CRC error
    2: Bad device number (e.g. unimplemented user-defined device)
    3: Illegal I/O request
    4: Undefined hardware error (not the same as IORESULT)
    9: Device not on line (or not implemented in BIOS)
    15: Ring buffer overflow
    16: Disk is write-protected
    17: Illegal block number
    18: Illegal buffer address
    128-255: Hardware-specific error.

    Console
    Printer
    Disk drives
    Remote
    Serial
    User-defined
    System

    Console BIOS

    On the TI-99/4A, the default CONSOLE input channel is the keyboard, while the default output channel is the screen. Note however that it is possible to redirect each channel to another device (e.g. output to a printer or input from a file).

    CONSOLEREAD

    Only one parameter is needed for reading from the CONSOLE:

  • The byte to be transferred. This character is not echoed to the screen by the BIOS, but by the RSP/IO.

    CONSOLE is expected to contain a type-ahead buffer, able to store from 1 to 32 characters (or more) received from the keyboard until they are passed to the RSP. If additional characters are entered from the keyboard once the buffer is full, the bell should ring.

    Optionally, all characters input from the console can be masked with >7F, as the left-most bit is always 0 in ascii. However, to accommodate special characters, it is possible to mask the input with >FF, i.e. to leave the characters unchanged. The value of the mask is set by the SETUP utility and stored in the SYSCOM area, an area of memory shared by the PME and the p-system.

    In addition, the BIOS recognises the following control characters, which regulate the output channel:

  • STOP: suspends output to CONSOLE, until another STOP, a BREAK or a FLUSH character is received, or the console is reinitialized. If another STOP is received, output should resume normally from where it stopped. This makes it possible to freeze the screen when data scrolls up too fast to be read.
  • FLUSH: suspends output to CONSOLE and discards all characters until another FLUSH or a BREAK character is received, an input from CONSOLE is requested, or the console is reinitialized. This makes it possible to quickly terminate lengthy output. Note that FLUSH has priority over STOP (i.e. has the same effect whether output is frozen or not).
  • BREAK: if the NOBREAK flag in the SYSCOM area (value >40) is set to 1, nothing happens. Otherwise, this character causes execution of a "break" routine whose address is determined at startup time. Once this routine returns, the BIOS continues as before. Note that the BREAK character is not passed to the RSP: it is the responsibility of the "break" routine to inform the RSP about the break.
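
    The interplay of STOP, FLUSH and BREAK can be modelled as a small state machine. The Python class below is only a sketch of the rules listed above: the class and method names are invented, and cases the text leaves open are filled in by assumption (here, STOP typed while flushing just toggles the stopped flag, and a BREAK with NOBREAK set is ignored entirely).

```python
class ConsoleOutputGate:
    """Tracks whether output is written, held (STOP) or discarded (FLUSH)."""

    def __init__(self, stop_char, flush_char, break_char, on_break, nobreak=False):
        self.stop_char = stop_char
        self.flush_char = flush_char
        self.break_char = break_char
        self.on_break = on_break   # the "break" routine given at startup
        self.nobreak = nobreak     # NOBREAK flag from SYSCOM
        self.reset()

    def reset(self):
        """Console reinitialisation clears any stopped or flushed state."""
        self.stopped = False
        self.flushing = False

    def key_received(self, char):
        """Update the state for one character typed at the keyboard."""
        if char == self.flush_char:
            self.stopped = False               # FLUSH has priority over STOP
            self.flushing = not self.flushing
        elif char == self.stop_char:
            self.stopped = not self.stopped
        elif char == self.break_char and not self.nobreak:
            self.reset()                       # BREAK ends STOP and FLUSH
            self.on_break()

    def input_requested(self):
        """A read request from CONSOLE ends flushing."""
        self.flushing = False

    def disposition(self):
        """Return what to do with the next output character."""
        if self.flushing:
            return "discard"
        return "hold" if self.stopped else "write"
```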

    CONSOLEWRITE

    Only one parameter is needed for writing to the CONSOLE:

  • The byte to be displayed.

    But the BIOS must also process some characters that have a special meaning:

  • CR (ascii >0D): brings the cursor to the beginning of the current line, in column 0.
  • LF (ascii >0A): brings the cursor down one line, in the same column. If necessary, scrolls the screen up.
  • BEL (ascii >07): generates a beep (if possible).
  • NUL (ascii >00): causes a delay equal to the time needed to output a character, but does not output anything.

    Optionally, some additional functions can be offered by the BIOS. There is no fixed ascii code for these control characters; their value is determined at startup time by entries in the SYSTEM.MISCINFO file.

  • Move-up: moves the cursor one line up, in the same column. If necessary, scrolls the screen down.
  • Move-left, move-right: two control characters that move the cursor backward or forward respectively, without modifying the text. The behaviour at the beginning or the end of the line depends on the device (normally, the cursor stays where it is).
  • Home: moves the cursor to the upper left-hand corner of the screen, with no changes to the text.
  • Move-to: moves the cursor to an X,Y position without altering the text.
  • Erase to EOL: erases the current line from the cursor position to the end of the line. Leaves the cursor where it is.
  • Erase to end-of-screen: erases from the cursor position to the end of the screen. Leaves the cursor where it is.

    The effect of other non-printable characters is not strictly defined in the p-system specifications.

    CONSOLECTRL

    The CONSOLE initialisation function should be passed two parameters:

  • The "break" vector
  • The SYSCOM pointer

    The p-system stores the values for the special characters, the character mask, and some flags in an area of memory shared with the PME, called the SYSCOM area (this area is part of the Kernel segment). Upon initialisation, a pointer to SYSCOM is passed to the console BIOS. From this pointer, the offsets of the various control characters are the following:

    >003A: NOBREAK (bit value >40)
    >0052: EOF character
    >0053: FLUSH character
    >0054: BREAK character
    >0055: STOP character
    >005C: Character mask (>7F or >FF)
    >005D: ALPHALOCK character

    Another parameter passed to the console BIOS upon initialisation is the address of the "break" routine, to be called upon reception of a BREAK character. The console BIOS should store this address for further calls.

    Upon initialisation, the console BIOS should reset any pending flushed or stopped output status and clear the type-ahead buffer.
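
    As an illustration, the console BIOS could fetch its settings from SYSCOM as in the Python sketch below. Two things are assumptions here: that the offsets listed above are byte offsets from the SYSCOM pointer, and that the NOBREAK bit (>40) is found in the single byte at offset >3A. The function and dictionary names are invented, and memory is modelled as a plain bytearray.

```python
# Offsets from the SYSCOM pointer, as listed above (assumed to be byte offsets).
SYSCOM_OFFSETS = {
    "EOF": 0x52,
    "FLUSH": 0x53,
    "BREAK": 0x54,
    "STOP": 0x55,
    "MASK": 0x5C,
    "ALPHALOCK": 0x5D,
}
NOBREAK_OFFSET = 0x3A
NOBREAK_BIT = 0x40


def read_console_settings(memory, syscom):
    """Collect the console-related SYSCOM values into a dict (hypothetical helper)."""
    settings = {name: memory[syscom + off] for name, off in SYSCOM_OFFSETS.items()}
    # Assumption: the NOBREAK bit is tested in the byte at offset >3A.
    settings["NOBREAK"] = bool(memory[syscom + NOBREAK_OFFSET] & NOBREAK_BIT)
    return settings
```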

    CONSOLESTAT

    The status routine requires two parameters:

  • The address of a memory buffer where to report the status
  • The control word that determines whether the input (1) or output (0) channel status is wanted

    Only the first status word in the buffer is used.


    Printer BIOS

    PRINTERWRITE

    Only one parameter:

  • Byte to print

    So as to be as general as possible, the RSP sends characters to the printer one at a time. If the printer expects characters to be passed as whole lines, it is the job of the BIOS to buffer the characters until an EOL (end-of-line) is encountered. Valid EOL characters are:

  • CR (ascii >0D): returns the carriage to column 1 but does not advance the paper
  • LF (ascii >0A): causes the paper to advance to the next line. The RSP always sends a LF after a CR. If the printer cannot handle them separately, the BIOS should ignore the CR and perform a "new line" when it receives the LF.
  • FF (ascii >0C): causes the paper to advance to the top of the next page and performs a CR. If the printer does not offer this possibility, the BIOS only performs a new line (i.e. CR + LF).

    PRINTERREAD

    Only one parameter:

  • Byte received

    If the printer is able to return data, the BIOS will pass them directly to the RSP. Otherwise, input operations result in an error code 3 ("illegal operation").

    PRINTERCTRL

    Upon initialisation, the character buffer is emptied and a new line (CR + LF) is performed. No parameter is passed.

    PRINTERSTAT

    As is the case for the console, the printer status function takes two parameters:

  • A pointer to a memory buffer where status words have to be placed
  • A control word that determines which channel is to be checked (1=input or 0=output)

    If the printer has no self-checking abilities, this function just returns 0. For the output channel, the BIOS returns the number of characters remaining to be printed. Zero is interpreted by the RSP/IO as a signal that the printer is ready to print. A non-zero value indicates that sending a character to the printer may hang the system until the printer is ready.


    Disk BIOS

    Disk devices are considered by the RSP as a linear array of 512-byte records. The BIOS is in charge of converting this abstraction to the actual parameters needed for disk access. Note that this feature can be overridden by setting the PHYSSECT bit in the control word, so that the RSP itself can do hardware-level operations (i.e. pass the actual sector number).

    DISKREAD

    This function takes five parameters:

  • The logical block number
  • The number of bytes to transfer (0 to 32767)
  • The address of the data buffer
  • The drive number (0 for DSK1, 1 for DSK2, etc).
  • The control word as defined in UNITREAD in the RSP/IO section.

    On input operations, the BIOS will only transfer the required number of characters, even if this is not a complete sector.

    DISKWRITE

    This function takes the same five parameters as DISKREAD.

    For output operations, if the number of bytes sent is not a multiple of 512, the remaining bytes are undefined. This means that the end of the "block" is likely to contain garbage. It is the responsibility of the application program to know where the valid data ends in the block.

    DISKCTRL

    Only one parameter is passed:

  • Drive number

    Initialisation brings the drive to a state where it is ready to read or write. Any buffered data that has not been saved to disk yet will be lost.

    DISKSTAT

    The status function takes three parameters:

  • The drive number
  • A pointer to a memory buffer where status must be reported
  • A control word that defines which channel is to be tested (1=input, 0=output)

    For disks, the status report consists of 4 words.

    Disk format issues

    The BIOS is in charge of converting the real, physical sectors into logical 512-byte sectors, no matter what the actual sector size may be. This may require some tricky buffering if the physical sector size is greater than 512 bytes...

    Generally, the BIOS will stay away from the first two physical sectors on the disk, since these are meant for the boot program. On the TI-99/4A, the BIOS fills the whole floppy with one huge file so that all disk operations from the p-system translate into file operations for the TI-99/4A DSRs. The first three sectors on the disk will be occupied by the disk directory, the list of files and the file descriptor record (FDR), which leaves 357 sectors of 256 bytes available for the p-system.

    Provision was made for the p-system to access physical sectors directly, with DISKREAD and DISKWRITE. To this end, you must set bit 1 (weight >02) in the control word. The number of bytes should be set to 0, since the whole sector will be transferred, no matter what size it is. Finally, the physical sector number must be passed as the first parameter, instead of the logical sector number. (NB: the p-system designers reserved the right for the "number of bytes" parameter to be combined with the sector number, so that one could access more than 65536 sectors, which is why it should be 0 for now.)

    The correspondence between logical and physical sectors is for you to calculate, using the following formulas:

    PhysicalSector = (TrackNumber * SectorsPerTrack) + SectorNumber -1
    TrackNumber = PhysicalSector / SectorsPerTrack
    SectorNumber = ( PhysicalSector mod SectorsPerTrack) +1

    Where "mod" is the modulo operation, i.e. the remainder of the integer division. Note the addition/subtraction of 1, due to the fact that physical sectors are numbered from one on, whereas logical sectors are numbered from zero on. No correction is necessary for tracks, since they are numbered from zero on.
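
    The three formulas translate directly into code. The Python sketch below uses invented function names and treats the division as an integer division, as the formulas imply; the value of 9 sectors per track used in the usage note is just an example.

```python
def to_physical_sector(track, sector, sectors_per_track):
    """PhysicalSector = (TrackNumber * SectorsPerTrack) + SectorNumber - 1."""
    return track * sectors_per_track + sector - 1


def to_track_sector(physical_sector, sectors_per_track):
    """Return (TrackNumber, SectorNumber) for a zero-based PhysicalSector."""
    track, index = divmod(physical_sector, sectors_per_track)
    return track, index + 1   # sector numbers on a track start at 1
```

    With 9 sectors per track, to_track_sector(17, 9) gives track 1, sector 9, and to_physical_sector(1, 9, 9) gives 17 again.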


    Remote BIOS

    This device is intended to be an RS232 serial connection. The BIOS should not alter any character during transmission (e.g. not mask the most significant bit). The usual transfer rate is 9600 baud.

    REMOTEREAD

    Single parameter passed:

  • Byte received

    Since they are read one at a time, input characters are buffered as for the keyboard, in a "type-ahead" buffer.

    REMOTEWRITE

    Only one parameter is needed:

  • Byte to send

    Bytes are sent one at a time by the RSP. The BIOS transmits them as such.

    REMOTECTRL

    Initialisation brings REMOTE to a state where it is ready to both send and receive. No parameter is passed.

    REMOTESTAT

    The status function takes two parameters:

  • A pointer to a memory buffer where status must be reported.
  • A control word that defines which channel is to be tested (1=input, 0=output)

    Only the first word of the status buffer is used.


    Serial devices

    For instance, the serial COM port on a PC.

    SERREAD

    Two parameters are passed:

  • The device number
  • The byte to be read

    SERWRITE

    Two parameters are passed:

  • The device number
  • The byte to send

    SERCTRL

    One parameter is used:

  • Device number

    SERSTAT

    Three parameters are required:

  • The device number
  • A control word that defines which channel is to be tested (1=input, 0=output)
  • A pointer to a memory buffer where status must be reported


    User-defined devices

    The BIOS implementation of a user-defined device is left to the user. The only requirements are that each function returns a completion code when finished, and that an error code 2 ("illegal unit number") is returned if the UNITNUMBER parameter does not match any device. User-defined devices should have numbers greater than 127.

    USERREAD

    Five parameters are passed to this function. The BIOS can choose to ignore any of them, of course:

  • The logical block number
  • The number of bytes to transfer (0 to 32767)
  • The address of the data buffer
  • The device number, which corresponds to UNITNUMBER in the RSP call.
  • The control word as defined by the RSP/IO (in UNITREAD and UNITWRITE).

    USERWRITE

    This function takes the same parameters as USERREAD.

    USERINIT

    Only one parameter is passed to this function:

  • Device number

    What it does is up to the user.

    USERSTAT

    This function is passed three parameters:

  • The device number
  • A pointer to a memory buffer where status must be reported
  • A control word that defines which channel is to be tested (1=input, 0=output)

    What it reports is up to the user who defined the device.


    System BIOS

    Although the operating system is not an I/O device, the RSP can still use the standard I/O function to address the system BIOS. Each function has a particular meaning:

    SYSREAD

    This function is reserved for future extension. Calling it causes a HALT (i.e. exits the p-system). Five parameters are passed:

  • The logical block number
  • The number of bytes to transfer (0 to 32767)
  • The address of the data buffer
  • The device number
  • The control word for the RSP/IO.

    SYSWRITE

    Reserved for future extension. Causes a HALT (i.e. exits the p-system). The same five parameters are used as with SYSREAD.

    SYSCTRL

    The function is passed two parameters:

  • The device number
  • The EVENT vector

    This function resets the clock value to >00000000.

    SYSSTAT

    Two parameters are required:

  • A pointer to a memory buffer where status must be reported
  • The control word

    The function returns 3 words in the status buffer.




    The Operating system

    The operating system is in charge of loading and running p-code programs on the host machine. One of its main tasks is to react to segment faults: each time the p-code program calls a segment that is not currently in memory, a segment fault is issued. The p-system kicks in, loads the required segment into the code pool (possibly shifting other segments to make room, or even trashing one), then returns control to the p-code program.

    The p-system manages a special memory stack, called the heap, for its internal use. Under some circumstances, the user can also store data on the heap.

    System units
    Code Pool
    Faults
    Heap
    SIBs
    E-Recs and E-Vects
    Tasks
    Syscom area
    File access


    System compilation units

    The p-system is written in Pascal. It consists of the following compilation units:

    Units                          Usage
    HEAPOPS, EXTRAHEAP, PERMHEAP   Heap operations
    SCREENOPS                      Screen operations
    FILEOPS                        File and directory operations
    PASCALIO, EXTRAIO, SOFTOPS     File-level I/O
    SMALLCOMMAND, COMMANDIO        I/O redirection and chaining
    STRINGOPS                      String operations
    OSUTILS                        Conversion utilities
    CONCURRENCY                    Concurrency management
    REALOPS                        Floating point operations, real numbers I/O
    LONGOPS                        Long integer operations
    GOTOXY                         Screen cursor control (may be user-supplied)
    KERNEL                         Resident part of the OS (segment 0). Code pool maintenance, fault handling, segment loading.
    GETCMD                         Subsidiary segment of KERNEL (swappable). Processes user input, builds run-time environment.
    USERPROG                       Ditto. Contains user program (at power-up: builds the initial environment). Segment 15.
    INITIALIZE                     Ditto. Processes SYSTEM.MISCINFO, locates code files, builds the device table.
    PRINTERROR                     Ditto. Prints run-time error messages.


    Code pool

    The code pool is where p-code segments are placed for execution. It is managed by the system, which can move or unload segments if more room is needed. The system always keeps the loaded segments as a contiguous block in memory. When new segments are loaded, they are appended at the end of this area. Note that assembly language segments, which are not relocatable, are not placed in the code pool, but rather on the heap.

    Depending on the machine, the code pool can be placed in between the stack and the heap (internal pool), or have a memory area for itself (external pool). I'm not sure which is the case for the TI-99/4A.

    The following variables are used by the code-pool management routines:

    SP_Low: Only relevant for an internal pool. Pointer to the lowest possible bound of the stack, i.e. one word above the top of an internal code pool. SP_Low is located in the TIB (Task Information Block).

    HeapTop: Only relevant for an internal pool. Pointer to the top of the heap. HeapTop is part of the HeapInfo record.

    Seg_Pool: Pointer to the Pool_Descriptor. Seg_Pool is part of the SIB (Segment Information Block).

    The Pool_Descriptor comprises the following fields:

    Record
    PoolBase : fulladdress;
    PoolSize : integer;
    MinOffset : memptr;
    MaxOffset : memptr;
    Resolution : integer;
    PoolHead : SIB_Ptr;
    Perm_SIB : SIB_Ptr;
    Extended : boolean;
    end;

    PoolBase is a 32-bit address that points at the base of the code pool.
    PoolSize is the size in words of the code pool. Set by the SETUP utility.
    MinOffset is the lower boundary of the code pool.
    MaxOffset is the upper boundary of the code pool. Same as SP_Low for an internal pool.
    Resolution is the offset in bytes for segment alignment, set in SYSTEM.MISCINFO. Segments always start at an address which is a multiple of this number.
    PoolHead points at the SIB at the base of the code pool.
    Perm_SIB points at a SIB that is always resident (currently GOTOXY).
    Extended is TRUE if extended memory is used. Set in SYSTEM.MISCINFO.
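
    The effect of the Resolution field is easy to show with a one-line computation. The Python function below is a sketch only: its name is invented, and it assumes a segment is placed at the next suitable address above a given starting point.

```python
def align_up(address, resolution):
    """Smallest multiple of `resolution` that is >= `address` (hypothetical helper)."""
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    # Ceiling division with integers, then scale back up.
    return -(-address // resolution) * resolution
```

    With a resolution of 512 bytes, for instance, a segment that could start at address 513 would actually be placed at 1024.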

    When a segment fault occurs, the pool management routines must load the required segment(s) before returning to the PME. The routines successively try to:

  • Load the segment at either end of the code pool.
  • If the pool is internal, shift it towards the stack (or the heap), to try to make room at the other end. The pool can be moved all the way to the heap, but never closer than 40 words from the top of the stack.
  • Swap out swappable segments and group the others together, to make room for the incoming segment(s).
  • If even this fails, report a stack overflow, and reinitialize the system.

    With an internal code pool, the management routines must also respond to stack faults and heap faults. To free the necessary space, the routines try to:

  • Move the pool toward the stack, to free the required number of words for the heap (or conversely).
  • Swap out one or more swappable segment(s), compact the others together and shift the pool towards the stack (or heap).
  • If this does not work, report a stack overflow and reinitialize the system.

    Fault handling

    There is a high priority task called FaultHandler in the OS, whose job is to handle memory faults. This process is started at bootstrap time, but immediately waits for a semaphore. When a routine detects a fault it just signals the semaphore and FaultHandler takes over. FaultHandler deals with the fault (hopefully), then resumes waiting for the semaphore, which returns control to the faulting task.

    The semaphore resides in the SYSCOM area and is declared as:

    Fault_Message = RECORD
      Fault_TIB   : TIB_Ptr;
      Fault_E_Rec : E_Rec_Ptr;
      Fault_Words : integer;
      Fault_Type  : SegFault..Pool_Fault;
    END;
    Fault_Sem: RECORD
      Real_Sem, Message_Sem: semaphore;
      Message: Fault_Message;
    END;

    As you may have guessed, the record "Message" is used by the routine that generated the fault to pass information to FaultHandler. The contents of this record are pretty much self-explanatory, but just in case:

    Fault_Type indicates the type of fault (segment, stack, heap, pool, etc).
    Fault_TIB points to the Task Information Block of the faulting task.
    Fault_E_Rec points to the Environment record of the current segment or of the missing segment (for segment faults).
    Fault_Words is the number of words needed (e.g. for stack faults). It's 0 for segment faults.

    Faults can be signaled either by the PME or by the OS itself. The PME detects only stack and segment faults, which have been described in more detail in the relevant chapter.

    The OS routes all faults through the EXECERROR routine. It can issue a segment fault on only one occasion: when you try to use MEMLOCK to lock a segment that's not in memory. It can also issue heap faults, when you are using MARK, NEW, VARNEW or PERMNEW and there isn't enough memory on the heap.

    As discussed above, FaultHandler then tries to move around segments in the code pool, and free some memory. When doing this, it must of course make sure that the current segment will not be swapped out of memory! In case of a stack fault created by calling a routine in a different segment, both segments must be locked in memory.


    The Heap

    The heap is a separate stack used by the OS for its own purposes. Typically, the heap and the stack grow toward each other in memory, possibly with the code pool in between them. This means that stack faults, and possibly segment faults, can affect the amount of memory available for the heap. Conversely, the heap management routines can issue a heap fault to request additional space.

    The heap management routines are the following:

    MARK();

    Saves the location of the top-of-heap.

    RELEASE()

    Cuts the heap at the position of the corresponding MARK, discarding any variable located in this space (except those declared with PERMNEW). MARK and RELEASE can be nested, and care should be taken to match them properly.

    NEW(PTR: TYPE)

    Allocates room for a variable on top of the heap. PTR is a pointer to a variable type, which allows NEW to know how many words should be reserved. If TYPE has variants, NEW allocates enough space for the largest variant, unless instructed otherwise.

    DISPOSE(PTR: TYPE)

    De-allocates the space reserved by NEW for the variable pointed to by PTR. Afterwards, PTR will contain NIL. If a specific variant of TYPE was requested with NEW, then it should also be requested in the corresponding DISPOSE.

    VARNEW(PTR: TYPE; NWORDS: word): word

    Allocates NWORDS words on top of the heap and places the address of the first one in the variable pointed to by PTR. Returns the number of words actually allocated, which should be equal to NWORDS, or to zero if there wasn't enough space.

    VARDISPOSE(PTR: TYPE; NWORDS: word)

    De-allocates NWORDS words from the position pointed at by PTR and places NIL in PTR. It's extremely important that NWORDS has exactly the same value as was used in the corresponding VARNEW call.

    PERMNEW(PTR: TYPE)

    Allocates memory for a variable just like NEW, but this space can only be reclaimed with PERMDISPOSE: even RELEASE won't destroy it. This function is reserved for internal use by the OS (e.g. to chain programs, so that the next command is still available after a program has released its heap allocation).

    PERMDISPOSE(PTR:TYPE)

    This is the only way to de-allocate space reserved by PERMNEW. Again, this procedure is reserved for system use.

    These procedures are located in three compilation units:

    HEAPOPS contains MARK, RELEASE and NEW.
    EXTRAHEAP contains DISPOSE, VARNEW, VARAVAIL, MEMLOCK and MEMSWAP.
    PERMHEAP contains PERMNEW, PERMDISPOSE and PERMRELEASE.
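
    The nesting of MARK and RELEASE may be easier to picture with a toy model. The Python sketch below is only an illustration under simplifying assumptions: the class and method names are mine, sizes are plain word counts, and it ignores DISPOSE, PERMNEW and the real mark records described in the next section.

```python
class ToyHeap:
    """Toy model of MARK / NEW / RELEASE; sizes are in words."""

    def __init__(self):
        self.top = 0       # current top of heap
        self.marks = []    # saved tops, innermost MARK last

    def mark(self):
        self.marks.append(self.top)
        return len(self.marks) - 1    # handle identifying this mark

    def new(self, nwords):
        address = self.top
        self.top += nwords
        return address

    def release(self, handle):
        # Cut the heap back to the marked position and forget this
        # mark together with any marks nested inside it.
        self.top = self.marks[handle]
        del self.marks[handle:]


heap = ToyHeap()
outer = heap.mark()
heap.new(10)
inner = heap.mark()
heap.new(5)
heap.release(inner)    # discards the 5 words allocated after the inner mark
after_inner = heap.top
heap.release(outer)    # discards the remaining 10 words
```

    Releasing the outer mark first would also have discarded the inner one, which is why mismatched MARK/RELEASE pairs are dangerous.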


    Heap management

    The heap is managed by placing marks on it. There are two types of marks:

  • Marks placed by the MARK routine and used by RELEASE to cut the heap
  • Marks placed by NEW and VARNEW to reserve heap space for data.

    Both types of mark share the common type MemLink:

    TYPE
      MemLink = RECORD
        Avail_list: MemPtr;
        NWords: integer;
        CASE Boolean OF
          true: (Last_Avail, Prev_Mark : MemPtr);
      END

    With MARKed marks, the variant is true, and we have the following fields:

    NWords is always 0 because these marks do not reserve any space.
    Prev_Mark points to the previous mark down the heap. The bottom mark has NIL in this field and the topmost mark is pointed at by the global variable HeapInfo.TopMark. This makes it easy to walk the whole chain of marks on the heap, for instance to remove a mark with RELEASE.
    Last_Avail points to the top of the available space above the mark. Typically, this space will be bound by the next mark, or by an allocated variable (or by the code pool, in the case of the topmost mark on the heap).
    Avail_list points to a list of data marks, created by either NEW or VARNEW. The first record in this list constitutes the lowest unallocated space above the mark.


    With data marks, the variant field is false, so Last_Avail and Prev_Mark don't exist:

    NWords is the amount of space reserved for the variable, including the two words occupied by the mark record itself. Avail_list points to the memory-reserving mark for the next variable.
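
    Walking the chain of MARKed marks could look like the following Python sketch. It is a simplified model, not the OS code: the Mark class and its field names are stand-ins for the MemLink record, and None plays the role of NIL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mark:
    name: str
    prev_mark: Optional["Mark"]    # Prev_Mark; None stands for NIL


def walk_marks(top_mark):
    """Return mark names from the topmost mark down to the bottom one."""
    names = []
    mark = top_mark
    while mark is not None:
        names.append(mark.name)
        mark = mark.prev_mark
    return names


bottom = Mark("bottom", None)
middle = Mark("middle", bottom)
top = Mark("top", middle)    # plays the role of HeapInfo.TopMark
print(walk_marks(top))
```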


    Global variables

    HeapInfo is an important variable used for heap maintenance. It has the following structure:

    VAR
      HeapInfo: RECORD
        Lock: semaphore;
        TopMark, HeapTop: MemPtr;
      END;
      PermList: MemPtr;

    Lock is a semaphore used by heap-modifying routines, to make sure that the heap will only be modified by one process at a time.
    TopMark points to the topmost mark installed by the MARK subroutine, as described above.
    HeapTop points at the highest allocated space on the heap. It is used by FaultHandler to determine how close the heap is to the stack (or to the code pool, if it's placed in between the heap and the stack).

    PermList is another variable used for heap management. It points at the top of a separate list of memory-reserving marks used exclusively for variables allocated by PERMNEW. PermList can be NIL if no variable was allocated with PERMNEW.

    Example:

    Here is an example of heap structure. It contains three MARKed marks (red) linked together through their Prev_Mark fields (red arrows). In each, the Last_Avail field points to the highest portion of the heap that they are allowed to use for their data marks (black arrows).

    The top mark's chain of data marks consists of only one mark, which reserves space for the integer Dummy.

    The bottom mark does not point to any data chain any more, but there is some free space above it, indicating that some data may have been there, but disposed of.

    The middle mark points to a chain of data marks (blue) linked together by their Avail_List fields (blue arrows). In this case, the chain contains only two data marks, which reserve memory for the integer Foo, and the 3-word long-integer Bar, respectively. Note that here also, there is some empty space between Foo and Bar, where variables may have been allocated, then disposed of.

    :                  :
    |  STACK or CODE   |
    +------------------+ <---,
    | //////////////// |     |
    | //////////////// |     |
    +------------------+     | <---HeapInfo.HeapTop
    | Int: Dummy       |     |
    +------------------+     |
    | Avail_List = NIL |     |
    | NWords = 1       |     |
    +------------------+ <-, |
    | Avail_List  -----|---' |
    | NWords = 0       |     |
    | Last_Avail  -----|-----'
    | Prev_Mark   -----|-------,
    +------------------+       |<---HeapInfo.TopMark
    | Long: Bar        |       |
    | ( 3 words )      |       |
    +------------------+       |
    | Avail_List = NIL |       |
    | NWords = 3       |       |
    +------------------+<-,<-, |
    | //////////////// |  |  | |
    +------------------+  |  | |
    | Int: Foo         |  |  | |
    +------------------+  |  | |
    | Avail_List  -----|--'  | |
    | NWords = 1       |     | |
    +------------------+<-,  | |
    | Avail_List  -----|--'  | |
    | NWords = 0       |     | |
    | Last_Avail  -----|-----' |
    | Prev_Mark   -----|----,  |
    +------------------+<-, |<-'
    | //////////////// |  | |
    +------------------+  | |
    | Avail_List = NIL |  | |
    | NWords = 0       |  | |
    | Last_Avail  -----|--' |
    | Prev_Mark = NIL  |    |
    +------------------+<---'


    Heap management tactics

    HeapTop is set to point at the new top of heap, as soon as space on top of the heap is requested by one of the heap management routines. This way, if a stack fault occurs, there won't be any conflict between these routines and FaultHandler.

    The OS uses the heap for its own purposes, but the heap is also available for your programs. To avoid conflicts, the system puts a mark on the heap (called EMPTYHEAP) after your program's runtime environment has been built, but before the program is actually started. Once the program terminates, the system releases EMPTYHEAP, therefore destroying anything you may have placed on the heap (unless you used PERMNEWs, which you shouldn't). Note that your program's SIBs, E_Recs and E_Vects are part of the run-time environment and will therefore appear before the EMPTYHEAP mark. So do the global data in your program, and in any units it uses.

    Among other things, the OS maintains a disk directory on the heap. It is pointed at by the global variable SysCom^.GDirP, but it is meant to be invisible for you. So before any heap operation (except DISPOSE), the system will remove this directory with DISPOSE and make its space available again.


    SIB: Segment Information Blocks

    A segment is considered active if it may be used by the current program, whether it is currently loaded in memory or still on disk. Each active segment is described in a Segment Information Block (SIB) on the heap. When a program requires a segment that is not yet active, an SIB is created for it on the heap. All SIBs are arranged in a doubly-linked list and reference each other via their Prev_SIB and Next_SIB fields.

    The structure of a SIB is the following:

    RECORD
      Seg_Pool   : ^Pool_Des;
      Seg_Base   : memptr;
      Seg_Refs   : integer;
      Time_Stamp : integer;
      Link_Count : integer;
      Residency  : -1..maxint;
      Seg_Name   : PACKED ARRAY [0..7] OF CHAR;
      Seg_Leng   : integer;
      Seg_Addr   : integer;
      Vol_Info   : VIP;
      Data_Size  : integer;
      Res_SIB    : RECORD
        Next_SIB : SIB_Ptr;
        Prev_SIB : SIB_Ptr;
        CASE boolean OF
          TRUE  : (Next_Sort: SIB_Ptr);
          FALSE : (New_Loc: memptr);
      END;
      M_Type     : integer;
    END

    Seg_Pool points to the Pool_Descriptor for the pool in which the segment resides. If the segment is on the heap (as is the case for non-relocatable segments) or if there is no external code pool, Seg_Pool is NIL.
    Seg_Base contains the byte offset of the segment within the code pool, or the address of the segment on the heap. If the segment is not in memory, Seg_Base is NIL.
    Seg_Refs contains the number of outstanding calls to the segment. It is incremented each time a routine in the segment is called by an external segment, it is decremented when this routine returns to its external caller.
    Time_Stamp is used by the system to determine which segment is the least recently used, when segments should be swapped out to make room in the code pool. The Syscom area contains a 16-bit variable that is incremented every time a segment is exited. This variable is copied in the Time_Stamp field of the segment exited.
    Link_Count indicates how many external variables reference the SIB. The operating system uses several such variables, including the Prev_SIB and Next_SIB fields of other SIBs. Link_Count is incremented each time a pointer is set to the SIB and decremented when this pointer is destroyed. When Link_Count becomes zero, the SIB is removed from the heap.
    Residency is 0 for segments that are swappable. Values greater than zero indicate that the segment is locked in memory. Residency is incremented each time a program locks the segment (with the MEMLOCK function), and decremented each time it is unlocked (with the MEMSWAP function). A value of -1 indicates a segment that was position-locked at loading time and cannot be made swappable.
    Seg_Name contains the first 8 characters of the segment name.
    Seg_Leng is the number of code words in the segment, including the relocation lists but not the reference lists.
    Seg_Addr contains the number of the segment's first block on disk.
    Vol_Info points to a volume information record that contains the drive number and diskname for the disk where the segment resides.
    Data_Size is the number of data words in the code segment's data segment. This only applies to principal segments (for others Data_Size is 0).
    Res_SIB is used to maintain the code pool by walking the list of SIBs. Its Prev_SIB field points to the previous SIB in the list, Next_SIB points to the next SIB in the list. An extra field is used by the code pool management routine to store temporary values: either Next_Sort or New_Loc.
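
    As an illustration of how Residency and Time_Stamp could work together, here is a small Python sketch that picks the least recently used swappable segment. It is a guess at the general idea rather than the actual OS algorithm: the Sib class, the in_memory flag and the segment names are hypothetical, and the wrap-around of the 16-bit counter is ignored.

```python
from dataclasses import dataclass


@dataclass
class Sib:
    seg_name: str
    residency: int      # 0 = swappable, >0 = locked, -1 = position-locked
    time_stamp: int     # copied from the Syscom counter on segment exit
    in_memory: bool = True


def pick_victim(sibs):
    """Return the least recently used swappable segment, or None."""
    candidates = [s for s in sibs if s.in_memory and s.residency == 0]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.time_stamp)


sibs = [
    Sib("EDITOR", residency=0, time_stamp=120),
    Sib("FILEOPS", residency=1, time_stamp=15),    # locked with MEMLOCK
    Sib("PRINTER", residency=0, time_stamp=42),
]
victim = pick_victim(sibs)
print(victim.seg_name)
```

    FILEOPS has the oldest time stamp, but it is skipped because its Residency is greater than zero.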


    E_Rec and E_Vect

    Many p-code instructions require a segment number as an argument. For instance, CXL 12,3 calls procedure number 3 in segment 12. However, there is no such thing as a library of all existing segments and their corresponding numbers. In fact, only one segment has a fixed number: KERNEL, which is segment number 0. For all others, the compiler assigns each segment an arbitrary number, from 2 to 255 (number 1 is reserved for the current segment). This may be a problem when trying to link units that were compiled independently: the same segment may have been given completely different numbers during the various compilations.

    Rather than having the linker patch the code to match the segment numbers, the p-system designers went for a more flexible solution. Firstly, the compiler appends a segment reference list to the codefile it creates. This is a list of segment names, together with the numbers that the compiler has assigned to them. Then, when the file is loaded, the system creates an environment vector (or E_Vect) for each segment. An E_Vect is just an array of pointers to other segments. This way, the segment numbers used by the p-codes are just indexes into the E_Vect. To find the relevant segment, the PME just reads the pointer found at a given position in the E_Vect.

    The structure of an E_Vect is the following:

    RECORD
      Vec_Length : integer;
      Map        : array [1..Vec_Length] of ^E_Rec;
    END

    Vec_Length indicates the number of entries in the E_Vect, i.e. the number of segments it references (including itself).
    Map is an array of pointers to other segments, arranged according to the segment numbers assigned at compilation time.

    If you thought that the pointers in the E_Vect would point at other segments' SIBs, you were wrong. They point at special structures, called "Environment Records" or E_Rec. Each segment has an E_Rec that contains pointers to important structures in this segment. The structure of an E_Rec is the following:

    RECORD
      Env_Data : memptr;
      Env_Vect : ^E_Vect;
      Env_SIB  : ^SIB;
      CASE boolean OF
        TRUE: (Link_Count : integer;
               Next_Rec   : ^E_Rec);
    END

    Env_Data points at this segment's global data, allocated on the heap when the program is executed.
    Env_Vect points at this segment's E_Vect, an array of pointers to other segments' E_Recs.
    Env_SIB points at this segment's SIB, which is allocated on the heap when the program is executed.
    The next two fields are only present in the E_Rec of the principal segment in a compilation unit:
    Link_Count indicates the number of other units that are using this principal segment.
    Next_Rec is used to maintain a chain of all active compilation units. It points to the next unit in the chain.

    Example:

    Let's use three segments, "Tic", "Tac" and "Toe". Tic is a principal segment that references Tac and Toe. Toe is a principal segment in another compilation unit; it also references Tac. Tac is a subsidiary segment that does not reference any other segment. Note that each segment has an entry for itself in the E_Vect, but I omitted these arrows, for clarity.

         "Tic" E_Rec
       +------------+
       | Env_Data   |      E_Vect   1     2     3
       +------------+      +-----+-----+-----+-----+
       | Env_Vect   |----->|  3  | Tic | Toe | Tac |
       +------------+      +-----+-----+--|--+--|--+
       | Env_SIB    |                     |     |
       +------------+                     |     |
       | Link_Count |           ,---------|-----'
       +------------+           |         |
    ,--| Next_Rec   |           |         |
    |  +------------+           |         |
    |                           |         '-----,
    |  ,------------------------'               |
    |  |                                        |
    |  +----------------------------------,     |
    |  |                                  |     |
    |  V "Tac" E_Rec                      |     |
    |  +------------+                     |     |
    |  | Env_Data   |      E_Vect   1     |     |
    |  +------------+      +-----+-----+  |     |
    |  | Env_Vect   |----->|  1  | Tac |  |     |
    |  +------------+      +-----+-----+  |     |
    |  | Env_SIB    |                     |     |
    |  +------------+                     |     |
    |                                     |     |
    |  ,----------------------------------|-----'
    |  |                                  |
    |  V "Toe" E_Rec                      |
    '->+------------+                     |
       | Env_Data   |      E_Vect         |
       +------------+      +-----+-----+--|--+
       | Env_Vect   |----->|  2  | Toe | Tac |
       +------------+      +-----+-----+-----+
       | Env_SIB    |               1     2
       +------------+
       | Link_Count |
       +------------+
       | Next_Rec =0|
       +------------+

    Note that Tac is referenced as segment number 2 in Toe, but as segment number 3 in Tic. Therefore, the instruction CXL Tac,5 (which calls procedure 5 in Tac) will be compiled as >93, >03, >05 in Tic, but as >93, >02, >05 in Toe.

    When executing CXL Tac,5 in the segment Tic, the PME reads >93,>03,>05, gets the third entry in Tic's E_Vect and obtains a pointer to Tac's E_Rec. Then it locates procedure number 5 in Tac and branches to it.

    Similarly, when encountering a CXL Tac,5 in segment Toe (which is coded >93,>02,>05), the PME gets entry number 2 in Toe's E_Vect, which is again a pointer to Tac. You get the idea?

    Finally, note the chaining between the principal segments: Tic's Next_Rec points to Toe (red arrow). The corresponding entry for Toe is NIL, since Toe is at the end of the list in our example.
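
    The lookup performed by the PME in this example can be mimicked in a few lines of Python. This is only a model of the idea: each E_Rec is reduced to a dictionary, and the function name resolve_cxl is mine. The opcode >93 and the segment numbers are those of the example above.

```python
# Each E_Rec is modelled as a dict; "vect" plays the role of the E_Vect map.
tic = {"name": "Tic", "vect": {}}
tac = {"name": "Tac", "vect": {}}
toe = {"name": "Toe", "vect": {}}

# Segment numbers as assigned in the example above.
tic["vect"] = {1: tic, 2: toe, 3: tac}
tac["vect"] = {1: tac}
toe["vect"] = {1: toe, 2: tac}


def resolve_cxl(current, code):
    """Decode the three bytes of a CXL; return (target name, procedure)."""
    opcode, seg_num, proc_num = code
    if opcode != 0x93:
        raise ValueError("not a CXL instruction")
    target = current["vect"][seg_num]
    return target["name"], proc_num


print(resolve_cxl(tic, [0x93, 0x03, 0x05]))
print(resolve_cxl(toe, [0x93, 0x02, 0x05]))
```

    Both calls end up in procedure 5 of Tac, although the segment number in the code differs.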


    Tasks

    The p-system is a multitasking operating system. That's right, it turns your TI-99/4A into a multitasking machine!

    A task can be seen as a "line of execution" of a program. At any time there can only be one task running (since the TI-99/4A only has one processor), but there can be many more tasks pending. The p-system achieves multitasking by quickly switching from one task to the next, providing the illusion that all tasks run at the same time.

    The "main task" is the thread of execution that starts the p-system, runs any utility or user program, then exits the p-system. Anywhere in this thread, subsidiary tasks can be launched that will execute "concurrently" with the main task.

    Three things are required for multitasking:

    1. The task body, i.e. the routine(s) that must run in this task, and their associated data.
    2. A TIB, or Task Information Block, into which the p-system saves values critical for the task.
    3. A task queue used by the p-system to schedule the next task. There may be several queues: one for the tasks ready to run and one for each semaphore (whether there are tasks waiting for it or not).

    TIB: Task Information Blocks

    A TIB is a structure allocated on the heap into which the p-system saves some critical values from the present task, before it switches to another. For instance, it saves the current instruction pointer (IPC) and most of the other p-machine registers. These values will be restored when the system switches back to this task.

    The structure of a TIB is the following:

    RECORD
      Regs: PACKED RECORD
        Wait_Q       : ^TIB;
        Prior        : byte;
        Flags        : byte;
        SP_Low       : memptr;
        SP_Upr       : memptr;
        SP           : memptr;
        MP           : ^MSCW;
        Reserved     : integer;
        IPC          : integer;
        Env          : ^E_Rec;
        Proc_Num     : byte;
        TI_BIOResult : byte;
        Hang_Ptr     : ^Sem;
        M_Depend     : integer;
      END;
      Main_Task  : Boolean;
      Start_MSCW : ^MSCW;
    END

    IPC is the p-code instruction pointer. It is a segment-relative byte pointer.
    Proc_Num is the number of the currently executing procedure in this task.
    MP points to the local activation record (MSCW) for the task, on the stack.
    SP is the p-machine stack pointer.
    SP_Low is the lower bound of the stack for this task.
    SP_Upr is the upper bound of the stack for this task. Note that each subsidiary task has its own private stack, allocated on the heap when the task is started. By default this stack is 200 words, but this can be specified when starting the task. Only the main task uses the system stack.
    Env is a pointer to the task's E_Rec (from which you can find the task's SIB).
    TI_BIOResult is used to save an I/O result that is local to the task.
    M_Depend contains machine-dependent data maintained by the PME. It is initialized as 0.

    Prior is the task priority. It's a number from 0 (lowest priority) to 255 (highest priority).
    Wait_Q is used when the task is in a waiting queue. It points to the TIB of the next task in the list.
    Hang_Ptr is used when a task is blocked by a semaphore. It points to the semaphore (it's NIL if the task is not blocked).
    Flags are reserved for future use.

    Main_Task is TRUE for root ("parent") tasks, FALSE otherwise.
    Start_MSCW points to the MSCW of the routine that started this task.


    Queues and semaphores

    At any time, there is only one task running. All other tasks ready to run are placed in a waiting queue. When it's time to switch tasks, the p-system fetches the appropriate task from the queue, switches to it, and places the current task back into the queue.

    If necessary, tasks can be synchronized by using semaphores. A semaphore is a variable that can be "grabbed" by a task. If another task attempts to grab the same semaphore it will hang, i.e. the task will be suspended until the original task has "released" the semaphore. At which point the hanging task will be able to grab the semaphore and resume execution.

    Actually, a semaphore can be grabbed an arbitrary number of times before it causes a task to hang. Similarly, a semaphore may need to be released several times before it actually frees the task(s) hanging on it. In most cases however, a "grab count" of one (mutual exclusion semaphore, or "mutex") will do the job.

    Each semaphore has an associated queue of tasks waiting for it, similar to the queue of tasks ready to run. Note that this queue may be empty if no task is waiting for this semaphore. Hanging and releasing a task simply means moving it from the semaphore queue to the "ready-to-run" queue, or conversely.

    To grab a semaphore, use the p-code WAIT. It decrements the semaphore "grab count", or hangs the current task if the count is already zero. Hanging means that the p-system places the current task in the semaphore queue and switches to another task from the "ready-to-run" queue.

    To release a semaphore, use the p-code SIGNAL. It fetches the task at the head of the semaphore queue and places it in the "ready-to-run" queue. A task switch will only occur if this task has a higher priority than the current one. If the semaphore queue is empty, the count is incremented and nothing else happens. The same thing happens if the semaphore count is negative: this makes it possible to implement semaphores that must be signalled several times before hanging tasks can be released.
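
    The behaviour of WAIT and SIGNAL described above can be summarized with a small Python model. It is a sketch under simplifying assumptions: tasks are plain strings, priorities and the actual task switch are left out, and the class and function names are mine rather than the p-system's.

```python
from collections import deque


class Semaphore:
    def __init__(self, count=1):
        self.count = count
        self.queue = deque()    # tasks hanging on this semaphore


ready_queue = deque()           # tasks ready to run


def wait(sem, task):
    """Return True if the task may continue, False if it was hung."""
    if sem.count > 0:
        sem.count -= 1
        return True
    sem.queue.append(task)
    return False


def signal(sem):
    """Release one hanging task, or bump the count if nobody can be freed."""
    if sem.count < 0 or not sem.queue:
        sem.count += 1
        return None
    task = sem.queue.popleft()
    ready_queue.append(task)
    return task


mutex = Semaphore(count=1)
print(wait(mutex, "task A"))    # True: A grabs the semaphore
print(wait(mutex, "task B"))    # False: B hangs in the semaphore queue
print(signal(mutex))            # task B: moved to the ready-to-run queue
```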

    You can also use the standard procedure ATTACH to automatically link a semaphore to a preset PME event. When this event occurs, the semaphore will be signaled. The standard procedures QUIET and ENABLE globally disable or enable event-attached signalling.


    Concurrency unit

    The p-system comprises a compilation unit called CONCURRENCY, which implements Pascal processes (a process is the Pascal word for a task). The unit contains 3 procedures called START, STOP, and BLK_EXIT. The Pascal compiler automatically calls START in the initialisation code of a process, and STOP in its exit code. The compiler also calls BLK_EXIT in the exit code of a main process.

    These procedures make use of a global variable called Task_Info, which has the following structure:

    RECORD
      Lock      : semaphore;
      Task_Done : semaphore;
      N_Tasks   : integer;
    END

    Lock is a semaphore used to ensure that only one task at a time will be able to modify Task_Info.
    Task_Done is a semaphore used to wait for the termination of subsidiary tasks.
    N_Tasks is the number of subsidiary tasks that have been created by START.

    START

    This procedure is called to create a new task. It reserves space on the heap for the task's TIB, stack, and activation record, and then executes a WAIT p-code. This normally causes the p-system to switch to the new process (unless there is one with a higher priority pending). The new process performs its initialisation (i.e. copying its parameters) then executes a SIGNAL to return to START, which in turn returns to its caller.

    STOP

    This procedure records the termination of a task. It decrements Task_Info.N_Tasks and signals Task_Info.Task_Done. It then creates a dummy semaphore and waits for it, which causes a permanent task switch for the terminated process.

    BLK_EXIT

    This procedure is called by the main task's exit code and waits for termination of all subsidiary processes. It waits for Task_Info.Task_Done until Task_Info.N_Tasks becomes zero. In other words, it waits until each subsidiary task has called STOP.
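
    For readers more familiar with threads, the interplay between START, STOP and BLK_EXIT can be approximated in Python as follows. This is a loose analogy and not a port of the unit: Python threads replace p-system tasks, the helper names are mine, and a finished thread simply ends instead of waiting forever on a dummy semaphore.

```python
import threading


class TaskInfo:
    def __init__(self):
        self.lock = threading.Semaphore(1)         # Lock
        self.task_done = threading.Semaphore(0)    # Task_Done
        self.n_tasks = 0                           # N_Tasks


info = TaskInfo()
results = []


def start(body, *args):
    """Rough analogue of START: count the task, then launch it."""
    with info.lock:
        info.n_tasks += 1

    def runner():
        body(*args)
        stop()

    threading.Thread(target=runner).start()


def stop():
    """Rough analogue of STOP: record termination, signal Task_Done."""
    with info.lock:
        info.n_tasks -= 1
    info.task_done.release()


def blk_exit():
    """Rough analogue of BLK_EXIT: wait until every task has stopped."""
    while True:
        with info.lock:
            if info.n_tasks == 0:
                return
        info.task_done.acquire()


for n in range(3):
    start(results.append, n)
blk_exit()
print(sorted(results))
```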


    Syscom area

    The SYSCOM area is a memory location where the p-system stores important variables that can be shared by the PME, the RSP, the BIOS and the OS, as well as by your program. From the OS point of view, the SYSCOM area can be considered as a Pascal packed record and is declared as such in the interface of the KERNEL unit.

    Low-level assembly routines on the other hand, must know the offset of each variable with respect to the SYSCOM^ pointer that is defined when the system powers up. The problem here is that some of the variables are bytes, so their location will vary according to the byte sex. For instance, here are the offsets of some keyboard special codes:

    Ctrl char     Little endian   Big endian
    EOF           >52             >53          }
    FLUSH         >53             >52          }
    BREAK         >54             >55          }
    STOP/START    >55             >54          }
    CHARMASK      >5C             >5D          }
    ALPHALOCK     >5D             >5C          }
    NOBREAK       >3A             >3B

    The pairs defined with braces are bytes in the same word. When the byte sex changes, the two bytes are flipped within the word.
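
    Since the two bytes of a word sit at an even offset and at the following odd offset, flipping them amounts to toggling the lowest bit of the offset. The short Python sketch below checks this against the table; the function name is mine, and it only applies to byte-sized variables.

```python
def flip_byte_offset(offset):
    """Map a byte offset to its position when the byte sex is reversed."""
    return offset ^ 1    # swap the two bytes inside the same 16-bit word


little_endian = {
    "EOF": 0x52, "FLUSH": 0x53, "BREAK": 0x54, "STOP/START": 0x55,
    "CHARMASK": 0x5C, "ALPHALOCK": 0x5D, "NOBREAK": 0x3A,
}
big_endian = {name: flip_byte_offset(off)
              for name, off in little_endian.items()}
print(hex(big_endian["EOF"]), hex(big_endian["NOBREAK"]))
```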


    File access

    We saw that the RSP/IO considers every unit as an array of blocks, an illusion implemented by the BIOS. The OS takes things a level higher and introduces the concept of files. There can be two types of files: standard files, which follow the Pascal conventions and are accessed with the GET and PUT functions, and interactive files, which accept characters one at a time. From the OS point of view, both types of files share a common structure called the File Information Block, or FIB.

    Structure of a FIB

    FIB = RECORD
      FWindow: Window_P;
      FEOF, FEOLN: Boolean;
      FState: (FJanW, FNeedChar, FGotChar);
      FRecSize: integer;
      FLock: semaphore;
      CASE FIsOpen: Boolean OF
        true: (FIsBlkd: Boolean;
               FUNIT: UNITNUM;
               FVID: VID;
               FRptCnt: integer;
               FMaxBlk: integer;
               FModified: Boolean;
               FHeader: DirEntry;
               CASE FSoftBuf: Boolean OF
                 true: (FNxtByte: integer;
                        FMaxByte: integer;
                        FBufChngd: Boolean;
                        FBuffer: PACKED ARRAY[0..FBlkSize] OF CHAR
                       )
              )
    END

    FWindow points to the current character in the file.
    FEOF is the end-of-file flag.
    FEOLN is the end-of-line flag.
    FState is the file type: FJanW is the standard Pascal type (Jensen & Wirth), FNeedChar is an interactive file waiting for a character, FGotChar is an interactive file with a character.
    FRecSize is the size of a record, in bytes. It is 0 for unentered files, -1 for interactive and text files.
    FLock is a semaphore used to ensure that only one process at a time can access a file.
    FIsOpen is true when the file is opened, which makes more fields relevant:

    Disk directory

    On a disk or another storage device is a directory, which is nothing more than an array of 78 directory records. The first record (number 0) defines the directory itself, the other entries define files.

    The structure of the directory record is the following (fields that share a row are separated by a vertical bar):

    DFirstBlk
    DLastBlk
    -           | DFKind
    Length (7)  | D
    I           | R
    N           | A
    M           | E
    DeovBlk
    DNumFiles
    DLoadTime
    DLastBoot (Year) | DLastBoot (Month) | DLastBoot (Day)


    The directory entries for files look like this (fields that share a row are separated by a vertical bar):

    DFirstBlk
    DLastBlk
    StatusBit   | DFKind
    Length (15) | N
    A           | M
    E           | _
    O           | F
    _           | T
    H           | E
    F           | I
    L           | E
    DLastByte
    DAccess (Year) | DAccess (Month) | DAccess (Day)


    File formats

    This section describes the file formats used by the p-system.

    Text files
    Code files
      Segment dictionary
      Segment header
      Procedure dictionary
      Constant pool
      Relocation list
      Segment reference list
      Linker information
    Assembly files

    Text files

    A text file should only contain ASCII characters. It starts with a 2-block header that is used by the Screen Oriented Editor. If any other program uses a text file, the OS ignores the header. When a new file is created, its header blocks are filled with zeros.

    Text files always have an even number of blocks (so the smallest text file is 4 blocks long, including the header). Each pair of blocks makes up a page. Text does not wrap from one page to the next: if necessary, the end of the page is filled with zeros.

    Each page contains lines terminated by <return> (ascii >0D). Optionally, if a line starts with a number of spaces, these can be compressed to save disk space. In this case, the line will start with the blank compression code (ascii >10), followed by the number of spaces plus 32.
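
    Here is a minimal Python sketch of how one page could be expanded into lines, following the rules above. It assumes that the first zero byte marks the start of the padding and that the blank compression code only occurs at the beginning of a line; the function name and the sample data are mine.

```python
CR = 0x0D     # end of line
DLE = 0x10    # blank compression code


def decode_page(page):
    """Expand one page of a p-system text file into a list of lines."""
    lines = []
    current = []
    i = 0
    while i < len(page) and page[i] != 0:    # zeros pad the end of a page
        byte = page[i]
        if byte == DLE and not current and i + 1 < len(page):
            # The next byte holds the number of leading spaces plus 32.
            current.append(" " * (page[i + 1] - 32))
            i += 2
            continue
        if byte == CR:
            lines.append("".join(current))
            current = []
        else:
            current.append(chr(byte))
        i += 1
    return lines


sample = bytes([DLE, 32 + 4]) + b"END;\r" + b"x\r" + bytes(8)
print(decode_page(sample))
```

    The first line of the sample is stored in 7 bytes instead of 9, thanks to the compressed indentation.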

    If you want to access text files from Pascal programs, use READ, READLN, WRITE and WRITELN, although GET and PUT also work and follow the Jensen & Wirth rules for text files.


    Code files

    Code files contain programs, or independently compiled chunks of programs called UNITs. These can contain both p-code and assembly language. In addition, the code file contains a lot of information on the code it carries, how to link the various units, etc.

    All in all, the structure of a code file is quite complex. It consists of a segment dictionary and a series of code segments. To make things worse, the structure of a code segment varies slightly between p-code and assembly segments.

    Segment dictionary

    The first block in a code file is the segment dictionary. It can describe up to 16 of the segments that are part of the file. If there are more than 16 segments, extra blocks can be added to the dictionary. These extra blocks will be interspersed with the segments within the code file. Each block contains a link to the following block, so that the chain can be travelled.

    A dictionary block basically consists of 6 arrays, each with up to 16 entries:

  • Disk info
  • Segment names
  • Miscellaneous segment info
  • Pointers to interface text
  • Segment information
  • Segment family

    These are followed by some dictionary info.

    Let's examine these arrays in detail:

    Disk info
    Seg_Dict = RECORD
      Disk_Info: ARRAY[0..15] OF
        RECORD
          Code_Addr: integer;
          Code_Leng: integer;
        END;

    Code_Addr is the block where the segment starts, relative to the beginning of the code file. Thus, segments always start on a block boundary.
    Code_Leng is the number of words in the segment, including the relocation list, but not the reference list.
    All unused entries in this array will be zeros.
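
    As a small illustration of how these two fields locate a segment, here is a sketch that converts them to byte offsets within the code file. The function name is hypothetical, and the sketch assumes 512-byte blocks and 16-bit words.

```python
BLOCK_SIZE = 512  # bytes per block (assumption: 256 words of 16 bits)
WORD_SIZE = 2     # bytes per word

def segment_byte_range(code_addr: int, code_leng: int) -> tuple[int, int]:
    """Return the (start, end) byte offsets of a segment in its code file.

    code_addr is a block number, code_leng a length in words, so the
    reference list that may follow the segment is not included.
    """
    start = code_addr * BLOCK_SIZE
    end = start + code_leng * WORD_SIZE
    return start, end
```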


    Segment names
      Seg_Name: ARRAY[0..15] OF
        PACKED ARRAY[0..7] OF CHAR;

    These are the segment names, up to eight characters in length (truncated if necessary). Unused entries are filled with spaces.


    Misc segment info
      Seg_Misc: ARRAY[0..15] OF
        PACKED RECORD
          Seg_Type: Seg_Types;
          Filler: 0..31;
          Has_Link_Info: Boolean;
          Relocatable: Boolean;
        END;

    Seg_Types = (No_Seg, Prog_Seg, Unit_Seg, Proc_Seg, Seprt_Seg);

    Seg_Type is the segment type. It will be No_Seg if this dictionary entry is empty (e.g. if there are fewer than 16 segments). Prog_Seg indicates the outer segment of a program, Unit_Seg the outer segment of a separately compiled unit. Proc_Seg segments contain procedures belonging to a program or unit. Seprt_Seg segments contain assembly language.
    Has_Link_Info indicates that the segment needs to be linked. The necessary linker information is included within the segment, starting on the next block boundary after the reference list.
    Relocatable indicates whether the segment is dynamically relocatable. If it is, it can be loaded anywhere in the code pool, and moved around if necessary. This is always the case for p-code segments. By contrast, statically relocatable segments will be loaded on the OS heap and locked there throughout their existence. Unless special care has been taken in writing them, all assembly segments are statically relocatable.
    Filler just reserves the remaining 5 bits in the packed record for future extension.


    Pointers to interface text
      Seg_Text: ARRAY[0..15] OF integer;

    This array contains the number of the block where the segment interface (in plain text) begins. The interface can be anywhere within the segment, provided it starts on a block boundary. It's basically a chunk of text file and thus follows the text format, except that the number of blocks does not need to be even. Its size is specified in the Seg_Famly array. Note that only unit segments have an interface section; all other segments will have zero in this array.

    Segment info
      Seg_Info: ARRAY[0..15] OF
        PACKED RECORD
          Seg_Num: 0..255;
          M_Type: M_Types;
          Filler: 0..1;
          Major_Version: Versions;
        END;

    M_Types = (M_Pseudo, M_6809, M_PDP_11, M_8080, M_Z_80, M_GA_440, M_6502, M_6800,
               M_9900, M_8086, M_Z8000, M_68000, M_HP87);
    Versions = (Unknown, I, II, II_1, III, IV, V, VI, VII);

    Seg_Num, surprisingly enough, is the segment number, in the range 0 to 255.
    M_Type defines the language the segment contains. If it's p-code, the type will be M_Pseudo, and the type of p-machine will be indicated in Major_Version. If the segment contains assembly language, the processor for which it was written is indicated here.
    Major_Version indicates which version of the p-machine the p-code was written for. So p-code is not that portable, after all...


    Segment family
      Seg_Famly: ARRAY[0..15] OF
        RECORD
          CASE Seg_Types OF
            Prog_Seg, Unit_Seg:
              (Data_Size: integer;
               Seg_Refs: integer;
               Max_Seg_Num: integer;
               Text_Size: integer);
            Proc_Seg, Seprt_Seg:
              (Prog_Name: PACKED ARRAY[0..7] OF CHAR);
        END;

    This array contains information that depends on the segment type. For program and unit segments, we have:
    Data_Size is the number of words in this segment's base data segment.
    Seg_Refs is the number of words in the segment's reference list.
    Max_Seg_Num is the highest segment number that was assigned within this compilation unit (regardless of whether the referenced segments are part of the code file or not).
    Text_Size, as we just saw, is the number of words in the interface section of a unit segment. For program segments this entry is zero.

    Procedure segments and assembly language segments have only one field here:
    Prog_Name is the name of their parent unit (truncated to 8 characters if necessary). If it is unknown, which will generally be the case for assembly segments, this entry is filled with spaces.


    Dictionary info

    After the 6 arrays are a few variables that contain dictionary related info.

      Next_Dict: integer;
      Filler: ARRAY[1..2] OF integer;
      Checksum: integer;
      Ped_Block: integer;
      Ped_Block_Count: integer;
      Part_Number: RECORD A, B: integer; END;
      Copy_Note: string[77];
      Dict_Byte_Sex: integer;
    END {of Seg_Dict}

    Next_Dict is the number of the next dictionary block within the file, if any. If there are no more blocks, it is zero.
    Part_Number is an arbitrary number for the program, as defined by its programmer.
    Copy_Note contains an optional copyright string.
    Dict_Byte_Sex always contains one. According to the machine on which the file was created, it will be coded either as >0001 or as >0100. This allows the system to determine the byte sex of the dictionary.
    Checksum, Ped_Block and Ped_Block_Count are used by a utility called QUICKSTART, that does not come with the TI-99/4A p-code system. For this reason, they are not described here.
    Filler reserves two words in the block for future use.
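
    To tie the pieces together, here is a sketch of a reader for the unpacked fields of a dictionary block. It is only an illustration under stated assumptions: 16-bit words, fields stored in declaration order without padding, and a 512-byte block. Under those assumptions the six arrays take 208 words and the trailing variables 48 words, which fills the 256-word block exactly. The function name and the returned keys are made up, and the packed arrays (Seg_Misc, Seg_Info) are left out because their bit layout is not spelled out here.

```python
import struct

BLOCK_BYTES = 512  # assumed block size: 256 words of 16 bits

def parse_dictionary_block(block: bytes) -> dict:
    """Decode the unpacked fields of one segment dictionary block."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("expected one 512-byte block")
    # Dict_Byte_Sex is the last word of the block and always holds 1.
    if struct.unpack_from("<H", block, 510)[0] == 1:
        order = "<"   # low byte first
    elif struct.unpack_from(">H", block, 510)[0] == 1:
        order = ">"   # high byte first
    else:
        raise ValueError("Dict_Byte_Sex is not 1 in either byte order")

    segments = []
    for i in range(16):
        # Disk_Info starts at byte 0, two words per entry.
        code_addr, code_leng = struct.unpack_from(order + "2H", block, 4 * i)
        if code_addr == 0 and code_leng == 0:
            continue  # unused Disk_Info entries are zeros
        # Seg_Name starts at byte 64, eight characters per entry.
        name = block[64 + 8 * i:72 + 8 * i].decode("ascii").rstrip()
        # Seg_Text starts at byte 224, one word per entry.
        seg_text = struct.unpack_from(order + "H", block, 224 + 2 * i)[0]
        segments.append({"name": name, "code_addr": code_addr,
                         "code_leng": code_leng, "seg_text": seg_text})

    # Next_Dict is the first word after the six arrays (byte 416).
    next_dict = struct.unpack_from(order + "H", block, 416)[0]
    return {"byte_order": order, "segments": segments, "next_dict": next_dict}
```

    Following the chain of dictionary blocks then amounts to reading block Next_Dict and calling the function again until Next_Dict is zero.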

    And if you think all this is complicated, wait till we take a look at the structure of a code segment...


    Code segments

  • Code segments can contain up to 255 routines, numbered from 1 to 255. Within the segment is a procedure dictionary which holds pointers to all the procedures in the segment.
  • Constants embedded within p-code are actually placed in a separate area in the segment, so that bytes can be flipped if the byte sex of the target machine is different from that of the machine the file was created on.
  • Segments also contain a relocation list containing the addresses of the words that should be patched when the segment is moved from one area of memory to another.
  • Segments are assigned a name (8 characters long) and a number at compilation time. The name is used by the OS to stitch together the various compilation units referenced by a program. The number is used to reference the segment at run time, as described above.
  • Finally, a segment can also contain information for linking it with other segments.

    In order of appearance, here are the sections that may be contained within a code segment (many are optional):

    Segment header
    Procedure dictionary
    Constant pool
    Relocation list
    Segment reference list
    Linker information


    Segment header

    Each segment starts with a mandatory header containing pointers to the various tables, and some segment-related info:

    Proc_Dic: integer;
    Reloc_List: integer;
    Seg_Name: PACKED ARRAY[1..8] OF CHAR;
    Byte_Sex: integer;
    Cte_Pool: integer;
    Real_Size: integer;
    Part_Number: RECORD A, B: integer; END

    Proc_Dic is a pointer to the procedure dictionary. It's an offset in words from the start of the segment.
    Reloc_List is a pointer to the relocation list. Zero if there is no relocation list.
    Seg_Name is the name of the segment.
    Byte_Sex is always 1. If the machine that loads the segment has a different byte sex from the one that wrote it, Byte_Sex will read as 256 (>0100). This lets the OS know that bytes should be flipped within this segment. Of course, p-code itself is byte-sex independent, but constants embedded within the code are not, and their bytes may need to be flipped. The same goes for the various pointers in this header and in the procedure dictionary.
    Cte_Pool is a pointer to the constant pool. It can be zero if there is no constant pool in this segment.
    Real_Size is the size of floating point numbers used by this segment.
    Part_Number is a record of two integers reserved for future use (e.g. to hold a version number).
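
    The header layout and the Byte_Sex test can be sketched like this. The sketch assumes 16-bit words stored in declaration order without padding, which puts Byte_Sex at byte offset 12 and makes the header 11 words long. The function name and dictionary keys are hypothetical.

```python
import struct

def parse_segment_header(seg: bytes) -> dict:
    """Decode the 11-word header found at the start of a code segment."""
    # Read Byte_Sex low byte first: 1 means the guess was right,
    # 256 (>0100) means the bytes of each word must be flipped.
    byte_sex = struct.unpack_from("<H", seg, 12)[0]
    if byte_sex == 1:
        order = "<"
    elif byte_sex == 256:
        order = ">"
    else:
        raise ValueError("Byte_Sex is neither >0001 nor >0100")

    proc_dic, reloc_list = struct.unpack_from(order + "2H", seg, 0)
    seg_name = seg[4:12].decode("ascii")
    cte_pool, real_size, part_a, part_b = struct.unpack_from(order + "4H", seg, 14)
    return {
        "byte_order": order,
        "proc_dic": proc_dic,      # word offset of the procedure dictionary
        "reloc_list": reloc_list,  # zero if there is no relocation list
        "seg_name": seg_name,
        "cte_pool": cte_pool,      # zero if there is no constant pool
        "real_size": real_size,
        "part_number": (part_a, part_b),
    }
```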


    Procedure dictionary

    The procedure dictionary is nothing more than a list of pointers to each and every procedure in the segment. The list grows downwards, from the address pointed at by Proc_Dic. In other words, procedure numbers can be considered as negative indices within the dictionary. The top word in the dictionary (index 0, pointed at by Proc_Dic) is the number of procedures in the dictionary. There may be empty entries in the dictionary, whose content is 0. These correspond to procedures that were declared as EXTERNAL or FORWARD routines.

    +------------+                   +-----------+
    | # of procs |                   | exit code |<-,
    +------------+ <-- Proc_Dic      |           |  |
    | Proc #1    |                   :           :  |
    +------------+                   |   code    |  |
    | Proc #2    |                   +-----------+  |
    +------------+                   | Data_Size |  |
    | Proc #3 ---|-----------------> +-----------+  |
    +------------+                   | Exit_IC --|--'
    | Proc #4    |                   +-----------+
    +------------+
    | etc        |

    The procedure pointers are word offsets relative to the start of the segment, just like Proc_Dic itself. This means that they may need to be flipped if byte sexes do not match.

    Procedures are aligned on word boundaries and always begin with two words:

    Exit_IC: integer;
    Data_Size: integer;

    The pointer in the procedure dictionary points at Data_Size, which is immediately followed by the executable code, whether p-code, assembly or a mixture of both.

    Data_Size is the number of words of local space that the procedure needs for its variables. It does not include the calling parameters that are assumed to be already on stack when the procedure is called. No local space is reserved for pure assembly routines. For mixed language routines, Data_Size is negative if the first instruction is in assembly.

    Exit_IC points to the code that must be executed when a p-code procedure is terminated. It is an offset in bytes (not words!) from the start of the segment. Exit_IC is undefined for assembly language routines, and for mixed procedures that begin with assembly. For this reason, compilers that produce mixed code procedures generally start them with at least one p-code (e.g. NAT, or NOP and NAT), so that Exit_IC will be defined.
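
    The lookup described in this section can be sketched as follows: the entry for procedure n sits n words below Proc_Dic, and points at the Data_Size word, with Exit_IC one word before it. The function name is made up, and the default byte order (high byte first) is only an assumption for the example.

```python
import struct
from typing import Optional

def lookup_procedure(seg: bytes, proc_dic: int, number: int,
                     order: str = ">") -> Optional[dict]:
    """Find a procedure through the procedure dictionary.

    proc_dic is the word offset taken from the segment header.
    Returns None for an empty entry (EXTERNAL or FORWARD routine).
    """
    def word(index: int, signed: bool = False) -> int:
        fmt = order + ("h" if signed else "H")
        return struct.unpack_from(fmt, seg, 2 * index)[0]

    count = word(proc_dic)             # index 0: number of procedures
    if not 1 <= number <= count:
        raise ValueError("no such procedure number in this segment")
    ptr = word(proc_dic - number)      # the list grows downwards
    if ptr == 0:
        return None
    return {
        "data_size": word(ptr, signed=True),  # negative if code starts in assembly
        "exit_ic": word(ptr - 1),             # byte offset from segment start
        "code_start": 2 * (ptr + 1),          # byte offset of the first instruction
    }
```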


    Constant pool

    Multi-word constants embedded in p-code are pulled out by the compiler and placed in a dedicated area, just after the last procedure in the segment. The area is pointed at by Cte_Pool. All p-code instructions that use constants refer to them as offsets from the bottom of the constant pool. This strategy allows p-code to be completely byte-sex independent. Of course, the constant pool is not, and its bytes will need to be flipped if the machine on which the program is loaded does not have the same byte sex as the machine on which the program was written.

    Real numbers in floating point format represent a special case. These are coded in one of the two canonical formats described above. The word Real_Size in the segment header indicates the real number format used by the machine that created the program (this word is valid even if no real number was actually defined). It will be 32 for the 3-word format (32 bits) and 64 for the 3-to-6-word format. Real numbers are clustered together in a separate region of the constant pool, which is pointed at by the very first word in the pool. The first word in the real sub-pool is the number of real constants that follow.

    Example:

    | Proc dict  |
    +------------+
    | Constant   |
    :    ...     :
    | Constant   |
    +------------+
    | Real       |
    :    ...     :
    | Real       |
    +------------+
    | # of reals |
    +------------+<-,
    | Constant   |  |
    :    ...     :  |
    | Constant   |  |
    +------------+  |
    | Sub-pool  -|--'
    +------------+ <--- Cte_Pool
    | Last proc  |


    Relocation list

    The relocation list is generally placed after the procedure dictionary. It is pointed at by the Reloc_List word in the segment header. It contains a list of all addresses that need to be patched when the segment is loaded in memory or moved around. Note that only assembly needs a relocation list, since p-code is address-independent.

    The relocation list is made of any number of sublists. Each sublist consists of a header that specifies the size of the sublist and the type of relocation required, followed by a list of pointers to the addresses that need to be patched. Here is the structure of this header:

    List_Header = PACKED RECORD
      List_Size: integer;
      Data_Seg_Num: 0..255;
      Reloc_Type: Loc_Types;
    END;

    Loc_Types = (Reloc_End, Seg_Rel, Base_Rel, Interp_Rel, Proc_Rel);

    List_Size is the number of pointers that follow the header in the sublist, i.e. the number of addresses to patch.
    Data_Seg_Num is only used by sublists of type Base_Rel and is zero for other types. It contains the local number of the data segment to which the pointers in the sublist are relative. Its value is normally assigned by the Linker.
    Reloc_Type specifies the type of relocation required. Reloc_End (value = 0) serves as a terminator to indicate that there are no more sublists in the relocation list. The Proc_Rel type (procedure-relative) is produced by the Assembler but will be changed to Seg_Rel (segment-relative) by the Linker, so you should never encounter it in a linked code file.

    Note that the relocation list grows from top to bottom: Reloc_List points at the sublist with the highest offset in the segment, whereas the sublist with a Reloc_Type of zero (Reloc_End) is at the bottom.


    Segment reference list

    The segment reference list is only needed by Unit segments. It is necessary because segment numbers are assigned arbitrarily by the compiler within a compilation unit. The segment reference list associates these unit-specific numbers with the corresponding segment names, so that the OS can set up the proper E_Vect and E_Rec structures when building up the run-time environment. Once this is done, the segment reference list is discarded, so it won't use up any memory at execution time.

    The list is placed at the end of the segment, after the relocation list (if any). Its location can be deduced by using the field Code_Leng in the segment dictionary as an offset from the beginning of the segment. The size of the list (in words) is also found in the segment dictionary, in the word Seg_Refs.

    The list consists simply of a series of records with the following structure:

    Seg_Reg = PACKED RECORD
      Seg_Name: PACKED ARRAY[0..7] OF CHAR;
      Seg_Num: 0..255;
      Filler: 0..255;
    END;

    Seg_Name is the 8-character segment name.
    Seg_Num is the number this segment was assigned in this compilation unit.
    Filler is reserved for future extensions.

    The end of the list is marked by a record with a Seg_Name consisting of 8 spaces, and a Seg_Num of zero. It could also be calculated from the Seg_Refs value (5 words per record).

    Dummy records with Seg_Name "***" are generated so that the OS can execute the initiation and termination code of a unit. When loading a program, the OS creates a list of all units that contain such a record, finishing with the main program. It then calls procedure number 1 in the first unit of the list. Procedure number 1 is reserved for the initialization/termination section of a unit. When done with initialization, the first unit calls the next one, which calls the next, etc, until the last unit calls the main program. When the program terminates, the *** list is popped and the same procedures are returned to, thereby travelling the chain back in reverse order.
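
    Reading the segment reference list could look like the sketch below. Several things are assumptions made for the example rather than facts from the p-system documentation: the function name, the default byte order, and above all the position of Seg_Num in the low-order byte of the last word of each record (the packing order of the record is not spelled out here).

```python
import struct

def parse_segment_refs(data: bytes, code_leng: int, seg_refs: int,
                       order: str = ">") -> list:
    """List the (name, number) pairs of a segment reference list.

    data starts at the beginning of the segment; code_leng and seg_refs
    are the word counts found in the segment dictionary.
    """
    start = 2 * code_leng           # the list follows the segment proper
    refs = []
    for i in range(seg_refs // 5):  # 5 words per record
        off = start + 10 * i
        name = data[off:off + 8].decode("ascii")
        last = struct.unpack_from(order + "H", data, off + 8)[0]
        seg_num = last & 0xFF       # assumed: Seg_Num is the low-order byte
        if name == " " * 8 and seg_num == 0:
            break                   # terminator record
        refs.append((name.rstrip(), seg_num))
    return refs
```

    Note that the dummy *** records come back like any other entry; it is up to the caller to treat them as initialization markers.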


    Linker information

    Code files produced by the Assembler and the Compiler may contain some information for the Linker to use in linking assembly language segments to p-code segments. Segments produced by the Assembler always have linker information; segments produced by the Compiler only have some if they contain EXTERNAL routines. The Has_Link_Info flag in the segment dictionary indicates whether a segment has such information or not.

    Linker information is always at the end of the segment, starting on a block boundary after the segment reference list (if any). There is no pointer to it in the segment header, but its block number can be calculated from the segment entries in the segment dictionary:
    Code_Addr + ((Code_Leng + Seg_Refs + 255) DIV 256).
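
    As a worked example of this formula, here is a short sketch (the function name is made up). It rounds the combined length of the segment and its reference list up to a whole number of 256-word blocks.

```python
WORDS_PER_BLOCK = 256

def linker_info_block(code_addr: int, code_leng: int, seg_refs: int) -> int:
    """Return the block number where the linker information starts."""
    words = code_leng + seg_refs
    # Adding 255 before the integer division rounds up to a full block.
    return code_addr + (words + WORDS_PER_BLOCK - 1) // WORDS_PER_BLOCK
```

    For a segment starting at block 1 with 300 words of code and a 15-word reference list, the 315 words spill into a second block, so the linker information starts at block 3.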

    Linker information is a series of 8-word records, whose contents depend on the type of linking required. Records are padded with zeros if fewer than 8 words are actually used. Some records are followed by a list of pointers within the segment. All records can be defined with a single Pascal structure, even though they only share the common Name and LI_Type fields.

    LI_Types = (EOF_Mark, Glob_Ref, Publ_Ref, Priv_Ref, Const_Ref,
                Glob_Def, Publ_Def, Const_Def, Ext_Proc, Ext_Func,
                Sep_Proc, Sep_Func);

    LI_Entry = RECORD
      Name: PACKED ARRAY[0..7] OF CHAR;
      CASE LI_Type: LI_Types OF
        Glob_Ref, Publ_Ref, Const_Ref:
          (Format: (Word, Byte, Big);
           N_Refs: integer);
        Priv_Ref:
          (Format: (Word, Byte, Big);
           N_Refs: integer;
           N_Words: integer);
        Ext_Proc, Ext_Func:
          (Src_Proc: integer;
           N_Params: integer);
        Sep_Proc, Sep_Func:
          (Src_Proc: integer;
           N_Params: integer;
           Kool_Bit: boolean);
        Glob_Def:
          (Home_Proc: integer;
           IC_Offset: integer);
        Publ_Def:
          (Base_Offset: integer;
           Pub_Data_Seg: integer);
        Const_Def:
          (Const_Val: integer);
    END;

    Ptr_List: ARRAY[0..ceiling(N_Refs/8)] OF
                ARRAY[0..7] OF integer;

    Ok, let's take these one at a time. Remember that they all have a Name field that holds the name of the identifier, and a LI_Type field that indicates the type of linking required.

    Glob_Ref is used to link identifiers between assembly routines.

    Glob_Ref:
      (Format: (Word, Byte, Big);
       N_Refs: integer);

    Name is the name of the identifier (a.k.a. label).
    Format is always 0 (i.e. "word").
    N_Refs is the number of references in the list that follows the record.

    Ptr_List: The reference list comes in chunks of 8 words. For Glob_Ref, the Linker will add the final segment offset to each of the words pointed at by the reference list. The offset could be in words or in bytes depending on the assembly language used.


    Publ_Ref is used to link a label in an assembly routine to a variable in a compilation unit (compiled into p-code).

    Publ_Ref:
      (Format: (Word, Byte, Big);
       N_Refs: integer);

    Name is both the name of the label and the name of the variable in the compilation unit.
    Format is always "word".
    N_Refs is the number of references in the list that follows the record.

    The linker will add the offset of the variable to all words pointed at by the reference list.


    Const_Ref is used to link a label in an assembly routine to a constant defined in a compilation unit.

    Const_Ref:
      (Format: (Word, Byte, Big);
       N_Refs: integer);

    Name is the name of the label, and the name of the constant.
    Format can be "word" (0) or "byte" (1).
    N_Refs is the number of references in the list that follows the record.

    The linker will place the constant byte or word at all locations specified in the reference list.


    Priv_Ref is used to allocate space in the global data segment.

    Priv_Ref:
      (Format: (Word, Byte, Big);
       N_Refs: integer;
       N_Words: integer);

    Format is always "word".
    N_Words is the number of words to allocate.
    N_Refs is the number of references in the list that follows the record.

    The Linker will add the offset of the beginning of the allocated area to each word pointed at by the reference list.


    Ext_Proc and Ext_Func are generated by the Compiler to reference EXTERNAL routines. They are not followed by a pointer list.

    Ext_Proc, Ext_Func:
      (Src_Proc: integer;
       N_Params: integer);

    Name is the name of the routine.
    Src_Proc is the number assigned to the routine.
    N_Params is the number of words used for parameter passing.


    Sep_Proc and Sep_Func are generated by the Assembler to declare routines. They are not followed by a pointer list.

    Sep_Proc, Sep_Func:
      (Src_Proc: integer;
       N_Params: integer;
       Kool_Bit: boolean);

    Name is the name of the routine.
    Src_Proc is the number assigned to the routine.
    N_Params is the number of words used for parameter passing.
    Kool_Bit is true if the routine is relocatable (created with .RELPROC or .RELFUNC) and false otherwise.


    Glob_Def is generated by the Assembler to declare a global identifier (with .DEF, .PROC, .FUNC, .RELPROC or .RELFUNC). There is no pointer list.

    Glob_Def:
      (Home_Proc: integer;
       IC_Offset: integer);

    Name is an identifier valid within the segment, that can be referenced by other routines from the same segment.
    Home_Proc is the number of the routine in which the identifier is defined.
    IC_Offset is the offset in bytes of the label within the routine that contains it.


    Publ_Def is generated by the Compiler to declare a global variable that should be visible to EXTERNAL routines. It is not followed by a pointer list.

    Publ_Def:
      (Base_Offset: integer;
       Pub_Data_Seg: integer);

    Name is the name of the variable.
    Base_Offset is the offset in words of the variable, from the start of the segment that contains it.
    Pub_Data_Seg is the local number of the data segment that contains the variable.


    Const_Def is generated by the Compiler to declare a global constant that should be visible to EXTERNAL routines. It does not have a pointer list.

    Const_Def:
      (Const_Val: integer);

    Name is the name of the constant.
    Const_Val is the value of the constant.


    EOF_Mark is a special type that marks the end of the linker information list. Name should be filled with spaces, and the remainder of the record, as always, is filled with zeros.
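
    Walking through a linker information list can be sketched as below. Treat it as an illustration under explicit assumptions: each field after Name occupies one full 16-bit word (which is what makes the longest variants add up to 8 words), the enumeration values follow the order of the LI_Types declaration, and a reference record is followed by N_Refs pointers rounded up to whole chunks of 8 words. The function name and the default byte order are also made up for the example.

```python
import struct

LI_TYPES = ("EOF_Mark", "Glob_Ref", "Publ_Ref", "Priv_Ref", "Const_Ref",
            "Glob_Def", "Publ_Def", "Const_Def", "Ext_Proc", "Ext_Func",
            "Sep_Proc", "Sep_Func")
# Only these record types are followed by a pointer list.
REF_TYPES = {"Glob_Ref", "Publ_Ref", "Priv_Ref", "Const_Ref"}

def walk_linker_info(data: bytes, order: str = ">") -> list:
    """Return the linker information records found before EOF_Mark."""
    entries = []
    pos = 0
    while pos + 16 <= len(data):
        name = data[pos:pos + 8].decode("ascii").rstrip()
        li_type, w5, w6, w7 = struct.unpack_from(order + "4H", data, pos + 8)
        if li_type >= len(LI_TYPES):
            raise ValueError("unknown LI_Type %d" % li_type)
        kind = LI_TYPES[li_type]
        pos += 16                      # every record is 8 words long
        if kind == "EOF_Mark":
            break
        entry = {"name": name, "type": kind, "fields": (w5, w6, w7)}
        if kind in REF_TYPES:
            n_refs = w6                # word 5 is Format, word 6 is N_Refs
            chunks = -(-n_refs // 8)   # assumed: rounded up to 8-word chunks
            ptrs = struct.unpack_from(order + "%dH" % (8 * chunks), data, pos)
            entry["pointers"] = list(ptrs[:n_refs])
            pos += 16 * chunks
        entries.append(entry)
    return entries
```

    The three words kept in "fields" must be interpreted according to the record type, e.g. (Src_Proc, N_Params, Kool_Bit) for Sep_Proc.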

    Note that not all segments can have every type of linker information. Compiler-generated Prog_Seg and Unit_Seg segments can:

  • Refer to Ext_Proc and Ext_Func defined by assembly routines
  • Define Publ_Def and Const_Def for assembly segments to refer to
  • And of course, issue an EOF_Mark to mark the end of the list.

    Assembler-generated Seprt_Seg segments can:

  • Define Sep_Proc and Sep_Func for p-code segments to call
  • Define Glob_Def and refer to Glob_Ref, for assembly-to-assembly linking
  • Refer to Publ_Ref and Const_Ref to link with p-code
  • Issue Priv_Ref to reserve space
  • Issue an EOF_Mark.

    Assembly files

    The structure of a pure assembly file, generated by the Assembler, closely resembles the Compiler-generated code files described above. There are a few differences however.

    Most importantly, each procedure has its own relocation list, located just after the body of the procedure. Since Exit_IC is not used by assembly routines, this word is used to point at the top of the relocation list (remember, it grows downwards). These relocation lists are the only ones in which the Proc_Rel relocation type can be used. The Linker will convert all the procedures' lists into one big segment list, in which Proc_Rel will be replaced with Seg_Rel.

    For each procedure, the Data_Size word is >FFFF, which is negative zero in one's complement notation. It indicates 1) that no data space is required, and 2) that the first instruction in the routine is an assembly opcode.

    Since the Assembler does not know the segment name, all routines that communicate with REFs and DEFs are assumed to be in the same segment.

    And since assembly segments don't know what program or unit they are linked to, the Prog_Name entry in the segment dictionary is left blank. For the same reason, the Data_Seg_Num field in the List_Header record of Base_Rel sublists will contain zeros. The proper value will be filled in by the Linker. The Linker also updates the Relocatable bit in the Seg_Misc array according to what was supplied in the linker information section.

    At linking time, it is also possible that the code bodies of assembly routines will be copied into one of the segments of the compilation unit they are linked to.

    Next page: the TI-99/4A p-code card


    Revision 1. 9/21/01 Ok to release.
