Hey, I said "primer". Not "tutorial", ok?
Machine language
Assembly language
CPU registers
Addressing modes
Bytes vs words
Opcodes
Common assembly-time instructions
Encoding format
Hexadecimal notation
Basically, a computer consists of a microprocessor together with some memory. All the additional gear (keyboard, screen, mouse, etc.) is only there for a secondary purpose: to allow a human to interface with the computer.
The microprocessor is the brain of the computer: it controls the whole system according to the instructions contained in a program. But no matter how sophisticated, the microprocessor still does not understand English! The only thing it understands is numbers. In the case of the TI-99/4A, numbers from 0 to >FFFF, i.e. 65535 (if you don't know what I mean by >FFFF, there will be an explanation of the hexadecimal notation at the end of this page).
Each operation that the microprocessor can execute has been assigned a number: >A000 (40960) for addition, >6000 for subtraction, etc. A program is nothing else than a carefully arranged list of such numbers.
Actually, I slightly oversimplified the situation. Some instructions are encoded by a unique number, but most can be encoded by a range of numbers, depending on their targets. Let's take the negation operation for instance: it can be encoded by any number between >0500 and >053F. Each number corresponds to a different target for the instruction: >0500 instructs the microprocessor to negate register 0, >0501 to negate register 1, etc.
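If you like to see this kind of thing spelled out, here is a minimal Python sketch that takes one of the numbers in the NEG range apart (Python is just a convenient notation here, it has nothing to do with the TI-99/4A). The name `decode_neg` is made up for the illustration, and the "mode" values anticipate the addressing modes discussed further down this page.

```python
NEG_BASE = 0x0500  # first of the 64 numbers assigned to NEG (>0500 to >053F)

def decode_neg(word):
    """Split a NEG instruction word into (mode, register). Hypothetical helper."""
    if not NEG_BASE <= word <= NEG_BASE + 0x3F:
        raise ValueError("not a NEG instruction")
    # Two bits select how the target is addressed:
    # 0 = register, 1 = indirect, 2 = direct/indexed, 3 = auto-increment
    mode = (word >> 4) & 0x3
    # The last four bits select one of the 16 registers
    register = word & 0xF
    return mode, register

print(decode_neg(0x0500))  # (0, 0), i.e. negate register 0
print(decode_neg(0x0501))  # (0, 1), i.e. negate register 1
```

So the 64 numbers are not arbitrary: 4 ways of addressing times 16 registers.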
The set of valid numbers accepted by the processor therefore constitutes a language. It is called "machine language". In the case of the TI-99/4A, the microprocessor (the TMS9900) understands 69 different instructions, encoded by a total of 57792 numbers (i.e. some numbers do not correspond to any valid instruction).
As you probably realise, machine language is not very convenient for a human to use. Who's gonna recall 57792 numbers and know what they are used for? It is possible to program directly in machine language: I can do it to some extent, to patch a program with a sector editor for instance. But this is only convenient for tiny bits of program using a small subset of instructions. If we want to do anything more complex, we'll need an assembler.
An assembler is a program that translates human-manageable mnemonics into machine language. For instance, the mnemonic for addition is "A", the one for subtraction is "S" and the one for negation is "NEG". Such mnemonics are called "opcodes", for operation codes.
Another set of mnemonics can be combined with the opcodes to define the target of an instruction (i.e. its operand). For instance, register 0 would be designated as "R0" and register 1 as "R1".
To negate register 0 you would write:
NEG R0
Similarly, to add R1 to R0:
A R1,R0
It's easier to handle than machine language isn't it?
This set of opcodes and operands is called "assembly language". It's nothing else than a literal representation of machine language in a human-manageable form.
Of course, since we are using an assembler, we could as well add some extra features that will make our life easier. The assembler may automatically check for syntax errors and tell us about them. It may also allow us to replace numbers with alphanumeric "labels". For instance, I could define a label for the number 10:
TEN EQU 10
EQU means "equates" and is not an opcode, in that it will not be translated to machine language. It is an assembly-time instruction, used to tell the assembler that we would like to use the word "TEN" as a synonym for the number ten. Wherever we write "TEN" in our program the assembler will replace it with 10 before it translates the program into machine language:
LI R0,TEN is equivalent to: LI R0,10
Whereas opcodes and operands are pretty much defined by the structure of the microprocessor, assembly instructions are only limited by the programmer's imagination. Therefore, different assemblers may use very different instructions. In all examples you'll find on this website, I have used the instruction set of the original assembler released by Texas Instruments with the module "Editor/Assembler".
Conceivably, an assembler could load machine code directly into memory at the end of the assembly process. But that would not be convenient because you would have to re-assemble a program every time you want to execute it. Therefore most assemblers are file-based programs.
The TI assembler takes its input from a Dis/Var 80 text file and produces a Dis/Fix 80 object file in tagged-object format. It also offers you the option to produce a list file, where all syntax errors will be reported: this is easier than trying to jot them down as they flash on screen!
The object file contains machine language mixed with special instructions for the linker and the loader (the 'tags'). If you are curious, the tagged-object format of this file is described in my page on the Editor/Assembler cartridge.
A linker is a program that lets you piece together several object files assembled independently. This is a nice feature when you write really big assembly programs, because it means that you don't have to re-assemble a mammoth program every time you make a small change. You can split the program into smaller files and assemble them separately, then you only re-assemble the one you modified.
Also, you could have a library of useful routines, whether written by you or not, and link them to your program as independently assembled files. This saves you the need to reinvent the wheel for every program.
Finally, some linkers will let you link assembly with other languages, such as GPL or Extended Basic.
A very good linker is the RAG linker, from R.G. Green. It takes one or more DF80 object files and produces an executable version of the program, in the form of a memory-image "program" file.
The memory-image file must then be loaded into memory by yet another program, called a loader. The Editor/Assembler cartridge contains such a loader in its option 5, which is why memory-image files are often referred to as EA5 files. The TI-Writer option 3 also calls an EA5 loader, as do Funnelweb options 1 to 3. Finally, you will find a stand-alone loader on this site: my MILD loader that lets you load both machine language and GPL.
Texas Instruments also came up with several hybrid "linking-loaders" which you'll find in Editor/Assembler option 3, Mini-memory option 1 and Extended Basic CALL LOAD. As their name implies, these programs perform linking while the program is loaded. This is much slower than loading EA5 files, but it has the advantage that object files can be loaded anywhere there is room in memory, whereas EA5 files are meant to run at a predefined location. Also, linking with Extended Basic is easier this way.
Once the program is in memory, you can use a small utility called SAVE to dump it into an EA5 program file.
In summary, the process of creating assembly language programs can be represented as such:
Your brain
|
| Editor
|
V
Text File (DV80)
|
| Assembler
|
V
Object file (DF80) Other object files (DF80)
| |
| Linker |
| |
V V
Memory-image EA5 file (Program)
|
| EA5 loader
|
V
Program is executed in memory
Alternatively, using the TI Editor/Assembler cartridge:
Your brain
|
| Editor
|
V
Text File (DV80)
|
| Assembler
|
V
Object file (DF80) SAVE utility (DF80)
| :
| EA3 linking loader :
| :
V V
Program is executed in memory
:
: SAVE is called (optional)
:
V
Memory-image file (Program)
The format of the source file is the only one we are concerned with for the time being, since this is the file that you must write. Fortunately, its format is very simple: each line in the file contains an opcode with its operands, generally in the form:
[label] Opcode [operand][,operand] [comments]
Spaces ^      ^                   ^
The fields in [brackets] are optional. Note however that the number of operands is determined by the opcode and is therefore not optional for a given opcode: some have no operands, some have one, some have two. The syntax is a little more relaxed for instructions to the assembler, and some of these can take a variable number of operands.
Labels in the left margin are used to easily refer to a location in your program, for instance to branch to it. There must be at least one space between label and opcode, between opcode and operands, and between operands and comments (if any). If there are two operands, they should be separated with a comma, not with spaces.
Comments are very important with assembly language, since it can be quite difficult to figure out what a program is doing when reading the source file. This even applies to your own programs: you'd be surprised how difficult it is for you to understand what you wrote just a few months ago! Therefore, use comments a lot: leave messages for yourself.
If you need more room than just the end of the line, you can enter lines that contain only comment. Just make sure that such lines begin with a * that will instruct the assembler to ignore them.
Example:
* This is a comment line
MYTEST MOV R1,R2 This is a sample assembly language line
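To make the field rules concrete, here is a rough Python sketch of how a line could be split into its four fields. This is NOT how the TI assembler works internally: `parse_line` and the `NO_OPERAND` set are hypothetical, and the sketch only handles the simple cases described above (a label starts in the left margin, fields are separated by spaces, operands by commas).

```python
# Opcodes assumed to take no operand, so whatever follows them is a comment
NO_OPERAND = {"RT", "RTWP", "IDLE", "RSET", "CKON", "CKOF", "LREX"}

def parse_line(line):
    """Split one source line into label, opcode, operands and comment."""
    if line.startswith("*"):
        # A leading * marks a comment-only line
        return {"comment": line[1:].strip()}
    fields = {"label": None, "opcode": None, "operands": [], "comment": ""}
    tokens = line.split()
    if not tokens:
        return fields
    if not line[0].isspace():
        # Text in the left margin is a label
        fields["label"] = tokens.pop(0)
    if tokens:
        fields["opcode"] = tokens.pop(0)
    if tokens and fields["opcode"] not in NO_OPERAND:
        # Operands are separated by commas, never by spaces
        fields["operands"] = tokens.pop(0).split(",")
    fields["comment"] = " ".join(tokens)
    return fields

print(parse_line("MYTEST MOV R1,R2 This is a sample assembly language line"))
```

A real assembler knows how many operands each opcode expects; this sketch cheats with a fixed list, so it would mistake a comment for an operand after an instruction whose operand is optional.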
So what's the smallest assembly program that you can write? Something like this:
START RT return to caller
END START
RT is not a real opcode, it's an assembly alias for one of the most common return statements, which should actually be written B *R11. This statement returns to the caller, in this case the loader, which will return control to you (hopefully).
END is an instruction that tells the assembler that the end of the file has been reached. An optional operand can also be used to indicate the entry point of your program: in our case this is the label START.
If you don't specify an entry point, an EA5 loader will begin execution at the top of your program. An EA3 loader will wait for you to enter the name of a label where you want to start execution. So that you can do this, you must tell the assembler to include the label in the object file. This is done as follows:
DEF START
START RT return to caller
END
The DEF instruction makes one or more labels available to the linker and/or the loader.
Before we discuss assembly language further, I would like to introduce you briefly to the structure of the TMS9900 microprocessor (a.k.a. CPU, for "Central Processing Unit"). You'll find a more complete description here, in case you are interested.
First let's talk about registers. A register is nothing else than a memory cell that is integrated inside the microprocessor. The advantage is that it's much easier and faster for the microprocessor to access a register than to access the external memory. The TMS9900 has 3 main registers:
The program counter
The workspace pointer
The status register
The Program Counter (PC) keeps track of the currently executed instruction. This allows the TMS9900 to fetch the next instruction to be executed. Some instructions allow you to load a different value inside PC, thereby performing a "jump" or "branch" inside your program (the equivalent of a Basic GOTO). The program counter is a 16-bit register that simply contains the memory address of the current instruction.
The Status register (ST) contains flag bits used by the TMS9900 to make decisions. Most operations affect one or the other of these bits. And conversely, some instructions behave differently according to the status of a given bit (e.g. conditional jumps). The structure of the status register is the following:
Bit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 to 11 | 12 to 15 |
---|---|---|---|---|---|---|---|---|---|
Use | High | GT | Equ | Carry | Ovf | Par | Xop | not used | Interrupt mask |
The first bits deal with mathematical operations:
High means logically higher than, for unsigned operations. For instance, you could use the compare opcode "C" to compare R0 and R1:
LI R0,>0001 Loads number 1 into R0
LI R1,>FFFF Loads >FFFF into R1
C R1,R0 Compares R1 to R0
After the comparison instruction, the "High" bit will be set to 1 since >FFFF is logically higher than 1.
GT (greater than) performs the same function, but considers that the operands contain signed values. By convention, >FFFF means -1, >FFFE means -2, >FFFD means -3, ... and >8000 means -32768.
In the above example, GT will be 0 since -1 is arithmetically smaller than 1.
Equ means equal and is set by the comparison operation when the two operands contain equal values.
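The difference between "High" and "GT" is easier to see with the numbers in front of you. Here is a small Python sketch of the three comparison bits; the function names are mine, and it only models a word comparison of a source value against a destination value.

```python
def to_signed(value):
    """Interpret a 16-bit value as signed: >FFFF is -1, >8000 is -32768."""
    return value - 0x10000 if value & 0x8000 else value

def compare(source, dest):
    """Return the High, GT and Equ bits for a 16-bit comparison."""
    return {
        "High": int(source > dest),                      # unsigned comparison
        "GT": int(to_signed(source) > to_signed(dest)),  # signed comparison
        "Equ": int(source == dest),
    }

print(compare(0xFFFF, 0x0001))  # {'High': 1, 'GT': 0, 'Equ': 0}
```

Same two values, two different verdicts: 65535 is higher than 1, but -1 is not greater than 1.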
And now, let's mention a very useful feature of the TMS9900: after almost every operation that deals with numbers the processor will automatically compare the result to zero and set the status bits accordingly. For instance, if you decrement R0 by 1:
DEC R0
the content of the status register will reflect the result of the comparison of R0 (after it was decremented) with zero. If you were using R0 as a countdown counter, there is no need to use a "C" instruction to compare it to zero: this was already done by the DEC instruction.
Carry is used to indicate a carry over. It can be viewed as a 17th bit, left of bit 0. For instance, if you do:
LI R0,>0001 Load number 1 into R0
The result will be >10000, which is too big to fit in 16 bits. R0 will therefore contain >0000 and the "Carry" bit will be set to indicate that the value of R0 "wrapped around".
Ovf indicates an overflow during some operations. Mostly, ovf deals with the sign bit (i.e. bit 0). For instance, if you add two positive numbers and the result may be understood as a negative number, ovf will be set:
LI R0,>4000 Load number into R0
The result of the addition (to be placed in R0) is >8001, which is perfectly correct for unsigned values: 16384+16385 = 32769. However, if we were to consider these numbers as signed, we would get: 16384+16385 = -32767. The TMS9900 sets ovf to warn us about a potential problem in this case, as well as in similar situations.
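Carry and ovf are often confused, so here is a hedged Python sketch of a 16-bit addition that reports both. The names `add16` and `to_signed` are invented for the example; the overflow test used is the usual one (both operands have the same sign, the result has the other one).

```python
def to_signed(value):
    """Interpret a 16-bit value as signed."""
    return value - 0x10000 if value & 0x8000 else value

def add16(a, b):
    """Add two 16-bit values, returning (result, carry, overflow)."""
    total = a + b
    result = total & 0xFFFF          # only 16 bits fit in a word
    carry = int(total > 0xFFFF)      # the "17th bit"
    # Overflow: the sign bit of the result differs from the sign of both operands
    overflow = int(((a ^ result) & (b ^ result) & 0x8000) != 0)
    return result, carry, overflow

print(add16(0x0001, 0xFFFF))       # (0, 1, 0): wraps around, carry set
print(add16(0x4000, 0x4001))       # (32769, 0, 1): no carry, but overflow
print(to_signed(0x8001))           # -32767
```

Adding >0001 and >FFFF sets the carry but not ovf (as signed numbers, 1 + -1 = 0 is perfectly fine), whereas adding >4000 and >4001 does the opposite.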
Par stands for parity. It is only used by a limited number of opcodes. The microprocessor checks all the bits involved in these operations and counts how many of them were "1". If the result is odd, the Par bit will be set. This can serve as a transmission error control mechanism for instance, but is generally of little use to the average programmer.
Xop is set during the execution of a XOP instruction.
The interrupt mask tells the TMS9900 up to which priority level it can accept interrupts. In the TI-99/4A console, all interrupts are hardwired at level 2. Therefore only two cases are of importance for us:
In case you wonder, an interrupt is a hardware-triggered event that causes the TMS9900 to temporarily stop execution of the current program and to execute another program instead: the interrupt service routine (ISR). Once the ISR is completed, the TMS9900 will (hopefully) resume execution of the program where it was interrupted.
Registers are also meant to be used by the programmer: since they are located on chip they are faster than external memory. Furthermore, since there is only a limited number of registers, they can be encoded within the instruction number itself (cf the example above for NEG R0 and NEG R1). By contrast, if we wanted to negate an address in the external memory, we would write:
NEG @>2000 Negates word at address >2000
But now we need 16 bits to specify the address of the word to negate, obviously it cannot fit together with the opcode into a 16-bit instruction word! Therefore, this instruction will require an extra word and will be encoded as: >0520 >2000 (the >0020 in >0520 indicates that there is an extra word in this instruction).
With instructions that deal with two operands (like the addition) it may be necessary to have an extra word for each of them. Therefore a TMS9900 instruction can be 1, 2 or 3 words long.
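The following Python sketch shows the word count at work. It is only an illustration under the encoding just described (>0500 for NEG, >0020 to flag an extra address word); the function names are hypothetical.

```python
NEG = 0x0500     # base number for NEG
DIRECT = 0x0020  # flag: "the address follows in an extra word"

def encode_neg_register(register):
    """NEG Rn fits in a single word."""
    return [NEG | register]

def encode_neg_direct(address):
    """NEG @address needs a second word to hold the address."""
    return [NEG | DIRECT, address]

def length_in_words(*operand_needs_extra_word):
    """One word for the instruction, plus one per operand that carries an address."""
    return 1 + sum(1 for needs in operand_needs_extra_word if needs)

print([hex(w) for w in encode_neg_direct(0x2000)])  # ['0x520', '0x2000']
print(length_in_words(True, True))                  # 3, e.g. an addition between two addresses
```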
Now how many registers are there for us to use inside the TMS9900? Well, none... or 16... or as many as we want!
Let me explain that. Instead of placing registers inside the microprocessor, the TI designers decided that registers will be in the external memory. The TMS9900 only contains a workspace pointer, i.e. a register that contains the address of 16 pseudo-registers in the external memory.
This kind of defeats the principle of a register: it will take as much time to access a register as to access a memory address... since registers are memory addresses. Well, not quite. We saw above that an instruction requires an extra word to use an operand in the external memory. Thus, by using a register we make the program shorter, and we also make it faster: we save the time that would be required to read that extra word.
In addition, if we change the value of the workspace pointer, we automatically start to work with a fresh set of 16 registers. If we change it back to what it was before, we'll find our previous registers with their original content. This is very useful for context switching, i.e. to switch from one task to another. This is what occurs during the execution of an interrupt for instance: the TMS9900 automatically switches to a workspace placed at >83C0 and "remembers" where the workspace used by the main program was. This makes it possible to return from the interrupt with an undisturbed set of registers.
You can also cause such a context switch programmatically, using instructions like BLWP (Branch and Load a new Workspace Pointer), LWPI (Load Workspace Pointer from Immediate value) and RTWP (ReTurn using old Workspace Pointer).
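Here is a toy Python model of the idea, in case it helps. Register n simply lives at the address "workspace pointer + 2 × n". The class name and the main workspace address >8300 are assumptions made for the example; only >83C0 comes from the text above.

```python
class ToyCpu:
    """Minimal model: 16 registers located wherever the workspace pointer says."""

    def __init__(self, wp):
        self.memory = {}  # address -> 16-bit word
        self.wp = wp      # workspace pointer

    def reg_address(self, n):
        # Each register is one word (2 bytes) wide
        return self.wp + 2 * n

    def write_reg(self, n, value):
        self.memory[self.reg_address(n)] = value & 0xFFFF

    def read_reg(self, n):
        return self.memory.get(self.reg_address(n), 0)

cpu = ToyCpu(wp=0x8300)     # assumed workspace of the main program
cpu.write_reg(1, 0x1234)
cpu.wp = 0x83C0             # context switch: a fresh set of registers
cpu.write_reg(1, 0xBEEF)    # does not disturb the main program's R1
cpu.wp = 0x8300             # switch back
print(hex(cpu.read_reg(1)))  # 0x1234
```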
If you followed the above, you will have understood by now that a given instruction can address memory in two different ways: as a register (i.e. an offset with respect to the workspace pointer) or as an absolute address. These different ways are known as "addressing modes". There is a total of eight addressing modes for the operands of the TMS9900 instructions, but out of these three are reserved for sets of special instructions.
Immediate operands: e.g. LI R0,12
Register addressing: e.g. CLR R1
Indirect register addressing: e.g. CLR *R1
Auto-incrementing addressing: e.g. CLR *R1+
Direct addressing: e.g. CLR @>2000
Indexed addressing: e.g. CLR @>2000(R1)
PC relative addressing: e.g. JMP $+4
CRU relative addressing: e.g. SBO 0
These only apply to a restricted set of instructions. An immediate value is a number that immediately follows the instruction in the program. This value is to be used as an operand.
We already encountered one of these: the LI instruction that appeared in many examples above:
LI R0,>0001 Loads number 1 into R0
In machine language this corresponds to:
>0200
>0001.
Where >020x encodes the LI opcode, >xxx0 encodes register 0 and the extra word >0001 encodes the immediate value.
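As a quick check of that encoding, here is a tiny Python sketch. The helper name `encode_immediate` is made up; the base numbers >0200 (LI) and >0220 (AI) are the ones listed in the opcode table on this page.

```python
LI = 0x0200  # load immediate
AI = 0x0220  # add immediate

def encode_immediate(base, register, value):
    """Return the two words of a register,immediate instruction."""
    # The register number goes in the last digit of the first word;
    # the immediate value is stored as-is in the following word.
    return [base | register, value & 0xFFFF]

print([hex(w) for w in encode_immediate(LI, 0, 1)])    # ['0x200', '0x1']
print([hex(w) for w in encode_immediate(AI, 1, -12)])  # ['0x221', '0xfff4']
```

Note how the negative value -12 ends up as >FFF4: that's the signed convention at work.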
The "immediate" opcodes are:
LI R1,123 (load immediate value)
AI R1,-12 (add immediate value)
ANDI R1,>1FFF (and immediate value)
ORI R1,>8040 (or immediate value)
CI R1,25 (compare with immediate value)
LWPI >83E0 (load workspace pointer with immediate value)
LIMI 2 (load interrupt mask with immediate value)
As you can see, they all end with letter I (and no other opcode does). They are thus pretty easy to recognize and to remember.
Most have two operands. The first one must be a register (no indirect, no indexing, just a plain register addressing). LWPI and LIMI have only one operand because the register is implicitly defined by the instruction: the Workspace pointer register and the Status register of the TMS9900, respectively.
The second operand must be an immediate value: it's a requirement, not a choice.
By contrast, other opcodes generally have the choice between the 5 other addressing modes. There are a few exceptions to this rule though: some opcodes can only deal with registers for no apparent reason. That's because the TI designers ran out of numbers to encode all these possibilities in machine language.
NEG R0 Negates the content of R0
This one is already an old friend: the operand is simply one of the sixteen registers in the current workspace. They are abbreviated R0 through R15.
Note that the "R" is optional with the TI assembler. In fact, to use the "R" you must set a special option when launching the assembler. I didn't know that when I learned assembly (using a hacked assembler, with no manual) and I therefore got the habit of not using the R. I had a hard time remembering to add it while I was typing the examples for these pages...
NEG *R1 Negates the content of the word pointed at by R1
Here the register is not the final target of the instruction. Instead, it contains the address of the target (i.e. it is a pointer to it).
In the above example, if R1 contains >A123 the instruction will result in negating the word at address >A123.
This allows you to operate on different addresses according to the situation, even though the exact address could not be predicted by the time you wrote the program.
NEG *R1+ Negates the content of the word pointed at by R1
This is almost exactly the same as the above, with one exception: once the instruction is completed, the content of R1 will be incremented (by two in this case, because we are dealing with words).
That comes in extremely handy for copy loops, for instance:
LI R1,>2000 We want to copy this memory area
L1 MOV *R1+,*R2+ Copy one word
Do you see how it works?
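If not, this Python sketch mimics what a loop around MOV *R1+,*R2+ does. The destination address >3000 and the word count are assumptions chosen for the example, and `copy_words` is a hypothetical name.

```python
def copy_words(memory, source, dest, count):
    """Copy `count` words the way repeated MOV *R1+,*R2+ would."""
    r1, r2 = source, dest
    for _ in range(count):
        memory[r2] = memory.get(r1, 0)  # MOV *R1,*R2: copy the word pointed at
        r1 += 2                         # ...then both pointers move on by one word
        r2 += 2
    return r1, r2

memory = {0x2000: 0x1111, 0x2002: 0x2222, 0x2004: 0x3333}
r1, r2 = copy_words(memory, source=0x2000, dest=0x3000, count=3)
print(hex(memory[0x3004]), hex(r1), hex(r2))  # 0x3333 0x2006 0x3006
```

The pointers take care of themselves: after each word, R1 and R2 already point at the next one.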
A.k.a. Symbolic addressing
NEG @>2000 Negates the word at address >2000
That one is easy to understand: you just use the memory address of the word you want to target. Optionally, you could designate this word with a label, hence the name "symbolic":
NEG @COUNTER Assuming you define COUNTER as >2000
Finally, most assemblers will let you enter arithmetic expressions like:
NEG @COUNTER+4 Negates the word at address COUNTER + 4
The assembler calculates the value of COUNTER+4 (in our case >2004) at assembly time and encodes the instruction as NEG @>2004. One more example of the useful things an assembler can do for you.
NEG @>2000(R1) Negates the word at address: >2000 plus the value of R1
This one kind of combines @>2000 with *R1: the target address is calculated by adding the direct (a.k.a. symbolic) address and the content of R1.
For instance, if R1 contains 2, the above instruction negates >2002.
If R1 contains >0548, it negates >2548 etc.
Note that R0 cannot be used as an index (this is because NEG @>2000 is in fact encoded as NEG @>2000(R0), the TMS9900 knows that in this case it must not use R0).
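A short Python sketch of both points: how the target address is obtained, and why R0 is off limits. The function names are hypothetical, and the encoding follows the >0520 pattern seen earlier for NEG @>2000.

```python
def effective_address(base, index_value):
    """Target of @base(Rn): base plus the content of the index register."""
    return (base + index_value) & 0xFFFF  # addresses are 16 bits wide

def encode_neg_indexed(base, index_register):
    """Encode NEG @base(Rn). Register 0 would read as plain NEG @base."""
    if index_register == 0:
        raise ValueError("R0 cannot be used as an index register")
    return [0x0500 | 0x0020 | index_register, base]

print(hex(effective_address(0x2000, 2)))       # 0x2002
print(hex(effective_address(0x2000, 0x0548)))  # 0x2548
print([hex(w) for w in encode_neg_indexed(0x2000, 1)])  # ['0x521', '0x2000']
```

With register 0 the first word would be >0520, which is exactly the number for NEG @>2000: the two cannot be told apart.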
This addressing mode is useful in two situations:
LI R1,8 Loads the number 8 into R1
NEG @TABLE(R1) Negates the word at address TABLE plus 8
In this case, the symbolic part of the operand refers to a table of values you placed in memory. R1 represents an index inside that table (hence the name of this addressing mode). Of course, we could have written NEG @TABLE+8, but in the above example, the content of R1 can vary. Therefore, the NEG instruction can apply to one or the other word in the table, according to the value of R1.
The second situation is the "mirror image" of the first one:
LI R1,LIST Makes R1 a pointer to LIST
NEG @8(R1) Negates the word 8 bytes after LIST
It does the same job as the above: which one you want to use is mainly a matter of taste. Sometimes you prefer to think "I'm targeting the 8th entry in the table" and sometimes you like better "I'm dealing with the 8th byte after the current position in the list".
There are two more addressing modes that are only used by well defined sets of opcodes (just like immediate operands are).
This one is used exclusively by jump instructions:
JMP (unconditional jump)
JEQ (Jump if Equal, i.e. if the Equ bit is 1 in the status register)
JNE (Jump if Not Equal, i.e. if the Equ bit is 0)
JGT (Jump if Greater Than)
JH (Jump if Higher)
JHE (Jump if Higher or Equal)
JL (Jump if Lower, i.e. if the "High" and the "Equ" bits are 0)
JLE (Jump if Lower or Equal)
JLT (Jump if Less Than, i.e. if the "GT" and the "Equ" bits are 0)
JNC (Jump if No Carry)
JOC (Jump On Carry)
JNO (Jump if No Overflow)
JOP (Jump on Odd Parity, i.e. if the "Par" bit is set)
Again these are easy to recognise: they all begin with "J" (no other opcode does).
You will note however that some are missing: what about JLTE and JGTE? And why isn't there a "jump if overflow" or a "jump on even parity"? Again, that's because the machine language is running out of valid numbers.
But the missing ones are the logical complements of existing instructions. Therefore, to perform the equivalent of "Jump if Less Than or Equal" you could invert the comparison and use "JGT":
C R1,R4 Compare R1 to R4 C R4,R1 Do the inverse comparison
Similarly, you could compensate for the missing "jump if overflow" by changing the structure of your program:
A R1,R4 Add R1 to R4 A R1,R4 Let's try again
The main advantage of jump instructions is that the destination can be encoded together with the opcode. How is this possible? Didn't I just mention that a memory address uses up 16 bits and therefore always requires an extra word in the program?
Yes, but you see, jumps do not use memory addresses. Instead they jump to a given distance from the current instruction: plus or minus 127 words. This makes it possible to encode all jump values with a single byte and leaves the other 8 bits for the opcode. BUT it means that we cannot jump everywhere in the program: the destination address must be within the 127 words limit (actually it's 128 words forwards and 127 backwards). Here again, the assembler will calculate this for you and issue an error message if the jump cannot be encoded. This "out of range" message is one of the most frequent you will encounter...
There are two ways to specify a jump in assembly. First you can type the displacement yourself:
C R1,R0 Compare R1 to R0
The $ sign stands for "current value of the program counter". The assembler then knows that the offset you specified is relative to the current instruction. (FYI: in machine language jumps are WORDS relative to the NEXT instruction: therefore a JMP $+4 is encoded as >1001, not as >1004. But the assembler takes care of this for you).
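For the curious, here is roughly the calculation the assembler performs, as a Python sketch. The function name and the address >2000 are made up for the example; the opcode base >1000 for JMP is the one listed in the opcode table on this page.

```python
JMP = 0x1000  # opcode base for the unconditional jump

def encode_jump(opcode_base, current_address, target_address):
    """Encode a jump located at current_address that lands on target_address."""
    # The displacement is counted from the NEXT instruction, 2 bytes further on
    offset_bytes = target_address - (current_address + 2)
    if offset_bytes % 2:
        raise ValueError("jump target must be on a word boundary")
    displacement = offset_bytes // 2  # bytes to words
    if not -128 <= displacement <= 127:
        raise ValueError("jump out of range")
    # The signed displacement occupies the low byte of the instruction word
    return opcode_base | (displacement & 0xFF)

print(hex(encode_jump(JMP, 0x2000, 0x2004)))  # 0x1001, i.e. JMP $+4
print(hex(encode_jump(JMP, 0x2000, 0x2000)))  # 0x10ff, i.e. a jump to itself
```

The second case shows where the asymmetry in the range comes from: jumping to the current instruction already costs one word of backward displacement.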
In the above example, it's relatively easy to figure that JLT $+4 jumps over the JMP. But who wants to count 120 bytes upwards to figure out where the JMP is going to land?
Therefore, the assembler lets you use a label to specify the destination. The value of this label is most simply defined by placing it in the left margin of the target instruction: the assembler automatically assigns the current value of the program counter (i.e. the memory location of the opcode) to such labels.
JNE OK Perform some kind of test
Note that contrary to branching instructions, you must not include the @ sign before a label in the case of a jump (as opposed to B @OK). This is useful to remind you that you are restricted to a limited range of motion.
The CRU (Communication Register Unit) is a special hardware trick that allows the TMS9900 to communicate with pieces of equipment without using the data bus. Conceptually, it could be viewed as a set of 4096 wires linking the microprocessor to peripherals (in fact it uses only three lines, plus the address bus, but that's not the point).
You can also view it as a set of 4096 bits, one per "wire". By reading a bit you can test the status of a wire, by writing to a bit you can change the status of a wire. Provided of course that the peripheral in question is designed to allow these operations: there could be read-only bits, write-only bits and not-used bits!
There are five CRU operations:
TB (Test Bit: transfers the bit into the Equ bit of the status register)
SBO (Set Bit to One)
SBZ (Set Bit to Zero)
LDCR (Load CRU: allows to modify up to 16 bits at a time)
STCR (Store CRU: allows to read up to 16 bits at a time)
LI R12,>1300 CRU base of the RS232 card
Note that TB is somewhat misleading: JEQ jumps if the bit is 1 and JNE if the bit is zero. That's because the bit is copied into the Equ bit (that normally stands for "equals zero" in automatic comparisons).
So what's the use of the LI R12,>1300 instruction? Well, instead of numbering the bits from zero, the TMS9900 numbers them from the value found in R12, divided by two (because the last address line does not exist on a word-oriented microprocessor, we cannot encode odd addresses).
That's useful when dealing with peripherals: each card is assigned a chunk of 128 bits at addresses starting at >1000, >1100, >1200, etc. up to >1F00 (addresses >0000 to >0FFF are internal to the console).
If you want to turn on the disk controller card, whose CRU address is >1100, you may do:
LI R12,>0000
But that's kinda hard to remember, right?
Now you could do the same with:
LI R12,>1100
Which is somewhat easier to use: the bits are relative to the card, not to the whole CRU address space.
And when it comes to LDCR and STCR, you don't even have a choice: you must use the second style since you can't specify the starting bit in the instruction (it's always zero).
LI R12,>1102
Because >1102 accesses bit 1 of the disk controller card (remember: the content of R12 is the bit number times 2), the above example affects the values of bits 1, 2 and 3 of the controller card.
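The "divide R12 by two" business can be sketched in a few lines of Python. The function names are mine, and the 3-bit transfer is just the case discussed above; the sketch only computes which CRU bits get touched, not what a peripheral does with them.

```python
def cru_bit(r12, displacement=0):
    """Absolute CRU bit reached with a given R12 value and bit displacement."""
    return (r12 >> 1) + displacement  # R12 holds the bit number times 2

def multi_bit_range(r12, count):
    """CRU bits touched by LDCR/STCR, which always start at displacement 0."""
    first = cru_bit(r12)
    return list(range(first, first + count))

DISK_BASE = 0x1100  # CRU address of the disk controller card
print(hex(cru_bit(DISK_BASE)))  # 0x880
# Bits of a 3-bit transfer with R12 = >1102, numbered relative to the card:
print([bit - cru_bit(DISK_BASE) for bit in multi_bit_range(0x1102, 3)])  # [1, 2, 3]
```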
Note that with LDCR and STCR the only valid addressing mode is register (that's one of those cases where numbers were missing...). The bits are always taken/stored from the right to the left of the register. If the operation deals with 1 to 8 bits, only the left byte of the register will be used: a byte-oriented operation is assumed. With 9 to 16 bits, the whole register is accessed.
For a more complete explanation of the CRU, see here.
The TMS9900, and the TI-99/4A that is built around it, is a word-oriented, byte-addressable processor.
What this means is that the processor mainly deals with 16-bit words, but that these words are stored in a memory that is addressed as bytes. It may seem silly, but that's what happens in most computer systems: even a 32-bit Pentium still uses a byte-addressable memory.
With 16 bits we can encode numbers from 0 to 65535, which means that the size of the memory has to be at most 64K bytes if we want to be able to deal with an address as a single value. The memory could be twice as large if it were word-addressable, but that's not the case.
For the processor, it means that the least significant address line (A15 in the TI numbering convention) is perfectly useless: it just represents the byte inside the word. Therefore, the TMS9900 has only 15 address lines: A0 through A14.
Nevertheless, the CPU can deal with byte values if necessary. It just uses an internal trick: when you write a value to a byte, the processor reads the whole word, modifies the byte you were accessing, and writes the word back. There is a bunch of opcodes that allow such byte-oriented operations: AB, SB, MOVB, CB, SOCB, SZCB and in some cases LDCR and STCR.
When the operand is a register, the only byte that you can access in this manner is the leftmost one, the most significant. That's the byte stored in memory at an even address. When using direct addressing, you can specify either an odd or an even address for byte operations.
On the other hand word operations always require an even address, since the 16-bit value is transferred as such on the data bus (and the address is always even since A15 is missing).
LI R1,>0104 Loads >0104 into R1
Just remember: for word operations always use an even address.
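Here is a little Python model of word versus byte access, for illustration only. It assumes that the most significant byte of a word sits at the even address and that the lowest address bit is simply ignored for word operations (which is what the missing A15 line implies); the function names are hypothetical.

```python
memory = bytearray(0x10000)  # 64K bytes, addressed one byte at a time

def write_word(address, value):
    """Store a 16-bit word; the lowest address bit is ignored."""
    address &= 0xFFFE
    memory[address] = (value >> 8) & 0xFF  # most significant byte, even address
    memory[address + 1] = value & 0xFF     # least significant byte, odd address

def read_word(address):
    """Read a 16-bit word; the lowest address bit is ignored."""
    address &= 0xFFFE
    return (memory[address] << 8) | memory[address + 1]

def read_byte(address):
    """Byte operations may use odd or even addresses."""
    return memory[address]

write_word(0x2000, 0x0104)
print(read_byte(0x2000), read_byte(0x2001))  # 1 4
print(hex(read_word(0x2001)))                # 0x104, same word as at >2000
```

In this model, a word operation at an odd address quietly lands on the even address just below it, which is rarely what you intended.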
For some mysterious reason, the TI-99/4A designers decided to cripple their machine by using an 8-bit data bus to access most peripherals. Only the console ROMs and the 256 bytes of console RAM known as the "scratch-pad" are accessed via a 16-bit data bus.
For the rest of the memory, special circuitry in the console multiplexes the data bus and passes it as twice 8 bits. This requires the creation of an extra address line, A15, which indicates which byte is currently accessed. We therefore end up with a 16-bit address bus. Note that A15 is multiplexed with the CRU data output line, but this is not a problem during memory operations since this CRU line will be inactive.
The big drawback is that memory access is now twice as slow, and the multiplexer circuit has to put the TMS9900 on hold until the transfer is completed!
Branching opcodes
Arithmetic opcodes
Bitwise logic opcodes
Shift opcodes
CRU opcodes
Various opcodes
Forbidden opcodes
Opcode | Value | Fmt | Operands | Status | |||
---|---|---|---|---|---|---|---|
A | >A000 | I | source,dest | HGECO | |||
AB | >B000 | I | source,dest | HGECOP | |||
ABS | >0740 | VI | dest | HGECO | |||
AI | >0220 | VIII | reg,immed | HGECO | |||
ANDI | >0240 | VIII | reg,immed | HGE | |||
B | >0440 | VI | dest | - | |||
BL | >0680 | VI | dest | - | |||
BLWP | >0400 | VI | dest | - | |||
C | >8000 | I | source,dest | HGE | |||
CB | >9000 | I | source,dest | HGE P | |||
CI | >0280 | VIII | reg,immed | HGE | |||
CKOF | >03C0 | VII | - | - | |||
CKON | >03A0 | VII | - | - | |||
CLR | >04C0 | VI | dest | - | |||
COC | >2000 | III | source,reg | E | |||
CZC | >2400 | III | source,reg | E | |||
DEC | >0600 | VI | dest | HGECO | |||
DECT | >0640 | VI | dest | HGECO | |||
DIV | >3C00 | IX | source,reg2 | O | |||
IDLE | >0340 | VII | - | - | |||
INC | >0580 | VI | dest | HGECO | |||
INCT | >05C0 | VI | dest | HGECO | |||
INV | >0540 | VI | dest | HGE | |||
JEQ | >1300 | II | PC-rel | E=1 | |||
JGT | >1500 | II | PC-rel | G=1 | |||
JH | >1B00 | II | PC-rel | H=1 E=0 | |||
JHE | >1400 | II | PC-rel | H=1 E=1 | |||
JL | >1A00 | II | PC-rel | H=0 E=0 | |||
JLE | >1200 | II | PC-rel | H=0 E=1 | |||
JLT | >1100 | II | PC-rel | G=0 E=0 | |||
JMP | >1000 | II | PC-rel | Always | |||
JNC | >1700 | II | PC-rel | C=0 | |||
JNE | >1600 | II | PC-rel | E=0 | |||
JNO | >1900 | II | PC-rel | O=0 | |||
JOC | >1800 | II | PC-rel | C=1 | |||
JOP | >1C00 | II | PC-rel | P=1 | |||
LDCR | >3000 | IV | source,nbits | HGECOP | |||
LI | >0200 | VIII | reg,immed | HGE | |||
LIMI | >0300 | VIII | immed | - | |||
LREX | >03E0 | VII | - | - | |||
LWPI | >02E0 | VIII | immed | - | |||
MOV | >C000 | I | source,dest | HGE | |||
MOVB | >D000 | I | source,dest | HGE P | |||
MPY | >3800 | IX | source,reg2 | - | |||
NEG | >0500 | VI | dest | HGECO | |||
ORI | >0260 | VIII | reg,immed | HGE | |||
RSET | >0360 | VII | - | - | |||
RTWP | >0380 | VII | - | All | |||
S | >6000 | I | source,dest | HGECO | |||
SB | >7000 | I | source,dest | HGECOP | |||
SBO | >1D00 | II | bit | - | |||
SBZ | >1E00 | II | bit | - | |||
SETO | >0700 | VI | dest | - | |||
SLA | >0A00 | V | reg,count | HGECO | |||
SOC | >E000 | I | source,dest | HGE | |||
SOCB | >F000 | I | source,dest | HGE P | |||
SRA | >0800 | V | reg,count | HGEC | |||
SRC | >0B00 | V | reg,count | HGEC | |||
SRL | >0900 | V | reg,count | HGEC | |||
STCR | >3400 | IV | dest,nbits | HGE P | |||
STST | >02C0 | VIII | reg | - | |||
STWP | >02A0 | VIII | reg | - | |||
SWPB | >06C0 | VI | dest | - | |||
SZC | >4000 | I | source,dest | HGE | |||
SZCB | >5000 | I | source,dest | HGE P | |||
TB | >1F00 | II | bit | E | |||
X | >0480 | VI | dest | depends | |||
XOP | >2C00 | IX | source,xop# | X | |||
XOR | >2800 | III | source,reg | HGE | |||
Illegal | >0000-01FF, >0320-033F, >0780-07FF,>0C00-0FFF |
These opcodes allow you to control the flow of your program, by jumping from a point to another, possibly remembering where to come back.
We already discussed the jump opcodes. They are very useful since they are conditional: your program can make decisions and follow one or the other path of execution according to the value of a given status bit.
The first one is very easy to understand: B just loads the value of the destination operand into the program counter register of the TMS9900. As a result, the program execution continues from that address. Note that, unlike jumps, B has no range limitation: the whole 64K range is within reach. This is because the address is passed as an extra word after the opcode.
Status bits affected: none
Branch-and-Link does exactly the same thing, except that it remembers where it came from: the address of the next instruction (the one that would be executed if the branch were not taken) is placed in register R11. Then the program execution continues from the specified address: in this case it is marked by a label called "THERE".
Status bits affected: none
How to return from such a branch? Very easy: just use a B instruction with R11 addressed as an indirect operand. Like this:
B *R11 Which can also be abbreviated as: |
Here we are just telling the TMS9900: branch to the address that you will find in R11. Which happens to be the return address, since BL automatically placed it there.
Of course, you should carefully preserve this return address. If you plan to use R11, make sure you first transfer its content into another register or memory location. A very common mistake is to place another BL within a function called with a BL: the new return point will overwrite the old one.
BL @HERE Branch at "HERE", place address "BACK" into R11 HERE BL @THERE Incorrect example: branch at "THERE" and place "DONE" into R11 * Now here is the correct example HERE MOV R11,R10 Save return point |
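To make the R11 pitfall concrete, here is a tiny Python model of what BL does to the register file. It is only a sketch: the `bl` function, the `regs` list and the numeric addresses standing in for the labels BACK and DONE are all made up for the illustration.

```python
def bl(regs, return_address, target):
    """Model of BL: store the return address in R11, then continue at target."""
    regs[11] = return_address
    return target


BACK, DONE = 0xA004, 0xA104       # hypothetical return addresses

# Incorrect: the nested BL overwrites the first return address.
regs = [0] * 16
bl(regs, BACK, 0xA100)            # caller: BL @HERE
bl(regs, DONE, 0xA200)            # inside HERE: BL @THERE
lost = regs[11]                   # holds DONE, BACK is gone for good

# Correct: save R11 before calling another routine.
regs = [0] * 16
bl(regs, BACK, 0xA100)            # caller: BL @HERE
regs[10] = regs[11]               # MOV R11,R10
bl(regs, DONE, 0xA200)            # inside HERE: BL @THERE
saved = regs[10]                  # BACK is still there for a B *R10
```

In the first run there is no way back to the original caller; in the second one, the copy in R10 survives the nested call.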
Branch-and-Load-Workspace-Pointer is slightly trickier: here we want to branch, but also to change the workspace. To be able to return to the calling point, we must not only remember the return address (i.e. the content of the PC register), but also the location of the old workspace (i.e. the content of the WP register). And while we are at it, we may also record the value of the status register. This way, when we come back, we can restore the situation just as it is now.
You may think that BLWP is called with two operands, one for the address to branch at, the other for the new workspace. If that's the case, you came close but no cigar! BLWP has only one operand, a pointer: it contains the address of a pair of words that contain the address to branch at and the new workspace. It is more flexible this way since you only have to know one value (the pointer) to perform the branch.
BLWP @WHERE WHERE DATA >8300 New workspace to be used HERE MOV R0,R1 The BLWP branches here |
So where are the three values we wanted to remember upon branching? The pointer to the old workspace is automatically placed into R13, the return address (the instruction that follows the BLWP) into R14, and the current value of the status register into R15.
Here also, we should be careful not to overwrite them with our data. However, unlike BL, there is no risk of erasing them with a second BLWP, since that one will save its return values into R13-R15 of the NEW workspace. That's a very neat feature: it means that BLWPs can be nested as deep as we want! The drawback is that this operation is fairly slow, as the TMS9900 must do a lot of memory transfers to complete it...
Status bits affected: none
Now how do we return to the caller and restore the three TMS9900 registers? There is a dedicated instruction for that purpose: Return-with-Workspace-Pointer (RTWP). Upon execution it will branch to the address found in R14, with the workspace specified in R13, and place the content of R15 in the status register. And we have just restored the situation as it was before the BLWP was taken.
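The BLWP/RTWP pair is easier to follow with a model in front of you. The Python sketch below is only an illustration: the `CPU` class, the `memory` dictionary (addresses mapped to words) and the example addresses are assumptions of mine, and `cpu.pc` is assumed to already point at the instruction following the BLWP.

```python
class CPU:
    def __init__(self, wp, pc, st):
        self.wp, self.pc, self.st = wp, pc, st


def reg_address(wp, n):
    """Workspace registers live in memory: Rn sits at WP + 2*n."""
    return wp + 2 * n


def blwp(cpu, memory, vector):
    new_wp = memory[vector]                      # first word: new workspace
    new_pc = memory[vector + 2]                  # second word: address to branch to
    memory[reg_address(new_wp, 13)] = cpu.wp     # old WP into new R13
    memory[reg_address(new_wp, 14)] = cpu.pc     # return address into new R14
    memory[reg_address(new_wp, 15)] = cpu.st     # old status into new R15
    cpu.wp, cpu.pc = new_wp, new_pc


def rtwp(cpu, memory):
    wp = cpu.wp                                  # read everything from the current workspace
    cpu.st = memory[reg_address(wp, 15)]
    cpu.pc = memory[reg_address(wp, 14)]
    cpu.wp = memory[reg_address(wp, 13)]


memory = {0xA000: 0x8300, 0xA002: 0xA010}        # hypothetical vector: workspace, address
cpu = CPU(wp=0x83E0, pc=0xA104, st=0x2000)
blwp(cpu, memory, 0xA000)
inside = (cpu.wp, cpu.pc)                        # now running in the new context
rtwp(cpu, memory)
restored = (cpu.wp, cpu.pc, cpu.st)              # back where we started
```

Since each BLWP saves into R13-R15 of its own new workspace, nesting calls with different workspaces never clobbers an earlier return context.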
That's an ideal feature for a multi-tasking operating system. Too bad there isn't one around... Actually, there is one multi-tasking feature in the TI-99/4A: the interrupt service routine. Upon reception of an interrupt, the TMS9900 performs an implicit BLWP @>0004, i.e. uses the values found at >0004 and >0006 in the console ROM as workspace (>83C0) and address (>0900). At this address you'll find the interrupt service routine, the subprogram that handles all the interrupts. This subprogram returns to the main program with a simple RTWP and the main program never knows it was interrupted.
Now, if you have an "assembly freak" mind, you'll probably have thought of an alternative use for RTWP: changing the content of the registers in the TMS9900. All we have to do is to load values into R13, R14 and R15 and execute a RTWP.
LI R13,>83E0 This will be the new workspace LWPI >83E0 We could alter the workspace this way |
This trick is often used to return to a different address in case an error occurred within a procedure: just place the address of the error handling routine into R14 and the next RTWP will branch to it. It is good practice to save the previous value of R14 somewhere, in case the error handling routine needs it after all.
Extended operations are implicit BLWPs that use vectors located at precise addresses in the console ROMs:
XOP 0 uses addresses >0040 and >0042 (workspace >280A, address >0C1C)
XOP 1 uses addresses >0044 and >0046 (workspace >FFD8, address >FFF8)
XOP 2 uses addresses >0048 and >004A (values may vary according to the ROM version), etc
XOPs constitute an advanced feature and are discussed in the TMS9900 page.
While we are talking about branching and calling subroutines, I would like to briefly describe the three classical ways for a subroutine to get parameters from the calling program: global variables, registers and immediate data.
* This snippet illustrate three ways to pass parameters CLR @COUNTER Place the value >0000 into the word at address COUNTER SUB1 MOV @COUNTER,R2 Use the value in COUNTER: global parameter |
It's not considered good programming practice to make too wide a use of global parameters. You should reserve them for critical values that are needed by many subroutines in your program: using them as globals avoids having to pass them in a register every time.
Whether to use a register or a data word largely depends on what kind of information is passed by this parameter. If it can contain a wide range of values that vary often, better use a register. If it just serves to differentiate between a few situations, use a data word: this will save you a register, and a word in memory (the one occupied by the LI instruction in the caller).
For instance: a routine that places a character somewhere onscreen should probably pass the character and the screen position in registers. This will make it more versatile. By contrast, a routine used to display predefined error messages may well use data words, as there won't be that many messages and they will probably always be displayed at the same address.
Note the way the data word is retrieved in the subroutine: it automatically increments R11 by two, which will skip the data word upon return.
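Here is a Python sketch of that data-word trick, assuming the subroutine fetches the word with an auto-incremented indirect access through R11 (something like MOV *R11+,R0). The `fetch_inline_word` helper, the `memory` dictionary and the addresses are hypothetical, just there to show the mechanics.

```python
def fetch_inline_word(memory, regs):
    """Read the word R11 points at, then step R11 past it (auto-increment)."""
    value = memory[regs[11]]
    regs[11] = (regs[11] + 2) & 0xFFFF    # R11 now points after the data word
    return value


# Hypothetical layout: BL at >A000 (two words), its DATA word at >A004.
memory = {0xA004: 0x0002}
regs = [0] * 16
regs[11] = 0xA004                         # what BL left in R11
parameter = fetch_inline_word(memory, regs)
return_address = regs[11]                 # >A006: the instruction after the DATA
```

The same read both delivers the parameter and fixes the return address, so the subroutine can never return into the middle of its own parameter.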
The corresponding operations for a subroutine called with BLWP would be the following:
* This snippet illustrate three ways to pass parameters CLR @COUNTER Place the value >0000 into the word at address COUNTER SUB4 DATA WREGS,SUB40 Use this workspace and this address SUB40 MOV @COUNTER,R2 Use the value in COUNTER: global parameter |
Did you get the trick with R13? Since R13 contains the address of the old workspace, *R13 points at the old R0, @2(R13) at the old R1, @4(R13) at the old R2, etc.
If the subroutine wants to return a value, it can place it in a register, or in a global variable. Generally, one does not return values in a DATA statement (because the program could be placed in ROM, in which case the data word won't take the new value). On the other hand, the following trick is often used to indicate a special condition upon return:
BL @SUB2 Call a subroutine SUB2 CI R0,>0111 Compare R0 to >0111 |
SUB2 indicates to the caller that everything went OK by skipping the JMP upon return. Another way to do it would be to place the test in the caller, since the B instruction does not affect the status register:
BL @SUB3 Call a subroutine SUB3 CI R0,>0111 Compare R0 to >0111 |
Now what about routines called with BLWP? The first solution is identical, just replace R11 with R14:
BLWP @SUB8 Call a subroutine SUB8 DATA WREGS,SUB80 New workspace and branching address SUB80 CI R0,>0111 Compare R0 to >0111 |
The second solution is trickier since the RTWP discards the content of the status register and replaces it with R15. We therefore have to store the status in R15 beforehand (or some value of our choice).
BLWP @SUB9 Call a subroutine SUB90 DATA WREGS,SUB80 New workspace and branching address SUB80 CI R0,>0111 Compare R0 to >0111 |
Finally, let me remind you that you can change the return address by loading any value in R11 (or R14 for RTWP).
BL @ENTRY1 Call a procedure ENTRY2 LI R11,ERROR Special entry point: always return to error |
The following opcodes can be used to perform integer math. Unless otherwise indicated, the source and destination operands can be accessed in any of the five main addressing modes: Rx, *Rx, *Rx+, @xxxx and @xxxx(Rx).
MOV copies the content of the source operand into the destination operand and compares it to zero.
Status bits affected: High, Gt, Equ
MOVB is the same as MOV, but affects only the leftmost (most significant) byte of the operands.
Status bits affected: High, Gt, Equ, Parity
LI loads the immediate value into the destination register and compares it to zero.
Status bits affected: High, Gt, Equ
A adds the content of the source operand to the destination operand and compares the result to zero.
Status bits affected: High, Gt, Equ, Carry, Ovf
AB is the same as A, but affects only the leftmost (most significant) byte of the operands.
Status bits affected: High, Gt, Equ, Carry, Ovf, Parity
AI adds an immediate value to the destination register.
Status bits affected: High, Gt, Equ, Carry, Ovf
NB. There is no SI (subtract immediate) instruction, but you can use AI with negative values: AI R0,-5
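Why does adding a negative value work as a subtraction? Because the negative number is stored in two's complement, and the addition wraps around at 16 bits. A quick Python sketch (the `to_word` and `ai` helpers are my own names, not TI's):

```python
def to_word(value):
    """Two's complement encoding of a value in a 16-bit word."""
    return value & 0xFFFF


def ai(register, immediate):
    """Model of AI: 16-bit addition that wraps around."""
    return (register + to_word(immediate)) & 0xFFFF


print(hex(to_word(-5)))     # the word the assembler stores for -5
print(ai(0x000A, -5))       # 10 - 5
print(hex(ai(3, -5)))       # 3 - 5 = -2, in two's complement
```

So AI R0,-5 is really AI R0,>FFFB: the carry out of the 16th bit is simply lost from the result.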
S subtracts the content of the source operand from the destination operand and compares the result to zero.
Status bits affected: High, Gt, Equ, Carry, Ovf
SB is the same as S, but affects only the leftmost (most significant) byte of the operands.
Status bits affected: High, Gt, Equ, Carry, Ovf, Parity
C compares the content of the source operand to that of the destination operand.
Status bits affected: High, Gt, Equ
CB is the same as C, but affects only the leftmost (most significant) byte of the operands.
Status bits affected: High, Gt, Equ, Parity
CI compares the register to an immediate value.
Status bits affected: High, Gt, Equ
DEC decrements the destination operand by 1 and compares the result to zero.
Status bits affected: High, Gt, Equ, Carry, Ovf
DECT decrements the destination operand by 2 and compares the result to zero.
Status bits affected: High, Gt, Equ, Carry, Ovf
INC increments the destination operand by 1 and compares the result to zero.
Status bits affected: High, Gt, Equ, Carry, Ovf
INCT increments the destination operand by 2 and compares the result to zero.
Status bits affected: High, Gt, Equ, Carry, Ovf
NEG negates the destination operand and compares the result to zero.
Status bits affected: High, Gt, Equ, Carry, Ovf
ABS takes the absolute value of the destination operand (i.e. negates it if it is less than 0) and compares the result to zero.
Status bits affected: High, Gt, Equ, Carry, Ovf
NB There are no DECB, DECTB, INCB, INCTB, NEGB, nor ABSB byte-oriented operations.
MPY multiplies the source operand by the destination register. The result of a 16-bit by 16-bit multiplication can be two words long, and is therefore placed into the destination register (most significant word) and the following register (least significant word)! (If R15 is the destination, the next memory word after the workspace is used).
Example: R0 * R1 --> [R1-R2]
LI R0,5 |
DIV divides the 2-word value found in the destination register and the following register by the content of the source operand. The quotient is placed in the destination register, the remainder in the next register.
Status bits affected: Ovf
Example: [R1-R2] / R0 --> R1, Remainder in R2
LI R0,4 |
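If the register pairs used by MPY and DIV feel confusing, this Python sketch may help. It models both operations on unsigned 16-bit values; the function names are mine, and the overflow rule (the division is abandoned when the divisor is not greater than the high word of the dividend) is how I understand the TMS9900 to behave, so treat it as an assumption.

```python
def mpy(source, dest):
    """16 x 16 -> 32 bits. Returns (dest register, next register)."""
    product = (source & 0xFFFF) * (dest & 0xFFFF)
    return product >> 16, product & 0xFFFF


def div(source, high, low):
    """[high-low] / source. Returns (dest register, next register, overflow)."""
    if source <= high:               # quotient would not fit in 16 bits
        return high, low, True       # registers untouched, Ovf set
    dividend = (high << 16) | low
    return dividend // source, dividend % source, False


print(mpy(5, 0x4000))     # 5 * 16384 needs 17 bits, so it spills into the high word
print(div(4, 0, 17))      # 17 / 4: quotient 4, remainder 1
print(div(4, 4, 0))       # divisor too small: overflow
```

Note that a division by zero always falls into the overflow case, since 0 can never be greater than the high word.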
The following operations deal with operands on a bitwise basis, i.e. the operands are not considered as words or bytes but as a collection of individual bits. The change of a given bit will never affect neighbouring bits.
CLR resets (clears) all bits in the destination operand to zero. The result is >0000.
Status bits affected: none
SETO sets all bits in the destination operand to one. The result is >FFFF.
Status bits affected: none
INV inverts all bits in the destination operand (logical NOT) and compares the result to zero.
A 0 bit becomes a 1, a 1 bit becomes a 0.
inv 0 = 1
inv 1 = 0
Status bits affected: High, Gt, Equ
ANDI ANDs the bits in the destination register with those in the immediate value and compares the result to zero.
If both the source and destination bits are 1, the resulting bit will be 1. Otherwise it will be 0.
0 andi 0 = 0
0 andi 1 = 0
1 andi 0 = 0
1 andi 1 = 1
Status bits affected: High, Gt, Equ
ORI ORs the bits in the destination register with those in the immediate value and compares the result to zero.
If both the source and destination bits are 0, the resulting bit will be 0. Otherwise it will be 1.
0 ori 0 = 0
0 ori 1 = 1
1 ori 0 = 1
1 ori 1 = 1
Status bits affected: High, Gt, Equ
XOR performs an exclusive OR of the bits in the source operand with those in the destination register. The result is compared to zero.
If a bit is 1 in either the source or the destination operand, but not both, it will be one. Bits that are identical in both operands are reset to 0:
0 xor 0 = 0
0 xor 1 = 1
1 xor 0 = 1
1 xor 1 = 0
Status bits affected: High, Gt, Equ
NB There is no XORI instruction
SOC sets to 1 all bits in the destination operand that correspond to a 1 in the source operand and compares the result to 0. The other bits are unchanged.
Set Ones Corresponding is therefore the equivalent of a logical OR:
0 soc 0 = 0
0 soc 1 = 1
1 soc 0 = 1
1 soc 1 = 1
Status bits affected: High, Gt, Equ
Example:
LI R0,>0401 |
SOCB is the same as SOC, but affects only the leftmost (most significant) byte of the operands.
Status bits affected: High, Gt, Equ, Parity
SZC resets to 0 all bits in the destination operand that correspond to a 1 in the source operand and compares the result to 0. The other bits are unchanged.
Set Zeros Corresponding is therefore the equivalent of a logical NOT-A AND B:
0 szc 0 = 0
0 szc 1 = 1
1 szc 0 = 0
1 szc 1 = 0
Status bits affected: High, Gt, Equ
Example:
LI R0,>0401 |
SZCB is the same as SZC, but affects only the leftmost (most significant) byte of the operands.
Status bits affected: High, Gt, Equ, Parity
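In case the "corresponding" vocabulary is unfamiliar, here are the four set-ones/set-zeros operations written with the usual bitwise operators, as a Python sketch. The function names just mirror the opcodes, and the byte variants are modelled by masking the source with >FF00.

```python
def soc(source, dest):
    """Set Ones Corresponding: plain OR."""
    return (dest | source) & 0xFFFF


def szc(source, dest):
    """Set Zeros Corresponding: (NOT source) AND dest."""
    return dest & ~source & 0xFFFF


def socb(source, dest):
    """Byte version: only the left byte of the source takes part."""
    return soc(source & 0xFF00, dest)


def szcb(source, dest):
    return szc(source & 0xFF00, dest)


print(hex(soc(0x0401, 0x00FF)))    # sets bits >0400 and >0001
print(hex(szc(0x0401, 0x00FF)))    # clears bit >0001 (bit >0400 was already 0)
print(hex(szcb(0x0401, 0x04FF)))   # clears bit >0400 only, the right byte is left alone
```

SZC is handy for clearing flag bits with a mask that lists the bits to clear, where an AND would need the inverted mask.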
COC checks whether all bits in the destination operand that correspond to a 1 in the source operand are 1. It sets the Equ bit if this is the case. All bits are unchanged.
Compare Ones Corresponding therefore performs a masked comparison with >FFFF:
0 coc 0 --> not considered
0 coc 1 --> not considered
1 coc 0 --> Equ will be 0 if this happens at least once
1 coc 1 --> ok so far.
Status bits affected: Equ
Example:
LI R0,>0401 |
CZC checks whether all bits in the destination operand that correspond to a 1 in the source operand are 0. It sets the Equ bit if this is the case. All bits are unchanged.
Compare Zeros Corresponding therefore performs a masked comparison with >0000:
0 czc 0 --> not considered
0 czc 1 --> not considered
1 czc 0 --> ok so far
1 czc 1 --> Equ will be 0 if this happens at least once.
Status bits affected: Equ
Example:
LI R0,>0401 |
NB There is no COCB, nor CZCB byte-oriented instruction.
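The two masked comparisons boil down to one AND each. In this Python sketch (function names mirror the opcodes, the return value stands for the Equ bit) the source operand acts as the mask:

```python
def coc(source, register):
    """Equ = 1 if every bit set in source is also set in register."""
    return (register & source) == source


def czc(source, register):
    """Equ = 1 if every bit set in source is clear in register."""
    return (register & source) == 0


print(coc(0x0401, 0x0F0F))   # both masked bits are 1
print(coc(0x0401, 0x0400))   # bit >0001 is missing
print(czc(0x0401, 0xF0F0))   # both masked bits are 0
print(czc(0x0401, 0x0001))   # bit >0001 is set
```

Note that with a mask of >0000 both tests succeed trivially, since there is no bit left to check.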
The following operations shift the content of a register, i.e. move all bits in it towards the left or the right.
Shifting one position to the left corresponds to a multiplication by two (just as moving the decimal point by one position corresponds to a multiplication/division by 10). Shifting to the right corresponds to a division by two. However, in the latter case we may have to take the sign bit into account, if we are dealing with signed values: therefore there are two shift-right instructions: one for logical operands, one for arithmetic operands.
For each you can specify by how many positions you want to shift the bits: legal values are 1 to 15. Shifting by 0 would of course have no effect, so this value is interpreted differently: if you specify a shift by 0, the number of positions is taken from the four least significant bits of R0 (i.e. a value from 0 to 15). If this value is zero, the bits will be shifted by 16 positions.
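The rule for the shift count fits in a few lines of Python. This is a sketch with a made-up helper name, assuming the count read from R0 is its four least significant bits:

```python
def effective_shift_count(count, r0):
    """Number of positions actually shifted for a given count field and R0."""
    if count != 0:
        return count              # 1 to 15: used as is
    from_r0 = r0 & 0x000F         # count 0: look at the low 4 bits of R0
    return from_r0 if from_r0 != 0 else 16


print(effective_shift_count(3, 0x1235))   # explicit count wins
print(effective_shift_count(0, 0x1235))   # taken from R0
print(effective_shift_count(0, 0x00F0))   # low 4 bits of R0 are 0
```

This is what lets a program compute a shift distance at run time: put it in R0 and code a count of 0.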
SLA shifts the content of the register by count positions to the left, filling up the bits on the right with 0. The result is compared to zero, the last bit shifted out is placed in the carry bit.
Status bits affected: High, Gt, Equ, Carry, Ovf (=1 if sign bit is changed).
NB There is no SLL (Shift Left Logically) because it's equivalent to SLA (Shift Left Arithmetic).
Example:
LI R0,>0401 |
SRL shifts the content of the register by count positions to the right, filling up the bits on the left with 0. The result is compared to zero, the last bit shifted out is placed in the carry bit.
Status bits affected: High, Gt, Equ, Carry
Example:
LI R1,>8401 |
SRA shifts the content of the register by count positions to the right, filling up the bits on the left with copies of the sign bit. The result is compared to zero, the last bit shifted out is placed in the carry bit.
Status bits affected: High, Gt, Equ, Carry
Example:
LI R1,>8002 (i.e -32766) |
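To compare the three shifts seen so far, here is a Python sketch that returns both the shifted word and the carry bit. The function names mirror the opcodes; the model is mine, only covers counts from 1 to 16, and ignores the other status bits.

```python
def sla(value, count):
    carry = (value >> (16 - count)) & 1           # last bit pushed out on the left
    return (value << count) & 0xFFFF, carry


def srl(value, count):
    carry = (value >> (count - 1)) & 1            # last bit pushed out on the right
    return value >> count, carry


def sra(value, count):
    carry = (value >> (count - 1)) & 1
    signed = value - 0x10000 if value & 0x8000 else value
    return (signed >> count) & 0xFFFF, carry      # Python's >> keeps the sign


print([hex(x) for x in sla(0x0401, 4)])
print([hex(x) for x in srl(0x8401, 1)])
print([hex(x) for x in sra(0x8002, 1)])           # -32766 / 2 = -16383
```

The last line shows why the arithmetic version exists: SRL on >8002 would give >4001, a positive number, whereas SRA keeps the value negative.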
SRC shifts the content of the register by count positions to the right, filling up the bits on the left with those ejected on the right. The result is compared to zero, the last bit shifted out is placed in the carry bit.
Status bits affected: High, Gt, Equ, Carry
NB There is no SLC (Shift Left Circular), but you can do it with SRC (Shift Right Circular): just subtract the desired displacement from 16: SLC R1,5 is encoded by SRC R1,11.
Example:
LI R1,>0401 |
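You can convince yourself that a left rotation is just a right rotation by the complement to 16 with this Python sketch. `src` mirrors the opcode, while `slc` is the instruction that does not exist; both names and the model are mine.

```python
def src(value, count):
    """Rotate a 16-bit word right by count (1 to 16). Returns (result, carry)."""
    result = ((value >> count) | (value << (16 - count))) & 0xFFFF
    carry = result >> 15              # the last bit ejected lands in the leftmost bit
    return result, carry


def slc(value, count):
    """The missing 'shift left circular', written directly for comparison."""
    return ((value << count) | (value >> (16 - count))) & 0xFFFF


print(hex(src(0x0401, 4)[0]))
print(hex(slc(0x0401, 5)), hex(src(0x0401, 11)[0]))   # same word twice
```

Since no bit is ever lost in a rotation, applying SRC with counts that add up to 16 brings the register back to its original value.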
These opcodes affect the CRU. See above for the CRU addressing mode. I also have a whole page that discusses CRU issues. You may want to have a look at it.
SBO sets a CRU bit to 1. The bit number is a signed displacement (from -128 to +127) relative to the CRU address in R12.
Status bits affected: none
SBZ sets a CRU bit to 0. The bit number is a signed displacement (from -128 to +127) relative to the CRU address in R12.
Status bits affected: none
TB tests a CRU bit. The bit number is a signed displacement (from -128 to +127) relative to the CRU address in R12. The bit is copied into the Equ bit in the status register.
Status bits affected: Equ
LDCR loads nbits bits to the CRU, starting from the CRU address in R12. If there are 1 to 8 bits, the bits are taken (from right to left) from the most significant byte of the source operand. If there are 9 to 16 bits, they are taken (from right to left) from the whole word.
Status bits affected: High, Gt, Equ, Carry. (Parity for 1-byte operations)
STCR reads nbits bits from the CRU, starting from the CRU address in R12. If there are 1 to 8 bits, the bits are stored (from right to left) into the most significant byte of the destination operand. If there are 9 to 16 bits, they are stored (from right to left) into the whole word.
Status bits affected: High, Gt, Equ, Carry. (Parity for 1-byte operations)
Finally, here are some opcodes that I couldn't fit in any of the above categories.
LWPI loads an immediate value into the workspace pointer register of the TMS9900, effectively changing the workspace. Just make sure that the immediate value does not correspond to an address in ROM!
Status bits affected: none
STWP stores the content of the workspace pointer register of the TMS9900 (i.e. the address of the current workspace) into the destination register.
Status bits affected: none
NB. There is no LDWP, so if you want to change the value of the workspace you have two solutions:
* Assume we want to use the value in R0 for our new workspace * 1) Put the value in our program and perform a LWPI MOV R0,HERE+2 Replaces the >0000 below with our value * 2) Put the value in R13 and perform a dummy RTWP STST R15 Make sure the status won't be affected |
STST stores the status register of the TMS9900 into the destination register.
Status bits affected: none
NB There is no LDST to load a value in the status register. But you can put it in R15 and perform a RTWP (just make sure that R13 contains the proper workspace, and R14 a valid address).
* Assume we want to put the value in R0 into the status register STWP R13 Make sure the workspace won't change |
LIMI loads a value from 0 to 15 into the interrupt mask part of the status register. Only interrupts with a level equal to or smaller than this value will be recognized. On the TI-99/4A all interrupts are hardwired as level 1. Therefore:
LIMI 0 disables interrupts
LIMI 1 to LIMI 15 enable interrupts
Status bits affected: interrupt mask
SWPB swaps the most significant and the least significant bytes of the destination operand.
Status bits affected: none.
Example:
LI R1,>A52B |
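For the record, here is the byte swap written out in Python (a one-line sketch, with a function name that simply mirrors the opcode):

```python
def swpb(word):
    """Exchange the left and right bytes of a 16-bit word."""
    return ((word << 8) | (word >> 8)) & 0xFFFF


print(hex(swpb(0xA52B)))
```

This is the usual way to get at the right-hand byte of a register, since byte operations on a register only see the left one.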
X executes the machine language instruction whose value is found in the destination operand. If the instruction requires additional memory words for operands, they will be taken after the X instruction, not after the address of the operand.
Status bits affected: depends on the instruction executed.
Example (also see the page on the TMS9900)
* This performs: B *R11 * This performs: LI R1,>1234 * This performs: NEG @>2000 TEST NEG @>0000 |
There are five opcodes that should never be used with the TI-99/4A, because they would be mistaken for CRU operations. These are:
CKON
CKOF
LREX
RSET
IDLE
See the TMS9900 page for details.
Besides opcodes, you can also include instructions to the assembler within your source file. These instructions may or may not generate code, some deal with a printout feature, some issue commands for the linker, etc.
They may vary from assembler to assembler, therefore I only included here the most common ones.
Instructions that generate data: DATA, BYTE, TEXT, or reserve room for it: BSS, BES, EVEN
Instructions for the linker: DEF, REF
Instructions for the loader: AORG, RORG, END
Others: COPY
DATA is used to generate code that is not an opcode, generally to place data in your program.
value can be any number (or math expression) between 0 and >FFFF. It can also be entered as a 2-character string constant. If needed, several data words can be placed on the same line, separated by commas.
Note that a data word will always be loaded at an even address, on a word boundary.
Example:
DATA 5,6,>8001,0 |
BYTE is pretty much like DATA, except that it generates only one byte, and is therefore not limited by word boundaries.
value can be any number between 0 and 255, or a single-character string.
TEXT is used to insert data into your program. The data is specified in the form of a quoted string.
Example:
TEXT 'This is a test' |
These two instructions are used to reserve space in your program. Generally you will place data there. Unlike the above instructions, BSS (block starting with symbol) and BES (block ending with symbol) don't place values into memory: they just tell the loader to skip nbytes before loading the next instruction.
BSS only differs from BES when you use a label: the label value corresponds to the current address with BSS and to the current address plus nbytes with BES (i.e. the end address).
Example
CLRBUF LI R1,BUFFER Point to our buffer BUFFER BSS 256 Reserve 256 bytes of memory MOV R0,R1 Next procedure |
EVEN is used to make sure the loading will continue from a word boundary. This instruction is useful after a TEXT statement, when you don't want to count the number of characters to know whether you must add a BYTE statement or not. EVEN issues a >00 data byte if the current address is odd, it does nothing otherwise.
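What BSS, BES and EVEN do to the current address can be summed up in a few lines of Python. This is only a sketch of the bookkeeping: the functions are named after the directives, the addresses are made up, and each one returns the value a label would get together with the next address to be used.

```python
def bss(location, nbytes):
    """Label marks the START of the block."""
    return location, location + nbytes           # (label value, next address)


def bes(location, nbytes):
    """Label marks the END of the block."""
    return location + nbytes, location + nbytes


def even(location):
    """Move to the next word boundary if the address is odd."""
    return (location + 1) & 0xFFFE


print([hex(x) for x in bss(0xA000, 256)])
print([hex(x) for x in bes(0xA000, 256)])
print(hex(even(0xA00F)), hex(even(0xA010)))
```

In both cases the same 256 bytes are skipped; only the value attached to the label changes.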
DEF is used to pass one or more labels to the loader. The values of these labels will thereby be available for other files, or for the loader itself (to launch your program for instance).
label must be a valid label (1 to 6 characters, the first one being not a digit) and must be defined in the source file.
Example:
DEF START,INIT START MOV R11,R10 Entry point of my program INIT CLR @COUNT Procedure available to another file COUNT DATA 10 Data word available to another file END |
REF is the mirror-image of DEF. It allows your source file to reference labels that are part of another source file. You can thereby access its variables, call its procedures, etc.
However, you are not allowed to perform arithmetic on REF labels, since these are undefined at assembly time.
Example:
REF COUNT,INIT BL @INIT Call a procedure in another file |
AORG will force the loader to load the program at a precise memory location. In general you won't use it, except to modify a precise data word. E.g. to place values in the non-maskable interrupt vectors at >FFFC-FFFF.
RORG allows the loader to determine by itself where the program should be loaded. The TI loader starts from >A000 towards >FFD8, and continues with the low-memory expansion (in which there is very limited space, since that's where the loader is located, and it also contains a symbol table for the REFs and DEFs).
RORG can be used to cancel the effect of an AORG.
If an offset is specified, it will be added to the current address. The effect is similar to that of the BSS instruction, except that you could specify a negative offset, thereby causing the loader to overwrite what it just loaded. (Sounds like a silly thing to do but it may be useful with paged-memory devices).
END is the only instruction that must be part of any program. It tells the assembler to stop processing the source file.
If an optional label is specified, it will be used by the loader to automatically start the program once this file is loaded. Otherwise, you must DEFine a label and enter its name when the loader asks you where to start.
COPY allows for writing very long programs. It instructs the assembler to switch to the file specified in double-quotes. Once this file is completely assembled, the assembler will resume with the current one. Generally, assemblers don't let you nest COPY statements, i.e. you cannot have a COPY in a copied file. But you can have as many COPY statements as you want in the initial file.
Assembly language opcodes and operands are encoded into machine language according to nine fundamental formats. Each uses up a word of memory per opcode, possibly together with one or two extra words for operands.
Format | Operands | Bit fields, from bit >80 of the first byte to bit >01 of the second (width in bits) |
---|---|---|
I | source,dest | Opcode (3), B (1), Td (2), destination reg (4), Ts (2), source reg (4) |
II | PC-relative | Opcode (8), PC-relative offset in words (8) |
III | source,register | Opcode (6), register (4), Ts (2), source reg (4) |
IV | source,nbits | Opcode (6), nbits (4), Ts (2), source reg (4) |
V | register,count | Opcode (8), count (4), register (4) |
VI | dest | Opcode (10), Ts (2), source reg (4) |
VII | - | Opcode (11), 0 0 0 0 0 (5) |
VIII | register,immed | Opcode (11), 0 (1), register (4) |
IX | source,register | Opcode (6), register (4), Ts (2), source reg (4) |
Ts and Td define the type of addressing for the source and destination operand respectively.
00: Rx
01: *Rx
10: @yyyy(Rx) or @yyyy if Rx = 0. This requires an additional memory word to store the yyyy value
11: *Rx+
Source and dest contain the workspace register, used in the way indicated by the addressing mode.
Immed operands also require an additional word to store the immediate value.
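To see how the fields fit together, here is a Python sketch that assembles the first word of a format I and of a format VI instruction. The helper names are mine, and the base values (>C000 for MOV, >0500 for NEG, >0440 for B) are the opcode values with all operand fields at zero; any extra word needed by a @yyyy operand is not generated here.

```python
REGISTER, INDIRECT, INDEXED, AUTOINCREMENT = 0, 1, 2, 3   # the Ts / Td codes


def encode_format_1(base, ts, s, td, d):
    """Opcode+B (4 bits) | Td (2) | dest reg (4) | Ts (2) | source reg (4)."""
    return base | (td << 10) | (d << 6) | (ts << 4) | s


def encode_format_6(base, ts, s):
    """Opcode (10 bits) | Ts (2) | source reg (4)."""
    return base | (ts << 4) | s


print(hex(encode_format_1(0xC000, REGISTER, 1, REGISTER, 2)))        # MOV R1,R2
print(hex(encode_format_1(0xC000, AUTOINCREMENT, 1, REGISTER, 2)))   # MOV *R1+,R2
print(hex(encode_format_6(0x0500, REGISTER, 1)))                     # NEG R1
print(hex(encode_format_6(0x0440, INDIRECT, 11)))                    # B *R11
```

The NEG R1 case gives >0501, which matches the range of values quoted for the negation at the top of this page.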
In almost all cultures throughout the ages, the numbering system is based on 10 digits. This is of course because we have 10 fingers (finger is "digitus" in Latin...). The only exception I know of is the Mayas, who used 20 digits. Guess why?
In our culture these ten digits are represented by ten symbols derived from Arabic: "0", "1", "2", "3", "4", "5", "6", "7", "8", and "9".
With 10 digits we can represent ten numbers: zero through nine. Now what if we want to write down a larger number? Well, we just combine two digits: one for the tens and one for the units. 23 means "two times ten, plus three", which is twenty-three. Similarly, we can add a third digit for the hundreds, a fourth for the thousands, etc.
But nothing prevents us from using more (or fewer) than 10 digits. Let's say we want to use sixteen digits instead of ten. First we need to find names and symbols for the extra six. We could come up with goofy names (borg, spam, taku, etc) and fancy symbols, but we wouldn't be able to type them from a computer keyboard. Therefore let's keep it simple and decide that the extra digits will be "A", "B", "C", "D", "E" and "F" (lower case may or may not be ok). In our new system, "A" has the value ten, "B" is eleven, etc up to "F" which is fifteen.
To represent numbers greater than fifteen, we must combine two digits: one for the "sixteens" and one for the units. >23 means "two times sixteen, plus three", which is thirty-five. This way we can write down 16*16=256 numbers.
To go further, we need a third digit whose value will be 256, a fourth one whose value will be 4096 (i.e. 16*16*16), etc.
For instance: >123B means "1 times 4096, plus 2 times 256, plus 3 times 16, plus 11" which is 4667.
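If you'd rather let a computer do the multiplications, the rule can be sketched in a few lines of Python. The function name is made up for this example. It simply walks the digits from left to right, multiplying the running total by sixteen each time.

```python
HEX_DIGITS = "0123456789ABCDEF"

def hex_value(text):
    """Return the value of a TI-style hex number such as '>123B'."""
    value = 0
    for symbol in text.lstrip(">").upper():
        # Shift everything seen so far one place to the left (times 16),
        # then add the value of the new digit (0 through 15).
        value = value * 16 + HEX_DIGITS.index(symbol)
    return value

print(hex_value(">23"))    # two times sixteen, plus three
print(hex_value(">123B"))  # 1*4096 + 2*256 + 3*16 + 11
```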
We also need a way to distinguish our new numbering system, that we'll call "hexadecimal", from the good old decimal one. Obviously, any number containing digits from "A" through "F" has to be hexadecimal. But if I write 10, do I mean ten or sixteen?
Texas Instruments adopted the following convention: any hexadecimal number must be preceded by a "greater than" sign. E.g. >1234
Most PC people use another convention: append an h to the number: 1234h (which also allows appending a d for decimal numbers, a b for binary, etc).
High-level languages like C and C++ use yet another convention: start any hexadecimal number with "0x". E.g. 0x1234. The "x" stands of course for hexadecimal, and the leading 0 is only there so that the compiler won't mistake the number for a variable name (variable names cannot start with a digit in C/C++).
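As an illustration, here is a rough Python sketch of a reader that accepts all three conventions. The function name is hypothetical, and the handling of the d and b suffixes follows the description above rather than the rules of any particular assembler.

```python
def parse_number(text):
    """Read a number written in TI, PC-suffix, or C notation."""
    text = text.strip()
    if text.startswith(">"):               # TI convention: >1234
        return int(text[1:], 16)
    if text.lower().startswith("0x"):      # C/C++ convention: 0x1234
        return int(text[2:], 16)
    suffix = text[-1].lower()
    if suffix == "h":                      # PC convention: 1234h
        return int(text[:-1], 16)
    if suffix == "b":                      # binary: 1101b
        return int(text[:-1], 2)
    if suffix == "d":                      # explicit decimal: 1234d
        return int(text[:-1], 10)
    return int(text, 10)                   # no marker at all: decimal

for sample in (">1234", "1234h", "0x1234", "10", "10h"):
    print(sample, "=", parse_number(sample))
```

Note how the marker settles the "10" question: without any marker it is ten, with an h it is sixteen.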
As mentioned above, we could also use fewer than 10 digits, if we wanted to. Let's say we use only two: "0" and "1". This will allow us to write down the numbers zero and one. To go further we'll need an extra digit for the pairs: 10 means "one pair, plus no unit", which is two. And 11 means "one pair, plus one unit", which is three.
The next digit will have a weight of 4, the next one a weight of 8, etc.
For instance, the number 11001101 reads as "1 times 128, plus 1 times 64, plus 0 times 32, plus 0 times 16, plus 1 times 8, plus 1 times 4, plus 0 times 2, plus 1" which is 205 in decimal.
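The same reading can be sketched in Python: list the weight of every digit that is "1", then add them up. The function name is invented for the example.

```python
def set_bit_weights(bits):
    """Return the weights of the '1' digits, leftmost digit first."""
    count = len(bits)
    # The rightmost digit weighs 1, the next one 2, then 4, 8, and so on.
    return [2 ** (count - 1 - i) for i, bit in enumerate(bits) if bit == "1"]

weights = set_bit_weights("11001101")
print(weights, "add up to", sum(weights))
```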
Why would we want to use such a clumsy numbering system? Because it's easy to translate in terms of hardware: 0 means "no current" and 1 means "current flowing" for instance. Or 0 could mean "0 volts" and 1 mean "5 volts".
That's how today's computers are built: they represent all numbers in binary and encode them as two voltage levels. In the early times of computing, there were also "analog" computers, that were using 10 different voltage levels to encode decimal numbers. But these were technically difficult to build and were completely superseded by digital computers.
As you have probably noted from the examples above, converting a hex number into a decimal number is not easy, especially with very large numbers (You have 30 seconds to calculate the decimal value of >123456789ABCDEF0).
The problem is even worse with binary numbers: how much is 10010110101101001001010 in decimal?
Converting decimal numbers to another base is also annoying: how much is 3333 in hexadecimal? We must do the following:
How many times >1000 (i.e. 4096) in 3333? Zero.
How many times >0100 (i.e. 256) in 3333? Thirteen, which is >D in hex notation. The remainder is 3333-(13*256)=5.
How many times >0010 (i.e. 16) in 5? Zero.
How many times >0001 (i.e. 1) in 5? Five.
The result is thus: >0D05
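The four questions above translate almost word for word into Python. This is only a sketch for 16-bit values (four hex digits), and the function name is made up.

```python
HEX_DIGITS = "0123456789ABCDEF"

def to_hex_word(number):
    """Convert a number from 0 to 65535 into a 4-digit TI-style hex string."""
    if not 0 <= number <= 0xFFFF:
        raise ValueError("only 16-bit values are handled here")
    digits = ""
    for weight in (4096, 256, 16, 1):
        # "How many times does this weight fit?" gives the digit;
        # what is left over goes on to the next, smaller weight.
        count, number = divmod(number, weight)
        digits += HEX_DIGITS[count]
    return ">" + digits

print(to_hex_word(3333))
```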
By contrast, conversions between hexadecimal and binary are very easy. Do you see why? Because 16 is a power of 2, whereas 10 is not. Just compare the "weight" of the digits in the various bases:
Decimal: 1, 10, 100, 1000, 10000, etc.
Hexadecimal: 1, 16, 256, 4096, 65536, etc.
Binary: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, etc.
Now do you see the pattern? Every fourth binary digit has the same weight as a hexadecimal digit (this never occurs with decimal). This means that we can split any binary number into groups of four digits, starting from the right, and convert each group individually into a hex digit.
Let's come back to the example above: how much is 10010110101101001001010 in hexadecimal? Piece of cake:
 100 1011 0101 1010 0100 1010 is
  >4    B    5    A    4    A i.e. >4B5A4A
And conversely, to translate >1234 in binary:
  >1    2    3    4
0001 0010 0011 0100   Done in five seconds!
All you need to know are the first 16 binary values. That's easier to memorize than the powers of two!
Hex | Binary | Hex | Binary |
---|---|---|---|
>0 | 0000 | >8 | 1000 |
>1 | 0001 | >9 | 1001 |
>2 | 0010 | >A | 1010 |
>3 | 0011 | >B | 1011 |
>4 | 0100 | >C | 1100 |
>5 | 0101 | >D | 1101 |
>6 | 0110 | >E | 1110 |
>7 | 0111 | >F | 1111 |
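The table and the grouping trick fit in a few lines of Python. Take this as a sketch: the names are invented, and the binary number is padded with zeroes on the left so that it splits evenly into groups of four.

```python
# The 16-entry table above: '0000' -> '0', ..., '1111' -> 'F'.
NIBBLE_TO_HEX = {format(v, "04b"): "0123456789ABCDEF"[v] for v in range(16)}
HEX_TO_NIBBLE = {h: b for b, h in NIBBLE_TO_HEX.items()}

def binary_to_hex(bits):
    """Convert a string of 0s and 1s into a TI-style hex string."""
    width = -(-len(bits) // 4) * 4       # round up to a multiple of four
    bits = bits.zfill(width)             # pad on the left with zeroes
    groups = [bits[i:i + 4] for i in range(0, width, 4)]
    return ">" + "".join(NIBBLE_TO_HEX[g] for g in groups)

def hex_to_binary(text):
    """Convert a TI-style hex string into groups of four binary digits."""
    return " ".join(HEX_TO_NIBBLE[d] for d in text.lstrip(">").upper())

print(binary_to_hex("10010110101101001001010"))
print(hex_to_binary(">1234"))
```

No multiplication or division anywhere, just table lookups. That is the whole point of using a base that is a power of two.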
And this is why computer people use hexadecimal a lot. Note that we could also have used another power of two as a base. Eight for instance, using only digits "0" through "7". This is known as "octal" and is sometimes used, but much less often than hexadecimal. It has the advantage that you do not need extra digits.
To perform hexadecimal operations, we'll follow the exact same rules as for decimal operations:
  >1234
+ >96FB
Let's start from the rightmost digit: "4" plus "B" (four plus eleven) is "F" (fifteen):
  >1234
+ >96FB
      F
Second digit: "3" plus "F" (three plus fifteen) is eighteen (>12). We thus have a carry of sixteen and put down two.
    1
  >1234
+ >96FB
     2F
Now, "2" plus "6", plus the carried "1" is "9".
  >1234
+ >96FB
    92F
And finally "9" plus "1" is "A". Et voila!
  >1234
+ >96FB
  >A92F
We can do the same thing in base 2:
              1       11       1
  100010   100010   100010   100010   100010   100010
 +010110  +010110  +010110  +010110  +010110  +010110
       0       00      000     1000    11000   111000
          carry 1   carry    add the
                    again    carry
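The pencil-and-paper method is the same in any base, so one small Python function can handle both the hexadecimal and the binary sum. This is a sketch only: the function name is made up, and it handles bases up to 16.

```python
DIGITS = "0123456789ABCDEF"

def add_in_base(a, b, base):
    """Add two numbers given as digit strings, one column at a time."""
    width = max(len(a), len(b))
    a, b = a.upper().zfill(width), b.upper().zfill(width)
    result = ""
    carry = 0
    # Work from the rightmost digit to the leftmost one.
    for da, db in zip(reversed(a), reversed(b)):
        total = DIGITS.index(da) + DIGITS.index(db) + carry
        # Whatever does not fit in one digit is carried to the next column.
        carry, digit = divmod(total, base)
        result = DIGITS[digit] + result
    if carry:
        result = DIGITS[carry] + result
    return result

print(add_in_base("1234", "96FB", 16))
print(add_in_base("100010", "010110", 2))   # 34 + 22 = 56 in decimal
```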
I'll let you do subtractions as an exercise...
It's often necessary to deal with negative numbers. Therefore several conventions have been established to represent a negative number in binary format. Generally, the leftmost bit is used as a sign bit: e.g. "0" means positive and "1" means negative.
The remaining bits may represent the number, and they do in some conventions. However, the most common convention is "two's complement". It is the one used by Texas Instruments for the TI-99/4A.
In two's complement notation, negative numbers are represented as follows:
>FFFF is -1
>FFFE is -2
>FFFD is -3
...
>8001 is -32767
>8000 is -32768
The big advantage of this notation is that >FFFF is greater than >FFFE, which is mathematically correct (-1 is greater than -2), and also true for unsigned numbers.
Problems only occur when comparing a negative number with a positive one: >FFFF is greater than >0001, but -1 is smaller than 1.
That's why the TMS9900 status register contains two status bits for number comparisons: "high" that deals with unsigned values, and "Gt" that considers the values as signed.
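To see the two points of view side by side, here is a little Python sketch for 16-bit words. The function names and the "unsigned"/"signed" labels are mine. It only illustrates the two ways of comparing, it does not model the actual status register.

```python
def to_signed16(word):
    """Read a 16-bit word as a two's complement number."""
    word &= 0xFFFF
    # If the leftmost (sign) bit is set, the value is negative.
    return word - 0x10000 if word & 0x8000 else word

def to_word16(value):
    """Store a signed number (-32768 to 32767) as a 16-bit word."""
    return value & 0xFFFF

def compare(a, b):
    """Is a greater than b, as unsigned words and as signed numbers?"""
    return {
        "unsigned": (a & 0xFFFF) > (b & 0xFFFF),
        "signed": to_signed16(a) > to_signed16(b),
    }

print(to_signed16(0xFFFF), to_signed16(0x8000))
print(compare(0xFFFF, 0xFFFE))   # both views agree
print(compare(0xFFFF, 0x0001))   # the two views disagree
```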