This page describes an assembly-language implementation of indexed AVL trees.

Introduction

Rotations

Inserting nodes

Deleting nodes

Binary trees are very usefull to perform ultra-fast dictionnary-type searches. The principle is very simple: the data are arranged in a balanced, binary tree in which each node contains two pointers, to a larger and a smaller value. Here is an example:

"DETERMINE"

/ \

"BULL" "FROG"

/ \ / \

"ADAM" "CLOUD" "GUTS"

To search for a string, start from the top and compare the target string with that in the node. If it is smaller take the left link, if it is larger follow the right link. In this way, you can search through a thousand words with 10 comparisons, through a million with 20, a billion with 30, etc.

This is fine for sets of data that can be arranged in advance: start with the middle value and put it at the top. On level two, put the values that are at 1/4 and 3/4, then those at 1/8, 3/8, 5/8 and 7/8, etc.

The problems begin when the user is allowed to enter new values, or to delete existing ones. Suppose the user enters values in alphabetical order, if we just follow a "go left / go right" strategy, we'll end up with:

"ADAM"

/ \

"BULL"

/ \

"CLOUD"

/ \

"ERIC"

Which defeats the purpose of a binary tree.

AVL trees were created by G.M. Adel'son-Velskii and E.M. Landis to overcome this problem. The idea is very elegant: each node contains an additional value, the balance, that indicates which child subtree is heaviest (i.e. the depth in levels of the right subtree is substracted from that of the left subtree). A value of 0 indicates that this node is perfectly balanced. Values of +1 and -1 are unavoidable, but anything beyond that is not tolerated and will cause the tree to rearrange.

But first, let's see how we can search a binary tree:

* Node structure: *---------------------------------------------------------- *---------------------------------------------------------- *---------------------------------------------------------- |

You may wonder what's with the stack pointed by R8. I'm using it to save the route from the root of the tree to the current node: at each step I save a pointer to the node and a flag saying whether we followed the left link or the right link. This is not strictly required for the above routines, but it will be essential for inserting and deleting nodes. Rather than writing two versions of FIND, one with and one without route tracking, I decided to implement it anyhow.

Rotations

A tree can be rearranged to restore balance by just rotating a node and its child:

A balance = +2 ----LL------> B balance = 0

/ \ rotate A-B / \

B balance = +1 to the A C balance = 0 0

/ \ left / \ / \

C balance = 0

Now we should check what the effect on the balance will be. There are two possibilities, as node B can be balanced or not (B could be balanced if the rotation was caused by the deletion of a node at the left of A, for instance).

Old balance | New balance | Tree depth | |||
---|---|---|---|---|---|

A | B | A | B | Insertion | Deletion |

+2 | 0 | +1 | -1 | (never) | Unchanged |

+2 | +1 | 0 | 0 | Unchanged | Changed |

In fact, one can generalize the above situation and consider the cases when A and B are not terminal nodes (aka "leaves") but have subtrees themselves. In the following sheme, subtrees are denoted by lower-case letters, while individual nodes are represented by upper-case letters.

A ----LL------> B

/ \ rotate / \

a1 B to the A b2

/ \ left / \

b1 b2 a1 b1

Obviously, A must have a balance of +2 to cause a rotation, but B could be balanced or not. If this situation was cause by an insertion (in one of the B subtrees) , then B cannot be balanced.

What will be the effect of such a rotation on the overall tree depth? It depends on the balance of B and on the operation that caused imbalancing: deletion or insertion. An insertion will cause subtree b2 to become longer (insetions in b1 require another type of rotation, see below). As we are moving b2 one level up, it will still reach the same depth. But what about a1? We know that a1 must be exactly one node shorter than b2 before the insertion: if it were 2 nodes shorter a rotation would have occured before, it it were the same length (or larger) no rotation would be required. We are now moving a1 one level down, which means that it will reach the same depth than b2 prior to the insertion. In conclusion, the tree depth is not affected by a rotation caused by an insertion.

Things are trickier with a deletion since B could be imbalanced prior to the deletion: b2 could be larger than b1 (the opposite is also possible but calls for another type of rotation). After the deletion, the rotation moved b2 one level higher, which shortened the overall depth of the tree (of course a1 went down, but remember we deleted a node in it, so it reaches the same depth as before). If B was balanced before the deletion, the overall tree depth remains unchanged because b1 did not move and reaches the same depth as before.

Why are we so concerned about tree depth? Because we must consider the possibility that our "tree" could in fact be a subtree of a larger tree. In which case, if the depth changes, we must step one level up and see what will be the effect it has on the balance of the parent node. But this will be discussed later.

For now, I'd like to discuss the other types of rotations: firstly, we have the mirror-image of the above situation, which can be solved by a rotation to the right:

B ----RR------> A

/ \ rotate / \

A b to the a1 B

/ \ right / \

a1 a2 a2 b

Old balance | New balance | Tree depth | |||
---|---|---|---|---|---|

C | B | C | B | Insertion | Deletion |

-2 | 0 | -1 | +1 | (never) | Unchanged |

-2 | -1 | 0 | 0 | Unchanged | Changed |

This situation is a bit trickier and requires a double rotation, first
right between B and C, then left between A and B:

A A --L--> B

/ \ / \ / \

a1 C --R--> a1 B A C

/ \ / \ / \ / \

B c1 b1 C a1 b1 b2 c1

/ \ / \

b1 b2 b2 c1

There are three possible cases, depending on the balance of B (in the case of an insertion, there are only two cases, as B cannot be balanced after insertion).

Old balance | New balance | Tree depth | |||||
---|---|---|---|---|---|---|---|

A | C | B | A | B | C | Insertion | Deletion |

+2 | -1 | +1 | -1 | 0 | 0 | Unchanged | Changed |

+2 | -1 | 0 | 0 | 0 | 0 | (never) | Changed |

+2 | -1 | -1 | 0 | 0 | +1 | Unchanged | Changed |

In the case of an insertion, in either b1 or b2, the overall tree length will not be affected: the expanded subtree is brought one level up, so it will reach the same depth as before. Subtree c1 does not move so we don't need to worry about it. And subtree a1 must be exactly one node shorter than b1 and b2, prior to the insertion (otherwise a rotation would have occured before, or would not be necessary now). The rotation brings a1 down by one level, which means that it now reached the same depth than b1 and b2 prior to the rotation. In conclusion: a double-rotation caused by an insertion does not change the depth of the tree.

What about deletions? If a1 is trucated, and b1 and b2 are brought one level up, the only subtree that could maintain the depth of the tree would be c1. But we know that c1 does not reach as deep as b1/b2, otherwise a double-rotation wouldn't be necessary: we could just rotate A and C. So double-rotations caused be a deletion always change the depth of the tree.

Finally, the mirror-image of the above situation requires a left-right
double rotation:

C C --R--> B

/ \ / \ / \

A c1 --L--> B c1 A C

/ \ / \ / \ / \

a1 B A b2 a1 b1 b2 c1

/ \ / \

b1 b2 a1 b1

Did you notice? We ended up with the very same situation than in the case of a right-left rotation. Now isn't that convenient? It means that the effects on the balance and tree depth will be the same.

Old balance | New balance | Tree depth | |||||
---|---|---|---|---|---|---|---|

C | A | B | A | B | C | Insertion | Deletion |

-2 | +1 | +1 | -1 | 0 | 0 | Unchanged | Changed |

-2 | +1 | 0 | 0 | 0 | 0 | (never) | Changed |

-2 | +1 | -1 | 0 | 0 | +1 | Unchanged | Changed |

Here are the four assembly routines that perform these rotations.

NB. Don't worry about the "left count" lines, this is a feature
of indexed trees, it will be discussed later.

*-------------------------------------------- A B |

Insertion

Now that we know how to rearrange a tree by rotating a node, we can insert new elements without imbalancing the tree. First, we insert the element where it belongs. Then we check the effect on its parent node's balance. Two cases are possible

E --> E E ---> E

/ \ / \ / \ / \

N D D N

E was balanced E was imbalanced

now it's not now it's balanced

In both cases, the balance of E remains in the acceptable range. However, in the first case we have increased the depth of the 'E' subtree by one level. This may result in imbalancing a node above the level of E. We must thus go one level up, and repeat our checks. This time, a third situation may occur, when the node was already imbalanced and is now even more tilted. This will require a rotation.

B ---> B --LL--> E

/ \ / \ / \

E E B N

/ \ / \ / \ / \

N

B was imbalanced Now it's not aceptable We must rotate

but acceptable (balance B = 2) to rebalance this level

Theoretically we should check what effect the rotation has on the upper levels, but rotations caused be insertions never change the depth of the tree, so we can dispense with it.

The insertion algorithm thus become:

*---------------------------------------------------------- *--------------------------------- * |

Deletion

Deletions are more complicated than insertions, because the target node is not necessarily at the end of a branch. In fact, tree cases are possible:

B ---> B B ---> B B ---> ???

/ \ / \ / \ / \ / \

R R T R

/ \ / \ / \ / \

T M T

R is a leaf (no children) R is a branch (1 child) R is a subtree (2 children)

Remove it Connect child + parent We are stuck!

The last case is a bummer, because there is no way we can connect both children to the parent node. We must use a little trick here:

B ---> B ---> B

/ \ / \ / \

R S S

/ \ / \ / \

M T M T M T

/ \ / \ \

S X S X X

We want to remove R Install S in place of R Now delete from were S was

Next node in order is S

When deleting a node, we must determine the effect it has on its parent node. Three cases are possible:

C ---> C C ---> C C ---> C

/ \ / \ / \ / \ / \ / \

A R A R A R A

/ \ / \

B B

Balanced Imbalanced Imbalanced Balanced Imbalanced Not acceptable

Depth unchanged Tree shortened We must rotate

*---------------------------------------------------------- |

A faster way to install the next in-order node in place of the current one would be to just swap keys between the nodes. This however may be dangerous: if the calling program memorized the location of the nodes, swapping them in memory could create havoc. Thus, it is safer to swap pointers but to leave the nodes where they are.

Threaded trees

You may have noted in the above drawings that when a node has less than two children, the remaing links are empty. It seems a pity not to make use of these. And that's precisely what threaded trees are for.

In a threaded tree, when a node has no right child, this pointer is replaced with a pointer to the next in-order node (wherever in the tree this node may be). Similarly, when there is no left child, the left link is replaced with a pointer to the previous node in order. This requires some extra code to maintain the pointers, but it allows for faster travel whithin the tree: finding the next or previous in-order node does not require tracing back our route to previous levels. As a compromise, there are right-threaded trees that only have pointers to the next in-order, and left-threaded trees that only have pointers to the previous in-order node.

Of course this requires using two extra bits to tell whether a link points to a child node or to an in-order successor, My suggestion would be to use the least significant bit: since all addresses are even on the TI-99/4A, an odd address will indicate an in-order pointer (that last bit will be cleared when using this pointer).

Here is an exemple of a threaded tree (lower-case letters denote in-order pointers):

D D D

/ \ / \ / \

B F B F B F

/ \ / \ / \ / \ / \ / \

A C E H A C E H A C E H

/ \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \

G I b b d d f G I b d f G I

/ \ / \ / \ / \ / \ / \

f h g h

Not threaded Fully threaded Right-threaded

Note how easy it is to walk the treaded tree:

- Start from the top of the tree, or from a given node.
- From the current node, go all the way to the left, to find the smallest node and return it.
- Get its right link. If it's an in-order pointer, return this node. Then continue at 2.
- If it's a regular child pointer, go to 2.

By contrast, here are the routines required to walk a non-threaded tree. As you see, we must walk our way up the calling stack each time the node does not have a child on the proper side.

*----------------------------------------------------------- |

The problem with a tree structure is that it is quite difficult to answer questions like: "Which is the 87th node in the tree?", or "What is the rank of this node whitin the in-order list?". Arrays are much better at this, but then again arrays are slow to search. Well, maybe we can have the best of two worlds? All we need to do is to add an extra variable in each node: the number of nodes in the left subtree. This will require some overhead code to maintain the count, but it allows us very fast indexing strategies.

To find the node at position x:

- Start from the top of the tree.
- Compare the target value with its left-count.
- If it's equal we found it. Return this node
- If it's lower take the left link and try again
- If it's higher take the right link, after substracting the left-count (+1 for the current node) from the target value.

To find the rank of a node:

- Initialize a counter as 0.
- Add the left-count of the current node to the counter
- Go up one level. Was the node a left-child or a right-child?
- If it was a left child, ignore the current node and go to 3.
- If it was a right-child, increment the count and go to 2.
- If there is no parent node, we are done.

*---------------------------------------------------------- |

Testing

Here is a sample test program. You can also download the whole file from here.

*----------------------------------------------------------- |

The AVL Page. On Ben Pfaff website at http://www.msu.edu/user/pfaffben/avl Comprises many usefull links.

Brad Appleton's AVL library in C language. The doc includes an
excellent
explanation of AVL trees.

ftp://ftp.cdrom.com/pub/algorithms/c/avllib/

Revision 1. 6/17/00. Ok to release.

Revision 2. 9/3/00. Discussed tree depth, subtrees. Corrected bugs in code.