Rank Ordering of Large Heterogeneous Data

George J. Miel
University of Nevada
Miel@cleanhead.cs.unlv.edu

Abstract

A problem of interest in quantitative finance deals with establishing a ranking of elements of a given set based on pairwise comparisons of the elements. Applications of this nature include investment planning, portfolio evaluation, resource allocation, and others. We deal with computational issues that arise for a type of model based on rank deficient least squares when the data set is very large and heterogeneous.

Consider a graph with n nodes and m directed arcs, with tex2html_wrap_inline141 , in which each node represents an item with a numerical measure of worth attached to it, and where an arc from node j to node k means that the two items have been pairwise compared. Each row of the tex2html_wrap_inline147 incidence matrix C has exactly two nonzero entries, with -1 in column j and +1 in column k indicating that there is an arc from node j to node k. If tex2html_wrap_inline163 is the worth of the j-th item and tex2html_wrap_inline167 denotes the observed difference in worth attached to the i-th paired comparison then the tex2html_wrap_inline147 overdetermined system

  equation15

asks to estimate the worths tex2html_wrap_inline173 knowing pairwise differences of observed performances represented by the arcs of the graph. One can show that in the corresponding tex2html_wrap_inline175 normal system

  equation20

the matrix tex2html_wrap_inline177 has rank n-q where q is the number of connected components of the graph. Diverse methods are available for finding the general least squares solution of the system (1). However the computation can be carried out more efficiently by solving at most q nonsingular systems of smaller sizes.

The case with q=1, when the graph is fully connected, can be handled as follows: replace anyone equation of the tex2html_wrap_inline175 normal system (2) by an appropriate equation

displaymath133

thus getting a full-rank tex2html_wrap_inline175 auxilliary system

  equation32

whose unique solution can then be used to get the general solution. Theorem. If the graph is connected and the auxilliary equation satisfies tex2html_wrap_inline191 then the matrix tex2html_wrap_inline193 is nonsingular and the set of least squares solution of Cx=d is given by

displaymath134

where tex2html_wrap_inline197 and tex2html_wrap_inline199 is an arbitrary scalar. The arbitrary vector tex2html_wrap_inline201 shows that since the general solution depends on observed differences of worths it is invariant to an additive shift. In the general case with rank n-q, there will be q linearly independent arbitrary vectors. If the system (3) is extremely large, a promising means of solution for some applications, is to apply a Discrete Wavelet Transform (DWT) so as to make a large fraction of coefficients so small as to be negligible, and then use a specialized algorithm for sparse systems. In other words, if the matrix A compresses well under a transform D, the sparse system

  equation56

where tex2html_wrap_inline207 and tex2html_wrap_inline209 , is solved to get the desired (approximate) solution tex2html_wrap_inline211 .

The incidence matrix C is stored as two integer m-vectors MINUS and PLUS whose j-th elements denote the column index of -1 and +1 respectively in the j-th row of C. Use of the outer product yields a remarkably simple algorithm for evaluating the matrix tex2html_wrap_inline177 :

 
 tex2html_wrap_inline229    tex2html_wrap_inline231 

for ¯j=1:m do:

tex2html_wrap_inline235

tex2html_wrap_inline237

tex2html_wrap_inline239

tex2html_wrap_inline241

tex2html_wrap_inline243

tex2html_wrap_inline245

enddo

The vector tex2html_wrap_inline247 is evaluated with a similar algorithm also based on the outer product.

The above algorithm points to useful properties of the matrix A. The diagonal elements are nonnegative and the off-diagonal elements are nonpositive. The element tex2html_wrap_inline251 is the total number of arcs to or from node i whereas tex2html_wrap_inline255 , is the number of arcs between nodes i and k. These properties of A combined with a labelling procedure from graph theory yield an efficient algorithm for identifying the components of the graph.

Putting the pieces together, the general least squares solution to the system (1) is determined as follows. The tex2html_wrap_inline175 matrix A is computed and it is then used in the labelling procedure to partition the n nodes of the graph into q components with tex2html_wrap_inline271 items respectively. Observe that tex2html_wrap_inline273 . A component with tex2html_wrap_inline275 consists of an isolated node which has not been compared with any other node. Items in separate components are also non-rankable. For each component with tex2html_wrap_inline277 , form an tex2html_wrap_inline279 system tex2html_wrap_inline281 of rank tex2html_wrap_inline283 by extracting proper elements from A and b, replace one equation of this system by an appropriate auxilliary equation in order to get a nonsingular system tex2html_wrap_inline289 , solve this system using a conventional algorithm such as tex2html_wrap_inline291 factorization followed by front/back substitution, or if tex2html_wrap_inline293 is extremely large use a DWT to get a sparse system tex2html_wrap_inline295 followed by an appropriate solver, and then apply the theorem to get the general least squares solution.


Society of Computational Economics
Second International Conference on Computing in Economics and Finance
Geneva, Switzerland, 26-28 June 1996