Analysis and Forecast of Time Series on the Base of Principal Components

Dmitrii Danilov, Vladislav Solnsev and Anatoly Zhigljavsky
Department of Mathematics, St. Petersburg University
com@trend.niimm.spb.su

Abstract

We describe a method for the analysis and forecasting of time series. It applies the principal component method to a multivariate sample obtained from the initial sample by the method of delays. The main idea of the method is as follows.

Let $\{x_i\}_{i=1}^{N}$ be a numerical sequence, or time series, and let $M$ $(1 < M < N)$ be an integer. Define a collection of $M$-dimensional vectors $X_j$, $j = 1, \dots, n$, by the formula $X_j = (x_j, x_{j+1}, \dots, x_{j+M-1})^T$, where $n = N - M + 1$, and define the matrix

\[
X = (X_1, \dots, X_n) =
\begin{pmatrix}
x_1 & x_2 & \cdots & x_n \\
x_2 & x_3 & \cdots & x_{n+1} \\
\vdots & \vdots & \ddots & \vdots \\
x_M & x_{M+1} & \cdots & x_N
\end{pmatrix}.
\]

Define the mean vector $\bar X = \frac{1}{n} \sum_{j=1}^{n} X_j$. Subtracting this vector from each of the $X_j$, we get the matrix $\tilde X = (\tilde X_1, \dots, \tilde X_n)$ of centered vectors $\tilde X_j = X_j - \bar X$.
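As an illustration only, the delay embedding and the centering step can be sketched in Python with NumPy. The function name `delay_matrix`, the variable names and the toy series are assumptions made for this sketch; they are not taken from the software mentioned in this abstract.

```python
import numpy as np

def delay_matrix(x, M):
    """Return the M x n matrix whose j-th column is (x_j, ..., x_{j+M-1})."""
    x = np.asarray(x, dtype=float)
    n = len(x) - M + 1  # number of delay vectors, n = N - M + 1
    return np.column_stack([x[j:j + M] for j in range(n)])

x = np.arange(1.0, 7.0)                  # toy series 1, 2, ..., 6 (N = 6)
X = delay_matrix(x, M=3)                 # 3 x 4 matrix of delay vectors
X_mean = X.mean(axis=1, keepdims=True)   # mean vector, kept as a column
X_centered = X - X_mean                  # centered delay vectors as columns
```

Storing the delay vectors as columns matches the matrix defined in the text, so the mean is taken along each row.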

Consider the covariance matrix of the vectors $\tilde X_j$, regarded as an $n$-sample of $M$-dimensional vectors, and apply the principal component method to this sample. Let

\[
P = (p_1, \dots, p_M) =
\begin{pmatrix}
p_{11} & \cdots & p_{1M} \\
\vdots & \ddots & \vdots \\
p_{M1} & \cdots & p_{MM}
\end{pmatrix}
\]

be the matrix of eigenvectors of the covariance matrix of $\tilde X$.

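The standard operations of principal component analysis can be applied as usual: computing the principal components, $Y = P^T \tilde X$, and reconstructing the initial (centered) sample from a selected number $r$ of principal components,

\[
\hat{\tilde X} = \sum_{m \in S} p_m p_m^T \tilde X ,
\]

where $p_m$ is the $m$-th eigenvector and $S \subset \{1, \dots, M\}$ is the set of indices of the $r$ selected components. After reconstruction of the matrix $\hat{\tilde X}$, the initial sequence is reconstructed by averaging over the diagonals $i + j - 1 = s$ of $\hat X = \hat{\tilde X} + \bar X \mathbf{1}^T$, where $\bar X$ is the mean vector, $\mathbf{1}$ is the $n$-vector of ones, and $\hat x_{ij}$ are the entries of $\hat X$:

\[
\hat x_s = \frac{1}{|D_s|} \sum_{(i,j) \in D_s} \hat x_{ij},
\qquad
D_s = \{ (i,j) : 1 \le i \le M,\ 1 \le j \le n,\ i + j - 1 = s \},
\qquad s = 1, \dots, N.
\]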
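The steps above (delay embedding, centering, principal components, reconstruction and diagonal averaging) can be put together in a minimal sketch. It assumes that the $r$ selected components are simply the $r$ leading ones and that the covariance matrix is normalized by the number of delay vectors; both choices, like the function names, are assumptions of this sketch rather than details given in the abstract.

```python
import numpy as np

def delay_matrix(x, M):
    """Return the M x n matrix whose j-th column is (x_j, ..., x_{j+M-1})."""
    x = np.asarray(x, dtype=float)
    n = len(x) - M + 1
    return np.column_stack([x[j:j + M] for j in range(n)])

def reconstruct(x, M, r):
    """Reconstruct a series from its r leading principal components."""
    X = delay_matrix(x, M)
    n = X.shape[1]
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                          # centered delay vectors
    cov = Xc @ Xc.T / n                    # M x M covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov) # eigenvalues in ascending order
    P = eigvecs[:, ::-1][:, :r]            # r leading eigenvectors (assumed choice)
    Y = P.T @ Xc                           # principal components
    X_hat = P @ Y + mean                   # reconstructed, noncentered matrix
    # Average over the diagonals i + j = const to return to a sequence.
    sums = np.zeros(len(x))
    counts = np.zeros(len(x))
    for i in range(M):
        for j in range(n):
            sums[i + j] += X_hat[i, j]
            counts[i + j] += 1
    return sums / counts

i = np.arange(1, 41)
x = np.cos(0.3 * i)
x_rec = reconstruct(x, M=10, r=2)  # a pure cosine is recovered from two components
```

With `r = M` all components are kept, so any series is returned unchanged up to rounding; smaller values of `r` act as a filter.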

We have applied this method of time series analysis to many practical problems, and it proved to be very powerful even when the time series were nonstationary and short, with the value of $N$ starting at 20. The method has been generalized to multivariate time series and random fields, and considerable work has been done to study its theoretical properties. Below we describe an application of the method to the forecast problem.

Consider functions $f$ of a discrete argument, $x_i = f(i)$, whose sequences $\{x_i\}_{i=1}^{N}$ generate multivariate samples $\tilde X_1, \dots, \tilde X_n$ lying in hyperplanes of dimension smaller than $M$. It can be shown that the class of these functions contains the solutions of linear finite-difference equations with constant coefficients, that is, $f(i) = \sum_{l} P_l(i)\, e^{\alpha_l i} \sin(\omega_l i + \varphi_l)$, where $P_l(i)$ are polynomials in $i$ of degree $k$, and $\alpha_l$, $\omega_l$ and $\varphi_l$ are arbitrary.

If the $M$-dimensional centered sample $\tilde X$, generated by the numerical sequence $\{x_i\}_{i=1}^{N}$, belongs to an $r$-dimensional hyperplane $H_r$ with $r < M$, then the sequence $\{x_i\}_{i=1}^{N}$ is said to have rank $r$. The numerical sequence $\{x_i\}_{i=1}^{N+1}$ of rank $r$ is called an expansion of the sequence $\{x_i\}_{i=1}^{N}$ (which also has rank $r$) if the multivariate sample $\{\tilde X_j\}_{j=1}^{n+1}$ generated by the former lies in the same hyperplane as the sample $\{\tilde X_j\}_{j=1}^{n}$ generated by the latter.
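The notion of rank can be checked numerically by counting the non-negligible singular values of the centered matrix of delay vectors. The sketch below is only an illustration: the function name `centered_rank` and the relative tolerance `rel_tol` are assumptions introduced here, and on noisy data the count depends strongly on that tolerance.

```python
import numpy as np

def centered_rank(x, M, rel_tol=1e-8):
    """Numerical rank of the centered M x n matrix of delay vectors."""
    x = np.asarray(x, dtype=float)
    n = len(x) - M + 1
    X = np.column_stack([x[j:j + M] for j in range(n)])
    Xc = X - X.mean(axis=1, keepdims=True)
    s = np.linalg.svd(Xc, compute_uv=False)     # singular values, descending
    return int(np.sum(s > rel_tol * s[0]))      # count the non-negligible ones

i = np.arange(1, 41)
rank_linear = centered_rank(3.0 * i + 2.0, M=10)    # linear trend
rank_cosine = centered_rank(np.cos(0.3 * i), M=10)  # single harmonic
```

For the linear trend every centered delay vector is a multiple of the vector of ones, so the rank is 1. For the harmonic the centered vectors span a two-dimensional subspace, so the rank is 2.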

Let us now briefly describe the main idea of the forecast algorithm we propose. Let a numerical sequence $\{x_i\}_{i=1}^{N}$ of rank $r$ be given. Consider a basis of the hyperplane containing the multivariate sample $\tilde X$ generated by this sequence: $p_1, \dots, p_r$. We can assume that the vectors $p_1, \dots, p_r$ are the eigenvectors of the covariance matrix of $\tilde X$ that correspond to the positive eigenvalues. Let the equation of this hyperplane be given in parametric form: $Z(\beta) = \sum_{m=1}^{r} \beta_m p_m$, where $\beta = (\beta_1, \dots, \beta_r)$ are the parameter values corresponding to a point $Z(\beta)$. Any point of the centered multivariate sample relates to a parameter value $\beta^{(j)}$, that is, $\tilde X_j = \sum_{m=1}^{r} \beta^{(j)}_m p_m$.

Let a numerical sequence $\{x_i\}_{i=1}^{N+1}$ be an expansion of $\{x_i\}_{i=1}^{N}$. In this case the last, $(n+1)$-th, point of the sample corresponds to a parameter value $\beta^{(n+1)}$. Since $r < M$, it is always possible to compute the value $\beta^{(n+1)}$ through the known values $x_{N-M+2}, \dots, x_N$, which determine the first $M-1$ components of $\tilde X_{n+1}$. Therefore all components of $\tilde X_{n+1}$ are expressed as $\tilde X_{n+1} = \sum_{m=1}^{r} \beta^{(n+1)}_m p_m$. We can therefore pass to the noncentered sample and, furthermore, to the original sample.
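A one-step forecast along these lines can be sketched as follows. This is a minimal illustration under stated assumptions, not the software referred to in this abstract: the parameter vector of the new point is obtained by least squares from its first $M-1$ components (exact for a sequence of rank $r$, an assumed choice for noisy data), and the function name `forecast_next` is hypothetical.

```python
import numpy as np

def forecast_next(x, M, r):
    """One-step forecast from the r leading eigenvectors of the delay sample."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = N - M + 1
    X = np.column_stack([x[j:j + M] for j in range(n)])
    mean = X.mean(axis=1)
    Xc = X - mean[:, None]                       # centered delay vectors
    eigvals, eigvecs = np.linalg.eigh(Xc @ Xc.T / n)
    P = eigvecs[:, ::-1][:, :r]                  # basis p_1, ..., p_r of the hyperplane
    # First M-1 components of the centered (n+1)-th delay vector are known.
    known = x[N - M + 1:] - mean[:M - 1]
    # Parameter value of the new point, fitted on the known components.
    beta, *_ = np.linalg.lstsq(P[:M - 1], known, rcond=None)
    # Last component of the new point, moved back to the noncentered scale.
    return mean[M - 1] + P[M - 1] @ beta

i = np.arange(1, 41)
next_cosine = forecast_next(np.cos(0.3 * i), M=10, r=2)
next_linear = forecast_next(3.0 * i + 2.0, M=10, r=1)
```

For these two noise-free series the forecast agrees with the true continuation, $\cos(0.3 \cdot 41)$ and $3 \cdot 41 + 2 = 125$, up to rounding error. Longer forecasts can be obtained by appending each predicted value and repeating the step.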

On the basis of the above idea, an algorithm and corresponding software have been developed to forecast the behavior of general time series and systems of correlated time series, which is of primary importance in many problems of finance and economics. The algorithm has been studied theoretically and tested numerically on many data sets, including certain market research data; the results will be displayed and discussed in the report. The algorithm has been found to be very accurate and applicable to a wide range of both stationary and nonstationary data.

Keywords: Non-stationary processes, forecast of time series, principal components.


Society of Computational Economics
Second International Conference on Computing in Economics and Finance
Geneva, Switzerland, 26-28 June 1996