Abstract
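Let $x_1, \dots, x_N$ be a numerical sequence, or time series, and let $M < N$ be an integer. Define a collection of $M$-dimensional vectors $X_i$, $i = 1, \dots, n$, by the formula
$$X_i = (x_i, x_{i+1}, \dots, x_{i+M-1}),$$
where $n = N - M + 1$, and define the $n \times M$ matrix $X = (X_1, \dots, X_n)^T$. Define the mean vector
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
Subtracting this vector from each of $X_1, \dots, X_n$, we get the matrix $X^*$ of centered vectors $X_i^* = X_i - \bar{X}$.
Consider the covariance matrix of the vectors $X_i^*$, regarded as an $n$-sample of $M$-dimensional vectors, and apply the principal component method to this sample. Let $P$ be the matrix of eigenvectors of the covariance matrix of $X^*$.
The standard principal component analysis operations of computing the principal components,
$$Y = X^* P,$$
and of reconstructing the initial (centered) sample from a selected number $r$ of principal components,
$$\hat{X}^* = Y_r P_r^T,$$
where $P_r$ and $Y_r$ consist of the $r$ selected columns of $P$ and $Y$, can be applied as usual. After reconstruction of the matrix $\hat{X}$, with rows $\hat{X}_i = \hat{X}_i^* + \bar{X}$ and entries $\hat{x}_{ij}$, the initial sequence is reconstructed by averaging over the diagonals of $\hat{X}$:
$$\hat{x}_s = \frac{1}{|D_s|} \sum_{(i,j) \in D_s} \hat{x}_{ij}, \qquad D_s = \{(i,j) : i + j - 1 = s,\ 1 \le i \le n,\ 1 \le j \le M\}, \qquad s = 1, \dots, N.$$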
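The decomposition and reconstruction steps above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' software: the function names, the use of `M` for the window length, the division by `n` in the covariance matrix, and the choice of the `r` leading eigenvectors are all assumptions made for the example.

```python
import numpy as np

def trajectory_matrix(x, M):
    """Stack the lagged windows (x_i, ..., x_{i+M-1}) as rows of an n x M matrix."""
    x = np.asarray(x, dtype=float)
    n = len(x) - M + 1
    return np.array([x[i:i + M] for i in range(n)])

def diagonal_average(X_hat):
    """Average an n x M matrix over the diagonals i + j = const, giving n + M - 1 values."""
    n, M = X_hat.shape
    total = np.zeros(n + M - 1)
    count = np.zeros(n + M - 1)
    for i in range(n):
        total[i:i + M] += X_hat[i]   # row i covers series positions i .. i+M-1
        count[i:i + M] += 1
    return total / count

def caterpillar_reconstruct(x, M, r):
    """Reconstruct a series from its r leading principal components (assumed selection rule)."""
    X = trajectory_matrix(x, M)
    mean = X.mean(axis=0)                      # mean vector of the sample
    Xc = X - mean                              # centered sample
    cov = Xc.T @ Xc / Xc.shape[0]              # M x M covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # re-sort in descending order
    P_r = eigvecs[:, order[:r]]                # r leading eigenvectors
    Y_r = Xc @ P_r                             # principal components
    X_hat = Y_r @ P_r.T + mean                 # reconstructed, non-centered sample
    return diagonal_average(X_hat)
```

For a pure sinusoid the centered sample has rank 2, so `caterpillar_reconstruct(x, M, 2)` should return the input series up to rounding error; for noisy data, a small `r` acts as a smoother.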
We have applied this method of time series analysis to many practical problems, and it proved to be very powerful even when the time series were nonstationary and short, with N as small as 20. The method has been generalized to multivariate time series and random fields, and considerable work has been done on its theoretical properties. Below we describe an application of the method to the forecasting problem.
Consider functions f of a discrete argument $i$ that generate multivariate samples $X^*$ lying in hyperplanes of dimension smaller than $M$. It can be shown that the class of these functions contains the solutions of linear finite-difference equations with constant coefficients, that is,
$$f(i) = \sum_{k} P_k(i)\, e^{\alpha_k i} \cos(\omega_k i + \varphi_k),$$
where $P_k(i)$ are polynomials in i of degree k, and $\alpha_k$, $\omega_k$, $\varphi_k$ are arbitrary.
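A quick numerical check of this claim is to build the centered sample for a few simple sequences of this type and count its nonzero singular values. The sketch below assumes a window length called `M` and a relative tolerance `tol`; both names, and the helper `centered_sample_rank`, are introduced here only for illustration.

```python
import numpy as np

def centered_sample_rank(x, M, tol=1e-8):
    """Numerical rank of the centered sample of M-dimensional lagged vectors."""
    x = np.asarray(x, dtype=float)
    n = len(x) - M + 1
    X = np.array([x[i:i + M] for i in range(n)])   # n x M sample of lagged windows
    Xc = X - X.mean(axis=0)                        # subtract the mean vector
    s = np.linalg.svd(Xc, compute_uv=False)        # singular values, descending
    return int(np.sum(s > tol * s[0]))             # count values above the tolerance

i = np.arange(1, 41)
print(centered_sample_rank(np.cos(0.5 * i + 0.3), M=8))   # sinusoid
print(centered_sample_rank(1.05 ** i, M=8))               # exponential
print(centered_sample_rank(2.0 + 0.1 * i, M=8))           # linear trend
```

With these inputs the sinusoid gives rank 2, while the exponential and the linear trend give rank 1, all well below the window length of 8.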
If the $M$-dimensional centered sample $X^*$, generated by the numerical sequence $x_1, \dots, x_N$, belongs to an r-dimensional hyperplane with $r < M$, then the sequence $x_1, \dots, x_N$ is said to have rank r. The numerical sequence $x_1, \dots, x_N, x_{N+1}$ of rank r is called an expansion of the sequence $x_1, \dots, x_N$ (which also has rank r) if the multivariate sample generated by the former lies in the same hyperplane as the sample $X^*$ generated by the latter.
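Let us now briefly describe the main idea of the forecast algorithm we propose. Let a numerical sequence $x_1, \dots, x_N$ of rank r be given. Consider a basis of the hyperplane containing the multivariate sample $X^*$ generated by this sequence: $P_1, \dots, P_r$. We can assume that the vectors $P_1, \dots, P_r$ are the eigenvectors of the covariance matrix of $X^*$ that correspond to the positive eigenvalues. Let the equation of this hyperplane be given in the parametric form
$$X(t) = \sum_{j=1}^{r} t_j P_j,$$
where $t = (t_1, \dots, t_r)$ are the parameter values corresponding to a point $X(t)$. Any point $X_i^*$ of the centered multivariate sample relates to a parameter value $t^{(i)}$, that is,
$$X_i^* = \sum_{j=1}^{r} t_j^{(i)} P_j.$$
Let a numerical sequence $x_1, \dots, x_N, x_{N+1}$ be an expansion of $x_1, \dots, x_N$. In this case the last, $(n+1)$-th, point of the sample corresponds to a parameter value $t^{(n+1)}$. Since $r < M$, it is always possible to compute the value $t^{(n+1)}$ through the $M - 1$ known components $x_{n+1}, \dots, x_N$, and therefore all components of $X_{n+1}^*$, including the last one, are expressed as
$$X_{n+1}^* = \sum_{j=1}^{r} t_j^{(n+1)} P_j.$$
We can therefore pass to the noncentered sample and, furthermore, to the original sequence, which gives the forecast value $x_{N+1}$.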
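The idea above can be sketched as a one-step forecast in NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `forecast_next`, the reuse of the original sample's mean vector to center the new point, and the least-squares solve for the parameter vector are choices made for this example.

```python
import numpy as np

def forecast_next(x, M, r):
    """One-step forecast of x_{N+1} from the r-dimensional hyperplane of the centered sample."""
    x = np.asarray(x, dtype=float)
    n = len(x) - M + 1
    X = np.array([x[i:i + M] for i in range(n)])    # n x M sample of lagged windows
    mean = X.mean(axis=0)                           # mean vector of the sample
    Xc = X - mean                                   # centered sample
    eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / n)
    P_r = eigvecs[:, np.argsort(eigvals)[::-1][:r]] # basis: r leading eigenvectors

    # Assumption: the new point is centered with the mean of the original sample.
    known = x[-(M - 1):] - mean[:M - 1]             # first M-1 components of the new point

    # Solve the first M-1 rows of P_r @ t = known for the parameter vector t.
    t, *_ = np.linalg.lstsq(P_r[:M - 1], known, rcond=None)

    # The last row gives the unknown component; add the mean back.
    return mean[M - 1] + P_r[M - 1] @ t
```

For a sequence that has exactly rank `r`, such as a sinusoid with `r = 2`, the returned value should agree with the true continuation up to rounding error. Forecasts several steps ahead can be obtained by appending each forecast to the series and calling the function again, although errors then accumulate on noisy data.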
Based on this idea, an algorithm and corresponding software have been developed to forecast the behavior of general time series and of systems of correlated time series, a task of primary importance in many problems of finance and economics. The algorithm has been studied theoretically and tested numerically on many data sets, including certain market research data; the results will be presented and discussed in the report. The algorithm has been found to be very accurate and applicable to a wide range of both stationary and nonstationary data.
Keywords: Non-stationary processes, forecast of time series, principal components.