Abstract
This paper addresses an aspect of this approach to policy analysis that has received little attention in the recent literature, namely the fact that estimation and control are treated separately. Once the parameters of the structural model have been estimated, the search for an optimal policy rule is conducted as if the parameter values were known and equal to the estimates. Thus, the standard approach neglects the feedback between the control rule and the policymaker's beliefs about the unknown parameters, which arises because current policy actions not only affect current target performance but may also provide useful information and, therefore, improve future performance. For a simple model with two unknown parameters, this paper models the learning process of the policymaker explicitly, studies the time series implications of a policy that separates control and estimation, and characterizes the optimal policy that takes advantage of the information-accumulating effect of current policy actions.
First, this paper shows that a policy that separates control and estimation may result in suboptimal target performance. Even though the model is reestimated in every period and the policy updated accordingly, the standard approach frequently results in a persistent bias in the policy and target variables. The reason is that policy actions chosen on the basis of incorrect beliefs about the unknown parameters may, in turn, reinforce those incorrect beliefs. This problem becomes particularly relevant for monetary policy whenever financial innovations result in structural changes in money demand and supply parameters that play an important role in the policymaker's targeting procedure.
Second, this paper solves the simultaneous control and estimation problem of a policymaker who lacks knowledge about two structural parameters that affect the relationship between policy instrument and policy target. This constitutes a large-scale nonlinear dynamic programming problem for which no analytical solution exists. In order to calculate the optimal policy as a function of the policymaker's beliefs, we develop a numerical dynamic programming algorithm that can deal with a problem with five state variables. These five state variables arise because the beliefs about the unknown parameters are modelled as a bivariate distribution, which is characterized by the two means, the two variances and the covariance and is updated every period according to Bayes' rule. We find that the optimal policy differs drastically from a policy that separates control and estimation. It results in much faster learning, avoids the emergence of long-term biases in the policymaker's beliefs and actions and, therefore, dramatically improves target performance.
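To illustrate how five belief variables can be carried from one period to the next, the following is a minimal sketch and not the paper's own specification. It assumes a linear relationship y = a + b*x + e between instrument x and target y, a normal prior over (a, b), and normal noise e with known variance; the names Beliefs, bayes_update and noise_var are hypothetical and introduced only for this example.

```python
from typing import NamedTuple


class Beliefs(NamedTuple):
    """Five numbers describing bivariate normal beliefs about (a, b)."""
    m_a: float   # mean of the intercept a
    m_b: float   # mean of the slope b
    v_a: float   # variance of a
    v_b: float   # variance of b
    v_ab: float  # covariance between a and b


def bayes_update(b: Beliefs, x: float, y: float, noise_var: float = 1.0) -> Beliefs:
    """One Bayes-rule update after observing y = a + b*x + e (assumed model)."""
    # Sigma * h, where h = (1, x) is the regressor vector
    sa = b.v_a + b.v_ab * x
    sb = b.v_ab + b.v_b * x
    # Predictive variance of y: h' Sigma h + noise variance
    f = sa + sb * x + noise_var
    # Forecast error under current beliefs
    err = y - (b.m_a + b.m_b * x)
    return Beliefs(
        m_a=b.m_a + sa * err / f,
        m_b=b.m_b + sb * err / f,
        v_a=b.v_a - sa * sa / f,
        v_b=b.v_b - sb * sb / f,
        v_ab=b.v_ab - sa * sb / f,
    )


prior = Beliefs(m_a=0.0, m_b=0.0, v_a=1.0, v_b=1.0, v_ab=0.0)
passive = bayes_update(prior, x=0.0, y=2.0)  # slope variance stays at 1.0
active = bayes_update(prior, x=2.0, y=2.0)   # slope variance falls below 1.0
```

Under these assumptions the posterior variances depend on the chosen instrument setting x, which is one way to see the information-accumulating effect of current policy actions: with x = 0 nothing is learned about the slope, while a larger |x| reduces its variance.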
Due to its scale, this problem can only be solved by exploiting
computational resources to the limit. We explore parallel computing
techniques that break up the problem into multiple parts and assign
the computational tasks to different computers. Dynamic programming
problems lend themselves to this technique. They require repeated
iterations over functional equations, so that a specific set of
calculations is executed many times. Communication between processors
is needed only at long intervals, each time that a value iteration
step is completed and the tables containing value and policy function
approximations are updated.
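The division of labour described above can be sketched on a toy problem. The example below is only an illustration under stated assumptions: the one-dimensional grid, the cost function, the constants and the function names are all hypothetical, and a thread pool stands in for the separate computers. Each worker applies the Bellman update to its own block of grid points using the old value table, and the tables are exchanged only once per value iteration step.

```python
from concurrent.futures import ThreadPoolExecutor

BETA = 0.9          # discount factor (assumed)
N_STATES = 21       # grid points 0..20 (assumed)
TARGET = 10         # grid point with zero per-period loss (assumed)
ACTIONS = (-1, 0, 1)


def bellman_chunk(states, value):
    """Bellman update for one block of grid points; reads the old table only."""
    out = []
    for s in states:
        best = None
        for a in ACTIONS:
            nxt = min(max(s + a, 0), N_STATES - 1)
            cost = (s - TARGET) ** 2 + 0.1 * a * a + BETA * value[nxt]
            if best is None or cost < best[0]:
                best = (cost, a)
        out.append((s, best[0], best[1]))
    return out


def value_iteration(n_workers=4, tol=1e-8, max_iter=1000):
    value = [0.0] * N_STATES
    policy = [0] * N_STATES
    # Split the grid into interleaved blocks, one per worker
    chunks = [range(i, N_STATES, n_workers) for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(max_iter):
            results = pool.map(bellman_chunk, chunks, [value] * n_workers)
            # Synchronisation point: merge all blocks into the new tables
            new_value = list(value)
            for block in results:
                for s, v, a in block:
                    new_value[s] = v
                    policy[s] = a
            diff = max(abs(u - w) for u, w in zip(new_value, value))
            value = new_value
            if diff < tol:
                break
    return value, policy


value, policy = value_iteration()
```

Because every block is computed from the same old table, the result does not depend on the number of workers. In a real distributed setting the thread pool would be replaced by processes on different machines, and the merge step would be the only point at which tables travel over the network.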