Difference Equations

1. Difference Equations

1.1. Introduction

Time series analysis deals with sequences of random variables indexed by time.

1.2. First Order Difference Equations

We will study time-indexed random variables {y_t}.

Let {y_t} be a linear function of {y_{t-1}} and {w_t}.

\displaystyle  y_t = \phi y_{t-1} + w_t  \ \ \ \ \ (1)

Equation 1 is a linear first-order difference equation: it is first order because {y_t} depends only on {y_{t-1}} and not on earlier values of the series.

In this chapter we treat {w_t} as a deterministic number; later we will analyse the effects of treating it as a random variable.
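As a quick illustration, the recursion can be iterated directly. The following is a minimal sketch in Python; the value of {\phi}, the input sequence {w_t} and the starting value {y_{-1}} are arbitrary illustrative choices, not taken from the text.

```python
import numpy as np

# Iterate y_t = phi * y_{t-1} + w_t for an illustrative phi and a
# deterministic input sequence w (here a constant input of ones).
phi = 0.8
w = np.ones(20)              # w_0, ..., w_19, treated as deterministic numbers
y = np.empty(len(w))
y_prev = 0.0                 # assumed starting value y_{-1}
for t, w_t in enumerate(w):
    y[t] = phi * y_prev + w_t
    y_prev = y[t]
print(y[:5])                 # the path converges towards 1 / (1 - phi) = 5
```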

1.3. Solution by Recursive Substitution

The equations are:

\displaystyle  \begin{array}{rcl}  y_0 &=& \phi y_{-1} + w_0 \\ y_1 &=& \phi y_0 + w_1 \\ &\vdots& \\ y_{t-1} &=& \phi y_{t-2} + w_{t-1} \\ y_t &=& \phi y_{t-1} + w_t \end{array}

By recursively substituting we obtain:

\displaystyle  y_t = \phi^{t+1}y_{-1} + \phi^t w_0 + \phi^{t-1}w_1 + \phi^{t-2}w_2 + \dotsc + \phi w_{t-1} + w_t  \ \ \ \ \ (2)
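The closed form in equation 2 can be checked numerically against the recursion. The sketch below uses arbitrary illustrative values for {\phi}, {y_{-1}} and the input sequence.

```python
import numpy as np

# Check equation (2): the recursion and the closed-form expression
# phi^(t+1) * y_{-1} + sum_{j=0}^{t} phi^(t-j) * w_j should agree.
phi, y_init = 0.6, 2.0
w = np.linspace(0.5, 1.5, 11)            # w_0, ..., w_10 (illustrative inputs)

# recursive substitution
y = y_init
for w_t in w:
    y = phi * y + w_t

# closed form evaluated at t = 10
t = len(w) - 1
closed = phi ** (t + 1) * y_init + sum(phi ** (t - j) * w[j] for j in range(t + 1))
print(np.isclose(y, closed))             # True
```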

1.4. Dynamic Multipliers

We want to know the effect of an increase in {w_t} on {y_{t+j}}. This can be obtained by writing the analogue of equation 2 for {y_{t+j}}, in which {w_t} appears with coefficient {\phi^j}, and differentiating with respect to {w_t}.

\displaystyle  \frac{\partial y_{t+j}}{\partial w_t} = \phi^{j} \ \ \ \ \ (3)
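Equation 3 can also be verified by simulation: bump {w_t} by one unit, leave everything else unchanged, and compare {y_{t+j}} across the two paths. The sketch below uses illustrative values for {\phi}, t and j.

```python
import numpy as np

def simulate(phi, w, y_init=0.0):
    """Iterate y_t = phi * y_{t-1} + w_t and return the full path."""
    y, path = y_init, []
    for w_t in w:
        y = phi * y + w_t
        path.append(y)
    return np.array(path)

# Dynamic multiplier: a unit increase in w_t raises y_{t+j} by phi^j.
phi, t, j = 0.9, 3, 5                    # illustrative choices
w = np.zeros(20)
bumped = w.copy()
bumped[t] += 1.0
diff = simulate(phi, bumped)[t + j] - simulate(phi, w)[t + j]
print(np.isclose(diff, phi ** j))        # True
```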

2. pth-Order Difference Equations

We generalize the above dynamic system by letting the value of y depend on p of its own lags in addition to the current value of the input variable {w_t}.

\displaystyle  y_t = \phi_{1}y_{t-1} + \phi_{2}y_{t-2} + \dotsc + \phi_{p}y_{t-p} + w_t  \ \ \ \ \ (4)

We will rewrite the above p-th order equation as a first-order vector difference equation.

We define,

\displaystyle  \xi_t= \begin{bmatrix} y_t \\ y_{t - 1} \\ \vdots \\ y_{t - p + 1} \end{bmatrix},\quad \ \ \ \ \ (5)

\displaystyle  F = \begin{bmatrix} \phi_{1} & \phi_{2} & \phi_{3} & \cdots & \phi_{p - 1} & \phi_{p} \\ 1 & 0 & 0 & \cdots & 0 & 0\\ 0 & 1 & 0 & \cdots & 0 & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{bmatrix},\quad \ \ \ \ \ (6)

\displaystyle  v_t= \begin{bmatrix} w_t \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},\quad \ \ \ \ \ (7)

Then,

\displaystyle  \xi_t = F\xi_{t-1} + v_t  \ \ \ \ \ (8)

or,

\displaystyle  \begin{bmatrix} y_t \\ y_{t - 1} \\ y_{t - 2} \\ \vdots \\ y_{t - p + 1} \end{bmatrix}\quad = \quad \begin{bmatrix} \phi_{1} & \phi_{2} & \phi_{3} & \cdots & \phi_{p - 1} & \phi_{p} \\ 1 & 0 & 0 & \cdots & 0 & 0\\ 0 & 1 & 0 & \cdots & 0 & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{bmatrix}\quad \begin{bmatrix} y_{t - 1} \\ y_{t - 2} \\ y_{t - 3} \\ \vdots \\ y_{t - p} \end{bmatrix}\quad + \quad \begin{bmatrix} w_t \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}\quad \ \ \ \ \ (9)

Following the approach used to solve the first-order difference equation and applying it to the vector equation, we get:

\displaystyle  \xi_t = F^{t+1}\xi_{-1} + F^tv_0 + F^{t-1}v_1 + F^{t-2}v_2 + \dotsc + F{v_{t-1}} + v_t  \ \ \ \ \ (10)
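To make the vector form concrete, the sketch below builds the companion matrix F for an illustrative set of coefficients and checks that iterating equation 8 reproduces the scalar p-th order recursion; the coefficients, inputs and zero pre-sample values are all assumptions made for the example.

```python
import numpy as np

def companion(phis):
    """Build the (p x p) companion matrix F from the lag coefficients phi_1..phi_p."""
    p = len(phis)
    F = np.zeros((p, p))
    F[0, :] = phis               # first row holds the coefficients
    F[1:, :-1] = np.eye(p - 1)   # ones on the sub-diagonal shift the lags down
    return F

phis = [0.5, 0.3, -0.2]          # illustrative phi_1, phi_2, phi_3
p = len(phis)
w = np.ones(15)

# scalar p-th order recursion, starting from zero pre-sample values
y = [0.0] * p
for w_t in w:
    y.append(sum(phi * y[-1 - i] for i, phi in enumerate(phis)) + w_t)

# vector first-order recursion xi_t = F xi_{t-1} + v_t
F = companion(phis)
xi = np.zeros(p)
for w_t in w:
    v = np.zeros(p)
    v[0] = w_t
    xi = F @ xi + v

print(np.isclose(xi[0], y[-1]))  # True: the first element of xi_t is y_t
```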

3. General Solution of a p-th Order Difference Equation

If the eigenvalues of the matrix F are distinct, then we can write F as

\displaystyle  F = T\Lambda T^{-1} \ \ \ \ \ (11)

\displaystyle  \Lambda = \begin{bmatrix} \lambda_{1} & 0 & 0 & \cdots & 0 \\ 0 & \lambda_{2} & 0 & \cdots & 0 \\ 0 & 0 & \lambda_{3} & \cdots & 0 \\ \vdots & \vdots & \vdots & \cdots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_{p} \end{bmatrix},\quad \ \ \ \ \ (12)

where T is a non-singular matrix whose columns are the eigenvectors of F and {\Lambda} is the diagonal matrix of the corresponding eigenvalues.

Thus,

\displaystyle  F^2 = T\Lambda T^{-1}T\Lambda T^{-1} = T\Lambda^2 T^{-1} \ \ \ \ \ (13)

and

\displaystyle  \Lambda^2 = \begin{bmatrix} \lambda_{1}^2 & 0 & 0 & \cdots & 0 \\ 0 & \lambda_{2}^2 & 0 & \cdots & 0 \\ 0 & 0 & \lambda_{3}^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \cdots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_{p}^2 \end{bmatrix},\quad \ \ \ \ \ (14)

In general,

\displaystyle  F^n = T\Lambda^n T^{-1} \ \ \ \ \ (15)

and

\displaystyle  \Lambda^n = \begin{bmatrix} \lambda_{1}^n & 0 & 0 & \cdots & 0 \\ 0 & \lambda_{2}^n & 0 & \cdots & 0 \\ 0 & 0 & \lambda_{3}^n & \cdots & 0 \\ \vdots & \vdots & \vdots & \cdots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_{p}^n \end{bmatrix},\quad \ \ \ \ \ (16)
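Equation 15 can be checked numerically with NumPy's eigendecomposition routine; the companion matrix below reuses the illustrative coefficients from the earlier sketch and assumes its eigenvalues are distinct.

```python
import numpy as np

# Check F^n = T Lambda^n T^{-1} for a companion matrix with distinct eigenvalues.
phis = np.array([0.5, 0.3, -0.2])        # illustrative coefficients
p = len(phis)
F = np.zeros((p, p))
F[0, :] = phis
F[1:, :-1] = np.eye(p - 1)

eigvals, T = np.linalg.eig(F)            # columns of T are the eigenvectors of F
n = 6
lhs = np.linalg.matrix_power(F, n)
rhs = T @ np.diag(eigvals ** n) @ np.linalg.inv(T)
print(np.allclose(lhs, rhs))             # True (rhs may carry a tiny imaginary part)
```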

Ordinary Least Squares Under Standard Assumptions

Suppose that a scalar {y_t} is related to a {(k \times 1)} vector {x_t} and a disturbance term {u_t} according to the regression model:

\displaystyle y_t = x_t^T \beta + u_t \ \ \ \ \ (1)

In this article, we will study the estimation and hypothesis testing of {\beta} when {x_t} is deterministic and {u_t} is i.i.d. Gaussian.

1. The Algebra of Linear Regression

Given a sample of T values of {y_t} and the vector {x_t}, the ordinary least squares (OLS) estimate of {\beta}, denoted as {b}, is the value of {\beta} which minimizes the residual sum of squares (RSS).

\displaystyle RSS = \sum_{t=1}^T (y_t - x_t^T b)^2 \ \ \ \ \ (2)

The OLS estimate of {\beta}, b, is given by:

\displaystyle b = \bigg(\frac{1}{T}\sum_{t=1}^T x_tx_t^T \bigg)^{\!-1} \!\!\cdot\, \frac{1}{T}\sum_{t=1}^T x_ty_t. \ \ \ \ \ (3)

The model is written in matrix notation as:

\displaystyle y = X\beta + u. \ \ \ \ \ (4)

\displaystyle y= \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix},\quad X= \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_T^T \end{bmatrix},\quad u= \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{bmatrix}. \ \ \ \ \ (5)

where {y} is a {T \times 1} vector, {X} is a {T \times k} matrix, {\beta} is a {k \times 1} vector and {u} is a {T \times 1} vector.

Thus,

\displaystyle b = (X^TX)^{-1}X^Ty. \ \ \ \ \ (6)
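As a quick numerical sketch (the design matrix, coefficients and noise level below are simulated, illustrative choices), equation 6 can be computed directly and compared with NumPy's least-squares routine:

```python
import numpy as np

# Compute b = (X'X)^{-1} X'y on simulated data and cross-check with lstsq.
rng = np.random.default_rng(0)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])   # includes a constant
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=0.3, size=T)

b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)                                                          # close to beta
print(np.allclose(b, np.linalg.lstsq(X, y, rcond=None)[0]))       # True
```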

Similarly, the vector of sample residuals is:

\displaystyle \hat u = y - Xb = y - X(X^TX)^{-1}X^Ty = [I_T - X(X^TX)^{-1}X^T]y = M_Xy. \ \ \ \ \ (7)

where {M_X} is defined as:

\displaystyle M_X = [I_T - X(X^TX)^{-1}X^T]. \ \ \ \ \ (8)

{M_X} is a projection matrix. Hence it is symmetric and idempotent.

\displaystyle M_X = M_X^T. \ \ \ \ \ (9)

\displaystyle M_XM_X = M_X. \ \ \ \ \ (10)

Since {M_X} is the projection matrix for the space orthogonal to {X},

\displaystyle M_X^TX = M_XX = 0. \ \ \ \ \ (11)

Thus, we can verify that the sample residuals are orthogonal to {X}.

\displaystyle \hat u^TX = y^TM_X^TX = 0. \ \ \ \ \ (12)
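These properties of {M_X} are easy to confirm numerically. The sketch below uses simulated data (illustrative dimensions and coefficients) and checks symmetry, idempotency, {M_XX = 0} and the orthogonality of the residuals to X.

```python
import numpy as np

# Verify the algebraic properties of the projection matrix M_X = I - X(X'X)^{-1}X'.
rng = np.random.default_rng(1)
T, k = 50, 3
X = rng.normal(size=(T, k))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=T)

M = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)
u_hat = M @ y
print(np.allclose(M, M.T))        # symmetric
print(np.allclose(M @ M, M))      # idempotent
print(np.allclose(M @ X, 0))      # annihilates X
print(np.allclose(u_hat @ X, 0))  # residuals are orthogonal to X
```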

The sample residual is constructed from the sample estimate of {\beta} which is {b}. The population residual is a hypothetical construct based on the true population value of {\beta}.

\displaystyle u_t = y_t - x_t^T \beta. \ \ \ \ \ (13)

\displaystyle \hat u_t = y_t - x_t^T b. \ \ \ \ \ (14)

\displaystyle \hat u = y - Xb = [I_T - X(X^TX)^{-1}X^T]y = M_Xy = M_X(X\beta + u) = M_Xu. \ \ \ \ \ (15)

\displaystyle b = (X^TX)^{-1}X^Ty = (X^TX)^{-1}X^T(X\beta + u) = \beta + (X^TX)^{-1}X^Tu. \ \ \ \ \ (16)

The fit of OLS is described in terms of {R_u^2}, the uncentred {R^2}, defined as the ratio of the sum of squares of the fitted values {x_t^Tb} to the sum of squares of the observed values {y_t}.

\displaystyle R_u^2 = \frac{\sum_{t=1}^T b^Tx_tx_t^Tb}{\sum_{t=1}^T y_t^2} = \frac{b^TX^TXb}{y^Ty}. \ \ \ \ \ (17)
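A minimal sketch of equation 17 on simulated data (the design and coefficients are illustrative):

```python
import numpy as np

# Uncentred R^2: ratio of the sum of squared fitted values to the sum of squared y.
rng = np.random.default_rng(2)
T = 80
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([0.5, 1.5]) + rng.normal(size=T)

b = np.linalg.solve(X.T @ X, X.T @ y)
fitted = X @ b
R2_u = (fitted @ fitted) / (y @ y)
print(R2_u)
```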

2. Assumptions on X and u

We shall assume that

(a) X is deterministic;

(b) {u_t} is i.i.d. with mean 0 and variance {\sigma^2};

(c) {u_t} is Gaussian.

2.1. Properties of Estimated b Under Above Assumptions

Since,

\displaystyle b = (X^TX)^{-1}X^Ty = (X^TX)^{-1}X^T(X\beta + u) = \beta + (X^TX)^{-1}X^Tu. \ \ \ \ \ (18)

Taking expectations of both sides, we have,

\displaystyle \mathop{\mathbb E}(b) = \beta + (X^TX)^{-1}X^T\mathop{\mathbb E}(u) = \beta. \ \ \ \ \ (19)

And the variance covariance matrix is given by,

\displaystyle \mathop{\mathbb E}[(b - \beta)(b - \beta)^T] = \mathop{\mathbb E}[((X^TX)^{-1}X^Tu)((X^TX)^{-1}X^Tu)^T] = (X^TX)^{-1}X^T\mathop{\mathbb E}[uu^T]X(X^TX)^{-1} = \sigma^2(X^TX)^{-1}. \ \ \ \ \ (20)

Thus b is unbiased and is a linear function of y.

2.2. Distribution of Estimated b Under Above Assumptions

As u is Gaussian,

\displaystyle b = \beta + (X^TX)^{-1}X^Tu. \ \ \ \ \ (21)

implies that b is also Gaussian.

\displaystyle b \sim N(\beta, \sigma^2(X^TX)^{-1}). \ \ \ \ \ (22)
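A Monte Carlo sketch of this result (all simulation settings are illustrative): holding X fixed and redrawing the Gaussian disturbances, the sample mean and covariance of the OLS estimates should be close to {\beta} and {\sigma^2(X^TX)^{-1}}.

```python
import numpy as np

# Monte Carlo check that b is unbiased with covariance sigma^2 (X'X)^{-1}.
rng = np.random.default_rng(3)
T, sigma = 60, 0.5
X = np.column_stack([np.ones(T), rng.normal(size=T)])   # fixed across replications
beta = np.array([1.0, 2.0])
XtX_inv = np.linalg.inv(X.T @ X)

draws = np.array([
    np.linalg.solve(X.T @ X, X.T @ (X @ beta + sigma * rng.normal(size=T)))
    for _ in range(20000)
])
print(draws.mean(axis=0))            # close to beta
print(np.cov(draws.T))               # close to the matrix below
print(sigma**2 * XtX_inv)
```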

2.3. Properties of Estimated Sample Variance Under Above Assumptions

The OLS estimate of the variance of u, {\sigma^2}, is given by:

\displaystyle s^2 = RSS / (T - k) = {\hat u}^T\hat u / (T - k) = u^TM_X^TM_Xu / (T - k) = u^TM_Xu / (T - k). \ \ \ \ \ (23)

Since {M_X} is a projection matrix and is symmetric and idempotent, it can be written as:

\displaystyle M_X = P\Lambda P^T. \ \ \ \ \ (24)

where

\displaystyle P P^T = I_T. \ \ \ \ \ (25)

and {\Lambda} is a diagonal matrix with eigenvalues of {M_X} on the diagonal.

Since,

\displaystyle M_XX = 0. \ \ \ \ \ (26)

that is, since {M_X} projects onto the space orthogonal to the columns of {X}, it follows that:

\displaystyle M_Xv = 0. \ \ \ \ \ (27)

whenever v is a column of X. Since X is assumed to have full rank, this gives k linearly independent eigenvectors of {M_X}, each with eigenvalue 0.

Also since

\displaystyle M_X = I_T - X(X^TX)^{-1}X^T. \ \ \ \ \ (28)

Thus, it follows that

\displaystyle M_Xv = v. \ \ \ \ \ (29)

whenever v is orthogonal to the columns of X, so that {X^Tv = 0}. Since there are (T – k) linearly independent such vectors, {M_X} has (T – k) eigenvectors with eigenvalue 1.

Thus {\Lambda} has k zeroes and (T – k) 1s on the diagonal.

\displaystyle u^TM_Xu = u^TP\Lambda P^Tu. \ \ \ \ \ (30)

Let

\displaystyle w = P^Tu \ \ \ \ \ (31)

Then,

\displaystyle u^TM_Xu = u^TP\Lambda P^Tu = w^T\Lambda w = w_1^2 \lambda_1 + w_2^2 \lambda_2 + \dots + w_T^2 \lambda_T. \ \ \ \ \ (32)

Ordering the eigenvalues so that the first (T – k) of them equal 1 and the remaining k equal 0, this reduces to:

\displaystyle u^TM_Xu = w_1^2 \lambda_1 + w_2^2 \lambda_2 + \dots + w_{T - k}^2 \lambda_{T - k}. \ \ \ \ \ (33)

As these {\lambda}s are all unity, we have:

\displaystyle u^TM_Xu = w_1^2 + w_2^2 + \dots + w_{T - k}^2 . \ \ \ \ \ (34)

Also,

\displaystyle \mathop{\mathbb E}(ww^T) = \mathop{\mathbb E}(P^Tu u^T P) = \sigma^2P^TP = \sigma^2I_T. \ \ \ \ \ (35)

Thus, elements of w are uncorrelated with each other, have mean 0 and variance {\sigma^2}.

Since each {w_i^2} has expectation {\sigma^2},

\displaystyle \mathop{\mathbb E}(u^TM_Xu) = (T - k)\sigma^2. \ \ \ \ \ (36)

Hence,

\displaystyle \mathop{\mathbb E}(s^2) = \sigma^2. \ \ \ \ \ (37)

2.4. Distribution of Estimated Sample Variance Under Above Assumptions

Since

\displaystyle w = P^Tu \ \ \ \ \ (38)

when u is Gaussian, w is also Gaussian.

Then,

\displaystyle u^TM_Xu = w_1^2 \lambda_1 + w_2^2 \lambda_2 + \dots + w_{T - k}^2 \lambda_{T - k}. \ \ \ \ \ (39)

implies that {u^TM_Xu} is the sum of squares of (T – k) independent {N(0, \sigma^2)} random variables.

Thus,

\displaystyle RSS / \sigma^2 = u^TM_Xu / \sigma^2 \sim \chi^2 ( T - k). \ \ \ \ \ (40)
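A Monte Carlo sketch of this distributional result (illustrative simulation settings): the sample mean and variance of {u^TM_Xu/\sigma^2} should be close to the {\chi^2(T - k)} values T – k and 2(T – k).

```python
import numpy as np

# Check that u' M_X u / sigma^2 behaves like a chi-square with T - k degrees of freedom.
rng = np.random.default_rng(4)
T, k, sigma = 40, 4, 1.5
X = rng.normal(size=(T, k))
M = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)

stats = []
for _ in range(20000):
    u = sigma * rng.normal(size=T)
    stats.append(u @ M @ u / sigma**2)
stats = np.array(stats)
print(stats.mean(), T - k)           # mean close to 36
print(stats.var(), 2 * (T - k))      # variance close to 72
```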

Also, b and {\hat u} are uncorrelated, since,

\displaystyle \mathop{\mathbb E}[\hat u(b - \beta)^T] = \mathop{\mathbb E}[M_Xu u^T X (X^TX)^{-1}] = \sigma^2M_XX(X^TX)^{-1} = 0. \ \ \ \ \ (41)

Since b and {\hat u} are uncorrelated and jointly Gaussian, they are independent; hence b and {s^2} are also independent.

2.5. t Tests about {\beta} Under Above Assumptions

We wish to test the hypothesis that the ith element of {\beta}, {\beta_i}, is some particular value {\beta_i^0}.

The t-statistic for testing this null hypothesis is

\displaystyle t = \frac{b_i - \beta_i^0}{\hat \sigma_{b_i}} = \frac{b_i - \beta_i^0}{s \sqrt{\xi^{ii}}} \ \ \ \ \ (42)

where {\xi^{ii}} denotes the element in the ith row and ith column of {(X^TX)^{-1}} and {\hat \sigma_{b_i}} is the standard error of the OLS estimate of the ith coefficient.

Under the null hypothesis,

\displaystyle b_i \sim N(\beta_i^0, \sigma^2 \xi^{ii}). \ \ \ \ \ (43)

Thus,

\displaystyle \frac{b_i - \beta_i^0}{\sqrt{\sigma^2 \xi^{ii}}} \sim N(0, 1). \ \ \ \ \ (44)

Thus,

\displaystyle t = \frac{{(b_i - \beta_i^0)} / {\sqrt{\sigma^2 \xi^{ii}}}}{\sqrt{s^2 / \sigma^2 }} \ \ \ \ \ (45)

Thus the numerator is N(0, 1) and the denominator is the square root of a {\chi^2(T - k)} random variable divided by its degrees of freedom, and the two are independent. This gives the statistic a t-distribution with (T – k) degrees of freedom.
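A minimal sketch of the t statistic on simulated data (the design, coefficients and the hypothesis tested are illustrative; here the true coefficient is 0, so the null is correct):

```python
import numpy as np

# Compute the t statistic for H0: beta_i = 0 on simulated data.
rng = np.random.default_rng(5)
T = 100
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.normal(size=T)])
beta = np.array([1.0, 0.5, 0.0])         # the third coefficient really is 0
y = X @ beta + rng.normal(size=T)

k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
u_hat = y - X @ b
s2 = u_hat @ u_hat / (T - k)

i = 2                                    # test the third coefficient
t_stat = (b[i] - 0.0) / np.sqrt(s2 * XtX_inv[i, i])
print(t_stat)                            # compare with t(T - k) critical values
```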

2.6. F Tests about {\beta} Under Above Assumptions

To generalize what we did for t tests, consider an {(m \times k)} matrix {R} that represents the m linear restrictions we want to impose on {\beta}; that is, {R\beta} collects the linear combinations of {\beta} whose hypothesized values we want to test. Thus,

\displaystyle H_0 \colon R\beta = r \ \ \ \ \ (46)

Since,

\displaystyle b \sim N(\beta, \sigma^2(X^TX)^{-1}). \ \ \ \ \ (47)

Thus, under {H_0},

\displaystyle Rb \sim N(r, \sigma^2R(X^TX)^{-1}R^T). \ \ \ \ \ (48)

Theorem 1 If {z} is a {(n \times 1)} vector with {z \sim N(0, \Sigma)} and {\Sigma} non-singular, then {z^T\Sigma^{-1} z \sim \chi^2(n)}.

 

Applying the above theorem to the {Rb - r} vector, we have,

\displaystyle (Rb - r)^T (\sigma^2R(X^TX)^{-1}R^T)^{-1}(Rb - r) \sim \chi^2 (m). \ \ \ \ \ (49)

Now consider,

\displaystyle F = (Rb - r)^T (s^2R(X^TX)^{-1}R^T)^{-1}(Rb - r) / m. \ \ \ \ \ (50)

where {\sigma^2} has been replaced by its sample estimate {s^2}.

Thus,

\displaystyle F = \frac{[(Rb - r)^T (\sigma^2R(X^TX)^{-1}R^T)^{-1}(Rb - r)] / m}{[RSS / (T - k)]/ \sigma^2} \ \ \ \ \ (51)

In the above, the numerator is a {\chi^2(m)} random variable divided by its degrees of freedom and the denominator is a {\chi^2(T - k)} random variable divided by its degrees of freedom. Since b and {\hat u} are independent, the numerator and denominator are also independent of each other.

Hence, the variable on the left hand side has an exact {F(m , T - k)} distribution under {H_0}.
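A minimal sketch of the F statistic on simulated data (the design, the restriction matrix R and the hypothesized vector r are illustrative; the data are generated so that {H_0} holds):

```python
import numpy as np

# Compute the F statistic in (50) for H0: beta_2 = beta_3 = 0.
rng = np.random.default_rng(6)
T = 120
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
beta = np.array([1.0, 0.0, 0.0])         # H0 holds in the simulated data
y = X @ beta + rng.normal(size=T)

k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
u_hat = y - X @ b
s2 = u_hat @ u_hat / (T - k)

R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
r = np.zeros(2)
m = R.shape[0]
diff = R @ b - r
F = diff @ np.linalg.solve(s2 * R @ XtX_inv @ R.T, diff) / m
print(F)                                 # compare with F(m, T - k) critical values
```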