Alex Towell (atowell@siue.edu)
Part 1
Problem 1.1
Suppose that [\(\bm{Z} \coloneqq (Z_1,Z_2,Z_3)'\)]{.math .inline} is a random vector with a mean vector [\(\bm{\mu} = \operatorname{E}(\bm{Z}) = (0,1,-1)'\)]{.math .inline} and a variance-covariance matrix [\[ \bm{\Sigma} = \operatorname{Var}(\bm{Z}) = \begin{pmatrix} 1 & -0.5 & 0 \\ -0.5 & 2 & 1.5 \\ 0 & 1.5 & 3 \\ \end{pmatrix}. \]]{.math .display}
Part (a)
First, given a matrix [\(\bm{A}\)]{.math .inline}, we denote the [\((i,j)\)]{.math .inline}-th element by [\(A_{i j}\)]{.math .inline}. If [\(\bm{A}\)]{.math .inline} is a vector, we simplify this notation and denote the [\(i\)]{.math .inline}-th element by [\(A_i\)]{.math .inline}.
The expectation of [\(Z_1 - 3 Z_2 - 2 Z_3\)]{.math .inline} is given by [\[\begin{align*} \operatorname{E}(Z_1 - 3 Z_2 - 2 Z_3) &= \operatorname{E}(Z_1) - \operatorname{E}(3 Z_2) - \operatorname{E}(2 Z_3)\\ &= \operatorname{E}(Z_1) - 3 \operatorname{E}(Z_2) - 2 \operatorname{E}(Z_3)\\ &= \mu_1 - 3 \mu_2 - 2 \mu_3. \end{align*}\]]{.math .display}
It is given that [\(\bm{\mu} = (0,1,-1)\)]{.math .inline} and thus [\[\begin{align*} \operatorname{E}(Z_1 - 3 Z_2 - 2 Z_3) &= 0 - 3(1) - 2(-1)\\ &= 0 - 3 + 2\\ &= -1. \end{align*}\]]{.math .display}
Part (b)
We use the theorems [\[ \operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2 \operatorname{Cov}(X,Y), \]]{.math .display} and [\[ \operatorname{Cov}(a X, b Y) = a b \operatorname{Cov}(X,Y). \]]{.math .display} The variance of [\(2 Z_1 + Z_3\)]{.math .inline} is given by [\[\begin{align*} \operatorname{Var}(2 Z_1 + Z_3) &= \operatorname{Var}(2 Z_1) + \operatorname{Var}(Z_3) + 2 \operatorname{Cov}(2 Z_1,Z_3)\\ &= 2^2 \operatorname{Var}(Z_1) + \operatorname{Var}(Z_3) + 2 \left(2 \operatorname{Cov}(Z_1,Z_3)\right)\\ &= 4 \Sigma_{1 1} + \Sigma_{3 3} + 4 \Sigma_{1 3}\\ &= 4(1) + (3) + 4(0)\\ &= 7. \end{align*}\]]{.math .display}
Part (c)
We use the computational formula for covariance, [\(\operatorname{Cov}(X,Y) = \operatorname{E}(XY) - \operatorname{E}(X)\operatorname{E}(Y)\)]{.math .inline}, which also gives [\(\operatorname{E}(XY) = \operatorname{Cov}(X,Y) + \operatorname{E}(X)\operatorname{E}(Y)\)]{.math .inline}.
The covariance of [\(3 Z_1 - Z_2\)]{.math .inline} and [\(Z_2 + 2 Z_3\)]{.math .inline} is given by [\[\begin{align*} \operatorname{Cov}(3 Z_1 - Z_2, Z_2 + 2 Z_3) &= \operatorname{E}\left[(3 Z_1 - Z_2)(Z_2 + 2 Z_3)\right] - \operatorname{E}(3 Z_1 - Z_2)\operatorname{E}(Z_2 + 2 Z_3)\\ &= \operatorname{E}(3 Z_1 Z_2 + 6 Z_1 Z_3 - Z_2^2 - 2 Z_2 Z_3) - (3 \operatorname{E}(Z_1) - \operatorname{E}(Z_2))(\operatorname{E}(Z_2) + 2 \operatorname{E}(Z_3))\\ &= 3 \operatorname{E}(Z_1 Z_2) + 6\operatorname{E}(Z_1 Z_3) - \operatorname{E}(Z_2^2) - 2 \operatorname{E}(Z_2 Z_3) - (3 \mu_1 - \mu_2)(\mu_2 + 2 \mu_3). \end{align*}\]]{.math .display}
Since [\(\operatorname{E}(Z_i Z_j) = \Sigma_{i j} + \mu_i \mu_j\)]{.math .inline}, we may rewrite the above as [\[ \begin{split} \operatorname{Cov}(3 Z_1 - Z_2, Z_2 + 2 Z_3) = 3 (\Sigma_{1 2} + \mu_1 \mu_2) & + 6(\Sigma_{1 3} + \mu_1 \mu_3) - (\Sigma_{2 2} + \mu_2^2) - 2 (\Sigma_{2 3} + \mu_2 \mu_3)\\ &- (3 \mu_1 - \mu_2)(\mu_2 + 2 \mu_3). \end{split} \]]{.math .display} We are given the values of [\(\Sigma_{i j}\)]{.math .inline} and [\(\mu_i\)]{.math .inline} for [\(i,j \in \{1,2,3\}\)]{.math .inline}. Substituting these values gives [\[ \begin{split} \operatorname{Cov}(3 Z_1 - Z_2, Z_2 + 2 Z_3) = 3 (-0.5 + 0 \cdot 1) & + 6(0 + 0\cdot(-1)) - (2 + 1^2) - \\ &2 (1.5 + 1 \cdot (-1)) - (3 \cdot 0 - 1)(1 + 2 \cdot (-1)). \end{split} \]]{.math .display} The final calculation results in [\[ \operatorname{Cov}(3 Z_1 - Z_2, Z_2 + 2 Z_3) = -3/2 + 0 - 3 - 1 - 1 = -13/2. \]]{.math .display}
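These results can be double-checked numerically in R using the matrix identities [\(\operatorname{E}(\bm{a}'\bm{Z}) = \bm{a}'\bm{\mu}\)]{.math .inline}, [\(\operatorname{Var}(\bm{a}'\bm{Z}) = \bm{a}'\bm{\Sigma}\bm{a}\)]{.math .inline}, and [\(\operatorname{Cov}(\bm{a}'\bm{Z}, \bm{b}'\bm{Z}) = \bm{a}'\bm{\Sigma}\bm{b}\)]{.math .inline}; the sketch below simply encodes the given [\(\bm{\mu}\)]{.math .inline} and [\(\bm{\Sigma}\)]{.math .inline} and the coefficient vectors of the linear combinations above.
# numerical check of Problem 1.1 (a)-(c)
mu <- c(0, 1, -1)
Sigma <- matrix(c(1, -0.5, 0,
                  -0.5, 2, 1.5,
                  0, 1.5, 3), nrow=3, byrow=TRUE)
a1 <- c(1, -3, -2)         # coefficients of Z1 - 3 Z2 - 2 Z3
sum(a1 * mu)               # part (a): -1
a2 <- c(2, 0, 1)           # coefficients of 2 Z1 + Z3
t(a2) %*% Sigma %*% a2     # part (b): 7
a3 <- c(3, -1, 0)          # coefficients of 3 Z1 - Z2
b3 <- c(0, 1, 2)           # coefficients of Z2 + 2 Z3
t(a3) %*% Sigma %*% b3     # part (c): -13/2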
Problem 1.2
Let [\(\{e_t\}\)]{.math .inline} be a normal white noise process with mean zero and variance [\(\sigma^2\)]{.math .inline}. Consider the process [\(Y_t = e_t e_{t-1}\)]{.math .inline}.
Part (a)
The expectation of [\(Y_t\)]{.math .inline} is given by [\[ \operatorname{E}(Y_t) = \operatorname{E}(e_t e_{t-1}). \]]{.math .display} By independence, [\(\operatorname{E}(Y_t)\)]{.math .inline} may be rewritten as [\[ \operatorname{E}(Y_t) = \operatorname{E}(e_t) \operatorname{E}(e_{t-1}) = 0. \]]{.math .display}
The variance of [\(Y_t\)]{.math .inline} is given by [\[ \operatorname{Var}(Y_t) = \operatorname{E}(Y_t^2) - \operatorname{E}^2(Y_t). \]]{.math .display} We showed that [\(\operatorname{E}(Y_t) = 0\)]{.math .inline}, thus the variance of [\(Y_t\)]{.math .inline} may be rewritten as [\[ \operatorname{Var}(Y_t) = \operatorname{E}((e_t e_{t-1})^2) = \operatorname{E}(e_t^2 e_{t-1}^2). \]]{.math .display} Since [\(e_t\)]{.math .inline} and [\(e_{t-1}\)]{.math .inline} are independent, [\(e_t^2\)]{.math .inline} and [\(e_{t-1}^2\)]{.math .inline} are independent, and thus the variance of [\(Y_t\)]{.math .inline} may be rewritten as [\[ \operatorname{Var}(Y_t) = \operatorname{E}(e_t^2) \operatorname{E}(e_{t-1}^2) = (\sigma^2 + 0)(\sigma^2 + 0) = \sigma^4. \]]{.math .display} We observe that [\(\{Y_t\}\)]{.math .inline} has constant mean [\(0\)]{.math .inline} and constant variance [\(\sigma^4\)]{.math .inline}.
Part (b)
The autocovariance of [\(Y_t\)]{.math .inline} and [\(Y_{t-\ell}\)]{.math .inline}, [\(\ell \geq 0\)]{.math .inline}, is given by [\[ \operatorname{Cov}(Y_t,Y_{t-\ell}) = \operatorname{E}(Y_t Y_{t-\ell}) - \operatorname{E}(Y_t) \operatorname{E}(Y_{t-\ell}). \]]{.math .display} We have already shown in part (a) that [\(\operatorname{E}(Y_k) = 0\)]{.math .inline} for any [\(k\)]{.math .inline}, thus [\[ \operatorname{Cov}(Y_t,Y_{t-\ell}) = \operatorname{E}(Y_t Y_{t-\ell}). \]]{.math .display} Substituting the definition of [\(Y_t\)]{.math .inline} and [\(Y_{t-\ell}\)]{.math .inline} into the above covariance gives [\[ \operatorname{Cov}(Y_t,Y_{t-\ell}) = \operatorname{E}(e_t e_{t-1} e_{t-\ell} e_{t - \ell - 1}). \]]{.math .display} We perform a case analysis to derive the autocovariance for different values of [\(\ell\)]{.math .inline}.
If [\(\ell=0\)]{.math .inline}, then [\(\operatorname{Cov}(Y_t,Y_t) = \operatorname{Var}(Y_t) = \sigma^4\)]{.math .inline}.
If [\(\ell=1\)]{.math .inline}, then [\(\operatorname{Cov}(Y_t,Y_{t-1}) = \operatorname{E}(e_t e_{t-1} e_{t-1} e_{t-2})\)]{.math .inline}. Let [\(W \coloneqq e_{t-1}^2\)]{.math .inline}, in which case [\(\operatorname{Cov}(Y_t,Y_{t-1}) = \operatorname{E}(e_t W e_{t-2})\)]{.math .inline}. Since these are independent random variables, [\[ \operatorname{Cov}(Y_t,Y_{t-1}) = \operatorname{E}(e_t) \operatorname{E}(W) \operatorname{E}(e_{t-2}) = 0 \cdot \operatorname{E}(W) \cdot 0 = 0. \]]{.math .display}
If [\(\ell=2\)]{.math .inline}, then [\[ \operatorname{Cov}(Y_t,Y_{t-2}) = \operatorname{E}(e_t e_{t-1} e_{t-2} e_{t-3}). \]]{.math .display} Since these are independent random variables, [\[ \operatorname{Cov}(Y_t,Y_{t-2}) = \operatorname{E}(e_t) \operatorname{E}(e_{t-1}) \operatorname{E}(e_{t-2}) \operatorname{E}(e_{t-3}) = 0. \]]{.math .display} Any [\(\ell \geq 2\)]{.math .inline} results in two elements of [\(\{Y_t\}\)]{.math .inline} that share no white-noise terms and thus also have a covariance of zero.
Since the autocovariance function is symmetric about [\(t\)]{.math .inline}, [\(\operatorname{Cov}(Y_t,Y_{t+\ell}) = \operatorname{Cov}(Y_t,Y_{t-\ell})\)]{.math .inline}, and thus we see that the autocovariance function is strictly a function of the lag [\(\ell\)]{.math .inline}.
We reparameterize the autocovariance function [\(\gamma\)]{.math .inline} with respect to [\(\ell\)]{.math .inline}, [\[ \gamma_\ell = \begin{cases} \sigma^4 & \ell = 0\\ 0 & \ell \neq 0. \end{cases} \]]{.math .display} The autocorrelation function [\(\rho_\ell\)]{.math .inline} is defined as [\(\gamma_\ell / \gamma_0\)]{.math .inline}, thus [\[ \rho_\ell = \begin{cases} 1 & \ell = 0\\ 0 & \ell \neq 0. \end{cases} \]]{.math .display}
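A quick simulation (a sketch, using [\(\sigma = 1\)]{.math .inline} and an arbitrary series length) is consistent with these results: the sample mean and variance of [\(Y_t\)]{.math .inline} are close to [\(0\)]{.math .inline} and [\(\sigma^4 = 1\)]{.math .inline}, and the sample ACF is negligible at all nonzero lags.
# simulate Y_t = e_t * e_{t-1} from normal white noise with sigma = 1
set.seed(1)
e <- rnorm(10000)
Y <- e[-1] * e[-length(e)]      # Y_t = e_t * e_{t-1}
mean(Y); var(Y)                 # approximately 0 and sigma^4 = 1
acf(Y, plot=FALSE)$acf[2:6]     # sample ACF at lags 1 through 5, all near 0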
Part (c)
The process is weakly stationary. It is weakly stationary because its mean is a constant [\(0\)]{.math .inline}, its variance is a constant [\(\sigma^4\)]{.math .inline}, and its autocorrelation function is strictly a function of lag [\(\ell\)]{.math .inline}.
Problem 1.3
First, we find an estimator [\(\hat\beta_1\)]{.math .inline} given a random sample [\(\{Y_i\}\)]{.math .inline} generated from the model [\[ Y_i = \beta_1 x_{i 1} + \beta_2 x_{i 2} + \epsilon_i, \]]{.math .display} except we incorrectly or approximately assume [\(Y_i = \beta_1 x_{i 1} + \epsilon_i\)]{.math .inline}. The true statistical error of the [\(i\)]{.math .inline}-th random variable [\(Y_i\)]{.math .inline} is given by [\[ \epsilon_i = Y_i - \beta_1 x_{i 1} - \beta_2 x_{i 2} \]]{.math .display} and we are interested in minimizing the sum of the squares of the statistical errors [\[ \operatorname{L} = \sum_{i=1}^{n} \epsilon_i^2. \]]{.math .display} We (incorrectly or approximately) assume [\(\epsilon_i = Y_i - \beta_1 x_{i 1}\)]{.math .inline} and parameterize [\(\operatorname{L}\)]{.math .inline} with respect to [\(\beta_1\)]{.math .inline}, resulting in the function [\[ \operatorname{L}(\beta_1) = \sum_{i=1}^n (Y_i - \beta_1 x_{i 1})^2. \]]{.math .display} We obtain an estimator for [\(\beta_1\)]{.math .inline} by solving for [\(\hat{\beta_1}\)]{.math .inline} in [\[ \left. \frac{\partial \operatorname{L}}{\partial \beta_1} \right\vert_{\hat{\beta_1}}= 0. \]]{.math .display} Thus, [\[\begin{align*} -2 \sum_i (Y_i - \hat\beta_1 x_{i 1}) x_{i 1} &= 0\\ \sum_i \left(Y_i x_{i 1} - \hat\beta_1 x_{i 1}^2 \right) &= 0\\ \hat\beta_1 \sum_i x_{i 1}^2 &= \sum_i Y_i x_{i 1} \end{align*}\]]{.math .display} which finally simplifies to [\[ \hat\beta_1 = \frac{\sum_{i=1}^n Y_i x_{i 1}}{\sum_{i=1}^n x_{i 1}^2} \]]{.math .display}
We are interested in the bias of [\(\hat\beta_1\)]{.math .inline}, denoted by [\[ \operatorname{b}(\hat\beta_1) \coloneqq \operatorname{E}(\hat\beta_1 - \beta_1). \]]{.math .display} By the linearity of expectation, the bias may be rewritten as [\[ \operatorname{b}(\hat\beta_1) = \operatorname{E}(\hat\beta_1) - \beta_1. \]]{.math .display} The expectation of [\(\hat\beta_1\)]{.math .inline} is given by [\[\begin{align*} \operatorname{E}(\hat\beta_1) &= \operatorname{E}\left(\frac{\sum_i Y_i x_{i 1}}{\sum_i x_{i 1}^2}\right)\\ &= \frac{\operatorname{E}\left(\sum_i Y_i x_{i 1}\right)}{\sum_i x_{i 1}^2}\\ &= \frac{\sum_i \operatorname{E}(Y_i x_{i 1})}{\sum_i x_{i 1}^2}\\ &= \frac{\sum_i x_{i 1} \operatorname{E}(Y_i) }{\sum_i x_{i 1}^2}. \end{align*}\]]{.math .display} The true model of [\(\{Y_i\}\)]{.math .inline} is given by [\(Y_i = \beta_1 x_{i 1} + \beta_2 x_{i 2} + \epsilon_i\)]{.math .inline}, so [\[\begin{align*} \operatorname{E}(\hat\beta_1) &= \frac{\sum_i x_{i 1} \operatorname{E}(\beta_1 x_{i 1} + \beta_2 x_{i 2} + \epsilon_i)}{\sum_i x_{i 1}^2}\\ &= \frac{\sum_i x_{i 1} (\beta_1 x_{i 1} + \beta_2 x_{i 2})}{\sum_i x_{i 1}^2}\\ &= \frac{\sum_i \beta_1 x_{i 1}^2 + \beta_2 x_{i 1} x_{i 2}}{\sum_i x_{i 1}^2}\\ &= \frac{\sum_i \beta_1 x_{i 1}^2}{\sum_i x_{i 1}^2} + \frac{\sum_i \beta_2 x_{i 1} x_{i 2}}{\sum_i x_{i 1}^2}\\ &= \beta_1 \frac{\sum_i x_{i 1}^2}{\sum_i x_{i 1}^2} + \beta_2 \frac{\sum_i x_{i 1} x_{i 2}}{\sum_i x_{i 1}^2}\\ &= \beta_1 + \beta_2 \frac{\sum_i x_{i 1} x_{i 2}}{\sum_i x_{i 1}^2}. \end{align*}\]]{.math .display} The bias is defined as [\(\operatorname{b}(\hat\beta_1) = \operatorname{E}(\hat\beta_1) - \beta_1\)]{.math .inline}, thus [\[ \operatorname{b}(\hat\beta_1) = \beta_2 \frac{\sum_{i=1}^n x_{i 1} x_{i 2}}{\sum_{i=1}^n x_{i 1}^2}. \]]{.math .display}
The bias of [\(\hat\beta_1\)]{.math .inline} is a function of the observed values [\(\{x_{i 1}\}\)]{.math .inline} and [\(\{x_{i 2}\}\)]{.math .inline} and the magnitude of [\(\beta_2\)]{.math .inline}.
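A small simulation illustrates this bias formula (a sketch; the sample size, coefficients, and regressor values below are arbitrary choices made for illustration, not part of the problem):
# empirical vs. theoretical bias of the misspecified estimator
set.seed(1)
n <- 50; beta1 <- 2; beta2 <- 3
x1 <- runif(n)
x2 <- 0.5 * x1 + runif(n)                  # x2 correlated with x1
beta1.hat <- replicate(5000, {
  Y <- beta1 * x1 + beta2 * x2 + rnorm(n)
  sum(Y * x1) / sum(x1^2)                  # estimator from the reduced model
})
mean(beta1.hat) - beta1                    # empirical bias
beta2 * sum(x1 * x2) / sum(x1^2)           # theoretical bias b(beta1.hat)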
Problem 1.4
Consider the simple linear regression model [\(y = \beta_0 + \beta_1 x + \epsilon\)]{.math .inline}, where [\(\beta_0\)]{.math .inline} is known and the [\(\epsilon\)]{.math .inline}'s are i.i.d. with mean [\(0\)]{.math .inline} and variance [\(\sigma^2\)]{.math .inline}.
Part (a)
Since [\(\beta_0\)]{.math .inline} is known, we only need to find an estimator for [\(\beta_1\)]{.math .inline}. The least-squares estimator of [\(\beta_1\)]{.math .inline} is defined as
[\[ \hat\beta_1 = {\mathop{\mathrm{argmin}}\limits}_{\beta_1} \operatorname{L}(\beta_1 | \beta_0) \]]{.math .display} where [\(\operatorname{L}(\beta_1 | \beta_0) = \sum_{i=1}^n \left(Y_i - \beta_0 - \beta_1 x_i\right)^2\)]{.math .inline}. We obtain the estimator by solving for [\(\hat\beta_1\)]{.math .inline} in [\[ \left. \frac{\partial \operatorname{L}}{\partial \beta_1} \right\vert_{\hat\beta_1} = 0. \]]{.math .display}
Thus, [\[\begin{align*} -2 \sum_{i=1}^n (Y_i - \beta_0 - \hat\beta_1 x_i)x_i &= 0\\ \sum_{i=1}^n \left(x_i Y_i - \beta_0 x_i - \hat\beta_1 x_i^2\right) &= 0\\ \sum_{i=1}^n x_i Y_i - \beta_0 \sum_{i=1}^n x_i - \hat\beta_1 \sum_{i=1}^n x_i^2 &= 0\\ \hat\beta_1 \sum_{i=1}^n x_i^2 &= \sum_{i=1}^n x_i Y_i - \beta_0 \sum_{i=1}^n x_i, \end{align*}\]]{.math .display} which finally simplifies to [\[ \hat\beta_1 = \frac{\sum_{i=1}^n x_i Y_i - \beta_0 \sum_{i=1}^n x_i}{\sum_{i=1}^n x_i^2}. \]]{.math .display}
Of course, when [\(\{Y_i\} = \{y_i\}\)]{.math .inline}, [\(\hat{\beta_1}\)]{.math .inline} realizes the particular value [\[ \hat\beta_1 = \frac{\sum_{i=1}^n x_i y_i - \beta_0 \sum_{i=1}^n x_i}{\sum_{i=1}^n x_i^2}. \]]{.math .display}
The expectation of [\(\hat\beta_1\)]{.math .inline} is given by [\[\begin{align*} \operatorname{E}(\hat\beta_1) &= \operatorname{E}\left(\frac{\sum_i x_i Y_i - \beta_0 \sum_i x_i}{\sum_i x_i^2}\right)\\ &= \left(\sum_i x_i^2\right)^{-1}\operatorname{E}\left(\sum_i x_i Y_i - \beta_0 \sum_i x_i\right)\\ &= \left(\sum_i x_i^2\right)^{-1}\left(\sum_i x_i \operatorname{E}(\beta_0 + \beta_1 x_i + \epsilon_i) - \beta_0 \sum_i x_i\right)\\ &= \left(\sum_i x_i^2\right)^{-1}\left(\beta_0 \sum_i x_i + \beta_1 \sum_i x_i^2 - \beta_0 \sum_i x_i\right)\\ &= \beta_1 \left(\sum_i x_i^2\right)^{-1}\left(\sum_i x_i^2\right)\\ &= \beta_1, \end{align*}\]]{.math .display} which shows that it is unbiased (as expected).
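As a quick check (a sketch; the values of [\(n\)]{.math .inline}, [\(\beta_0\)]{.math .inline}, [\(\beta_1\)]{.math .inline}, and the [\(x_i\)]{.math .inline} below are made up for illustration), the closed-form estimator agrees with lm() applied to [\(Y_i - \beta_0\)]{.math .inline} with the intercept suppressed:
# closed-form estimator vs. lm() with known intercept
set.seed(1)
n <- 30; beta0 <- 5; beta1 <- 1.5
x <- runif(n, 1, 10)
Y <- beta0 + beta1 * x + rnorm(n, sd=2)
(sum(x * Y) - beta0 * sum(x)) / sum(x^2)   # closed-form estimator
coef(lm(I(Y - beta0) ~ 0 + x))             # identical value via lm()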
Part (b)
To construct the confidence interval, we must find the standard deviation of [\(\hat\beta_1\)]{.math .inline}. The variance is given by [\[\begin{align*} \operatorname{Var}(\hat\beta_1) &= \operatorname{Var}\left(\frac{\sum_{i=1}^n x_i Y_i - \beta_0 \sum_{i=1}^n x_i}{\sum_{i=1}^n x_i^2}\right)\\ &= \left( \sum_{i=1}^n x_i^2 \right)^{-2} \operatorname{Var}\left( \sum_{i=1}^n x_i Y_i - \beta_0 \sum_{i=1}^n x_i \right) \end{align*}\]]{.math .display}
Since [\(\beta_0\)]{.math .inline} and [\(x_i\)]{.math .inline} for [\(i=1,\ldots,n\)]{.math .inline} are constants, the above simplifies to
[\[\begin{align*} \operatorname{Var}(\hat\beta_1) &= \left(\sum_{i=1}^n x_i^2\right)^{-2} \operatorname{Var}\left(\sum_{i=1}^n x_i Y_i\right)\\ &= \left(\sum_{i=1}^n x_i^2\right)^{-2} \sum_{i=1}^n \operatorname{Var}(x_i Y_i)\\ &= \left(\sum_{i=1}^n x_i^2\right)^{-2} \sum_{i=1}^n x_i^2 \operatorname{Var}(Y_i)\\ &= \left(\sum_{i=1}^n x_i^2\right)^{-2} \sum_{i=1}^n x_i^2 \operatorname{Var}(\beta_0 + \beta_1 x_i + \epsilon_i)\\ &= \left(\sum_{i=1}^n x_i^2\right)^{-2} \sum_{i=1}^n x_i^2 \operatorname{Var}(\epsilon_i)\\ &= \left(\sum_{i=1}^n x_i^2\right)^{-2} \sum_{i=1}^n x_i^2 \sigma^2\\ &= \sigma^2 \left(\sum_{i=1}^n x_i^2\right)^{-2} \sum_{i=1}^n x_i^2 \end{align*}\]]{.math .display} which finally yields the result [\[ \operatorname{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^n x_i^2}. \]]{.math .display} The standard error is therefore [\[ \operatorname{SE}(\hat\beta_1) = \frac{\sigma}{\sqrt{\sum_{i=1}^n x_i^2}}. \]]{.math .display}
Since [\(\hat\beta_1\)]{.math .inline} is a linear combination of the [\(Y_i\)]{.math .inline}, it is normally distributed with mean [\(\beta_1\)]{.math .inline} and variance [\(\operatorname{Var}(\hat\beta_1)\)]{.math .inline} when the errors are normal (and approximately so in large samples otherwise). Thus, a [\(100(1-\alpha)\%\)]{.math .inline} confidence interval for [\(\beta_1\)]{.math .inline} is [\[ \beta_1 \in \left[\hat\beta_1 - z_{\alpha/2} \operatorname{SE}(\hat\beta_1),\hat\beta_1 + z_{\alpha/2} \operatorname{SE}(\hat\beta_1)\right] \]]{.math .display} or, substituting the expression for the standard deviation, [\[ \beta_1 \in \left[\hat\beta_1 - \frac{z_{\alpha/2} \sigma}{\sqrt{\sum_i x_i^2}}, \hat\beta_1 + \frac{z_{\alpha/2} \sigma}{\sqrt{\sum_i x_i^2}}\right]. \]]{.math .display}
To compare this estimator with the estimator for the case of unknown [\(\bm{\beta }= (\beta_0, \beta_1)'\)]{.math .inline}, we use the matrix equations for this part as a more compact alternative approach. The unbiased least-squares estimator of [\(\bm{\beta}\)]{.math .inline} is given by [\[ \hat{\bm{\beta}}_f = (\bm{X}' \bm{X})^{-1} \bm{X}' \bm{Y}, \]]{.math .display} where [\[ \bm{X} = \begin{pmatrix} 1 & x_1\\ 1 & x_2\\ \vdots & \vdots\\ 1 & x_n\\ \end{pmatrix} \]]{.math .display} and [\[ \bm{Y} = \begin{pmatrix} Y_1\\ \vdots\\ Y_n\\ \end{pmatrix}. \]]{.math .display} Performing the matrix calculations, we see that [\[ \bm{X}' \bm{X} = \begin{pmatrix} n & \sum_i x_i\\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix} \]]{.math .display} and [\[ \left(\bm{X}' \bm{X}\right)^{-1} = \frac{1}{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2} \begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i\\ -\sum_i x_i & n \end{pmatrix}. \]]{.math .display} The variance-covariance matrix of [\(\hat{\bm{\beta}}_f\)]{.math .inline} is therefore [\[ \bm{\Sigma} = \operatorname{Var}(\hat{\bm{\beta}}_f) = \sigma^2 \left(\bm{X}' \bm{X}\right)^{-1}, \]]{.math .display} and therefore the variance of [\(\hat\beta_{1,f}\)]{.math .inline} is given by [\[\begin{align*} \operatorname{Var}(\hat\beta_{1,f}) &= \Sigma_{2 2}\\ &= \frac{n \sigma^2}{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2}\\ &= \frac{\sigma^2}{\sum_i x_i^2 - \frac{1}{n}\left(\sum_i x_i\right)^2}. \end{align*}\]]{.math .display}
Recall that [\(\operatorname{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_i x_i^2}\)]{.math .inline}. The ratio of [\(\operatorname{Var}(\hat\beta_1)\)]{.math .inline} to [\(\operatorname{Var}(\hat\beta_{1,f})\)]{.math .inline} is therefore [\[\begin{align*} w &= \frac{\sum_i x_i^2 - \frac{1}{n}\left(\sum_i x_i\right)^2}{\sum_i x_i^2}\\ &= 1 - \frac{\left(\sum_i x_i\right)^2}{n \sum_i x_i^2}\\ &= 1 - \frac{\bar{x}^2}{\overline{x^2}} \end{align*}\]]{.math .display} (with [\(\bar{x} = \frac{1}{n}\sum_i x_i\)]{.math .inline} and [\(\overline{x^2} = \frac{1}{n}\sum_i x_i^2\)]{.math .inline}), which implies [\(0 < w < 1\)]{.math .inline} (assuming [\(\bar{x} \neq 0\)]{.math .inline} and the [\(x_i\)]{.math .inline} are not all equal), i.e., [\(\operatorname{Var}(\hat\beta_{1,f})\)]{.math .inline} is larger than [\(\operatorname{Var}(\hat\beta_1)\)]{.math .inline}. In particular, since [\(w = \frac{\operatorname{Var}(\hat\beta_1)}{\operatorname{Var}(\hat\beta_{1,f})}\)]{.math .inline}, [\[ \sigma_{\hat\beta_1} = \sqrt{w}\, \sigma_{\hat\beta_{1,f}}, \]]{.math .display} and thus we may conclude that the confidence interval for [\(\beta_1\)]{.math .inline} based on [\(\hat\beta_1\)]{.math .inline} is a factor [\(\sqrt{w}\)]{.math .inline} as wide as the one based on [\(\hat\beta_{1,f}\)]{.math .inline}, where [\(\sqrt{w} < 1\)]{.math .inline}.
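The ratio [\(w\)]{.math .inline} and the resulting difference in interval width are easy to compute for any particular design (a sketch; the [\(x_i\)]{.math .inline} below are arbitrary illustrative values and [\(\sigma\)]{.math .inline} is taken to be known):
# compare Var(beta1.hat) with beta0 known vs. unknown
x <- c(2, 4, 5, 7, 9); sigma <- 1; n <- length(x)
var.known <- sigma^2 / sum(x^2)                       # beta0 known
var.full  <- sigma^2 / (sum(x^2) - sum(x)^2 / n)      # beta0 unknown
w <- var.known / var.full
c(w, 1 - mean(x)^2 / mean(x^2))   # same value both ways
sqrt(w)                           # ratio of confidence-interval widths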
It was immediately obvious that knowing the value of [\(\beta_0\)]{.math .inline} should reduce the uncertainty of [\(\beta_1\)]{.math .inline} given a sample, but I thought this proof was fairly interesting.
Finally, to answer the question, since a larger variance yields a larger confidence interval, the confidence interval for [\(\hat\beta_1\)]{.math .inline} is narrower than the confidence interval for [\(\hat\beta_{1,f}\)]{.math .inline}.
Part 2
Problem 2.1
The EmployeeData data set gives the number of employees (in thousands) for a metal fabricator and one of its primary vendors for each month over a 5-year period. The data are in a .txt file on Blackboard and can be read into R using the read.table command.
Part (a)
emp_data <- read.table("EmployeeData.txt", header=TRUE)
# fit a simple linear regression model
ols.fit <- lm(metal~vendor, data=emp_data)
# get details from the regression output
summary(ols.fit)
##
## Call:
## lm(formula = metal ~ vendor, data = emp_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2348 -1.2393 -0.0311 1.0022 3.7077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.847911 3.299962 0.863 0.392
## vendor 0.122442 0.009423 12.994 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.59 on 58 degrees of freedom
## Multiple R-squared: 0.7443, Adjusted R-squared: 0.7399
## F-statistic: 168.8 on 1 and 58 DF, p-value: < 2.2e-16
# get the anova table
anova(ols.fit)
## Analysis of Variance Table
##
## Response: metal
## Df Sum Sq Mean Sq F value Pr(>F)
## vendor 1 426.72 426.72 168.83 < 2.2e-16 ***
## Residuals 58 146.59 2.53
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Part (b)
# vendor and metal seem to be positively correlated.
with(emp_data,plot(vendor,metal))
abline(ols.fit)
(Figure: scatter plot of metal vs. vendor with the fitted OLS line.)
#plot(vendor,metal)
#lines(vendor,fitted(ols.fit))
Part (c)
Here is the time-ordered plot of the residuals, followed by their sample ACF.
N=nrow(emp_data)
t=1:N
plot(t,ols.fit$residual)
(Figure: time-ordered plot of the OLS residuals.)
acf(ols.fit$residual)
(Figure: ACF of the OLS residuals.)
The residuals do not appear to be i.i.d. noise centered at [\(0\)]{.math .inline}; both the time-ordered plot and the sample ACF suggest substantial positive autocorrelation.
Part (d)
library("lmtest")
dwtest(ols.fit)
##
## Durbin-Watson test
##
## data: ols.fit
## DW = 0.35924, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0
# with(emp_data,dwtest(metal~vendor))
A test statistic [\(\rm{DW}\)]{.math .inline} near [\(2\)]{.math .inline} is consistent with no first-order autocorrelation, while values well below [\(2\)]{.math .inline} indicate positive autocorrelation. In this case the statistic is quite small ([\(\rm{DW} = 0.359\)]{.math .inline}) and the p-value is essentially zero, so we reject the null hypothesis of no autocorrelation in favor of positive autocorrelation.
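The DW statistic can also be reproduced directly from the OLS residuals as [\(\rm{DW} = \sum_{t=2}^{n}(e_t - e_{t-1})^2 \big/ \sum_{t=1}^{n} e_t^2\)]{.math .inline} (a quick check of the dwtest output above):
# manual Durbin-Watson statistic; should match DW = 0.35924
e <- residuals(ols.fit)
sum(diff(e)^2) / sum(e^2)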
Part (e)
# estimate phi for the Cochrane-Orcutt method
phi.hat=lm(ols.fit$residual[2:N]~0+ols.fit$residual[1:(N-1)])$coeff
# transform y and x according to the Cochrane-Orcutt method
y.trans=emp_data$metal[2:N]-phi.hat*emp_data$metal[1:(N-1)]
x.trans=emp_data$vendor[2:N]-phi.hat*emp_data$vendor[1:(N-1)]
# fit OLS regression with transformed data
coch.or=lm(y.trans~x.trans)
summary(coch.or)
##
## Call:
## lm(formula = y.trans ~ x.trans)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1944 -0.4425 0.1461 0.5125 1.2218
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.87560 0.78655 6.199 6.78e-08 ***
## x.trans 0.04795 0.01300 3.688 0.000505 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7342 on 57 degrees of freedom
## Multiple R-squared: 0.1927, Adjusted R-squared: 0.1785
## F-statistic: 13.6 on 1 and 57 DF, p-value: 0.0005054
acf(coch.or$residual)
(Figure: ACF of the residuals from the Cochrane-Orcutt fit.)
The standard error of the simple linear regression model for the intercept was [\(3.299962\)]{.math .inline}. The standard error for the intercept in the Cochrane-Orcutt model is significantly smaller at [\(0.78655\)]{.math .inline}.
The standard error of the simple linear regression model for the slope is [\(0.009423\)]{.math .inline}. The standard error for the slope in the Cochrane-Orcutt model is slightly larger at [\(0.01300\)]{.math .inline}.
The ACF of the Cochrane-Orcutt residuals exhibits far less correlation, with most lags falling within the confidence bands.
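As a further check (a sketch, assuming the lmtest package loaded earlier is still available), the Durbin-Watson test can be re-run on the transformed regression; a statistic much closer to 2 would corroborate the reduced autocorrelation seen in the ACF.
# Durbin-Watson test on the Cochrane-Orcutt transformed fit
dwtest(coch.or)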
Problem 2.2
The following analyses are based on the data in the HomePrice.txt file on Blackboard, read into R using the read.table command. The variables of the HomePrice dataset are renamed t, Y, X1, and X2 in the code below.
Part (a)
hp_data <- read.table("HomePrice.txt", header=TRUE)
colnames(hp_data) = c("t","Y","X1","X2")
# fit a multiple regression model
hp_model <- lm(Y~X1+X2, data=hp_data)
# get details from the regression output
summary(hp_model)
##
## Call:
## lm(formula = Y ~ X1 + X2, data = hp_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -228421 -38178 -5506 25494 383423
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.027e+05 1.265e+04 -8.121 3.39e-15 ***
## X1 1.560e+02 4.871e+00 32.019 < 2e-16 ***
## X2 1.151e+00 2.964e-01 3.882 0.000117 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 78070 on 519 degrees of freedom
## Multiple R-squared: 0.6808, Adjusted R-squared: 0.6796
## F-statistic: 553.5 on 2 and 519 DF, p-value: < 2.2e-16
# get the anova table
anova(hp_model)
## Analysis of Variance Table
##
## Response: Y
## Df Sum Sq Mean Sq F value Pr(>F)
## X1 1 6.6555e+12 6.6555e+12 1091.875 < 2.2e-16 ***
## X2 1 9.1880e+10 9.1880e+10 15.073 0.0001168 ***
## Residuals 519 3.1635e+12 6.0955e+09
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Part (b)
# studentized residuals
r.unweighted = rstudent(hp_model)
# the following plot of the fitted values vs the residuals suggests
# non-constant variance. the variance seems to be increasing with respect
# to y. the residuals do seem to have a zero mean though.
plot(r.unweighted,hp_model$fitted)
(Figure: studentized residuals vs. fitted values, unweighted fit.)
# variance seems more constant with respect to X1
plot(r.unweighted,hp_data$X1)
(Figure: studentized residuals vs. X1, unweighted fit.)
plot(r.unweighted,hp_data$X2)
(Figure: studentized residuals vs. X2, unweighted fit.)
The studentized residuals of the regression seem to have non-constant variance with respect to the fitted values [\(\hat{y}\)]{.math .inline}, [\(X_1\)]{.math .inline}, and [\(X_2\)]{.math .inline}. In particular, they exhibit the fanning-out pattern typically seen with non-constant variance.
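One optional way to formalize this visual impression is a Breusch-Pagan test (a sketch, assuming the lmtest package used in Problem 2.1 is available); a small p-value would support the non-constant-variance conclusion.
# Breusch-Pagan test for heteroscedasticity in the unweighted model
library("lmtest")
bptest(hp_model)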
Part (c)
# the absolute value of the OLS residuals.
abs_residuals = abs(residuals(hp_model))
# we're regressing the absolute residuals s(i) = |e(i)| on the fitted values:
#   s(i) = gamma0 + gamma1 * yhat(i) + zeta(i),
# where zeta(i) is the random error; the fitted values of this regression
# estimate the error standard deviation at each observation.
abs_residuals_fit=lm(abs_residuals~hp_model$fitted)
Part (d)
# weighted least squares: the weights are the reciprocals of the squared
# fitted values of the absolute-residual regression (i.e., 1 / estimated variance)
wts=1/(fitted(abs_residuals_fit))^2
# fit the weighted regression model to the data
hp_model.weighted=lm(Y~X1+X2, data=hp_data,weights=wts)
anova(hp_model.weighted)
## Analysis of Variance Table
##
## Response: Y
## Df Sum Sq Mean Sq F value Pr(>F)
## X1 1 4095.8 4095.8 1754.0376 <2e-16 ***
## X2 1 0.7 0.7 0.3068 0.5799
## Residuals 519 1211.9 2.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(hp_model.weighted)
##
## Call:
## lm(formula = Y ~ X1 + X2, data = hp_data, weights = wts)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -9.4644 -0.9364 -0.2118 0.6141 8.0706
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8918.7876 3619.3749 -2.464 0.0141 *
## X1 123.1438 3.3186 37.107 <2e-16 ***
## X2 -0.1274 0.2300 -0.554 0.5799
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.528 on 519 degrees of freedom
## Multiple R-squared: 0.7717, Adjusted R-squared: 0.7708
## F-statistic: 877.2 on 2 and 519 DF, p-value: < 2.2e-16
Part (e)
# weighted residual analysis
# studentized residuals
r.weighted = rstudent(hp_model.weighted)
# plot studentized residual vs fitted values
plot(r.weighted,hp_model.weighted$fitted)
(Figure: studentized residuals vs. fitted values, weighted fit.)
plot(r.weighted,hp_data$X1)
(Figure: studentized residuals vs. X1, weighted fit.)
plot(r.weighted,hp_data$X2)
(Figure: studentized residuals vs. X2, weighted fit.)
The studentized residuals of the weighted regression seem to have more constant variance with respect to the fitted values [\(\hat{y}\)]{.math .inline}, [\(X_1\)]{.math .inline}, and [\(X_2\)]{.math .inline}. In particular, they do not exhibit the fanning-out pattern typically seen with non-constant variance.
In conclusion, the weighted regression produces residuals that are more representative of white noise with zero mean and constant variance.