Computing Standard Errors
The Standard Error of the Method of Moments Estimator
We have used the method of moments to obtain an estimator \(\hat{\theta}_n\) for the parameter of interest \(\theta\), based on an IID random sample \(X_1, \ldots, X_n\) from the distribution. Once we have an estimator, we would like to know how stable or reliable it is; that is, if we drew another sample, how much would the value of the estimator change? To answer this, we usually need to derive the sampling distribution of the estimator (not easy, unless we are dealing with the sample mean) or approximate it. The standard deviation of the sampling distribution is called the standard error of the estimator, and it is this standard error that we need to derive or approximate.
It might be that the sampling distribution has an explicit functional form that depends on the parameter values; in that case we can simply plug in our estimates to obtain an estimate of the standard error. If the functional form is too complicated to write down explicitly, we can instead simulate it, that is, use the bootstrap method to approximate the sampling distribution of the estimator.
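As a concrete sketch of the bootstrap idea, the following Python snippet resamples the data with replacement and recomputes the estimator on each resample; the data-generating distribution, sample size, and number of bootstrap replications are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: pretend this is our observed IID sample.
x = rng.exponential(scale=2.0, size=200)

def bootstrap_se(data, estimator, n_boot=2000):
    """Approximate SE(estimator) by resampling the data with
    replacement and recomputing the estimate each time."""
    n = len(data)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(data, size=n, replace=True)
        estimates[b] = estimator(resample)
    return estimates.std(ddof=1)

# Bootstrap SE of the sample mean, versus the analytic s / sqrt(n).
print(bootstrap_se(x, np.mean))
print(x.std(ddof=1) / np.sqrt(len(x)))
```

For the sample mean the two numbers should agree closely; the payoff of the bootstrap is that the same resampling loop works for estimators whose standard error has no simple formula.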
Note that the WLLN implies that the sample moments converge in probability to the true (population) moments. This means that if the function relating the estimator \(\hat{\theta}\) to the sample moments is continuous, the estimator will converge to the parameter as the sample moments converge to the population moments.
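A quick simulation makes this convergence concrete; the Exponential distribution and the sample sizes below are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical example: Exp(rate = 2), so the true first moment is 1/2.
for n in [10, 1_000, 100_000]:
    x = rng.exponential(scale=0.5, size=n)
    print(n, x.mean())   # the first sample moment approaches 0.5
```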
Computing the Standard Error of \(\hat{\theta}\)
We have three possible cases when we want to compute \(SE(\hat{\theta})\):
- \(\hat{\theta}\) is a linear function of the first sample moment \(\hat{\mu}_1\), in which case we can compute it directly.
- \(\hat{\theta}\) is a nonlinear function of \(\hat{\mu}_1\). In this case we will use the univariate delta method, which we will describe shortly.
- \(\hat{\theta}\) is more complex, perhaps depending on several moments; in this case we can use the bootstrap.
Case 1: \(\hat{\theta}\) is a linear function of \(\hat{\mu}_1\)
We illustrate this case with an example. Consider an IID random sample from the Poisson distribution, whose rate \(\lambda\) is unknown: \[ X_1, \ldots, X_n \overset{IID}{\sim} Poisson(\lambda) \Rightarrow E(X) = Var(X) = \lambda. \] The method of moments estimator of \(\lambda\) is therefore just \(\hat{\lambda}_n = \hat{\mu}_1 = \overline{X}_n\).
Since \(\hat{\lambda}_n\) is just the sample mean, we can use the Central Limit Theorem to approximate its sampling distribution.
Note that \(SE(\hat{\lambda}_n) = SE( \overline{X}_n) = \dfrac{\sigma}{\sqrt{n}} = \sqrt{\dfrac{\lambda}{n}}.\)
This tells us that the sampling distribution of \(\hat{\lambda}_n\) becomes more concentrated about the true value \(\lambda\) as \(n\) gets large. In order to get an idea of the standard error, we can just substitute \(\hat{\lambda}_n\) for \(\lambda\), and therefore, our estimated standard error of \(\hat{\lambda}_n\) is given by \[ s_{\hat{\lambda}_n} = \sqrt{\dfrac{\hat{\lambda}_n}{n}}. \]
Note that, strictly speaking, what is true is that \[ \sigma_{\hat{\lambda}_n} = \sqrt{\dfrac{\lambda}{n}}. \]
But because our estimator is consistent, we know that \(\hat{\mu}_1 = \hat{\lambda}_n \rightarrow \mu = \lambda\) in probability. Since the square root is a continuous function, \(\sqrt{\hat{\lambda}_n} \rightarrow \sqrt{\lambda}\), so the estimated standard error is asymptotically equivalent to the true one:
\[ \frac{s_{\hat{\lambda}_n}}{\sigma_{\hat{\lambda}_n}} = \sqrt{\frac{\hat{\lambda}_n}{\lambda}} \rightarrow 1. \]
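To see the plug-in estimate at work, here is a short simulation sketch; the true rate \(\lambda = 3\), the sample size \(n = 50\), and the number of replications are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, n, n_reps = 3.0, 50, 10_000   # hypothetical rate, sample size, replications

# Many independent samples give a Monte Carlo picture of the
# sampling distribution of lambda_hat = X_bar.
lam_hats = rng.poisson(lam, size=(n_reps, n)).mean(axis=1)

print("Monte Carlo SE:        ", lam_hats.std(ddof=1))
print("True SE, sqrt(lam/n):  ", np.sqrt(lam / n))
print("Plug-in SE, one sample:", np.sqrt(lam_hats[0] / n))
```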
Case 2: \(\hat{\theta}\) is a nonlinear function of \(\hat{\mu}_1\)
For example, consider the following situation: \(\displaystyle X_1, \ldots, X_n \overset{IID}{\sim} Exp(\lambda) \Rightarrow E(X) = \frac{1}{\lambda} \Rightarrow \hat\lambda = \frac{1}{\overline{X}_n},\) using the method of moments.
This type of situation is common in statistics. We have a random variable \(X\) whose mean and variance we know, or, as in the case above, we can apply the CLT to \(\overline{X}_n\). But we are actually interested in a different random variable, \(Y = g(X)\), and we want the mean and variance of \(Y\). Above, \(g\) is the reciprocal function, applied to the sample mean.
The problem is that \(g\) is not a linear function, so in general \(E\left(g(X)\right) \ne g\left(E(X)\right)\).
We could use the bootstrap, but we are going to discuss a different method, called the delta method or the propagation of error formula.¹ The idea is to take a Taylor series expansion of \(g\) about \(\mu_X\) and thereby linearize \(Y = g(X)\); similarly, we can replace \(g(\overline{X}_n)\) with a linear function of \(\overline{X}_n\) that is close to it for large \(n\).
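Anticipating the formula, here is a simulation sketch that checks the delta-method approximation \(SE(\hat\lambda) \approx \lambda/\sqrt{n}\) for \(\hat\lambda = 1/\overline{X}_n\); the rate, sample size, and number of replications are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n, n_reps = 2.0, 200, 10_000   # hypothetical rate, sample size, replications

# Sampling distribution of lambda_hat = 1 / X_bar, by simulation.
x_bars = rng.exponential(scale=1 / lam, size=(n_reps, n)).mean(axis=1)
lam_hats = 1.0 / x_bars

# Delta method: SE(g(X_bar)) ~ |g'(mu)| * sigma / sqrt(n), with
# g(x) = 1/x, g'(x) = -1/x^2, and mu = sigma = 1/lam for Exp(lam),
# which simplifies to lam / sqrt(n).
print("Monte Carlo SE: ", lam_hats.std(ddof=1))
print("Delta-method SE:", lam / np.sqrt(n))
```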