Two-sample tests: Power computations

Introduction

The topic of today’s lecture is power: the probability that a statistical test rejects a false null hypothesis. Recall that power is \(1 - \beta\), where \(\beta\) is the probability of a Type II error (under a particular alternative distribution, \(H_1\)).

Figure: illustration of Type I and Type II errors ¹

Note that our discussion of power below has nothing to do with the \(t\)-test. We are going to assume that \(\sigma\) is known, and analyze what affects the power of a test in the case of two independent samples. It is the usual computation with known \(\sigma^2\), but using \((\bar{X} - \bar{Y})\) instead of \(\bar{X}\).

What quantities impact power?

We are testing \(H_0 : \mu_X - \mu_Y = 0\) against \(H_1 : \mu_X - \mu_Y \neq 0\).

What quantities affect power?

The distance between \(\mu_X\) and \(\mu_Y\), that is, \(|\mu_X - \mu_Y|\).
The significance level \(\alpha\), or probability of Type I error.
The population variances \(\sigma_X^2\), \(\sigma_Y^2\).
The sample sizes \(n\), \(m\).
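To see how each of these quantities moves the power, here is a minimal sketch of the two-sided power of the two-sample \(z\)-test with known variances. The function name `two_sided_power`, the parameter names `sigma_x` and `sigma_y` (the two population standard deviations), and the numerical values are illustrative assumptions, not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def two_sided_power(delta, alpha, sigma_x, sigma_y, n, m):
    """Power of the two-sided two-sample z-test when mu_X - mu_Y = delta."""
    se = sqrt(sigma_x**2 / n + sigma_y**2 / m)  # sd of X-bar minus Y-bar
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = abs(delta) / se                     # standardized distance
    upper = 1 - STD_NORMAL.cdf(z_crit - shift)  # rejection in the tail nearer the alternative
    lower = STD_NORMAL.cdf(-z_crit - shift)     # rejection in the opposite tail
    return upper + lower


# Illustrative baseline, then change one quantity at a time.
base = two_sided_power(delta=1.0, alpha=0.05, sigma_x=2.0, sigma_y=2.0, n=50, m=50)
print("baseline           ", round(base, 3))
print("larger distance    ", round(two_sided_power(2.0, 0.05, 2.0, 2.0, 50, 50), 3))
print("smaller alpha      ", round(two_sided_power(1.0, 0.01, 2.0, 2.0, 50, 50), 3))
print("larger variances   ", round(two_sided_power(1.0, 0.05, 4.0, 4.0, 50, 50), 3))
print("larger sample sizes", round(two_sided_power(1.0, 0.05, 2.0, 2.0, 200, 200), 3))
```

Power should go up with the distance and the sample sizes, and down with smaller \(\alpha\) or larger variances.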

Power Calculations

Power calculations involve finding the sample size that detects the difference of means (rejects the null) with high probability (i.e., a test that achieves a desirable power).

The procedure is the same as we have seen before: fix the rejection region, and compute its probability under the alternative. This involves finding the distribution of the test statistic under the alternative.
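The procedure can also be checked by simulation. The sketch below fixes the rejection region of a lower-tailed test, then estimates its probability under the alternative by Monte Carlo. It assumes equal sample sizes and a known common \(\sigma\); the function names and the numbers used are illustrative. To keep it fast, it draws \(\bar{X} - \bar{Y}\) directly from its sampling distribution \(N(\delta, 2\sigma^2/n)\) rather than simulating every observation.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_power(delta, sigma, n, alpha, reps=20000, seed=0):
    """Monte Carlo power of the lower-tailed two-sample z-test (equal n, known sigma)."""
    rng = random.Random(seed)
    se = sqrt(2 * sigma**2 / n)           # sd of X-bar minus Y-bar
    z_crit = NormalDist().inv_cdf(alpha)  # negative cutoff: reject when Z < z_crit
    rejections = 0
    for _ in range(reps):
        diff = rng.gauss(delta, se)       # one draw of X-bar minus Y-bar under H1
        if diff / se < z_crit:            # test statistic lands in the rejection region
            rejections += 1
    return rejections / reps


def exact_power(delta, sigma, n, alpha):
    """Probability of the same rejection region, computed from the normal cdf."""
    se = sqrt(2 * sigma**2 / n)
    z_crit = NormalDist().inv_cdf(alpha)
    return NormalDist().cdf(z_crit - delta / se)


# Illustrative values: true difference -1, sigma = 2, n = 60 per group, alpha = 0.05.
print("simulated:", simulated_power(-1.0, 2.0, 60, 0.05))
print("exact:    ", exact_power(-1.0, 2.0, 60, 0.05))
```

The two numbers should agree up to Monte Carlo error, which shrinks as `reps` grows.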

We can fix the power we want, and then solve for the sample sizes \(n\), \(m\) we need, or the minimum detectable effect \(\mu_1 - \mu_2\) instead.
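Both directions can be sketched in a few lines. The code below assumes a one-sided test, equal sample sizes \(n = m\), and a known common \(\sigma\), and uses the fact that the target power is reached when \(\frac{|\mu_1 - \mu_2|}{\sigma\sqrt{2/n}} = z_\alpha + z_{1-\text{power}}\). The function names and the example values are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal


def required_n(delta, sigma, alpha, power):
    """Smallest per-group n for a one-sided z-test to reach the target power."""
    z_sum = Z.inv_cdf(1 - alpha) + Z.inv_cdf(power)
    return ceil(2 * (sigma * z_sum / abs(delta)) ** 2)


def minimum_detectable_effect(n, sigma, alpha, power):
    """Smallest |mu_1 - mu_2| detected with the target power at per-group size n."""
    z_sum = Z.inv_cdf(1 - alpha) + Z.inv_cdf(power)
    return z_sum * sigma * sqrt(2 / n)


# Illustrative values only: sigma = 3, alpha = 0.05, target power 0.8.
print("n needed for a difference of 1.5:", required_n(1.5, 3.0, 0.05, 0.8))
print("detectable difference at n = 100:",
      round(minimum_detectable_effect(100, 3.0, 0.05, 0.8), 3))
```

The two functions are inverses of each other up to the rounding introduced by `ceil`.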

If our sample sizes are large enough, we can do approximate power calculations based on the normal distribution.

Examples of Power Calculations with the Two-Sample \(Z\) Test

Example 1 Assume \(X_i \sim N(\mu_1, \sigma^2)\) and \(Y_i \sim N(\mu_2, \sigma^2)\) with unknown \(\mu_1, \mu_2\) and known \(\sigma^2 = 4\). Suppose that \(n = m\), and we take samples of size \(n\) from both populations to test \(H_0 : \mu_1 = \mu_2\) against \(H_1 : \mu_1 < \mu_2\).

What is the minimum sample size needed from each population to have power of \(0.9\) to detect differences of at least \(1\) in a test with Type I error rate of \(0.01\)?

Solution. We rewrite the hypotheses as: \[ H_0 : \mu_1 - \mu_2 = 0, \qquad H_1 : \mu_1 - \mu_2 = -1. \]

The test statistic (with \(n = m\)), standardized under \(H_0\) where \(\mu_1 - \mu_2 = 0\), is: \[ Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n} + \dfrac{1}{n}}} = \frac{\bar{X} - \bar{Y}}{\sigma\sqrt{2/n}}. \]

For a one-sided test at level \(\alpha = 0.01\), the rejection region and its probability under \(H_0\) are: \[ \text{Rejection region (R.R.)} = \{Z < -2.326\}, \qquad P(Z < -2.326 \mid H_0) = 0.01. \]

Under \(H_1\) (where \(\mu_1 - \mu_2 = -1\)), we rewrite the test statistic by adding and subtracting: \[ Z = \frac{\bar{X} - \bar{Y}}{\sigma\sqrt{2/n}} = \underbrace{\frac{\bar{X} - \bar{Y} + 1}{\sigma\sqrt{2/n}}}_{\sim\, N(0,1) \text{ under } H_1} - \frac{1}{\sigma\sqrt{2/n}}. \]

The power is therefore: \[ \text{Power} = P\!\left(Z \in \text{R.R.} \mid H_1\right) = P\!\left(\frac{\bar{X} - \bar{Y} + 1}{\sigma\sqrt{2/n}} < -2.326 + \frac{1}{\sigma\sqrt{2/n}}\right). \]

With \(\sigma = 2\), setting power equal to the desired \(0.9\): \[\Phi\!\left(-2.326 + \frac{1}{2\sqrt{2/n}}\right) = 0.9.\]

Solving for \(n\): \[-2.326 + \frac{1}{2\sqrt{2/n}} = \Phi^{-1}(0.9) = 1.280,\] \[\frac{1}{2\sqrt{2/n}} = 1.280 + 2.326 = 3.606,\] \[2\sqrt{\frac{2}{n}} = \frac{1}{3.606} \implies n = 8 \times 3.606^2 \approx 104.03 \implies n \geq 105.\]

Therefore, we need \(n \geq 105\) to achieve power of at least 90%.
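As a quick numerical check of Example 1, the sketch below recomputes the sample size with unrounded normal quantiles and evaluates the power just below and at the answer. The variable and function names are our own.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal
sigma, delta, alpha, target = 2.0, 1.0, 0.01, 0.9  # values from Example 1


def power_lower_tail(n):
    """Power of the level-alpha lower-tailed test when mu_1 - mu_2 = -delta."""
    z_crit = Z.inv_cdf(alpha)  # about -2.326
    return Z.cdf(z_crit + delta / (sigma * sqrt(2 / n)))


# Solve -z_alpha + delta / (sigma * sqrt(2/n)) = z_target for n.
n_real = 2 * (sigma * (Z.inv_cdf(1 - alpha) + Z.inv_cdf(target)) / delta) ** 2
n_min = ceil(n_real)

print("real-valued solution:", round(n_real, 2))
print("smallest integer n:  ", n_min)
print("power at n_min - 1:  ", round(power_lower_tail(n_min - 1), 4))
print("power at n_min:      ", round(power_lower_tail(n_min), 4))
```

The power at `n_min - 1` should fall just short of \(0.9\), which confirms that rounding up is necessary.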

Exercise. What about if \(\sigma^2 = 16\)?

Example 2 Suppose we have two samples, \(X_1, \ldots, X_n \sim \mathcal{N}(\mu_X, \sigma^2)\) and \(Y_1, \ldots, Y_n \sim \mathcal{N}(\mu_Y, \sigma^2)\). The test is: \[ H_0 : \mu_X - \mu_Y = 0, \qquad H_1 : \mu_X - \mu_Y = \Delta > 0. \]

One-sided power diagram: null (blue) and alternative (red) distributions with rejection region shaded

Derive an expression for the power of the test.

Check your answer!

For a one-sided test (rejecting when \(\bar{X} - \bar{Y}\) is large), the power is: \[ \text{Power} = P\!\left(\frac{\bar{X}-\bar{Y}}{\sigma\sqrt{2/n}} > z_\alpha \;\Bigg|\; H_1\right) \]

Standardizing under \(H_1\), where \(\bar{X} - \bar{Y} \sim N\!\left(\Delta,\, \sigma^2 \cdot \tfrac{2}{n}\right)\): \[ = P\!\left(\frac{\bar{X}-\bar{Y}-\Delta}{\sigma\sqrt{2/n}} > z_\alpha - \frac{\Delta}{\sigma\sqrt{2/n}}\right) = 1 - \Phi\!\left(z_\alpha - \frac{\Delta}{\sigma}\sqrt{\frac{n}{2}}\right). \]
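The one-sided formula translates directly into code. This is a minimal sketch with an assumed function name and illustrative inputs (\(\Delta = 1\), \(\sigma = 2\), \(\alpha = 0.05\)), useful for seeing how the power grows with \(n\).

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal


def one_sided_power(delta, sigma, n, alpha):
    """1 - Phi(z_alpha - (delta / sigma) * sqrt(n / 2)) for the upper-tailed test."""
    z_alpha = Z.inv_cdf(1 - alpha)  # upper-tail critical value
    return 1 - Z.cdf(z_alpha - (delta / sigma) * sqrt(n / 2))


# Power as a function of the per-group sample size (illustrative values).
for n in (10, 25, 50, 100):
    print(n, round(one_sided_power(delta=1.0, sigma=2.0, n=n, alpha=0.05), 3))
```

Setting `delta=0` returns \(\alpha\), as it should: with no true difference, the rejection probability is the Type I error rate.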

For a two-sided test at level \(\alpha\), the power is the sum of two terms, \(P_1\) and \(P_2\), the contributions of the right and left parts of the critical region: \[ \text{Power} = \underbrace{1 - \Phi\!\left(z_{\alpha/2} - \frac{\Delta}{\sigma}\sqrt{\frac{n}{2}}\right)}_{P_1} + \underbrace{\Phi\!\left(-z_{\alpha/2} - \frac{\Delta}{\sigma}\sqrt{\frac{n}{2}}\right)}_{P_2}. \] When \(\Delta > 0\) and the alternative is well separated from the null, the left-tail term \(P_2\) is negligible, so \(\text{Power} \approx P_1\) and we may consider only the dominating term.
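To get a feel for how small the left-tail contribution is, the sketch below evaluates the two terms separately. The function name and the inputs (\(\Delta = 1\), \(\sigma = 2\), \(n = 50\), \(\alpha = 0.05\)) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal


def two_sided_power_terms(delta, sigma, n, alpha):
    """Return (right-tail term, left-tail term) of the two-sided power."""
    z_half = Z.inv_cdf(1 - alpha / 2)      # z_{alpha/2}
    shift = (delta / sigma) * sqrt(n / 2)  # standardized effect
    right = 1 - Z.cdf(z_half - shift)      # rejection in the right tail
    left = Z.cdf(-z_half - shift)          # rejection in the left tail
    return right, left


right, left = two_sided_power_terms(delta=1.0, sigma=2.0, n=50, alpha=0.05)
print("right-tail term:", round(right, 4))
print("left-tail term: ", left)
print("total power:    ", round(right + left, 4))
```

For these inputs the left-tail term is several orders of magnitude smaller than the right-tail term, so dropping it barely changes the answer.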

Now suppose we want to detect an effect size of \(\Delta = 2\) with probability \(0.8\) when the variance for both \(X\) and \(Y\) is about \(25\). What is the smallest \(n\) we need? You may assume a two-sided test, with \(\alpha = 0.05\), so \(\alpha/2 = 0.025\), and \(z_{\alpha/2} = 1.96\).

Two-sided power diagram: null (blue) and alternative (red) distributions with both parts of the rejection regions shaded

Check your answer!

With \(\Delta = 2\), \(\sigma = 5\), desired power \(= 0.8\), and \(z_{\alpha/2} = 1.96\), and keeping only the dominant right-tail term: \[1 - \Phi\!\left(1.96 - \frac{2}{5}\sqrt{\frac{n}{2}}\right) = 0.8 \implies \Phi\!\left(1.96 - \frac{2}{5}\sqrt{\frac{n}{2}}\right) = 0.2.\]

Since \(\Phi^{-1}(0.2) \approx -0.84\): \[1.96 - \frac{2}{5}\sqrt{\frac{n}{2}} = -0.84 \implies \frac{2}{5}\sqrt{\frac{n}{2}} = 2.80 \implies \sqrt{\frac{n}{2}} = 7.\]

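Therefore: \[\frac{n}{2} = \left[(1.96 + 0.84)\cdot\frac{5}{2}\right]^2 = 7^2 = 49 \implies \boxed{n \approx 98}.\] This uses the rounded quantiles \(1.96\) and \(0.84\); with unrounded quantiles the solution is \(n \approx 98.1\), so rounding up to \(n = 99\) guarantees power of at least \(0.8\).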
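The calculation can be checked numerically. This sketch solves the dominant-term equation with unrounded quantiles and then evaluates the full two-sided power (both terms) at neighbouring sample sizes. The names are our own.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal
sigma, delta, alpha, target = 5.0, 2.0, 0.05, 0.8  # values from Example 2


def two_sided_power(n):
    """Full two-sided power (both tail terms) at per-group sample size n."""
    z_half = Z.inv_cdf(1 - alpha / 2)
    shift = (delta / sigma) * sqrt(n / 2)
    return (1 - Z.cdf(z_half - shift)) + Z.cdf(-z_half - shift)


# Dominant-term solution: (delta / sigma) * sqrt(n / 2) = z_{alpha/2} + z_{1 - target}.
z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(target)
n_real = 2 * (z_sum * sigma / delta) ** 2
n_min = ceil(n_real)

print("real-valued solution:", round(n_real, 2))
print("smallest integer n:  ", n_min)
for n in (98, 99):
    print("power at n =", n, ":", round(two_sided_power(n), 4))
```

The power at \(n = 98\) should come out marginally below \(0.8\), a rounding effect that is negligible in practice but explains why a strict answer rounds up.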

References

Rice, John A. 2006. Mathematical Statistics and Data Analysis. 3rd ed. Duxbury Press.

Footnotes

¹ Artwork by Allison Horst