Two-sample tests: Paired data

[DRAFT, IN PROGRESS]

Introduction

So far, for both the \(t\)-test and the Mann–Whitney–Wilcoxon test, we worked with independent samples. Now we consider samples \(X_1, X_2, \ldots, X_n\) and \(Y_1, Y_2, \ldots, Y_n\) that are not necessarily independent of each other, but are paired: \(X_i\) and \(Y_i\) are measured on the same or on matched units.

Motivating examples:

Analyze effect of a treatment on twins.
Measure effect of blood alcohol on reaction times: test reaction times on volunteers before and after they reach a pre-specified blood alcohol level.
Check effect of smoking on blood platelet aggregation: draw blood from subjects before and after they smoke a cigarette.

Since the samples are not independent, the methods of §11.2 do not apply. If we are interested in the difference between the two groups, it is natural to focus on

\[ D = X - Y. \]

Methods for Analyzing Paired Differences

Normal-distribution–based approach
Rank-based (nonparametric) approach

Method Based on the Normal Distribution

Given \(X_1, \ldots, X_n\) and \(Y_1, \ldots, Y_n\), let

\[ D_i = X_i - Y_i. \]

Assume the \(D_i\) are independent with \(D_i \sim \mathcal{N}(\mu_D,\, \sigma_D^2)\), where \(\mu_D = \mu_X - \mu_Y\) by linearity of expectation.

Since \(\sigma_D\) is typically unknown, we estimate it by the sample standard deviation \(s_D\) of the \(D_i\) and use the one-sample \(t\)-statistic:

\[ t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}} \sim t_{n-1}, \qquad \text{where } \bar{D} = \bar{X} - \bar{Y}. \]

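A \(100(1-\alpha)\%\) confidence interval for \(\mu_D\), with \(t_{n-1,\,\alpha/2}\) the upper \(\alpha/2\) point of the \(t_{n-1}\) distribution, is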

\[ \bar{D} \;\pm\; t_{n-1,\,\alpha/2} \cdot \frac{s_D}{\sqrt{n}}. \]

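To test the null hypothesis that a treatment has no effect:

\[ H_0: \mu_D = 0 \quad \text{vs.} \quad H_1: \mu_D \neq 0. \]

Under \(H_0\) the statistic reduces to \(t = \bar{D} / (s_D/\sqrt{n})\), and we reject \(H_0\) at level \(\alpha\) when \(|t| > t_{n-1,\,\alpha/2}\).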
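The paired \(t\) interval and test can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function name `paired_t`, the argument `t_crit` (the tabulated \(t_{n-1,\,\alpha/2}\), supplied by the caller), and the four data pairs are all assumptions made for the example.

```python
import math
import statistics


def paired_t(x, y, t_crit):
    """Paired t statistic for H0: mu_D = 0 and a confidence interval for mu_D.

    t_crit is the upper alpha/2 point of the t distribution with n - 1
    degrees of freedom, supplied by the caller (e.g. read from a t table).
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    d = [xi - yi for xi, yi in zip(x, y)]   # D_i = X_i - Y_i
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)               # sample SD, divisor n - 1
    se = s_d / math.sqrt(n)
    t_stat = d_bar / se                     # statistic under H0: mu_D = 0
    ci = (d_bar - t_crit * se, d_bar + t_crit * se)
    return t_stat, ci


# Hypothetical measurements for four subjects (illustration only).
before = [10, 12, 9, 11]
after = [8, 11, 9, 9]
# 3.182 is the tabulated upper 2.5% point of t with 3 degrees of freedom.
t_stat, ci = paired_t(before, after, t_crit=3.182)
print(round(t_stat, 3), ci)
```

For these made-up numbers \(|t|\) is below the critical value and the interval contains 0, so the two ways of reading the result agree, as they must.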

Nonparametric Approach: Wilcoxon Signed-Rank Test

When to use

Use the signed-rank test when normality of the differences \(D_i\) cannot be assumed, so the \(t\)-based approach cannot be justified.

Sunburn Example

Seven volunteers test suntan lotions with and without a new ingredient. Each volunteer applies the old lotion to one side of their spine and the new lotion to the other (side chosen randomly), then exposes their back to the sun.

Volunteer	1	2	3	4	5	6	7
\(X_i\): Degree of sunburn (old lotion)	42	51	31	61	44	55	48
\(Y_i\): Degree of sunburn (new lotion)	38	53	36	52	33	49	36
\(D_i = X_i - Y_i\)	4	−2	−5	9	11	6	12
Rank of \(|D_i|\)	2	1	3	5	6	4	7

(The negative differences belong to volunteers 2 and 3; their ranks are 1 and 3.)

Computing the test statistics:

\[ W_- = \text{rank-sum of negative differences} = 1 + 3 = 4 \]

\[ W_+ = \text{rank-sum of positive differences} = 28 - 4 = 24 \]

Verification — ranks 1 through \(n\) always sum to \(\frac{n(n+1)}{2}\):

\[ W_- + W_+ = \frac{7(7+1)}{2} = \frac{56}{2} = 28. \checkmark \]

We use \(W_- = 4\) (the smaller of the two) as our test statistic.

Wilcoxon Signed-Rank Test: Idea and Mechanics

Intuition

Under \(H_0\) (the new ingredient has no effect), \(D_i\) and \(-D_i\) have the same distribution, so the difference with the \(k\)-th smallest absolute value is equally likely to be positive or negative, independently of the others. Hence all \(2^n\) sign assignments to the \(n\) ranks are equally likely.

Hypotheses

\[ H_0: \text{All sign assignments to ranks are equally likely} \quad\text{(the $D_i$ are symmetric about 0)} \]

\[ H_1: \text{The treatment has an effect} \quad\text{(the $D_i$ are not equally likely to be positive or negative)} \]

Mechanics

Compute \(D_i = X_i - Y_i\).
Rank the absolute differences \(|D_i|\); let \(R_i = \text{rank}(|D_i|)\).
Compute

\[ W_+ = \sum_{i:\, D_i > 0} R_i, \qquad W_- = \sum_{i:\, D_i < 0} R_i. \]

Use whichever of \(W_+\) or \(W_-\) is smaller as the test statistic.
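The mechanics above can be sketched in Python and checked against the sunburn table. This is an illustrative sketch only: the function name `signed_rank_sums` is made up for the example, and the code assumes no zero differences and no ties among the \(|D_i|\) (both need extra handling that these notes do not cover).

```python
def signed_rank_sums(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.

    Sketch only: assumes no zero differences and no ties among the |D_i|.
    """
    d = [xi - yi for xi, yi in zip(x, y)]
    if any(di == 0 for di in d) or len({abs(di) for di in d}) != len(d):
        raise ValueError("zero or tied differences need extra handling")
    # Indices ordered by |D_i|; the 1-based position in this order is the rank.
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0] * len(d)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    return w_plus, w_minus


# Sunburn data from the table above.
old = [42, 51, 31, 61, 44, 55, 48]
new = [38, 53, 36, 52, 33, 49, 36]
w_plus, w_minus = signed_rank_sums(old, new)
print(w_plus, w_minus, min(w_plus, w_minus))  # prints: 24 4 4
```

The printed values match \(W_+ = 24\) and \(W_- = 4\) from the sunburn example, with the smaller one, 4, serving as the test statistic.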

Null Distribution of \(W_-\) (or \(W_+\))

Under \(H_0\) there are \(2^n\) equally likely sign assignments, each of which determines a value of \(W_-\) (several assignments can give the same value). Hence

\[ P(W_- \leq w) = \frac{\#\{\text{sign assignments with } W_- \leq w\}}{2^n}. \]

Sunburn Example (continued)

With \(n = 7\), there are \(2^7 = 128\) equally likely sign assignments. We need \(P(W_- \leq 4)\).

Value of \(W_-\)	Rank assignments (negative \(D_i\)’s)
0	\(\emptyset\): no negative \(D_i\)
1	\(\{1\}\): only the \(D_i\) with the smallest \(|D_i|\) is negative
2	\(\{2\}\): only the \(D_i\) with the second-smallest \(|D_i|\) is negative
3	\(\{1,2\},\ \{3\}\)
4	\(\{1,3\},\ \{4\}\)

Total assignments with \(W_- \leq 4\): \(\;1+1+1+2+2 = 7\).

\[ P(W_- \leq 4) = \frac{7}{128} \approx 0.0547 \]

This is the one-sided \(p\)-value. For a two-sided test we double it: \(2 \times 0.0547 \approx 0.109\). Since \(0.109 > 0.05\), we fail to reject \(H_0\) at the 5% level.

For small \(n\) one can also use Table 9 in the textbook.
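For small \(n\) the count can also be checked by brute force. The sketch below enumerates all \(2^n\) sign assignments and counts those with \(W_- \leq w\); the function name `exact_lower_tail` is an assumption made for this example, and the code assumes no ties, so that the ranks are exactly \(1, \ldots, n\).

```python
from fractions import Fraction
from itertools import product


def exact_lower_tail(n, w):
    """P(W_- <= w) under H0, by enumerating all 2^n sign assignments.

    Assumes no ties, so the ranks are exactly 1, ..., n.
    """
    count = 0
    for negative in product([False, True], repeat=n):
        # negative[k - 1] is True when the difference with rank k is negative.
        w_minus = sum(rank for rank, neg in zip(range(1, n + 1), negative) if neg)
        if w_minus <= w:
            count += 1
    return Fraction(count, 2 ** n)


p_one_sided = exact_lower_tail(7, 4)
print(p_one_sided, float(p_one_sided))  # prints: 7/128 0.0546875
```

Enumeration costs \(2^n\) steps, so it is practical only for small samples; for larger \(n\) a normal approximation is the usual route.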

Normal Approximation for \(W_+\) (when \(n \geq 20\))

\[ E(W_+) = \frac{n(n+1)}{4} \]

\[ \operatorname{Var}(W_+) = \frac{n(n+1)(2n+1)}{24} \]
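One way to see where these formulas come from, assuming no ties: write \(W_+ = \sum_{k=1}^{n} k\, I_k\), where \(I_k\) (notation introduced here for the sketch) equals 1 if the difference with rank \(k\) is positive and 0 otherwise. Under \(H_0\) the \(I_k\) are independent with \(P(I_k = 1) = 1/2\), so \(E(I_k) = 1/2\) and \(\operatorname{Var}(I_k) = 1/4\), giving

\[ E(W_+) = \sum_{k=1}^{n} \frac{k}{2} = \frac{1}{2} \cdot \frac{n(n+1)}{2} = \frac{n(n+1)}{4}, \qquad \operatorname{Var}(W_+) = \sum_{k=1}^{n} \frac{k^2}{4} = \frac{1}{4} \cdot \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(2n+1)}{24}. \]

By symmetry, \(W_-\) has the same mean and variance.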

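Reject \(H_0\) at level \(\alpha\) when the absolute value of the standardized statistic exceeds \(z_{\alpha/2}\), the upper \(\alpha/2\) point of the standard normal distribution: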

\[ \left| \frac{W_+ - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}} \right| \]
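The standardized statistic and an approximate two-sided \(p\)-value can be computed as follows. This is a rough sketch: the function name `signed_rank_z` and the input \(W_+ = 160\) with \(n = 20\) are hypothetical, and no continuity or tie correction is applied.

```python
import math


def signed_rank_z(w_plus, n):
    """Standardized W_+ and its two-sided normal-approximation p-value."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (w_plus - mean) / math.sqrt(var)
    # Two-sided standard normal tail: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Hypothetical value W_+ = 160 with n = 20 pairs (illustration only).
z, p = signed_rank_z(160, 20)
print(round(z, 3), round(p, 4))
```

With \(n = 20\) the null mean is 105 and the null variance is 717.5, so a value of 160 lies about two standard deviations above the mean.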

References

Rice, John A. 2006. Mathematical Statistics and Data Analysis. 3rd ed. Duxbury Press.