Two sample tests: Continuing Nonparametric Tests
Introduction
Practice: Steps to Perform the WMW Test
- Form the combined sample:
\[ Z = (X_1, \ldots, X_n, Y_1, \ldots, Y_m) \]
Let \(n_1 = \min(n, m)\); we work with the smaller sample. Suppose \(n_1 = n\), and let \(R = R(X)\) be the sum of the ranks of the smaller sample within the combined sample.
Compute:
\[ R' = n_1(m + n + 1) - R \]
Since \(n_1 \leq \frac{m+n}{2}\) and the total rank sum is \(R(X) + R(Y) = \frac{(m+n)(m+n+1)}{2}\), we have \(R(X) + R(Y) \geq n_1(m+n+1)\), so \(R'\) is at most the sum of the ranks of the other sample.
- Compute:
\[ R^* = \min(R, R') \]
- Compare \(R^*\) to the table for the appropriate significance level. If it is too small, reject the null.
Why? If \(n < m\), then:
\[ R(X) + R(Y) \geq n(m+n+1) \implies R(Y) \geq n(m+n+1) - R(X) = R' \]
Notes: WMW Practice
Ties: Assign the average of the ranks. For example, if the data is \(0.1, 0.1, 0.25\), the ranks would be \(1.5, 1.5, 3\).
Since \(n_1 = \min(n, m) \implies n_1 \leq (m+n)/2\)
Thus:
\[ R(X) + R(Y) \geq n_1(m + n + 1) \]
- Suppose \(|X| \leq |Y|\), then \(n_1 = |X|\) and \(R = R(X)\)
\[ R(Y) \geq n_1(m + n + 1) - R \]
\(R'\) is at most the sum of ranks of the other sample.
We look at the lower quantiles and reject the null if \(R^*\) is below the critical value associated with \(\alpha\).
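The steps above can be sketched in code. The following is a minimal sketch (the helper names `midranks` and `wmw_statistic` are ours, not standard); ties receive the average of their ranks as described in the notes.

```python
from typing import List

def midranks(values: List[float]) -> List[float]:
    """Rank the values, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wmw_statistic(x: List[float], y: List[float]) -> float:
    """Return R* = min(R, R'), where R is the rank sum of the smaller sample."""
    if len(x) > len(y):
        x, y = y, x               # make x the smaller sample
    n1, total = len(x), len(x) + len(y)
    ranks = midranks(x + y)       # ranks in the combined sample
    r = sum(ranks[:n1])           # R = rank sum of the smaller sample
    r_prime = n1 * (total + 1) - r
    return min(r, r_prime)
```

For instance, `midranks([0.1, 0.1, 0.25])` returns `[1.5, 1.5, 3.0]`, matching the ties rule above, and on the pizza data below `wmw_statistic` reproduces \(R^* = 24.5\).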
Example: Pizza Places
Data (\(n = 7\) for place A, \(m = 5\) for place B, so \(n_1 = 5\)):
| Pizza place | Value | Rank |
|---|---|---|
| A | 20.4 | 8 |
| A | 24.2 | 12 |
| A | 15.4 | 1 |
| A | 21.4 | 10 |
| A | 20.2 | 6.5 |
| A | 18.5 | 5 |
| A | 21.5 | 11 |
| B | 20.2 | 6.5 |
| B | 16.9 | 2 |
| B | 18.4 | 4 |
| B | 17.3 | 3 |
| B | 20.5 | 9 |
\[ R(X) = 53.5, \quad R(Y) = 24.5 \]
\[ R' = n_1(m+n+1) - R(Y) = 5(5+7+1) - 24.5 = 65 - 24.5 = 40.5 \]
\[ R^* = \min(24.5,\ 40.5) = \boxed{24.5} \]
Critical value from table \(= 20\). Since \(R^* = 24.5 > 20\), fail to reject \(H_0\).
The data show no significant difference between the two pizza places.
Example: Flint Hardness
An experiment compares the hardness of flint from areas A and B (4 samples from A, 5 from B). Pieces were rubbed against each other and ranked from softest to hardest:
\[ \underbrace{A}_{1}\ \underbrace{A}_{2}\ \underbrace{A}_{3}\ \underbrace{B}_{4}\ \underbrace{A}_{5}\ \underbrace{B}_{6}\ \underbrace{B}_{7}\ \underbrace{B}_{8}\ \underbrace{B}_{9} \]
Here \(X \leftrightarrow A\), \(Y \leftrightarrow B\), \(n = 4\), \(m = 5\), so \(n_1 = 4\).
\[ H_0: \text{Flints from both areas are of equal hardness} \]
\[ H_1: \text{Flints are not of equal hardness} \]
The sample from region A is smaller:
\[ R(A) = 1 + 2 + 3 + 5 = 11 \]
(Smallest possible rank sum for A is \(1+2+3+4 = 10\); largest possible is \(6+7+8+9 = 30\).)
\[ R' = n_1(m+n+1) - R(X) = 4(4+5+1) - 11 = 40 - 11 = 29 \]
\[ R^* = \min(11,\ 29) = 11 \]
With \(n_1 = 4\), \(n_2 = 5\), \(\alpha = 0.05\) (two-sided), the critical value is \(11\); since \(R^* = 11 \leq 11\), reject the null.
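With such small samples, the exact null distribution of the rank sum is easy to enumerate directly: under \(H_0\), every choice of 4 ranks out of \(1, \ldots, 9\) for area A is equally likely. A brute-force sketch:

```python
from itertools import combinations
from math import comb

# Flint example: n1 = 4 samples from A, n2 = 5 from B, ranks 1..9.
n1, n2 = 4, 5
ranks = range(1, n1 + n2 + 1)
sums = [sum(c) for c in combinations(ranks, n1)]  # all C(9,4) = 126 rank sums

r_obs = 11  # observed rank sum R(A)
count = sum(s <= r_obs for s in sums)        # subsets at least as extreme (low tail)
p_lower = count / comb(n1 + n2, n1)          # P(R <= 11) under H0
p_two_sided = 2 * p_lower                    # double for a two-sided test
```

Only the subsets \(\{1,2,3,4\}\) and \(\{1,2,3,5\}\) have sum \(\leq 11\), so the lower-tail probability is \(2/126 \approx 0.016\) and the two-sided p-value is \(\approx 0.032 < 0.05\), consistent with rejecting at \(\alpha = 0.05\).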
Table 8: Critical Values for the WMW Test
Table 8 gives critical values of the smaller rank sum \(R^*\) for the Wilcoxon Mann-Whitney test, indexed by \(n_2\) (larger sample size) and \(n_1\) (smaller sample size), for both one-sided and two-sided tests at various \(\alpha\) levels.
Back to Toy Example: \(m = 2\), \(n = 3\)
Recall the rank-sum count table from before:
| \(w\) | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(\#(W_T = w)\) | 1 | 1 | 2 | 2 | 2 | 1 | 1 | | | |
| \(\#(W_C = w)\) | | | | 1 | 1 | 2 | 2 | 2 | 1 | 1 |
The rows are shifted copies of each other. We can make them coincide by subtracting the minimum rank sums. Define:
\[ U_T = W_T - 3, \qquad U_C = W_C - 6 \]
Since they look the same, we don’t need to distinguish between them — call them both \(U\):
| \(U\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| \(\#(U = u)\) | 1 | 1 | 2 | 2 | 2 | 1 | 1 |
| \(\#(U \leq u)\) | 1 | 2 | 4 | 6 | 8 | 9 | 10 |
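The counts in this table can be checked by brute force: enumerate all \(\binom{5}{2} = 10\) ways to choose the treatment ranks and shift by the minimum rank sum.

```python
from itertools import combinations
from collections import Counter

# Toy example: m = 2 treatment ranks chosen from 1..5; W_T is their sum.
# Shifting by the minimum possible rank sum (1 + 2 = 3) gives U_T = W_T - 3.
counts = Counter(sum(c) - 3 for c in combinations(range(1, 6), 2))

# counts[u] for u = 0..6 reproduces the #(U = u) row: 1, 1, 2, 2, 2, 1, 1
```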
Approximate Distribution of Rank Sum
Let \(T_Y\) be the sum of the ranks of \(Y_1, Y_2, \ldots, Y_m\) (so \(T_Y = W_C\)).
Under \(H_0\) (Theorem A, page 439):
\[ E(T_Y) = \frac{m(m + n + 1)}{2} \]
\[ \text{Var}(T_Y) = \frac{mn(m + n + 1)}{12} \]
\[ \frac{T_Y - E(T_Y)}{\sqrt{\text{Var}(T_Y)}} \xrightarrow{d} \mathcal{N}(0, 1) \]
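As a sketch of Theorem A in action, we can apply the normal approximation to the pizza data above (\(T_Y = R(B) = 24.5\), \(m = 5\), \(n = 7\)); the function name is ours, and the sample sizes here are really too small for the approximation to be trusted.

```python
from math import sqrt, erf

def rank_sum_normal_approx(t_y: float, m: int, n: int):
    """Normal approximation to the null distribution of the rank sum T_Y."""
    mean = m * (m + n + 1) / 2            # E(T_Y) under H0
    var = m * n * (m + n + 1) / 12        # Var(T_Y) under H0
    z = (t_y - mean) / sqrt(var)
    phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))  # standard normal CDF
    p_two_sided = 2 * phi(-abs(z))
    return z, p_two_sided

# Pizza example: T_Y = R(B) = 24.5 with m = 5, n = 7.
z, p = rank_sum_normal_approx(24.5, 5, 7)
```

Here \(z \approx -1.30\) and the two-sided p-value is \(\approx 0.19 > 0.05\), agreeing with the table-based conclusion (fail to reject).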
§11.3: Paired Samples
So far, both the \(t\)-test and the Mann-Whitney test assumed independent samples. Now we consider samples \(X_1, X_2, \ldots, X_n\) and \(Y_1, Y_2, \ldots, Y_n\) that are not necessarily independent, but may be paired:
- Effect of a treatment on twins
- Effect of blood alcohol on reaction times: measure each volunteer before and after consuming alcohol
- Effect of smoking on blood platelet aggregation: draw blood before and after smoking a cigarette
Since the samples are not independent, the methods of §11.2 do not apply. If we are interested in the difference between the two groups, it is natural to focus on:
\[ D = X - Y \]
We will use two methods to analyze differences of paired samples:
1. Normal distribution approach
2. Rank-based approach
Method Based on the Normal Distribution
Given pairs \((X_1, Y_1), \ldots, (X_n, Y_n)\), let:
\[ D_i = X_i - Y_i \]
Assume \(D_i \sim \mathcal{N}(\mu_D, \sigma_D^2)\), so \(\mu_D = \mu_X - \mu_Y\).
Since \(\sigma_D\) is usually unknown, we use the \(t\)-statistic:
\[ t = \frac{\bar{D} - \mu_D}{s_{\bar{D}}} \sim t_{n-1} \]
where \(\bar{D} = \bar{X} - \bar{Y}\) is the sample mean of the differences and \(s_{\bar{D}} = s_D / \sqrt{n}\) is its estimated standard error.
Note: this is a one-sample \(t\)-test applied to the \(D_i\)’s!
The \((1-\alpha)100\%\) confidence interval is:
\[ \bar{D} \pm t_{n-1,\, \alpha/2} \cdot s_{\bar{D}} \]
To test whether the treatment has no effect:
\[ H_0: \mu_D = 0 \qquad \text{versus} \qquad H_1: \mu_D \neq 0 \]
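This test is just a one-sample \(t\)-test on the \(D_i\)'s, so it is short to code. A sketch using the suntan data from the next section (the helper name is ours; \(t_{6,\,0.025} \approx 2.447\) is taken from a standard \(t\)-table):

```python
from math import sqrt

def paired_t_statistic(x, y):
    """t statistic for H0: mu_D = 0 applied to the differences D_i = X_i - Y_i."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    d_bar = sum(d) / n                                       # sample mean of D
    s_d = sqrt(sum((di - d_bar) ** 2 for di in d) / (n - 1)) # sample sd of D
    return d_bar / (s_d / sqrt(n))                           # t = D-bar / s_{D-bar}

# Suntan data from the next section; reject at alpha = 0.05 iff |t| > 2.447.
old = [42, 51, 31, 61, 44, 55, 48]
new = [38, 53, 36, 52, 33, 49, 36]
t = paired_t_statistic(old, new)
```

For these data \(t \approx 2.04 < 2.447\), so the paired \(t\)-test fails to reject \(H_0\) at the 5% level.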
Nonparametric Example: Suntan Lotion
Seven volunteers test suntan lotions with and without a new ingredient. The old lotion is applied to one side of the spine, the new to the other (side selected randomly). Degree of sunburn recorded:
| Volunteer | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| \(X_i\): old lotion | 42 | 51 | 31 | 61 | 44 | 55 | 48 |
| \(Y_i\): new lotion | 38 | 53 | 36 | 52 | 33 | 49 | 36 |
| \(D_i = X_i - Y_i\) | 4 | \(-2\) | \(-5\) | 9 | 11 | 6 | 12 |
| Rank of \(|D_i|\) | 2 | 1 | 3 | 5 | 6 | 4 | 7 |
Under \(H_0\) (the new ingredient does not improve the lotion), the \(k\)-th largest difference should be equally likely to be positive or negative, and all sign assignments to the differences are equally likely.
Wilcoxon Signed Rank Test
Idea: If there is no difference between \(X_i\) and \(Y_i\), then the \(D_i\)'s should come from a distribution symmetric about \(0\), so positive and negative differences of any given magnitude are equally likely.
Mechanics:
- Compute \(D_i = X_i - Y_i\)
- Find and rank the absolute differences \(|D_i|\)
- Define \(R_i = \text{rank of } |D_i|\)
- Compute:
\[ W_+ = \sum_{i:\, D_i > 0} R_i \]
- Compute \(W_-\) analogously (sum of ranks where \(D_i < 0\)). Use whichever of \(W_+\) or \(W_-\) is smaller.
Under the null hypothesis of symmetry, \(W_+\) should not be too large or too small, since \(D_i\) and \(-D_i\) have the same distribution. If the null is false, \(W_+\) will be more extreme.
Signed Rank Test: Null Distribution
If the null is true, then since \(D_i\) and \(-D_i\) have the same distributions, the \(k\)-th largest difference is equally likely to be positive or negative. This means any assignment of signs to the \(n\) ranks is equally likely.
- Total number of sign assignments: \(2^n\)
- We get \(2^n\) equally likely values of \(W_+\) (and \(W_-\)), not necessarily all distinct.
The null distribution of \(W_+\) or \(W_-\) is:
\[ P(W_+ \leq w) = \frac{\#(W_+ \leq w)}{2^n} \]
- For small \(n\): use Table 9
- For \(n \geq 20\): use a normal approximation to the distribution of \(W_+\)
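For small \(n\), the exact null distribution above can also be enumerated directly. A sketch applied to the suntan differences (helper names are ours; no ties in \(|D_i|\) are assumed):

```python
from itertools import product

def signed_rank_stats(d):
    """Compute (W+, W-) for differences d, assuming no ties among |d_i|."""
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    # Rank r+1 goes to the difference with the (r+1)-th smallest absolute value.
    w_plus = sum(r + 1 for r, i in enumerate(order) if d[i] > 0)
    n = len(d)
    w_minus = n * (n + 1) // 2 - w_plus   # ranks sum to n(n+1)/2
    return w_plus, w_minus

def exact_p_lower(w_obs, n):
    """P(W <= w_obs) under H0, enumerating all 2^n equally likely sign patterns."""
    count = 0
    for signs in product([0, 1], repeat=n):
        w = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        count += w <= w_obs
    return count / 2 ** n

d = [4, -2, -5, 9, 11, 6, 12]            # suntan differences D_i
w_plus, w_minus = signed_rank_stats(d)   # W+ = 24, W- = 4
p = 2 * exact_p_lower(min(w_plus, w_minus), len(d))  # two-sided p-value
```

Here \(W_+ = 24\), \(W_- = 4\), and the exact two-sided p-value is \(2 \cdot 7/128 \approx 0.11 > 0.05\), so this test fails to reject the null of symmetry at the 5% level.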