Matched-Pair Designs
Introduction
In this lecture, we follow the material of Section 13.5 of the text by Rice. When we return to analyzing quantitative data, we will see how to handle paired quantitative measurements. For now, we are still dealing with categorical data. Typical examples of paired data are:
- Siblings in a case-control study. Here the sibling affected by the condition or disease is the “case” and the unaffected sibling is the “control”.
- A single subject measured before and after a treatment.
In such situations, the two observations within a pair are clearly not independent. Pairing can increase power and control for outside sources of variability. That said, it is not always clear how to analyze the data, as the following example shows.
Example: Tonsillectomies and Hodgkin’s Lymphoma
In a brief two-page 1971 paper published in The Lancet, Vianna, Greenwald & Davies proposed that the tonsils, being lymphoid tissue, act as a biological barrier that helps protect the body against Hodgkin’s disease. Their central claim was that people who had undergone a tonsillectomy (surgical removal of the tonsils) were at greater risk of developing Hodgkin’s disease, because removing this lymphoid tissue stripped them of that protective barrier. To this end, they collected data comparing the percentage of tonsillectomies among a group of patients suffering from Hodgkin’s lymphoma (a cancer of the lymphatic system) and among an independent control group. Here are their data, presented in a contingency table of patients vs. controls:
| | Tonsillectomy | No Tonsillectomy | Total |
|---|---|---|---|
| Hodgkins | 67 | 34 | 101 |
| Control | 43 | 64 | 107 |
| Total | 110 | 98 | 208 |
Table with expected values for chi-square test
The expected values, computed as (row total × column total) / 208, are in parentheses.
| | Tonsillectomy | No Tonsillectomy | Total |
|---|---|---|---|
| Hodgkins | 67 (53.4) | 34 (47.6) | 101 |
| Control | 43 (56.6) | 64 (50.4) | 107 |
| Total | 110 | 98 | 208 |
They concluded that the distributions were not homogeneous and suggested that tonsil removal was associated with greater incidence of Hodgkin’s disease, which fit their hypothesis. This paper set off a flurry of research, and other investigators found it difficult to replicate the result.
A paper by another pair of investigators, Sandra Johnson and Ralph Johnson, published in the New England Journal of Medicine in 1972, looked at 85 patients with Hodgkin’s lymphoma who each had a disease-free sibling close to them in age (within 5 years). They compared the patients to their siblings, not to an independent control group. They presented the following data:
| | Tonsillectomy | No Tonsillectomy | Total |
|---|---|---|---|
| Hodgkin’s patients | 41 | 44 | 85 |
| Control siblings | 33 | 52 | 85 |
| Total | 74 | 96 | 170 |
They found a chi-square statistic of 1.53 on 1 degree of freedom, which was not significant. This appeared to rebut the earlier claim by Vianna, Greenwald, and Davies. Or did it?
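As a rough check on the two unpaired analyses, the Pearson chi-square statistic for each table can be recomputed directly from the counts. The sketch below uses only the counts shown in the tables above; the helper name `pearson_chi2` is ours, purely for illustration.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic for a two-way table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under homogeneity: row total * column total / n
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: patients, controls; columns: tonsillectomy, no tonsillectomy
vianna = [[67, 34], [43, 64]]
johnson = [[41, 44], [33, 52]]

print(round(pearson_chi2(vianna), 2))   # 14.26
print(round(pearson_chi2(johnson), 2))  # 1.53
```

Both calls treat the two rows as independent samples, which is exactly the assumption that fails for the sibling data.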
The problem with Johnson and Johnson’s analysis is that siblings do not constitute independent samples. This is an example of paired data, which must be analyzed using different methods. We have to redo the table so that each cell contains a number of sibling pairs rather than a number of individuals. Here is the reorganized table. Note that the grand total is now 85, not 170, because we are counting sibling pairs and not individuals:
| | Sibling: No Tonsillectomy | Sibling: Tonsillectomy |
|---|---|---|
| Patient: No Tonsillectomy | 37 | 7 |
| Patient: Tonsillectomy | 15 | 26 |
Note: \(37 + 7 + 15 + 26 = 85\) pairs total. Thus, instead of comparing two multinomial samples (patients vs siblings) with two categories each (tonsillectomy vs no tonsillectomy), we are in fact looking at one multinomial sample of 85 pairs with 4 categories (as shown above). Now we need to check whether the probability of tonsillectomy is the same for patients and their siblings.
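One way to see that the paired table carries the same information as the earlier table, plus the pairing, is to recover the earlier margins from it. A minimal sketch, assuming rows index the patient and columns index the sibling, each in the order (no tonsillectomy, tonsillectomy):

```python
# Paired table: rows = patient, columns = sibling,
# each ordered as (no tonsillectomy, tonsillectomy).
pairs = [[37, 7],
         [15, 26]]

n_pairs = sum(sum(row) for row in pairs)
patient_margins = [sum(row) for row in pairs]        # patients: [no, yes]
sibling_margins = [sum(col) for col in zip(*pairs)]  # siblings: [no, yes]

print(n_pairs)          # 85
print(patient_margins)  # [44, 41]
print(sibling_margins)  # [52, 33]
```

The row sums give 44 patients without and 41 with a tonsillectomy, and the column sums give 52 and 33 for siblings, matching the unpaired table.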
The null hypothesis is: the probability of tonsillectomy is the same for cases (patients) and controls (siblings). That is, whether you had your tonsils removed is not associated with Hodgkin’s lymphoma.
Let’s develop the test we will use in this situation.
McNemar’s Test
Setup
Let \(\pi_{ij}\) denote the probability that a pair falls in cell \((i,j)\). The general \(2 \times 2\) table of probabilities is:
| | Control: Level 1 | Control: Level 2 | Row marginal |
|---|---|---|---|
| Case: Level 1 | \(\pi_{11}\) | \(\pi_{12}\) | \(\pi_{1\cdot}\) |
| Case: Level 2 | \(\pi_{21}\) | \(\pi_{22}\) | \(\pi_{2\cdot}\) |
| Column marginal | \(\pi_{\cdot 1}\) | \(\pi_{\cdot 2}\) | 1 |
The null hypothesis is that the probabilities of tonsillectomy vs no tonsillectomy should be the same for both components of the pair (patient and sibling). Whether they have Hodgkin’s or not should not matter.
\[ H_0: \pi_{i\cdot} = \pi_{\cdot i} \quad \text{for } i = 1, 2 \]
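Since \(\pi_{1\cdot} = \pi_{11} + \pi_{12}\) and \(\pi_{\cdot 1} = \pi_{11} + \pi_{21}\), the condition \(\pi_{1\cdot} = \pi_{\cdot 1}\) holds exactly when \(\pi_{12} = \pi_{21}\), and the condition for \(i = 2\) gives the same equality. So this is equivalent to testing: \[ H_0: \pi_{12} = \pi_{21} \qquad \text{vs.} \qquad H_1: \pi_{12} \neq \pi_{21} \]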
Key insight: The concordant pairs (diagonal cells) tell us nothing about a treatment effect — both members of the pair responded the same way regardless. Only the discordant pairs (off-diagonal) are informative.
MLEs Under \(H_0\)
Under the constraint \(\pi_{12} = \pi_{21}\), the MLE of the common off-diagonal probability is: \[ \hat{\pi}_{12} = \hat{\pi}_{21} = \frac{n_{12} + n_{21}}{2n} \] The diagonal MLEs are unchanged: \(\hat{\pi}_{11} = n_{11}/n\) and \(\hat{\pi}_{22} = n_{22}/n\).
The unconstrained MLEs are \(\hat{\pi}_{ij} = n_{ij}/n\), the usual MLEs for a multinomial.
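To make the two sets of estimates concrete, here is a small sketch that computes both for a \(2 \times 2\) table of pair counts. The function name `mcnemar_mles` and the dictionary layout are illustrative choices, not part of the text; the counts are those of the tonsillectomy pairs.

```python
def mcnemar_mles(n11, n12, n21, n22):
    """Return (unconstrained, constrained) MLEs of the four cell probabilities."""
    n = n11 + n12 + n21 + n22
    unconstrained = {"11": n11 / n, "12": n12 / n, "21": n21 / n, "22": n22 / n}
    # Under H0 the two off-diagonal cells share one pooled probability.
    pooled = (n12 + n21) / (2 * n)
    constrained = {"11": n11 / n, "12": pooled, "21": pooled, "22": n22 / n}
    return unconstrained, constrained

free, null = mcnemar_mles(37, 7, 15, 26)
print(free["12"], free["21"])  # unconstrained: 7/85 and 15/85
print(null["12"], null["21"])  # constrained: both equal 22/170
```

Both sets of estimates sum to 1, and only the off-diagonal entries differ between them.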
Deriving the Test Statistic
Using Pearson’s chi-square statistic \(X^2 = \sum (O_{ij} - E_{ij})^2 / E_{ij}\), the observed and expected counts are:
- \(O_{11} = E_{11} = n_{11}\), \(\quad O_{22} = E_{22} = n_{22}\) (diagonal terms vanish)
- \(O_{12} = n_{12}\), \(\quad E_{12} = n \cdot \hat{\pi}_{12} = \dfrac{n_{12} + n_{21}}{2}\)
- \(O_{21} = n_{21}\), \(\quad E_{21} = n \cdot \hat{\pi}_{21} = \dfrac{n_{12} + n_{21}}{2}\)
Let \(c = (n_{12} + n_{21})/2\). Then:
\[X^2 = \frac{(n_{12} - c)^2}{c} + \frac{(n_{21} - c)^2}{c}\]
Expanding the numerator:
\[= \frac{n_{12}^2 + c^2 - 2c\,n_{12} + n_{21}^2 + c^2 - 2c\,n_{21}}{c}\]
\[= \frac{n_{12}^2 + n_{21}^2 + 2c^2 - 2c(n_{12} + n_{21})}{c}\]
Since \(n_{12} + n_{21} = 2c\), we have \(2c(n_{12}+n_{21}) = 4c^2\), so \(2c^2 - 4c^2 = -2c^2\):
\[= \frac{n_{12}^2 + n_{21}^2 - 2c^2}{c} = \frac{n_{12}^2 + n_{21}^2 - \frac{(n_{12}+n_{21})^2}{2}}{\frac{n_{12}+n_{21}}{2}}\]
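The numerator simplifies to \(\frac{2n_{12}^2 + 2n_{21}^2 - (n_{12}+n_{21})^2}{2} = \frac{(n_{12} - n_{21})^2}{2}\), and dividing by \(\frac{n_{12}+n_{21}}{2}\) gives the elegant McNemar statistic: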
\[\boxed{X^2 = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}}\]
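The algebra can be spot-checked numerically by comparing the two-term Pearson form with the closed form on a few pairs of discordant counts. This is only a sanity check under illustrative function names, not a proof; apart from \((7, 15)\), the count pairs are made up for the check.

```python
def pearson_form(n12, n21):
    """Two off-diagonal Pearson terms with common expected count c."""
    c = (n12 + n21) / 2
    return (n12 - c) ** 2 / c + (n21 - c) ** 2 / c

def mcnemar_form(n12, n21):
    """Closed-form McNemar statistic."""
    return (n12 - n21) ** 2 / (n12 + n21)

# (7, 15) are the tonsillectomy counts; the others are arbitrary test values.
checks = [(7, 15), (3, 12), (20, 5), (1, 1)]
for n12, n21 in checks:
    assert abs(pearson_form(n12, n21) - mcnemar_form(n12, n21)) < 1e-12
print("Pearson form and McNemar form agree on all checks")
```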
Degrees of Freedom
The full multinomial has 4 cells constrained to sum to 1, giving dimension \(4 - 1 = 3\). Under \(H_0\) we impose one additional constraint (\(\pi_{12} = \pi_{21}\)), leaving dimension \(3 - 1 = 2\). Therefore:
\[df = 3 - 2 = 1\]
Application to the Tonsillectomy Example
From the paired table: \(n_{12} = 7\), \(n_{21} = 15\).
\[X^2 = \frac{(7 - 15)^2}{7 + 15} = \frac{64}{22} \approx 2.91\]
\[P( X^2 > 2.91 \mid X^2 \sim \chi^2_1) \approx 0.088\]
At \(\alpha = 0.05\), we fail to reject \(H_0\). There is insufficient evidence that the probability of tonsillectomy differs between Hodgkin’s patients and their sibling controls. (The earlier analysis by Johnson and Johnson was flawed because it ignored the pairing structure.)
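The calculation above can be reproduced with the standard library alone. The sketch below relies on the fact that a \(\chi^2_1\) variable is the square of a standard normal, so \(P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})\); the function name `mcnemar_test` is an illustrative choice.

```python
import math

def mcnemar_test(n12, n21):
    """McNemar statistic and its chi-square(1) upper-tail p-value."""
    stat = (n12 - n21) ** 2 / (n12 + n21)
    # For 1 df: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p = mcnemar_test(7, 15)
print(f"X^2 = {stat:.2f}, p = {p:.3f}")  # X^2 = 2.91, p = 0.088
```

Only the discordant counts enter the function, which mirrors the point that the concordant pairs carry no information about \(H_0\).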