Fisher’s Exact Test
Introduction
In the previous chapter, we discussed the case of the VA nurse, Kristen Gilbert, who was accused of murdering several of her patients. The prosecution team, led by Assistant U.S. Attorney William Welch, realized that they would need statistical evidence to convince the grand jury. To this end, they asked Stephen Gehlbach, a professor in the School of Public Health and Health Sciences at UMass Amherst, to analyze the hospital records and testify before the grand jury.
Dr. Gehlbach had performed a statistical test on the last 18 months of the data (corresponding to 547 days and 1,641 shifts) to see if these differences could be due to chance and within ordinary variability. We have already described his testimony, and here we present a different approach to the hypothesis test.
Contingency table presented to the grand jury
We reproduce the two-way contingency table that Gehlbach presented to the grand jury. We have two binary categorical variables: along the rows we have whether or not Gilbert was present on a shift, and along the columns we have whether or not there was a death on the shift.
| GILBERT PRESENT? | Death on shift | No death on shift | Total |
|---|---|---|---|
| Yes | 40 | 217 | 257 |
| No | 34 | 1350 | 1384 |
| Total | 74 | 1567 | 1641 |
TABLE 1 The basis of the statistical test
Now, though there is a clear imbalance in the numbers in the table above, one might argue that the coin-tossing analogy doesn't make sense, since it isn't clear that each shift is independent of the others. We need to test the null hypothesis (of the defense) that there is nothing nefarious going on, and that any differences observed in the frequencies of deaths are just due to chance variation.
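Before carrying out the formal test, the defense's chance model can be illustrated with a small simulation (a hypothetical sketch for intuition, not part of Gehlbach's analysis): under the null hypothesis, the 74 death-shifts are just a random sample of the 1,641 shifts, unrelated to whether Gilbert was present.

```python
import random

# Hypothetical simulation of the chance model: the 74 death-shifts
# are a uniform random sample of the 1,641 shifts.  We arbitrarily
# label shifts 0..256 as the 257 shifts Gilbert worked.
random.seed(1)

def deaths_on_gilbert_shifts():
    death_shifts = random.sample(range(1641), 74)
    return sum(1 for s in death_shifts if s < 257)

counts = [deaths_on_gilbert_shifts() for _ in range(10_000)]
# Across 10,000 simulated 18-month periods, the count of deaths on
# Gilbert's shifts stays far below the observed value of 40.
```

Because the probability of seeing 40 or more by chance turns out to be astronomically small, no feasible number of simulated periods would estimate it well; this is one motivation for computing the probability exactly, as we do next.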
We will perform Fisher’s exact test on this data.
Fisher’s exact test
Under the null hypothesis of no association between the row variable and the column variable, the margins of the table are fixed. We denote the counts in the table, and the totals in the margins as shown in the table below:
| | | |
|---|---|---|
| \(N_{11}\) | \(N_{12}\) | \(n_{1.}\) |
| \(N_{21}\) | \(N_{22}\) | \(n_{2.}\) |
| \(n_{.1}\) | \(n_{.2}\) | \(n_{..}\) |
Note that \(N_{ij}\) are random variables, with \(n_{i.}\) being the total of the \(i\)th row and \(n_{.j}\) being the total of the \(j\)th column.
Under the null hypothesis, \(N_{11}\) is a hypergeometric random variable, in which we are taking a sample of size \(n_{.1}\) from a population of size \(n_{..}\) and counting the number of white balls in the sample. Recall that the population consists of white balls and black balls, and in this situation, we have \(n_{1.}\) white balls and \(n_{2.}\) black balls.
This gives us that \(N_{11} \sim \mathrm{HG}(w =n_{1.}, b = n_{2.}, n = n_{.1})\) and \[ P(N_{11} = k) = \frac{\binom{n_{1.}}{k}\binom{n_{2.}}{n_{.1}-k}}{\binom{n_{..}}{n_{.1}}} \] If we let \(k = n_{11}\), the observed value of \(N_{11}\), we get (note that \(n_{11}+n_{21} = n_{.1}\)): \[ P(N_{11} = n_{11}) = \frac{\binom{n_{1.}}{n_{11}}\binom{n_{2.}}{n_{21}}}{\binom{n_{..}}{n_{.1}}} \]
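This pmf can be checked numerically. Here is a minimal Python sketch using `math.comb` (the function name `hypergeom_pmf` is ours, not a library function):

```python
from math import comb

def hypergeom_pmf(k, w, b, n):
    """P(N11 = k) when drawing a sample of n balls from w white and b black."""
    return comb(w, k) * comb(b, n - k) / comb(w + b, n)

# Margins from the Gilbert table: w = n1. = 257, b = n2. = 1384, n = n.1 = 74.
# The probabilities over all possible values of N11 should sum to 1.
total = sum(hypergeom_pmf(k, 257, 1384, 74) for k in range(75))
```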
Here is the table with the Gilbert data:
| | Death on shift | No death on shift | Total |
|---|---|---|---|
| Gilbert present on shift | \(N_{11} = 40\) | \(N_{12} = 217\) | \(n_{1.} = 257\) |
| Gilbert not present on shift | \(N_{21} = 34\) | \(N_{22} = 1350\) | \(n_{2.} = 1384\) |
| Total | \(n_{.1} = 74\) | \(n_{.2} = 1567\) | \(n_{..} = 1641\) |
Plugging in the numbers for the Gilbert data, we get that the \(p\)-value for this test is: \[
P(N_{11} \ge 40) = \sum_{k=40}^{74} \frac{\binom{257}{k}\binom{1384}{74-k}}{\binom{1641}{74}} \approx 4.33\times 10^{-15}
\] We computed the \(p\)-value in R using `1 - phyper(39, m = 257, n = 1384, k = 74)`.
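The same tail probability can also be computed directly from the formula; here is a sketch in Python:

```python
from math import comb

# One-sided p-value P(N11 >= 40) under HG(w = 257, b = 1384, n = 74):
# sum the hypergeometric pmf over the right tail k = 40, ..., 74.
p_value = sum(
    comb(257, k) * comb(1384, 74 - k) / comb(1641, 74)
    for k in range(40, 75)
)
# This should reproduce the value above, roughly 4.33e-15.
```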
In this case, we were only interested in values of \(N_{11}\) that are too high, that is, a one-sided test. We could double the \(p\)-value for a two-sided test, but it would still essentially be zero, indicating that the association of deaths with Gilbert being present is too strong to be attributed to coincidence.
This is called an exact test since we use the exact distribution (hypergeometric), as opposed to an approximate distribution (chi-square), to test the association. Note that we can perform this test for association when we have two binary categorical variables. If either variable has more than two categories, the hypergeometric distribution will not work, and we will need to use the chi-square test for independence, which we will describe later.
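As a preview of the approximate alternative, here is a hedged sketch of the chi-square statistic \(\sum_{i,j} (O_{ij} - E_{ij})^2 / E_{ij}\) for the Gilbert table, where the expected counts are \(E_{ij} = n_{i.}\, n_{.j} / n_{..}\) (the test itself, and how this statistic is converted to a \(p\)-value, come later):

```python
# Observed 2x2 table: rows = Gilbert present (yes/no),
# columns = death on shift (yes/no).
obs = [[40, 217], [34, 1350]]
row_tot = [sum(r) for r in obs]           # 257, 1384
col_tot = [sum(c) for c in zip(*obs)]     # 74, 1567
grand = sum(row_tot)                      # 1641

# Chi-square statistic: sum of (observed - expected)^2 / expected,
# with expected counts E_ij = row total * column total / grand total.
stat = sum(
    (obs[i][j] - row_tot[i] * col_tot[j] / grand) ** 2
    / (row_tot[i] * col_tot[j] / grand)
    for i in range(2)
    for j in range(2)
)
# The statistic is enormous compared with the 5% critical value of
# about 3.84 for a chi-square distribution with 1 degree of freedom,
# agreeing with the exact test's conclusion.
```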