In statistical hypothesis testing, you often have the choice between tests that assume a certain distribution of the underlying data and tests that don’t make these assumptions. For example, when evaluating a drug trial, you can choose, e.g., between the t-test, which assumes that the experimental data is normally distributed, and the Mann-Whitney U test, otherwise known as the Wilcoxon rank-sum test, which makes no assumption about that distribution. As Rebecca Black might say, “which test should you take?”

The t-test is called a *parametric test*, since it assumes that the data in question follows a specific, known distribution under the null hypothesis (perhaps with unknown distribution parameters), in this case the normal distribution. Unfortunately, there is rarely if ever a good theoretical reason for that assumption. Usually nothing is known about the experimental error and its distribution, let alone about the true population values.

The rank-sum test is called a *nonparametric test*, since it makes no assumption about the distribution of the data. Generally, nonparametric tests make fewer assumptions than parametric tests.

Now it is true that *if* you know the distribution of your data, then parametric tests will give you closer bounds and better estimates, but if the assumptions turn out not to be true, the results can be misleading or wrong. And as I have remarked above, it is rare to know the distribution of your data from theoretical reasons.

As David Colquhoun has remarked in his excellent statistics textbook (which, by the way, is available for download and which is so full of good common sense that I have yet to see a better one), statistical tests exist to prevent you from making a fool of yourself. And the fewer assumptions you make, the less chance there is of fooling yourself and your readers. Hence I recommend that *when in doubt, use nonparametric tests*. The nature of things ensures that you will almost always be in doubt.

Of course, as Colquhoun also observes, “if the distribution is known (not assumed but *known*), then use the appropriate parametric test”. If the distribution is not known, then sometimes, with small samples, a real effect might not give a statistically significant result, but that merely means that “it is a disadvantage not to know the distribution” and that this “does not constitute a disadvantage of nonparametric tests”.

Additionally, nonparametric tests can often be as powerful as parametric tests. Let’s try an example. Assume that we are testing blood pressure reduction medication, and that we have 20 people, randomly assigned to a treatment group and a control group. The treatment group gets the drug and the control group gets a placebo. After two weeks, the systolic blood pressure is measured in those 20 patients, and this is the result:

Control Group:

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|
145 | 152 | 134 | 121 | 147 | 171 | 133 | 148 | 162 | 155 |

Treatment Group:

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|
137 | 144 | 126 | 113 | 139 | 163 | 125 | 140 | 154 | 147 |

If you look at the tables closely, you will see that I have cheated: I have simply subtracted 8 from the control group’s values to get the treatment group’s. This means that, by construction, there is a real difference in means. Let’s see if the tests find it.

Is our data normally distributed? Here is a Q-Q plot of the control group data, which doesn’t look so bad:

[Q-Q plot of the control group data against the normal distribution]

A Shapiro-Wilk normality test yields a *p*-value of 0.98, so the data is as nicely normal as a data set of only ten values can be. Now our null hypothesis is that the treatment had no effect. And we are in luck, because (1) our data is normally distributed, (2) the two samples are independent (one presumes), (3) the sample sizes are equal, and (4) the variances in the two groups are equal. We know the last one holds because of the way I constructed the data sets. Therefore, we can test using R:

```
> bp.control <- c(145, 152, 134, 121, 147, 171, 133, 148, 162, 155)
> bp.treatment <- c(137, 144, 126, 113, 139, 163, 125, 140, 154, 147)
> t.test(bp.control, bp.treatment, var.equal = TRUE, alternative = "two.sided")

        Two Sample t-test

data:  bp.control and bp.treatment
t = 1.2198, df = 18, p-value = 0.2383
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.778803 21.778803
sample estimates:
mean of x mean of y
    146.8     138.8
```

Holy smoke! Even with all these nice assumptions we only get a *p*-value of 24%! The small data set and the large variation within it have completely masked the small but real reduction! Let’s try the Wilcoxon rank-sum test, then.

```
> wilcox.test(bp.control, bp.treatment, exact = FALSE, alternative = "two.sided")

        Wilcoxon rank sum test with continuity correction

data:  bp.control and bp.treatment
W = 65.5, p-value = 0.2567
alternative hypothesis: true location shift is not equal to 0
```

Why does R call the Wilcoxon test `wilcox.test` and not `wilcoxon.test`? No idea. Anyway, the nonparametric test, with no assumptions at all about the underlying distribution, comes to essentially the same conclusion as the t-test, which makes quite strong assumptions.