StATS: What is a Mann-Whitney test?
The Mann-Whitney test (sometimes called the Wilcoxon-Mann-Whitney test) is a nonparametric test. It compares two independent groups on an outcome variable that is ordinal. Ordinal means that the values can be ranked from low to high. This is a less stringent type of data than continuous data, and can incorporate measurements like grades (A+, A, A-, B+, etc.) and Likert scale items (Strongly Disagree, Moderately Disagree, Slightly Disagree, Neutral, etc.) where you may be uncomfortable assigning a numeric code to the data.
Because the Mann-Whitney test is nonparametric, it does not require the data to follow a normal distribution. It performs reasonably well for a variety of distributions that are decidedly non-normal and is less sensitive to outliers than the traditional two-sample t-test. The Mann-Whitney test uses a ranking of the data, and some statisticians feel this can distort the results, especially when there are a lot of tied values in the data.
There is extensive debate over when you should use this test, but in my opinion, the choice of this test versus a t-test is not all that critical.
There are two equivalent approaches for computing the Mann-Whitney test. The first approach calculates P[X>Y], the probability that a randomly selected patient from the first group has a larger value than a randomly selected patient from the second group. The second approach computes the rank of all the data and looks at whether the sum of the ranks for the patients in the first group is either too big or too small.
Here's an example from the web. Nine elderly and eight young patients were asked to stand on a device that measures postural sway, the tendency for a person's center of gravity to shift over time. A large postural sway might indicate a tendency to lose balance easily. Here is the data:
age fbsway sidesway
1 Elderly 19 14
2 Elderly 30 41
3 Elderly 20 18
4 Elderly 19 11
5 Elderly 29 16
6 Elderly 25 24
7 Elderly 21 18
8 Elderly 24 21
9 Elderly 50 37
10 Young 25 17
11 Young 21 10
12 Young 17 16
13 Young 15 22
14 Young 14 12
15 Young 14 14
16 Young 22 12
17 Young 17 18
A boxplot of the front to back sway (fbsway) shows that the elderly patients have a tendency to have larger values.
Rank the data to get the following:
age fbsway rfbsway
1 Elderly 19 6/7
2 Elderly 30 16
3 Elderly 20 8
4 Elderly 19 6/7
5 Elderly 29 15
6 Elderly 25 13/14
7 Elderly 21 9/10
8 Elderly 24 12
9 Elderly 50 17
10 Young 25 13/14
11 Young 21 9/10
12 Young 17 4/5
13 Young 15 3
14 Young 14 1/2
15 Young 14 1/2
16 Young 22 11
17 Young 17 4/5
Tied values share a range of ranks; for example, 6/7 means that the two values of 19 occupy ranks 6 and 7. It's not too clear what to do with the ties, but the simplest thing is to average. If two values are tied for the smallest rank, rather than assigning the ranks of 1 and 2, compromise and assign 1.5 to both.
age fbsway rfbsway
1 Elderly 19 6.5
2 Elderly 30 16
3 Elderly 20 8
4 Elderly 19 6.5
5 Elderly 29 15
6 Elderly 25 13.5
7 Elderly 21 9.5
8 Elderly 24 12
9 Elderly 50 17
10 Young 25 13.5
11 Young 21 9.5
12 Young 17 4.5
13 Young 15 3
14 Young 14 1.5
15 Young 14 1.5
16 Young 22 11
17 Young 17 4.5
The sum of ranks associated with the young patients is a little easier to calculate since there are only 8 of them. The sum is 49. The lowest possible sum (if all the values in the young group were smaller than all the values in the elderly group) would be 36 (1+2+...+8), and the largest possible value would be 108 (10+11+...+17). If the ranks of 1-17 were equally likely, then you'd expect to see a sum of 72. Clearly, the value in this data set is rather low, causing you to believe, perhaps, that young patients have less front-to-back sway than older patients.
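The ranking and the rank-sum arithmetic above can be checked with a few lines of code. This is only a minimal sketch in plain Python, not the code used to produce the tables; the helper name average_ranks and the variable names are assumptions made for this illustration.

```python
# Front-to-back sway values copied from the data listing above.
elderly = [19, 30, 20, 19, 29, 25, 21, 24, 50]
young = [25, 21, 17, 15, 14, 14, 22, 17]


def average_ranks(values):
    """Rank the values from low to high, giving tied values their average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1         # smallest rank the value occupies
        last = first + ordered.count(v) - 1  # largest rank the value occupies
        rank_of[v] = (first + last) / 2
    return [rank_of[v] for v in values]


# Rank all 17 values together, then add up the ranks of the young patients.
ranks = average_ranks(elderly + young)
young_rank_sum = sum(ranks[len(elderly):])

n_young = len(young)
n_total = len(elderly) + len(young)
lowest = sum(range(1, n_young + 1))                       # young hold ranks 1..8
highest = sum(range(n_total - n_young + 1, n_total + 1))  # young hold ranks 10..17
expected = n_young * (n_total + 1) / 2                    # ranks assigned at random

print(young_rank_sum, lowest, highest, expected)  # 49.0 36 108 72.0
```

The observed sum of 49 sits much closer to the lowest possible value than to the expected value of 72.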
Arrange the data to compute all pairwise differences. Each column below is an elderly value and each row is a young value. The first elderly value (19) minus the first young value (25) gives a difference of -6. The first elderly value (19) minus the second young value (21) gives a difference of -2. Keep on doing this until you subtract the last young value (17) from the last elderly value (50) to get a difference of 33.
    19  30  20  19  29  25  21  24  50
25  -6   5  -5  -6   4   0  -4  -1  25
21  -2   9  -1  -2   8   4   0   3  29
17   2  13   3   2  12   8   4   7  33
15   4  15   5   4  14  10   6   9  35
14   5  16   6   5  15  11   7  10  36
14   5  16   6   5  15  11   7  10  36
22  -3   8  -2  -3   7   3  -1   2  28
17   2  13   3   2  12   8   4   7  33
Notice that there is a mix of positive and negative differences, but mostly positive differences. There are exactly 58 positive differences, 12 negative differences, and 2 zero differences. It's not exactly clear what you should do with the zero differences, but treating each one as half positive and half negative seems to be reasonable. With 59 positive and 13 negative differences, you would estimate P[X>Y] at 82%. That's quite a bit larger than 50% and also seems to indicate that the elderly patients tend to have larger front-to-back sway values than young patients.
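The counts of positive, negative, and zero differences can be reproduced with a short sketch. This is a plain Python illustration with variable names chosen for this example, not code from the original analysis. It also shows how the two approaches line up: the young rank sum (49) minus its lowest possible value (36) equals the number of pairs in which the young value is the larger one.

```python
# Front-to-back sway values copied from the data listing above.
elderly = [19, 30, 20, 19, 29, 25, 21, 24, 50]
young = [25, 21, 17, 15, 14, 14, 22, 17]

# One difference (elderly minus young) for every elderly/young pair.
differences = [x - y for y in young for x in elderly]

positive = sum(d > 0 for d in differences)
negative = sum(d < 0 for d in differences)
zero = sum(d == 0 for d in differences)

# Count each zero difference as half positive and half negative.
p_x_greater_y = (positive + zero / 2) / len(differences)

# Pairs "won" by the young patient, with ties counted as one half.
young_wins = negative + zero / 2

print(positive, negative, zero)  # 58 12 2
print(round(p_x_greater_y, 2))   # 0.82
print(young_wins)                # 13.0, which is 49 - 36
```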
I won't show the details here, but you can easily compute a p-value for the Mann-Whitney test. A confidence interval takes a bit more work; it uses the pairwise differences described above. All the details can be found on pages 106-135 of Hollander M, Wolfe DA (1999). Nonparametric Statistical Methods. New York: John Wiley & Sons, Inc.
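As a rough illustration of one way a p-value might be obtained, the sketch below uses the large-sample normal approximation with a correction for ties and a continuity correction. This choice is an assumption made for this example and is not necessarily the calculation the author had in mind; with only nine and eight patients, an exact method may be preferable, so treat the result as approximate. The variable names are hypothetical.

```python
import math
from collections import Counter

# Front-to-back sway values copied from the data listing above.
elderly = [19, 30, 20, 19, 29, 25, 21, 24, 50]
young = [25, 21, 17, 15, 14, 14, 22, 17]
n1, n2 = len(elderly), len(young)
n = n1 + n2

# U statistic: pairs in which the elderly value is larger, ties counted as one half.
u = sum((x > y) + 0.5 * (x == y) for x in elderly for y in young)

# Mean of U when the two groups have the same distribution.
mean_u = n1 * n2 / 2

# Variance of U, reduced to account for groups of tied values.
tie_term = sum(t**3 - t for t in Counter(elderly + young).values())
var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

# Continuity-corrected z score and two-sided p-value from the normal curve.
z = (abs(u - mean_u) - 0.5) / math.sqrt(var_u)
p_two_sided = math.erfc(z / math.sqrt(2))

print(u, round(z, 2), round(p_two_sided, 3))
```

Under these assumptions the two-sided p-value comes out at roughly 0.03.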
The data set described above is available at the OzDASL web site at http://www.statsci.org/data/general/balaconc.html. It was originally published in:
Teasdale, N., Bard, C., La Rue, J., and Fleury, M. (1993). On the cognitive penetrability of posture control. Experimental Aging Research 19, 1-13.
This page was written by Steve Simon while working at Children's Mercy Hospital. Although I do not hold the copyright for this material, I am reproducing it here as a service, as it is no longer available on the Children's Mercy Hospital website.