Use of Likert data with ANOVA (created 2009-10-13)


I never quite feel I can offer my students a thoughtful explanation about the use of Likert data with ANOVA. It is recommended that ANOVA be used with interval or ratio data, but, in practice, ANOVA is sometimes used when the data is ordinal (as you'd find when using Likert scales). This confuses some students. Are there any good references out there I can share with my students that might explain the pros and cons of using ordinal data with ANOVA?

Your discomfort is shared by many of us. I have vacillated on this issue many times. I would just let students know that use of ANOVA for Likert scale items is controversial. There is no consensus in the research community on how to handle this type of data. When there is no consensus, choose whatever you like, but be prepared to re-analyze the data when the peer-reviewer asks you to change to the competing procedure.

I should note that use of nonparametric procedures with Likert scale items is also controversial. So using Kruskal-Wallis instead of ANOVA does not guarantee that you will avoid criticism.

One might consider Item Response Theory, though I have never used this approach, or ordinal logistic regression, which I have used only rarely for this type of data. In a basic statistics class, it may be a mistake to mention a bunch of advanced methods like these.

I have a few references at my old website about analysis of Likert scale items, and a page at my new website notes that the use of ANOVA for a sum of Likert scale items is a bit less controversial.

Let me suggest that whether you use ANOVA for Likert scale items depends on your general attitude towards averaging Likert scale items. There's a great New Yorker cartoon that shows a roadside sign for a town, let's call it New Bedford. On the sign announcing the town, it proclaims:

 and beneath these statistics it shows the following

I liked the cartoon so much that I paid to have it included in the chapter on meta-analysis in my book. But the general concept also applies here.

There are some things that were not meant to be totaled (or equivalently, some things that were not meant to be averaged). If you average a Likert scale, you are making the presumption that a score of 1 combined with a score of 5 is equivalent to two scores of 3. In other words, a "strongly agree" and a "strongly disagree" provide the same average impression as two "neutrals".
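Here is a small Python sketch of that presumption. The responses are made up, and I am assuming the usual coding of 1 for "strongly disagree" through 5 for "strongly agree". The two pairs have the same mean, even though one pair is split and the other is in complete agreement; only a measure of spread tells them apart.

```python
from statistics import mean, pstdev

# Made-up responses to a single Likert item,
# coded 1 = strongly disagree ... 5 = strongly agree (an assumed coding).
polarized = [1, 5]   # one strongly disagree, one strongly agree
neutral = [3, 3]     # two neutrals

mean_polarized = mean(polarized)
mean_neutral = mean(neutral)

# The population standard deviation shows what the mean hides.
spread_polarized = pstdev(polarized)
spread_neutral = pstdev(neutral)

print("means:  ", mean_polarized, mean_neutral)
print("spreads:", spread_polarized, spread_neutral)
```

If the identical means bother you, that is the equal intervals assumption at work.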

If you're comfortable with the equal intervals assumption for Likert scales, then an average of Likert responses doesn't bother you. If you are uncomfortable with the equal intervals assumption, then such an average is like that road sign.

We do present averages for certain ordinal scales. The most common example is a grade point average where we assign numbers (A=4, F=0) and then average those numbers. The grade point average implies that a student who gets two Cs is comparable to a student who gets an A and an F. Is that a reasonable assumption? Probably not, as the gap between an F and a C is much larger than the gap between an A and a C.
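A rough sketch of how much the comparison depends on the numbers we assign. I am assuming the common 4-point scale (B=3, C=2, D=1 fill in the values between A=4 and F=0). The second coding, with a heavier penalty for an F, is purely hypothetical and is there only to show that the tie between the two students is a product of the equal spacing.

```python
from statistics import mean

# Assumed standard 4-point coding with equal spacing between grades.
equal_spacing = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

# Hypothetical alternative coding: an F counts as much worse than a D.
# The value -2 is invented for illustration only.
heavy_f = {"A": 4, "B": 3, "C": 2, "D": 1, "F": -2}

def gpa(grades, points):
    """Average the numeric codes for a list of letter grades."""
    return mean(points[g] for g in grades)

two_cs = ["C", "C"]
a_and_f = ["A", "F"]

equal_two_cs = gpa(two_cs, equal_spacing)
equal_a_and_f = gpa(a_and_f, equal_spacing)
heavy_two_cs = gpa(two_cs, heavy_f)
heavy_a_and_f = gpa(a_and_f, heavy_f)

print("equal spacing:", equal_two_cs, equal_a_and_f)
print("heavy F:      ", heavy_two_cs, heavy_a_and_f)
```

Under equal spacing the two students tie. Under the hypothetical coding the student with two Cs comes out ahead, so the ranking is decided by the coding rather than by the grades alone.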

But I don't mind using a grade point average, even if it is imperfect. And I don't mind using averages to summarize Likert scale items. An average of 3.1 tells me more or less that there is only a very slight predominance of agrees and strongly agrees over disagrees and strongly disagrees. But I would understand someone who hates using an average in this setting.

Of course, ANOVA involves calculating means, so if you are uncomfortable calculating means for ordinal data, you have to be uncomfortable with ANOVA for Likert scale items. The converse is not necessarily true, but probably is close enough. Someone who doesn't get all that upset with an average as a summary measure for a group of Likert scale measurements probably doesn't get all that upset with an ANOVA on Likert scale measurements.
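To see how directly ANOVA rests on means, here is a minimal sketch of the one-way ANOVA F statistic written out by hand. The three groups of Likert responses are made up, and the recoding of 5 to 10 is a hypothetical unequal-interval coding that keeps the order of the categories but stretches the top one.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA, built from group means and the grand mean."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up Likert responses (coded 1 to 5) for three groups.
groups = [
    [1, 2, 2, 3, 2],
    [3, 4, 3, 4, 4],
    [4, 5, 5, 4, 5],
]
f_equal = one_way_anova_f(groups)

# Hypothetical recoding that preserves the order but not the spacing.
recode = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}
recoded_groups = [[recode[x] for x in g] for g in groups]
f_recoded = one_way_anova_f(recoded_groups)

print("F with equal spacing:  ", round(f_equal, 2))
print("F with stretched top:  ", round(f_recoded, 2))
```

Every piece of the F statistic is a mean or a squared distance from a mean, so the result changes when the spacing between categories changes, even though the ordering of the responses is untouched. A rank-based procedure such as Kruskal-Wallis would give the same answer under both codings, because it uses only the order.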