P.Mean: What are the assumptions of logistic regression (created 2011-09-01).

Does anyone have a good reference for the assumptions of binary logistic regression? An anonymous reviewer told one of my clients that his analysis doesn't meet one of the assumptions, but the objection doesn't make any sense in this situation, and I think the reviewer doesn't understand something.

There are only two assumptions: linearity on the log odds scale and independence. Independence is usually established by citing the sampling mechanism. As long as you don't do something like take data from identical twins, you should be okay here. Independence becomes a problem with cluster sampling and with hierarchical data, where observations from the same cluster tend to resemble one another.
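Written out for a single predictor, the linearity assumption looks like this. The symbols p, x, beta_0 and beta_1 are notation assumed here for illustration; they do not come from the original question.

```latex
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x,
\qquad p = \Pr(Y = 1 \mid x)
```

Under this model, each one-unit increase in x changes the log odds by the same amount, beta_1, no matter where on the x scale the increase occurs.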

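The Hosmer and Lemeshow test is a formal goodness-of-fit test that can detect departures from linearity. You sort the observations by their fitted probabilities, split them into ten groups, and compare the observed number of events in each group with the number that a model linear on the log odds scale would predict. With a single continuous predictor, this amounts to splitting the predictor itself into ten groups. You can also fit a more complex model, such as a spline, and compare it to the linear model.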
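Here is a minimal sketch of the Hosmer and Lemeshow calculation in plain Python, run on simulated data. The helper names (fit_logistic, hosmer_lemeshow, chi2_sf_even_df), the simulated coefficients, the seeds, and the sample size are all assumptions made for illustration. The sketch groups observations by fitted probability, which with one continuous predictor gives the same groups as splitting the predictor itself, as long as the fitted slope is not zero.

```python
import math
import random


def expit(z):
    """Inverse of the logit: converts log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, iterations=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson and return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [expit(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score (gradient of the log likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix entries.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def chi2_sf_even_df(value, df):
    """Upper-tail chi-square probability; this closed form needs even df."""
    half = value / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(x, y, groups=10):
    """Return (statistic, p_value) for a one-predictor logistic model."""
    b0, b1 = fit_logistic(x, y)
    pairs = sorted(((expit(b0 + b1 * xi), yi) for xi, yi in zip(x, y)),
                   key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(yi for _, yi in chunk)   # events actually seen
        expected = sum(pi for pi, _ in chunk)   # events the model predicts
        statistic += (observed - expected) ** 2 / (
            expected * (1.0 - expected / size))
    # Reference distribution: chi-square with (groups - 2) degrees of freedom.
    return statistic, chi2_sf_even_df(statistic, groups - 2)


def simulate(n, log_odds, seed):
    """Draw x uniformly on (-2, 2) and y from the given log odds function."""
    rng = random.Random(seed)
    x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    y = [1 if rng.random() < expit(log_odds(xi)) else 0 for xi in x]
    return x, y


# One data set that really is linear on the log odds scale, one that is not.
x_lin, y_lin = simulate(2000, lambda v: -0.5 + 1.0 * v, seed=1)
x_cur, y_cur = simulate(2000, lambda v: -2.0 + 1.5 * v * v, seed=2)

stat_linear, p_linear = hosmer_lemeshow(x_lin, y_lin)
stat_curved, p_curved = hosmer_lemeshow(x_cur, y_cur)
print("linear truth:", round(stat_linear, 2), round(p_linear, 4))
print("curved truth:", round(stat_curved, 2), round(p_curved, 4))
```

A small p-value suggests that the straight-line model on the log odds scale does not fit. In real work you would use a statistical package rather than a hand-written fitting routine; the code above is only meant to show what the test compares.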

Linearity, of course, is a moot point if your predictor variable is categorical. A binary predictor variable, for example, reduces to a two by two table, and the only assumption for a two by two table is independence.
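The reduction to a two by two table can be checked with a short sketch. The counts below are made up for illustration and do not come from any study, and the variable names are my own. With one binary predictor, the model has two parameters for two groups, so the fitted probabilities match the observed proportions and the slope equals the log of the table's odds ratio.

```python
import math

# Hypothetical 2x2 counts, keyed by the binary predictor value x (0 or 1).
events = {0: 15, 1: 30}       # outcome = 1
non_events = {0: 85, 1: 70}   # outcome = 0


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


# Observed proportion of events at each level of x.
totals = {x: events[x] + non_events[x] for x in (0, 1)}
p_obs = {x: events[x] / totals[x] for x in (0, 1)}

# Candidate maximum likelihood estimates for logit(p) = b0 + b1*x.
b0 = logit(p_obs[0])
b1 = logit(p_obs[1]) - logit(p_obs[0])

# Odds ratio computed directly from the table (ad / bc).
odds_ratio = (events[1] * non_events[0]) / (events[0] * non_events[1])

# The score equations are zero at the maximum likelihood estimates.
score_b0 = sum(events[x] - totals[x] * expit(b0 + b1 * x) for x in (0, 1))
score_b1 = events[1] - totals[1] * expit(b0 + b1)

print("exp(b1):", math.exp(b1), "table odds ratio:", odds_ratio)
print("score equations:", score_b0, score_b1)
```

Because the model reproduces the table exactly, there is no shape to get wrong, which is why linearity is not an issue here.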

This page was written by Steve Simon and is licensed under the Creative Commons Attribution 3.0 United States License.