A simple example of a mixed logistic regression (created 2010-10-12).


I am working on a project that will require the use of mixed linear and mixed logistic regression models. I thought I should spend some time working with the latter models to familiarize myself with how they work.

Mixed logistic regression models work much like an ordinary logistic regression model. You assume that your independent variables are linearly related to the log odds of an event occurring. For most variables, this implies that the antilog of the parameter estimate represents an odds ratio. For example, if you have a binary variable indicating whether the patient is in the control group or the treatment group, then the antilog of the coefficient for that indicator variable represents the odds ratio for the outcome, comparing treatment to control. If there is a single continuous predictor variable, then the antilog of the parameter estimate represents the multiplicative change in odds when the predictor variable increases by one unit.
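Here is a quick sketch of the antilog calculation. The coefficient values below are hypothetical numbers chosen for illustration, not estimates from a real data set.

```python
import math

# Hypothetical parameter estimates from a fitted logistic regression.
b_treatment = 0.69   # coefficient for the treatment indicator variable
b_age = 0.05         # coefficient for a continuous predictor (age in years)

# The antilog (exponential) of a coefficient is an odds ratio.
or_treatment = math.exp(b_treatment)  # odds ratio, treatment vs. control (about 2)
or_age = math.exp(b_age)              # change in odds per one-year increase in age (about 1.05)
```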

In a mixed logistic regression model, an additional term is added to account for a random effect. This random effect might be clinics in a cluster randomized trial. It might be a center effect in a multi-center trial. It might be a subject effect in a longitudinal study. The mixed logistic regression model assumes that this random effect adds a normally distributed term to the log odds scale.
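The model just described can be sketched with a small simulation. The intercept, slope, number of clusters, and standard deviation below are hypothetical values chosen for illustration.

```python
import math
import random

random.seed(1)

def inv_logit(x):
    """Convert a value on the log odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Fixed effects (hypothetical): intercept -2, slope 0.02 per unit of
# the predictor, both on the log odds scale.
b0, b1 = -2.0, 0.02
sd = 1.0  # standard deviation of the random effect

# Each cluster (a clinic, center, or subject) gets its own normally
# distributed shift on the log odds scale.
cluster_effects = [random.gauss(0, sd) for _ in range(5)]

def simulate_outcome(x, u):
    """Simulate a 0/1 outcome for predictor value x and random effect u."""
    p = inv_logit(b0 + b1 * x + u)
    return 1 if random.random() < p else 0

data = [(x, c, simulate_outcome(x, u))
        for c, u in enumerate(cluster_effects)
        for x in range(0, 201, 50)]
```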

Here's an example of how the random effect works.

In the graph above, the solid line represents the probability curve for an "average" individual. The dotted lines represent normally distributed deviations from the curve on a log odds scale. In this example the normal distribution has a standard deviation of 1, which implies a substantial amount of variation from one individual to another.

Here's an example where the standard deviation is 0.5. Notice that each individual subject deviates less from the average or norm.

Here's an example where the standard deviation is 0.2.
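The three graphs can be summarized numerically. The code below evaluates the probability curve for the average individual and for individuals one standard deviation above and below it, for each of the three standard deviations; the intercept and slope are hypothetical values chosen for illustration.

```python
import math

def inv_logit(x):
    """Convert a value on the log odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical fixed effects: intercept -2, slope 0.02 on the log odds scale.
b0, b1 = -2.0, 0.02
x = 100  # evaluate the curves at a single predictor value

for sd in (1.0, 0.5, 0.2):
    average = inv_logit(b0 + b1 * x)       # the "average" individual
    high = inv_logit(b0 + b1 * x + sd)     # one standard deviation above
    low = inv_logit(b0 + b1 * x - sd)      # one standard deviation below
    # The spread between low and high shrinks as sd shrinks.
    print(sd, round(low, 2), round(average, 2), round(high, 2))
```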

What's a reasonable value for the standard deviation of the random effect? If you have some preliminary data, you can estimate this standard deviation directly, but in many settings you have to plan a study before you know what that value might be.

The standard deviation of the random effects places bounds on the amount of heterogeneity among subjects, clinics, or centers. A standard deviation of 1 implies a range of approximately +/-3, and on a log odds scale, this is huge. The antilog of 3 is approximately 20, which means that the most extreme subject has odds of the event occurring that are roughly 20 times higher than average, uniformly across the range of your independent variable. With a standard deviation of 0.2, the range would be +/-0.6. Since the antilog of 0.6 is approximately 1.8, there would be much less heterogeneity, even for an extreme subject.
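A quick check of this arithmetic:

```python
import math

# Translate the roughly +/-3 standard deviation range on the log odds
# scale into an odds ratio for the most extreme subject.
for sd in (1.0, 0.2):
    extreme = 3 * sd            # roughly the most extreme random effect
    ratio = math.exp(extreme)   # odds ratio for that extreme subject
    print(sd, round(extreme, 1), round(ratio, 1))
```

With sd = 1 the extreme odds ratio is about 20; with sd = 0.2 it is only about 1.8, matching the figures quoted above.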

You can also visualize the degree of heterogeneity in terms of how far left and right the curves for individual subjects deviate from the average. This left and right shift is related to the ratio of the random effect to the slope on the log odds scale. In this example, the slope on the log odds scale is 0.02, so the shift for an extreme value could be as large as 3 / 0.02 = 150 units to the left or right, assuming the standard deviation is 1. If the slope on the log odds scale is even flatter (closer to 0), then the shift can be even more severe, as is shown in the graph below.
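The shift calculation, using the same hypothetical slope of 0.02:

```python
# The horizontal shift of an individual's curve is the random effect
# divided by the slope on the log odds scale.
slope = 0.02
extreme_effect = 3.0          # roughly 3 standard deviations when sd = 1

shift = extreme_effect / slope          # 150 units to the left or right
flatter = extreme_effect / (slope / 2)  # halving the slope doubles the shift: 300 units
```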

This has the same normal distribution, with a standard deviation of 1, but now the slope on the log odds scale is half as big. This leads to a flatter curve, and a random shift from this curve appears far more pronounced.