P.Mean: How do you compute an adjusted probability? (created 2013-04-26).

I am helping out with a paper, and the author wanted to get an adjusted probability from a logistic regression model. What, exactly, is an adjusted probability, and how would you compute it?

An adjusted probability is a probability that has been modified to account for covariate imbalance. It is effectively the predicted probability where all of the covariates are fixed at a specific value, usually the overall mean. Here's an example.

In a study of breast feeding in pre-term infants, the intervention group did very well in breast feeding at discharge compared to the control group. Here are the results in SPSS.

Table of unadjusted probabilities

In the treatment group, 33 / 38 or about 87% of the babies were exclusively breast feeding at discharge. In the control group, 19 / 46 or about 41% of the babies were exclusively breast feeding at discharge. This result was statistically significant, with an odds ratio of 9.4 and 95% confidence interval of 3.1 to 28.4.
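If you want to check these numbers by hand, here is a minimal Python sketch. It uses only the counts quoted above. The Wald interval on the log odds scale is my assumption about how the interval was produced, and the function and variable names are made up for illustration.

```python
import math

# Counts quoted in the text: exclusive breast feeding at discharge / group size.
treat_yes, treat_n = 33, 38
ctrl_yes, ctrl_n = 19, 46

def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald interval for a 2x2 table.

    a, b = successes and failures in the first group;
    c, d = successes and failures in the second group.
    The Wald interval is an assumption, not taken from the SPSS output.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high

p_treat = treat_yes / treat_n   # unadjusted probability, treatment
p_ctrl = ctrl_yes / ctrl_n      # unadjusted probability, control
odds_ratio, ci_low, ci_high = odds_ratio_wald_ci(
    treat_yes, treat_n - treat_yes, ctrl_yes, ctrl_n - ctrl_yes
)

print(f"treatment {p_treat:.2f}, control {p_ctrl:.2f}")
print(f"odds ratio {odds_ratio:.1f}, 95% CI {ci_low:.1f} to {ci_high:.1f}")
```

With these counts the sketch gives values that round to the odds ratio and interval quoted above.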

Simple logistic model for treatment effect

So, even after allowing for sampling error, there is at least a threefold increase in the odds of exclusive breast feeding at discharge in the treatment group compared to the control group.

But there's a problem here. The study was randomized, but even with randomization, the mothers in the treatment group were slightly older than the mothers in the control group.

Difference in mother's ages

There is about a four year gap in ages between the two groups. Could the success of the intervention be due in part or in whole to this difference in ages? Maybe, because mother's age is indeed related to breast feeding success.

Logistic model for mother's age

Each year of mother's age increases the odds of breast feeding by 17%. If you have an imbalanced covariate that is related to your outcome measure, you need to worry a bit. Fit a logistic model with both the treatment variable and mother's age.

Logistic regression model with both age and treatment

The odds ratio is a bit smaller (6.9), but it is still statistically significant, with a 95% confidence interval from 2.2 to 22.1. Now, we should probably revisit the two probabilities mentioned earlier. Because the treatment mothers were older, the unadjusted probability for the treatment group (87%) is unfairly high. Because the control mothers were younger, the unadjusted probability for the control group (41%) is unfairly low.
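As a side check, an odds ratio in a logistic model is just the exponentiated coefficient. The sketch below assumes the coefficients that appear in the worked calculation further down (1.936 for treatment, 0.128 per year of age). The variable names are mine.

```python
import math

# Coefficients from the model with both age and treatment, as used in the
# worked calculation later on this page (names are illustrative only).
b_treatment = 1.936
b_age = 0.128

or_treatment = math.exp(b_treatment)   # adjusted odds ratio for treatment
or_age = math.exp(b_age)               # odds ratio per year of mother's age
pct_per_year = 100 * (or_age - 1)      # percent change in odds per year

print(f"adjusted treatment odds ratio: {or_treatment:.1f}")
print(f"odds change per year of age: {pct_per_year:.0f}%")
```

The per-year figure here comes from the model that also includes treatment, so it need not match the 17% from the model with age alone.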

What would be a fair estimate? A fair estimate would be one in which the age of the mothers in the two groups is identical. We can choose any age, but a logical one would be the overall average (27.33 years). The predicted log odds for the control group, holding the mother's age at 27.33 is

-3.631 + 0.128*27.33 = -0.133

Convert this to the odds scale by exponentiating

exp(-0.133) = 0.875

and then convert from odds to probability

0.875 / (1 + 0.875) = 0.467.
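The same three steps for the control group can be sketched in Python. The coefficients and the mean age are the ones shown above. The variable names are mine, and the last line shows an equivalent one-step formula (the inverse logit) as a cross-check.

```python
import math

intercept = -3.631   # constant term from the fitted model
b_age = 0.128        # coefficient for mother's age
mean_age = 27.33     # overall average age, used as the reference value

# Step 1: predicted log odds for a control mother at the reference age.
log_odds_ctrl = intercept + b_age * mean_age

# Step 2: exponentiate to get the odds.
odds_ctrl = math.exp(log_odds_ctrl)

# Step 3: convert odds to a probability.
p_ctrl = odds_ctrl / (1 + odds_ctrl)

# Equivalent shortcut: the inverse logit goes straight from log odds
# to probability.
p_ctrl_direct = 1 / (1 + math.exp(-log_odds_ctrl))

print(f"{log_odds_ctrl:.3f} {odds_ctrl:.3f} {p_ctrl:.3f}")
```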

Notice that the adjusted probability is slightly higher than the unadjusted probability. This is what you'd predict the breast feeding probability to be if the control mothers were about two years older (27.33 instead of 25.35) on average.

The predicted log odds for the treatment group, holding the mother's age at 27.33 is

-3.631 + 0.128*27.33 + 1.936 = 1.803

Convert this to the odds scale by exponentiating

exp(1.803) = 6.068

and then convert from odds to probability

6.068 / (1 + 6.068) = 0.859
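Since the calculation is the same for both groups apart from the treatment term, it may help to wrap it in one small function. This is only a sketch: `adjusted_probability` is a name I made up, and it assumes treatment is coded 0 for control and 1 for treatment, which is consistent with the arithmetic above but not something the SPSS output here confirms.

```python
import math

INTERCEPT = -3.631
B_AGE = 0.128
B_TREATMENT = 1.936

def adjusted_probability(treated, age):
    """Predicted probability at a fixed age.

    treated: 0 for control, 1 for treatment (assumed coding).
    age: the value at which mother's age is held, in years.
    """
    log_odds = INTERCEPT + B_AGE * age + B_TREATMENT * treated
    return 1 / (1 + math.exp(-log_odds))

reference_age = 27.33  # overall mean age
p_control = adjusted_probability(0, reference_age)
p_treatment = adjusted_probability(1, reference_age)

print(f"control {p_control:.3f}, treatment {p_treatment:.3f}")
```

Any other reference age could be passed in, as long as the same age is used for both groups.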

Notice that the adjusted probability is slightly lower in the treatment group. This is what you'd predict if the treatment mothers were younger (27.33 instead of 29.74). The probability of exclusive breast feeding at discharge is still quite different (86% versus 47%) even after adjusting for the age difference in the model. Notice that there is more adjustment in the control group than in the treatment group. That is due to the nonlinear nature of the logistic regression model. Probabilities around 86% are at the point where the S-shaped curve has mostly leveled off, so an adjustment has less effect here than near the middle of the S-shaped curve.
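One way to see the leveling off is to look at the slope of the logistic curve. For a logistic model, the change in probability per one-unit change in a covariate is approximately the coefficient times p(1 - p), which is a standard property of the logistic function. The sketch below applies that to the age coefficient (0.128) at the two adjusted probabilities. The function name is mine.

```python
B_AGE = 0.128  # coefficient for mother's age from the adjusted model

def slope_per_year(p, b_age=B_AGE):
    """Approximate change in probability per extra year of age at probability p.

    Uses the derivative of the logistic curve: b * p * (1 - p).
    """
    return b_age * p * (1 - p)

slope_control = slope_per_year(0.467)     # near the middle of the S-curve
slope_treatment = slope_per_year(0.859)   # on the flatter upper part

print(f"control:   {slope_control:.4f} per year")
print(f"treatment: {slope_treatment:.4f} per year")
```

Near 47% a year of age moves the probability roughly twice as much as it does near 86%, which is why the control group shifts more.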

This page was written by Steve Simon and is licensed under the Creative Commons Attribution 3.0 United States License.