P.Mean: Logistic regression (created 2007-06-26).

The logistic regression model provides a framework for quantitative predictions of an outcome variable that is categorical, using one or more predictor variables. Articles are arranged by date with the most recent entries at the top. You can find outside resources at the bottom of this page.


18. P.Mean: Calculating predicted probabilities from a logistic regression model (created 2013-07-31). Suppose you run a logistic regression model and want to take the coefficients from that model and do something useful with them. In particular, you want to see what your logistic regression model might predict for the probability of your outcome at various levels of your independent variable. Here's how you would do it.


17. What is Fisher's Exact Test? (December 2011)

16. P.Mean: Should I report the univariate or the multivariate logistic regression analysis? (created 2011-09-07). Dear Professor Mean, I have results from four univariate logistic regression models and one multivariate logistic regression model with all four variables. In my univariate analysis, all the variables are significant. But, in multivariate analysis, only x1 is significant. Which results should I report?

15. P.Mean: What are the assumptions of logistic regression? (created 2011-09-01). Does anyone have a good reference for the assumptions of binary logistic regression? I have a client who has an anonymous reviewer who says his analysis doesn't meet one of the assumptions, but it doesn't make any sense in this situation, and I think the reviewer doesn't understand something.

14. The Monthly Mean: Why McNemar? Why not Chi-square or Fisher's exact? (May/June 2011)


13. P.Mean: Can an outcome with three levels be used in logistic regression? (created 2008-09-18). I had a quick question about logistic regression. Is this the appropriate test to use when your outcome variable has 3 levels? For example, we are looking at factors associated with obesity in children. Our outcome variable is BMI percentage and is classified as either normal, at risk, or overweight. I ran logistic regression on SAS and then realized this may not be the right test to run.

12. P.Mean: Another inquiry about slash and burn models (created 2008-08-20). In a binary logistic regression model, do all variables including the constant need to be significant before you can include them in the model or is it just the constant that has to be significant?
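As a rough illustration of the idea in entry 18 above (turning fitted coefficients into predicted probabilities), here is a minimal sketch in Python. It assumes a model with a single predictor, and the intercept and slope are hypothetical values chosen only for illustration; they do not come from any article listed here. The function name predicted_probability is likewise just a label for this sketch.

```python
import math


def predicted_probability(intercept, slope, x):
    """Convert a linear predictor (log odds) into a probability.

    The logistic model says log(p / (1 - p)) = intercept + slope * x,
    so solving for p gives p = 1 / (1 + exp(-(intercept + slope * x))).
    """
    log_odds = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients, used only to show the mechanics.
b0, b1 = -3.0, 0.5

# Predicted probability at several levels of the independent variable.
for x in (0, 2, 4, 6, 8, 10):
    p = predicted_probability(b0, b1, x)
    print(f"x = {x:2d}  predicted probability = {p:.3f}")
```

With a positive slope, the predicted probability rises as x increases, and it equals 0.5 exactly where the log odds are zero (here, at x = 6).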

Outside resources:

The Analysis of Binary Data Description: Cox and Snell's book is a nice introduction to two by two tables with some advanced topics like overdispersion. This book is for students who want more mathematical details.

Applied Logistic Regression Description: Hosmer and Lemeshow's book is the resource that everyone turns to when they need information about logistic regression. This book is for students who want more mathematical details.

Chris Corcoran, Louise Ryan, Pralay Senchaudhuri, et al. An Exact Trend Test for Correlated Binary Data. Biometrics. 2001;57(3):941-948. Abstract: "The problem of testing a dose-response relationship in the presence of exchangeably correlated binary data has been addressed using a variety of models. Most commonly used approaches are derived from likelihood or generalized estimating equations and rely on large-sample theory to justify their inferences. However, while earlier work has determined that these methods may perform poorly for small or sparse samples, there are few alternatives available to those faced with such data. We propose an exact trend test for exchangeably correlated binary data when groups of correlated observations are ordered. This exact approach is based on an exponential model derived by Molenberghs and Ryan (1999) and Ryan and Molenberghs (1999) and provides natural analogues to Fisher's exact test and the binomial trend test when the data are correlated. We use a graphical method with which one can efficiently compute the exact tail distribution and apply the test to two examples." [Accessed July 16, 2010]. Available at: http://dx.doi.org/10.1111/j.0006-341X.2001.00941.x.

Generalized Linear Models

Logistic Regression: A Self-Learning Text

Logistic Regression in Rare Events Data. Gary King, Langche Zeng. Excerpt: Rare events are binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros ("nonevents"). In many literatures, rare events have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of non-events (peace). This enables scholars to save as much as 99% of their (non-fixed) data collection costs, or to collect much more meaningful explanatory variables. This website was last verified on 2003-02-17. URL: www.gking.harvard.edu/preprints.shtml#0s

Modeling Frequency and Count Data

Regression models for prognostic prediction: advantages, problems, and suggested solutions. Harrell, Frank E Jr., Kerry L Lee, David B Matchar, Thomas A Reichert. Cancer Treatment Reports 1985: 69(10); 1071-77. [Medline]. Description: When you have many variables relative to the number of events in a logistic regression model, the traditional approach fares poorly. This article shows some alternative approaches.

A simulation study of the number of events per variable in logistic regression analysis. P. Peduzzi, J. Concato, E. Kemper, T. R. Holford, A. R. Feinstein. J Clin Epidemiol 1996: 49(12); 1373-9. Description: This article provides justification for the rule of thumb that you need 10 events per independent variable.

Statistical Methods for Rates and Proportions

Ian Campbell. Two-by-two Methods. Excerpt: "This page expands on the methods section published in the paper: Campbell Ian, 2007, Chi-squared and Fisher-Irwin tests of two-by-two tables with small sample recommendations, Statistics in Medicine, 26, 3661 - 3675." [Accessed June 14, 2010]. Available at: http://www.iancampbell.co.uk/twobytwo/methods.htm.

All of the material above this paragraph is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2017-06-15. The material below this paragraph links to my old website, StATS. Although I wrote all of the material listed below, my ex-employer, Children's Mercy Hospital, has claimed copyright ownership of this material. The brief excerpts shown here are included under the fair use provisions of U.S. Copyright laws.


11. Stats: Testing for an increasing trend in a proportion (November 26, 2007). Someone asked me how to see if a sequence of four proportions is showing a significant increase over time. The data represents the proportion of imaging studies that are requested by a primary care physician (pcp), as opposed to studies ordered by a specialist.

10. Stats: Interpretation of an odds ratio (March 21, 2007). Someone sent me some data on crime. In a sample of 2,957,239 people, 961 were criminals. 41 of the criminals were in the first group (who numbered 20,109). The remaining 920 were in the larger group (2,937,130). This person computed an odds ratio of 6.5 and wondered what it meant.

9. Stats: Differences between the Chi-square test, Fisher's Exact test, and logistic regression (January 9, 2007). I received an email from India (isn't the Internet wonderful?) that asked me to comment on the differences between a Chi-square test, Fisher's Exact test, and logistic regression. Let's take each of these in sequence.


8. Stats: Checking a Chi-square test (February 13, 2006). Someone preparing a critique of a research article wanted to check the accuracy of the statistics in that article. They noted that in a group of 37 patients without the intervention, only one was successful in avoiding a certain type of risky behavior. In a group with counseling, 7 out of 44 avoided the risky behavior.


7. Stats: Continuous variables in a logistic regression model (February 9, 2005). I got a question by email that asked, in a rather indirect way, how to interpret the odds ratio estimate for a continuous variable in a logistic regression model. It turns out that the odds ratio represents a change in the estimated odds of the outcome when the continuous variable increases by one unit.


6. Stats: Categorical variables in a logistic regression model (June 1, 2004). On April 8, I had written a brief description of interactions in a logistic regression model. This was a supplement to a discussion of the concepts behind the logistic regression model. Another important topic in that series of explanations is the interpretation of logistic regression coefficients for categorical variables.

5. Stats: Interactions in logistic regression (April 8, 2004). Someone asked me how to compute interactions in binary logistic regression. You need to be careful, since interactions are tricky to interpret.


4. Stats: The concepts behind the logistic regression model (July 23, 2002). The logistic regression model is a model that uses a binary (two possible values) outcome variable. Examples of a binary variable are mortality (live/dead), and morbidity (healthy/diseased). Sometimes you might take a continuous outcome and convert it into a binary outcome. For example, you might be interested in the length of stay in the hospital for mothers during an unremarkable delivery. A binary outcome might compare mothers who were discharged within 48 hours versus mothers who were discharged after more than 48 hours.

3. Stats: SPSS dialog boxes for logistic regression (July 22, 2002). This handout shows some of the dialog boxes that you are likely to encounter if you use logistic regression models in SPSS.


2. Stats: Fisher's Exact Test (August 23, 2000). Dear Professor Mean: What is Fisher's Exact Test and when should I use it?


1. Stats: Guidelines for logistic regression models (September 27, 1999). There are three steps in a typical logistic regression model: 1. Fit a crude model; 2. Fit an adjusted model; 3. Examine the predicted probabilities.
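The odds ratio in entry 10 above can be checked directly from the counts quoted there (41 of 20,109 in the first group and 920 of 2,937,130 in the second). The sketch below is only an arithmetic check, not the method used in the original article; the function names are made up for this illustration.

```python
def odds_ratio(events_a, total_a, events_b, total_b):
    """Odds of the event in group A divided by the odds in group B."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b


def risk_ratio(events_a, total_a, events_b, total_b):
    """Proportion with the event in group A divided by that in group B."""
    return (events_a / total_a) / (events_b / total_b)


# Counts taken from the excerpt in entry 10.
or_estimate = odds_ratio(41, 20_109, 920, 2_937_130)
rr_estimate = risk_ratio(41, 20_109, 920, 2_937_130)

print(f"odds ratio = {or_estimate:.2f}")
print(f"risk ratio = {rr_estimate:.2f}")
```

The odds ratio comes out at about 6.5, matching the value in the excerpt. Because the event is so rare in both groups, the risk ratio is nearly the same number.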
