P.Mean: Confidence intervals (created 2007-06-07).

A confidence interval provides a range of plausible values for the quantity being estimated, one that accounts for sampling error. Articles are arranged by date with the most recent entries at the top. Also see Clinical importance, Descriptive statistics, Hypothesis testing, and P-values. You can find outside resources further down this page.

Most of the new content will be added to my blog (blog.pmean.com).

Review all blog entries related to confidence intervals.


23. The Monthly Mean: Salvaging a negative confidence interval (July/August 2011)

22. The Monthly Mean: The rule of three (July 2010)

21. The Monthly Mean: A simple improvement to the binomial confidence interval (February 2009)

20. P.Mean: Can I salvage my negative confidence interval? (created 2011-09-02). I was involved in a small case-control project that was intended to explore some genotypes as predictors of disease progression. We had between 50 and 60 cases and between 50 and 60 controls. One particular predictor had an OR of 0.5 with 95% confidence limits of 0.2 and 1.2. We reported the negative results, but a long time ago I read some papers offering different interpretations of confidence intervals. If I remember right, there were statements like this: the true value is less likely to be 1 for an estimate such as 0.5 [0.2-1.2] than for one such as 0.8 [0.5-1.2], considering the proportion of the CI that lies far from 1. Even now, it sounds weird to me. Can I say something about this in my paper?

19. P.Mean: Calculating a confidence interval for a standard deviation (created 2011-07-10). Suppose that you had a sample of 80 observations and you computed a standard deviation of those 80 observations. Like any other statistic, the standard deviation will have some sampling error associated with it. But how much sampling error? This is an important question, because I needed to establish that 80 was a reasonable sample size in a study where there was no formal research hypothesis. You do this by showing that the key statistics you will estimate have good precision. In this study, the standard deviation was especially important, so I wanted to show that a standard deviation based on 80 observations had a good level of precision.


18. P.Mean: Does a wide confidence interval mean that my conclusions are all wrong? (created 2011-03-24). Dear Professor Mean, My confidence intervals are very wide. I do not know how to explain this. Does this mean that my results are likely to be wrong?

17. P.Mean: Standard error for an odds ratio (created 2009-08-12). I submitted an article to a journal that included some odds ratios and their confidence intervals. The journal editor said that their policy was to report standard errors and not confidence intervals. How do I do this for an odds ratio?


16. P.Mean: How do you compute a continuity correction for a confidence interval? (created 2008-09-12). I helped author a page on Wikipedia about confidence intervals for a binomial proportion and a question arose on the discussion page about applying a continuity correction.

15. P.Mean: Where did you get that formula for the confidence interval? (created 2008-09-09). I sent someone a confidence interval for a single proportion, and they asked how I computed it. That's a fair question. It turns out that I used a classic formula that everyone learns (and then forgets) in their basic Statistics class.
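Entry 22 above concerns the rule of three. In its usual form it says that if zero events occur in n independent trials, 3/n is an approximate one-sided 95% upper confidence limit for the event probability. The approximation comes from solving (1 - p)^n = 0.05, which gives the exact limit p = 1 - 0.05^(1/n), together with ln(0.05) ≈ -3. The short Python sketch below compares the two; the function names are hypothetical and are not taken from the newsletter.

```python
def rule_of_three_upper(n: int) -> float:
    """Approximate one-sided 95% upper limit for a proportion when 0 events occur in n trials."""
    return 3.0 / n


def exact_upper_zero_events(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided upper limit with 0 events: solve (1 - p)**n = alpha for p."""
    return 1.0 - alpha ** (1.0 / n)


# Compare the shortcut with the exact limit for a few sample sizes
for n in (15, 30, 100, 300):
    print(f"n = {n:3d}: 3/n = {rule_of_three_upper(n):.4f}, exact = {exact_upper_zero_events(n):.4f}")
```

The shortcut is slightly conservative and improves as n grows; for n = 15 the exact limit is about 0.18, against 3/15 = 0.20.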
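Entries 15, 16, and 21 above all concern an interval for a single binomial proportion. The sketch below, assuming a 95% level and using only Python's standard library, computes the classic Wald interval (presumably the textbook formula entry 15 refers to), the same interval widened by a 1/(2n) continuity correction (the topic of entry 16), and the Agresti-Coull "add about two successes and two failures" interval. Agresti-Coull is one well-known simple improvement; whether it is the one the February 2009 newsletter describes is an assumption. The counts (3 successes out of 20) and the function names are made up for illustration.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def wald(x, n, z=Z95):
    """Classic interval p-hat +/- z*sqrt(p-hat*(1 - p-hat)/n); deliberately not clipped to [0, 1]."""
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wald_cc(x, n, z=Z95):
    """Wald interval widened by a continuity correction of 1/(2n) on each side, clipped to [0, 1]."""
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n) + 1 / (2 * n)
    return max(0.0, p - half), min(1.0, p + half)


def agresti_coull(x, n, z=Z95):
    """Add z**2/2 successes and z**2/2 failures (about 2 each), then apply the Wald formula."""
    n_adj = n + z ** 2
    p_adj = (x + z ** 2 / 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)


for method in (wald, wald_cc, agresti_coull):
    lo, hi = method(3, 20)
    print(f"{method.__name__:14s} {lo:.3f} to {hi:.3f}")
```

With 3 successes in 20 trials the unclipped Wald lower limit is negative, one of the aberrations (limits outside [0, 1]) that Newcombe's 1998 comparison of seven methods, listed under Outside resources, flags.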
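Entry 19 above asks how much sampling error a standard deviation from 80 observations carries. The usual textbook interval, assuming the data are approximately normal, runs from s*sqrt((n-1)/χ²(1-α/2, n-1)) to s*sqrt((n-1)/χ²(α/2, n-1)). Python's standard library has no chi-square quantile function, so this sketch substitutes the Wilson-Hilferty approximation; with SciPy installed, scipy.stats.chi2.ppf would give exact quantiles instead. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile (good for moderate to large df)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def sd_ci(s, n, conf=0.95):
    """Chi-square CI for a population SD; assumes normally distributed data."""
    alpha = 1.0 - conf
    df = n - 1
    lower = s * math.sqrt(df / chi2_quantile_wh(1 - alpha / 2, df))
    upper = s * math.sqrt(df / chi2_quantile_wh(alpha / 2, df))
    return lower, upper


# With s = 1 the limits read directly as multiples of the observed SD
lo, hi = sd_ci(1.0, 80)
print(f"n = 80: the SD is estimated to within roughly {lo:.2f}*s to {hi:.2f}*s")
```

For n = 80 the limits come out at roughly 0.87 and 1.18 times the observed SD. This interval leans on normality much more heavily than the corresponding interval for a mean does.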
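Entry 17 above asks how to report a standard error for an odds ratio. Odds ratio intervals are usually built as log(OR) ± z*SE on the log scale, so the standard error that belongs with an odds ratio is the SE of log(OR), and it can be recovered from the reported limits as (ln U - ln L)/(2z). This sketch assumes the interval was computed that way (as Wald or logistic regression intervals are); exact or profile-likelihood intervals will match only approximately. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def log_or_se_from_ci(lower, upper, conf=0.95):
    """Standard error of log(OR), recovered from CI limits computed on the log scale."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Example: the OR of 0.5 (95% CI 0.2 to 1.2) quoted in entry 20
se = log_or_se_from_ci(0.2, 1.2)
print(f"log(OR) = {math.log(0.5):.3f}, SE of log(OR) = {se:.3f}")
```

A quick check that an interval was built on the log scale: the geometric mean of the limits, sqrt(0.2 × 1.2) ≈ 0.49, matches the reported 0.5 up to rounding.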

Outside resources:

Exact confidence interval for Poisson count. Tomas Aragon and Travis Porco. Accessed on 2002-11-27. "These functions calculate the exact confidence intervals for an observed count with a Poisson distribution. By default, ci.poisson calculates the 95% confidence interval for the observed count x. This function calls two other functions which use the bisection method to calculate the exact confidence interval: lci.poisson and uci.poisson." www.medepi.org/epitools/rfunctions/cipois.html

G. Guyatt, R. Jaeschke, N. Heddle, et al. Basic statistics for clinicians: 2. Interpreting study results: confidence intervals. CMAJ. 1995;152(2):169-173. Excerpt: "In the second of four articles, the authors discuss the "estimation" approach to interpreting study results. Whereas, in hypothesis testing, study results lead the reader to reject or accept a null hypothesis, in estimation the reader can assess whether a result is strong or weak, definitive or not. A confidence interval, based on the observed result and the size of the sample, is calculated. It provides a range of probabilities within which the true probability would lie 95% or 90% of the time, depending on the precision desired. It also provides a way of determining whether the sample is large enough to make the trial definitive. If the lower boundary of a confidence interval is above the threshold considered clinically significant, then the trial is positive and definitive, if the lower boundary is somewhat below the threshold, the trial is positive, but studies with larger samples are needed. Similarly, if the upper boundary of a confidence interval is below the threshold considered significant, the trial is negative and definitive. However, a negative result with a confidence interval that crosses the threshold means that trials with larger samples are needed to make a definitive determination of clinical importance. " [Accessed September 9, 2010]. Available at: http://www.cmaj.ca/cgi/content/abstract/152/2/169.

Journal article: Karim Hirji, Morten Fagerland. Calculating unreported confidence intervals for paired data. BMC Medical Research Methodology. 2011;11(1):66. Abstract: "BACKGROUND: Confidence intervals (or associated standard errors) facilitate assessment of the practical importance of the findings of a health study, and their incorporation into a meta-analysis. For paired design studies, these items are often not reported. Since the descriptive statistics for such studies are usually presented in the same way as for unpaired designs, direct computation of the standard error is not possible without additional information. METHODS: Elementary, well-known relationships between standard errors and p-values were used to develop computation schemes for paired mean difference, risk difference, risk ratio and odds ratio. RESULTS: Unreported confidence intervals for large sample paired binary and numeric data can be computed fairly accurately using simple methods provided the p-value is given. In the case of paired binary data, the design based 2x2 table can be reconstructed as well. CONCLUSIONS: Our results will facilitate appropriate interpretation of paired design studies, and their incorporation into meta-analyses." [Accessed on May 17, 2011]. Available at: http://www.biomedcentral.com/1471-2288/11/66

M Borenstein. The case for confidence intervals in controlled clinical trials. Control Clin Trials. 1994;15(5):411-428. Abstract: "A statistical wit once remarked that researchers often pose the wrong question and then proceed to answer that question incorrectly. The question that researchers intend to ask is whether or not a treatment effect is clinically significant. The question that is typically asked, however, is whether or not the treatment effect is statistically significant--a question that may be only marginally related to the issue of clinical impact. Similarly, the response, in the form of a p value, is typically assumed to reflect clinical significance but in fact reflects statistical significance. In an attempt to address this problem the medical literature over the past decade has been moving away from tests of significance and toward the use of confidence intervals. Concretely, study reports are moving away from "the difference was significant with a p value under 0.01" and toward "the one-year survival rate was increased by 20 percentage points with a 95% confidence interval of 15 to 24 percentage points." By focusing on what the effect is rather than on what the effect is not confidence intervals offer an appropriate framework for reporting the results of clinical trials. This paper offers a non-technical introduction to confidence intervals, shows how the confidence intervals framework offers advantages over hypothesis testing, and highlights some of the controversy that has developed around the application of this method. Additionally, we make the argument that studies which will be reported in terms of confidence intervals should similarly be planned with reference to confidence intervals. The sample size should be set to ensure that the estimates of effect size will be reported not only with adequate power but also with appropriate precision." [Accessed September 9, 2010]. Available at: http://www.ncbi.nlm.nih.gov/pubmed/8001360.

Gerard E. Dallal. Confidence Intervals. Excerpt: "Confidence Intervals are a way of taking data from a sample and saying something about the population from which the sample was drawn." [Accessed September 9, 2010]. Available at: http://www.jerrydallal.com/LHSP/ci.htm.

David A. Harville. Confidence Intervals and Sets for Linear Combinations of Fixed and Random Effects. Biometrics. 1976;32(2):403-407. Confidence regions are constructed for linear combinations of fixed effects and realized or sample values of random effects. These regions can be used in instances where the ratios of the variance components can be regarded as known. They have the prescribed long-run frequency of coverage when there is repeated sampling of the random effects as well as the residual effects. They have smaller expected volume than confidence regions obtained by proceeding as though the random effects are fixed effects. [Accessed September 9, 2010]. Available at: http://www.jstor.org/stable/2529507.

Gerard E. Dallal. Confidence Intervals Involving Logarithmically Transformed Data. Excerpt: "Researchers often transform data back to the original scale when a logarithmic transformation is applied to a set of data. Tables might include Geometric Means, which are the anti-logs of the mean of the logged data. When data are positively skewed, the geometric mean is invariably less than the arithmetic mean. This leads to questions of whether the geometric mean has any interpretation other than as the anti-log of the mean of the log transformed data." [Accessed September 9, 2010]. Available at: http://www.jerrydallal.com/LHSP/ci_logs.htm.

M J Gardner, D G Altman. Confidence intervals rather than P values: estimation rather than hypothesis testing. Br Med J (Clin Res Ed). 1986;292(6522):746-750. Abstract: "Overemphasis on hypothesis testing--and the use of P values to dichotomise significant or non-significant results--has detracted from more useful approaches to interpreting study results, such as estimation and confidence intervals. In medical studies investigators are usually interested in determining the size of difference of a measured outcome between groups, rather than a simple indication of whether or not it is statistically significant. Confidence intervals present a range of values, on the basis of the sample data, in which the population value for such a difference may lie. Some methods of calculating confidence intervals for means and differences between means are given, with similar information for proportions. The paper also gives suggestions for graphical display. Confidence intervals, if appropriate to the type of study, should be used for major findings in both the main text of a paper and its abstract." Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1339793/.

Rory Wolfe, James Hanley. If we're so different, why do we keep overlapping? When 1 plus 1 doesn't make 2. CMAJ. 2002;166(1):65-66. Excerpt: "In the last decade, guidelines for the presentation of statistical results in medical journals have emphasized confidence intervals (CIs) as an adjunct to, or even a replacement for, statistical tests and p values. Because of the intimate links between the 2 concepts, authors now use statements like "the 95% CI overlaps 0" where they would formerly have stated "the difference is not statistically significant at the 5% level." Although this interchangeability is technically correct in 1-sample situations, it does not carry over fully to comparisons involving 2 samples. A frequently encountered misconception is that if 2 independent 95% CIs overlap each other, as they do in Fig. 1, then a statistical test of the difference will not be statistically significant at the 5% level." [Accessed January 5, 2009]. Available at: http://www.cmaj.ca/cgi/content/full/166/1/65.

R G Newcombe. Improved confidence intervals for the difference between binomial proportions based on paired data. Stat Med. 1998;17(22):2635-2650. Existing methods for setting confidence intervals for the difference theta between binomial proportions based on paired data perform inadequately. The asymptotic method can produce limits outside the range of validity. The 'exact' conditional method can yield an interval which is effectively only one-sided. Both these methods also have poor coverage properties. Better methods are described, based on the profile likelihood obtained by conditionally maximizing the proportion of discordant pairs. A refinement (methods 5 and 6) which aligns 1-alpha with an aggregate of tail areas produces appropriate coverage properties. A computationally simpler method based on the score interval for the single proportion also performs well (method 10). [Accessed September 9, 2010]. Available at: http://citeseerx.ist.psu.edu/viewdoc/download?doi=

T Shakespeare, V Gebski, M Veness, J Simes. Improving interpretation of clinical studies by use of confidence levels, clinical significance curves, and risk-benefit contours. The Lancet. 2001;357(9265):1349-1353. Abstract: "The process of interpreting the results of clinical studies and translating them into clinical practice is being debated. Here we examine the role of p values and confidence intervals in clinical decision-making, and draw attention to confusion in their interpretation. To improve result reporting, we propose the use of confidence levels and plotting of clinical significance curves and risk-benefit contours. These curves and contours provide degrees of probability of both the potential benefit of treatment and the detriment due to toxicity. Additionally, they provide clinicians with a mechanism of translating the results of studies into treatment for individual patients, thus improving the clinical decision-making process. We illustrate the application of these curves and contours by reference to published studies. Confidence levels, clinical significance curves, and risk-benefit contours can be easily calculated with a hand calculator or standard statistical packages. We advocate their incorporation into the published results of clinical studies." [Accessed September 9, 2010]. Available at: http://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2800%2904522-0/abstract.

D A Savitz. Is statistical significance testing useful in interpreting data? Reprod. Toxicol. 1993;7(2):95-100. Abstract: "Although P values and statistical significance testing have become entrenched in the practice of biomedical research, their usefulness and drawbacks should be reconsidered, particularly in observational epidemiology. The central role for the null hypothesis, assuming an infinite number of replications, and the dichotomization of results as positive or negative are argued to be detrimental to the proper design and evaluation of research. As an alternative, confidence intervals for estimated parameters convey some information about random variation without several of these limitations. Elimination of statistical significance testing as a decision rule would encourage those who present and evaluate research to more comprehensively consider the methodologic features that may yield inaccurate results and shift the focus from the potential influence of random error to a broader consideration of possible reasons for erroneous results." [Accessed September 9, 2010]. Available at: http://www.ncbi.nlm.nih.gov/pubmed/8499671.

Joseph Beyene, Rahim Moineddin. Methods for confidence interval estimation of a ratio parameter with application to location quotients. BMC Medical Research Methodology. 2005;5(1):32. Abstract: "BACKGROUND: The location quotient (LQ) ratio, a measure designed to quantify and benchmark the degree of relative concentration of an activity in the analysis of area localization, has received considerable attention in the geographic and economics literature. This index can also naturally be applied in the context of population health to quantify and compare health outcomes across spatial domains. However, one commonly observed limitation of LQ is its widespread use as only a point estimate without an accompanying confidence interval. METHODS: In this paper we present statistical methods that can be used to construct confidence intervals for location quotients. The delta and Fieller's methods are generic approaches for a ratio parameter and the generalized linear modelling framework is a useful re-parameterization particularly helpful for generating profile-likelihood based confidence intervals for the location quotient. A simulation experiment is carried out to assess the performance of each of the analytic approaches and a health utilization data set is used for illustration. RESULTS: Both the simulation results as well as the findings from the empirical data show that the different analytical methods produce very similar confidence limits for location quotients. When incidence of outcome is not rare and sample sizes are large, the confidence limits are almost indistinguishable. The confidence limits from the generalized linear model approach might be preferable in small sample situations. CONCLUSION: LQ is a useful measure which allows quantification and comparison of health and other outcomes across defined geographical regions. It is a very simple index to compute and has a straightforward interpretation. Reporting this estimate with appropriate confidence limits using methods presented in this paper will make the measure particularly attractive for policy and decision makers." [Accessed September 9, 2010]. Available at: http://www.biomedcentral.com/1471-2288/5/32.

C C Hsieh. Note on interval estimation of the difference between proportions from correlated series. Stat Med. 1985;4(1):23-27. Abstract: "This paper presents a procedure for estimating the confidence interval of the difference between proportions in paired observations. As an extension of McNemar's test, this large sample interval estimation procedure uses a variance estimator obtained at the limit and is not conditional on the number of discordant pairs." [Accessed September 9, 2010]. Available at: http://www.ncbi.nlm.nih.gov/pubmed/3992071.

Nathaniel Schenker, Jane F Gentleman. On Judging the Significance of Differences by Examining the Overlap Between Confidence Intervals. The American Statistician. 2001;55(3):182-186. Abstract: "To judge whether the difference between two point estimates is statistically significant, data analysts often examine the overlap between the two associated confidence intervals. We compare this technique to the standard method of testing significance under the common assumptions of consistency, asymptotic normality, and asymptotic independence of the estimates. Rejection of the null hypothesis by the method of examining overlap implies rejection by the standard method, whereas failure to reject by the method of examining overlap does not imply failure to reject by the standard method. As a consequence, the method of examining overlap is more conservative (i.e., rejects the null hypothesis less often) than the standard method when the null hypothesis is true, and it mistakenly fails to reject the null hypothesis more frequently than does the standard method when the null hypothesis is false. Although the method of examining overlap is simple and especially convenient when lists or graphs of confidence intervals have been presented, we conclude that it should not be used for formal significance testing unless the data analyst is aware of its deficiencies and unless the information needed to carry out a more appropriate procedure is unavailable." [Accessed September 9, 2010]. Available at: http://dionysus.psych.wisc.edu/Lit/ToFile/Problems/Can%27t%20find/SchenkerN2001a.pdf.

Daniel R. Jeske, David A. Harville. Prediction-interval procedures and (fixed-effects) confidence-interval procedures for mixed linear models. Communications in Statistics - Theory and Methods. 1988;17(4):1053-87. Abstract: "A general approach is presented for devising an approximate 100(1-α)% prediction interval for an unobservable random variable w based on the value of an observable random vector y. It is assumed that E(w) and E(y) are linear combinations of unknown parameters β1, β2, ..., βp and that the joint distribution of w-E(w) and y-E(y) is symmetric and known up to the value of a vector θ of unknown parameters, as would be the case if y followed a mixed linear model (with normally distributed random effects and errors) and w were a linear combination of the model's fixed and random effects. Various implementations of the proposed approach are illustrated (and comparisons among them made) in the context of the Behrens-Fisher problem and the problem of making inferences about a group mean under a balanced one-way random-effects model." [Accessed September 9, 2010]. Available at: http://www.informaworld.com/smpp/content~db=all~content=a780056653.

R G Newcombe. Two-sided confidence intervals for the single proportion: comparison of seven methods. Stat Med. 1998;17(8):857-872. Abstract: "Simple interval estimate methods for proportions exhibit poor coverage and can produce evidently inappropriate intervals. Criteria appropriate to the evaluation of various proposed methods include: closeness of the achieved coverage probability to its nominal value; whether intervals are located too close to or too distant from the middle of the scale; expected interval width; avoidance of aberrations such as limits outside [0,1] or zero width intervals; and ease of use, whether by tables, software or formulae. Seven methods for the single proportion are evaluated on 96,000 parameter space points. Intervals based on tail areas and the simpler score methods are recommended for use. In each case, methods are available that aim to align either the minimum or the mean coverage with the nominal 1 -alpha." [Accessed September 9, 2010]. Available at: http://psychology.stanford.edu/~jlm/pdfs/Newcombe98SingleProp.pdf.

All of the material above this paragraph is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2017-06-15. The material below this paragraph links to my old website, StATS. Although I wrote all of the material listed below, my ex-employer, Children's Mercy Hospital, has claimed copyright ownership of this material. The brief excerpts shown here are included under the fair use provisions of U.S. Copyright laws.


14. Stats: Confidence interval for a rate (October 10, 2007). Dear Professor Mean, How do you calculate a confidence interval for a rate?

13. Stats: Is my confidence interval wide? (September 11, 2007). Dear Professor Mean, I have a case-control design. Among the cases, 271 were exposed and 317 were unexposed. Among the controls, 125 were exposed and 976 were unexposed. After adjustments for covariates, this produced an odds ratio of 7.4 with a 95% confidence interval of 5.7 to 9.5. Is this a wide interval?


12. Stats: Is a 10% shortfall in sample size critical? (October 23, 2006). Dear Professor Mean, I'm reviewing a paper where they did a power calculation based on 60 patients per group, but in the research study, they ended up with only 55 and 58 patients per group. Since their sample size was much less than what they originally planned for, does this mean that the study had inadequate power?

11. Stats: Is my confidence interval too wide? (September 21, 2006). Dear Professor Mean, Is there a rule of thumb to judge if a 95% CI is wide or narrow?

10. Stats: An exact confidence interval for a binomial proportion (August 18, 2006). A researcher came into my office this morning with some data that was strongly negative. Out of 15 patients, none showed a detectable improvement after the use of a controversial treatment. That sounds like a strong negative result to me, but a reviewer asked a legitimate question: How do you know that you are not having problems with a Type II error?


9. Stats: Confidence interval for a correlation coefficient (July 11, 2005). In many exploratory research studies, the goal is to examine associations among multiple demographic variables and some outcome variables. How can you justify the sample size for such an exploratory study? There are several approaches, but one simple way that I often use is to show that any correlation coefficients estimated by this research study will have reasonable precision. It may not be the most rigorous way to select a sample size, but it is convenient and easy to understand.

8. Stats: Examples of confidence intervals (June 28, 2005). The following abstracts, all from open access journals, provide good teaching examples of how confidence intervals are used in research publications.

7. Stats: Confidence intervals around a safety level (May 11, 2005). Someone asked me about an environmental clean up. The government told them that the location was cleaned up to a 90% confidence level of the standard. Would this give the residents an assurance that everything was safe? I don't have the background to answer all of this question, but I can comment on the Statistical aspects.

6. Stats: Where is the confidence interval? (March 31, 2005). A recent letter to the editor (Mathews M, Adetunji B, Mathews J, Basil B, George V, Mathews M, Budur K, Abraham S. Child Psychopharmacology, Effect Sizes, and the Big Bang. Am J Psychiatry 2005;162(4):818) complains about an article claiming that a drug, citalopram, can reduce depressive symptoms (Wagner KD, Robb AS, Findling RL, Jin J, Gutierrez MM, Heydorn WE. A randomized, placebo-controlled trial of citalopram for the treatment of major depression in children and adolescents. Am J Psychiatry 2004;161(6):1079-83). The letter writers dispute (among other things) the claim of a statistically and clinically significant reduction.


5. Stats: Confidence intervals (November 29, 2004). Dear Professor Mean:  Can you give me a simple explanation of what a confidence interval is?

4. Stats: Rates versus proportions (September 15, 2004). Many people use the words "rates" and "proportions" interchangeably, but there is an important distinction that I draw. A proportion represents a situation where the numerator and denominator both represent counts, and the numerator is a subset of the denominator. Rates represent a situation where the numerator is a count, but the denominator is in different units (such as the number of patient years of risk) or where the numerator is not a subset of the denominator (such as number of automobiles in a town divided by the number of adults living in that town).

3. Stats: Confidence intervals for proportions (July 8, 2004). One of the fellows at the hospital asked me about confidence intervals for proportions. I wrote a couple of simple spreadsheets to do these calculations. It's important to avoid comparing two separate confidence intervals to see if they overlap.


2. Stats: Why 95% confidence limits (May 6, 2002). Dear Professor Mean, I've been working with small data sets for some neuroimaging research that have five (5) treatment and five (5) control participants. It is not unusual to have such small samples in this kind of work. My 95% confidence interval (CI) included zero, yet the 85% confidence interval did not include zero. I know that the 95% CI is the common one and that other levels are sometimes used, but I don't know when to use them. Why do we use 95% confidence limits all the time? When is it appropriate to use other confidence levels, and what is the logic behind making such decisions?


1. Stats: Asymmetric confidence intervals (September 3, 1999). Dear Professor Mean, I found a journal article with a confidence interval that was asymmetric. For example, the authors reported a mortality difference of 5% and a 95% confidence interval of -1.2% to 12%. I can't understand how the CI can be unequally distributed if it uses the form ESTIMATE +/- 1.96*STANDARD ERROR.
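Entry 14 above asks for a confidence interval for a rate, and the Aragon and Porco resource listed earlier describes computing exact Poisson limits by bisection. The sketch below follows that general idea in standard-library Python: it finds the Poisson means at which the observed count falls in the upper or lower 2.5% tail, then divides by the exposure. It assumes the count is Poisson and the exposure (for example, patient-years) is known without error; the function names and the example data are hypothetical, not those of the epitools R functions.

```python
import math


def poisson_cdf(x, mu):
    """P(X <= x) for X ~ Poisson(mu); terms computed on the log scale to avoid underflow."""
    if x < 0:
        return 0.0
    if mu == 0:
        return 1.0
    log_mu = math.log(mu)
    return sum(math.exp(-mu + k * log_mu - math.lgamma(k + 1)) for k in range(x + 1))


def bisect(f, lo, hi, iters=100):
    """Bisection root finder; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_ci(x, alpha=0.05):
    """Exact two-sided CI for a Poisson mean, given an observed count x."""
    # Lower limit: the mean at which P(X >= x) = alpha/2 (zero when x = 0).
    lower = 0.0 if x == 0 else bisect(lambda mu: (1 - poisson_cdf(x - 1, mu)) - alpha / 2, 0.0, float(x))
    # Upper limit: the mean at which P(X <= x) = alpha/2; search well above x.
    search_max = x + 10 * math.sqrt(x + 1) + 10
    upper = bisect(lambda mu: poisson_cdf(x, mu) - alpha / 2, float(x), search_max)
    return lower, upper


def rate_ci(events, exposure, alpha=0.05):
    """Point estimate and CI for events per unit of exposure (exposure treated as fixed)."""
    lo, hi = exact_poisson_ci(events, alpha)
    return events / exposure, lo / exposure, hi / exposure


# Hypothetical data: 5 events observed over 200 patient-years
rate, lo, hi = rate_ci(5, 200)
print(f"rate = {rate:.4f} per patient-year, 95% CI {lo:.4f} to {hi:.4f}")
```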
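The counts in entry 13 above allow a crude, unadjusted check. The sketch below computes the odds ratio from the 2x2 table and a Woolf interval, which works on the log scale with SE = sqrt(1/a + 1/b + 1/c + 1/d). It cannot reproduce the adjusted 7.4 (5.7 to 9.5) because the covariates are not available here. Working on the log scale is also why ratio intervals are asymmetric: in entry 13, 5.7 sits 1.7 below 7.4 while 9.5 sits 2.1 above it. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, conf=0.95):
    """Crude odds ratio with a Woolf (log-scale Wald) confidence interval.

    a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Counts from entry 13 (crude, ignoring the covariate adjustment)
or_, lo, hi = odds_ratio_ci(271, 317, 125, 976)
print(f"crude OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

The crude result is roughly 6.7 (5.2 to 8.5), with a ratio of upper to lower limit similar to that of the adjusted interval.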
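For entry 10 above (0 improvements out of 15 patients), an exact interval shows how large a true improvement rate the data still allow. The sketch below computes the Clopper-Pearson ("exact") interval by bisection on the binomial cumulative distribution, using only the standard library; the function names are hypothetical and the article itself may have used tables or other software.

```python
import math


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))


def bisect(f, lo, hi, iters=100):
    """Bisection root finder; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided CI for a binomial proportion with x successes in n trials."""
    # Lower limit: p at which P(X >= x) = alpha/2 (zero when x = 0).
    lower = 0.0 if x == 0 else bisect(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper limit: p at which P(X <= x) = alpha/2 (one when x = n).
    upper = 1.0 if x == n else bisect(lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper


lo, hi = clopper_pearson(0, 15)
print(f"0 of 15: 95% CI {lo:.3f} to {hi:.3f}")
```

With 0 of 15 the upper limit is about 0.22, so the data remain consistent with a true improvement rate of up to roughly 22%, which is the kind of Type II error concern the reviewer raised.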
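Entry 9 above justifies a sample size by the precision of estimated correlations. A common way to get that precision is Fisher's z transformation: z = atanh(r) has standard error of about 1/sqrt(n - 3), and the limits are transformed back with tanh. The sketch below assumes roughly bivariate normal data; the correlation of 0.30 and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, conf=0.95):
    """CI for a Pearson correlation via Fisher's z transformation."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# How the interval around r = 0.30 narrows as the sample size grows
for n in (25, 50, 100, 200):
    lo, hi = correlation_ci(0.3, n)
    print(f"n = {n:3d}: r = 0.30, 95% CI {lo:.2f} to {hi:.2f}")
```

With n = 100, for example, an observed r of 0.30 gives roughly 0.11 to 0.47.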
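Entry 3 above warns against checking whether two separate confidence intervals overlap, a point also made by Wolfe and Hanley and by Schenker and Gentleman under Outside resources. A small numeric sketch, assuming two independent, approximately normal estimates with made-up values of 10 and 13 and standard errors of 1: the standard error of the difference is sqrt(se1² + se2²), which is smaller than se1 + se2, so the intervals can overlap while the difference is still statistically significant.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96


def ci(est, se):
    """Normal-theory 95% interval for a single estimate."""
    return est - Z * se, est + Z * se


def diff_p_value(est1, se1, est2, se2):
    """Two-sided p-value for the difference between two independent estimates."""
    z = (est2 - est1) / math.sqrt(se1 ** 2 + se2 ** 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


est1, se1 = 10.0, 1.0  # hypothetical estimate and standard error, group 1
est2, se2 = 13.0, 1.0  # hypothetical estimate and standard error, group 2
ci1, ci2 = ci(est1, se1), ci(est2, se2)
overlap = ci1[1] > ci2[0]
p = diff_p_value(est1, se1, est2, se2)
print(f"CI 1: {ci1[0]:.2f} to {ci1[1]:.2f}")
print(f"CI 2: {ci2[0]:.2f} to {ci2[1]:.2f}")
print(f"intervals overlap: {overlap}, p-value for the difference: {p:.3f}")
```

Here the intervals overlap (11.04 to 11.96 is shared), yet the test of the difference gives p ≈ 0.034.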



This work is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2017-06-15.