P.Mean: P-values, confidence intervals, and the Bayesian alternative (presented 2009-10-14)


Abstract: P-values and confidence intervals are the fundamental tools used in most inferential data analyses. They are possibly the most commonly reported statistics in the medical literature. Unfortunately, both p-values and confidence intervals are frequently misinterpreted. In this two-hour webinar, you will learn the proper interpretation of p-values and confidence intervals, and the common abuses and misconceptions about these statistics. You will also see a simple application of Bayesian analysis, which provides an alternative to p-values and confidence intervals.

In this seminar, you will learn how to:

distinguish between statistical significance and clinical significance;

explain the ethical issues associated with inadequate sample sizes.

Here's the outline of this talk:

Icebreaker

Pop quiz

Definitions

What is a confidence interval?

Practice exercises

Repeat of pop quiz

This talk is based largely on a training class that I offered at Children's Mercy Hospital, Stats #22: What Do All These Numbers Mean? Confidence Intervals and P-Values,

www.childrensmercy.org/stats/training/hand22.asp,

but also includes material from Chapter 6 of my book, Statistical Evidence in Medical Trials,

www.pmean.com/evidence.html,

and material from the lead article of the July/August issue of The Monthly Mean,

www.pmean.com/news/2009-08.html.

Icebreaker

In your job, you may have had to calculate a statistic of one sort or another. It might have been a simple statistic like a mean or a percentage, it might have been more complicated, like a correlation coefficient, or it might have been something very difficult, like a Poisson regression model with an overdispersion parameter. Tell us about the most complex statistic that you have ever computed, either by hand or using a computer. Include only those statistics that you have calculated outside of your university training. Don't include any statistic that someone else calculated for you. Here's a list sorted (more or less) by the complexity of the statistic.

percentage

mean

standard deviation

t-test

correlation coefficient

linear regression model

survival curve

logistic regression model

other

By the way, you won't need knowledge or familiarity with any of the above statistics to follow the content of this presentation. I am just trying to gauge the experience level of my audience.

Pop quiz

A research paper computes a confidence interval for a relative risk of 0.82 to 3.94. What does this confidence interval tell you?

The result is statistically significant and clinically important.

The result is not statistically significant, but is clinically important.

The result is statistically significant, but not clinically important.

The result is not statistically significant, and not clinically important.

The result is ambiguous.

I do not know the answer.

Definitions

What is a population? A population is a collection of items of interest in research. The population represents a group that you wish to generalize your research to. Populations are often defined in terms of demography, geography, occupation, time, care requirements, diagnosis, or some combination of the above. An example of a population would be all infants born in the state of Missouri during the 1995 calendar year who have one or more visits to the emergency room during their first year of life. Contrast this with the definition of a sample.

What is a sample? A sample is a subset of a population. A random sample is a subset where every item in the population has the same probability of being in the sample. Usually, the size of the sample is much less than the size of the population. The primary goal of much research is to use information collected from a sample to try to characterize a certain population. As such, you should pay a lot of attention to how representative the sample is of the population. If there are problems with representativeness, consider redefining your population a bit more narrowly. For example, a sample of 85 smokers between the ages of 13 and 18 in Rochester, Minnesota who respond to an advertisement about participation in a smoking cessation program might not be considered representative of the population of all teenage smokers, because the participants selected themselves. The sample might be more representative if we restrict our population to those teenage smokers who want to quit.
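A simple random sample, as defined above, can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 patient IDs and the function name simple_random_sample are made up for the example, and the sample size of 85 merely echoes the smoking cessation example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every item has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population of 1,000 patient IDs (illustrative only).
population = list(range(1, 1001))
sample = simple_random_sample(population, 85, seed=22)

# The sample is much smaller than the population and contains no repeats.
print(len(sample), len(set(sample)))
```

Note that a volunteer sample, like the teenagers who answered the advertisement, is not produced this way: the subjects choose themselves, so the equal-probability property no longer holds.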

What is a Type I Error? In your research, you specify a null hypothesis (typically labeled H0) and an alternative hypothesis (typically labeled Ha, or sometimes H1). By tradition, the null hypothesis corresponds to no change. When you are using statistics to decide between these two hypotheses, you have to allow for the possibility of error. Actually, if you are using any other procedure, you should still allow for the possibility of error, but we statisticians are the only ones honest enough to admit this. A Type I error is rejecting the null hypothesis when the null hypothesis is true. Example: Consider a new drug that we will put on the market if we can show that it is better than a placebo. In this context, H0 would represent the hypothesis that the average improvement (or perhaps the probability of improvement) among all patients taking the new drug is equal to the average improvement (probability of improvement) among all patients taking the placebo. A Type I error would be allowing an ineffective drug onto the market.
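If you want to see a Type I error rate in action, here is a small simulation sketch. It assumes normally distributed outcomes with a known standard deviation and uses a two-sided z-test at the 0.05 level. The function names, the group size of 20, and the 2,000 simulated trials are all hypothetical choices, not taken from any real study. Because the drug and the placebo are simulated with identical means, every rejection is a Type I error.

```python
import random
from statistics import NormalDist, mean

def z_test_rejects(drug, placebo, sigma, alpha=0.05):
    """Two-sided z-test for a difference in means, assuming sigma is known."""
    se = sigma * (1 / len(drug) + 1 / len(placebo)) ** 0.5
    z = (mean(drug) - mean(placebo)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def rejection_rate(true_difference, n_per_group, sigma=1.0, n_trials=2000, seed=1):
    """Fraction of simulated trials in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        drug = [rng.gauss(true_difference, sigma) for _ in range(n_per_group)]
        placebo = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        rejections += z_test_rejects(drug, placebo, sigma)
    return rejections / n_trials

# The null hypothesis is true here (no difference), so this estimates the Type I error rate.
type_1_rate = rejection_rate(true_difference=0.0, n_per_group=20)
print(f"Estimated Type I error rate: {type_1_rate:.3f}")
```

The estimate should land close to the 0.05 level chosen for the test, though it will wobble a bit from one seed to the next.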

What is a Type II Error? A Type II error is accepting the null hypothesis when the null hypothesis is false. You should always remember that it is impossible to prove a negative. Some statisticians will emphasize this fact by using the phrase "fail to reject the null hypothesis" in place of "accept the null hypothesis." The former phrase always strikes me as semantic overkill. Many studies have small sample sizes that make it difficult to reject the null hypothesis, even when there is a big change in the data. In these situations, a Type II error might be a possible explanation for the negative study results. Example: Consider a new drug that we will put on the market if we can show that it is better than a placebo. In this context, H0 would represent the hypothesis that the average improvement (or perhaps the probability of improvement) among all patients taking the new drug is equal to the average improvement (probability of improvement) among all patients taking the placebo. A Type II error would be keeping an effective drug off the market.
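The link between small sample sizes and Type II errors can be made concrete with a rough calculation. The sketch below assumes a two-sided z-test comparing two equal-sized groups with a known standard deviation. The function name type_2_error_rate and the numbers plugged into it (a true difference of half a standard deviation, groups of 10 and 100) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def type_2_error_rate(true_difference, n_per_group, sigma=1.0, alpha=0.05):
    """Probability of failing to reject H0 when the true difference is nonzero."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n_per_group) ** 0.5      # standard error of the difference in means
    shift = true_difference / se               # how far the test statistic is shifted
    power = std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)
    return 1 - power

# Same true effect, two different sample sizes (hypothetical values).
small_study = type_2_error_rate(true_difference=0.5, n_per_group=10)
large_study = type_2_error_rate(true_difference=0.5, n_per_group=100)
print(f"n=10 per group:  Type II error rate {small_study:.2f}")
print(f"n=100 per group: Type II error rate {large_study:.2f}")
```

Under these assumptions, the small study misses a real effect far more often than the large one, which is why a Type II error is a plausible explanation for a negative result from a small study.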

What is a confidence interval?

Dear Professor Mean: Can you give me a simple explanation of what a confidence interval is?

We statisticians have a habit of hedging our bets. We always insert qualifiers into our reports, warn about all sorts of assumptions, and never admit to anything more extreme than probable. There's a famous saying: "Statistics means never having to say you're certain."

We qualify our statements, of course, because we are always dealing with imperfect information. In particular, we are often asked to make statements about a population (a large group of subjects) using information from a sample (a small, but carefully selected subset of this population). No matter how carefully this sample is selected to be a fair and unbiased representation of the population, relying on information from a sample will always lead to some level of uncertainty.

Short Explanation

A confidence interval is a range of values that tries to quantify this uncertainty. Consider it as a range of plausible values. A narrow confidence interval implies high precision; we can specify plausible values to within a tiny range. A wide interval implies poor precision; we can only specify plausible values to a broad and uninformative range.

Consider a recent study of homoeopathic treatment of pain and swelling after oral surgery (Lokken 1995). When examining swelling 3 days after the operation, they showed that homoeopathy led to 1 mm less swelling on average. The 95% confidence interval, however, ranged from -5.5 to 7.5 mm. From what little I know about oral surgery, this appears to be a very wide interval. This interval implies that neither a large improvement due to homoeopathy nor a large decrement could be ruled out.

Generally when a confidence interval is very wide like this one, it is an indication of an inadequate sample size, an issue that the authors mention in the discussion section of this paper.
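To see how sample size drives the width of a confidence interval, here is a minimal sketch using the normal approximation (mean plus or minus z times the standard error). The function name and the inputs, a mean of 1 and a standard deviation of 15, are hypothetical and are not taken from the Lokken paper. For small samples a t-distribution would be the more usual choice.

```python
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sample_sd, n, level=0.95):
    """Normal-approximation interval: mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sample_sd / n ** 0.5
    return sample_mean - half_width, sample_mean + half_width

# Same hypothetical mean and standard deviation, increasing sample sizes.
for n in (25, 100, 400):
    low, high = mean_confidence_interval(1.0, 15.0, n)
    print(f"n={n:3d}: {low:6.2f} to {high:6.2f} (width {high - low:5.2f})")
```

Because the standard error shrinks with the square root of the sample size, you need four times as many subjects to cut the width of the interval in half.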

How to Interpret a Confidence Interval

When you see a confidence interval in a published medical report, you should look for two things. First, does the interval contain a value that implies no change or no effect? For example, with a confidence interval for a difference look to see whether that interval includes zero. With a confidence interval for a ratio, look to see whether that interval contains one.
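This first check is easy to write down as code. The sketch below is only an illustration; the function name contains_null and the second interval are made up, while the first interval is the swelling result quoted earlier.

```python
def contains_null(low, high, measure):
    """Return True if the interval includes the no-effect value."""
    # A difference of 0 or a ratio of 1 means no change.
    null_values = {"difference": 0.0, "ratio": 1.0}
    null = null_values[measure]
    return low <= null <= high

# Swelling difference from the homoeopathy example: -5.5 to 7.5 mm.
print(contains_null(-5.5, 7.5, "difference"))   # True: not statistically significant

# Hypothetical interval for a ratio, invented for illustration.
print(contains_null(1.2, 2.9, "ratio"))         # False: statistically significant
```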

Here's an example of a confidence interval that contains the null value. The interval shown below implies no statistically significant change.

Here's an example of a confidence interval that excludes the null value. If we assume that larger implies better, then the interval shown below would imply a statistically significant improvement.

Here's a different example of a confidence interval that excludes the null value. The interval shown below implies a statistically significant decline.

Practical Significance

You should also see whether the confidence interval lies partly or entirely within a range of clinical indifference. Clinical indifference represents values of such a trivial size that you would not want to change your current practice. For example, you would not recommend a special diet that showed a one year weight loss of only five pounds. You would not order a diagnostic test that had a predictive value of less than 50%.

Clinical indifference is a medical judgment, and not a statistical judgment. It depends on your knowledge of the range of possible treatments, their costs, and their side effects. As a statistician, I can only speculate on what a range of clinical indifference is. I do want to emphasize, however, that if a confidence interval is contained entirely within your range of clinical indifference, then you have clear and convincing evidence to keep doing things the same way (see below).

On the other hand, if part of the confidence interval lies outside the range of clinical indifference, then you should consider the possibility that the sample size is too small (see below).

Some studies have sample sizes that are so large that even trivial differences are declared statistically significant. If your confidence interval excludes the null value but still lies entirely within the range of clinical indifference, then you have a result with statistical significance, but no practical significance (see below).

Finally, if your confidence interval excludes the null value and lies outside the range of clinical indifference, then you have both statistical and practical significance (see below).
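The four situations described in this section can be collected into one small function. Treat this as a sketch rather than a rule: the function name, the wording of the labels, and the example intervals are hypothetical, and the range of clinical indifference (here plus or minus five pounds of weight change, loosely inspired by the diet example) is a medical judgment that you would have to supply yourself. The last branch covers a case the text does not discuss, an interval that excludes the null value but straddles the edge of the indifference range, and the label for it is an assumption of this sketch.

```python
def interpret_interval(low, high, null_value, indifference_low, indifference_high):
    """Combine the statistical check (null value) with the clinical check (indifference range)."""
    includes_null = low <= null_value <= high
    entirely_inside = indifference_low <= low and high <= indifference_high
    entirely_outside = high < indifference_low or low > indifference_high

    if includes_null:
        if entirely_inside:
            return "no meaningful effect: keep current practice"
        return "inconclusive: sample size may be too small"
    if entirely_inside:
        return "statistically significant, but not practically significant"
    if entirely_outside:
        return "statistically and practically significant"
    # Not covered in the text; labeling it this way is an assumption.
    return "statistically significant; practical significance uncertain"

# Hypothetical intervals for one-year weight loss in pounds (null = 0, indifference = -5 to 5).
for low, high in [(-2, 3), (-4, 12), (1, 4), (8, 15)]:
    print(f"{low:3d} to {high:3d}: {interpret_interval(low, high, 0, -5, 5)}")
```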

Practice exercises

Read the abstracts presented above. Interpret the confidence intervals presented in these abstracts.

Repeat of pop quiz

Review the pop quiz presented earlier. Do you feel more confident in your answers?


This work is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2017-06-15.