Interpreting p-values in a published abstract, part 1 (created 2010-04-14).

In one of my recent webinars, I asked people to read the following abstract and interpret the p-values presented within.

P = patients in a community hospital ICU is a reasonable choice. The intervention (actually an exposure) is I = experiencing an extubation failure. The comparison group is C = patients in a community hospital ICU who experienced a successful extubation, though strictly speaking the comparison is made within the single ICU population. So you could equally write P = patients experiencing an extubation failure and C = patients with successful extubation. There are several outcomes:

• total ICU length of stay after the initial extubation,
• total hospital length of stay after the initial extubation,
• ICU mortality,
• hospital mortality, and
• total hospital cost.

What do the p-values tell us?

The first set of p-values, which are only described as being large (P > 0.05), show that the patient groups were similar with respect to several important demographic variables. These are not outcome measures, but rather checks on whether the extubation failure group and the comparison group differed markedly on any important covariate. Thankfully, the answer to this is no.

The next two p-values are only described as being small (P < 0.01). These p-values are associated with ICU and hospital length of stay. This tells you that there is strong evidence against the null hypothesis that the average lengths of stay are the same for extubation failures and successful extubations. Looking at the summary statistics provided, we can conclude that extubation failures have longer lengths of stay. There is also support for higher ICU mortality among extubation failures (P < 0.05).

But there is insufficient evidence to support a higher hospital mortality rate. The odds ratio is 2.1, which suggests an increased risk, but the estimate is imprecise enough that the apparent increase could be due to sampling error alone. Note that this paper uses ambiguous mathematical notation throughout, but here it is especially confusing. The notation P < 0.15 could, in theory, represent a small p-value (0.02 is certainly less than 0.15). But having read enough of these abstracts, I know that P < 0.15 really means something like 0.10 < P < 0.15. It would be much better to present an exact p-value.

Total hospital costs have a small p-value (P < 0.01). Looking at the summary statistics, there is clear support for the hypothesis that total hospital costs are higher for extubation failure patients.

Now none of these results should be too surprising. If you can't get the tube out, the patient has to stay longer, pay more money, and (at least in the ICU) suffer a greater mortality risk. The value of this paper is not in proving the obvious, but rather in quantifying it. That quantification comes from the confidence intervals, which provide a solid measure of the true costs, both financial and physical, that a patient has to endure after an extubation failure.

So how would you interpret the confidence intervals? This was also a question I posed during the webinar.

The abstract does not provide a confidence interval for the ICU or hospital length of stay. Shame on them!

The confidence interval for the odds ratio of 12.1, associated with the risk of ICU mortality, goes from 1.5 to 101. Wow, that is a very wide interval. It does exclude 1, so there is a statistically significant increase in the risk of mortality in the ICU for patients with extubation failures. We can say further that we are confident that the increase in the odds of death is at least 50% (the lower limit of 1.5), even after allowing for sampling error.

The confidence interval for the odds ratio of 2.1, associated with the risk of hospital mortality, goes from 0.8 to 5.4. This interval includes the value of 1, so there is no statistically significant difference in hospital mortality between the two groups. The increase in hospital mortality, however, could plausibly be as large as fivefold. I'd complain loudly about such a wide interval in other contexts, but here I'm willing to give a pass. Mortality is not a primary endpoint, so the study was not designed to estimate mortality rates precisely.
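If you want a rough exact p-value when an abstract only reports an inequality, you can back one out of the confidence interval. The sketch below assumes the intervals are 95% Wald-type intervals computed on the log odds scale, which the abstract does not state, so treat the results as approximations. The function name `p_from_ratio_ci` is made up for this illustration.

```python
import math

def p_from_ratio_ci(estimate, lower, upper, z_crit=1.959964):
    """Approximate two-sided p-value for a ratio (odds ratio, relative risk)
    from its point estimate and 95% confidence limits.

    Assumption: the interval is symmetric on the log scale (a Wald interval).
    """
    # Standard error of the log ratio, recovered from the interval width.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    # Test statistic for the null hypothesis that the ratio equals 1.
    z = math.log(estimate) / se
    # Two-sided normal tail area: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

p_icu = p_from_ratio_ci(12.1, 1.5, 101)      # ICU mortality
p_hospital = p_from_ratio_ci(2.1, 0.8, 5.4)  # hospital mortality

print(f"ICU mortality:      p is about {p_icu:.2f}")
print(f"Hospital mortality: p is about {p_hospital:.2f}")
```

Under these assumptions the ICU mortality p-value comes out at about 0.02 and the hospital mortality p-value at about 0.13, which agrees with reading P < 0.05 and P < 0.15 as upper bounds that are not far from the actual values.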

As an aside, I should mention that in most research settings, mortality is notoriously difficult to estimate precisely, because you need a large number of events (deaths) in order to have reasonable power and precision. A rough rule of thumb is that you should try to get 25 to 50 events in each group. You have to read the full paper to find them, but the numbers of hospital deaths are 6 and 17 in the successful extubation and extubation failure groups, respectively. It looks like you'd need about a threefold or fourfold increase in sample size (3*17 is approximately 50, 4*6 is approximately 25) to be able to handle an event like hospital mortality successfully.
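The arithmetic behind that rule of thumb is easy to sketch. The code below assumes that the number of deaths grows in proportion to the sample size, which is a simplification, and the function name `scale_factor` is just an illustrative label.

```python
def scale_factor(observed_events, target_events):
    """How many times larger the sample would need to be to reach the target
    number of events, assuming events scale proportionally with sample size."""
    return target_events / observed_events

# Hospital deaths as quoted in the text above.
deaths_success = 6
deaths_failure = 17

# Lower end of the rule of thumb (25 events) for the smaller count,
# upper end (50 events) for the larger count.
factor_success = scale_factor(deaths_success, 25)
factor_failure = scale_factor(deaths_failure, 50)

print(f"Successful extubation: about {factor_success:.1f} times the sample size")
print(f"Extubation failure:    about {factor_failure:.1f} times the sample size")
```

The two factors land a little above 4 and a little below 3, which is where the "three or fourfold" figure comes from.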

Finally, the increase in total hospital costs of \$34,000 (I'm rounding here) has an associated confidence interval of \$23,000 to \$45,000. Since this interval excludes the value of zero, you can conclude that there is a statistically significant increase in costs for extubation failures. Even after allowing for sampling error, the increase in costs is at least \$23,000 and possibly as much as \$45,000.
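As a rough consistency check, you can see how far the cost interval sits from zero in standard error units. This sketch assumes a symmetric, normal-based 95% interval for the difference in means. Cost data are often skewed, and the abstract does not say how the interval was computed, so this is an approximation only. The function name `p_from_difference_ci` is hypothetical.

```python
import math

def p_from_difference_ci(estimate, lower, upper, z_crit=1.959964):
    """Approximate z statistic and two-sided p-value for a difference,
    from its point estimate and 95% confidence limits.

    Assumption: the interval is estimate +/- z_crit * standard error.
    """
    # Standard error recovered from the interval width.
    se = (upper - lower) / (2 * z_crit)
    # Test statistic for the null hypothesis of no difference.
    z = estimate / se
    # Two-sided normal tail area.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Rounded values from the text above (dollars).
z_cost, p_cost = p_from_difference_ci(34_000, 23_000, 45_000)

print(f"z is about {z_cost:.1f}")
print(f"p < 0.01: {p_cost < 0.01}")
```

With these rounded figures the estimate is roughly six standard errors away from zero, so the reported P < 0.01 is, if anything, a very conservative way to describe the result.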