Can you compute a confidence interval for your p-value? (created 2010-09-10).
A question that comes up from time to time is whether you can calculate a confidence interval for a p-value. It always gets statisticians into a tizzy because it seems like such a logical thing to do, yet no one does it. Here's how I like to think about the issue.
Any number computed from a sample has sampling error associated with it. A mean has sampling error, a standard deviation has sampling error, a correlation coefficient has sampling error, and so on. A p-value is computed from a sample so it has sampling error also. You could, if you wanted to, calculate a confidence interval for the p-value that accounted for this sampling error. I am unaware of the exact formula, but it can be done.
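To see this sampling error concretely, here is a small simulation sketch. It assumes a one-sample z-test with a known standard deviation of 1, a true mean of 0.3 and n = 30. These numbers are illustrative choices, not taken from anywhere in particular. Each simulated sample produces its own p-value. The spread between the 10th and 90th percentiles shows how far a p-value can move when the same study is repeated.

```python
import random
from statistics import NormalDist, quantiles

STD_NORMAL = NormalDist()

def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: mean = 0, assuming the population SD sigma is known."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / n ** 0.5)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def simulate_p_values(true_mean, n, reps, seed=1):
    """Draw `reps` independent samples of size n and return the p-value of each one."""
    rng = random.Random(seed)
    return [
        z_test_p_value([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(reps)
    ]

# Illustrative settings (assumed): true mean 0.3, SD 1, n = 30, 2000 replicate studies.
p_values = simulate_p_values(true_mean=0.3, n=30, reps=2000)
deciles = quantiles(p_values, n=10)   # nine cut points: 10th, 20th, ..., 90th percentiles
p10, p90 = deciles[0], deciles[-1]
print(f"10th percentile of p: {p10:.4f}   90th percentile of p: {p90:.4f}")
```

With these settings the 10th and 90th percentiles of the p-value fall on opposite sides of 0.05, so the same design, repeated, can easily give a "significant" result one time and a clearly non-significant one the next.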
Now the question becomes, why would you do it? A naive approach would be to reject the null hypothesis only if the entire confidence interval was below your alpha level. But this approach is conservative: it rejects less often than it should, so it does not preserve the Type I error rate. Comparing the p-value itself to alpha does preserve your Type I error rate. So producing a confidence interval would just encourage an unhelpful comparison.
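The simulation below contrasts the two rules under the null hypothesis. Since no formula for the interval is given here, the sketch uses one crude construction. It is assumed purely for illustration and is not a standard method. Treat the z statistic as an estimate with standard error 1, take z ± 1.96, and map both endpoints to p-values. The function `naive_p_interval` is a hypothetical helper, and the same known-SD z-test as above is assumed.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
ALPHA = 0.05

def p_from_z(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def naive_p_interval(z, level=0.95):
    """Hypothetical plug-in interval for the p-value (illustration only).

    Treats z as an estimate with standard error 1, forms z +/- z_crit,
    and maps that interval through p_from_z.
    """
    z_crit = STD_NORMAL.inv_cdf(0.5 + level / 2)
    lo, hi = z - z_crit, z + z_crit
    if lo <= 0 <= hi:
        upper_p = 1.0  # the interval includes "no effect", so p can reach 1
    else:
        upper_p = p_from_z(min(abs(lo), abs(hi)))  # point closest to zero
    lower_p = p_from_z(max(abs(lo), abs(hi)))      # point farthest from zero
    return lower_p, upper_p

def rejection_rates(n=20, reps=20000, seed=2):
    """Simulate under H0 (true mean 0, known SD 1) and compare two decision rules."""
    rng = random.Random(seed)
    plain = ci_rule = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * n ** 0.5
        if p_from_z(z) < ALPHA:
            plain += 1    # usual rule: compare the p-value itself to alpha
        if naive_p_interval(z)[1] < ALPHA:
            ci_rule += 1  # naive rule: the whole interval must lie below alpha
    return plain / reps, ci_rule / reps

plain_rate, ci_rate = rejection_rates()
print(f"p < alpha: {plain_rate:.4f}   entire interval < alpha: {ci_rate:.4f}")
```

Under the null hypothesis the plain rule rejects about 5% of the time, as it should. With this particular construction, the interval rule fires only when |z| exceeds roughly 3.92, so it rejects far less often than alpha.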
Sometimes a p-value is computed using simulation. For example, with moderately large samples, a permutation test may require too much computational effort to list all possible permutations. Instead, you could use Monte Carlo sampling to generate a random subset of all possible permutations. In this setting, a confidence interval for the p-value would tell you whether that random subset was large enough. A narrow interval would tell you that you did a fine job. A wide interval means that you were too stingy with the number of replications in your Monte Carlo simulation.
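Here is a minimal sketch of that idea. It assumes a two-sample permutation test on the difference in means and uses hypothetical data, kept small so it runs quickly. It treats the Monte Carlo p-value as a binomial proportion: the count of permutations at least as extreme as the observed one, out of the number drawn. The Wilson score interval is one reasonable choice for the confidence interval, not the only one.

```python
import random
from statistics import NormalDist, mean

def monte_carlo_permutation_p(x, y, n_perm, seed=3):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Returns (count, p_hat), where count is the number of random permutations
    whose |mean difference| is at least the observed one and p_hat = count / n_perm.
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    k = len(x)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # small tolerance so floating-point rounding does not drop ties
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed - 1e-12:
            count += 1
    return count, count / n_perm

def wilson_interval(count, n, level=0.95):
    """Wilson score interval for a binomial proportion count / n."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = count / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * (p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) ** 0.5
    return max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical data for illustration only.
group_a = [12.1, 14.3, 11.8, 15.2, 13.9, 12.7]
group_b = [13.5, 15.8, 14.9, 16.2, 14.1, 15.5]

results = {}
for n_perm in (200, 10_000):
    count, p_hat = monte_carlo_permutation_p(group_a, group_b, n_perm)
    lo, hi = wilson_interval(count, n_perm)
    results[n_perm] = (p_hat, lo, hi)
    print(f"{n_perm:>6} permutations: p = {p_hat:.4f}, 95% CI ({lo:.4f}, {hi:.4f})")
```

Going from 200 to 10,000 permutations narrows the interval roughly in proportion to one over the square root of the number of permutations. That is the sense in which a narrow interval says you drew enough of them.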
Here are two references that someone else provided in response to this question.
H M Hung, R T O'Neill, P Bauer, K Köhne. The behavior of the P-value when the alternative hypothesis is true. Biometrics. 1997;53(1):11-22. Abstract: "The P-value is a random variable derived from the distribution of the test statistic used to analyze a data set and to test a null hypothesis. Under the null hypothesis, the P-value based on a continuous test statistic has a uniform distribution over the interval [0, 1], regardless of the sample size of the experiment. In contrast, the distribution of the P-value under the alternative hypothesis is a function of both sample size and the true value or range of true values of the tested parameter. The characteristics, such as mean and percentiles, of the P-value distribution can give valuable insight into how the P-value behaves for a variety of parameter values and sample sizes. Potential applications of the P-value distribution under the alternative hypothesis to the design, analysis, and interpretation of results of clinical trials are considered." [Accessed September 10, 2010]. Available at: http://www.ncbi.nlm.nih.gov/pubmed/9147587.
Rafe M. J. Donahue. A Note on Information Seldom Reported via the P Value. The American Statistician. 1999;53(4):303-306. Abstract: "The P value, or significance of a statistical test, is often reported when said value is small, that is, less than .05, and is used to reject the null hypothesis. What appears to be less well-known or perhaps less often used, certainly within the realm of clinical trials and perhaps in other areas as well, however, is that the P value also carries information about particular alternative hypotheses since the distribution of the P value under an alternative hypothesis can be found easily. By way of a simple example, this additional piece of information is used to make inference concerning the alternative hypothesis when the null hypothesis is not rejected. Suggestions for decision making and reporting are then given." [Accessed September 10, 2010]. Available at: http://www.jstor.org/stable/2686048.