Sample size justification (created 2007-08-09).

These pages provide formulas and advice for justifying the sample size in a research study. Some of these pages describe the pragmatic and ethical concerns about sample size. Also see Category: Hypothesis testing, Category: Post hoc power, Category: Small sample size issues. I also have a blog, and you might want to look at my blog entries with the sample size tag.


51. P.Mean: How sample size calculations are reported in the literature (created 2012-02-23). I am preparing a webinar on sample size calculations and wanted to examine some examples in the published literature. There were lots of interesting examples in an open access journal called Trials. I only included a few examples in my webinar, but I wanted to save the examples I found here in case I want to expand the talk.

50. P.Mean: Is sample size justification really different for animal studies compared to human studies? (created 2012-01-06). Dear Professor Mean, I've spent my entire career (so far) in developing statistical analysis plans for human subjects research. Recently, a neuroscientist who performs experiments on rats asked me to assist in a power analysis. My conversation with him reminded me of that YouTube video (Biostatistics vs Lab Research): "I think I only need 3 subjects..." In his case, he seemed fixated on needing only 6 rats per group---which is what he had always done in the past. Are the rules for sample size justification different for animal studies than for human studies?


49. The Monthly Mean: Unrealistic scenarios for sample size calculations (December 2011)

48. The Monthly Mean: I want to calculate power, but I don't have a standard deviation for the formula (March/April 2011). Someone was asking for assistance on calculating power. A research agency was willing to lend some of its data for a secondary data analysis on a large data set (1,314 observations), but it asked for anyone requesting this data to demonstrate that their hypothesis had adequate power before sharing their data. There were publications based on this data, but using different endpoints, so the person could not get the standard deviation needed for the formula for power.


47. P.Mean: Three things you need for a power calculation (created 2001-11-08, revised 2011-04-26). Dear Professor Mean, I want to do research. Is forty subjects enough, or do I need more? -- Eager Edward

46. P.Mean: Quick sample size calculations (created 2001-10-11, revised 2011-04-26). Dear Professor Mean, I'm reading a research paper. I suspect that the sample size is way too small. I don't like the findings of the study anyway, so I'm hoping that you will help me discredit this study. Is there a quick sample size calculation that I can use? -- Cynical Chris

45. P.Mean: Fighting the claim that any size difference is clinically important (created 2010-08-05). When working with people to select an appropriate sample size, it is important to establish the minimum clinically important difference (MCID). This is a difference such that any value smaller would be clinically trivial, but any value larger would be clinically important. I get told quite often that any difference that might be detected is important. I could be flippant here and then tell them that their sample size is now infinite and my consulting rate is proportional to the sample size, but I don't make flippant comments (out loud, at least). Here's how I might challenge such a claim.

44. P.Mean: The futility of small sample sizes for evaluating a binary outcome (created 2010-06-16). I'm helping out with a project that involves a non-randomized comparison of two groups of patients. One group gets a particular anesthetic drug and the other group does not. The researcher wants to compare rates of hypotension, respiratory depression, apnea, and hypoxia. I suggested using continuous outcomes like O2 saturation levels rather than discrete events like hypoxia, but for a variety of reasons, they cannot use continuous outcomes. Their original goal was to collect data on about 20 patients in each group.

43. P.Mean: An interesting alternative to power calculations (created 2010-06-09). Someone on the MedStats Internet discussion group mentioned an alternative to power calculations called accuracy in parameter estimation (AIPE). It looks interesting. Here are some relevant references.

42. P.Mean: Minimum sample size needed to a time series prediction (created 2010-06-08). Someone asked what minimum sample size is needed for a time series model to forecast future observations. Strictly speaking, you can forecast with two observations. Draw a straight line connecting the two points and then extend that line as far as you want in the future. But you wouldn't want to do that. So a better question might be what is the minimum number of data points that you would need in order to provide a good forecast of the future.

41. P.Mean: Power calculations for comparison of Poisson counts across two groups (created 2010-01-11). Suppose you want to compare Poisson count variables across two groups. How much data would you need to collect? It's a tricky question and there are several approaches that you can consider.


40. The Monthly Mean: What size difference should you use in your power calculation? (December 2009)

39. P.Mean: Accounting for clusters in an individually randomized clinical trial (created 2009-10-13). I have a clinical trial with clusters (the clusters are medical practices), but unlike a cluster randomized trial, I am able to randomize within each cluster. From what I've read about this, I can provide an estimate for the Intraclass Correlation Coefficient (ICC) that will decrease my sample size. But I'm uncomfortable doing this. Can you help?

38. The Monthly Mean: Power for a three arm study (November 2009) and P.Mean: Power for a three arm experiment (created 2009-09-14). "I want to compute power for a three arm experiment. The outcome variable is binary (yes/no). I know how to compute power for a two-arm experiment already, but have no idea how to handle the third arm."

37. P.Mean: The first three steps in selecting an appropriate sample size (created 2009-07-20). I got an email last week from a client wanting to start a new research project looking at relationships between parenting beliefs and childhood behaviors. The description of the sorts of things to examine was quite elaborate, and it ended with the question "how many families would we need to have any significant differences if they exist?" Unfortunately, all the elaborate information provided did not include the information I would need to answer this question. Justifying a sample size usually involves three steps.


36. P.Mean: Example of power calculation for a repeated measures design (created 2008-10-19). I was asked how to calculate power for an interaction term in a repeated measures design. There were two groups (treatment and control), and subjects in each group were measured at four time points. The interaction involving the third time point was considered most critical.

35. P.Mean: Power calculations for repeated measures designs (created 2008-09-25). I've been struggling with a design/analysis question related to repeated measures design and power analysis. Can you help?

34. P.Mean: Source for sample size formula (created 2008-08-20). Hello, I am looking at your page on sample size calculation, and I'm curious as to where you got the equation shown there. I can't seem to find that exact form in Cohen's book, nor does it appear anywhere else that I've looked. Would you happen to know its original source?

33. P.Mean: Where did that standard deviation come from? (created 2008-07-09). Someone wanted some help with a power calculation. I gave the standard spiel that you need three things: a research hypothesis, an estimate of the standard deviation of your outcome measure, and the minimum clinically important difference. This was for a study looking at 10 exposed patients (recent spider bites) and 30 control patients. I got an article back in email very quickly, and while it was interesting to read, it wasn't quite what I needed.
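The three ingredients mentioned in entry 33 (a research hypothesis, a standard deviation, and a minimum clinically important difference) can be turned into an approximate power figure. The sketch below is not taken from any of the pages listed here. It assumes a two-sided comparison of two means and uses a normal approximation. The difference and standard deviation are hypothetical placeholders; only the group sizes (10 exposed, 30 control) come from the entry above.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(delta, sd, n1, n2, alpha=0.05):
    """Approximate power for a two-sided, two-sample comparison of means.

    Normal approximation; the tiny chance of rejecting in the wrong
    direction is ignored.
    """
    z = NormalDist()
    # Standard error of the difference in means with unequal group sizes.
    se = sd * sqrt(1 / n1 + 1 / n2)
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(delta) / se - z_crit)


# Hypothetical inputs: a difference of one standard deviation.
# The group sizes match the spider bite example (10 exposed, 30 controls).
example_power = power_two_means(delta=1.0, sd=1.0, n1=10, n2=30)
print(f"Approximate power: {example_power:.2f}")
```

Because this uses the normal distribution rather than the t distribution, it is somewhat optimistic for groups this small. A dedicated power program would be the safer choice for a real protocol.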

Outside resources:

Peter Bacchetti, Leslie E. Wolf, Mark R. Segal, Charles E. McCulloch. Bacchetti et al. Respond to "Ethics and Sample Size--Another View". Am. J. Epidemiol. 2005;161(2):113. Excerpt: "We thank Dr. Prentice (1) for taking the time to respond to our article (2). We explain here why we do not believe that he has provided a meaningful challenge to our argument. We see possible objections related to unappealing implications, use of power to measure value, implications for series of trials, how value per participant is calculated, and participants' altruistic satisfaction." [Accessed July 7, 2010]. Available at: http://aje.oxfordjournals.org.

John S. Uebersax. Bayesian Unconditional Power Analysis. Description: When you perform a traditional power calculation, you need to specify the size of the difference that you want to detect. Sometimes this represents the minimum difference that is clinically relevant and sometimes it is a difference that is observed in a previous research study. If the latter is chosen, you need to account for sampling error in the previously observed difference. Otherwise the estimated power is biased, often biased downward. This website was last verified on 2009-11-15. URL: http://www.john-uebersax.com/stat/bpower.htm.

David A. Schoenfeld. Considerations for a parallel trial where the outcome is a time to failure. Description: This web page calculates power for a survival analysis. You need to specify the accrual interval, the follow-up interval, and the median time to failure in the group with the smallest time to failure. Then also specify two of the following three items: power, total number of patients, and the minimal detectable hazard ratio. In an exponential model the last term is equivalent to the ratio of median survival times. [Accessed June 16, 2010]. Available at: http://hedwig.mgh.harvard.edu/sample_size/time_to_event/para_time.html.

Peter Bacchetti. Current sample size conventions: Flaws, harms, and alternatives. BMC Medicine. 2010;8(1):17. Abstract: "BACKGROUND: The belief remains widespread that medical research studies must have statistical power of at least 80% in order to be scientifically sound, and peer reviewers often question whether power is high enough. DISCUSSION: This requirement and the methods for meeting it have severe flaws. Notably, the true nature of how sample size influences a study's projected scientific or practical value precludes any meaningful blanket designation of <80% power as "inadequate". In addition, standard calculations are inherently unreliable, and focusing only on power neglects a completed study's most important results: estimates and confidence intervals. Current conventions harm the research process in many ways: promoting misinterpretation of completed studies, eroding scientific integrity, giving reviewers arbitrary power, inhibiting innovation, perverting ethical standards, wasting effort, and wasting money. Medical research would benefit from alternative approaches, including established value of information methods, simple choices based on cost or feasibility that have recently been justified, sensitivity analyses that examine a meaningful array of possible findings, and following previous analogous studies. To promote more rational approaches, research training should cover the issues presented here, peer reviewers should be extremely careful before raising issues of "inadequate" sample size, and reports of completed studies should not discuss power. SUMMARY: Common conventions and expectations concerning sample size are deeply flawed, cause serious harm to the research process, and should be replaced by more rational alternatives." [Accessed July 7, 2010]. Available at: http://www.biomedcentral.com/1741-7015/8/17.

Scott Aberegg, D Roxanne Richards, James O'Brien. Delta inflation: a bias in the design of randomized controlled trials in critical care medicine. Critical Care. 2010;14(2):R77. Abstract: "INTRODUCTION: Mortality is the most widely accepted outcome measure in randomized controlled trials of therapies for critically ill adults, but most of these trials fail to show a statistically significant mortality benefit. The reasons for this are unknown. METHODS: We searched five high impact journals (Annals of Internal Medicine, British Medical Journal, JAMA, The Lancet, New England Journal of Medicine) for randomized controlled trials comparing mortality of therapies for critically ill adults over a ten year period. We abstracted data on the statistical design and results of these trials to compare the predicted delta (delta; the effect size of the therapy compared to control expressed as an absolute mortality reduction) to the observed delta to determine if there is a systematic overestimation of predicted delta that might explain the high prevalence of negative results in these trials. RESULTS: We found 38 trials meeting our inclusion criteria. Only 5/38 (13.2%) of the trials provided justification for the predicted delta. The mean predicted delta among the 38 trials was 10.1% and the mean observed delta was 1.4% (P<0.0001), resulting in a delta-gap of 8.7%. In only 2/38 (5.3%) of the trials did the observed delta exceed the predicted delta and only 7/38 (18.4%) of the trials demonstrated statistically significant results in the hypothesized direction; these trials had smaller delta-gaps than the remainder of the trials (delta-gap 0.9% versus 10.5%; P<0.0001). For trials showing non-significant trends toward benefit greater than 3%, large increases in sample size (380% - 1100%) would be required if repeat trials use the observed delta from the index trial as the predicted delta for a follow-up study. 
CONCLUSIONS: Investigators of therapies for critical illness systematically overestimate treatment effect size (delta) during the design of randomized controlled trials. This bias, which we refer to as "delta inflation", is a potential reason that these trials have a high rate of negative results." [Accessed June 9, 2010]. Available at: http://ccforum.com/content/14/2/R77.

Peter Bacchetti, Leslie E. Wolf, Mark R. Segal, Charles E. McCulloch. Ethics and Sample Size. Am. J. Epidemiol. 2005;161(2):105-110. Abstract: "The belief is widespread that studies are unethical if their sample size is not large enough to ensure adequate power. The authors examine how sample size influences the balance that determines the ethical acceptability of a study: the balance between the burdens that participants accept and the clinical or scientific value that a study can be expected to produce. The average projected burden per participant remains constant as the sample size increases, but the projected study value does not increase as rapidly as the sample size if it is assumed to be proportional to power or inversely proportional to confidence interval width. This implies that the value per participant declines as the sample size increases and that smaller studies therefore have more favorable ratios of projected value to participant burden. The ethical treatment of study participants therefore does not require consideration of whether study power is less than the conventional goal of 80% or 90%. Lower power does not make a study unethical. The analysis addresses only ethical acceptability, not optimality; large studies may be desirable for other than ethical reasons." [Accessed July 7, 2010]. Available at: http://aje.oxfordjournals.org/cgi/content/abstract/161/2/105.

Johnston M, Hays R, Hui K. Evidence-based effect size estimation: An illustration using the case of acupuncture for cancer-related fatigue. BMC Complementary and Alternative Medicine. 2009;9(1):1. Available at: http://www.biomedcentral.com/1472-6882/9/1 [Accessed February 24, 2009].

Ross Prentice. Invited Commentary: Ethics and Sample Size--Another View. Am. J. Epidemiol. 2005;161(2):111-112. Excerpt: "In their article entitled, "Ethics and Sample Size," Bacchetti et al. (1) provide a spirited justification, based on ethical considerations, for the conduct of clinical trials that may have little potential to provide powerful tests of therapeutic or public health hypotheses. This perspective is somewhat surprising given the longstanding encouragement by clinical trialists and bioethicists in favor of large trials (2-4). Heretofore, the defenders of smaller trials have essentially argued only that small, underpowered trials need not be unethical if well conducted given their contribution to intervention effect estimation and their potential contribution to meta-analyses (5, 6). However, Bacchetti et al. evidently go further on the basis of certain risk-benefit considerations, and they conclude: "In general, ethics committees and others concerned with the protection of research subjects need not consider whether a study is too small.... Indeed, a more legitimate ethical issue regarding sample size is whether it is too large" (1, p. 108)." [Accessed July 7, 2010]. Available at: http://aje.oxfordjournals.org.

Wei-Jiun Lin, Huey-Miin Hsueh, James J. Chen. Power and sample size estimation in microarray studies. BMC Bioinformatics. 2010;11(1):48. Abstract: "BACKGROUND: Before conducting a microarray experiment, one important issue that needs to be determined is the number of arrays required in order to have adequate power to identify differentially expressed genes. This paper discusses some crucial issues in the problem formulation, parameter specifications, and approaches that are commonly proposed for sample size estimation in microarray experiments. Common methods for sample size estimation are formulated as the minimum sample size necessary to achieve a specified sensitivity (proportion of detected truly differentially expressed genes) on average at a specified false discovery rate (FDR) level and specified expected proportion (pi1) of the true differentially expression genes in the array. Unfortunately, the probability of detecting the specified sensitivity in such a formulation can be low. We formulate the sample size problem as the number of arrays needed to achieve a specified sensitivity with 95% probability at the specified significance level. A permutation method using a small pilot dataset to estimate sample size is proposed. This method accounts for correlation and effect size heterogeneity among genes. RESULTS: A sample size estimate based on the common formulation, to achieve the desired sensitivity on average, can be calculated using a univariate method without taking the correlation among genes into consideration. This formulation of sample size problem is inadequate because the probability of detecting the specified sensitivity can be lower than 50%. On the other hand, the needed sample size calculated by the proposed permutation method will ensure detecting at least the desired sensitivity with 95% probability. The method is shown to perform well for a real example dataset using a small pilot dataset with 4-6 samples per group. 
CONCLUSIONS: We recommend that the sample size problem should be formulated to detect a specified proportion of differentially expressed genes with 95% probability. This formulation ensures finding the desired proportion of true positives with high probability. The proposed permutation method takes the correlation structure and effect size heterogeneity into consideration and works well using only a small pilot dataset." [Accessed February 1, 2010]. Available at: http://www.biomedcentral.com/1471-2105/11/48.

K Akazawa, T Nakamura, Y Palesch. Power of logrank test and Cox regression model in clinical trials with heterogeneous samples. Stat Med. 1997;16(5):583-597. Abstract: "This paper evaluates the loss of power of the simple and stratified logrank tests due to heterogeneity of patients in clinical trials and proposes a flexible and efficient method of estimating treatment effects adjusting for prognostic factors. The results of the paper are based on the analyses of survival data from a large clinical trial which includes more than 6000 cancer patients. Major findings from the simulation study on power are: (i) for a heterogeneous sample, such as advanced cancer patients, a simple logrank test can yield misleading results and should not be used; (ii) the stratified logrank test may suffer some power loss when many prognostic factors need to be considered and the number of patients within stratum is small. To address the problems due to heterogeneity, the Cox regression method with a special hazard model is recommended. We illustrate the method using data from a gastric cancer clinical trial." [Accessed June 16, 2010]. Available at: http://www3.interscience.wiley.com/journal/9725/abstract.

Demidenko E. Power/Sample Size Calculation for Logistic Regression with Binary Covariate(s). Available at: http://www.dartmouth.edu/~eugened/power-samplesize.php [Accessed April 9, 2010].

Ian C. McKay. The Philosophy of Statistical Power Analysis. Excerpt: The arguments for using some kind of power analysis are based on very practical considerations and sometimes ethical considerations too. It is clearly not desirable to invest a lot of time, effort and expense on a scientific study that has no reasonable prospect of yielding any conclusions. A double-blind clinical trial of a polio vaccine comes to mind. The outcome was measured by comparing the incidence of polio among the vaccinated and control groups. None of the vaccinated volunteers caught polio in the course of the study, but neither did any of the control group. No conclusion could be drawn about the efficacy of the vaccine, and it became evident that a lot of volunteers had been needlessly inconvenienced and possibly put at some risk of side-effects. Particularly damning was the fact that an inconclusive outcome could easily have been predicted from a knowledge of the current incidence of polio, and so the costs and risks could have been avoided. Another mistake that can be prevented by power analysis is the wasteful collection of more experimental data than are needed. If you have good prospects of being able convincingly to demonstrate the effectiveness of a drug using 100 volunteers, then it is arguably wasteful and unethical to use 200. The above arguments are clear enough and will probably convince most people. But there are other aspects of power analysis that are much more debatable. URL: www.discourses.org.uk/statistics/power3.htm

Frank E Harrell Jr, Kerry L Lee, Robert M Califf, David B Pryor, Robert A Rosati. Regression modelling strategies for improved prognostic prediction. Statistics in Medicine 1984: 3; 143-152. Description: This article uses a simulation study of stepwise logistic regression to demonstrate that it performs poorly when the ratio of events to candidate independent variables is less than 10 to 1.

S. J. Walters. Sample size and power estimation for studies with health related quality of life outcomes: a comparison of four methods using the SF-36. Health Qual Life Outcomes 2004: 2; 26. Description: This article proposes three formulas for estimating sample size as well as a bootstrap method and then compares their performance using a quality of life outcome, SF-36.

Peter Bacchetti, Jacqueline Leung. Sample Size Calculations in Clinical Research : Anesthesiology. Anesthesiology. 2002;97(4):1028-1029. Excerpt: "We write to make the case that the practice of providing a priori sample size calculations, recently endorsed in an Anesthesiology editorial, is in fact undesirable. Presentation of confidence intervals serves the same purpose, but is superior because it more accurately reflects the actual data, is simpler to present, addresses uncertainty more directly, and encourages more careful interpretation of results." [Accessed July 7, 2010]. Available at: http://journals.lww.com/anesthesiology/Fulltext/2002/10000/Sample_Size_Calculations_in_Clinical_Research.50.aspx.

Kevin L. Delucchi. Sample Size Estimation in Research With Dependent Measures and Dichotomous Outcomes. Am J Public Health. 2004;94(3):372-377. Abstract: "I reviewed sample estimation methods for research designs involving nonindependent data and a dichotomous response variable to examine the importance of proper sample size estimation and the need to align methods of sample size estimation with planned methods of statistical analysis. Examples and references to published literature are provided in this article. When the method of sample size estimation is not in concert with the method of planned analysis, poor estimates may result. The effects of multiple measures over time also need to be considered. Proper sample size estimation is often overlooked. Alignment of the sample size estimation method with the planned analysis method, especially in studies involving nonindependent data, will produce appropriate estimates." Available at: http://ajph.aphapublications.org/cgi/content/full/94/3/372.

Carley S, Dosman S, Jones SR, Harrison M. Simple nomograms to calculate sample size in diagnostic studies. Emerg Med J. 2005;22(3):180-181. Abstract: Objectives: To produce an easily understood and accessible tool for use by researchers in diagnostic studies. Diagnostic studies should have sample size calculations performed, but in practice, they are performed infrequently. This may be due to a reluctance on the part of researchers to use mathematical formulae. Methods: Using a spreadsheet, we derived nomograms for calculating the number of patients required to determine the precision of a test's sensitivity or specificity. Results: The nomograms could be easily used to determine the sensitivity and specificity of a test. Conclusions: In addition to being easy to use, the nomogram allows deduction of a missing parameter (number of patients, confidence intervals, prevalence, or sensitivity/specificity) if the other three are known. The nomogram can also be used retrospectively by the reader of published research as a rough estimating tool for sample size calculations. [Accessed November 16, 2009]. Available at: http://emj.bmj.com/cgi/content/abstract/22/3/180

Arnold BF, Hogan DR, Colford JM, Hubbard AE. Simulation methods to estimate design power: an overview for applied research. BMC Medical Research Methodology. 2011;11(1):94. doi:10.1186/1471-2288-11-94. Abstract: Estimating the required sample size and statistical power for a study is an integral part of study design. For standard designs, power equations provide an efficient solution to the problem, but they are unavailable for many complex study designs that arise in practice. For such complex study designs, computer simulation is a useful alternative for estimating study power. Although this approach is well known among statisticians, in our experience many epidemiologists and social scientists are unfamiliar with the technique. This article aims to address this knowledge gap. Available at: http://www.biomedcentral.com/1471-2288/11/94

Lenth RV. Some Practical Guidelines for Effective Sample Size Determination. The American Statistician 2001 (August), 55(3); 187-193. Description: This article offers some practical suggestions on how to elicit an effect size and find the right standard deviation. It explains what to do if budget limitations restrict your sample size and criticizes the use of standardized effect sizes and post hoc power.

Steve Shiboski. Table of Calculators for Survival Outcomes. Description: This webpage highlights several different programs for power calculations for survival analysis. It includes a Java applet by Marc Bacsafra and SAS macros by Joanna Shih. [Accessed June 16, 2010]. Available at: http://cct.jhsph.edu/javamarc/index.htm.
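Several of the survival resources above express power in terms of a hazard ratio. As a rough cross-check on such calculators, here is a minimal sketch of the events-based approximation commonly attributed to Schoenfeld, assuming proportional hazards and a two-sided log-rank test. The function names are illustrative only, and nothing here implies that the web pages listed above use exactly this formula. The helper for medians relies on the exponential assumption noted in the Schoenfeld entry.

```python
from math import ceil, log
from statistics import NormalDist


def hazard_ratio_from_medians(median_control, median_treatment):
    """Hazard ratio (treatment vs control) under exponential survival.

    With exponential survival the hazard is ln(2) / median, so the
    hazard ratio is the inverse ratio of the medians.
    """
    return median_control / median_treatment


def events_needed(hazard_ratio, alpha=0.05, power=0.80, alloc=0.5):
    """Approximate total number of events for a two-sided log-rank test.

    alloc is the proportion of patients randomized to one of the arms.
    """
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and different from 1")
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(z_sum ** 2 / (alloc * (1 - alloc) * log(hazard_ratio) ** 2))


# Hypothetical inputs: median survival doubles from 6 to 12 months.
hr = hazard_ratio_from_medians(median_control=6, median_treatment=12)
print(hr, events_needed(hr))
```

Note that this gives the number of events, not the number of patients. Converting events to patients requires the accrual and follow-up intervals, which is what the calculators above ask for.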

All of the material above this paragraph is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2017-06-15. The material below this paragraph links to my old website, StATS. Although I wrote all of the material listed below, my ex-employer, Children's Mercy Hospital, has claimed copyright ownership of this material. The brief excerpts shown here are included under the fair use provisions of U.S. Copyright laws.


32. Stats: Too much power and precision? (January 9, 2008). There was a discussion on EDSTAT-L about studies with too much power and precision. You can indeed have too much power/precision, and here is a pragmatic example.


31. Stats: Justifying the sample size for a microarray study (August 9, 2007). I'm helping out with a grant proposal that is using microarrays for part of the analysis. A microarray is a system for quantitative measurement of circulating mRNA in human, animal, or plant tissue. A microarray will typically measure thousands or tens of thousands of different mRNA sequences. An important issue for this particular grant (and many grants involving microarray data) is how to justify the sample size. Here are a few references that I will use to develop such a justification.

30. Stats: What is an adequate sample size for establishing validity and reliability? (April 9, 2007). Someone from Mumbai, India wrote in asking whether a sample of 163 was sufficiently large for a study of reliability and validity. This was for a project that was already done, and this person was worried that someone would complain that 163 is too small.

29. Stats: IRB review of a pilot study (March 26, 2007). Dear Professor Mean: I am the new chair of the IRB at a county hospital. Many of the studies we review are pilot studies with small samples. I have been trying to locate criteria for the scientific review of pilot studies, but have not found a consensus in the literature that I have seen. Is a pilot study merely a "dry run" of the procedures that will be used in a later, larger-scale study? Or, is it reasonable for the IRB to demand that the investigator provide specific criteria for determining whether the pilot has been a success? And, should the IRB furthermore demand that specific hypotheses be formulated? My impression is that many investigators declare their studies to be pilots in order to avoid more rigorous scrutiny of their proposals.

28. Stats: Do your own power and sample size calculations (January 30, 2007). Someone asked me for some power calculations and the problem was stated very tersely and completely: "Alpha .05, Power 0.8. What is sample size to detect an outcome difference of 20% vs 30% for an adverse event. Thank you." Usually people have difficulty in elaborating the conditions of the power or sample size calculation, and I am always glad to help with that process. But if you already know the conditions, you can find very nice web sites that will do power calculations for you.

27. Stats: Variable cluster sizes and their impact on sample size calculations (January 3, 2007). A recently published article in the International Journal of Epidemiology discusses sample size requirements for cluster randomized trials when the size of the cluster itself varies. The authors develop an approximation that uses the coefficient of variation (CV) of the distribution of cluster sizes.


26. Stats: Be sure to account for dropouts in your sample size calculation (December 29, 2006). I helped out a colleague with an NIH grant, and when the critique came back, it mentioned two issues that I should have been aware of. First, they pointed out the need for an intention-to-treat analysis strategy. Second, they noted the long duration of the study, with a full year of evaluations on any particular patient, and seemed bothered that we presumed that 100% of the patients would complete the full study.

25. Stats: Is a 10% shortfall in sample size critical? (October 23, 2006). Dear Professor Mean, I'm reviewing a paper where they did a power calculation based on 60 patients per group, but in the research study, they ended up only getting 55/58 per group. Since their sample size was much less than what they originally planned for, does this mean that the study had inadequate power?

24. Stats: R libraries for sample size justification (July 28, 2006). There are a lot of good commercial and free sources for sample size justification. Note that most people use the term power calculation, but there is more than one way to justify a sample size, so I try to avoid the term "power calculation" as being too restrictive. Anyway, I just noted an email on the MedStats list that suggests two R libraries.

23. Stats: How many charts should I pull? (March 30, 2006). I got a question from someone doing a quality review. She needs to pull a certain number of medical records out of 892 and see whether the doctors followed the clinical guidelines properly. The question is how to determine the proper number of charts to pull.


22. Stats: Sample size for a binomial confidence interval (October 3, 2005). Someone asked me for some help with a homework question. I hesitate to offer too much advice in these situations because I don't want to disrupt the teacher's efforts to get the students to think on their own.

21. Stats: Sample size for a binary endpoint (August 12, 2005). Someone sent me an email asking for the sample size needed to detect a 10% shift in the probability of recurrence of an event after one of two different surgical procedures is done.

20. Stats: Confidence interval for a correlation coefficient (July 11, 2005). In many exploratory research studies, the goal is to examine associations among multiple demographic variables and some outcome variables. How can you justify the sample size for such an exploratory study? There are several approaches, but one simple way that I often use is to show that any correlation coefficients estimated by this research study will have reasonable precision. It may not be the most rigorous way to select a sample size, but it is convenient and easy to understand.
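A minimal sketch of this precision argument in Python, assuming the usual Fisher z transformation for the confidence interval. The anticipated correlation of 0.3 and the candidate sample sizes are placeholders I made up for illustration.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def correlation_ci(r, n, conf=0.95):
    """Approximate confidence interval for a correlation (Fisher z method)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    z = atanh(r)            # Fisher transformation of r
    se = 1 / sqrt(n - 3)    # approximate standard error on the z scale
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# How precise would an observed correlation of 0.3 be at various sample sizes?
for n in (30, 50, 100, 200):
    lo, hi = correlation_ci(0.3, n)
    print(n, round(lo, 2), round(hi, 2))
```

You would then pick the smallest sample size where the interval is narrow enough for your purposes.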

19. Stats: Sample size calculation for a nonparametric test (March 8, 2005). I got an email inquiry about how to calculate power for a Wilcoxon signed ranks test. I don't have a perfect reference for this, but I do have a brief discussion on sample size calculations for the Mann Whitney U test as part of my pages on selecting an appropriate sample size. The same considerations would apply for the Wilcoxon test.


18. Stats: Unequal sample sizes (November 24, 2004). For some reason, it seems to unnerve people when the sample sizes in the treatment and control groups are not the same. They worry about whether the tests that they would run on the data would be valid or not. As a general rule, there is no reason that you cannot analyze data with unequal sample sizes.

17. Stats: Ratio of observations to independent variables (November 17, 2004). A widely quoted rule is that you need 10 or 15 observations per independent variable in a regression model. The original source of this rule of thumb is difficult to find. I briefly commented on this in an earlier weblog entry, but here is a more complete elaboration.

16. Stats: Sample size for an ordinal outcome (September 22, 2004). Someone asked me for some help with calculating an appropriate sample size for a study comparing two treatments, where the outcome measure is ordinal (degree of skin toxicity: none, slight, moderate, severe). It turns out that an excellent discussion of the various approaches appears in a recent journal article with full free text on the web.

15. Stats: Sample size calculations in studies with a baseline (July 23, 2004). Many research studies evaluate all patients at baseline and then randomly assign the patients to groups, conduct the interventions, and then re-evaluate them at the end of the study. The sample size calculations for this type of study are a bit tricky.

14. Stats: Sample size for a diagnostic test (July 5, 2004). Someone asked me how to determine the sample size for a study involving a diagnostic test. It seems like a tricky thing, because most studies of diagnostic tests don't have a formal hypothesis. What you need to do instead is to specify a particular statistic that you are interested in estimating and then select a sample size so that the confidence interval for this estimate is reasonably precise.

13. Stats: Sample size for cluster randomized trials (May 27, 2004). One of my favorite people to work with, Vidya Sharma, was asking me how to compute the sample size in a cluster randomized trial. I had started to write a web page about this, but never found the time to finish it. A cluster randomized trial selects several large groups of patients and then randomly assigns a treatment to all of the patients within a group. A cluster randomized trial requires a larger sample size than a simple randomized trial.
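The usual way to quantify the extra sample size is the design effect, 1 + (m - 1) x ICC, where m is the cluster size and ICC is the intraclass correlation. Here is a minimal sketch in Python, assuming equal cluster sizes. The numbers (64 patients per arm under simple randomization, clusters of 20, ICC of 0.05) are invented for illustration.

```python
from math import ceil


def design_effect(cluster_size, icc):
    """Design effect for a cluster randomized trial with equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc


def inflate_for_clustering(n_simple, cluster_size, icc):
    """Return (patients per arm, clusters per arm) after inflating n_simple."""
    n_clustered = ceil(n_simple * design_effect(cluster_size, icc))
    return n_clustered, ceil(n_clustered / cluster_size)


# Hypothetical inputs: 64 per arm without clustering, 20 per cluster, ICC = 0.05.
print(design_effect(20, 0.05))
print(inflate_for_clustering(64, 20, 0.05))  # (125, 7)
```

Even a small intraclass correlation nearly doubles the sample size here, because the inflation grows with the cluster size.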

12. Stats: Sample size calculation example (May 20, 2004). I received a question from Hong Kong about how to double check a power calculation in a paper by Tugwell et al. in the 1995 NEJM. In the paper, they state that "With the tender-joint count used as the primary outcome, a sample of 75 patients per group was needed in order to have a 5 percent probability of a Type I error and a power of 80 percent to detect a difference of 5 tender joints between groups, with a standard deviation of 9.5, and to allow for a 25 percent dropout rate."
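Here is one way to double check numbers like these in Python. This is a sketch that assumes a two-sided test, the normal approximation for comparing two means, and a dropout adjustment that divides by the expected completion rate. The paper does not spell out its method in the quoted sentence, so treat those choices as my assumptions.

```python
from statistics import NormalDist


def n_per_group_two_means(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group n for comparing two means (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2


def adjust_for_dropout(n, dropout_rate):
    """Inflate n so that the expected number of completers is still n."""
    return n / (1 - dropout_rate)


# Inputs from the quoted sentence: difference of 5, SD of 9.5, 25% dropout.
n_complete = n_per_group_two_means(delta=5, sd=9.5)
n_enrolled = adjust_for_dropout(n_complete, 0.25)
print(round(n_complete, 1), round(n_enrolled, 1))
```

Under these assumptions you need about 57 completers per group and between 75 and 76 enrolled per group, which is close to the 75 quoted in the paper.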

11. Stats: Sample size for a survival data model (May 13, 2004). I got an email from Japan recently with an interesting question. The question was about an analysis of mortality for children under 5 years of age. There were 1992 patients and 272 of them died. I was asked if this was an adequate sample size and whether there was a problem because the median survival time was unavailable for some of the subgroups.


10. Stats: Cluster randomization (May 9, 2003). This appears to be a duplicate of the May 27, 2004 weblog entry.


9. Stats: Three things you need for a power calculation (November 8, 2001). Dear Professor Mean, I want to do research. Is forty subjects enough, or do I need more? Didn't I hear you mention something about three things you need for a power calculation? -- Eager Edward

8. Stats: Documenting negative results in a research paper (October 11, 2001). Dear Professor Mean, I have just finished a well-designed research study and my results are negative. I'm worried about publication bias; most journals will only accept papers that show positive results. How do I document the negative findings in a research paper in a way that will convince a journal to accept my paper? -- Apprehensive Arturo

7. Stats: Quick sample size calculations (October 11, 2001). Dear Professor Mean, I'm reading a research paper. I suspect that the sample size is way too small. I don't like the findings of the study anyway, so I'm hoping that you will help me discredit this study. Is there a quick sample size calculation that I can use? -- Cynical Chris

6. Stats: Confidence interval with zero events (January 19, 2001). Dear Professor Mean, I was working with a colleague on some confidence intervals for the probability of an adverse event during several different types of operations. One of the proportions was zero, since the event never occurred. My friend computed a confidence interval and it went from zero to zero. I told him that this couldn't be right and computing a confidence interval with zero events is impossible. Isn't that right? -- Killjoy Karlina
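For what it is worth, an interval with zero events is not impossible. One standard approach is the exact one-sided upper limit, 1 - alpha^(1/n), which is close to the well-known "rule of three" approximation, 3/n. Here is a minimal sketch in Python, using a made-up example of 100 operations with no adverse events.

```python
def upper_limit_zero_events(n, alpha=0.05):
    """Exact one-sided upper confidence limit when 0 events occur in n trials.

    Solves (1 - p)^n = alpha for p.
    """
    return 1 - alpha ** (1 / n)


def rule_of_three(n):
    """Quick approximation to the 95% upper limit with zero events."""
    return 3 / n


n = 100  # hypothetical number of operations with no adverse events
print(round(upper_limit_zero_events(n), 4), rule_of_three(n))
```

With 100 operations, both calculations put the upper limit at about 3%, so the interval runs from zero to roughly 3%, not from zero to zero.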

5. Stats: The minimal impact of population size on power and precision (January 19, 2001). Dear Professor Mean, Can you explain why it is okay to have similar sample sizes for populations of very different sizes? For example, why is it that a sample size for a population of 10 million doesn't have to be much larger than a sample size for a population of 10 thousand? -- Skeptical Sam
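You can see this numerically with the finite population correction. Here is a minimal sketch in Python, assuming you want to estimate a proportion to within plus or minus 3 percentage points with 95% confidence. The margin of error and the worst-case proportion of 0.5 are my own illustrative choices.

```python
from math import ceil
from statistics import NormalDist


def n_for_proportion(margin, population, p=0.5, conf=0.95):
    """Sample size to estimate a proportion within +/- margin,
    with the finite population correction applied."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n_infinite = z ** 2 * p * (1 - p) / margin ** 2  # ignores population size
    return ceil(n_infinite / (1 + (n_infinite - 1) / population))


for population in (10_000, 10_000_000):
    print(population, n_for_proportion(0.03, population))
```

The population grows by a factor of 1,000, but the required sample size grows by only about 10%, from under 1,000 to a bit over 1,000.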


4. Stats: Sample size for Mann-Whitney U (September 28, 2000). Dear Professor Mean, I need to calculate the sample size for the Mann-Whitney U test. How do I do this? -- Bewildered Bob

3. Stats: Binary outcome sample size calculations (August 23, 2000). Dear Professor Mean, I have to calculate a sample size for a binary outcome variable. The research study is on breast feeding failures within 7 to 10 days of birth for mothers who intended to breast feed. The rate of failure overall is expected to be about 12%. What sample size do I need? -- Baffled Bob

2. Stats: Sample size for a confidence interval (January 26, 2000). Dear Professor Mean, We have a large dataset with about 400 million records. We need to randomly select a subsample from it. However we need help in determining the sample size. What sample size do we need for the confidence interval calculations? -- Frantic Frank


1. Stats: Sample size for a diagnostic study (September 3, 1999). Dear Professor Mean, How big should a study of a diagnostic test be? I want to estimate a sample size for the sensitivity and specificity of a test. I guess confidence intervals would address this, but is there a calculation analogous to a power analysis that would apply to figure out the size of the groups beforehand? -- Jovial John
