Survival analysis sample size calculation

Categories: Blog post, 2024, Sample size justification, Survival analysis, Uses R code

Author: Steve Simon

Published: November 26, 2024

Someone asked about getting a sample size for a survival analysis project. Here is the scenario. This is a two group comparison. The control group is expected to see 75% survival at one year. The treatment group is expected to do better, with a hazard ratio of 0.8. Patients are going to be recruited for a full calendar year and then accrual will stop. Patients will continue to be followed for another two years. This means that the patients recruited on the first day of the study will be followed for three full years and patients recruited on the last day of accrual will be followed for two full years.

Let’s start with a simple model. Maybe too simple, but you have to start somewhere. Assume that events occur with an exponential distribution. The cumulative distribution function of the exponential distribution is

\[F(t) = 1 - e^{-\lambda t}\]

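The value of \(\lambda\) for the control group, call it \(\lambda_C\), has to satisfy the equation

\[1 - e^{-\lambda_C \cdot 1} = 0.25\]

because 75% survival at one year means a 25% chance of an event by one year. With a bit of algebra, you get

\[\lambda_C = -\ln(0.75) \approx 0.2877\]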

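I always get nervous with this, because it is really easy to make a minor error and all of a sudden, you have something that is off by a full order of magnitude. So check your work. In R, the function pexp evaluates the cumulative distribution function of the exponential distribution, so evaluating it at one year with the control group's rate should return 0.25.

There are similar functions in SAS, Stata, etc.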
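Here is a minimal sketch of that check in R, assuming the control rate comes from solving exp(-lambda) = 0.75. The variable name `lambda_control` is only illustrative and is not taken from the original post.

```r
# Rate for the control group: solve exp(-lambda * 1) = 0.75
lambda_control <- -log(0.75)
lambda_control

# Check: the probability of an event within one year should be 0.25
pexp(1, rate = lambda_control)

# Equivalently, one-year survival should be 0.75
pexp(1, rate = lambda_control, lower.tail = FALSE)
```

If either check comes back far from 0.25 or 0.75, the algebra (or the units of time) went wrong somewhere.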

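Now, you expect the treated group to have a hazard ratio of 0.8. For the exponential distribution, this means that their rate \(\lambda_T\) is 0.8 times the control rate \(\lambda_C\), a smaller parameter:

\[\lambda_T = 0.8 \times \lambda_C = 0.8 \times 0.2877 \approx 0.2301\]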
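The same step can be sketched in R. This assumes the control rate is -log(0.75), as implied by 75% survival at one year; the variable names are illustrative only.

```r
# Control rate implied by 75% survival at one year
lambda_control <- -log(0.75)

# Under an exponential model the hazard is constant, so a hazard
# ratio of 0.8 simply scales the rate
hazard_ratio <- 0.8
lambda_treatment <- hazard_ratio * lambda_control
lambda_treatment

# Implied one-year survival in the treatment group
pexp(1, rate = lambda_treatment, lower.tail = FALSE)
```

The implied one-year survival in the treatment group is about 0.794, which is a useful sanity check: it should be a bit better than 0.75, not dramatically better.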

Now, if you follow a patient for an average of 2.5 years, what is the probability that they will experience the event?

For the control group, it is

\[1 - e^{-0.2877 \times 2.5} \approx 0.5129\]

and for the treatment group, it is

\[1 - e^{-0.2301 \times 2.5} \approx 0.4375\]
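A short R sketch of these two probabilities, assuming the exponential model and the 2.5 year average follow-up described above. The variable names are illustrative.

```r
# Average follow-up: accrual over one year plus two more years of
# follow-up gives between 2 and 3 years, or 2.5 years on average
follow_up <- 2.5

lambda_control <- -log(0.75)             # 75% survival at one year
lambda_treatment <- 0.8 * lambda_control # hazard ratio of 0.8

# Probability of an event by the average follow-up time
p_event_control <- pexp(follow_up, rate = lambda_control)
p_event_treatment <- pexp(follow_up, rate = lambda_treatment)

round(c(control = p_event_control, treatment = p_event_treatment), 4)
```

Note that plugging in the average follow-up time is itself an approximation, in keeping with the simple model used here.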

Now power is driven by the number of events, not the number of patients. How many events would you need to see? The formula found in Schoenfeld (1983)

Schoenfeld, D. A. (1983). Sample-size formula for the proportional-hazards regression model. Biometrics, 499-503. Available in html format is

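\[d = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{P_A P_B (\ln \Delta)^2}\]

where \(d\) is the required number of events, \(z_{1-\alpha/2}\) and \(z_{1-\beta}\) are standard normal quantiles for the chosen Type I error rate \(\alpha\) (written here for a two-sided test) and power \(1-\beta\), \(P_A\) and \(P_B\) are the proportions of patients allocated to the control and treatment groups, and \(\Delta\) is the hazard ratio. Assuming equal sample sizes in each group, \(P_A = P_B = 0.5\) and \(\Delta = 0.8\), so you would get

\[d = \frac{4\,(z_{1-\alpha/2} + z_{1-\beta})^2}{(\ln 0.8)^2}\]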
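Here is a sketch of the events calculation in R, using the commonly cited form of Schoenfeld's formula: the squared sum of two normal quantiles divided by the product of the allocation proportions and the squared log hazard ratio. The scenario above does not state a Type I error rate or a power, so the two-sided alpha of 0.05 and the 80% power below are assumptions chosen purely for illustration.

```r
# Assumed design parameters (NOT stated in the scenario)
alpha <- 0.05  # two-sided Type I error rate (assumption)
power <- 0.80  # desired power (assumption)

hazard_ratio <- 0.8  # from the scenario
p_a <- 0.5           # proportion allocated to control
p_b <- 0.5           # proportion allocated to treatment

# Standard normal quantiles for alpha and power
z_alpha <- qnorm(1 - alpha / 2)
z_beta <- qnorm(power)

# Total number of events across both groups
events_needed <- (z_alpha + z_beta)^2 /
  (p_a * p_b * log(hazard_ratio)^2)

events_needed
ceiling(events_needed)
```

Under these assumed values, the total comes to roughly 631 events across both groups. A different alpha or power changes this number substantially, so substitute the values from your own protocol.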

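Now how many patients (not events) would you need? With \(n\) patients in each group, event probabilities \(p_C\) and \(p_T\) in the control and treatment groups, and \(d\) required events in total, solve the equation

\[n\,p_C + n\,p_T = d\]

to get

\[n = \frac{d}{p_C + p_T}\]

which is the number of patients in each group.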
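Putting the pieces together, here is an end-to-end sketch in R. As before, the two-sided alpha of 0.05 and the 80% power are assumptions for illustration only, since the scenario does not specify them, and the variable names are not from the original post.

```r
# Assumed design parameters (NOT stated in the scenario)
alpha <- 0.05  # two-sided Type I error rate (assumption)
power <- 0.80  # desired power (assumption)

# Scenario values
hazard_ratio <- 0.8
follow_up <- 2.5  # average follow-up in years

# Exponential rates and event probabilities
lambda_control <- -log(0.75)
lambda_treatment <- hazard_ratio * lambda_control
p_event_control <- pexp(follow_up, rate = lambda_control)
p_event_treatment <- pexp(follow_up, rate = lambda_treatment)

# Total events required, with equal allocation (0.5 and 0.5)
events_needed <- (qnorm(1 - alpha / 2) + qnorm(power))^2 /
  (0.5 * 0.5 * log(hazard_ratio)^2)

# Patients per group: n * p_C + n * p_T = events_needed
patients_per_group <- ceiling(
  events_needed / (p_event_control + p_event_treatment)
)
patients_per_group
```

Under these assumed values the sketch gives about 664 patients per group. Treat that as a rough planning figure, since it rests on the exponential model and on the average follow-up approximation.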

An earlier version of this page was published on new.pmean.com.