P.Mean: A simple Bayesian model for exponential accrual times (created 2008-05-26)

StATS: A simple Bayesian model for exponential accrual times (created 2008-05-26).

This page is moving to a new website.

Here is a simple Bayesian model for exponential accrual times. This model will help researchers to plan the estimated duration of a clinical trial. The same model will also allow the researcher to monitor the accrual during the trial itself and develop revised estimates for the duration or the sample size.

This web page represents a direct translation of a poster presentation, Predicting Accrual in Clinical Trials: Bayesian Posterior Predictive Distribution, Stephen D. Simon and Byron Gajewski, presented at the 2007 Joint Statistical Meetings in Salt Lake City, Utah. Here is a PDF version of the original poster. These results were also adapted for a paper

Gajewski BJ, Simon SD, Carlson S. "Predicting accrual in clinical trials with Bayesian posterior predictive distributions" Statistics in Medicine (2008): 27(13); 2328-2340. DOI: 10.1002/sim.3128 [Medline] [Abstract].

A similar accrual model based on the Poisson distribution was developed independently and published a few months earlier in the same journal

Anisimov VV, Fedorov VV. "Modelling, prediction and adaptive adjustment of recruitment in multicentre trials" Statistics in Medicine (2007); 26: 4958–4975. DOI: 10.1002/sim.2956 [Medline] [Abstract].

Suppose that the trial director wishes to assess the accrual process after m patients have been recruited. Let t₀ represent the time that the study started and let t₁, t₂, …, t_m represent the times that each new patient enters the trial. Without loss of generality, we can assume that the study starts at time t₀=0. Compute waiting times

We will assume that

and specify an inverse gamma as the prior distribution for theta. Eliciting a prior distribution is a difficult task. We start our elicitation by asking two simple questions

How long will it take to accrue n subjects?
On a scale of 1-10, how confident are you in your answer to 1?

Let T represent the answer to the first question. Divide the answer to the second question by 10 to get P. This produces the following prior distribution.

You can then present various properties of this prior distribution to the researcher to assess how realistic this prior distribution is.

After the trial has started, assume that m patients have entered the trial at time t_m. The posterior distribution is

which has posterior mean

This is approximately a weighted average of the prior mean and the mean waiting time for the observed data. The relative weights depend on the prior sample size (nP) and the sample size of patients observed so far (m).

We are interested in predicting the next n-m waiting times W_m+1,…,W_n. A simple and easily generalizable approach is to randomly select θ₁ from the posterior distribution and then randomly select waiting time n-m random variables from W_m+1,1 ,…,W_n,1 from an exponential distribution with parameter θ₁. Repeat this process for θ₂, θ₃,…,θ_b, where b is a large number (typically 1,000). The sum of observed and simulated waiting times, S_b(n) = w₁ + w₂ + … + w_m + W_m+1,b + … + W_n,b will represent b estimates of the total duration of the clinical trial.

Example: To illustrate the proposed method, consider an unnamed current phase III clinical trial (randomized, double-blind, and placebo-controlled) used to examine the efficacy and safety of a dietary supplement. This study was planned and accrual started prior to our development of these methods, but still serves to illustrate how this approach would work. The current protocol requires n=350 subjects, with balanced randomization to either treatment or placebo control. In the previous study, investigators were able to recruit, from a similar population, 350 subjects across 3 years.

At the design phase of the study it was felt that the previous clinical trial offered strong prior information. Setting P=0.5 results in k=175 and V=1.5 years. This corresponds to a prior mean of V/(k-1)=0.0086 years (3.1 days). This means that the researcher expects to see a new patient every third day, on average.

With this prior distribution, the researcher can predict the trial duration and account for uncertainty in the specification of the parameters and uncertainty due to the random nature of the exponential accrual model.

The graph above shows the predicted trial duration based solely on the prior distribution. The gray region represents the range between the 2.5 and 97.5 percentiles. The white line in the middle of this region represents the 50th percentile.

For this particular prior distribution, the 50th percentile for trial duration is 3.0 years. There is a 0.025 probability that the trial could finish in 2.5 years or less and a 0.025 probability that it could last 3.6 years or longer.

After the study was funded and the protocol approved, the investigative team began recruiting subjects. After 239 days the project director compiled a report that displays enrolled dates of 41 subjects. This represents an average waiting time of 5.8 days, much longer than prior expectation (3.1 days).

The Bayesian predictive distribution appears above. Notice that the estimated completion time is not a simple linear projection of the data, as the prior distribution still exerts enough influence to bend the projection back slightly. The 50th percentile for trial duration is 3.7 years, a substantial increase over the original belief that the trial would last about 3 years. The 2.5 and 97.5 percentiles for the Bayesian predictive distribution are 3.3 and 4.3. so even allowing for the uncertainties in the accrual pattern, there is a very high probability that this trial will finish later than planned.

We can produce a Bayesian predictive distribution for a non-informative prior by setting P=0. The results are shown above. Note that the median trial duration (5.6 years) is now a simple linear projection of the accrual trend through the first 41 subjects. This approach, however, properly accounts for the uncertainty associated with this trend. The 2.5 and 97.5 percentiles are 4.2 years and 7.5 years respectively.

The R code for producing the Bayesian predictive distribution for the trial duration is shown below. this simulation could easily be done in most other reasonable statistical software packages or even in a spreadsheet.

duration.plot <- function(n,T,P,m,tm,B=1000,Tmax=2*T,sample.paths=0) { # n is the target sample size # T is the target completion time # P is the prior certainty (0 <= P <= 1) # m is the sample observed to date # tm is the time to date # B is the number of simulated duration times # Tmax is the upper limit on the time axis of the graph # # set P to zero for a non-informative prior # set m and tm to zero if no data has been accumulated yet. k <- n*P+m V <- T*P+tm theta <- 1/rgamma(B,shape=k,rate=V) simulated.duration <- matrix(NA,nrow=n-m,ncol=B) for (i in 1:B) { wait <- rexp(n-m,1/theta[i]) simulated.duration[,i] <- tm+cumsum(wait) } lcl <- apply(simulated.duration,1,quantile,probs=0.025) mid <- apply(simulated.duration,1,quantile,probs=0.500) ucl <- apply(simulated.duration,1,quantile,probs=0.975) layout(matrix(c(1,2,2,2))) par(mar=c(0.1,4.1,0.1,0.1)) duration.hist <- cut(simulated.duration[n-m,], seq(0,Tmax,length=40)) barplot(table(duration.hist),horiz=FALSE, axes=FALSE,xlab=" ",ylab=" ",space=0, col="white",names.arg=rep(" ",39)) par(mar=c(4.1,4.1,0.1,0.1)) plot(c(0,Tmax),c(0,n),xlab="Time", ylab="Number of patients",type="n") polygon(c(lcl,rev(ucl)),c((m+1):n,n:(m+1)), density=-1,col="gray",border=NA) lines(mid,(m+1):n,col="white") segments(0,0,tm,m) while (sample.paths>0) { lines(simulated.duration[,sample.paths],(m+1):n) sample.paths <- sample.paths-1 } pctiles <- matrix(NA,nrow=2,ncol=3) dimnames(pctiles) <- list(c("Waiting time","Trial duration") ,c("2.5%","50%","97.5%")) pctiles[1,] <- 1/qgamma(c(0.975,0.5,0.025),shape=k,rate=V) pctiles[2,1] <- lcl[n-m] pctiles[2,2] <- mid[n-m] pctiles[2,3] <- ucl[n-m] list(pct=pctiles,sd=simulated.duration) }

The function calls that produced the three figures shown above are

p0 <- duration.plot(n=350,T=3,P=0.5,m= 0,tm= 0,Tmax=10) p1 <- duration.plot(n=350,T=3,P=0.5,m=41,tm=239/365,Tmax=10) p2 <- duration.plot(n=350,T=3,P=0, m=41,tm=239/365,Tmax=10)

Predicting sample size if the trial has a fixed duration. We can also use this process to obtain a posterior predictive sample size. Let T represent the time point at which the study must end. Compute partial sums S_b(m+1), S_b(m+2), . . until the partial sum exceeds T. The values n_b which represent the largest values where the partial sums do not exceed T, provide a predictive distribution of sample sizes.

Example: Let's use the same prior distribution, but produce a simulation that estimates the final sample size under the assumption that the trial must end at exactly 3 years.

The figure above shows the Bayesian predictive distribution based only on the prior distribution. The median estimated sample size is 349 and the 2.5 and 9.5 percentiles are 291 and 413.

The figure above shows how the projected sample size changes after 239 days and 41 patients. The median is much lower now (274), and the 2.5 and 97.5 percentiles (233 and 321 respectively) are both well below the original expectation of recruiting 350 patients.

A Bayesian predictive sample size ignoring the prior information is shown above. The median sample size is 188, which is a simple linear projection of the data. The 2.5 and 97.5 percentiles are 143 and 241, respectively.

This simulation approach also allows you to examine more complex accrual patterns, such an accrual goal of recruiting until 50 patients have volunteered, or until 6 months have elapsed, whichever comes first.

The R code for producing the Bayesian predictive sample size appears below.

accrual.plot <- function(n,T,P,m,tm,B=1000,nmax=2*n,sample.paths=0) { # n is the target sample size # T is the target completion time # P is the prior certainty (0 <= P <= 1) # m is the sample observed to date # tm is the time to date # B is the number of simulated accrual times # nmax is the upper limit on the sample size axis of the graph # # set P to zero for a non-informative prior # set m and tm to zero if no data has been accumulated yet. k <- n*P+m V <- T*P+tm theta <- 1/rgamma(B,shape=k,rate=V) simulated.duration <- matrix(NA,nrow=nmax-m,ncol=B) for (i in 1:B) { wait <- rexp(nmax-m,1/theta[i]) simulated.duration[,i] <- tm+cumsum(wait) } tlist <- seq(tm,T,length=100) accrual.count <- function(x,t) {m+sum(x<=t)} simulated.accrual <- matrix(NA,nrow=100,ncol=B) for (i in 1:100) { time <- tlist[i] simulated.accrual[i,] <- apply(simulated.duration,2, accrual.count,t=time) } lcl <- apply(simulated.accrual,1,quantile,probs=0.025) mid <- apply(simulated.accrual,1,quantile,probs=0.500) ucl <- apply(simulated.accrual,1,quantile,probs=0.975) layout(matrix(c(2,2,2,1),nrow=1)) par(mar=c(4.1,0.1,0.1,0.1)) accrual.hist <- cut(simulated.accrual[100,], seq(0,nmax,length=40)) barplot(table(accrual.hist),horiz=TRUE, axes=FALSE,xlab=" ",ylab=" ",space=0, col="white",names.arg=rep(" ",39)) par(mar=c(4.1,4.1,0.1,0.1)) plot(c(0,T),c(0,nmax),xlab="Time", ylab="Number of patients",type="n") polygon(c(tlist,rev(tlist)),c(lcl,rev(ucl)), density=-1,col="gray",border=NA) lines(tlist,mid,col="white") segments(0,0,tm,m) while (sample.paths>0) { lines(tlist,simulated.accrual[,sample.paths]) sample.paths <- sample.paths-1 } pctiles <- matrix(NA,nrow=2,ncol=3) dimnames(pctiles) <- list(c("Waiting time", "Trial accrual"),c("2.5%","50%","97.5%")) pctiles[1,] <- 1/qgamma(c(0.975,0.5,0.025),shape=k,rate=V) pctiles[2,1] <- lcl[100] pctiles[2,2] <- mid[100] pctiles[2,3] <- ucl[100] list(pct=pctiles,sa=simulated.accrual) }

The function calls that produced the three figures shown above are

d0 <- accrual.plot(n=350,T=3,P=0.5,m= 0,tm= 0,nmax=500) d1 <- accrual.plot(n=350,T=3,P=0, m=41,tm=239/365,nmax=500) d2 <- accrual.plot(n=350,T=3,P=0.5,m=41,tm=239/365,nmax=500)

Future work: The model we propose is easily extended in a variety of ways:

Use of alternatives to the exponential distribution for modeling waiting times.
Examination of alternative prior distributions.
Use hierarchical models to predict accrual across multiple centers in a multi-center trial.

Conclusion: Predicting accrual across a fixed time period of a planned study is not an easy procedure. As demonstrated in this paper there are uncertainties that should be incorporated into this prediction. We hope that the method in this paper encourages investigators to account for these uncertainties when planning and monitoring accrual in a clinical trial.

This page was written by Steve Simon while working at Children's Mercy Hospital. Although I do not hold the copyright for this material, I am reproducing it here as a service, as it is no longer available on the Children's Mercy Hospital website. Need more information? I have a page with general help resources. You can also browse for pages similar to this one at Category: Accrual problems in clinical trials.