P.Mean: Accrual with refusals, exclusions, or dropouts (created 2012-04-22).

A common issue in trials with slow accrual is higher-than-expected rates of refusals, exclusions, or dropouts. If you have information on these rates, you can incorporate them into a Bayesian model of accrual. Here are the details.

You plan to recruit a total of N = X + Y patients with the understanding that X of these patients will be excluded for various reasons and only Y patients will be available for the clinical trial. You expect to take a total of T days for the study. After t days, you have recruited n = x + y patients, x of whom are excluded and y of whom are available for the clinical trial. If you run the trial until Y patients enter the trial, how long will the trial last? If the trial ends at time T, can you predict the total number of patients available for the study?

Let the number of patients recruited be distributed as

$$\nu(s) \sim \text{Poisson}(\lambda s)$$

where s represents an arbitrary number of days of recruitment and lambda is the recruitment rate per day. The letter nu by itself represents the Poisson distribution for s=1 day. Equivalently, let the amount of time to recruit patients (both those eventually included in the trial and those eventually excluded) be distributed as

$$\tau(m) \sim \text{Gamma}(\text{shape}=m,\ \text{rate}=\lambda)$$

where m represents an arbitrary number of patients. The letter tau by itself represents a gamma distribution with shape parameter 1 (or more simply an exponential distribution) and represents the waiting time between two successive patients. The relationship between Poisson counts and exponential waiting times is well known and offers you two convenient modeling approaches. There are times when you might prefer to model the number of remaining patients and times when you might prefer to model the amount of time remaining in the study.
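The link between exponential waiting times and Poisson counts can be checked with a small simulation. This is only a sketch: the rate of 0.5 patients per day, the 30-day window, and the function name count_arrivals are illustrative choices, not values from this page. If the waiting times are exponential, the counts should have a mean and a variance both close to the rate times the number of days.

```python
import random

def count_arrivals(rate, days, rng):
    """Count patients arriving within `days` when the gaps are exponential(rate)."""
    elapsed = rng.expovariate(rate)
    count = 0
    while elapsed <= days:
        count += 1
        elapsed += rng.expovariate(rate)
    return count

rng = random.Random(2012)
rate, days, reps = 0.5, 30.0, 20000  # illustrative values only

counts = [count_arrivals(rate, days, rng) for _ in range(reps)]
mean_count = sum(counts) / reps
var_count = sum((c - mean_count) ** 2 for c in counts) / (reps - 1)

# For a Poisson count, both numbers should be close to rate * days.
print(mean_count, var_count, rate * days)
```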

Finally, let the count of the number of patients out of m who make it into the study be distributed as

$$\gamma(m) \sim \text{Binomial}(m, \pi)$$

The letter gamma by itself represents the inclusion/exclusion of a single subject, which has a Bernoulli distribution. Given these distributions, you need to determine the distribution of phi(s), which represents the number of included patients across s days. Phi is a mixture of binomial and Poisson distributions. Assume that on a given day, you recruit i patients, where i comes from a Poisson distribution. If pi represents the probability that any one patient is not excluded and finds their way into the trial, then the chance of getting exactly j included patients is the Poisson probability of recruiting i patients times the binomial probability of j out of i, summed over i.

$$P[\phi(1)=j] = \sum_{i=j}^{\infty} \frac{e^{-\lambda}\lambda^{i}}{i!}\binom{i}{j}\pi^{j}(1-\pi)^{i-j}$$

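Pull any terms outside the summation that do not depend on i and regroup inside the summation.

$$P[\phi(1)=j] = \frac{e^{-\lambda}(\lambda\pi)^{j}}{j!}\sum_{i=j}^{\infty}\frac{[\lambda(1-\pi)]^{i-j}}{(i-j)!}$$
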
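After recoding k=i-j, the summation is the Taylor series expansion for the exponential function. This simplifies further to

$$P[\phi(1)=j] = \frac{e^{-\lambda}(\lambda\pi)^{j}}{j!}\,e^{\lambda(1-\pi)} = \frac{e^{-\lambda\pi}(\lambda\pi)^{j}}{j!}$$
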
This demonstrates that a Poisson variable subject to an exclusion process based on the binomial distribution is also Poisson, at a rate reduced by the proportion, pi, that are expected to survive the exclusion process.
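You can confirm this result numerically by evaluating the sum directly and comparing it to the Poisson probability with the reduced rate. The sketch below assumes illustrative values of lambda = 4 and pi = 0.6, truncates the infinite sum at 100 terms, and uses helper names (poisson_pmf, binomial_pmf, thinned_pmf) that are made up for this example.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of exactly k events."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def binomial_pmf(j, i, p):
    """Binomial probability of j successes in i trials."""
    return math.comb(i, j) * p ** j * (1 - p) ** (i - j)

def thinned_pmf(j, lam, p, terms=100):
    """Sum P(i recruited) * P(j of i included) over i = j, ..., j + terms - 1."""
    return sum(poisson_pmf(i, lam) * binomial_pmf(j, i, p)
               for i in range(j, j + terms))

lam, p = 4.0, 0.6  # illustrative values only

# Largest disagreement between the direct sum and Poisson(lam * p) for j = 0..14.
max_gap = max(abs(thinned_pmf(j, lam, p) - poisson_pmf(j, lam * p))
              for j in range(15))
print(max_gap)
```

The printed gap should be at the level of floating point rounding error.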

In a similar fashion, you can show that a random sum of exponentials is also exponential with a slower rate where the number being summed depends on an exclusion process with survival probability pi. The new exponential has a rate that is the product of the original exponential rate multiplied by the survival probability.
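A simulation gives a quick check of this claim. The sketch below adds up exponential waiting times until the first patient who survives the exclusion process, then compares the simulated waits to an exponential distribution with rate lambda*pi. The values lambda = 0.5 and pi = 0.6, and the function name wait_for_included, are illustrative assumptions.

```python
import math
import random

def wait_for_included(rate, p_include, rng):
    """Add exponential(rate) gaps until a recruited patient is included."""
    total = 0.0
    while True:
        total += rng.expovariate(rate)
        if rng.random() < p_include:
            return total

rng = random.Random(42)
rate, p_include, reps = 0.5, 0.6, 50000  # illustrative values only

waits = [wait_for_included(rate, p_include, rng) for _ in range(reps)]
mean_wait = sum(waits) / reps
frac_over_5 = sum(w > 5.0 for w in waits) / reps

# Compare with an exponential distribution at the reduced rate, rate * p_include.
print(mean_wait, 1.0 / (rate * p_include))
print(frac_over_5, math.exp(-rate * p_include * 5.0))
```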

Use the gamma distribution as a prior for lambda and the beta distribution as a prior for pi. Let P0 and P1 represent the strengths of the two prior distributions.

The joint posterior distribution is also well known, because of the choice of conjugate prior distributions. The posterior distribution of lambda and pi, given the data t, n, x, and y is

After dropping some proportionality constants and re-arranging, you get

The two left terms in this equation represent a gamma posterior for lambda once the proper normalizing constants are added. The right two terms in this equation represent a beta posterior for pi, again once the proper normalizing constants are added.

With a closed-form posterior distribution, it is easy to simulate the distribution of phi(T-t), the number of included patients that you expect to see in the remaining T-t days. You may also be able to get a closed-form solution here (I think that you can, but I'm still working on the math). You know that

$$\phi(T-t) \mid \lambda, \pi \sim \text{Poisson}(\lambda\pi(T-t))$$

so if you integrate out the parameters, this will produce a posterior predictive distribution. Think of it as a weighted average of Poisson distributions where the weights are given by the posterior distributions of lambda and pi.
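Here is a minimal simulation sketch of that weighted average. It rests on an assumption that this page does not confirm: the priors are taken to be lambda ~ Gamma(shape = N*P0, rate = T*P0) and pi ~ Beta(Y*P1, X*P1), so that the conjugate posteriors are Gamma(N*P0 + n, T*P0 + t) and Beta(Y*P1 + y, X*P1 + x). The planning values, the interim data, and the helper draw_poisson are all hypothetical.

```python
import random

def draw_poisson(mean, rng):
    """Draw a Poisson count by counting exponential(1) gaps that fit within `mean`."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed <= mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

# Hypothetical planning values and interim data (not from the text).
T, X, Y = 300.0, 50, 100   # planned days, planned exclusions, planned inclusions
N = X + Y
P0, P1 = 0.5, 0.5          # prior strengths
t, x, y = 100.0, 20, 20    # days elapsed, excluded so far, included so far
n = x + y

# ASSUMED priors: lambda ~ Gamma(shape=N*P0, rate=T*P0), pi ~ Beta(Y*P1, X*P1).
# Conjugate updating then gives these posterior parameters.
lam_shape, lam_rate = N * P0 + n, T * P0 + t
pi_a, pi_b = Y * P1 + y, X * P1 + x

rng = random.Random(7)
draws = []
for _ in range(10000):
    lam = rng.gammavariate(lam_shape, 1.0 / lam_rate)  # second argument is a scale
    p_incl = rng.betavariate(pi_a, pi_b)
    draws.append(draw_poisson(lam * p_incl * (T - t), rng))

draws.sort()
pred_mean = sum(draws) / len(draws)
lower, upper = draws[249], draws[9749]  # roughly the 2.5th and 97.5th percentiles
print(pred_mean, lower, upper)
```

Adding y to each simulated value gives a predicted total number of included patients at time T, which you can compare to the target Y.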

Pull anything unrelated to pi outside the integral and regroup inside the integral.

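Except for some normalizing constants, the inside integral is the moment generating function for a beta distribution, evaluated at -lambda*(T-t). The sign is negative because the Poisson probability contains exp(-lambda*pi*(T-t)). The general form for the mgf of a beta distribution with shape parameters alpha and beta, evaluated at u, is

$$M(u) = 1 + \sum_{k=1}^{\infty}\left(\prod_{r=0}^{k-1}\frac{\alpha+r}{\alpha+\beta+r}\right)\frac{u^{k}}{k!}$$
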
and when you start plugging in the values, it looks like

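It looks as if a beta-binomial distribution is trying to emerge from this mess. The probability mass function for the beta-binomial distribution, written generically for m trials and beta shape parameters a and b, is

$$P[K=k] = \binom{m}{k}\frac{B(k+a,\ m-k+b)}{B(a,b)}, \quad k = 0, 1, \ldots, m$$
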
Recall that

With a bit of work, the inside integral works out to be

Recombine this with what we pulled out from the inside integral to get

Here's an alternative approach.

Integrate first with respect to lambda. Pull everything out of the integral that is not a function of lambda and regroup inside the integral.

The inside integral looks like a gamma distribution. Place the correct constants inside the integral and the inverse of those constants outside the integral, and the inside integrates to 1. This leaves you with

The left side of the integral is a negative binomial distribution (you need to squint a bit to see this). The right side is a beta distribution. This is a well-known distribution, the negative beta-binomial distribution. The general density function for this distribution is

where

The only way to tackle this integral is repeated integration by parts, and we need to simplify the notation a bit first. Define

and the integral can be rewritten as

or