Category: Bayesian statistics (created 2007-05-30). In Bayesian
statistics, the researcher specifies a probability distribution prior to the
start of the experiment that represents his/her degree of belief about the
possible values of a process being studied. After data is collected, the
Bayesian analysis produces a posterior distribution that combines
information from data with information from the prior distribution. Articles are
arranged by date with the most recent entries at the top. Other entries about Bayesian statistics can be found in the
Bayesian statistics page at the
StATS website.

2008
- P.Mean: What does the FDA think about
Bayesian statistics (created 2008-07-08). The FDA is, in general, a
cautious agency (as it should be), but they are allowing newer approaches for
establishing efficacy and safety of new drugs. Many of these new approaches
involve Bayesian methods. A draft guidance "Guidance for the Use of Bayesian
Statistics in Medical Device Clinical Trials - Draft Guidance for Industry
and FDA Staff" is available in HTML format or PDF format.
- P.Mean: Distrust of a Bayesian
meta-analysis (created 2008-07-01). A regular correspondent on the
evidence based health email discussion group (BA) raised some questions about
the use of a Bayesian hierarchical model in a meta-analysis. He was worried
about whether this approach would be appropriate for this type of data.
Outside resources:
- Joseph G. Ibrahim, Ming-Hui Chen, Robert J. Gray. Bayesian Models for
Gene Expression with DNA Microarray Data. Journal of the American
Statistical Association. 2002;97(457):88-99. Abstract: "Two of the critical
issues that arise when examining DNA microarray data are (1) determination of
which genes best discriminate among the different types of tissue, and (2)
characterization of expression patterns in tumor tissues. For (1), there are
many genes that characterize DNA expression, and it is of critical importance
to try and identify a small set of genes that best discriminate between normal
and tumor tissues. For (2), it is critical to be able to characterize the DNA
expression of the normal and tumor tissue samples and develop suitable models
that explain patterns of DNA expression for these types of tissues. Toward
this goal, we propose a novel Bayesian model for analyzing DNA microarray
data and propose a model selection methodology for identifying subsets of
genes that show different expression levels between normal and cancer tissues.
In addition, we propose a novel class of hierarchical priors for the
parameters that allow us to borrow strength across genes for making inference.
The properties of the priors are examined in detail. We introduce a Bayesian
model selection criterion for assessing the various models, and develop Markov
chain Monte Carlo algorithms for sampling from the posterior distributions of
the parameters and for computing the criterion. We present a detailed case
study in endometrial cancer to demonstrate our proposed methodology."
[Accessed December 2, 2009]. Available at:
http://www.jstor.org/stable/3085761.
- Jing Cao, Xian-Jin Xie, Song Zhang, Angelique Whitehurst, Michael White.
Bayesian optimal discovery procedure for simultaneous significance testing.
BMC Bioinformatics. 2009;10(1):5. Abstract: "BACKGROUND: In high throughput
screening, such as differential gene expression screening, drug sensitivity
screening, and genome-wide RNAi screening, tens of thousands of tests need to
be conducted simultaneously. However, the number of replicate measurements per
test is extremely small, rarely exceeding 3. Several current approaches
demonstrate that test statistics with shrinking variance estimates have more
power over the traditional t statistic. RESULTS: We propose a Bayesian
hierarchical model to incorporate the shrinkage concept by introducing a
mixture structure on variance components. The estimates from the Bayesian
model are utilized in the optimal discovery procedure (ODP) proposed by Storey
in 2007, which was shown to have optimal performance in multiple significance
tests. We compared the performance of the Bayesian ODP with several competing
test statistics. CONCLUSION: We have conducted simulation studies with 2 to 6
replicates per gene. We have also included test results from two real
datasets. The Bayesian ODP outperforms the other methods in our study,
including the original ODP. The advantage of the Bayesian ODP becomes more
significant when there are few replicates per test. The improvement over the
original ODP is based on the fact that Bayesian model borrows strength across
genes in estimating unknown parameters. The proposed approach is efficient in
computation due to the conjugate structure of the Bayesian model. The R code
(see Additional file 1) to calculate the Bayesian ODP is provided."
[Accessed February 23, 2009]. Available at:
http://www.biomedcentral.com/1471-2105/10/5.
- Michael Coory, Rachael Wills, Adrian Barnett. Bayesian versus
frequentist statistical inference for investigating a one-off cancer cluster
reported to a health department. BMC Medical Research Methodology.
2009;9(1):30. Abstract: "BACKGROUND: The problem of silent multiple
comparisons is one of the most difficult statistical problems faced by
scientists. It is a particular problem for investigating a one-off cancer
cluster reported to a health department because any one of hundreds, or
possibly thousands, of neighbourhoods, schools, or workplaces could have
reported a cluster, which could have been for any one of several types of
cancer or any one of several time periods. METHODS: This paper contrasts the
frequentist approach with a Bayesian approach for dealing with silent multiple
comparisons in the context of a one-off cluster reported to a health
department. Two published cluster investigations were re-analysed using the
Dunn-Sidak method to adjust frequentist p-values and confidence intervals for
silent multiple comparisons. Bayesian methods were based on the Gamma
distribution. RESULTS: Bayesian analysis with non-informative priors produced
results similar to the frequentist analysis, and suggested that both clusters
represented a statistical excess. In the frequentist framework, the
statistical significance of both clusters was extremely sensitive to the
number of silent multiple comparisons, which can only ever be a subjective
"guesstimate". The Bayesian approach is also subjective: whether there is an
apparent statistical excess depends on the specified prior. CONCLUSIONS: In
cluster investigations, the frequentist approach is just as subjective as the
Bayesian approach, but the Bayesian approach is less ambitious in that it
treats the analysis as a synthesis of data and personal judgements (possibly
poor ones), rather than objective reality. Bayesian analysis is (arguably) a
useful tool to support complicated decision-making, because it makes the
uncertainty associated with silent multiple comparisons explicit."
[Accessed May 19, 2009]. Available at:
http://www.biomedcentral.com/1471-2288/9/30.
- N Stallard, P F Thall, J Whitehead. Decision theoretic designs for
phase II clinical trials with multiple outcomes. Biometrics.
1999;55(3):971-977. Abstract: "In many phase II clinical trials, it is
essential to assess both efficacy and safety. Although several phase II
designs that accommodate multiple outcomes have been proposed recently, none
are derived using decision theory. This paper describes a Bayesian decision
theoretic strategy for constructing phase II designs based on both efficacy
and adverse events. The gain function includes utilities assigned to patient
outcomes, a reward for declaring the new treatment promising, and costs
associated with the conduct of the phase II trial and future phase III
testing. A method for eliciting gain function parameters from medical
collaborators and for evaluating the design's frequentist operating
characteristics is described. The strategy is illustrated by application to a
clinical trial of peripheral blood stem cell transplantation for multiple
myeloma." [Accessed December 2, 2009]. Available at:
http://www.ncbi.nlm.nih.gov/pubmed/11315037.
- Sander Greenland, James M. Robins. Empirical-Bayes Adjustments for
Multiple Comparisons Are Sometimes Useful. Epidemiology.
1991;2(4):244-251. Abstract: "Rothman recommends against adjustments for
multiple comparisons. Implicit in his recommendation, however, is an
assumption that the sole objective of the data analysis is to report and
scientifically interpret the data. We concur with his recommendation when this
assumption is correct and one is willing to abandon frequentist
interpretations of the summary statistics. Nevertheless, there are situations
in which an additional or even primary goal of analysis is to reach a set of
decisions based on the data. In such situations, Bayes and empirical-Bayes
adjustments can provide a better basis for the decisions than conventional
procedures." [Accessed December 2, 2009]. Available at:
http://www.jstor.org/stable/20065674.
- U.S. Food and Drug Administration. Guidance for the Use of Bayesian
Statistics in Medical Device Clinical Trials. Excerpt: "This
document provides guidance on statistical aspects of the design and analysis
of clinical trials for medical devices that use Bayesian statistical methods.
The purpose of this guidance is to discuss important statistical issues in
Bayesian clinical trials for medical devices and not to describe the content
of a medical device submission. Further, while this document provides guidance
on many of the statistical issues that arise in Bayesian clinical trials, it
is not intended to be all-inclusive. The statistical literature is rich with
books and papers on Bayesian theory and methods; a selected bibliography has
been included for further discussion of specific topics. FDA’s guidance
documents, including this guidance, do not establish legally enforceable
responsibilities. Instead, guidances describe the Agency’s current thinking on
a topic and should be viewed only as recommendations, unless specific
regulatory or statutory requirements are cited. The use of the word should in
Agency guidances means that something is suggested or recommended, but not
required." [Accessed October 13, 2009]. Available at:
http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm071072.htm.
- Byron Gajewski, Jonathan Mahnken, Nancy Dunton. Improving quality
indicator report cards through Bayesian modeling. BMC Medical Research
Methodology. 2008;8(1):77. Abstract: "BACKGROUND: The National Database for
Nursing Quality Indicators(R) (NDNQI(R)) was established in 1998 to assist
hospitals in monitoring indicators of nursing quality (eg, falls and pressure
ulcers). Hospitals participating in NDNQI transmit data from nursing units to
an NDNQI data repository. Data are summarized and published in reports that
allow participating facilities to compare the results for their units with
those from other units across the nation. A disadvantage of this reporting
scheme is that the sampling variability is not explicit. For example, suppose
a small nursing unit that has 2 out of 10 (rate of 20%) patients with pressure
ulcers. Should the nursing unit immediately undertake a quality improvement
plan because of the rate difference from the national average (7%)? METHODS:
In this paper, we propose approximating 95% credible intervals (CrIs) for
unit-level data using statistical models that account for the variability in
unit rates for report cards. RESULTS: Bayesian CrIs communicate the level of
uncertainty of estimates more clearly to decision makers than other
significance tests. CONCLUSION: A benefit of this approach is that nursing
units would be better able to distinguish problematic or beneficial trends
from fluctuations likely due to chance." [Accessed January 3, 2009].
Available at:
http://www.biomedcentral.com/1471-2288/8/77.
All of the material above this paragraph is licensed under a
Creative Commons Attribution 3.0 United States License. This page was written by
Steve Simon and was last modified on
2009-12-01. The material
below this paragraph links to
my old website, StATS. Although I wrote all of the material
listed below, my ex-employer, Children's Mercy Hospital, has claimed copyright
ownership of this material. The brief excerpts shown here are included under
the fair use provisions of U.S. Copyright laws.
2008
- Stats: Eliciting a prior distribution
for rejection/refusal rates (June 7, 2008). I got a question about the
Bayesian model for rejection/refusal rates. I had used three prior
distributions in my calculations, a Beta(10,40), a Beta(45,5), and a
Beta(25,25). The question was how I selected those prior distributions.
- Stats: Why does a Bayesian approach make
sense for monitoring accrual? (May 8, 2008). I'm working with Byron
Gajewski to develop some models for monitoring the progress of clinical
trials. Too many researchers overpromise and underdeliver on the planned sample
size and the planned completion date of their research. This leads to serious
delays in the research and inadequate precision and power when the research
is completed. We want to develop some tools that will let researchers plan
the pattern of patient accrual in their studies. These tools will also let
the researchers carefully monitor the progress of their studies and let them
take action quickly if accrual rates are suffering. We've adopted a Bayesian
approach for these tools. While a Bayesian approach to Statistics is
controversial, we feel that there should be no controversy with regard to
using Bayesian models in modeling accrual.
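The three priors named in the June 7, 2008 entry can be compared with a short conjugate update. This is a minimal sketch, not the calculation from the linked article: it assumes a binomial likelihood for refusals, and the counts (12 refusals out of 40 people approached) are hypothetical numbers chosen only for illustration.

```python
# Beta-binomial conjugate update: a Beta(a, b) prior combined with
# x refusals and (n - x) acceptances gives a Beta(a + x, b + n - x) posterior.

def update_beta(a, b, refusals, acceptances):
    """Return the posterior Beta parameters after observing the counts."""
    return a + refusals, b + acceptances

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# The three priors mentioned in the entry above.
priors = {
    "Beta(10,40)": (10, 40),
    "Beta(45,5)": (45, 5),
    "Beta(25,25)": (25, 25),
}

# Hypothetical data (an assumption, not taken from the article).
refusals, acceptances = 12, 28

for name, (a, b) in priors.items():
    post_a, post_b = update_beta(a, b, refusals, acceptances)
    print(f"{name}: prior mean {beta_mean(a, b):.2f}, "
          f"posterior Beta({post_a},{post_b}), "
          f"posterior mean {beta_mean(post_a, post_b):.2f}")
```

Each of these priors carries the weight of 50 pseudo-observations, so with only 40 real observations the posterior means stay noticeably apart. That sensitivity is one reason the choice of prior deserves an explanation.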
2007
- Stats: Fitting a beta
binomial model using BUGS (April 17, 2007). I've spent a bit of time
trying to learn how to run a program called BUGS. The acronym stands for
Bayes Using Gibbs Sampling. Here is my first serious attempt to run a BUGS
program.
- Stats: A simple
illustration of the Metropolis algorithm (April 13, 2007). In many
situations, you need to generate a random sample from a distribution that is
rather complex. When simpler methods for generating a random sample
don't work, a family of approaches based on the Markov chain
principle can help. These methods, which include Gibbs sampling,
the Metropolis algorithm, and the Metropolis-Hastings algorithm, are
collectively called Markov Chain Monte Carlo (MCMC). These approaches are
especially valuable in Bayesian data analysis. The simplest of the three
methods is the Metropolis algorithm, and here is a simple example of how it
works.
- Stats: What I'm working on
right now (March 18, 2007). There are several research projects where I
am actively looking for collaborators. I thought I'd outline these topics
briefly here.
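The Metropolis algorithm mentioned in the April 13, 2007 entry can be sketched in a few lines. This is an illustrative sketch rather than the example from the linked article: the target (a standard normal density, known only up to a constant), the uniform random-walk proposal, the step size, and the seed are all assumptions made here.

```python
import math
import random

def log_target(x):
    """Log of an unnormalized standard normal density (assumed target)."""
    return -0.5 * x * x

def metropolis(log_density, start, n_samples, step=2.0, seed=0):
    """Random-walk Metropolis sampler with a symmetric uniform proposal."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        # Propose a move from a distribution symmetric around the current point.
        proposal = x + rng.uniform(-step, step)
        log_ratio = log_density(proposal) - log_density(x)
        # Always accept uphill moves; accept downhill moves with
        # probability equal to the density ratio.
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        # A rejected proposal repeats the current value in the chain.
        samples.append(x)
    return samples

draws = metropolis(log_target, start=0.0, n_samples=20000)
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"sample mean {mean:.3f}, sample variance {variance:.3f}")
```

Because the proposal is symmetric, only the ratio of target densities matters, so the normalizing constant never has to be computed. That is what makes the method attractive for posterior distributions.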
2006
- Stats: A simple Bayesian
model for accrual (November 17, 2006). Suppose you are a researcher in
charge of a long term study. You plan to collect data on 120 patients. The
goal is to finish your study in ten years, which means getting 12 patients
per year, or about one every thirty days on average. Recruiting patients, though,
appears to be harder than you had expected. You recruited your first patient
on day 56, 26 days behind schedule. The second patient is not recruited
until day 93. About two years into the study (day 768), you have just
recruited your 10th patient. It looks like recruitment might be behind
schedule. Is it time to take action? A Bayesian model of accrual times can
help you to discern whether recruitment is behind schedule and project an
estimated completion date allowing for uncertainty.
- Stats: Articles on Bayesian
data analysis (March 30, 2006). The Journal of Data Science has a couple
of interesting Bayesian papers in the April 2006 issue. The first article
addresses a thorny topic, multiple comparisons in an ANOVA model. The second
article discusses the teaching of Bayesian statistics.
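The accrual scenario in the November 17, 2006 entry (10 patients recruited by day 768, against a target of 120) can be explored with a simple conjugate model. This is one possible formulation and may differ from the model in the linked article: it assumes exponential waiting times between patients, a Gamma prior on the daily accrual rate, and a hypothetical prior strength equivalent to having seen 5 patients in 150 days (matching the planned pace of one patient every thirty days).

```python
import random
import statistics

def update_gamma(shape, rate, n_patients, days_elapsed):
    """Posterior Gamma(shape, rate) for the accrual rate, given exponential
    waiting times and n_patients recruited in days_elapsed days."""
    return shape + n_patients, rate + days_elapsed

def simulate_completion_days(shape, rate, remaining, days_elapsed,
                             n_draws=5000, seed=1):
    """Draw total study durations (in days) from the posterior predictive."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_draws):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        accrual_rate = rng.gammavariate(shape, 1.0 / rate)
        # The time to recruit the remaining patients is a sum of exponentials.
        remaining_days = rng.gammavariate(remaining, 1.0 / accrual_rate)
        totals.append(days_elapsed + remaining_days)
    return sorted(totals)

# Hypothetical prior: worth 5 patients in 150 days (an assumption).
prior_shape, prior_rate = 5, 150
# Data from the entry above: 10 patients recruited by day 768.
post_shape, post_rate = update_gamma(prior_shape, prior_rate, 10, 768)

totals = simulate_completion_days(post_shape, post_rate,
                                  remaining=110, days_elapsed=768)
low = totals[int(0.025 * len(totals))]
high = totals[int(0.975 * len(totals))]
print(f"posterior for rate: Gamma({post_shape}, {post_rate})")
print(f"median completion: day {statistics.median(totals):.0f}")
print(f"95% interval: day {low:.0f} to day {high:.0f}")
print("planned completion: day 3650")
```

Comparing the simulated interval with the planned ten-year finish (day 3650) gives a direct, uncertainty-aware answer to the question of whether recruitment is behind schedule.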
2005
- Stats: Technology to end spam (March
8, 2005). In my job I get a lot of spam, partly because I listed my
email address on my web site until just recently. The research community is
trying to find technological solutions to spam (unsolicited commercial
email), and some of the approaches are quite fascinating. The folks at
Microsoft have looked at a system that limits the amount of email that
someone can send out in a single day by asking the sender to solve a
moderately difficult computational challenge for each piece of email sent.
Another interesting approach uses Bayesian Statistics to produce a
probability estimate that the message is spam. This approach looks at words
that appear commonly in spam messages and uncommonly in legitimate messages.
- Stats: Steps in a typical Bayesian model
(January 24, 2005). I editorialized a year ago about this on the
evidence-based Health List. "Should proponents of EBM be concerned about
understanding the Bayesian philosophy? In my opinion, no. I think we'll
gradually see Bayesian philosophy creep in to the design and analysis of
clinical trials. For example, there are good Bayesian solutions, I
understand, to the tricky issue of early stopping of clinical trials. But I
doubt that we will see a wholesale rejection of both p-values AND confidence
intervals in my lifetime. Too many people like me fail to fully understand
the Bayesian paradigm for this to happen. So from a practical viewpoint,
most of the medical research for the foreseeable future will be analyzed
using the Frequentist paradigm."
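The word-based spam approach described in the March 8, 2005 entry can be illustrated with a small naive Bayes sketch. This is not the method of any particular spam filter: the four training messages are invented for illustration, and the add-one (Laplace) smoothing and the independence assumption across words are simplifications chosen here.

```python
import math
from collections import Counter

def train(messages):
    """Count words per class from a list of (text, is_spam) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, is_spam in messages:
        class_counts[is_spam] += 1
        word_counts[is_spam].update(text.lower().split())
    return word_counts, class_counts

def spam_probability(text, word_counts, class_counts):
    """Posterior probability that a message is spam under naive Bayes."""
    vocab = set(word_counts[True]) | set(word_counts[False])
    total_messages = sum(class_counts.values())
    log_score = {}
    for label in (True, False):
        n_words = sum(word_counts[label].values())
        # Start from the prior probability of the class.
        score = math.log(class_counts[label] / total_messages)
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            score += math.log((word_counts[label][word] + 1)
                              / (n_words + len(vocab)))
        log_score[label] = score
    # Normalize in a numerically stable way.
    top = max(log_score.values())
    weight = {label: math.exp(s - top) for label, s in log_score.items()}
    return weight[True] / (weight[True] + weight[False])

# Invented training data (hypothetical, for illustration only).
training = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]
word_counts, class_counts = train(training)
for message in ("free money", "meeting agenda"):
    p = spam_probability(message, word_counts, class_counts)
    print(f"{message!r}: P(spam) = {p:.2f}")
```

Words that are common in spam and rare in legitimate mail push the probability toward 1, and words with the opposite pattern push it toward 0.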