P.Mean: Bayesian statistics (created 2007-05-30).

In Bayesian statistics, the researcher specifies a probability distribution prior to the start of the experiment that represents his/her degree of belief about the possible values of a process being studied. After data is collected, the Bayesian analysis produces a posterior distribution that combines information from data with information from the prior distribution. Articles are arranged by date with the most recent entries at the top. Other entries about Bayesian statistics can be found in the Bayesian statistics page at the StATS website.

2008

  1. P.Mean: What does the FDA think about Bayesian statistics (created 2008-07-08). The FDA is, in general, a cautious agency (as it should be), but they are allowing newer approaches for establishing efficacy and safety of new drugs. Many of these new approaches involve Bayesian methods. A draft guidance "Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials - Draft Guidance for Industry and FDA Staff" is available in HTML format or PDF format.
  2. P.Mean: Distrust of a Bayesian meta-analysis (created 2008-07-01). A regular correspondent on the evidence based health email discussion group (BA) raised some questions about the use of a Bayesian hierarchical model in a meta-analysis. He was worried about whether this approach would be appropriate for this type of data.

Outside resources:

  1. Joseph G. Ibrahim, Ming-Hui Chen, Robert J. Gray. Bayesian Models for Gene Expression with DNA Microarray Data. Journal of the American Statistical Association. 2002;97(457):88-99. Abstract: "Two of the critical issues that arise when examining DNA microarray data are (1) determination of which genes best discriminate among the different types of tissue, and (2) characterization of expression patterns in tumor tissues. For (1), there are many genes that characterize DNA expression, and it is of critical importance to try and identify a small set of genes that best discriminate between normal and tumor tissues. For (2), it is critical to be able to characterize the DNA expression of the normal and tumor tissue samples and develop suitable models that explain patterns of DNA expression for these types of tissues. Toward this goal, we propose a novel Bayesian model for analyzing DNA microarray data and propose a model selection methodology for identifying subsets of genes that show different expression levels between normal and cancer tissues. In addition, we propose a novel class of hierarchical priors for the parameters that allow us to borrow strength across genes for making inference. The properties of the priors are examined in detail. We introduce a Bayesian model selection criterion for assessing the various models, and develop Markov chain Monte Carlo algorithms for sampling from the posterior distributions of the parameters and for computing the criterion. We present a detailed case study in endometrial cancer to demonstrate our proposed methodology." [Accessed December 2, 2009]. Available at: http://www.jstor.org/stable/3085761.
  2. Jing Cao, Xian-Jin Xie, Song Zhang, Angelique Whitehurst, Michael White. Bayesian optimal discovery procedure for simultaneous significance testing. BMC Bioinformatics. 2009;10(1):5. Abstract: "BACKGROUND: In high throughput screening, such as differential gene expression screening, drug sensitivity screening, and genome-wide RNAi screening, tens of thousands of tests need to be conducted simultaneously. However, the number of replicate measurements per test is extremely small, rarely exceeding 3. Several current approaches demonstrate that test statistics with shrinking variance estimates have more power over the traditional t statistic. RESULTS: We propose a Bayesian hierarchical model to incorporate the shrinkage concept by introducing a mixture structure on variance components. The estimates from the Bayesian model are utilized in the optimal discovery procedure (ODP) proposed by Storey in 2007, which was shown to have optimal performance in multiple significance tests. We compared the performance of the Bayesian ODP with several competing test statistics. CONCLUSION: We have conducted simulation studies with 2 to 6 replicates per gene. We have also included test results from two real datasets. The Bayesian ODP outperforms the other methods in our study, including the original ODP. The advantage of the Bayesian ODP becomes more significant when there are few replicates per test. The improvement over the original ODP is based on the fact that Bayesian model borrows strength across genes in estimating unknown parameters. The proposed approach is efficient in computation due to the conjugate structure of the Bayesian model. The R code (see Additional file 1) to calculate the Bayesian ODP is provided." [Accessed February 23, 2009]. Available at: http://www.biomedcentral.com/1471-2105/10/5.
  3. Michael Coory, Rachael Wills, Adrian Barnett. Bayesian versus frequentist statistical inference for investigating a one-off cancer cluster reported to a health department. BMC Medical Research Methodology. 2009;9(1):30. Abstract: "BACKGROUND: The problem of silent multiple comparisons is one of the most difficult statistical problems faced by scientists. It is a particular problem for investigating a one-off cancer cluster reported to a health department because any one of hundreds, or possibly thousands, of neighbourhoods, schools, or workplaces could have reported a cluster, which could have been for any one of several types of cancer or any one of several time periods. METHODS: This paper contrasts the frequentist approach with a Bayesian approach for dealing with silent multiple comparisons in the context of a one-off cluster reported to a health department. Two published cluster investigations were re-analysed using the Dunn-Sidak method to adjust frequentist p-values and confidence intervals for silent multiple comparisons. Bayesian methods were based on the Gamma distribution. RESULTS: Bayesian analysis with non-informative priors produced results similar to the frequentist analysis, and suggested that both clusters represented a statistical excess. In the frequentist framework, the statistical significance of both clusters was extremely sensitive to the number of silent multiple comparisons, which can only ever be a subjective "guesstimate". The Bayesian approach is also subjective: whether there is an apparent statistical excess depends on the specified prior. CONCLUSIONS: In cluster investigations, the frequentist approach is just as subjective as the Bayesian approach, but the Bayesian approach is less ambitious in that it treats the analysis as a synthesis of data and personal judgements (possibly poor ones), rather than objective reality. Bayesian analysis is (arguably) a useful tool to support complicated decision-making, because it makes the uncertainty associated with silent multiple comparisons explicit." [Accessed May 19, 2009]. Available at: http://www.biomedcentral.com/1471-2288/9/30.
  4. Casey Olives, Marcello Pagano. Bayes-LQAS: classifying the prevalence of global acute malnutrition. Emerging Themes in Epidemiology. 2010;7(1):3. Abstract: "Lot Quality Assurance Sampling (LQAS) applications in health have generally relied on frequentist interpretations for statistical validity. Yet health professionals often seek statements about the probability distribution of unknown parameters to answer questions of interest. The frequentist paradigm does not pretend to yield such information, although a Bayesian formulation might. This is the source of an error made in a recent paper published in this journal. Many applications lend themselves to a Bayesian treatment, and would benefit from such considerations in their design. We discuss Bayes-LQAS (B-LQAS), which allows for incorporation of prior information into the LQAS classification procedure, and thus shows how to correct the aforementioned error. Further, we pay special attention to the formulation of Bayes Operating Characteristic Curves and the use of prior information to improve survey designs. As a motivating example, we discuss the classification of Global Acute Malnutrition prevalence and draw parallels between the Bayes and classical classifications schemes. We also illustrate the impact of informative and non-informative priors on the survey design. Results indicate that using a Bayesian approach allows the incorporation of expert information and/or historical data and is thus potentially a valuable tool for making accurate and precise classifications." [Accessed July 16, 2010]. Available at: http://www.ete-online.com/content/7/1/3.
  5. N Stallard, P F Thall, J Whitehead. Decision theoretic designs for phase II clinical trials with multiple outcomes. Biometrics. 1999;55(3):971-977. Abstract: "In many phase II clinical trials, it is essential to assess both efficacy and safety. Although several phase II designs that accommodate multiple outcomes have been proposed recently, none are derived using decision theory. This paper describes a Bayesian decision theoretic strategy for constructing phase II designs based on both efficacy and adverse events. The gain function includes utilities assigned to patient outcomes, a reward for declaring the new treatment promising, and costs associated with the conduct of the phase II trial and future phase III testing. A method for eliciting gain function parameters from medical collaborators and for evaluating the design's frequentist operating characteristics is described. The strategy is illustrated by application to a clinical trial of peripheral blood stem cell transplantation for multiple myeloma." [Accessed December 2, 2009]. Available at: http://www.ncbi.nlm.nih.gov/pubmed/11315037.
  6. Sander Greenland, James M. Robins. Empirical-Bayes Adjustments for Multiple Comparisons Are Sometimes Useful. Epidemiology. 1991;2(4):244-251. Abstract: "Rothman recommends against adjustments for multiple comparisons. Implicit in his recommendation, however, is an assumption that the sole objective of the data analysis is to report and scientifically interpret the data. We concur with his recommendation when this assumption is correct and one is willing to abandon frequentist interpretations of the summary statistics. Nevertheless, there are situations in which an additional or even primary goal of analysis is to reach a set of decisions based on the data. In such situations, Bayes and empirical-Bayes adjustments can provide a better basis for the decisions than conventional procedures." [Accessed December 2, 2009]. Available at: http://www.jstor.org/stable/20065674.
  7. U.S. Food and Drug Administration. Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials. Excerpt: "This document provides guidance on statistical aspects of the design and analysis of clinical trials for medical devices that use Bayesian statistical methods. The purpose of this guidance is to discuss important statistical issues in Bayesian clinical trials for medical devices and not to describe the content of a medical device submission. Further, while this document provides guidance on many of the statistical issues that arise in Bayesian clinical trials, it is not intended to be all-inclusive. The statistical literature is rich with books and papers on Bayesian theory and methods; a selected bibliography has been included for further discussion of specific topics. FDA’s guidance documents, including this guidance, do not establish legally enforceable responsibilities. Instead, guidances describe the Agency’s current thinking on a topic and should be viewed only as recommendations, unless specific regulatory or statutory requirements are cited. The use of the word should in Agency guidances means that something is suggested or recommended, but not required." [Accessed October 13, 2009]. Available at: http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm071072.htm.
  8. Byron Gajewski, Jonathan Mahnken, Nancy Dunton. Improving quality indicator report cards through Bayesian modeling. BMC Medical Research Methodology. 2008;8(1):77. Abstract: "BACKGROUND: The National Database for Nursing Quality Indicators(R) (NDNQI(R)) was established in 1998 to assist hospitals in monitoring indicators of nursing quality (eg, falls and pressure ulcers). Hospitals participating in NDNQI transmit data from nursing units to an NDNQI data repository. Data are summarized and published in reports that allow participating facilities to compare the results for their units with those from other units across the nation. A disadvantage of this reporting scheme is that the sampling variability is not explicit. For example, suppose a small nursing unit that has 2 out of 10 (rate of 20%) patients with pressure ulcers. Should the nursing unit immediately undertake a quality improvement plan because of the rate difference from the national average (7%)? METHODS: In this paper, we propose approximating 95% credible intervals (CrIs) for unit-level data using statistical models that account for the variability in unit rates for report cards. RESULTS: Bayesian CrIs communicate the level of uncertainty of estimates more clearly to decision makers than other significance tests. CONCLUSION: A benefit of this approach is that nursing units would be better able to distinguish problematic or beneficial trends from fluctuations likely due to chance." [Accessed January 3, 2009]. Available at: http://www.biomedcentral.com/1471-2288/8/77.
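
The nursing-unit question in the last abstract (2 of 10 patients with pressure ulcers against a national average of 7%) can be made concrete with a small sketch. It is an illustration only and is not the model from the paper: it assumes a uniform Beta(1,1) prior and a plain binomial likelihood for a single unit, and the function names `beta_cdf` and `beta_quantile` are made up for this example.

```python
from math import comb

def beta_cdf(x, a, b):
    """CDF of a Beta(a, b) distribution for positive integer a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def beta_quantile(p, a, b, tol=1e-10):
    """Invert beta_cdf by bisection on the interval [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Assumed prior: Beta(1, 1), i.e. uniform on the ulcer rate.
# Data from the abstract: 2 events among 10 patients.
events, patients = 2, 10
a_post, b_post = 1 + events, 1 + (patients - events)  # posterior is Beta(3, 9)

lower = beta_quantile(0.025, a_post, b_post)
upper = beta_quantile(0.975, a_post, b_post)
posterior_mean = a_post / (a_post + b_post)
prob_above_national = 1 - beta_cdf(0.07, a_post, b_post)

print(f"posterior mean = {posterior_mean:.3f}")
print(f"95% credible interval = ({lower:.3f}, {upper:.3f})")
print(f"P(unit rate > 7%) = {prob_above_national:.3f}")
```

With only ten patients the interval is wide, which is the point the abstract makes about sampling variability not being explicit in a raw 20% rate.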

All of the material above this paragraph is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2010-07-16. The material below this paragraph links to my old website, StATS. Although I wrote all of the material listed below, my ex-employer, Children's Mercy Hospital, has claimed copyright ownership of this material. The brief excerpts shown here are included under the fair use provisions of U.S. Copyright laws.

2008

  1. Stats: Eliciting a prior distribution for rejection/refusal rates (June 7, 2008). I got a question about the Bayesian model for rejection/refusal rates. I had used three prior distributions in my calculations, a Beta(10,40), a Beta(45,5), and a Beta(25,25). The question was, how did I select those prior distributions.
  2. Stats: Why does a Bayesian approach make sense for monitoring accrual? (May 8, 2008). I'm working with Byron Gajewski to develop some models for monitoring the progress of clinical trials. Too many researchers overpromise and underdeliver on the planned sample size and the planned completion date of their research. This leads to serious delays in the research and inadequate precision and power when the research is completed. We want to develop some tools that will let researchers plan the pattern of patient accrual in their studies. These tools will also let the researchers carefully monitor the progress of their studies and let them take action quickly if accrual rates are suffering. We've adopted a Bayesian approach for these tools. While a Bayesian approach to Statistics is controversial, we feel that there should be no controversy with regard to using Bayesian models in modeling accrual.
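
As a rough illustration of how the three priors named in the first entry above behave, the sketch below applies the standard conjugate beta-binomial update to each one. It assumes a binomial likelihood for the number of refusals, which the excerpt does not state, and the data (12 refusals among 40 people approached) are invented purely for the example.

```python
# The three priors named in the entry, as (a, b) pairs.
priors = {
    "Beta(10,40)": (10, 40),
    "Beta(45,5)": (45, 5),
    "Beta(25,25)": (25, 25),
}

# Hypothetical data, invented for illustration only.
refusals, approached = 12, 40

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def update(a, b, successes, trials):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return a + successes, b + (trials - successes)

for name, (a, b) in priors.items():
    a_post, b_post = update(a, b, refusals, approached)
    print(
        f"{name}: prior mean {beta_mean(a, b):.2f}, "
        f"posterior Beta({a_post},{b_post}), "
        f"posterior mean {beta_mean(a_post, b_post):.2f}"
    )
```

Each prior carries the weight of 50 prior observations (a + b = 50), so with 40 real observations the posterior mean still sits slightly closer to the prior mean than to the observed rate.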

2007
     
  3. Stats: Fitting a beta binomial model using BUGS (April 17, 2007). I've spent a bit of time trying to learn how to run a program called BUGS. The acronym stands for Bayes Using Gibbs Sampling. Here is my first serious attempt to run a BUGS program.
  4. Stats: A simple illustration of the Metropolis algorithm (April 13, 2007). In many situations, you need to generate a random sample from a distribution that is rather complex. When simpler methods for generating a random sample don't work, there is a series of approaches based on the Markov chain principle that can help. Several of these methods (Gibbs sampling, the Metropolis algorithm, and the Metropolis-Hastings algorithm) are collectively called Markov Chain Monte Carlo (MCMC). These approaches are especially valuable in Bayesian data analysis. The simplest of the three methods is the Metropolis algorithm, and here is a simple example of how it works.
  5. Stats: What I'm working on right now (March 18, 2007). There are several research projects where I am actively looking for collaborators. I thought I'd outline these topics briefly here.
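
For readers who want to see the shape of the Metropolis algorithm mentioned in the list above, here is a minimal sketch. It is not the example from the linked article: the target (a standard normal density known only up to a constant), the uniform random-walk proposal, the step size, and the burn-in length are all assumptions chosen to keep the code short.

```python
import math
import random

def metropolis(log_target, start, n_samples, step=2.0, seed=0):
    """Random-walk Metropolis sampler with a symmetric uniform proposal.

    log_target returns the log of the target density up to an additive constant.
    """
    rng = random.Random(seed)
    current = start
    current_lp = log_target(current)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.uniform(-step, step)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        # On rejection the chain stays put and the current value is repeated.
        samples.append(current)
    return samples

def log_std_normal(x):
    """Log density of a standard normal, dropping the normalizing constant."""
    return -0.5 * x * x

samples = metropolis(log_std_normal, start=5.0, n_samples=50_000)
kept = samples[1_000:]  # discard an assumed burn-in of 1,000 draws
sample_mean = sum(kept) / len(kept)
sample_var = sum((x - sample_mean) ** 2 for x in kept) / (len(kept) - 1)
print(f"mean = {sample_mean:.3f}, variance = {sample_var:.3f}")
```

Because the proposal is symmetric, the acceptance ratio only needs the target density up to a constant, which is why this family of methods is so useful when a posterior distribution cannot be normalized in closed form.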

2006
     
  6. Stats: A simple Bayesian model for accrual (November 17, 2006). Suppose you are a researcher in charge of a long term study. You plan to collect data on 120 patients. The goal is to finish your study in ten years, which means getting 12 patients per year or one every thirty days on average. Recruiting patients though appears to be harder than you had expected. You recruited your first patient on day 56, 26 days behind schedule. The second patient is not recruited until day 93. About two years into the study (day 768), you have just recruited your 10th patient. It looks like recruitment might be behind schedule. Is it time to take action? A Bayesian model of accrual times can help you to discern whether recruitment is behind schedule and project an estimated completion date allowing for uncertainty.
  7. Stats: Articles on Bayesian data analysis (March 30, 2006). The Journal of Data Science has a couple of interesting Bayesian papers in the April 2006 issue. The first article addresses a thorny topic, multiple comparisons in an ANOVA model. The second article discusses the teaching of Bayesian statistics.
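
The accrual scenario in the first 2006 entry above (120 patients planned at one every thirty days, with the 10th patient recruited on day 768) can be sketched in a few lines. The excerpt does not spell out the model, so everything beyond those numbers is an assumption: exponential waiting times between patients, a conjugate gamma prior on the accrual rate centered on the planned rate, and a prior weight of 5 patients (the hypothetical `prior_weight`).

```python
import random
import statistics

# Numbers taken from the entry.
target_n = 120
planned_gap_days = 30
planned_finish_day = target_n * planned_gap_days  # day 3600
observed_n = 10
observed_days = 768

# Assumed prior: Gamma(shape, rate) on the daily accrual rate, centered on the
# planned rate of 1/30 per day and worth a hypothetical 5 patients of information.
prior_weight = 5
prior_shape = prior_weight
prior_rate = prior_weight * planned_gap_days

# Conjugate update for exponential waiting times.
post_shape = prior_shape + observed_n
post_rate = prior_rate + observed_days

rng = random.Random(1)
remaining = target_n - observed_n
finish_days = []
for _ in range(10_000):
    # Draw an accrual rate from the posterior (gammavariate takes shape and scale).
    lam = rng.gammavariate(post_shape, 1.0 / post_rate)
    # Time to recruit the remaining patients is a sum of exponential waits.
    finish_days.append(observed_days + rng.gammavariate(remaining, 1.0 / lam))

finish_days.sort()
median_finish = statistics.median(finish_days)
lower = finish_days[int(0.025 * len(finish_days))]
upper = finish_days[int(0.975 * len(finish_days))]
prob_on_time = sum(d <= planned_finish_day for d in finish_days) / len(finish_days)

print(f"median completion day = {median_finish:.0f}")
print(f"95% interval = ({lower:.0f}, {upper:.0f})")
print(f"P(finish by day {planned_finish_day}) = {prob_on_time:.3f}")
```

Changing `prior_weight` shows how strongly the projection depends on confidence in the original plan: a larger weight pulls the projected completion date back toward day 3600.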

2005
     
  8. Stats: Technology to end spam (March 8, 2005). In my job I get a lot of spam, partly because I listed my email address on my web site until just recently. The research community is trying to find technological solutions to spam (unsolicited commercial email), and some of the approaches are quite fascinating. The folks at Microsoft have looked at a system that limits the amount of email that someone can send out in a single day by asking the sender to solve a moderately difficult computational challenge for each piece of email sent. Another interesting approach uses Bayesian Statistics to produce a probability estimate that the message is spam. This approach looks at words that appear commonly in spam messages and uncommonly in legitimate messages.
  9. Stats: Steps in a typical Bayesian model (January 24, 2005). I editorialized a year ago about this on the evidence-based Health List. "Should proponents of EBM be concerned about understanding the Bayesian philosophy? In my opinion, no. I think we'll gradually see Bayesian philosophy creep in to the design and analysis of clinical trials. For example, there are good Bayesian solutions, I understand, to the tricky issue of early stopping of clinical trials. But I doubt that we will see a wholesale rejection of both p-values AND confidence intervals in my lifetime. Too many people like me fail to fully understand the Bayesian paradigm for this to happen. So from a practical viewpoint, most of the medical research for the foreseeable future will be analyzed using the Frequentist paradigm."
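
The spam-filtering idea in the first 2005 entry above, where words common in spam and uncommon in legitimate mail push up the probability that a message is spam, is usually implemented as a naive Bayes calculation. The sketch below is a toy version under loud assumptions: the word probabilities and the 50% prior are invented, words are treated as independent, and words outside the table are ignored.

```python
import math

# Hypothetical probabilities that a word appears in a spam or a legitimate message.
p_word_given_spam = {"free": 0.30, "winner": 0.10, "meeting": 0.01}
p_word_given_legit = {"free": 0.03, "winner": 0.002, "meeting": 0.08}

def spam_probability(words, prior_spam=0.5):
    """Posterior probability of spam, assuming words are independent (naive Bayes)."""
    # Work on the log-odds scale so that each word adds its log likelihood ratio.
    log_odds = math.log(prior_spam / (1 - prior_spam))
    for word in words:
        if word in p_word_given_spam:  # words not in the table are ignored
            log_odds += math.log(p_word_given_spam[word] / p_word_given_legit[word])
    return 1 / (1 + math.exp(-log_odds))

for message in (["free", "winner"], ["meeting"], ["free", "meeting"]):
    print(message, f"{spam_probability(message):.3f}")
```

Real filters estimate the word probabilities from large collections of labeled mail and smooth them so that rare words do not dominate the result.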
