Category: Bayesian statistics

P.Mean: Bayesian statistics (created 2007-05-30).

In Bayesian statistics, the researcher specifies, before the experiment starts, a probability distribution (the prior distribution) that represents his/her degree of belief about the possible values of a parameter of the process being studied. After the data are collected, the Bayesian analysis produces a posterior distribution that combines the information in the data with the information in the prior distribution. Articles are arranged by date, with the most recent entries at the top.
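To make the prior-to-posterior step concrete, here is a minimal sketch in Python. It assumes a single unknown proportion, a flat prior, and hypothetical data of 7 successes in 20 trials. It multiplies prior by likelihood on a grid of candidate values and normalizes the result; a grid is only one way to compute a posterior and is practical only for one or two parameters.

```python
import numpy as np
from math import comb

def grid_posterior(successes, trials, prior, grid):
    """Posterior on a grid: prior times binomial likelihood, normalized to sum to 1."""
    likelihood = comb(trials, successes) * grid**successes * (1 - grid) ** (trials - successes)
    unnormalized = prior * likelihood
    return unnormalized / unnormalized.sum()

# Hypothetical example: flat prior on a proportion, 7 successes in 20 trials.
grid = np.linspace(0.0, 1.0, 1001)
prior = np.ones_like(grid)  # flat (uniform) prior
posterior = grid_posterior(7, 20, prior, grid)

posterior_mean = float(np.sum(grid * posterior))
print(round(posterior_mean, 3))
```

With a flat prior the exact posterior is Beta(8, 14), whose mean is 8/22 (about 0.364), so the grid result can be checked against it.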

Most of the new content will be added to my blog (blog.pmean.com).

Review all blog entries related to Bayesian statistics.

2013

21. P.Mean: Running JAGS from R, a simple example (created 2013-09-04). I was bemoaning the problems with BUGS yesterday, so today I investigated using JAGS instead. JAGS is a stand-alone program like BUGS, and, also like BUGS, it has an interface within R. I want to run it from inside R so I can compare different models and run a few simple simulations. As with BUGS, the first step was to run a simple beta-binomial model. This model is trivial and does not need BUGS, JAGS, or any other fancy package; it is just a quick way to test things.
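Because the beta-binomial model has a closed-form posterior, it makes a good check on any MCMC setup: the sampler's output should agree with the exact answer. The sketch below is written in Python rather than R or JAGS, and the Beta(2, 2) prior and the data (12 successes in 40 trials) are hypothetical. It computes the exact posterior and compares it with direct random draws, which stand in for the chain that JAGS would produce.

```python
import numpy as np

def beta_binomial_posterior(a, b, successes, trials):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return a + successes, b + trials - successes

# Hypothetical prior and data, chosen only for illustration.
a_post, b_post = beta_binomial_posterior(a=2, b=2, successes=12, trials=40)
exact_mean = a_post / (a_post + b_post)

# Direct posterior draws play the role of an MCMC sample.
rng = np.random.default_rng(2013)
draws = rng.beta(a_post, b_post, size=100_000)
lower, upper = np.quantile(draws, [0.025, 0.975])

print(a_post, b_post, round(exact_mean, 4), round(float(draws.mean()), 4))
print("95% interval:", round(float(lower), 3), round(float(upper), 3))
```

If a JAGS run of the same model gives a posterior mean or interval that differs noticeably from these numbers, the problem lies in the setup rather than the model.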

20. P.Mean: Confusion about BUGS (created 2013-09-03). I dabble in various Bayesian statistical models, but the problem is that I get interested and start something, then get distracted, and months or even years pass before I look at it again. That makes it hard for me to make progress. One reason is that I have to relearn everything. That's not the biggest problem, though. When I return to a problem, I find that the world has changed around me. That appears to be true of a recent effort to run BUGS code that I had originally written in April 2012.

2011

19. P.Mean: Why use a Bayesian adaptive trial? (created 2012-03-07). A Bayesian adaptive trial controls the probability of randomizing a patient to each of the proposed dose groups. As data accumulate during the study, these probabilities are updated so that you are less likely to randomize a patient to a dose level that has far too much toxicity, far too little efficacy, or that does not contribute much information about the dose-response curve. A Bayesian adaptive trial also allows you to close certain arms of the trial if a dose is clearly inappropriate for further study.
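One simple way to update randomization probabilities is to randomize in proportion to the posterior probability that each arm is best. The sketch below assumes exactly that rule, binary efficacy outcomes only, Beta(1, 1) priors on each arm's response rate, and hypothetical interim counts for three dose groups. It ignores toxicity and information about the dose-response curve, so it is a simplification of the kind of design described above, not a reproduction of it.

```python
import numpy as np

def allocation_probabilities(successes, failures, n_draws=50_000, seed=0):
    """Posterior probability that each arm has the highest response rate.

    Assumes independent Beta(1, 1) priors; the resulting probabilities can serve
    as randomization weights for the next patient.
    """
    rng = np.random.default_rng(seed)
    successes = np.asarray(successes)
    failures = np.asarray(failures)
    # One row of posterior draws per simulation, one column per arm.
    draws = rng.beta(1 + successes, 1 + failures, size=(n_draws, len(successes)))
    best = np.argmax(draws, axis=1)
    return np.bincount(best, minlength=len(successes)) / n_draws

# Hypothetical interim data for three dose groups.
probs = allocation_probabilities(successes=[3, 6, 9], failures=[12, 9, 6])
print(np.round(probs, 3))
```

Arms whose probability falls below some threshold could be closed, which mirrors the arm-dropping feature mentioned above; the threshold itself would be a design choice.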

18. P.Mean: How you can teach Bayesian methods in an introductory Statistics class and why you should (created 2011-10-17). There's been a discussion among members of the Statistics in Epidemiology Section of the American Statistical Association about what topics should be covered in an introductory Statistics class. Within that discussion there has been a polite but heated debate about whether it is worthwhile to teach Bayesian methods in such a class. Some people were for it, but others thought it would be too confusing. Here's what I wrote about the topic.

17. P.Mean: A simple segmented linear regression model, borrowed from the BUGS manual (created 2011-05-25). I am interested in various extensions to the simple Bayesian model for accrual that Byron Gajewski and I derived and published in Statistics in Medicine. An important extension would be a segmented regression model for accrual that would allow for slow accrual at the start of the study, gradually rising to a steady state of accrual. Before I tackle that extension, I want to see how a simpler segmented regression model works in BUGS. I'm borrowing an example from the BUGS manual.
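A segmented ("broken-stick") regression lets the slope change at an unknown change point c: y = b0 + b1*x + b2*max(x - c, 0). The sketch below is not the BUGS manual example; it swaps in a quick non-Bayesian fit (profile least squares over a grid of candidate change points) on hypothetical data with a slow start followed by a steeper slope. A Bayesian version would put priors on b0, b1, b2, and c instead.

```python
import numpy as np

def design_matrix(x, change_point):
    """Columns for y = b0 + b1*x + b2*max(x - c, 0): the slope changes by b2 at c."""
    hinge = np.maximum(x - change_point, 0.0)
    return np.column_stack([np.ones_like(x), x, hinge])

def fit_segmented(x, y, candidates):
    """Profile least squares: fit the linear part for each candidate c, keep the best."""
    best = None
    for c in candidates:
        X = design_matrix(x, c)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((y - X @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, c, coef)
    return best[1], best[2]

# Hypothetical data: slope 0.5 up to x = 5, then slope 2.0 afterwards.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 41)
y = 1.0 + 0.5 * x + 1.5 * np.maximum(x - 5.0, 0.0) + rng.normal(0.0, 0.1, size=x.size)

c_hat, coef = fit_segmented(x, y, candidates=np.linspace(1.0, 9.0, 81))
print(round(float(c_hat), 2), np.round(coef, 2))
```

The hinge term keeps the fitted line continuous at the change point, which suits a rate that rises gradually rather than jumping.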

16. P.Mean: A simple hierarchical model for the Poisson distribution, borrowed from the BUGS manual (created 2011-05-20). I am interested in various extensions to the simple Bayesian model for accrual that Byron Gajewski and I derived and published in Statistics in Medicine. An important extension would be accrual in multi-center trials. A hierarchical model makes a lot of sense in this case, so I wanted to examine a simple hierarchical model that appears in the BUGS manual.
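The sketch below is not the BUGS manual's example; it is a hand-written Gibbs sampler for a similar Poisson-gamma hierarchy: counts x_i ~ Poisson(theta_i * t_i), unit rates theta_i ~ Gamma(alpha, rate beta), and beta ~ Gamma(0.1, rate 1.0). The data are hypothetical, and the gamma shape alpha is fixed at 1 as a simplifying assumption (a full model would sample it too, which needs a Metropolis step). Both full conditionals are gamma distributions, so each update is a direct draw.

```python
import numpy as np

def gibbs_poisson_gamma(x, t, alpha=1.0, a0=0.1, b0=1.0, n_iter=5000, burn_in=1000, seed=0):
    """Gibbs sampler for x_i ~ Poisson(theta_i * t_i), theta_i ~ Gamma(alpha, rate=beta),
    beta ~ Gamma(a0, rate=b0). Alpha is held fixed (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    beta = 1.0  # starting value
    theta_draws = []
    for i in range(n_iter):
        # theta_i | beta, data ~ Gamma(alpha + x_i, rate = beta + t_i); NumPy uses scale = 1/rate
        theta = rng.gamma(alpha + x, 1.0 / (beta + t))
        # beta | theta ~ Gamma(a0 + n * alpha, rate = b0 + sum(theta))
        beta = rng.gamma(a0 + len(x) * alpha, 1.0 / (b0 + theta.sum()))
        if i >= burn_in:
            theta_draws.append(theta)
    return np.array(theta_draws)

# Hypothetical event counts and exposure times for five units.
counts = [2, 0, 5, 1, 8]
exposure = [10.0, 4.0, 20.0, 6.0, 15.0]
draws = gibbs_poisson_gamma(counts, exposure)
print(np.round(draws.mean(axis=0), 3))
```

The second unit has zero events, yet its posterior mean rate is positive because the shared beta pulls every unit toward the group: the "borrowing strength" that makes a hierarchical model attractive for multi-center accrual.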

2010

15. P.Mean: Transforming the parameter also transforms the prior distribution (created 2010-11-25). My recent work on Bayesian models has forced me to remember some of the mathematical statistics that I had not touched since college. Here's another example of this. Suppose you have a prior distribution on a parameter θ and you want to find the comparable prior for a transformation φ=u(θ).
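For a one-to-one, differentiable transformation, the standard change-of-variables result gives the prior on φ. The worked example (chosen here for illustration) shows that a flat prior on a probability θ becomes a standard logistic prior on the log odds, so a prior that looks "uninformative" on one scale is not flat on another.

```latex
% Change of variables for a one-to-one, differentiable transformation \phi = u(\theta)
p_\phi(\phi) = p_\theta\!\left(u^{-1}(\phi)\right)\left|\frac{d\,u^{-1}(\phi)}{d\phi}\right|

% Example: \theta \sim \mathrm{Uniform}(0,1),\ \phi = u(\theta) = \log\frac{\theta}{1-\theta}
u^{-1}(\phi) = \frac{e^{\phi}}{1+e^{\phi}}, \qquad
\frac{d\,u^{-1}(\phi)}{d\phi} = \frac{e^{\phi}}{\left(1+e^{\phi}\right)^{2}}

p_\phi(\phi) = 1 \cdot \frac{e^{\phi}}{\left(1+e^{\phi}\right)^{2}}, \qquad -\infty < \phi < \infty
```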

14. P.Mean: BUGS is more than just one program (created 2010-11-19). I am working on some Bayesian models that use a program called BUGS. BUGS stands for Bayesian Inference Using Gibbs Sampling. There are several ways you can run BUGS, and it is worthwhile to note why there are multiple programs.

13. P.Mean: Ambiguity in the definition of the exponential distribution (created 2010-11-16). I'm trying to run some Bayesian analyses using a program called BUGS (Bayesian inference Using Gibbs Sampling), and this requires me to specify a prior distribution for the parameter associated with an exponential waiting time. I'm having more trouble than I should because the exponential distribution is defined two different ways: in terms of its rate or in terms of its mean.
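The two definitions describe the same distribution with reciprocal parameters: rate λ gives density λe^(−λx) and mean 1/λ, while mean μ gives density (1/μ)e^(−x/μ). BUGS and JAGS write dexp(lambda) with the rate, whereas NumPy's exponential sampler takes the mean as its `scale` argument. The sketch below uses a hypothetical mean waiting time of 30 days to show that both forms agree once the parameter is converted, and that NumPy expects the mean.

```python
import numpy as np

# Rate form: f(x) = lam * exp(-lam * x), mean = 1 / lam   (as in BUGS/JAGS dexp(lam))
# Mean form: f(x) = (1 / mu) * exp(-x / mu), mean = mu    (as in NumPy's `scale`)
mean_wait = 30.0          # hypothetical average days between patients
rate = 1.0 / mean_wait    # the same distribution expressed as a rate

def density_rate(x, lam):
    return lam * np.exp(-lam * x)

def density_mean(x, mu):
    return np.exp(-x / mu) / mu

rng = np.random.default_rng(2010)
# NumPy's exponential takes the MEAN (scale), not the rate.
waits = rng.exponential(scale=mean_wait, size=200_000)

print(round(float(waits.mean()), 1))
print(np.isclose(density_rate(10.0, rate), density_mean(10.0, mean_wait)))
```

Passing the rate (1/30) where a mean is expected would produce waits averaging about 0.03 days, the kind of silent error that the ambiguity invites.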

2009

12. The Monthly Mean: You, too, can understand Bayesian data analysis (July/August 2009)

2008

11. P.Mean: What does the FDA think about Bayesian statistics? (created 2008-07-08). The FDA is, in general, a cautious agency (as it should be), but it is allowing newer approaches for establishing the efficacy and safety of new drugs. Many of these new approaches involve Bayesian methods. A draft guidance, "Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials - Draft Guidance for Industry and FDA Staff", is available in HTML format or PDF format.

10. P.Mean: Distrust of a Bayesian meta-analysis (created 2008-07-01). A regular correspondent on the evidence based health email discussion group (BA) raised some questions about the use of a Bayesian hierarchical model in a meta-analysis. He was worried about whether this approach would be appropriate for this type of data.

Outside resources:

Peter D. Congdon. Applied Bayesian Hierarchical Methods. Chapman and Hall/CRC; 2010. Excerpt: "The use of Markov chain Monte Carlo (MCMC) methods for estimating hierarchical models involves complex data structures and is often described as a revolutionary development. An intermediate-level treatment of Bayesian hierarchical models and their applications, Applied Bayesian Hierarchical Methods demonstrates the advantages of a Bayesian approach to data sets involving inferences for collections of related units or variables and in methods where parameters can be treated as random collections. Emphasizing computational issues, the book provides examples of the following application settings: meta-analysis, data structured in space or time, multilevel and longitudinal data, multivariate data, nonlinear regression, and survival time data. For the worked examples, the text mainly employs the WinBUGS package, allowing readers to explore alternative likelihood assumptions, regression structures, and assumptions on prior densities. It also incorporates BayesX code, which is particularly useful in nonlinear regression. To demonstrate MCMC sampling from first principles, the author includes worked examples using the R package. Through illustrative data analysis and attention to statistical computing, this book focuses on the practical implementation of Bayesian hierarchical methods. It also discusses several issues that arise when applying Bayesian techniques in hierarchical and random effects models."

Journal article: Gabriela Espino-Hernandez, Paul Gustafson, Igor Burstyn. Bayesian adjustment for measurement error in continuous exposures in an individually matched case-control study. BMC Medical Research Methodology. 2011;11(1):67. Abstract: "BACKGROUND: In epidemiological studies explanatory variables are frequently subject to measurement error. The aim of this paper is to develop a Bayesian method to correct for measurement error in multiple continuous exposures in individually matched case-control studies. This is a topic that has not been widely investigated. The new method is illustrated using data from an individually matched case-control study of the association between thyroid hormone levels during pregnancy and exposure to perfluorinated acids. The objective of the motivating study was to examine the risk of maternal hypothyroxinemia due to exposure to three perfluorinated acids measured on a continuous scale. Results from the proposed method are compared with those obtained from a naive analysis. METHODS: Using a Bayesian approach, the developed method considers a classical measurement error model for the exposures, as well as the conditional logistic regression likelihood as the disease model, together with a random-effect exposure model. Proper and diffuse prior distributions are assigned, and results from a quality control experiment are used to estimate the perfluorinated acids' measurement error variability. As a result, posterior distributions and 95% credible intervals of the odds ratios are computed. A sensitivity analysis of method's performance in this particular application with different measurement error variability was performed. RESULTS: The proposed Bayesian method to correct for measurement error is feasible and can be implemented using statistical software. For the study on perfluorinated acids, a comparison of the inferences which are corrected for measurement error to those which ignore it indicates that little adjustment is manifested for the level of measurement error actually exhibited in the exposures. Nevertheless, a sensitivity analysis shows that more substantial adjustments arise if larger measurement errors are assumed. CONCLUSIONS: In individually matched case-control studies, the use of conditional logistic regression likelihood as a disease model in the presence of measurement error in multiple continuous exposures can be justified by having a random-effect exposure model. The proposed method can be successfully implemented in WinBUGS to correct individually matched case-control studies for several mismeasured continuous exposures under a classical measurement error model." [Accessed on May 17, 2011]. Available at: http://www.biomedcentral.com/1471-2288/11/67

Joseph G. Ibrahim, Ming-Hui Chen, Robert J. Gray. Bayesian Models for Gene Expression with DNA Microarray Data. Journal of the American Statistical Association. 2002;97(457):88-99. Abstract: "Two of the critical issues that arise when examining DNA microarray data are (1) determination of which genes best discriminate among the different types of tissue, and (2) characterization of expression patterns in tumor tissues. For (1), there are many genes that characterize DNA expression, and it is of critical importance to try and identify a small set of genes that best discriminate between normal and tumor tissues. For (2), it is critical to be able to characterize the DNA expression of the normal and tumor tissue samples and develop suitable models that explain patterns of DNA expression for these types of tissues. Toward this goal, we propose a novel Bayesian model for analyzing DNA microarray data and propose a model selection methodology for identifying subsets of genes that show different expression levels between normal and cancer tissues. In addition, we propose a novel class of hierarchical priors for the parameters that allow us to borrow strength across genes for making inference. The properties of the priors are examined in detail. We introduce a Bayesian model selection criterion for assessing the various models, and develop Markov chain Monte Carlo algorithms for sampling from the posterior distributions of the parameters and for computing the criterion. We present a detailed case study in endometrial cancer to demonstrate our proposed methodology." [Accessed December 2, 2009]. Available at: http://www.jstor.org/stable/3085761.

Jing Cao, Xian-Jin Xie, Song Zhang, Angelique Whitehurst, Michael White. Bayesian optimal discovery procedure for simultaneous significance testing. BMC Bioinformatics. 2009;10(1):5. Abstract: "BACKGROUND: In high throughput screening, such as differential gene expression screening, drug sensitivity screening, and genome-wide RNAi screening, tens of thousands of tests need to be conducted simultaneously. However, the number of replicate measurements per test is extremely small, rarely exceeding 3. Several current approaches demonstrate that test statistics with shrinking variance estimates have more power over the traditional t statistic. RESULTS: We propose a Bayesian hierarchical model to incorporate the shrinkage concept by introducing a mixture structure on variance components. The estimates from the Bayesian model are utilized in the optimal discovery procedure (ODP) proposed by Storey in 2007, which was shown to have optimal performance in multiple significance tests. We compared the performance of the Bayesian ODP with several competing test statistics. CONCLUSION: We have conducted simulation studies with 2 to 6 replicates per gene. We have also included test results from two real datasets. The Bayesian ODP outperforms the other methods in our study, including the original ODP. The advantage of the Bayesian ODP becomes more significant when there are few replicates per test. The improvement over the original ODP is based on the fact that Bayesian model borrows strength across genes in estimating unknown parameters. The proposed approach is efficient in computation due to the conjugate structure of the Bayesian model. The R code (see Additional file 1) to calculate the Bayesian ODP is provided." [Accessed February 23, 2009]. Available at: http://www.biomedcentral.com/1471-2105/10/5.

Journal article: Eric B Meltzer, William T Barry, Thomas A D'Amico, Robert D Davis, Shu S Lin, Mark W Onaitis, Lake D Morrison, Thomas A Sporn, Mark P Steele, et al. Bayesian Probit Regression Model for the Diagnosis of Pulmonary Fibrosis: Proof-of-Principle. BMC Medical Genomics. 2011;4(1):70. Abstract: "BACKGROUND: The accurate diagnosis of idiopathic pulmonary fibrosis (IPF) is a major clinical challenge. We developed a model to diagnose IPF by applying Bayesian probit regression (BPR) modelling to gene expression profiles of whole lung tissue. METHODS: Whole lung tissue was obtained from patients with idiopathic pulmonary fibrosis (IPF) undergoing surgical lung biopsy or lung transplantation. Controls were obtained from normal organ donors. We performed cluster analyses to explore differences in our dataset. No significant difference was found between samples obtained from different lobes of the same patient. A significant difference was found between samples obtained at biopsy versus explant. Following preliminary analysis of the complete dataset, we selected three subsets for the development of diagnostic gene signatures: the first signature was developed from all IPF samples (as compared to controls); the second signature was developed from the subset of IPF samples obtained at biopsy; the third signature was developed from IPF explants. To assess the validity of each signature, we used an independent cohort of IPF and normal samples. Each signature was used to predict phenotype (IPF versus normal) in samples from the validation cohort. We compared the models' predictions to the true phenotype of each validation sample, and then calculated sensitivity, specificity and accuracy. RESULTS: Surprisingly, we found that all three signatures were reasonably valid predictors of diagnosis, with small differences in test sensitivity, specificity and overall accuracy. CONCLUSIONS: This study represents the first use of BPR on whole lung tissue; previously, BPR was primarily used to develop predictive models for cancer. This also represents the first report of an independently validated IPF gene expression signature. In summary, BPR is a promising tool for the development of gene expression signatures from non-neoplastic lung tissue. In the future, BPR might be used to develop definitive diagnostic gene signatures for IPF, prognostic gene signatures for IPF or gene signatures for other non-neoplastic lung disorders such as bronchiolitis obliterans." [Accessed on October 11, 2011].

Laurence Freedman. Bayesian statistical methods. BMJ. 1996;313(7057):569-570. Excerpt: "In this week's BMJ, Lilford and Braunholtz (p 603) explain the basis of Bayesian statistical theory.[1] They explore its use in evaluating evidence from medical research and incorporating such evidence into policy decisions about public health. When drawing inferences from statistical data, Bayesian theory is an alternative to the frequentist theory that has predominated in medical research over the past half century." [Accessed September 24, 2010]. Available at: http://www.bmj.com/content/313/7057/569.short.

Peter Congdon. Bayesian Statistical Modelling. 2nd ed. Wiley; 2007. Description: A fairly technical book, but what book about Bayesian methods is not? The first chapter provides a detailed explanation of Markov Chain Monte Carlo. The remaining chapters provide some very sophisticated examples. Excerpt: "Bayesian methods combine the evidence from the data at hand with previous quantitative knowledge to analyse practical problems in a wide range of areas. The calculations were previously complex, but it is now possible to routinely apply Bayesian methods due to advances in computing technology and the use of new sampling methods for estimating parameters. Such developments together with the availability of freeware such as WINBUGS and R have facilitated a rapid growth in the use of Bayesian methods, allowing their application in many scientific disciplines, including applied statistics, public health research, medical science, the social sciences and economics. Following the success of the first edition, this reworked and updated book provides an accessible approach to Bayesian computing and analysis, with an emphasis on the principles of prior selection, identification and the interpretation of real data sets. The second edition:
* Provides an integrated presentation of theory, examples, applications and computer algorithms.
* Discusses the role of Markov Chain Monte Carlo methods in computing and estimation.
* Includes a wide range of interdisciplinary applications, and a large selection of worked examples from the health and social sciences.
* Features a comprehensive range of methodologies and modelling techniques, and examines model fitting in practice using Bayesian principles.
* Provides exercises designed to help reinforce the reader's knowledge and a supplementary website containing data sets and relevant programs.
Bayesian Statistical Modelling is ideal for researchers in applied statistics, medical science, public health and the social sciences, who will benefit greatly from the examples and applications featured. The book will also appeal to graduate students of applied statistics, data analysis and Bayesian methods, and will provide a great source of reference for both researchers and students."

Michael Coory, Rachael Wills, Adrian Barnett. Bayesian versus frequentist statistical inference for investigating a one-off cancer cluster reported to a health department. BMC Medical Research Methodology. 2009;9(1):30. Abstract: "BACKGROUND: The problem of silent multiple comparisons is one of the most difficult statistical problems faced by scientists. It is a particular problem for investigating a one-off cancer cluster reported to a health department because any one of hundreds, or possibly thousands, of neighbourhoods, schools, or workplaces could have reported a cluster, which could have been for any one of several types of cancer or any one of several time periods. METHODS: This paper contrasts the frequentist approach with a Bayesian approach for dealing with silent multiple comparisons in the context of a one-off cluster reported to a health department. Two published cluster investigations were re-analysed using the Dunn-Sidak method to adjust frequentist p-values and confidence intervals for silent multiple comparisons. Bayesian methods were based on the Gamma distribution. RESULTS: Bayesian analysis with non-informative priors produced results similar to the frequentist analysis, and suggested that both clusters represented a statistical excess. In the frequentist framework, the statistical significance of both clusters was extremely sensitive to the number of silent multiple comparisons, which can only ever be a subjective "guesstimate". The Bayesian approach is also subjective: whether there is an apparent statistical excess depends on the specified prior. CONCLUSIONS: In cluster investigations, the frequentist approach is just as subjective as the Bayesian approach, but the Bayesian approach is less ambitious in that it treats the analysis as a synthesis of data and personal judgements (possibly poor ones), rather than objective reality. Bayesian analysis is (arguably) a useful tool to support complicated decision-making, because it makes the uncertainty associated with silent multiple comparisons explicit." [Accessed May 19, 2009]. Available at: http://www.biomedcentral.com/1471-2288/9/30.

Casey Olives, Marcello Pagano. Bayes-LQAS: classifying the prevalence of global acute malnutrition. Emerging Themes in Epidemiology. 2010;7(1):3. Abstract: "Lot Quality Assurance Sampling (LQAS) applications in health have generally relied on frequentist interpretations for statistical validity. Yet health professionals often seek statements about the probability distribution of unknown parameters to answer questions of interest. The frequentist paradigm does not pretend to yield such information, although a Bayesian formulation might. This is the source of an error made in a recent paper published in this journal. Many applications lend themselves to a Bayesian treatment, and would benefit from such considerations in their design. We discuss Bayes-LQAS (B-LQAS), which allows for incorporation of prior information into the LQAS classification procedure, and thus shows how to correct the aforementioned error. Further, we pay special attention to the formulation of Bayes Operating Characteristic Curves and the use of prior information to improve survey designs. As a motivating example, we discuss the classification of Global Acute Malnutrition prevalence and draw parallels between the Bayes and classical classifications schemes. We also illustrate the impact of informative and non-informative priors on the survey design. Results indicate that using a Bayesian approach allows the incorporation of expert information and/or historical data and is thus potentially a valuable tool for making accurate and precise classifications." [Accessed July 16, 2010]. Available at: http://www.ete-online.com/content/7/1/3.

N Stallard, P F Thall, J Whitehead. Decision theoretic designs for phase II clinical trials with multiple outcomes. Biometrics. 1999;55(3):971-977. Abstract: "In many phase II clinical trials, it is essential to assess both efficacy and safety. Although several phase II designs that accommodate multiple outcomes have been proposed recently, none are derived using decision theory. This paper describes a Bayesian decision theoretic strategy for constructing phase II designs based on both efficacy and adverse events. The gain function includes utilities assigned to patient outcomes, a reward for declaring the new treatment promising, and costs associated with the conduct of the phase II trial and future phase III testing. A method for eliciting gain function parameters from medical collaborators and for evaluating the design's frequentist operating characteristics is described. The strategy is illustrated by application to a clinical trial of peripheral blood stem cell transplantation for multiple myeloma." [Accessed December 2, 2009]. Available at: http://www.ncbi.nlm.nih.gov/pubmed/11315037.

Sander Greenland, James M. Robins. Empirical-Bayes Adjustments for Multiple Comparisons Are Sometimes Useful. Epidemiology. 1991;2(4):244-251. Abstract: "Rothman recommends against adjustments for multiple comparisons. Implicit in his recommendation, however, is an assumption that the sole objective of the data analysis is to report and scientifically interpret the data. We concur with his recommendation when this assumption is correct and one is willing to abandon frequentist interpretations of the summary statistics. Nevertheless, there are situations in which an additional or even primary goal of analysis is to reach a set of decisions based on the data. In such situations, Bayes and empirical-Bayes adjustments can provide a better basis for the decisions than conventional procedures." [Accessed December 2, 2009]. Available at: http://www.jstor.org/stable/20065674.

R J Lilford, D Braunholtz. For Debate: The statistical basis of public policy: a paradigm shift is overdue. BMJ. 1996;313(7057):603-607. Excerpt: "The recent controversy over the increased risk of venous thrombosis with third generation oral contraceptives illustrates the public policy dilemma that can be created by relying on conventional statistical tests and estimates: case-control studies showed a significant increase in risk and forced a decision either to warn or not to warn. Conventional statistical tests are an improper basis for such decisions because they dichotomise results according to whether they are or are not significant and do not allow decision makers to take explicit account of additional evidence, for example, of biological plausibility or of biases in the studies. A Bayesian approach overcomes both these problems. A Bayesian analysis starts with a 'prior' probability distribution for the value of interest (for example, a true relative risk), based on previous knowledge, and adds the new evidence (via a model) to produce a 'posterior' probability distribution. Because different experts will have different prior beliefs sensitivity analyses are important to assess the effects on the posterior distributions of these differences. Sensitivity analyses should also examine the effects of different assumptions about biases and about the model which links the data with the value of interest. One advantage of this method is that it allows such assumptions to be handled openly and explicitly. Data presented as a series of posterior probability distributions would be a much better guide to policy, reflecting the reality that degrees of belief are often continuous, not dichotomous, and often vary from one person to another in the face of inconclusive evidence." [Accessed September 24, 2010]. Available at: http://www.bmj.com/content/313/7057/603.short.

U.S. Food and Drug Administration. Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials. Excerpt: "This document provides guidance on statistical aspects of the design and analysis of clinical trials for medical devices that use Bayesian statistical methods. The purpose of this guidance is to discuss important statistical issues in Bayesian clinical trials for medical devices and not to describe the content of a medical device submission. Further, while this document provides guidance on many of the statistical issues that arise in Bayesian clinical trials, it is not intended to be all-inclusive. The statistical literature is rich with books and papers on Bayesian theory and methods; a selected bibliography has been included for further discussion of specific topics. FDA's guidance documents, including this guidance, do not establish legally enforceable responsibilities. Instead, guidances describe the Agency's current thinking on a topic and should be viewed only as recommendations, unless specific regulatory or statutory requirements are cited. The use of the word should in Agency guidances means that something is suggested or recommended, but not required." [Accessed October 13, 2009]. Available at: http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm071072.htm.

Byron Gajewski, Jonathan Mahnken, Nancy Dunton. Improving quality indicator report cards through Bayesian modeling. BMC Medical Research Methodology. 2008;8(1):77. Abstract: "BACKGROUND: The National Database for Nursing Quality Indicators(R) (NDNQI(R)) was established in 1998 to assist hospitals in monitoring indicators of nursing quality (eg, falls and pressure ulcers). Hospitals participating in NDNQI transmit data from nursing units to an NDNQI data repository. Data are summarized and published in reports that allow participating facilities to compare the results for their units with those from other units across the nation. A disadvantage of this reporting scheme is that the sampling variability is not explicit. For example, suppose a small nursing unit that has 2 out of 10 (rate of 20%) patients with pressure ulcers. Should the nursing unit immediately undertake a quality improvement plan because of the rate difference from the national average (7%). METHODS: In this paper, we propose approximating 95% credible intervals (CrIs) for unit-level data using statistical models that account for the variability in unit rates for report cards. RESULTS: Bayesian CrIs communicate the level of uncertainty of estimates more clearly to decision makers than other significance tests. CONCLUSION: A benefit of this approach is that nursing units would be better able to distinguish problematic or beneficial trends from fluctuations likely due to chance." [Accessed January 3, 2009]. Available at: http://www.biomedcentral.com/1471-2288/8/77.

Gillian D. Sanders, Lurdes Inoue, Gregory Samsa, Shalini Kulasingam, David Matchar. Use of Bayesian Techniques in Randomized Clinical Trials: A CMS Case Study. Excerpt: "We provide a basic tutorial on Bayesian statistics and the possible uses of such statistics in clinical trial design and analysis. We conducted a synthesis of existing published research focusing on how Bayesian techniques can modify inferences that affect policy-level decisionmaking. Noting that subgroup analysis is a particularly fruitful application of Bayesian methodology, and an area of particular interest to CMS, we focused our efforts there rather on the design of such trials. We used simulation studies and a case study of patient-level data from eight trials to explore Bayesian techniques in the CMS decisional context in the clinical domain of the prevention of sudden cardiac death and the use of the implantable cardioverter defibrillator (ICD). We combined knowledge gained through the literature review, simulation studies, and the case study to provide findings concerning the use of Bayesian approaches specific to the CMS context." [Accessed September 24, 2010]. Available at: http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/use_of_bayesian.html.

All of the material above this paragraph is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon. Anything below this paragraph represents material from my old website, StATS. Until recently (June 2012), this material was available through Children's Mercy Hospital, but is no longer available there. Although I do not hold clear copyright for this material, I am reproducing it here as a service. See my old website page for more details.

2008

9. Stats: Eliciting a prior distribution for rejection/refusal rates (June 7, 2008). I got a question about the Bayesian model for rejection/refusal rates. I had used three prior distributions in my calculations: a Beta(10,40), a Beta(45,5), and a Beta(25,25). The question was how I selected those prior distributions.
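One common way to read a Beta(a, b) prior, not necessarily the reasoning used for the three priors above, is to treat a/(a+b) as the prior guess for the rate and a+b as the number of "prior observations" the guess is worth. The short sketch below applies that reading: all three priors carry the same weight (50 pseudo-observations) and differ only in where they are centered, at 20%, 90%, and 50%.

```python
def beta_summary(a, b):
    """Prior mean, effective prior sample size, and standard deviation of Beta(a, b)."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, a + b, variance ** 0.5

for a, b in [(10, 40), (45, 5), (25, 25)]:
    mean, n_eff, sd = beta_summary(a, b)
    print(f"Beta({a},{b}): mean={mean:.2f}, prior sample size={n_eff}, sd={sd:.3f}")
```

Under this reading, combining a prior with m refusals out of n contacts gives a Beta(a + m, b + n - m) posterior, so the data dominate once n grows well beyond 50.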

8. Stats: Why does a Bayesian approach make sense for monitoring accrual? (May 8, 2008). I'm working with Byron Gajewski to develop some models for monitoring the progress of clinical trials. Too many researchers overpromise and underdeliver on the planned sample size and the planned completion date of their research. This leads to serious delays in the research and inadequate precision and power when the research is completed. We want to develop some tools that will let researchers plan the pattern of patient accrual in their studies. These tools will also let the researchers carefully monitor the progress of their studies and let them take action quickly if accrual rates are suffering. We've adopted a Bayesian approach for these tools. While a Bayesian approach to Statistics is controversial, we feel that there should be no controversy with regard to using Bayesian models in modeling accrual.

2007

7. Stats: Fitting a beta binomial model using BUGS (April 17, 2007). I've spent a bit of time trying to learn how to run a program called BUGS. The acronym stands for Bayesian inference Using Gibbs Sampling. Here is my first serious attempt to run a BUGS program.
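The BUGS code from that entry is not reproduced here. Because the beta prior is conjugate to the binomial likelihood, the posterior a BUGS run should approximate is available in closed form: a Beta(a,b) prior plus x successes in n trials gives a Beta(a+x, b+n-x) posterior. The sketch below computes it directly; the flat Beta(1,1) prior and the data (7 successes in 20 trials) are hypothetical values chosen for illustration.

```python
import random
import statistics
from dataclasses import dataclass

@dataclass
class BetaPosterior:
    a: float
    b: float

    def mean(self):
        return self.a / (self.a + self.b)

def beta_binomial_update(prior_a, prior_b, successes, trials):
    """Conjugate update: a Beta(a, b) prior plus x successes in n trials
    gives a Beta(a + x, b + n - x) posterior."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return BetaPosterior(prior_a + successes, prior_b + trials - successes)

# Hypothetical data: flat prior, 7 successes in 20 trials.
post = beta_binomial_update(1, 1, 7, 20)

# Monte Carlo draws, roughly the kind of summary a BUGS run reports.
rng = random.Random(2007)
draws = [rng.betavariate(post.a, post.b) for _ in range(20_000)]
print(post, post.mean(), statistics.fmean(draws))
```

If the simulated mean from BUGS (or from the draws above) drifts far from `post.mean()`, that points to a coding or convergence problem rather than a property of the model.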

6. Stats: A simple illustration of the Metropolis algorithm (April 13, 2007). In many situations, you need to generate a random sample from a distribution that is rather complex. When simpler methods for generating a random sample don't work, a family of approaches based on Markov chains can help. These methods, which include Gibbs sampling, the Metropolis algorithm, and the Metropolis-Hastings algorithm, are collectively called Markov chain Monte Carlo (MCMC). These approaches are especially valuable in Bayesian data analysis. The simplest of the three methods is the Metropolis algorithm, and here is a simple example of how it works.
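A minimal random-walk Metropolis sampler is sketched below. It assumes a symmetric normal proposal, which is what makes this plain Metropolis rather than Metropolis-Hastings: a proposed move is accepted with probability min(1, target(proposal)/target(current)), and only an unnormalized target density is needed. The target here, the posterior for a proportion after a hypothetical 7 successes in 20 trials with a flat prior, is chosen because its exact answer, Beta(8,14), is known, so the sampler can be checked.

```python
import math
import random
import statistics

def metropolis(log_target, start, n_samples, step=0.1, seed=42):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_target returns the log of a possibly unnormalized density.
    The normal proposal is symmetric, so no Hastings correction is needed.
    """
    rng = random.Random(seed)
    x, log_p = start, log_target(start)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)  # a rejected move repeats the current value
    return samples

def log_post(p):
    """Unnormalized log posterior: flat prior, 7 successes in 20 trials."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero density outside (0, 1): always rejected
    return 7 * math.log(p) + 13 * math.log(1 - p)

samples = metropolis(log_post, start=0.5, n_samples=20_000)
kept = samples[1_000:]  # discard burn-in
print(statistics.fmean(kept))  # should be close to 8/22
```

The step size is a tuning choice: too small and the chain crawls, too large and most proposals are rejected.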

5. Stats: What I'm working on right now (March 18, 2007). There are several research projects where I am actively looking for collaborators. I thought I'd outline these topics briefly here.

2006

4. Stats: A simple Bayesian model for accrual (November 17, 2006). Suppose you are a researcher in charge of a long-term study. You plan to collect data on 120 patients. The goal is to finish your study in ten years, which means getting 12 patients per year, or one every thirty days on average. Recruiting patients, though, appears to be harder than you had expected. You recruited your first patient on day 56, 26 days behind schedule. The second patient is not recruited until day 93. About two years into the study (day 768), you have just recruited your 10th patient. It looks like recruitment might be behind schedule. Is it time to take action? A Bayesian model of accrual times can help you discern whether recruitment is behind schedule and project an estimated completion date that allows for uncertainty.
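The sketch below is one simple way to set up such a model, not necessarily the one used in that entry. It assumes exponential waiting times between patients and a conjugate gamma prior on the accrual rate (patients per day). The prior is centered on the planned pace of one patient per 30 days, and `prior_weight`, the number of pseudo-patients that plan is worth, is a hypothetical tuning value. With 10 patients by day 768, the posterior for the rate is Gamma(prior_weight + 10, prior_weight * 30 + 768), and simulating the time for the remaining 110 patients gives a predictive distribution for the completion day.

```python
import random
import statistics

def project_completion(n_target, n_so_far, days_so_far,
                       prior_days_per_patient, prior_weight,
                       n_sims=10_000, seed=1):
    """Predictive 2.5%, 50%, and 97.5% completion days, assuming
    exponential waiting times and a gamma prior on the accrual rate."""
    rng = random.Random(seed)
    # Gamma posterior on the rate: prior pseudo-patients plus observed
    # patients, over prior pseudo-days plus observed days.
    shape = prior_weight + n_so_far
    rate = prior_weight * prior_days_per_patient + days_so_far
    remaining = n_target - n_so_far
    days = []
    for _ in range(n_sims):
        lam = rng.gammavariate(shape, 1.0 / rate)  # draw an accrual rate
        # Sum of `remaining` exponential waits with rate lam ~ Gamma(remaining, 1/lam).
        days.append(days_so_far + rng.gammavariate(remaining, 1.0 / lam))
    days.sort()
    return (days[int(0.025 * n_sims)], statistics.median(days),
            days[int(0.975 * n_sims)])

# The scenario from the entry; prior_weight=12 is a hypothetical choice.
low, median, high = project_completion(120, 10, 768,
                                       prior_days_per_patient=30.0,
                                       prior_weight=12)
print(f"median completion day {median:.0f} (95% interval {low:.0f} to {high:.0f})")
```

A small `prior_weight` lets the observed slow accrual dominate; a large one keeps the projection close to the original plan of about 3650 days.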

3. Stats: Articles on Bayesian data analysis (March 30, 2006). The Journal of Data Science has a couple of interesting Bayesian papers in the April 2006 issue. The first article addresses a thorny topic, multiple comparisons in an ANOVA model. The second article discusses the teaching of Bayesian statistics.

2005

2. Stats: Technology to end spam (March 8, 2005). In my job I get a lot of spam, partly because I listed my email address on my web site until just recently. The research community is trying to find technological solutions to spam (unsolicited commercial email), and some of the approaches are quite fascinating. The folks at Microsoft have looked at a system that limits the amount of email that someone can send out in a single day by asking the sender to solve a moderately difficult computational challenge for each piece of email sent. Another interesting approach uses Bayesian statistics to produce a probability estimate that a message is spam. This approach looks at words that appear commonly in spam messages and uncommonly in legitimate messages.
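A simplified naive Bayes spam score is sketched below. It assumes that words occur independently given the message class and, like many practical filters, it looks only at words present in the message. Add-one smoothing keeps a word never seen in one class from forcing the probability to exactly 0 or 1. The tiny training set and the equal prior odds are hypothetical, for illustration only.

```python
import math
from collections import Counter

def train(messages):
    """For each word, count how many messages contain it."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts

def spam_probability(text, spam_counts, n_spam, ham_counts, n_ham,
                     prior_spam=0.5):
    """Naive Bayes posterior P(spam | words present), add-one smoothed."""
    log_spam = math.log(prior_spam)
    log_ham = math.log(1 - prior_spam)
    for word in set(text.lower().split()):
        # Smoothed probability that a message of each class contains the word.
        log_spam += math.log((spam_counts[word] + 1) / (n_spam + 2))
        log_ham += math.log((ham_counts[word] + 1) / (n_ham + 2))
    # Convert the log odds back to a probability.
    return 1 / (1 + math.exp(log_ham - log_spam))

# Hypothetical training data.
spam = ["cheap pills now", "win cheap prize now"]
ham = ["meeting notes attached", "lunch meeting now"]
spam_counts, ham_counts = train(spam), train(ham)

p1 = spam_probability("cheap pills", spam_counts, len(spam), ham_counts, len(ham))
p2 = spam_probability("meeting notes", spam_counts, len(spam), ham_counts, len(ham))
print(round(p1, 3), round(p2, 3))
```

Words like "cheap" that appear in spam but not in legitimate mail push the probability up; "meeting" pushes it down.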

1. Stats: Steps in a typical Bayesian model (January 24, 2005). I editorialized about this a year ago on the Evidence-Based Health list. "Should proponents of EBM be concerned about understanding the Bayesian philosophy? In my opinion, no. I think we'll gradually see Bayesian philosophy creep in to the design and analysis of clinical trials. For example, there are good Bayesian solutions, I understand, to the tricky issue of early stopping of clinical trials. But I doubt that we will see a wholesale rejection of both p-values AND confidence intervals in my lifetime. Too many people like me fail to fully understand the Bayesian paradigm for this to happen. So from a practical viewpoint, most of the medical research for the foreseeable future will be analyzed using the Frequentist paradigm."
