P.Mean >> Category >> Observational studies (created 2007-06-26).

Observational studies are studies where the experimenter does not choose who goes into the control group and who goes into the treatment/exposure group. Rather, the patients and/or their physicians make this choice, or the groups were already intact before the research started. Observational studies raise some important methodological challenges, but when they are used carefully, they provide valuable insights that are not possible with other research designs. Also see Category: Covariate adjustment or Category: Randomization in research.


15. P.Mean: Data sources for a proposed course on secondary data analysis (created 2013-08-07). I am giving a talk at the Joint Statistical Meetings (JSM) in Toronto. I'm still tweaking the slides just a few hours before the talk. The title is "Data sources for a proposed course on secondary data analysis." On this page, I want to provide a link to the PDF file of the slides and share a story about this talk.


14. P.Mean: Debating the validity of snowball sampling (created 2012-10-01). Someone on a discussion forum for IRB members criticized snowball sampling for a range of reasons, including, interestingly from my perspective, that it produces bad research. He asked "Why would anybody want to use snowball sampling? As non-probability sampling the results can't be generalized to a known universe." That's an interesting perspective, but one I disagree with. Here are my thoughts on the issue.
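To make the recruitment mechanism concrete, here is a minimal Python sketch of snowball (chain-referral) sampling over a hypothetical referral network. The function name `snowball_sample`, the dict-of-lists network, and the fixed number of referral waves are illustrative assumptions, not a standard implementation. It shows why inclusion depends on who knows whom: a person with no referral path from the seeds has no chance of being sampled, and nobody's inclusion probability is known in advance.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_waves):
    """Recruit participants wave by wave through a referral network.

    referrals: hypothetical dict mapping each person to the people they would refer.
    seeds: initial participants chosen by the researcher (wave 0).
    max_waves: number of referral rounds to follow after the seeds.
    Returns a dict mapping each recruited person to the wave in which they entered.
    """
    wave_of = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        if wave_of[person] >= max_waves:
            continue  # this person is in the last wave; their referrals are not followed
        for contact in referrals.get(person, []):
            if contact not in wave_of:  # each person is recruited at most once
                wave_of[contact] = wave_of[person] + 1
                queue.append(contact)
    return wave_of


# Hypothetical referral network for illustration only.
network = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "G": ["A"]}
sample = snowball_sample(network, seeds=["A"], max_waves=2)
print(sample)  # {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 2}
```

Here G is never recruited because nobody refers G, and F is missed only because recruitment stopped after two waves.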


13. The Monthly Mean: Is this a case control design? (May/June 2009) and P.Mean: Is this a case-control design (created 2009-04-28). I have a study design question. Suppose I want to look at the association of, for instance, curly hair with a rash on the forehead, and I pick a case-control study design. When I analyze the data, I find that 45% of the kids in the clinic (surprise) had curly hair. But if I then compare two groups, curly versus non-curly, with the rash on the forehead as the outcome of interest, instead of comparing cases versus controls, has this become an observational study instead of a case-control study? I hope I am making sense; this is only a theoretical question.
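One reason the direction of the comparison matters less than it might seem: in a 2x2 table, the odds ratio is the same whether you compare the odds of rash between curly and non-curly children or the odds of curly hair between children with and without the rash. The sketch below checks this with made-up counts (they are not from the question). Because a case-control design fixes the number of rash cases through sampling, risks computed within the curly and non-curly rows should not be read as population risks, but the odds ratio remains interpretable.

```python
# Hypothetical 2x2 table (made-up counts, not from the question):
#                 rash    no rash
#   curly         a=30     b=15
#   not curly     c=20     d=35
a, b, c, d = 30, 15, 20, 35

# Compare exposure groups: odds of rash among curly vs. non-curly children.
or_by_exposure = (a / b) / (c / d)

# Compare case/control groups: odds of curly hair among rash vs. no-rash children.
or_by_outcome = (a / c) / (b / d)

# Both reduce algebraically to the cross-product ratio (a * d) / (b * c).
cross_product = (a * d) / (b * c)
print(or_by_exposure, or_by_outcome, cross_product)  # all approximately 3.5
```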


12. P.Mean: Comparisons involving distinct groups collected at different times and with different methods (created 2008-09-12). I have a data set of 100 children with a specific health problem. In this set I have medical histories of the children. In another study, I have collected a data set of 65 children without that specific health problem. In this set I also have medical histories of the children. Is it possible to compare the two samples in some way to determine whether there are significant differences in the medical histories in the two sets of children?

Outside resources:

Medical University of South Carolina. Bias Glossary. Description: This website provides concise definitions of thirteen types of biases that are likely to affect research findings. BROKEN LINK. Former URL was www.musc.edu/dc/icrebm/bias.html

Journal article: Oded Yitschaky, Michael Yitschaky, Yehuda Zadik. Case report on trial: Do you, Doctor, swear to tell the truth, the whole truth and nothing but the truth? Journal of Medical Case Reports. 2011;5(1):179. Abstract: "We are in the era of "evidence based medicine" in which our knowledge is stratified from top to bottom in a hierarchy of evidence. Many in the medical and dental communities highly value randomized clinical trials as the gold standard of care and undervalue clinical reports. The aim of this editorial is to emphasize the benefits of case reports in dental and oral medicine, and encourage those of us who write and read them." [Accessed on May 17, 2011]. Available at: http://www.jmedicalcasereports.com/content/5/1/179

Journal article: Bonnie Kaplan, Gerald Giesbrecht, Scott Shannon, Kevin McLeod. Evaluating treatments in health care: The instability of a one-legged stool. BMC Medical Research Methodology. 2011;11(1):65. Abstract: "BACKGROUND: Both scientists and the public routinely refer to randomized controlled trials (RCTs) as being the "gold standard" of scientific evidence. Although there is no question that placebo-controlled RCTs play a significant role in the evaluation of new pharmaceutical treatments, especially when it is important to rule out placebo effects, they have many inherent limitations which constrain their ability to inform medical decision making. The purpose of this paper is to raise questions about over-reliance on RCTs and to point out an additional perspective for evaluating healthcare evidence, as embodied in the Hill criteria. The arguments presented here are generally relevant to all areas of health care, though mental health applications provide the primary context for this essay. DISCUSSION: This article first traces the history of RCTs, and then evaluates five of their major limitations: they often lack external validity, they have the potential for increasing health risk in the general population, they are no less likely to overestimate treatment effects than many other methods, they make a relatively weak contribution to clinical practice, and they are excessively expensive (leading to several additional vulnerabilities in the quality of evidence produced). Next, the nine Hill criteria are presented and discussed as a richer approach to the evaluation of health care treatments. Reliance on these multi-faceted criteria requires more analytical thinking than simply examining RCT data, but will also enhance confidence in the evaluation of novel treatments. SUMMARY: Excessive reliance on RCTs tends to stifle funding of other types of research, and publication of other forms of evidence. We call upon our research and clinical colleagues to consider additional methods of evaluating data, such as the Hill criteria. Over-reliance on RCTs is similar to resting all of health care evidence on a one-legged stool." [Accessed on May 24, 2011]. Available at: http://www.biomedcentral.com/1471-2288/11/65.

Journal article: Jennifer Frankovich, Christopher A. Longhurst, Scott M. Sutherland. Evidence-Based Medicine in the EMR Era. New England Journal of Medicine. 2011:111102140011006. [Accessed on November 3, 2011]. Excerpt: "Without clear evidence to guide us and needing to make a decision swiftly, we turned to a new approach, using the data captured in our institution's electronic medical record (EMR) and an innovative research data warehouse. The platform, called the Stanford Translational Research Integrated Database Environment (STRIDE), acquires and stores all patient data contained in the EMR at our hospital and provides immediate advanced text searching capability.1 Through STRIDE, we could rapidly review data on an SLE cohort that included pediatric patients with SLE cared for by clinicians in our division between October 2004 and July 2009. This “electronic cohort” was originally created for use in studying complications associated with pediatric SLE and exists under a protocol approved by our institutional review board." Available at: http://www.nejm.org/doi/full/10.1056/NEJMp1108726.

Alastair H MacLennan. HRT: a reappraisal of the risks and benefits. MJA 2007; 186 (12): 643-646 [Full text] [PDF]. Description: Research goes in cycles. Ten years ago, hormone replacement therapy (HRT) was recommended for most women on the basis of observational studies that showed that it reduced the risk of heart attacks. Two studies published near the turn of the century indicated that this might not be the case. These were randomized studies and were thought to be more definitive than the observational studies. There was a difference, though, in the conduct of the randomized trials and the observational studies, most notably the age at which HRT was initiated. A recent analysis of the data seems to suggest that HRT is protective if it is initiated early. I'm not an expert on HRT, but the lesson to be learned here is that no trials are capable of producing perfectly accurate results and you need to react to these trials carefully rather than with a checklist mentality (randomized=good, observational=bad).

P Brennan, P Croft. Interpreting the results of observational research: chance is not such a fine thing. BMJ. 1994;309(6956):727-730. Excerpt: "In a randomised controlled trial, if the design is not flawed, different outcomes in the study groups must be due to the intervention itself or to chance imbalances between the groups. Because of this tests of statistical significance are used to assess the validity of results from randomised studies. Most published papers in medical research, however, describe observational studies which do not include randomised intervention. This paper argues that the continuing application of tests of significance to such non-randomised investigations is inappropriate." [Accessed November 9, 2010]. Available at: http://www.bmj.com/content/309/6956/727.short.

GA Wells, B Shea, D O'Connell, J Peterson, V Welch, M Losos, P Tugwell. The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. Description: If you are conducting a systematic overview of nonrandomized studies, you need an objective method for evaluating the quality of these studies. The Newcastle-Ottawa scale provides a numeric score that you can use for excluding low quality studies, giving greater weight to higher quality studies, or for sensitivity analysis. This website was last verified on August 7, 2007. URL: www.ohri.ca/programs/clinical_epidemiology/oxford.htm

Denise Grady. Patient Safety Is Not Improving in Hospitals, Study Finds. The New York Times. 2010. Abstract: "Efforts to make hospitals safer for patients are falling short, researchers report in the first large study in a decade to analyze harm from medical care and to track it over time." [Accessed November 25, 2010]. Available at: http://www.nytimes.com/2010/11/25/health/research/25patient.html?hpw.

Journal article: Paul Glasziou, Iain Chalmers, Michael Rawlins, Peter McCulloch. When are randomised trials unnecessary? Picking signal from noise. BMJ. 2007;334(7589):349-351. Abstract: "Although randomised trials are widely accepted as the ideal way of obtaining unbiased estimates of treatment effects, some treatments have dramatic effects that are highly unlikely to reflect inadequately controlled biases. We compiled a list of historical examples of such effects and identified the features of convincing inferences about treatment effects from sources other than randomised trials. A unifying principle is the size of the treatment effect (signal) relative to the expected prognosis (noise) of the condition. A treatment effect is inferred most confidently when the signal to noise ratio is large and its timing is rapid compared with the natural course of the condition. For the examples we considered in detail the rate ratio often exceeds 10 and thus is highly unlikely to reflect bias or factors other than a treatment effect. This model may help to reduce controversy about evidence for treatments whose effects are so dramatic that randomised trials are unnecessary." [Accessed on April 4, 2011]. See Critical Appraisal for related links and pages. Available at: http://www.bmj.com/content/334/7589/349.abstract

Jacqueline A French. When Should We Pay Attention to Unfavorable News from Pregnancy Registries? Epilepsy Curr. 2007 March; 7(2): 36-37. doi: 10.1111/j.1535-7511.2007.00161.x. [Medline] [Abstract] [Full text] [PDF]. Description: Coming soon!

Nick Black. Why we need observational studies to evaluate the effectiveness of health care. BMJ. 1996;312(7040):1215-1218. Excerpt: "The view is widely held that experimental methods (randomised controlled trials) are the 'gold standard' for evaluation and that observational methods (cohort and case control studies) have little or no value. This ignores the limitations of randomised trials, which may prove unnecessary, inappropriate, impossible, or inadequate. Many of the problems of conducting randomised trials could often, in theory, be overcome, but the practical implications for researchers and funding bodies mean that this is often not possible. The false conflict between those who advocate randomised trials in all situations and those who believe observational data provide sufficient evidence needs to be replaced with mutual recognition of the complementary roles of the two approaches. Researchers should be united in their quest for scientific rigour in evaluation, regardless of the method used." [Accessed November 9, 2010]. Available at: http://www.bmj.com/content/312/7040/1215.short.

All of the material above this paragraph is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2017-06-15. The material below this paragraph links to my old website, StATS. Although I wrote all of the material listed below, my ex-employer, Children's Mercy Hospital, has claimed copyright ownership of this material. The brief excerpts shown here are included under the fair use provisions of U.S. Copyright laws.


11. Stats: I don't want to use a randomized trial (July 18, 2007). An email on the MedStats group outlines a new treatment that is: 1. without any significant competing treatments, 2. utilized in a heterogeneous patient population, and 3. difficult to study in a randomized trial. There are a variety of alternatives to a randomized study, but I suspect that this person wants to use a historical control study. It sounds like he wants an informal endorsement from a group of professional statisticians to use a historical control study instead of a randomized study.

10. Stats: How two bad control groups can add up to one good comparison (June 28, 2007). Many observational studies are criticized (often deservedly) for having a bad control group. If you choose a bad control group, you create an unfair (apples to oranges) comparison. But surprisingly, two control groups, even if both are imperfect, can lead to a strong conclusion. The trick is to recognize that if one control group has a positive bias (it makes the treatment group look better than it should) and the other one has a negative bias (it makes the treatment group look worse than it should), then these two control groups bracket the ideal control group.
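The bracketing argument amounts to a few lines of arithmetic. The sketch below uses hypothetical group means on an outcome where higher is better, and it assumes you already know, from subject-matter reasoning, the direction of each control group's bias; the function name `bracket_effect` is illustrative. If both ends of the bracket fall on the same side of zero, the conclusion should also hold against the ideal control group; if the bracket straddles zero, the two controls do not settle the question.

```python
def bracket_effect(treated_mean, control_favoring_mean, control_disfavoring_mean):
    """Bound a treatment effect using two imperfect control groups.

    control_favoring_mean: mean of a control group assumed to make the treatment
        look better than it should (positive bias).
    control_disfavoring_mean: mean of a control group assumed to make the
        treatment look worse than it should (negative bias).
    Under those assumptions, the effect against an ideal control lies between
    the two estimates, returned here as (low, high).
    """
    overstated = treated_mean - control_favoring_mean
    understated = treated_mean - control_disfavoring_mean
    return min(understated, overstated), max(understated, overstated)


# Hypothetical means, higher is better.
low, high = bracket_effect(treated_mean=50, control_favoring_mean=40, control_disfavoring_mean=44)
print(low, high)  # 6 10

# The direction of the effect is settled only if the whole bracket is on one side of zero.
same_direction = low > 0 or high < 0
```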

9. Stats: The debate about historical control groups (June 27, 2007). Someone on the Evidence Based Medicine email discussion group asked how to appraise a "before and after" design. This is effectively the same as using a historical control group. Historical control groups have a bad reputation.

8. Stats: The trouble with apples and oranges (June 25, 2007). I am still working on the details of a presentation for the Kansas City University of Medicine and Biosciences. They want me to talk at lunch during the 2007 Homecoming CME and Reunion weekend. The new title is "Medical Journals - The Trouble with Apples and Oranges."

7. Stats: When bad control groups happen to good researchers (June 15, 2007). The Kansas City University of Medicine and Biosciences wants me to give a light humorous talk at lunch during the 2007 Homecoming CME and Reunion weekend. Somehow, they provided me with a title for my talk, "Humor, Databases and Grant Proposals: What Strange Bedfellows" which is a fine title, but not the one I would have chosen. I'll talk it over with the organizers, but here's a possible choice: "When bad control groups happen to good researchers".


6. Stats: Abstainer errors in study of alcohol abuse (April 19, 2006). A correspondent in the MedStats email discussion group (RR) mentioned an interesting example of problems in defining groups in observational studies. The actual publication is Kaye Fillmore et al. "Moderate alcohol use and reduced mortality risk: systematic error in prospective studies." Addiction Research and Theory. Advance online publication March 30, 2006.


5. Stats: Case cohort design (August 11, 2005). During a consultation about an NIH research grant, the term "case cohort design" came up. The case-cohort design is similar to a nested case-control design, but also has some important differences.
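The sampling difference between the two designs can be sketched in Python. This is an illustration under assumptions, not code from the original page: each cohort member is a hypothetical dict with an `id`, a follow-up `time` (event time for cases, end of follow-up otherwise) and a `case` flag. In the case-cohort design, one random subcohort is drawn from the whole cohort without regard to outcome, and every case is added; that same subcohort is the comparison group for all cases. In the nested case-control design, controls are drawn separately for each case from its risk set, the people still under follow-up when that case occurred. The weighted analyses each design needs afterward are not shown.

```python
import random


def case_cohort_sample(cohort, subcohort_fraction, rng):
    """Case-cohort: one random subcohort drawn without regard to outcome, plus all cases."""
    k = round(subcohort_fraction * len(cohort))
    subcohort = rng.sample(cohort, k)
    selected = {p["id"] for p in subcohort}
    selected |= {p["id"] for p in cohort if p["case"]}  # add every case, in or out of the subcohort
    return sorted(selected)


def nested_case_control_sample(cohort, controls_per_case, rng):
    """Nested case-control: for each case, controls drawn from that case's risk set."""
    matched = {}
    for case in (p for p in cohort if p["case"]):
        # Risk set: everyone else still under follow-up at the case's event time.
        # Later cases are eligible to serve as controls for earlier ones.
        risk_set = [p["id"] for p in cohort
                    if p["id"] != case["id"] and p["time"] >= case["time"]]
        matched[case["id"]] = rng.sample(risk_set, min(controls_per_case, len(risk_set)))
    return matched


# Hypothetical cohort for illustration only.
cohort = [
    {"id": 1, "time": 2.0, "case": True},
    {"id": 2, "time": 5.0, "case": False},
    {"id": 3, "time": 4.0, "case": True},
    {"id": 4, "time": 1.0, "case": False},
    {"id": 5, "time": 6.0, "case": False},
    {"id": 6, "time": 3.0, "case": False},
]
rng = random.Random(2005)
print(case_cohort_sample(cohort, 0.5, rng))
print(nested_case_control_sample(cohort, 2, rng))
```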

4. Stats: The paired availability design (May 31, 2005). In the quest to finish my book on Statistical Evidence, I had to leave some material on the cutting room floor. One of the nicer descriptions was about the paired availability design.

3. Stats: Non-random samples (March 25, 2005). Someone sent me an email asking about a project that involved interviews of women at higher levels of management in an organization. This is a rather small group, and might require a non-random selection process. What are the limitations of a non-random sample?

2. Stats: A collection of randomized and non-randomized studies (March 22, 2005). I'm updating some of my training classes to use examples from open source journals, because it is easier for me to include content of these articles directly in the web pages. An example of this is the practice exercises for my training class Statistical Evidence: Apples or Oranges? But the previous practice exercise, which used a wider range of journals, had some cute articles in the mix. I'll especially miss the article on episiotomy.

1. Stats: Spectrum Bias (January 4, 2005). I tried to start a page on diagnostic tests a while back, but have not had the time to fully develop it. One of the important issues for diagnostic tests is spectrum bias. The sensitivity and specificity of a diagnostic test can depend on who exactly is being tested. Think of disease as a range of possibilities from slight to moderate to extreme. If only a portion of the disease range is included, you may get an incorrect impression of how well a diagnostic test works. This is known as spectrum bias.
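A small numerical sketch shows how the severity mix changes the estimate. The counts below are hypothetical, chosen only to illustrate a test that detects extreme disease more easily than slight disease; they are not from any study. Sensitivity estimated only from extreme cases comes out much higher than sensitivity estimated over the full range of disease.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of diseased patients that the test correctly flags as positive."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical diseased patients by severity: (test positive, test negative).
by_severity = {
    "slight": (20, 30),
    "moderate": (35, 15),
    "extreme": (48, 2),
}

# Narrow spectrum: only the extreme cases are studied.
narrow = sensitivity(*by_severity["extreme"])

# Full spectrum: pool all severity levels.
tp_all = sum(pos for pos, _ in by_severity.values())
fn_all = sum(neg for _, neg in by_severity.values())
full = sensitivity(tp_all, fn_all)

print(f"extreme cases only: {narrow:.2f}, full spectrum: {full:.2f}")
# extreme cases only: 0.96, full spectrum: 0.69
```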
