P.Mean: Is there a scientific basis for EBM? (created 2008-08-20).

A pair of articles in Chest, along with two rebuttals, examines the two sides of the debate over the validity of Evidence-Based Medicine (EBM).

The full free text of these articles will not be available until 12 months after publication (May 2009). I'll try to remember to add links to the full text when it becomes available.

The Karanicolas et al. article starts with a perspective on what was available prior to EBM.

Prior to the promulgation of EBM and the systematic reviews that lie at the heart of EBM, expert evidence reviews and opinions disseminated in narrative textbooks and review articles were often idiosyncratic and arbitrary. The resulting recommendations were inconsistent, often lagged behind the evidence, and were sometimes contrary to the evidence.

Then Karanicolas cites a classic example (Antman et al.) of how textbook recommendations lagged behind the published evidence by several years. He then argues for a hierarchy of evidence, while decrying the simplistic belief that randomized trials are always best.

EBM recognizes that RCTs may sometimes provide only low-quality evidence. GRADE identifies five categories of limitations that may downgrade quality of evidence from RCTs. First, methodologic limitations (including poorly concealed group allocation, lack of patient, clinician, or outcome assessor blinding, large loss to follow-up, or stopping early for efficacy) may bias study results. Second, small sample size with consequent wide confidence intervals may produce untrustworthy results. Third, RCTs may provide indirect evidence if the participants, interventions, comparators, or outcomes differ from those under consideration. For example, many trials measure the impact of an intervention on surrogate outcomes, such as BP or FEV1, but patients and clinicians are far more interested in outcomes such as mortality and quality of life. Fourth, inconsistent results leave us less certain. Fifth, selective publication of positive findings may bias results of systematic evidence summaries.
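To make the grading process concrete, here is a minimal sketch in Python of how these five limitations might knock a rating down the quality scale. This is my own simplification for illustration, not GRADE's actual algorithm; the short category labels and the one-level-per-limitation rule are assumptions on my part.

LEVELS = ["high", "moderate", "low", "very low"]

# The five categories quoted above, paraphrased as short labels.
LIMITATIONS = [
    "methodologic limitations",  # poor concealment, lack of blinding, early stopping
    "imprecision",               # small samples, wide confidence intervals
    "indirectness",              # surrogate outcomes or different populations
    "inconsistency",             # results disagree across trials
    "publication bias",          # selective publication of positive findings
]

def grade_rct(problems):
    # Start an RCT at "high" and drop one level for each limitation present.
    level = 0
    for p in problems:
        if p in LIMITATIONS:
            level = min(level + 1, len(LEVELS) - 1)
    return LEVELS[level]

# A small trial measuring only a surrogate outcome: downgraded twice, to "low".
print(grade_rct(["imprecision", "indirectness"]))

The point of the sketch is that an RCT does not automatically sit at the top of the hierarchy; its rating depends on how it was conducted and reported.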

Finally, Karanicolas shows how individual patient values can be incorporated into an EBM framework.

EBM scholars have long recognized that patients' and clinicians' values may differ systematically. Consider, for example, treatment of patients in atrial fibrillation with anticoagulation to prevent strokes. Treatment with warfarin reduces the risk of stroke in these patients but increases the risk of serious GI bleeds. Traditionally, clinicians might have considered the best available evidence, and decided to administer an anticoagulant if they believed the benefits outweighed the risks, or elected to withhold treatment if they believed the risks were too great. Implicitly, this approach relies on clinicians' values and preferences. Devereaux and colleagues asked patients and physicians how many additional serious GI bleeds they would be willing to accept to prevent eight strokes—four minor and four major—in 100 patients. The results demonstrate that patients are far more stroke averse than clinicians, and that there is huge diversity in values and preferences among both patients and physicians.
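The arithmetic behind this question is worth spelling out. Here is a hypothetical sketch in Python: the same evidence leads to opposite treatment decisions depending on whose threshold you plug in. The thresholds and the expected number of bleeds below are invented numbers; only the eight-strokes-per-100-patients framing comes from the study.

def accepts_warfarin(bleeds_tolerated, expected_extra_bleeds):
    # Treat only if the expected extra serious GI bleeds stay within
    # this person's tolerance for preventing 8 strokes in 100 patients.
    return expected_extra_bleeds <= bleeds_tolerated

patient_threshold = 17      # hypothetical: very stroke averse
clinician_threshold = 10    # hypothetical: less stroke averse
expected_extra_bleeds = 12  # hypothetical estimate for warfarin

print(accepts_warfarin(patient_threshold, expected_extra_bleeds))    # True
print(accepts_warfarin(clinician_threshold, expected_extra_bleeds))  # False

Nothing in the evidence changed between the two calls; only the values did.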

Perhaps my one criticism of this article is that Karanicolas defines EBM from an idealized viewpoint. In practice, EBM may be excessively reliant on randomized trials and may tend to ignore individual patient values. Clearly EBM defines a good way to practice medicine, but it is unclear whether those who adopt EBM live up to those principles.

The counterpoint by Tobin attacks the concept of a hierarchy of evidence. Tobin cites a meta-analysis of homeopathy.

Homeopathy uses drugs in which less than one molecule of active agent is present. Benefit with dilution beyond Avogadro number contradicts pharmacologic theory. A metaanalysis of 89 placebo-controlled trials revealed a combined odds of 2.45 in favor of homeopathy. EBM grades metaanalysis as level 1 evidence but completely ignores scientific theory. There is nothing necessarily wrong with this particular metaanalysis, but the example illustrates how a system that grades findings of all metaanalyses as level 1 evidence is inherently flawed. A grading system that ranks homeopathy as sounder evidence than centuries of pharmacologic science commits the reductio ad absurdum fallacy in logic.
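For readers wondering where a single combined odds ratio like 2.45 comes from, here is a minimal fixed-effect meta-analysis sketch: inverse-variance weighting of log odds ratios, one standard way of computing such a pooled estimate. The three (odds ratio, standard error) pairs are invented for illustration; they are not data from the homeopathy trials.

import math

studies = [(2.0, 0.30), (3.1, 0.45), (2.2, 0.25)]  # (odds ratio, SE of log OR)

# Weight each study by the inverse of its variance on the log scale.
weights = [1 / se**2 for _, se in studies]
pooled_log = sum(w * math.log(or_) for (or_, _), w in zip(studies, weights))
pooled_log /= sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print(f"pooled OR = {math.exp(pooled_log):.2f}")
print(f"95% CI = ({math.exp(pooled_log - 1.96 * pooled_se):.2f}, "
      f"{math.exp(pooled_log + 1.96 * pooled_se):.2f})")

Notice that nothing in this calculation knows or cares whether the intervention is biologically plausible, which is precisely Tobin's complaint.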

Tobin then ties EBM to a philosophical viewpoint known as positivism.

This school contained some of the brightest minds of the early twentieth century. It dominated analytic philosophy of that period. Positivists developed a verifiability criterion, which demarcated "meaningful" from "meaningless" research statements. Popper and others pointed out two fundamental flaws of positivism; thereafter, positivism lost all supporters.

Tobin lists eight criteria for reliable research and argues that it is impossible to rank these.

Avoid assignment bias
Minimize random error
Minimize systematic error
Ensure accurate taxonomy
Ensure internal validity
Ensure external validity
Findings that fit within the corpus of knowledge
Reproducibility (withstands falsification attempts)

True enough, but then Tobin commits the same kind of rigidity that he criticizes in the research hierarchy, claiming that

If one is absent, the research is no longer reliable.

First of all, none of these items is ever wholly present or absent, except perhaps in trivially simple situations. Just about every study that I have seen has some degree of systematic error, for example. So a rigid application of this list would imply that all research is unreliable.

Second, research can have some problems and yet still be reliable. For example, a study with a weak research design can still produce reliable findings if those findings are replicated across a variety of different settings.

Third, the requirement that all findings fit within the corpus of knowledge (this is a fancy way of asking whether there is a plausible biological mechanism) is too rigid. It does not allow for innovation and serendipitous discoveries. Even worse, there is no consensus about what fits or does not fit into the corpus of knowledge.

The list of eight criteria is a valuable one, but you cannot apply it rigidly any more than you can apply an EBM research hierarchy rigidly.

These two articles are followed by a rebuttal from each author to the other's claims. There is little new in these comments, and it seems like Karanicolas and Tobin are talking past each other. Tobin does take an interesting swipe at statistics in his rebuttal, though.

The fourth paragraph under "A Hierarchy of Evidence" gives five criteria for adjusting grading ratings. Compare these with the eight requirements for reliable research in my Table 3. The stark contrast emphasizes that EBM is built on statistics. Statistics, however, do not undo systematic errors or breaches of internal validity. Statistics are not the fount of new penicillins. Statistics do not confer the wisdom needed to make the right decision for a particular patient.

As a statistician, I can't disagree with these limitations of statistics, though I would say that the proper use and understanding of statistics is one of several things required before you can consider yourself truly wise.

Although I tend to side with Karanicolas, I think both sets of articles are worth reviewing.

This work is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2010-04-01.