P.Mean >> Category >> Systematic overviews (created 2007-06-14). 

These pages discuss issues associated with a systematic overview (systematic review, meta-analysis). Also see Category: Information searching and Category: Publication bias.


22. P.Mean: Meta-analysis with non-comparable procedures (created 2011-10-31). Dear Professor Mean, In publications on meta-analysis, vast numbers of papers must be culled from the analyzable dataset due to non-comparable procedures. The resulting smaller sample sizes can reduce power, which then limits the ability to detect significance. Isn't this a problem?

21. P.Mean: How much work does that second reviewer have to do in a meta-analysis (created 2011-06-20). Someone asked about the process of using a second reviewer in a meta-analysis to abstract data from studies. The rationale for a second reviewer, of course, is to establish that there is no serious subjectivity involved with the recording of information from individual studies. By showing that two independent reviewers produced roughly comparable data sets, you have established objectivity in the data abstraction step. The question arises, though: do you have to use the second reviewer on all studies, or can you just do this for a certain percentage of the studies? If so, is there a certain percentage that is generally accepted?


20. What is a L'Abbe plot? (December 2010)

19. P.Mean: Pooling different measures of risk in a meta-analysis (created 2010-07-26). Someone on the MEDSTATS email discussion group asked about how to pool results in a meta-analysis where some of the summary measures are reported as odds ratios, others as relative risks, and still others as hazard ratios. There's actually a fourth measure that is commonly used when the outcome measure is binary (live/dead, improved/not improved, relapsed/relapse free, etc.). That is the risk difference, and its inverse, the number needed to treat. Here's what I wrote in response.
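To make the measures named above concrete, here is a minimal sketch that computes the odds ratio, relative risk, risk difference, and number needed to treat from a single two-by-two table. The function name and the counts are hypothetical and chosen only for illustration. The hazard ratio is left out because it needs time-to-event data rather than simple counts.

```python
def binary_effect_measures(events_trt, n_trt, events_ctl, n_ctl):
    """Return OR, RR, RD, and NNT for one trial with a binary outcome.

    Assumes both groups have at least one event and one non-event,
    so that no division by zero occurs.
    """
    risk_trt = events_trt / n_trt
    risk_ctl = events_ctl / n_ctl
    odds_trt = risk_trt / (1 - risk_trt)
    odds_ctl = risk_ctl / (1 - risk_ctl)
    rd = risk_trt - risk_ctl
    return {
        "odds_ratio": odds_trt / odds_ctl,
        "relative_risk": risk_trt / risk_ctl,
        "risk_difference": rd,
        # The number needed to treat is the inverse of the absolute risk difference.
        "nnt": float("inf") if rd == 0 else 1 / abs(rd),
    }


# Hypothetical trial: 10 of 100 events on treatment, 20 of 100 on control.
result = binary_effect_measures(10, 100, 20, 100)
```

With these made-up counts the relative risk is 0.5 while the odds ratio is about 0.44, which illustrates why the measures should not be pooled as if they were interchangeable.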

18. P.Mean: When should research in a given area end? (created 2010-07-26). Someone asked a rather philosophical question: is there ever an end to research in a given area? Will there ever be a "last word" on a research topic? Here's what I wrote in response.

17. What is a forest plot? (April 2010)

16. P.Mean: Meta-analysis for a single mean estimate (created 2010-02-11). Someone noted that the usual meta-analysis is carried out for studies of two treatment groups, usually for a difference in means. What if you had several studies estimating not a difference in means, but just a single mean? Could you conduct a meta-analysis in this situation?

15. The Monthly Mean: Heterogeneity in clinical trials--is it a bad thing or a good thing? (January 2010)


14. The Monthly Mean: Combining measures on different scales (December 2008). Someone was asking about meta-analysis and the process of combining outcomes measured on different scales. Some of the papers in the meta-analysis described their outcomes as a percentage change from baseline, and others as a simple difference in means. The difference in means has a unit attached to it (e.g., mg/dL, mmol/L, etc.), but the percentage change is unitless. There is a way to combine these measures, of course. Simply convert them to a standardized scale (Z-score) and then combine the Z-scores. The question is whether this is a legitimate approach.
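One possible reading of "combine the Z-scores" is Stouffer's method, sketched below. This is an assumption on my part, since the question does not name a specific method. Each study's estimate is divided by its standard error to get a unitless Z, and the Z values are then combined. The function names and the numbers are hypothetical.

```python
import math


def stouffer_combined_z(estimates, std_errors, weights=None):
    """Combine per-study Z-scores (estimate / SE) with Stouffer's method."""
    z_scores = [est / se for est, se in zip(estimates, std_errors)]
    if weights is None:
        weights = [1.0] * len(z_scores)  # equal weights unless told otherwise
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p-value for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical inputs: one study reports a difference in mg/dL,
# the other a percentage change from baseline.
combined = stouffer_combined_z([-12.0, -8.0], [6.0, 4.0])
p_value = two_sided_p(combined)
```

Note that the combined Z only supports a significance test. It does not yield a pooled effect in interpretable units, which is part of why the legitimacy of the approach is open to question.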

13. P.Mean: My very first meta-analysis (created 2008-07-23). I am a research student embarking upon a systematic review and possible meta-analysis. I am currently in the process of developing a protocol. I have been having difficulty understanding the statistical issues, especially since I am not very good at mathematics. Could you kindly refer me to a source that would help me understand, in a step-by-step way, the concepts needed in doing a meta-analysis? For example, heterogeneity and the tests used for it (which one is preferred and when), when to use subgroup analysis, and when to use meta-regression. I have been reading the Cochrane Handbook for this purpose, as advised by my supervisor, but have not been able to understand the concepts. Any help from you would be greatly appreciated.

Outside resources:

Santiago Moreno, Alex Sutton, A Ades, et al. Assessment of regression-based methods to adjust for publication bias through a comprehensive simulation study. BMC Medical Research Methodology. 2009;9(1):2. Abstract: "BACKGROUND: In meta-analysis, the presence of funnel plot asymmetry is attributed to publication or other small-study effects, which causes larger effects to be observed in the smaller studies. This issue potentially mean inappropriate conclusions are drawn from a meta-analysis. If meta-analysis is to be used to inform decision-making, a reliable way to adjust pooled estimates for potential funnel plot asymmetry is required. METHODS: A comprehensive simulation study is presented to assess the performance of different adjustment methods including the novel application of several regression-based methods (which are commonly applied to detect publication bias rather than adjust for it) and the popular Trim & Fill algorithm. Meta-analyses with binary outcomes, analysed on the log odds ratio scale, were simulated by considering scenarios with and without i) publication bias and; ii) heterogeneity. Publication bias was induced through two underlying mechanisms assuming the probability of publication depends on i) the study effect size; or ii) the p-value. RESULTS: The performance of all methods tended to worsen as unexplained heterogeneity increased and the number of studies in the meta-analysis decreased. Applying the methods conditional on an initial test for the presence of funnel plot asymmetry generally provided poorer performance than the unconditional use of the adjustment method. Several of the regression based methods consistently outperformed the Trim & Fill estimators. CONCLUSIONS: Regression-based adjustments for publication bias and other small study effects are easy to conduct and outperformed more established methods over a wide range of simulation scenarios." [Accessed February 24, 2009]. Available at: http://www.biomedcentral.com/1471-2288/9/2.

Jacques LeLorier, Genevieve Gregoire, Abdeltif Benhaddad, Julie Lapierre, Francois Derderian. Discrepancies between Meta-Analyses and Subsequent Large Randomized, Controlled Trials. N Engl J Med. 1997;337(8):536-542. Abstract: "Background: Meta-analyses are now widely used to provide evidence to support clinical strategies. However, large randomized, controlled trials are considered the gold standard in evaluating the efficacy of clinical interventions. Methods: We compared the results of large randomized, controlled trials (involving 1000 patients or more) that were published in four journals (the New England Journal of Medicine, the Lancet, the Annals of Internal Medicine, and the Journal of the American Medical Association) with the results of meta-analyses published earlier on the same topics. Regarding the principal and secondary outcomes, we judged whether the findings of the randomized trials agreed with those of the corresponding meta-analyses, and we determined whether the study results were positive (indicating that treatment improved the outcome) or negative (indicating that the outcome with treatment was the same or worse than without it) at the conventional level of statistical significance (P<0.05). Results: We identified 12 large randomized, controlled trials and 19 meta-analyses addressing the same questions. For a total of 40 primary and secondary outcomes, agreement between the meta-analyses and the large clinical trials was only fair (kappa = 0.35; 95 percent confidence interval, 0.06 to 0.64). The positive predictive value of the meta-analyses was 68 percent, and the negative predictive value 67 percent. However, the difference in point estimates between the randomized trials and the meta-analyses was statistically significant for only 5 of the 40 comparisons (12 percent). 
Furthermore, in each case of disagreement a statistically significant effect of treatment was found by one method, whereas no statistically significant effect was found by the other. Conclusions: The outcomes of the 12 large randomized, controlled trials that we studied were not predicted accurately 35 percent of the time by the meta-analyses published previously on the same topics." [Accessed March 7, 2009]. Available at: http://content.nejm.org/cgi/content/abstract/337/8/536.

Simona Vecchi, Valeria Belleudi, Laura Amato, Marina Davoli, Carlo Perucci. Does direction of results of abstracts submitted to scientific conferences on drug addiction predict full publication? BMC Medical Research Methodology. 2009;9(1):23. Abstract: "BACKGROUND: Data from scientific literature show that about 63% of abstracts presented at biomedical conferences will be published in full. Some studies have indicated that full publication is associated with the direction of results (publication bias). No study has looked into the occurrence of publication bias in the field of addiction. Objectives: To investigate whether the significance or direction of results of abstracts presented at the major international scientific conference on addiction is associated with full publication. METHODS: The conference proceedings of the US Annual Meeting of the College on Problems of Drug Dependence (CPDD), were handsearched for abstracts of randomized controlled trials and controlled clinical trials that evaluated interventions for prevention, rehabilitation and treatment of drug addiction in humans (years searched 1993-2002). Data regarding the study designs and outcomes reported were extracted. Subsequent publication in peer reviewed journals was searched in MEDLINE and EMBASE databases, as of March 2006. RESULTS: Out of 5919 abstracts presented, 581 met the inclusion criteria; 359 (62%) conference abstracts had been published in a broad variety of peer reviewed journals (average time of publication 2.6 years, SD +/- 1.78). The proportion of published studies was almost the same for randomized controlled trials (62.4 %) and controlled clinical trials (59.5 %) while studies that reported positive results were significantly more likely to be published (74.5%) than those that did not report statistical results (60.9%), negative or null results (47.1%) and no results (38.6%). 
Abstracts reporting positive results had a significantly higher probability of being published in full, while abstracts reporting null or negative results were half as likely to be published compared with positive ones (HR=0.48; 95%CI 0.30-0.74). CONCLUSIONS: Clinical trials were the minority of abstracts presented at the CPDD; we found evidence of possible publication bias in the field of addiction, with negative or null results having half the likelihood of being published than positive ones." [Accessed April 23, 2009]. Available at: http://www.biomedcentral.com/1471-2288/9/23.

Jonathan J. Shuster. Empirical vs natural weighting in random effects meta-analysis. Statistics in Medicine. 2010;29(12):1259-1265. Abstract: "This article brings into serious question the validity of empirically based weighting in random effects meta-analysis. These methods treat sample sizes as non-random, whereas they need to be part of the random effects analysis. It will be demonstrated that empirical weighting risks substantial bias. Two alternate methods are proposed. The first estimates the arithmetic mean of the population of study effect sizes per the classical model for random effects meta-analysis. We show that anything other than an unweighted mean of study effect sizes will risk serious bias for this targeted parameter. The second method estimates a patient level effect size, something quite different from the first. To prevent inconsistent estimation for this population parameter, the study effect sizes must be weighted in proportion to their total sample sizes for the trial. The two approaches will be presented for a meta-analysis of a nasal decongestant, while at the same time will produce counter-intuitive results for the DerSimonian-Laird approach, the most popular empirically based weighted method. It is concluded that all past publications based on empirically weighted random effects meta-analysis should be revisited to see if the qualitative conclusions hold up under the methods proposed herein. It is also recommended that empirically based weighted random effects meta-analysis not be used in the future, unless strong cautions about the assumptions underlying these analyses are stated, and at a minimum, some form of secondary analysis based on the principles set forth in this article be provided to supplement the primary analysis. Copyright © 2009 John Wiley & Sons, Ltd." [Accessed June 29, 2010]. Available at: http://dx.doi.org/10.1002/sim.3607.

Journal article: Stela Pudar Hozo, Benjamin Djulbegovic, Iztok Hozo. Estimating the mean and variance from the median, range, and the size of a sample. BMC Med Res Methodol. 2005;5:13. Abstract: "BACKGROUND: Usually the researchers performing meta-analysis of continuous outcomes from clinical trials need their mean value and the variance (or standard deviation) in order to pool data. However, sometimes the published reports of clinical trials only report the median, range and the size of the trial. METHODS: In this article we use simple and elementary inequalities and approximations in order to estimate the mean and the variance for such trials. Our estimation is distribution-free, i.e., it makes no assumption on the distribution of the underlying data. RESULTS: We found two simple formulas that estimate the mean using the values of the median (m), low and high end of the range (a and b, respectively), and n (the sample size). Using simulations, we show that median can be used to estimate mean when the sample size is larger than 25. For smaller samples our new formula, devised in this paper, should be used. We also estimated the variance of an unknown sample using the median, low and high end of the range, and the sample size. Our estimate is performing as the best estimate in our simulations for very small samples (n < or = 15). For moderately sized samples (15 < n < or = 70), our simulations show that the formula range/4 is the best estimator for the standard deviation (variance). For large samples (n > 70), the formula range/6 gives the best estimator for the standard deviation (variance). We also include an illustrative example of the potential value of our method using reports from the Cochrane review on the role of erythropoietin in anemia due to malignancy. CONCLUSION: Using these formulas, we hope to help meta-analysts use clinical trials in their analysis even when not all of the information is available and/or reported." [Accessed on May 10, 2012]. 
Available at: http://www.ncbi.nlm.nih.gov/pubmed/15840177.
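The rules of thumb in the Hozo abstract can be sketched in a few lines. The sample size thresholds and the range/4 and range/6 formulas come straight from the abstract. The two explicit small-sample formulas are not stated in the abstract; they are the forms commonly quoted from the body of the paper and should be checked against the paper before use. The function names are hypothetical.

```python
import math


def estimate_mean(a, m, b, n):
    """Estimate the mean from the minimum a, median m, maximum b, and sample size n."""
    if n > 25:
        return m  # abstract: the median can stand in for the mean when n > 25
    # Assumed small-sample formula, as commonly quoted from the paper.
    return (a + 2 * m + b) / 4


def estimate_sd(a, m, b, n):
    """Estimate the standard deviation from the minimum, median, maximum, and n."""
    if n <= 15:
        # Assumed very-small-sample formula, as commonly quoted from the paper.
        variance = ((a - 2 * m + b) ** 2 / 4 + (b - a) ** 2) / 12
        return math.sqrt(variance)
    if n <= 70:
        return (b - a) / 4  # abstract: range/4 for 15 < n <= 70
    return (b - a) / 6      # abstract: range/6 for n > 70


# Hypothetical trial reporting median 5 and range 2 to 12 with n = 40.
mean_40 = estimate_mean(2, 5, 12, 40)
sd_40 = estimate_sd(2, 5, 12, 40)
```

These estimates are only a fallback for trials that fail to report a mean and standard deviation. A sensitivity analysis with and without such trials seems prudent.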

Journal article: John P. A. Ioannidis. Excess Significance Bias in the Literature on Brain Volume Abnormalities. Arch Gen Psychiatry. 2011;68(8):773-780. Abstract: "Context: Many studies report volume abnormalities in diverse brain structures in patients with various mental health conditions. Objective: To evaluate whether there is evidence for an excess number of statistically significant results in studies of brain volume abnormalities that suggest the presence of bias in the literature. Data Sources: PubMed (articles published from January 2006 to December 2009). Study Selection: Recent meta-analyses of brain volume abnormalities in participants with various mental health conditions vs control participants with 6 or more data sets included, excluding voxel-based morphometry. Data Extraction: Standardized effect sizes were extracted in each data set, and it was noted whether the results were "positive" (P < .05) or not. For each data set in each meta-analysis, I estimated the power to detect at α = .05 an effect equal to the summary effect of the respective meta-analysis. The sum of the power estimates gives the number of expected positive data sets. The expected number of positive data sets can then be compared against the observed number. Data Synthesis: From 8 articles, 41 meta-analyses with 461 data sets were evaluated (median, 10 data sets per meta-analysis) pertaining to 7 conditions. Twenty-one of the 41 meta-analyses had found statistically significant associations, and 142 of 461 (31%) data sets had positive results. Even if the summary effect sizes of the meta-analyses were unbiased, the expected number of positive results would have been only 78.5 compared with the observed number of 142 (P < .001). Conclusion: There are too many studies with statistically significant results in the literature on brain volume abnormalities. 
This pattern suggests strong biases in the literature, with selective outcome reporting and selective analyses reporting being possible explanations." [Accessed on January 4, 2012].
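The logic of the Ioannidis abstract (sum the power estimates to get the expected number of positive data sets, then compare with the observed number) can be sketched as follows. The abstract does not say which power calculation or which test was used, so the normal approximation for power and the one degree of freedom chi-square comparison are both assumptions here, and the function names are hypothetical. Only the counts 142, 78.5, and 461 come from the abstract.

```python
import math
from statistics import NormalDist


def power_two_sided(effect, se, alpha=0.05):
    """Approximate power of a two-sided Z test to detect `effect` given its standard error."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = abs(effect) / se
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)


def expected_positives(powers):
    """The expected number of positive data sets is the sum of the power estimates."""
    return sum(powers)


def excess_significance_chi2(observed, expected, n):
    """Chi-square statistic (1 df) and p-value comparing observed and expected positives."""
    diff_sq = (observed - expected) ** 2
    stat = diff_sq / expected + diff_sq / (n - expected)
    # Survival function of a chi-square with 1 df, written with erfc.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts reported in the abstract: 142 observed positives, 78.5 expected, 461 data sets.
stat, p_value = excess_significance_chi2(142, 78.5, 461)
```

Under these assumptions the p-value falls well below .001, which is consistent with the abstract, though the paper's exact calculation may differ.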

Gordon H Guyatt, Andrew D Oxman, Gunn E Vist, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008;336(7650):924-926. Excerpt: "Guideline developers around the world are inconsistent in how they rate quality of evidence and grade strength of recommendations. As a result, guideline users face challenges in understanding the messages that grading systems try to communicate. Since 2006 the BMJ has requested in its "Instructions to Authors" on bmj.com that authors should preferably use the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for grading evidence when submitting a clinical guidelines article. What was behind this decision? In this first in a series of five articles we will explain why many organisations use formal systems to grade evidence and recommendations and why this is important for clinicians; we will focus on the GRADE approach to recommendations. In the next two articles we will examine how the GRADE system categorises quality of evidence and strength of recommendations. The final two articles will focus on recommendations for diagnostic tests and GRADE's framework for tackling the impact of interventions on use of resources." [Accessed January 3, 2009]. Available at: http://www.bmj.com/cgi/content/full/336/7650/924.

Grade working group. The Grading of Recommendations Assessment, Development and Evaluation (short GRADE) Working Group. Excerpt: "The Grading of Recommendations Assessment, Development and Evaluation (short GRADE) Working Group began in the year 2000 as an informal collaboration of people with an interest in addressing the shortcomings of present grading systems in health care. The working group has developed a common, sensible and transparent approach to grading quality of evidence and strength of recommendations. Many international organizations have provided input into the development of the approach and have started using it." [Accessed March 10, 2009]. Available at: http://www.gradeworkinggroup.org/index.htm.

Kaveh G. Shojania, Margaret Sampson, Mohammed T. Ansari, et al. How Quickly Do Systematic Reviews Go Out of Date? A Survival Analysis. Ann Intern Med. 2007;147(4):224-233. Abstract: "Background: Systematic reviews are often advocated as the best source of evidence to guide clinical decisions and health care policy, yet we know little about the extent to which they require updating. Objective: To estimate the average time to changes in evidence that are sufficiently important to warrant updating systematic reviews. Design: Survival analysis of 100 quantitative systematic reviews. Sample: Systematic reviews published from 1995 to 2005 and indexed in ACP Journal Club. Eligible reviews evaluated a specific drug or class of drug, device, or procedure and included only randomized or quasi-randomized, controlled trials. Measurements: Quantitative signals for updating were changes in statistical significance or relative changes in effect magnitude of at least 50% involving 1 of the primary outcomes of the original systematic review or any mortality outcome. Qualitative signals included substantial differences in characterizations of effectiveness, new information about harm, and caveats about the previously reported findings that would affect clinical decision making. Results: The cohort of 100 systematic reviews included a median of 13 studies and 2663 participants per review. A qualitative or quantitative signal for updating occurred for 57% of reviews (95% CI, 47% to 67%). Median duration of survival free of a signal for updating was 5.5 years (CI, 4.6 to 7.6 years). However, a signal occurred within 2 years for 23% of reviews and within 1 year for 15%. In 7%, a signal had already occurred at the time of publication. Only 4% of reviews had a signal within 1 year of the end of the reported search period; 11% had a signal within 2 years of the search. 
Shorter survival was associated with cardiovascular topics (hazard ratio, 2.70 [CI, 1.36 to 5.34]) and heterogeneity in the original review (hazard ratio, 2.15 [CI, 1.12 to 4.11]). Limitation: Judgments of the need for updating were made without involving content experts. Conclusion: In a cohort of high-quality systematic reviews directly relevant to clinical practice, signals for updating occurred frequently and within a relatively short time." [Accessed March 10, 2009]. Available at: http://www.annals.org/cgi/content/abstract/147/4/224.

C. Elizabeth McCarron, Eleanor Pullenayegum, Lehana Thabane, Ron Goeree, Jean-Eric Tarride. The importance of adjusting for potential confounders in Bayesian hierarchical models synthesising evidence from randomised and non-randomised studies: an application comparing treatments for abdominal aortic aneurysms. BMC Medical Research Methodology. 2010;10(1):64. Abstract: "BACKGROUND: Informing health care decision making may necessitate the synthesis of evidence from different study designs (e.g., randomised controlled trials, non-randomised/observational studies). Methods for synthesising different types of studies have been proposed, but their routine use requires development of approaches to adjust for potential biases, especially among non-randomised studies. The objective of this study was to extend a published Bayesian hierarchical model to adjust for bias due to confounding in synthesising evidence from studies with different designs. METHODS: In this new methodological approach, study estimates were adjusted for potential confounders using differences in patient characteristics (e.g., age) between study arms. The new model was applied to synthesise evidence from randomised and non-randomised studies from a published review comparing treatments for abdominal aortic aneurysms. We compared the results of the Bayesian hierarchical model adjusted for differences in study arms with: 1) unadjusted results, 2) results adjusted using aggregate study values and 3) two methods for downweighting the potentially biased non-randomised studies. Sensitivity of the results to alternative prior distributions and the inclusion of additional covariates were also assessed. RESULTS: In the base case analysis, the estimated odds ratio was 0.32 (0.13,0.76) for the randomised studies alone and 0.57 (0.41,0.82) for the non-randomised studies alone. The unadjusted result for the two types combined was 0.49 (0.21,0.98). 
Adjusted for differences between study arms, the estimated odds ratio was 0.37 (0.17,0.77), representing a shift towards the estimate for the randomised studies alone. Adjustment for aggregate values resulted in an estimate of 0.60 (0.28,1.20). The two methods used for downweighting gave odd ratios of 0.43 (0.18,0.89) and 0.35 (0.16,0.76), respectively. Point estimates were robust but credible intervals were wider when using vaguer priors. CONCLUSIONS: Covariate adjustment using aggregate study values does not account for covariate imbalances between treatment arms and downweighting may not eliminate bias. Adjustment using differences in patient characteristics between arms provides a systematic way of adjusting for bias due to confounding. Within the context of a Bayesian hierarchical model, such an approach could facilitate the use of all available evidence to inform health policy decisions." [Accessed July 14, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/64.

Kimiko Broeze, Brent Opmeer, Lucas Bachmann, et al. Individual patient data meta-analysis of diagnostic and prognostic studies in obstetrics, gynaecology and reproductive medicine. BMC Medical Research Methodology. 2009;9(1):22. Abstract: "BACKGROUND: In clinical practice a diagnosis is based on a combination of clinical history, physical examination and additional diagnostic tests. At present, studies on diagnostic research often report the accuracy of tests without taking into account the information already known from history and examination. Due to this lack of information, together with variations in design and quality of studies, conventional meta-analyses based on these studies will not show the accuracy of the tests in real practice. By using individual patient data (IPD) to perform meta-analyses, the accuracy of tests can be assessed in relation to other patient characteristics and allows the development or evaluation of diagnostic algorithms for individual patients. In this study we will examine these potential benefits in four clinical diagnostic problems in the field of gynaecology, obstetrics and reproductive medicine. METHODS: Based on earlier systematic reviews for each of the four clinical problems, studies are considered for inclusion. The first authors of the included studies will be invited to participate and share their original data. After assessment of validity and completeness, the acquired datasets are merged. Based on these data, a series of analyses will be performed, including a systematic comparison of the results of the IPD meta-analysis with those of a conventional meta-analysis, development of multivariable models for clinical history alone and for the combination of history, physical examination and relevant diagnostic tests, and development of clinical prediction rules for individual patients. These will be made accessible for clinicians. 
DISCUSSION: The use of IPD meta-analysis will allow evaluating accuracy of diagnostic tests in relation to other relevant information. Ultimately, this could increase the efficiency of the diagnostic work-up, e.g. by reducing the need for invasive tests and/or improving the accuracy of the diagnostic workup. This study will assess whether these benefits of IPD meta-analysis over conventional meta-analysis can be exploited and will provide a framework for future IPD meta-analyses in diagnostic research." [Accessed March 30, 2009]. Available at: http://www.biomedcentral.com/1471-2288/9/22.

Ian Shrier, Jean-Francois Boivin, Robert Platt, et al. The interpretation of systematic reviews with meta-analyses: an objective or subjective process? BMC Medical Informatics and Decision Making. 2008;8(1):19. Abstract: "BACKGROUND: Discrepancies between the conclusions of different meta-analyses (quantitative syntheses of systematic reviews) are often ascribed to methodological differences. The objective of this study was to determine the discordance in interpretations when meta-analysts are presented with identical data. METHODS: We searched the literature for all randomized clinical trials (RCT) and review articles on the efficacy of intravenous magnesium in the early post-myocardial infarction period. We organized the articles chronologically and grouped them in packages. The first package included the first RCT, and a summary of the review articles published prior to first RCT. The second package contained the second and third RCT, a meta-analysis based on the data, and a summary of all review articles published prior to the third RCT. Similar packages were created for the 5th RCT, 10th RCT, 20th RCT and 23rd RCT (all articles). We presented the packages one at a time to eight different reviewers and asked them to answer three clinical questions after each package based solely on the information provided. The clinical questions included whether 1) they believed magnesium is now proven beneficial, 2) they believed magnesium will eventually be proven to be beneficial, and 3) they would recommend its use at this time. RESULTS: There was considerable disagreement among the reviewers for each package, and for each question. The discrepancies increased when the heterogeneity of the data increased. In addition, some reviewers became more sceptical of the effectiveness of magnesium over time, and some reviewers became less sceptical. 
CONCLUSION: The interpretation of the results of systematic reviews with meta-analyses includes a subjective component that can lead to discordant conclusions that are independent of the methodology used to obtain or analyse the data." [Accessed December 29, 2010]. Available at: http://www.biomedcentral.com/1472-6947/8/19.

A Caveman. The invited review - or, my field, from my standpoint, written by me using only my data and my ideas, and citing only my publications. J Cell Sci. 2000;113(18):3125-3126. Comment: The title is better than any summary I could write. [Accessed September 27, 2010]. Available at: http://jcs.biologists.org/cgi/content/abstract/113/18/3125.

Wim Van Biesen, Francis Verbeke, Raymond Vanholder. An infallible recipe? A story of cinnamon, souffle and meta-analysis. Nephrol. Dial. Transplant. 2008;23(9):2729-2732. Excerpt: "Meta-analyses certainly do have their place in scientific research. Like herbs, if used in the correct dish, and not too much or too often, they can give that extra bit of flavour that turns 'food' into a 'delicious dish'. However, meta-analyses are like cinnamon: very tasteful in small quantities and in the right dish, but if you use them too much or in the wrong dish, it ruins all other flavours and you get nausea. Just as for the cinnamon, it requires skills and insight to know when and how to use a meta-analysis." [Accessed May 27, 2010]. Available at: http://ndt.oxfordjournals.org/cgi/content/full/23/9/2729.

Gunther Eysenbach, Per Egil Kummervold. "Is Cybermedicine Killing You?" - The Story of a Cochrane Disaster. J Med Internet Res. 2005;7(2):e21. Abstract: "This editorial briefly reviews the series of unfortunate events that led to the publication, dissemination, and eventual retraction of a flawed Cochrane systematic review on interactive health communication applications (IHCAs), which was widely reported in the media with headlines such as "Internet Makes Us Sick," "Knowledge May Be Hazardous to Web Consumers' Health," "Too Much Advice Can Be Bad for Your Health," "Click to Get Sick?" and even "Is Cybermedicine Killing You?" While the media attention helped to speed up the identification of errors, leading to a retraction of the review after only 13 days, a paper published in this issue of JMIR by Rada shows that the retraction, in contrast to the original review, remained largely unnoticed by the public. We discuss the three flaws of the review, which include (1) data extraction and coding errors, (2) the pooling of heterogeneous studies, and (3) a problematic and ambiguous scope and, possibly, some overlooked studies. We then discuss "retraction ethics" for researchers, editors/publishers, and journalists. Researchers and editors should, in the case of retractions, match the aggressiveness of the original dissemination campaign if errors are detected. It is argued that researchers and their organizations may have an ethical obligation to track down journalists who reported stories on the basis of a flawed study and to specifically ask them to publish an article indicating the error. Journalists should respond to errors or retractions with reports that have the same prominence as the original story. 
Finally, we look at some of the lessons for the Cochrane Collaboration, which include (1) improving the peer-review system by routinely sending out pre-prints to authors of the original studies, (2) avoiding downplay of the magnitude of errors if they occur, (3) addressing the usability issues of RevMan, and (4) making critical articles such as retraction notices open access." [Accessed October 26, 2010]. Available at: http://www.jmir.org/2005/2/e21/.

Jonathan J. Deeks. Issues in the selection of a summary statistic for meta-analysis of clinical trials with binary outcomes. Statistics in Medicine. 2002;21(11):1575-1600. Abstract: "Meta-analysis of binary data involves the computation of a weighted average of summary statistics calculated for each trial. The selection of the appropriate summary statistic is a subject of debate due to conflicts in the relative importance of mathematical properties and the ability to intuitively interpret results. This paper explores the process of identifying a summary statistic most likely to be consistent across trials when there is variation in control group event rates. Four summary statistics are considered: odds ratios (OR); risk differences (RD) and risk ratios of beneficial (RR(B)); and harmful outcomes (RR(H)). Each summary statistic corresponds to a different pattern of predicted absolute benefit of treatment with variation in baseline risk, the greatest difference in patterns of prediction being between RR(B) and RR(H). Selection of a summary statistic solely based on identification of the best-fitting model by comparing tests of heterogeneity is problematic, principally due to low numbers of trials. It is proposed that choice of a summary statistic should be guided by both empirical evidence and clinically informed debate as to which model is likely to be closest to the expected pattern of treatment benefit across baseline risks. Empirical investigations comparing the four summary statistics on a sample of 551 systematic reviews provide evidence that the RR and OR models are on average more consistent than RD, there being no difference on average between RR and OR. From a second sample of 114 meta-analyses evidence indicates that for interventions aimed at preventing an undesirable event, greatest absolute benefits are observed in trials with the highest baseline event rates, corresponding to the model of constant RR(H). 
The appropriate selection for a particular meta-analysis may depend on understanding reasons for variation in control group event rates; in some situations uncertainty about the choice of summary statistic will remain. Copyright © 2002 John Wiley & Sons, Ltd." [Accessed December 18, 2009]. Available at: http://dx.doi.org/10.1002/sim.1188.
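
As a rough illustration of the four summary statistics named in the Deeks abstract, here is a minimal Python sketch that computes them from a single trial's 2x2 table. The function name and the trial counts are hypothetical, and the sketch assumes that the counted event is the harmful outcome, so that RR(H) compares risks of the event and RR(B) compares risks of avoiding it.

```python
def summary_statistics(events_trt, n_trt, events_ctl, n_ctl):
    """Return OR, RD, RR(H) and RR(B) for one trial (hypothetical helper).

    Assumes the counted event is the harmful outcome.
    """
    p_trt = events_trt / n_trt   # event risk in the treatment group
    p_ctl = events_ctl / n_ctl   # event risk in the control group
    return {
        "OR": (p_trt / (1 - p_trt)) / (p_ctl / (1 - p_ctl)),
        "RD": p_trt - p_ctl,
        "RR_H": p_trt / p_ctl,              # risk ratio of the harmful outcome
        "RR_B": (1 - p_trt) / (1 - p_ctl),  # risk ratio of the beneficial outcome
    }


# Hypothetical trial: 10 of 100 events on treatment, 20 of 100 on control.
stats = summary_statistics(10, 100, 20, 100)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Note that RR(H) and RR(B) are not reciprocals of one another, which is one reason the choice between them matters when baseline risks vary.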

Gluud LL, Sørensen TI, Gøtzsche PC, Gluud C. The journal impact factor as a predictor of trial quality and outcomes: cohort study of hepatobiliary randomized clinical trials. Am J Gastroenterol. 2005 Nov;100(11):2431-5.

Tam Cam Ha, Say Beng Tan, Khee Chee Soo. The journal impact factor: too much of an impact? Ann Acad Med Singapore. 2006;35:911-6.

C David Naylor. Meta-analysis and the meta-epidemiology of clinical research. BMJ. 1997;315:617-9. Excerpt: "This week's BMJ contains a pot-pourri of materials that deal with the research methodology of meta-analysis. Meta-analysis in clinical research is based on simple principles: systematically searching out, and, when possible, quantitatively combining the results of all studies that have addressed a similar research question. Given the information explosion in clinical research, the logic of basing research reviews on systematic searching and careful quantitative compilation of study results is incontrovertible. However, one aspect of meta-analysis as applied to randomised trials has always been controversial [1, 2] - combining data from multiple studies into single estimates of treatment effect." [Accessed May 19, 2010]. Available at: http://www.bmj.com/cgi/content/extract/315/7109/617.

Byron Wallace, Christopher Schmid, Joseph Lau, Thomas Trikalinos. Meta-Analyst: software for meta-analysis of binary, continuous and diagnostic data. BMC Medical Research Methodology. 2009;9(1):80. Abstract: "BACKGROUND: Meta-analysis is increasingly used as a key source of evidence synthesis to inform clinical practice. The theory and statistical foundations of meta-analysis continually evolve, providing solutions to many new and challenging problems. In practice, most meta-analyses are performed in general statistical packages or dedicated meta-analysis programs. RESULTS: Herein, we introduce Meta-Analyst, a novel, powerful, intuitive, and free meta-analysis program for the meta-analysis of a variety of problems. Meta-Analyst is implemented in C# atop of the Microsoft .NET framework, and features a graphical user interface. The software performs several meta-analysis and meta-regression models for binary and continuous outcomes, as well as analyses for diagnostic and prognostic test studies in the frequentist and Bayesian frameworks. Moreover, Meta-Analyst includes a flexible tool to edit and customize generated meta-analysis graphs (e.g., forest plots) and provides output in many formats (images, Adobe PDF, Microsoft Word-ready RTF). The software architecture employed allows for rapid changes to be made to either the Graphical User Interface (GUI) or to the analytic modules. We verified the numerical precision of Meta-Analyst by comparing its output with that from standard meta-analysis routines in Stata over a large database of 11,803 meta-analyses of binary outcome data, and 6,881 meta-analyses of continuous outcome data from the Cochrane Library of Systematic Reviews. Results from analyses of diagnostic and prognostic test studies have been verified in a limited number of meta-analyses versus MetaDisc and MetaTest. Bayesian statistical analyses use the OpenBUGS calculation engine (and are thus as accurate as the standalone OpenBUGS software). 
CONCLUSION: We have developed and validated a new program for conducting meta-analyses that combines the advantages of existing software for this task." [Accessed December 9, 2009]. Available at: http://www.biomedcentral.com/1471-2288/9/80.

Edward Mills, Beth Rachlis, Chris O'Regan, Lehana Thabane, Dan Perri. Metastatic renal cell cancer treatments: An indirect comparison meta-analysis. BMC Cancer. 2009;9(1):34. Abstract: "BACKGROUND: Treatment for metastatic renal cell cancer (mRCC) has advanced dramatically with understanding of the pathogenesis of the disease. New treatment options may provide improved progression-free survival (PFS). We aimed to determine the relative effectiveness of new therapies in this field. METHODS: We conducted comprehensive searches of 11 electronic databases from inception to April 2008. We included randomized trials (RCTs) that evaluated bevacizumab, sorafenib, and sunitinib. Two reviewers independently extracted data, in duplicate. Our primary outcome was investigator-assessed PFS. We performed random-effects meta-analysis with a mixed treatment comparison analysis. RESULTS: We included 3 bevacizumab (2 of bevacizumab plus interferon-α [IFN-α]), 2 sorafenib, 1 sunitinib, and 1 temsirolimus trials (total n=3,957). All interventions offer advantages for PFS. Using indirect comparisons with interferon-alpha as the common comparator, we found that sunitinib was superior to both sorafenib (HR 0.58, 95% CI, 0.38-0.86, P=<0.001) and bevacizumab + IFN-α (HR 0.75, 95% CI, 0.60-0.93, P=0.001). Sorafenib was not statistically different from bevacizumab + IFN-α in this same indirect comparison analysis (HR 0.77, 95% CI, 0.52-1.13, P=0.23). Using placebo as the similar comparator, we were unable to display a significant difference between sorafenib and bevacizumab alone (HR 0.81, 95% CI, 0.58-1.12, P=0.23). Temsirolimus provided significant PFS in patients with poor prognosis (HR 0.69, 95% CI, 0.57-0.85). CONCLUSIONS: New interventions for mRCC offer a favourable PFS for mRCC compared to interferon-alpha and placebo." [Accessed January 30, 2009]. Available at: http://www.biomedcentral.com/1471-2407/9/34.
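
To make the idea of an indirect comparison through a common comparator concrete, here is a minimal Python sketch of the simple adjusted indirect comparison (often attributed to Bucher). This is not the mixed treatment comparison model the authors describe; it is a simpler technique swapped in for illustration. The function names and the input hazard ratios are hypothetical, and the standard errors are backed out of the 95% confidence intervals on the log scale.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% normal quantile


def se_from_ci(lower, upper):
    """Standard error of a log hazard ratio, recovered from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def indirect_hr(hr_ac, ci_ac, hr_bc, ci_bc):
    """Indirect HR of A versus B through common comparator C (hypothetical helper).

    On the log scale the indirect estimate is the difference of the two
    direct estimates, and its variance is the sum of their variances.
    """
    log_hr = math.log(hr_ac) - math.log(hr_bc)
    se = math.sqrt(se_from_ci(*ci_ac) ** 2 + se_from_ci(*ci_bc) ** 2)
    return (math.exp(log_hr),
            math.exp(log_hr - Z95 * se),
            math.exp(log_hr + Z95 * se))


# Hypothetical inputs: A vs C has HR 0.50 (0.40 to 0.625),
# B vs C has HR 0.80 (0.64 to 1.00).
hr, lo, hi = indirect_hr(0.50, (0.40, 0.625), 0.80, (0.64, 1.00))
print(f"Indirect HR, A vs B: {hr:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Because the variances add, the indirect interval is always wider than either of the direct intervals it is built from.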

Laura Rosen, Michal Ben Noach, Elliot Rosenberg. Missing the forest (plot) for the trees? A critique of the systematic review in tobacco control. BMC Medical Research Methodology. 2010;10(1):34. Abstract: "BACKGROUND: The systematic review (SR) lies at the core of evidence-based medicine. While it may appear that the SR provides a reliable summary of existing evidence, standards of SR conduct differ. The objective of this research was to examine systematic review (SR) methods used by the Cochrane Collaboration ("Cochrane") and the Task Force on Community Preventive Services ("the Guide") for evaluation of effectiveness of tobacco control interventions. METHODS: We searched for all reviews of tobacco control interventions published by Cochrane (4th quarter 2008) and the Guide. We recorded design rigor of included studies, data synthesis method, and setting. RESULTS: About a third of the Cochrane reviews and two thirds of the Guide reviews of interventions in the community setting included uncontrolled trials. Most (74%) Cochrane reviews in the clinical setting, but few (15%) in the community setting, provided pooled estimates from RCTs. Cochrane often presented the community results narratively. The Guide did not use inferential statistical approaches to assessment of effectiveness. CONCLUSIONS: Policy makers should be aware that SR methods differ, even among leading producers of SRs and among settings studied. The traditional SR approach of using pooled estimates from RCTs is employed frequently for clinical but infrequently for community-based interventions. The common lack of effect size estimates and formal tests of significance limit the contribution of some reviews to evidence-based decision making. Careful exploration of data by subgroup, and appropriate use of random effects models, may assist researchers in overcoming obstacles to pooling data." [Accessed May 1, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/34.

Leon Bax, Noriaki Ikeda, Naohito Fukui, et al. More Than Numbers: The Power of Graphs in Meta-Analysis. Am. J. Epidemiol. 2009;169(2):249-255. Abstract: "In meta-analysis, the assessment of graphs is widely used in an attempt to identify or rule out heterogeneity and publication bias. A variety of graphs are available for this purpose. To date, however, there has been no comparative evaluation of the performance of these graphs. With the objective of assessing the reproducibility and validity of graph ratings, the authors simulated 100 meta-analyses from 4 scenarios that covered situations with and without heterogeneity and publication bias. From each meta-analysis, the authors produced 11 types of graphs (box plot, weighted box plot, standardized residual histogram, normal quantile plot, forest plot, 3 kinds of funnel plots, trim-and-fill plot, Galbraith plot, and L'Abbe plot), and 3 reviewers assessed the resulting 1,100 plots. The intraclass correlation coefficients (ICCs) for reproducibility of the graph ratings ranged from poor (ICC = 0.34) to high (ICC = 0.91). Ratings of the forest plot and the standardized residual histogram were best associated with parameter heterogeneity. Association between graph ratings and publication bias (censorship of studies) was poor. Meta-analysts should be selective in the graphs they choose for the exploration of their data." [Accessed May 19, 2010]. Available at: http://aje.oxfordjournals.org/cgi/content/abstract/169/2/249.

Beth Woods, Neil Hawkins, David Scott. Network meta-analysis on the log-hazard scale, combining count and hazard ratio statistics accounting for multi-arm trials: A tutorial. BMC Medical Research Methodology. 2010;10(1):54. Abstract: "BACKGROUND: Data on survival endpoints are usually summarised using either hazard ratio, cumulative number of events, or median survival statistics. Network meta-analysis, an extension of traditional pairwise meta-analysis, is typically based on a single statistic. In this case, studies which do not report the chosen statistic are excluded from the analysis which may introduce bias. METHODS: In this paper we present a tutorial illustrating how network meta-analyses of survival endpoints can combine count and hazard ratio statistics in a single analysis on the hazard ratio scale. We also describe methods for accounting for the correlations in relative treatment effects (such as hazard ratios) that arise in trials with more than two arms. Combination of count and hazard ratio data in a single analysis is achieved by estimating the cumulative hazard for each trial arm reporting count data. Correlation in relative treatment effects in multi-arm trials is preserved by converting the relative treatment effect estimates (the hazard ratios) to arm-specific outcomes (hazards). RESULTS: A worked example of an analysis of mortality data in chronic obstructive pulmonary disease (COPD) is used to illustrate the methods. The data set and WinBUGS code for fixed and random effects models are provided. CONCLUSIONS: By incorporating all data presentations in a single analysis, we avoid the potential selection bias associated with conducting an analysis for a single statistic and the potential difficulties of interpretation, misleading results and loss of available treatment comparisons associated with conducting separate analyses for different summary statistics." [Accessed June 14, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/54.

Karim Hirji. No short-cut in assessing trial quality: a case study. Trials. 2009;10(1):1. Abstract: "BACKGROUND: Assessing the quality of included trials is a central part of a systematic review. Many check-list type of instruments for doing this exist. Using a trial of antibiotic treatment for acute otitis media, Burke et al., BMJ, 1991, as the case study, this paper illustrates some limitations of the check-list approach to trial quality assessment. RESULTS: The general verdict from the check list type evaluations in nine relevant systematic reviews was that Burke et al. (1991) is a good quality trial. All relevant meta-analyses extensively used its data to formulate therapeutic evidence. My comprehensive evaluation, on the other hand, brought to the surface a series of serious problems in the design, conduct, analysis and report of this trial that were missed by the earlier evaluations. CONCLUSION: A check-list or instrument based approach, if used as a short-cut, may at times rate deeply flawed trials as good quality trials. Check lists are crucial but they need to be augmented with an in-depth review, and where possible, a scrutiny of the protocol, trial records, and original data. The extent and severity of the problems I uncovered for this particular trial warrant an independent audit before it is included in a systematic review." [Accessed February 23, 2009]. Available at: http://www.trialsjournal.com/content/10/1/1.

Santiago G Moreno, Alex J Sutton, Erick H Turner, et al. Novel methods to deal with publication biases: secondary analysis of antidepressant trials in the FDA trial registry database and related journal publications. BMJ. 2009;339(aug07_1):b2981. Abstract: "Objective: To assess the performance of novel contour enhanced funnel plots and a regression based adjustment method to detect and adjust for publication biases. Design: Secondary analysis of a published systematic literature review. Data sources: Placebo controlled trials of antidepressants previously submitted to the US Food and Drug Administration (FDA) and matching journal publications. Methods: Publication biases were identified using novel contour enhanced funnel plots, a regression based adjustment method, Egger's test, and the trim and fill method. Results were compared with a meta-analysis of the gold standard data submitted to the FDA. Results: Severe asymmetry was observed in the contour enhanced funnel plot that appeared to be heavily influenced by the statistical significance of results, suggesting publication biases as the cause of the asymmetry. Applying the regression based adjustment method to the journal data produced a similar pooled effect to that observed by a meta-analysis of the FDA data. Contrasting journal and FDA results suggested that, in addition to other deviations from study protocol, switching from an intention to treat analysis to a per protocol one would contribute to the observed discrepancies between the journal and FDA results. Conclusion: Novel contour enhanced funnel plots and a regression based adjustment method worked convincingly and might have an important part to play in combating publication biases." [Accessed August 9, 2009]. Available at: http://www.bmj.com/cgi/content/abstract/339/aug07_1/b2981.

Salla A Munro, Simon A Lewin, Helen J Smith, et al. Patient Adherence to Tuberculosis Treatment: A Systematic Review of Qualitative Research. PLoS Med. 2007;4(7):e238. Excerpt: "From a systematic review of qualitative research, Munro and coauthors found that a range of interacting factors can lead to patients deciding not to complete their course of tuberculosis treatment." [Accessed October 26, 2010]. Available at: http://dx.doi.org/10.1371/journal.pmed.0040238.

Mark J. Eisenberg, Kristian B. Filion, Daniel Yavin, et al. Pharmacotherapies for smoking cessation: a meta-analysis of randomized controlled trials. CMAJ. 2008;179(2):135-144. Description: "This paper is an illustrative example of the use of Bayesian methods for meta-analysis." Abstract: "Background: Many placebo-controlled trials have demonstrated the efficacy of individual pharmacotherapies approved for smoking cessation. However, few direct or indirect comparisons of such interventions have been conducted. We performed a meta-analysis to compare the treatment effects of 7 approved pharmacologic interventions for smoking cessation. Methods: We searched the US Centers for Disease Control and Prevention's Tobacco Information and Prevention database as well as MEDLINE, EMBASE and the Cochrane Library for published reports of placebo-controlled, double-blind randomized controlled trials of pharmacotherapies for smoking cessation. We included studies that reported biochemically validated measures of abstinence at 6 and 12 months. We used a hierarchical Bayesian random-effects model to summarize the results for each intervention. Results: We identified 70 published reports of 69 trials involving a total of 32 908 patients. Six of the 7 pharmacotherapies studied were found to be more efficacious than placebo: varenicline (odds ratio [OR] 2.41, 95% credible interval [CrI] 1.91-3.12), nicotine nasal spray (OR 2.37, 95% CrI 1.12-5.13), bupropion (OR 2.07, 95% CrI 1.73-2.55), transdermal nicotine (OR 2.07, 95% CrI 1.69-2.62), nicotine tablet (OR 2.06, 95% CrI 1.12-5.13) and nicotine gum (OR 1.71, 95% CrI 1.35-2.21). Similar results were obtained regardless of which measure of abstinence was used. Although the point estimate favoured nicotine inhaler over placebo (OR 2.17), these results were not conclusive because the credible interval included unity (95% CrI 0.95-5.43). When all 7 interventions were included in the same model, all were more efficacious than placebo. 
In our analysis of data from the varenicline trials that included bupropion control arms, we found that varenicline was superior to bupropion (OR 2.18, 95% CrI 1.09-4.08). Interpretation: Varenicline, bupropion and the 5 nicotine replacement therapies were all more efficacious than placebo at promoting smoking abstinence at 6 and 12 months." [Accessed January 30, 2009]. Available at: http://www.cmaj.ca/cgi/content/abstract/179/2/135.

Jayne Tierney, Lesley Stewart, Davina Ghersi, Sarah Burdett, Matthew Sydes. Practical methods for incorporating summary time-to-event data into meta-analysis. Trials. 2007;8(1):16. Abstract: "BACKGROUND: In systematic reviews and meta-analyses, time-to-event outcomes are most appropriately analysed using hazard ratios (HRs). In the absence of individual patient data (IPD), methods are available to obtain HRs and/or associated statistics by carefully manipulating published or other summary data. Awareness and adoption of these methods is somewhat limited, perhaps because they are published in the statistical literature using statistical notation. METHODS: This paper aims to 'translate' the methods for estimating a HR and associated statistics from published time-to-event-analyses into less statistical and more practical guidance and provide a corresponding, easy-to-use calculations spreadsheet, to facilitate the computational aspects. RESULTS: A wider audience should be able to understand published time-to-event data in individual trial reports and use it more appropriately in meta-analysis. When faced with particular circumstances, readers can refer to the relevant sections of the paper. The spreadsheet can be used to assist them in carrying out the calculations. CONCLUSION: The methods cannot circumvent the potential biases associated with relying on published data for systematic reviews and meta-analysis. However, this practical guide should improve the quality of the analysis and subsequent interpretation of systematic reviews and meta-analyses that include time-to-event outcomes." [Accessed March 3, 2009]. Available at: http://www.trialsjournal.com/content/8/1/16.

Ole Olsen, Philippa Middleton, Jeanette Ezzo, et al. Quality of Cochrane reviews: assessment of sample from 1998. BMJ. 2001;323(7317):829-832. Abstract: "Objective: To assess the quality of Cochrane reviews. Design: Ten methodologists affiliated with the Cochrane Collaboration independently examined, in a semistructured way, the quality of reviews first published in 1998. Each review was assessed by two people; if one of them noted any major problems, they agreed on a common assessment. Predominant types of problem were categorised. Setting: Cyberspace collaboration coordinated from the Nordic Cochrane Centre. Studies: All 53 reviews first published in issue 4 of the Cochrane Library in 1998. Main outcome measure: Proportion of reviews with various types of major problem. Results: No problems or only minor ones were found in most reviews. Major problems were identified in 15 reviews (29%). The evidence did not fully support the conclusion in nine reviews (17%), the conduct or reporting was unsatisfactory in 12 reviews (23%), and stylistic problems were identified in 12 reviews (23%). The problematic conclusions all gave too favourable a picture of the experimental intervention. Conclusions: Cochrane reviews have previously been shown to be of higher quality and less biased on average than other systematic reviews, but improvement is always possible. The Cochrane Collaboration has taken steps to improve editorial processes and the quality of its reviews. Meanwhile, the Cochrane Library remains a key source of evidence about the effects of healthcare interventions. Its users should interpret reviews cautiously, particularly those with conclusions favouring experimental interventions and those with many typographical errors." [Accessed October 26, 2010]. Available at: http://www.bmj.com/content/323/7317/829.abstract.

Rolf Groenwold, Maroeska Rovers, Jacobus Lubsen, Geert van der Heijden. Subgroup effects despite homogeneous heterogeneity test results. BMC Medical Research Methodology. 2010;10(1):43. Abstract: "BACKGROUND: Statistical tests of heterogeneity are very popular in meta-analyses, as heterogeneity might indicate subgroup effects. Lack of demonstrable statistical heterogeneity, however, might obscure clinical heterogeneity, meaning clinically relevant subgroup effects. METHODS: A qualitative, visual method to explore the potential for subgroup effects was provided by a modification of the forest plot, i.e., adding a vertical axis indicating the proportion of a subgroup variable in the individual trials. Such a plot was used to assess the potential for clinically relevant subgroup effects and was illustrated by a clinical example on the effects of antibiotics in children with acute otitis media. RESULTS: Statistical tests did not indicate heterogeneity in the meta-analysis on the effects of amoxicillin on acute otitis media (Q=3.29, p=0.51; I^2=0%; T^2=0). Nevertheless, in a modified forest plot, in which the individual trials were ordered by the proportion of children with bilateral otitis, a clear relation between bilaterality and treatment effects was observed (which was also found in an individual patient data meta-analysis of the included trials: p-value for interaction 0.021). CONCLUSIONS: A modification of the forest plot, by including an additional (vertical) axis indicating the proportion of a certain subgroup variable, is a qualitative, visual, and easy-to-interpret method to explore potential subgroup effects in studies included in meta-analyses." [Accessed June 14, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/43.

Phil Alderson, Iain Chalmers. Survey of claims of no effect in abstracts of Cochrane reviews. BMJ. 2003;326(7387):475. Excerpt: "It is never correct to claim that treatments have no effect or that there is no difference in the effects of treatments. It is impossible to prove a negative or that two treatments have the same effect. There will always be some uncertainty surrounding estimates of treatment effects, and a small difference can never be excluded." [Accessed October 26, 2010]. Available at: http://www.bmj.com/content/326/7387/475.short.

R Andrew Moore, Jodie Barden. Systematic review of dexketoprofen in acute and chronic pain. BMC Clinical Pharmacology. 2008;8(1):11. Abstract: "BACKGROUND: Dexketoprofen, an NSAID used in the management of acute and chronic pains, is licensed in several countries but has not previously been the subject of a systematic review. We used published and unpublished information from randomised clinical trials (RCTs) of dexketoprofen in painful conditions to assess evidence on efficacy and harm. METHODS: PubMed and Cochrane Central were searched for RCTs of dexketoprofen for pain of any aetiology. Reference lists of retrieved articles and reviews were also searched. Menarini Group produced copies of published and unpublished studies (clinical trial reports). Data were abstracted into a standard form. For studies reporting results of single dose administration, the number of patients with at least 50% pain relief was derived and used to calculate the relative benefit (RB) and number-needed-to-treat (NNT) for one patient to achieve at least 50% pain relief compared with placebo. RESULTS: Thirty-five trials were found in acute pain and chronic pain; 6,380 patients were included, 3,381 receiving dexketoprofen. Information from 16 trials (almost half the total patients) was obtained from clinical trial reports from previously unpublished trials or abstracts. Almost all of the trials were of short duration in acute conditions or recent onset pain. All 12 randomised trials that compared dexketoprofen (any dose) with placebo found dexketoprofen to be statistically superior. Five trials in postoperative pain yielded NNTs for 12.5 mg dexketoprofen of 3.5 (2.7 to 4.9), 25 mg dexketoprofen of 3.0 (2.4 to 3.9), and 50 mg dexketoprofen of 2.1 (1.5 to 3.5). In 29/30 active comparator trials, dexketoprofen at the dose used was at least equivalent in efficacy to comparator drugs. 
Adverse event withdrawal rates were low in postoperative pain and somewhat higher in trials of longer duration; no serious adverse events were reported. CONCLUSION: Dexketoprofen was at least as effective as other NSAIDs and paracetamol/opioid combinations. While adverse event withdrawal was not different between dexketoprofen and comparator analgesics, the different conditions and comparators studies precluded any formal analysis. Exposure was limited, and no conclusions could be drawn about safety in terms of serious adverse events like gastrointestinal bleeding or cardiovascular events." [Accessed December 31, 2010]. Available at: http://www.biomedcentral.com/1472-6904/8/11.
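
The relative benefit (RB) and number-needed-to-treat (NNT) mentioned in this abstract follow directly from the proportions of responders in each arm. Here is a minimal Python sketch with hypothetical counts (not taken from the review); the function name is also an assumption for illustration.

```python
def relative_benefit_and_nnt(resp_trt, n_trt, resp_ctl, n_ctl):
    """Return (RB, NNT) from responder counts in two arms (hypothetical helper)."""
    p_trt = resp_trt / n_trt   # proportion with at least 50% pain relief, active
    p_ctl = resp_ctl / n_ctl   # proportion with at least 50% pain relief, placebo
    difference = p_trt - p_ctl
    if difference <= 0:
        raise ValueError("NNT is only defined here when treatment beats control")
    return p_trt / p_ctl, 1 / difference


# Hypothetical trial: 60 of 100 responders on active drug, 30 of 100 on placebo.
rb, nnt = relative_benefit_and_nnt(60, 100, 30, 100)
print(f"RB = {rb:.2f}, NNT = {nnt:.2f}")
```

An NNT is usually reported with a confidence interval, obtained by inverting the limits of the interval for the risk difference; that step is omitted here.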

Steffen Mickenautsch. Systematic reviews, systematic error and the acquisition of clinical knowledge. BMC Medical Research Methodology. 2010;10(1):53. Abstract: "BACKGROUND: Since its inception, evidence-based medicine and its application through systematic reviews, has been widely accepted. However, it has also been strongly criticised and resisted by some academic groups and clinicians. One of the main criticisms of evidence-based medicine is that it appears to claim to have unique access to absolute scientific truth and thus devalues and replaces other types of knowledge sources. DISCUSSION: The various types of clinical knowledge sources are categorised on the basis of Kant's categories of knowledge acquisition, as being either 'analytic' or 'synthetic'. It is shown that these categories do not act in opposition but rather, depend upon each other. The unity of analysis and synthesis in knowledge acquisition is demonstrated during the process of systematic reviewing of clinical trials. Systematic reviews constitute comprehensive synthesis of clinical knowledge but depend upon plausible, analytical hypothesis development for the trials reviewed. The dangers of systematic error regarding the internal validity of acquired knowledge are highlighted on the basis of empirical evidence. It has been shown that the systematic review process reduces systematic error, thus ensuring high internal validity. It is argued that this process does not exclude other types of knowledge sources. Instead, amongst these other types it functions as an integrated element during the acquisition of clinical knowledge. CONCLUSIONS: The acquisition of clinical knowledge is based on the interaction between analysis and synthesis. Systematic reviews provide the highest form of synthetic knowledge acquisition in terms of achieving internal validity of results. In that capacity it informs the analytic knowledge of the clinician but does not replace it." [Accessed June 14, 2010]. 
Available at: http://www.biomedcentral.com/1471-2288/10/53.

Gerta Rucker, Guido Schwarzer, James Carpenter, Martin Schumacher. Undue reliance on I^2 in assessing heterogeneity may mislead. BMC Medical Research Methodology. 2008;8(1):79. Abstract: "BACKGROUND: The heterogeneity statistic I^2, interpreted as the percentage of variability due to heterogeneity between studies rather than sampling error, depends on precision, that is, the size of the studies included. METHODS: Based on a real meta-analysis, we simulate artificially 'inflating' the sample size under the random effects model. For a given inflation factor M = 1, 2, 3, ... and for each trial i, we create a M-inflated trial by drawing a treatment effect estimate from the random effects model, using s_i^2/M as within-trial sampling variance. RESULTS: As precision increases, while estimates of the heterogeneity variance tau^2 remain unchanged on average, estimates of I^2 increase rapidly to nearly 100%. A similar phenomenon is apparent in a sample of 157 meta-analyses. CONCLUSION: When deciding whether or not to pool treatment estimates in a meta-analysis, the yard-stick should be the clinical relevance of any heterogeneity present. tau^2, rather than I^2, is the appropriate measure for this purpose." [Accessed January 2, 2009]. Available at: http://www.biomedcentral.com/1471-2288/8/79/abstract.
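
The dependence of I^2 on precision can be seen without any simulation. I^2 is commonly written as tau^2 / (tau^2 + s^2), where s^2 is a "typical" within-study sampling variance. The Python sketch below is not the authors' simulation; it simply holds tau^2 fixed and divides a hypothetical s^2 by the inflation factor M. All numeric values are made up for illustration.

```python
def i_squared(tau2, typical_within_var):
    """I^2 as the share of total variance due to between-study heterogeneity."""
    return tau2 / (tau2 + typical_within_var)


tau2 = 0.04   # hypothetical between-study variance, held constant
s2 = 0.16     # hypothetical typical within-study variance at M = 1

# Inflate the sample size by M, which shrinks the within-study variance to s2 / M.
results = {m: i_squared(tau2, s2 / m) for m in (1, 4, 16, 64)}
for m, value in results.items():
    print(f"M = {m:2d}: I^2 = {100 * value:.1f}%")
```

The heterogeneity itself never changes in this sketch, yet I^2 climbs toward 100% as the studies get larger.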

Corrado Barbui, Andrea Cipriani, Lara Malvini, and Michele Tansella. Validity of the Impact Factor of Journals as a Measure of Randomized Controlled Trial Quality. J Clin Psychiatry 2006;67:37-40.

All of the material above this paragraph is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2017-06-15. The material below this paragraph links to my old website, StATS. Although I wrote all of the material listed below, my ex-employer, Children's Mercy Hospital, has claimed copyright ownership of this material. The brief excerpts shown here are included under the fair use provisions of U.S. Copyright laws.


12. Stats: Criticism of random effects in a meta-analysis (June 14, 2008). There are two approaches to combining results in a meta-analysis. They are called the fixed effects model and the random effects model. The fixed effects model effectively weights each study by the sample size, or by a measurement that is closely related to the sample size, such as the inverse of the squared standard error of the estimate. A random effects meta-analysis, in contrast, will assume that an estimate from a single study has two sources of error. One error is the same as in the fixed effects analysis and varies by the sample size of the study. The other error is a random component that is independent of the sample size and represents uncertainties due to conditions in this particular study that differ from conditions in other studies.
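
The difference between the two weighting schemes can be sketched in a few lines of Python. This is a minimal illustration, assuming inverse-variance weights for the fixed effects model and the DerSimonian-Laird moment estimator for the between-study variance (one common choice among several). The function name and the study data are hypothetical.

```python
def pool(estimates, variances):
    """Return (fixed, random, tau2) for a set of study estimates (hypothetical helper).

    estimates: per-study effect estimates (for example, log odds ratios)
    variances: per-study sampling variances (squared standard errors)
    """
    # Fixed effects: weight each study by the inverse of its sampling variance.
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random effects: add tau^2 to every study's variance before weighting.
    w_re = [1 / (v + tau2) for v in variances]
    random = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return fixed, random, tau2


# Hypothetical data: one large study with a small effect, two smaller
# studies with larger effects.
fixed, random, tau2 = pool([0.1, 0.3, 0.8], [0.01, 0.04, 0.25])
print(f"fixed = {fixed:.3f}, random = {random:.3f}, tau^2 = {tau2:.4f}")
```

Because tau^2 is added to every study's variance, the random effects weights are more nearly equal than the fixed effects weights, so small studies gain influence. In this made-up example that pulls the pooled estimate toward the larger effects seen in the smaller studies.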

11. Stats: Finding only the important studies (January 21, 2008). Someone wrote in to the MedStats listserv asking about a process that they had chosen to select "important" articles in a particular research area. This was, I presume, a qualitative summary of interesting results in a broad medical area rather than a quantitative synthesis of all available research addressing a specific medical treatment. The reason I suspect this is that the person mentioned that they had used the statistical significance of the studies as a filter and eliminated any negative studies from further consideration.


10. Stats: Cherry picking the literature (December 20, 2006). I have a relative who loves to send me articles supporting a particular religious and political viewpoint that he endorses. While the viewpoint he espouses is usually conservative, the problems with the articles he cites are problems that plague both sides. These articles always have an impressive bibliography, as if to say "Look! It was published and peer-reviewed, so it must be true." The problem with these articles, though, is that the bibliography was created using a process called "cherry picking."

9. Stats: Meta-analysis and diagnostic tests (February 14, 2006). I will be giving a talk at the graduate seminar series for the department of Mathematics and Statistics at UMKC on February 23. The title of the talk will be "Meta-analysis and diagnostic tests."


8. Stats: A controversial meta-analysis (December 20, 2005). Back in August 2005, the Lancet published an interesting meta-analysis on homeopathy. Are the clinical effects of homoeopathy placebo effects? Comparative study of placebo-controlled trials of homoeopathy and allopathy. Shang A, Huwiler-Muntener K, Nartey L, Juni P, Dorig S, Sterne JA, Pewsner D, Egger M. Lancet 2005: 366(9487); 726-32. The researchers identified 110 placebo controlled homeopathy trials and matched them with 110 placebo controlled conventional-medicine trials. In both sets of trials, smaller studies showed stronger effects, and so did lower quality studies. But when the analysis was restricted to large trials of high quality, the effect of conventional medicine was still statistically significant (odds ratio 0.58, 95% CI 0.39 to 0.85) but the effect of homeopathy was not (odds ratio 0.88, 95% CI 0.65 to 1.19). The critics of this meta-analysis raise some interesting objections, and you can read some of them in the correspondence section of the December 17, 2005 issue of the Lancet.

7. Stats: Responding to a critique of meta-analysis (October 10, 2005). A contributor to the Evidence-Based Medicine list offered a possible criticism of meta-analysis. The criticism was along the lines of (I am paraphrasing and summarizing): Suppose we have two randomized trials coming up with exactly the opposite conclusion. Assume that bias and confounding are not an issue here. Then one study may be wrong. When meta-analysis takes an average of a correct value and an incorrect value, you will get a meaningless result. Now assume that the two results differ because they were studying two very different patient populations. An average here is also misleading, unless you weight by the proportion of the true overall population that these two patient populations come from. As an aside this reminds me of the old joke that a statistician is the only person who could stick his head in an oven and his feet in a bucket of ice and say that he feels fine on average. Here's the gist of my response.

6. Stats: Some articles on meta-analysis (June 10, 2005). I found a couple of interesting articles on meta-analysis that are difficult to classify: Interpreting epidemiological evidence: how meta-analysis and causal inference methods are related. Weed DL. Int J Epidemiol 2000: 29(3); 387-90 and Systematic reviews: a cross-sectional study of location and citation counts. Montori VM, Wilczynski NL, Morgan D, Haynes RB. BMC Med 2003: 1(1); 2. The first relates how meta-analysis supports some, but not all, of Hill's criteria for causation. The second compared systematic reviews with narrative reviews. Systematic reviews were cited more often (26 times on average versus 8) and included twice as many citations.

5. Stats: Hedge's G (May 13, 2005). Someone asked me today about Hedge's G. I had never heard of it before, but if you do a web search, you will find econwpa.wustl.edu/eps/prog/papers/0411/0411124.pdf which defines it as a variation on Cohen's D that corrects for biases due to small sample sizes.
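
A minimal sketch of the small-sample correction mentioned above (the statistic is usually written Hedges' g in the literature). It assumes the widely used approximate correction factor 1 - 3/(4*df - 1) applied to Cohen's d with a pooled standard deviation; the paper linked in the entry may define it slightly differently. The group summaries are hypothetical.

```python
import math


def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def hedges_g(mean1, mean2, sd1, sd2, n1, n2):
    """Cohen's d multiplied by an approximate small-sample bias correction."""
    df = n1 + n2 - 2
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return correction * cohens_d(mean1, mean2, sd1, sd2, n1, n2)


# Hypothetical summary statistics for two groups of 10 (assumed data).
d = cohens_d(10.0, 8.0, 4.0, 4.0, 10, 10)
g = hedges_g(10.0, 8.0, 4.0, 4.0, 10, 10)
print("Cohen's d:", round(d, 4))
print("Hedges' g:", round(g, 4))
```

The correction shrinks the estimate toward zero, and the shrinkage fades as the degrees of freedom grow, so d and g are nearly identical for large studies.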

4. Stats: Cumulative meta-analysis (March 11, 2005). This figure below, published in Erythropoietin, uncertainty principle and cancer related anaemia. Clark O, Adams JR, Bennett CL, Djulbegovic B. BMC Cancer 2002: 2(1); 23 shows cumulative meta-analysis, which is the cumulated effects over time of studies in the use of erythropoietin (EPO) to treat cancer related anemia.
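
A minimal sketch of the idea behind a cumulative meta-analysis: re-pool the evidence each time a new study is added. It assumes the studies are already sorted by publication date and uses fixed effects inverse-variance pooling for simplicity; the cited paper may have used a different model. The numbers are hypothetical, not taken from the EPO trials.

```python
import math


def cumulative_pooled(effects, variances):
    """Return (pooled estimate, standard error) after each successive study."""
    results = []
    sum_w = 0.0
    sum_wy = 0.0
    for y, v in zip(effects, variances):
        w = 1.0 / v  # inverse-variance weight for this study
        sum_w += w
        sum_wy += w * y
        results.append((sum_wy / sum_w, math.sqrt(1.0 / sum_w)))
    return results


# Hypothetical log odds ratios in chronological order (assumed data).
effects = [-0.9, -0.3, -0.5, -0.4]
variances = [0.25, 0.10, 0.05, 0.02]

for k, (est, se) in enumerate(cumulative_pooled(effects, variances), start=1):
    print(f"after study {k}: estimate={est:.3f}, se={se:.3f}")
```

Plotting each row as a point with a confidence interval gives the familiar picture of an estimate that stabilizes, and an interval that narrows, as studies accumulate.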

3. Stats: Summary Receiver Operating Characteristic Curve (January 21, 2005). In the past week, I have had two inquiries about how to perform a meta-analysis of studies of a diagnostic test. An intriguing idea that I discovered in researching this is the use of the Receiver Operating Characteristic (ROC) curve to summarize the results of the studies. Each study will have its own sensitivity and specificity, and plotting the sensitivity/specificity pairs on the coordinates of an ROC plot, along with some reference lines, will help you to evaluate the degree of heterogeneity of the studies, among other things.
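
A minimal sketch of the first step described above: turning each study's 2x2 table into a point on the ROC plane, with 1 - specificity on the horizontal axis and sensitivity on the vertical axis. The counts and study names are hypothetical, and the sketch stops short of fitting a summary curve.

```python
def roc_point(tp, fn, fp, tn):
    """Return (1 - specificity, sensitivity) for one study's 2x2 counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 1.0 - specificity, sensitivity


# Hypothetical counts per study: (true pos, false neg, false pos, true neg).
studies = {
    "Study A": (90, 10, 20, 80),
    "Study B": (45, 15, 5, 95),
    "Study C": (30, 20, 10, 40),
}

for name, counts in studies.items():
    x, y = roc_point(*counts)
    print(f"{name}: 1 - specificity = {x:.2f}, sensitivity = {y:.2f}")
```

Points that scatter widely across the plane, rather than clustering or tracing a single curve, suggest that the studies are heterogeneous.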

2. Stats: Forest plots (January 12, 2005). Many meta-analyses use a graph known as a forest plot. I was always confused by the funny squares in a forest plot, so I looked for a description.


1. Stats: Measuring heterogeneity in meta-analysis (November 29, 2004). While browsing through the archives of the British Medical Journal, I noticed an excellent article on measuring heterogeneity in meta-analysis. There is a new measure, I-squared, that measures the proportion of inconsistency in individual studies that cannot be explained by chance. It ranges between 0% and 100% with lower values representing less heterogeneity.
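
A minimal sketch of how I-squared is usually computed, assuming the standard definition based on Cochran's Q: I^2 = max(0, (Q - df) / Q) x 100%, where df is the number of studies minus one. The effect sizes and variances are hypothetical.

```python
def cochran_q(effects, variances):
    """Weighted sum of squared deviations from the fixed effects estimate."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(effects, variances):
    """Percentage of total variation attributed to heterogeneity, 0 to 100."""
    q = cochran_q(effects, variances)
    df = len(effects) - 1
    if q <= 0.0:
        return 0.0  # no observed variation at all
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical log odds ratios and their sampling variances (assumed data).
effects = [-0.6, -0.1, -0.4, 0.2]
variances = [0.04, 0.01, 0.09, 0.02]
print("Q   =", round(cochran_q(effects, variances), 2))
print("I^2 =", round(i_squared(effects, variances), 1), "%")
```

Truncating at zero handles the case where Q falls below its degrees of freedom, which happens when the studies agree more closely than chance alone would predict.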
