Evaluating diagnostic tests involves some subtle but important statistical
issues. These webpages show some interesting examples of diagnostic tests,
offer pointers for critically evaluating studies of diagnostic tests, and
present practical applications of diagnostic tests in day-to-day medical
practice. Also see Category: Bayesian statistics. Other entries about diagnostic testing can be found on the
diagnostic testing page at the
StATS website.
2009
- P.Mean: Data layout for an ROC curve
(created 2009-10-16). Back in 1999, I wrote a brief description of the ROC
curve and showed what it would look like in SPSS. That page can be found at
www.childrensmercy.org/stats/ask/roc.asp. I didn't show, however, what the
data would look like when entered into SPSS or what the dialog boxes would
look like.
- P.Mean: The problem with being too
sensitive or too specific (created 2009-09-16). Somebody asked my opinion
about cost effectiveness research. My bottom line is that I like it, but I
understand why it is controversial. Here's the logic that I presented to draw
that conclusion.
- P.Mean: Getting a good
cut-off when sensitivity is more important than specificity (created
2009-09-14). "I am working on a prediction model to help with
diagnosis. In this particular area I need a model that has the highest
possible sensitivity (low specificity is not a problem)." One obvious
comment is that you can achieve a sensitivity of 100% if you don't mind a
specificity of 0%. So when you say "low specificity is not a problem" that
statement is only partially true. What you mean to say is that false negatives
are far more serious than false positives. How much more serious, though? Five
times? Ten times? Once you've decided the relative costs of false negatives
and false positives, the rest is easy.
- P.Mean: Locating individual points on an
ROC curve (created 2009-03-05). In a project examining a diagnostic test,
I was asked to develop an ROC curve. That is fairly easy to do. Six months
later, though, I was asked to designate a particular point on the curve
corresponding to a cutpoint of 7. This is a bit ambiguous, but in re-reading
the paper, it was obvious from the context that this meant locating the point
on the curve where a positive test result of 7 or less (alternatively a
negative test result of 8 or more) occurred. It takes a while to get oriented
properly on an ROC curve. Here's what I did.
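As a rough illustration of the two ROC entries above, here is a minimal Python sketch. It is not the SPSS procedure those pages describe. It assumes that a score at or below the cutpoint counts as a positive test (as in the "7 or less" example), and the data, the function names, and the 5-to-1 cost ratio are all made up for illustration.

```python
from typing import Sequence, Tuple


def confusion(scores: Sequence[float], diseased: Sequence[bool],
              cutpoint: float) -> Tuple[int, int, int, int]:
    """Count (tp, fn, fp, tn), treating score <= cutpoint as a positive test."""
    tp = fn = fp = tn = 0
    for score, has_disease in zip(scores, diseased):
        positive = score <= cutpoint
        if has_disease and positive:
            tp += 1
        elif has_disease:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def roc_point(scores, diseased, cutpoint):
    """Return the ROC coordinates (false positive rate, sensitivity)."""
    tp, fn, fp, tn = confusion(scores, diseased, cutpoint)
    return fp / (fp + tn), tp / (tp + fn)


def best_cutpoint(scores, diseased, fn_cost=5.0, fp_cost=1.0):
    """Among the observed scores, return (cutpoint, cost) with the lowest
    total cost, where each false negative costs fn_cost and each false
    positive costs fp_cost. Ties go to the smallest cutpoint."""
    best = None
    for c in sorted(set(scores)):
        tp, fn, fp, tn = confusion(scores, diseased, c)
        cost = fn_cost * fn + fp_cost * fp
        if best is None or cost < best[1]:
            best = (c, cost)
    return best


# Hypothetical data: ten patients, lower scores suggest disease.
scores = [3, 5, 6, 7, 7, 8, 9, 10, 11, 12]
diseased = [True, True, True, True, False, False, True, False, False, False]

print(roc_point(scores, diseased, 7))      # (0.2, 0.8)
print(best_cutpoint(scores, diseased))     # (9, 2.0)
```

With false negatives weighted five times as heavily as false positives, the chosen cutpoint moves up to 9, which catches every diseased patient in this toy data at the price of two false positives.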
2008
- P.Mean: Controversies with a test for
ovarian cancer (created 2008-08-27). A recent article in the New York Times
raises some interesting questions about diagnostic testing.
Outside resources:
- Osamu Komori, Shinto Eguchi. A boosting method for maximizing the
partial area under the ROC curve. BMC Bioinformatics. 2010;11(1):314.
Abstract: "BACKGROUND: The receiver operating characteristic (ROC) curve is a
fundamental tool to assess the discriminant performance for not only a single
marker but also a score function combining multiple markers. The area under
the ROC curve (AUC) for a score function measures the intrinsic ability for
the score function to discriminate between the controls and cases. Recently,
the partial AUC (pAUC) has been paid more attention than the AUC, because a
suitable range of the false positive rate can be focused according to various
clinical situations. However, existing pAUC-based methods only handle a few
markers and do not take nonlinear combination of markers into consideration.
RESULTS: We have developed a new statistical method that focuses on the pAUC
based on a boosting technique. The markers are combined componentially for
maximizing the pAUC in the boosting algorithm using natural cubic splines or
decision stumps (single-level decision trees), according to the values of
markers (continuous or discrete). We show that the resulting score plots are
useful for understanding how each marker is associated with the outcome
variable. We compare the performance of the proposed boosting method with
those of other existing methods, and demonstrate the utility using real data
sets. As a result, we have much better discrimination performances in the
sense of the pAUC in both simulation studies and real data analysis.
CONCLUSIONS: The proposed method addresses how to combine the markers after a
pAUC-based filtering procedure in high dimensional setting. Hence, it provides
a consistent way of analyzing data based on the pAUC from maker selection to
marker combination for discrimination problems. The method can capture not
only linear but also nonlinear association between the outcome variable and
the markers, about which the nonlinearity is known to be necessary in general
for the maximization of the pAUC. The method also puts importance on the
accuracy of classification performance as well as interpretability of the
association, by offering simple and smooth resultant score plots for each
marker." [Accessed June 14, 2010]. Available at:
http://www.biomedcentral.com/1471-2105/11/314.
- Jens Klotsche, Dietmar Ferger, Lars Pieper, Jurgen Rehm, Hans-Ulrich
Wittchen. A novel nonparametric approach for estimating cut-offs in
continuous risk indicators with application to diabetes epidemiology. BMC
Medical Research Methodology. 2009;9(1):63. Abstract: "BACKGROUND:
Epidemiological and clinical studies, often including anthropometric measures,
have established obesity as a major risk factor for the development of type 2
diabetes. Appropriate cut-off values for anthropometric parameters are
necessary for prediction or decision purposes. The cut-off corresponding to
the Youden-Index is often applied in epidemiology and biomedical literature
for dichotomizing a continuous risk indicator. METHODS: Using data from a
representative large multistage longitudinal epidemiological study in a
primary care setting in Germany, this paper explores a novel approach for
estimating optimal cut-offs of anthropomorphic parameters for predicting type
2 diabetes based on a discontinuity of a regression function in a
nonparametric regression framework. RESULTS: The resulting cut-off
corresponded to values obtained by the Youden Index (maximum of the sum of
sensitivity and specificity, minus one), often considered the optimal cut-off
in epidemiological and biomedical research. The nonparametric regression based
estimator was compared to results obtained by the established methods of the
Receiver Operating Characteristic plot in various simulation scenarios and
based on bias and root mean square error, yielded excellent finite sample
properties. CONCLUSION: It is thus recommended that this nonparametric
regression approach be considered as valuable alternative when a continuous
indicator has to be dichotomized at the Youden Index for prediction or
decision purposes." [Accessed October 11, 2009]. Available at:
http://www.biomedcentral.com/1471-2288/9/63.
- K J Hamberg, B Carstensen, T I Sørensen, K Eghøje. Accuracy of clinical
diagnosis of cirrhosis among alcohol-abusing men. J Clin Epidemiol.
1996;49(11):1295-1301. Abstract: "There is a considerable variation among
specialists in the use of liver biopsy for the diagnosis of alcoholic
cirrhosis, which is often based solely on clinical findings, sometimes
supplemented with blood tests. To assess the diagnostic accuracy that may be
achieved by this approach, we related items of the history, symptoms and
signs, and routine blood tests to the presence/absence of cirrhosis in a
unique, previously established, consecutive series of 303 alcohol-abusing men,
in whom liver biopsy was performed irrespective of the clinical and
biochemical findings. Using logistic regression analyses, we created a
clinical, a combined clinical and biochemical, and a pure biochemical
diagnostic model. The probability of cirrhosis in patients with the specified
characteristics was estimated, the diagnostic accuracy was assessed as
functions of diagnostic thresholds for cirrhosis defined by the probability of
cirrhosis varying between 0 and 1, and confidence intervals were estimated by
bootstrap sampling. The clinical model, including facial teleangiectasia,
vascular spiders, white nails, abdominal veins, fatness, and peripheral edema,
could be used with high diagnostic accuracy and it was clearly superior to the
biochemical model. Adding biochemical findings to the clinical model improved
the accuracy of the clinical model only slightly. We conclude that cirrhosis
may be diagnosed in alcohol-abusing men with a high accuracy using selected,
properly weighted clinical observations only." [Accessed December 4,
2009]. Available at:
http://www.ncbi.nlm.nih.gov/pubmed/8892498.
- I Hozo, B Djulbegovic. Calculating confidence intervals for threshold
and post-test probabilities. MD Comput. 1998;15(2):110-115. Abstract:
"We describe a method and a computer program, written in JavaScript, for
calculating confidence intervals. The method uses Taylor's series to
approximate the standard errors of a post-test probability and threshold
probabilities and, from them, to obtain the associated confidence intervals.
This method is valid if the variables of interest are stochastically
independent." [Accessed December 4, 2009]. Available at:
http://www.ncbi.nlm.nih.gov/pubmed/9540324.
- Nathaniel D. Mercaldo, Kit F. Lau, Xiao H. Zhou. Confidence intervals
for predictive values with an emphasis to case-control studies. Statistics
in Medicine. 2007;26(10):2170-2183. Abstract: "The accuracy of a
binary-scale diagnostic test can be represented by sensitivity (Se),
specificity (Sp) and positive and negative predictive values (PPV and NPV).
Although Se and Sp measure the intrinsic accuracy of a diagnostic test that
does not depend on the prevalence rate, they do not provide information on the
diagnostic accuracy of a particular patient. To obtain this information we
need to use PPV and NPV. Since PPV and NPV are functions of both the accuracy
of the test and the prevalence of the disease, constructing their confidence
intervals for a particular patient is not straightforward. In this paper, a
novel method for the estimation of PPV and NPV, as well as their confidence
intervals, is developed. For both predictive values, standard, adjusted and
their logit transformed-based confidence intervals are compared using coverage
probabilities and interval lengths in a simulation study. These methods are
then applied to two case-control studies: a diagnostic test assessing the
ability of the e4 allele of the apolipoprotein E gene (ApoE.e4) on
distinguishing patients with late-onset Alzheimer's disease (AD) and a
prognostic test assessing the predictive ability of a 70-gene signature on
breast cancer metastasis. Copyright © 2006 John Wiley & Sons, Ltd."
[Accessed December 10, 2009]. Available at:
http://dx.doi.org/10.1002/sim.2677.
- Judith L Bowen. Educational strategies to promote clinical diagnostic
reasoning. N. Engl. J. Med. 2006;355(21):2217-2225. Excerpt: "Clinical
teachers differ from clinicians in a fundamental way. They must simultaneously
foster high-quality patient care and assess the clinical skills and reasoning
of learners in order to promote their progress toward independence in the
clinical setting. Clinical teachers must diagnose both the patient's clinical
problem and the learner's ability and skill. To assess a learner's diagnostic
reasoning strategies effectively, the teacher needs to consider how doctors
learn to reason in the clinical environment." [Accessed December 4, 2009].
Available at:
http://www.ncbi.nlm.nih.gov/pubmed/17124019.
- David J. Hand. Evaluating diagnostic tests: The area under the ROC
curve and the balance of errors. Statistics in Medicine.
2010;29(14):1502-1510. Abstract: "Because accurate diagnosis lies at the
heart of medicine, it is important to be able to evaluate the effectiveness of
diagnostic tests. A variety of accuracy measures are used. One particularly
widely used measure is the AUC, the area under the receiver operating
characteristic (ROC) curve. This measure has a well-understood weakness when
comparing ROC curves which cross. However, it also has the more fundamental
weakness of failing to balance different kinds of misdiagnoses effectively.
This is not merely an aspect of the inevitable arbitrariness in choosing a
performance measure, but is a core property of the way the AUC is defined.
This property is explored, and an alternative, the H measure, is described.
Copyright © 2010 John Wiley & Sons, Ltd." [Accessed June 16, 2010].
Available at:
http://dx.doi.org/10.1002/sim.3859.
- Tracey Sach, David Whynes. Men and women: beliefs about cancer and
about screening. BMC Public Health. 2009;9(1):431. Abstract:
"BACKGROUND: Cancer screening programmes in England are publicly-funded.
Professionals' beliefs in the public health benefits of screening can conflict
with individuals' entitlements to exercise informed judgement over whether or
not to participate. The recognition of the importance of individual autonomy
in decision making requires greater understanding of the knowledge, attitudes
and beliefs upon which people's screening choices are founded. Until recently,
the technology available required that cancer screening be confined to women.
This study aimed to discover whether male and female perceptions of cancer and
of screening differ. METHODS: Data on the public's cancer beliefs were
collected by means of a postal survey (anonymous questionnaire). Two general
practices based in Nottingham and in Mansfield, in east-central England, sent
questionnaires to registered patients aged 30 to 70 years. 1,808 completed
questionnaires were returned for analysis, 56.5 per cent from women. RESULTS:
Women were less likely to underestimate overall cancer incidence, although
each sex was more likely to cite a sex-specific cancer as being amongst the
most common cancer site. In terms of risk factors, men were most uncertain
about the role of stress and sexually-transmitted diseases, whereas women were
more likely to rate excessive alcohol and family history as major risk
factors. The majority of respondents believed the public health care system
should provide cancer screening, but significantly more women than men
reported having benefiting from the nationally-provided screening services.
Those who were older, in better health or had longer periods of formal
education were less worried about cancer than those who had illness
experiences, lower incomes, or who were smokers. Actual or potential
participation in bowel screening was higher amongst those who believed bowel
cancer to be common and amongst men, despite women having more substantial
worries about cancer than men. CONCLUSIONS: Our results suggest that men's and
women's differential knowledge of cancer correlates with women's closer
involvement with screening. Even so, men were neither less positive about
screening nor less likely to express a willingness to participate in relevant
screening in the future. It is important to understand gender-related
differences in knowledge and perceptions of cancer, if health promotion
resources are to be allocated efficiently." [Accessed November 30, 2009].
Available at:
http://www.biomedcentral.com/1471-2458/9/431.
- Eta S. Berner, Randolph A. Miller, Mark L. Graber. Missed and Delayed
Diagnoses in the Ambulatory Setting. Annals of Internal Medicine.
2007;146(6):470. Excerpt: "We applaud Gandhi and colleagues for
highlighting the problem of outpatient diagnostic errors. However, malpractice
claims are a biased data source. Primary identification of diagnostic errors
in ambulatory settings remains problematic." [Accessed December 4, 2009].
Available at:
http://www.annals.org/content/146/6/470.1.extract.
- E Berner, M Graber. Overconfidence as a Cause of Diagnostic Error in
Medicine. The American Journal of Medicine. 2008;121(5):S2-S23.
Abstract: "The great majority of medical diagnoses are made using automatic,
efficient cognitive processes, and these diagnoses are correct most of the
time. This analytic review concerns the exceptions: the times when these
cognitive processes fail and the final diagnosis is missed or wrong. We argue
that physicians in general underappreciate the likelihood that their diagnoses
are wrong and that this tendency to overconfidence is related to both
intrinsic and systemically reinforced factors. We present a comprehensive
review of the available literature and current thinking related to these
issues. The review covers the incidence and impact of diagnostic error, data
on physician overconfidence as a contributing cause of errors, strategies to
improve the accuracy of diagnostic decision making, and recommendations for
future research." [Accessed December 4, 2009]. Available at:
http://www.amjmed.com/article/S0002-9343(08)00040-5/fulltext.
- H. Gilbert Welch, William C. Black. Overdiagnosis in Cancer. J.
Natl. Cancer Inst. 2010:djq099. Abstract: "This article summarizes the
phenomenon of cancer overdiagnosis--the diagnosis of a "cancer" that would
otherwise not go on to cause symptoms or death. We describe the two
prerequisites for cancer overdiagnosis to occur: the existence of a silent
disease reservoir and activities leading to its detection (particularly cancer
screening). We estimated the magnitude of overdiagnosis from randomized
trials: about 25% of mammographically detected breast cancers, 50% of chest
x-ray and/or sputum-detected lung cancers, and 60% of prostate-specific
antigen-detected prostate cancers. We also review data from observational
studies and population-based cancer statistics suggesting overdiagnosis in
computed tomography-detected lung cancer, neuroblastoma, thyroid cancer,
melanoma, and kidney cancer. To address the problem, patients must be
adequately informed of the nature and the magnitude of the trade-off involved
with early cancer detection. Equally important, researchers need to work to
develop better estimates of the magnitude of overdiagnosis and develop
clinical strategies to help minimize it." [Accessed April 28, 2010].
Available at:
http://jnci.oxfordjournals.org/cgi/content/abstract/djq099v1.
- Donald A. Redelmeier. The Cognitive Psychology of Missed Diagnoses.
Ann Intern Med. 2005;142(2):115-120. Abstract: "Cognitive psychology is the
science that examines how people reason, formulate judgments, and make
decisions. This case involves a patient given a diagnosis of pharyngitis,
whose ultimate diagnosis of osteomyelitis was missed through a series of
cognitive shortcuts. These errors include the availability heuristic (in which
people judge likelihood by how easily examples spring to mind), the anchoring
heuristic (in which people stick with initial impressions), framing effects
(in which people make different decisions depending on how information is
presented), blind obedience (in which people stop thinking when confronted
with authority), and premature closure (in which several alternatives are not
pursued). Rather than trying to completely eliminate cognitive shortcuts
(which often serve clinicians well), becoming aware of common errors might
lead to sustained improvement in patient care." [Accessed July 8, 2009].
Available at:
http://www.annals.org/cgi/content/abstract/142/2/115.
- Eve A. Kerr, Brian J. Zikmund-Fisher, Mandi L. Klamerus, et al. The
Role of Clinical Uncertainty in Treatment Decisions for Diabetic Patients with
Uncontrolled Blood Pressure. Annals of Internal Medicine.
2008;148(10):717-727. Abstract: "Factors underlying failure to intensify
therapy in response to elevated blood pressure have not been systematically
studied. To examine the process of care for diabetic patients with elevated
triage blood pressure (≥140/90 mm Hg) during routine primary care visits to
assess whether a treatment change occurred and to what degree specific patient
and provider factors correlated with the likelihood of treatment change.
Prospective cohort study. 9 Veterans Affairs facilities in 3 midwestern
states. 1169 diabetic patients with scheduled visits to 92 primary care
providers from February 2005 to March 2006. Proportion of patients who had a
change in a blood pressure treatment (medication intensification or planned
follow-up within 4 weeks). Predicted probability of treatment change was
calculated from a multilevel logistic model that included variables assessing
clinical uncertainty, competing demands and prioritization, and
medication-related factors (controlling for blood pressure). Overall, 573
(49%) patients had a blood pressure treatment change at the visit. The
following factors made treatment change less likely: repeated blood pressure
by provider recorded as less than 140/90 mm Hg versus 140/90 mm Hg or greater
or no recorded repeated blood pressure (13% vs. 61%; P < 0.001); home blood
pressure reported by patients as less than 140/90 mm Hg versus 140/90 mm Hg or
greater or no recorded home blood pressure (18% vs. 52%; P < 0.001); provider
systolic blood pressure goal greater than 130 mm Hg versus 130 mm Hg or less
(33% vs. 52%; P = 0.002); discussion of conditions unrelated to hypertension and
diabetes versus no discussion (44% vs. 55%; P = 0.008); and discussion of
medication issues versus no discussion (23% vs. 52%; P < 0.001). Providers knew
that the study pertained to diabetes and hypertension, and treatment change
was assessed for 1 visit per patient. Approximately 50% of diabetic patients
presenting with a substantially elevated triage blood pressure received
treatment change at the visit. Clinical uncertainty about the true blood
pressure value was a prominent reason that providers did not intensify
therapy." [Accessed December 4, 2009]. Available at:
http://www.annals.org/content/148/10/717.abstract.
- Karin Velthove, Hubert Leufkens, Patrick Souverein, Rene Schweizer, Wouter
van Solinge. Testing bias in clinical databases: methodological
considerations. Emerging Themes in Epidemiology. 2010;7(1):2. Abstract:
"BACKGROUND: Laboratory testing in clinical practice is never a random
process. In this study we evaluated testing bias for neutrophil counts in
clinical practice by using results from requested and non-requested
hematological blood tests. METHODS: This study was conducted using data from
the Utrecht Patient Oriented Database, a unique clinical database as it
contains physician requested data, but also data that are not requested by the
physician, but measured as result of requesting other hematological
parameters. We identified adult patients, hospitalized in 2005 with at least
two blood tests during admission, where requests for general blood profiles
and specifically for neutrophil counts were contrasted in scenario analyses.
Possible effect modifiers were diagnosis and glucocorticoid use. RESULTS: A
total of 567 patients with requested neutrophil counts and 1,439 patients with
non-requested neutrophil counts were analyzed. The absolute neutrophil count
at admission differed with a mean of 7.4 × 10^9/l for requested counts and
8.3 × 10^9/l for non-requested counts (p-value <0.001). This difference could be
explained for 83.2% by the occurrence of cardiovascular disease as underlying
disease and for 4.5% by glucocorticoid use. CONCLUSION: Requests for
neutrophil counts in clinical databases are associated with underlying disease
and with cardiovascular disease in particular. The results from our study show
the importance of evaluating testing bias in epidemiological studies obtaining
data from clinical databases." [Accessed June 14, 2010]. Available at:
http://www.ete-online.com/content/7/1/2.
- Lynne Gaffikin, John McGrath, Marc Arbyn, Paul Blumenthal. Visual
inspection with acetic acid as a cervical cancer test: accuracy validated
using latent class analysis. BMC Medical Research Methodology.
2007;7(1):36. Abstract: "BACKGROUND: The purpose of this study was to
validate the accuracy of an alternative cervical cancer test - visual
inspection with acetic acid (VIA) - by addressing possible imperfections in
the gold standard through latent class analysis (LCA). The data were
originally collected at peri-urban health clinics in Zimbabwe. METHODS:
Conventional accuracy (sensitivity/specificity) estimates for VIA and two
other screening tests using colposcopy/biopsy as the reference standard were
compared to LCA estimates based on results from all four tests. For
conventional analysis, negative colposcopy was accepted as a negative outcome
when biopsy was not available as the reference standard. With LCA, local
dependencies between tests were handled through adding direct effect
parameters or additional latent classes to the model. RESULTS: Two models
yielded good fit to the data, a 2-class model with two adjustments and a
3-class model with one adjustment. The definition of latent disease associated
with the latter was more stringent, backed by three of the four tests. Under
that model, sensitivity for VIA (abnormal+) was 0.74 compared to 0.78 with
conventional analyses. Specificity was 0.639 versus 0.568, respectively. By
contrast, the LCA-derived sensitivity for colposcopy/biopsy was 0.63.
CONCLUSION: VIA sensitivity and specificity with the 3-class LCA model were
within the range of published data and relatively consistent with conventional
analyses, thus validating the original assessment of test accuracy. LCA
probably yielded more likely estimates of the true accuracy than did
conventional analysis with in-country colposcopy/biopsy as the reference
standard. Colpscopy with biopsy can be problematic as a study reference
standard and LCA offers the possibility of obtaining estimates adjusted for
referent imperfections." [Accessed December 4, 2009]. Available at:
http://www.biomedcentral.com/1471-2288/7/36.
All of the material above this paragraph is licensed under a
Creative Commons Attribution 3.0 United States License. This page was written by
Steve Simon and was last modified on
2010-06-16. The material
below this paragraph links to my
old website, StATS. Although I wrote all of the material
listed below, my ex-employer, Children's Mercy Hospital, has claimed copyright
ownership of this material. The brief excerpts shown here are included under
the fair use provisions of U.S. Copyright laws.
Definitions:
2008
- Stats: ROC curve for an imperfect gold
standard (March 12, 2008). Someone asked me about how to use an ROC curve
if you have more than two categories. Apparently the gold standard that the
researchers were using was known to be imperfect, so they wanted an
intermediate category (possible disease).
- Stats: Does prevalence affect sensitivity (January 31, 2008).
Dear Professor Mean, Does lowering the prevalence of a disease have an effect
on sensitivity?
2007
- Stats: Postlude to my Dallas talk
(November 11, 2007). I gave a talk this morning to the American College
of Allergy, Asthma & Immunology. I documented my preparations for this talk
on my webpages and wanted to share some thoughts I had during and after the
talk.
- Stats: Handout for diagnostic testing
(November 6, 2007). I have been busy preparing a handout describing the
basics of diagnostic testing (e.g., sensitivity and specificity), the medical
issues associated with these tests (e.g., the difficulty in testing for a
rare disease, the need to balance the costs of false positives and false
negatives), and applications of the likelihood ratio. I also show how to use
the likelihood ratio slide rule.
- Stats: Continuing
education questions for a talk on diagnostic tests (July 24, 2007). As
part of my talk to the American College of Allergy, Asthma & Immunology, I
have been asked to present two questions related to my topic (Use of
Diagnostic Tests for Making Clinical Decisions). These questions would
consist of a brief clinical stem followed by four choices on how to manage
the situation. These will be presented prior to my talk and then afterwards
to see how effective the training is.
- Stats: Classic calculations
for a diagnostic test (July 20, 2007). I created a table that illustrates
many of the classic calculations for a diagnostic test.
- Stats: Code for drawing new
likelihood ratio slide rule (July 12, 2007). I have made some minor
changes to my likelihood ratio slide rule. The original code was lost somewhere,
so I wrote some new code and added documentation. I also changed the
orientation of the slide rule so it can be held horizontally and shaded the
regions that need to be cut out or away.
- Stats: Recommendations from
Sackett et al for evaluating a diagnostic test (July 2, 2007). There is a
lot of controversy about diagnostic testing, and I have mentioned some of
these controversies in other weblog entries. I wanted to review what the
experts say about diagnostic testing. The definitive resource for evaluating
any medical controversy is Evidence-based Medicine: How to Practice and
Teach EBM. David L. Sackett, Scott W. Richardson, William Rosenberg,
Brian R. Haynes (1998) Edinburgh: Churchill Livingstone.
- Stats: Use of diagnostic tests for
making clinical decisions (June 15, 2007). I'm giving a talk for the
American College of Allergy, Asthma, and Immunology with the title "Use of
diagnostic tests for making clinical decisions." Here's an abstract of this
talk.
- Stats: Applying likelihood
ratios in your head (June 1, 2007). Someone sent me a nice email
complimenting my likelihood ratio slide rule. He/she also pointed out a
simple way to apply likelihood ratios in your head.
- Stats: Quantifying the ability
of dreams to predict the future (April 10, 2007). Someone wrote to me
about a diary they had kept for the past eight years about their dreams.
About every other month or so, a dream of theirs came true. I was asked if I
could quantify the likelihood of successful predictions. Assessing psychic
phenomena is outside my area of expertise, but I offered a few general
suggestions, partly because I thought that an analogy to diagnostic testing
was interesting.
- Stats: What makes a good diagnostic
test? (April 6, 2007). I've been invited to give a talk at the annual
meeting of the American College of Allergy, Asthma & Immunology. The
tentative title of the talk is "What makes a good diagnostic test?" It will
be part of a plenary session and I'll be followed by two speakers debating
the merits of two particular diagnostic tests. I don't have a lot of details
at this time, but as I develop my talk, I'll put details here on this weblog.
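The "classic calculations" and likelihood ratio entries above all start from a two-by-two table of test results against true disease status. The following Python sketch shows the standard formulas. It is not a reproduction of the table or the slide rule code on those pages, and the function name and the counts are made up for illustration.

```python
def classic_calculations(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard summary measures from a two-by-two diagnostic table.

    tp, fn: diseased patients who test positive / negative
    fp, tn: healthy patients who test positive / negative
    Assumes no row or column total is zero and that specificity is
    below 1 (otherwise the positive likelihood ratio is undefined).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),   # positive predictive value
        "npv": tn / (tn + fn),   # negative predictive value
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical study: 100 diseased and 200 healthy patients.
results = classic_calculations(tp=90, fp=30, fn=10, tn=170)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Note that the predictive values computed this way only apply to a population with the same prevalence as the study sample (here 100 out of 300), while sensitivity, specificity, and the likelihood ratios do not use the prevalence at all.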
2006
- Stats: Incorporating risk
factors into diagnostic test calculations (November 9, 2006). A
contributor to the Evidence-Based Health email discussion group (PK) raised
an interesting question about how to incorporate information about risk
factors when applying the results of a diagnostic test. When you are
estimating a pre-test probability for a diagnostic test, you need to take
three steps: (1) find an estimate of the prevalence of the disease in the
general population, (2) modify this estimate based on characteristics of your
particular practice, and (3) further modify this estimate based on
characteristics of the individual patient that is currently sitting in front
of you.
- Stats: Mathematical derivation of the
odds form of Bayes theorem (October 16, 2006). I had included some rather
technical details on my web page about likelihood ratios, but I thought it
would be best to move it to a separate page.
- Stats: Calculations involving
diagnostic tests using open source abstracts (October 5, 2006). I spent a
few hours reviewing 200+ abstracts published in BiomedCentral that had the
words "sensitivity" and "specificity" in the title. There were four which had
enough information in the abstract to be used as teaching examples on how to
calculate sensitivity, specificity, positive predictive value, and/or
negative predictive value.
- Stats: A novel diagnostic test (January
26, 2006). A recently published article on diagnosing cancer got a lot of
press. The article, Diagnostic Accuracy of Canine Scent Detection in
Early- and Late-Stage Lung and Breast Cancers. McCulloch M, Jezierski T,
Broffman M, Hubbard A, Kirk Turner, Janecki T. Integrative Cancer Therapies
2006: 5(1); 1-10., noted that canines have an unusually sensitive sense of
smell and might be able to diagnose cancer by sniffing breath samples from
human patients. This is rather intriguing, since dogs have already been
trained to locate explosives, cadavers, drugs, and so forth.
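
The four quantities named in the October 5, 2006 entry all come from the same two-by-two table of test result against true disease status. Here is a minimal sketch in Python. The counts are hypothetical and are not taken from any of the abstracts, and the function name diagnostic_summary is made up for this illustration.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2x2 counts.

    tp = diseased, test positive    fp = healthy, test positive
    fn = diseased, test negative    tn = healthy, test negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # positives among the diseased
        "specificity": tn / (tn + fp),  # negatives among the healthy
        "ppv": tp / (tp + fp),          # diseased among the test positives
        "npv": tn / (tn + fn),          # healthy among the test negatives
    }


# Hypothetical counts, for illustration only.
summary = diagnostic_summary(tp=90, fp=30, fn=10, tn=170)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity divide by the column totals (true disease status), while the predictive values divide by the row totals (test result). That is why the predictive values change with prevalence and the other two do not.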
2005
- Stats: An error slips through the peer
review process (September 19, 2005). A group of residents wanted me to
look at an article because they were confused about the calculation of the
likelihood ratio. The numbers that they got were quite different from those
in the publication. It turns out that they were calculating things correctly,
and did not realize that the paper had several serious errors in some of the
more fundamental calculations of sensitivity and specificity.
- Stats: Likelihood ratio--extra
information (August 3, 2005). In a meta-analysis of studies of diagnosing
anemia (Guyatt 1992 JGIM 7(2): 145-53), serum ferritin was discovered to be
the most effective test. Here are the results of this test.
- Stats: The costs of a false positive test
(March 1, 2005). The New York Times had an excellent article on newborn
screening tests, Panel to Advise Testing Babies for 29 Diseases.
Kolata G. The New York Times, February 21, 2005. Unfortunately, this article
is no longer available online. But it discusses a recent push to standardize
and expand the screening tests for newborns to include 29 different diseases.
- Stats: Spectrum Bias (January 4, 2005).
I tried to start a page on diagnostic tests a while back, but have not had
the time to fully develop it. One of the important issues for diagnostic
tests is spectrum bias. The sensitivity and specificity of a diagnostic test
can depend on who exactly is being tested. Think of disease as a range of
possibilities from slight to moderate to extreme. If only a portion of the
disease range is included, you may get an incorrect impression of how well a
diagnostic test works. This is known as spectrum bias.
2004
- Stats: Unnecessary diagnostic tests
(October 25, 2004). You would think that you can never have enough
information about your health. Barring financial considerations, the more
testing the better. That actually is not true. In some situations, too many
diagnostic tests are being run, and it hurts rather than helps the patient.
American Medical News has an article about this, Lab tests go under a
critical microscope: Experts point out that good tests used badly can lead to
bad medicine. Victoria Stagg Elliott. Nov. 1, 2004. www.ama-assn.org/amednews/2004/11/01/hlsd1101.htm.
They offer several good examples.
- Stats: Full-Body Computed Tomography
Screening (September 6, 2004). Full body scans represent a good example
of the conflicting considerations when you need to evaluate a screening test.
A full body scan uses a CT (Computerized Tomography) scan to examine the
inside of your body. These full body scans are heavily advertised as a way to
detect physiologic abnormalities that might provide an early warning of
cancer, heart disease, or other illnesses. Many organizations, including the
U.S. Food and Drug Administration, strongly discourage the use of full body
scans in healthy adults with no obvious symptoms of disease.
- Stats: Unbalanced sample sizes for
evaluating a diagnostic test (August 5, 2004). I get a lot of questions
about unbalanced sample sizes. Quite often the mechanics of the research
protocol make it easier to find a lot of patients in one group and only a few
in another group. For example, someone is evaluating a diagnostic test and
notes that only 16% of the patients in the study will actually have the
disease being tested for. Will this cause any bias, he wonders? Any loss in
precision?
You will lose some precision, but there is no bias of any
kind.
- Stats: Evaluating the AUC for an ROC curve (July
27, 2004). Someone asked me where I got the following guidance for Area
Under the Curve (AUC) for a Receiver Operating Characteristic (ROC) curve:
0.50 to 0.75 = fair, 0.75 to 0.92 = good, 0.92 to 0.97 = very good, 0.97 to
1.00 = excellent. I cannot find where I got these numbers. It must be a sign
of senility on my part.
- Stats: Pap smears for women without a cervix
(June 24, 2004). The most recent issue of JAMA has an article by
Sirovich and Welch, Cervical Cancer Screening Among Women Without a Cervix,
that estimates almost 10 million women in the United States have received a
pap smear unnecessarily because they have had a full hysterectomy and no
longer have a cervix. For women who have had only a partial hysterectomy or
where the hysterectomy was done for cervical neoplasia, regular pap smears
are recommended. For the other women, though, this is an unnecessary test,
because the pap smear is trying to detect cancer in an organ that the woman
no longer has.
- Stats: Prostate Specific Antigen testing (May 31,
2004). A recent report in the New England Journal of Medicine highlights
the continuing controversy over Prostate-Specific Antigen (PSA) testing. This
controversy is interesting to me because it highlights the uncertain nature
of medical research. Keep in mind that I am not a doctor (read
my disclaimer) and if you are confronting this issue with regard to your
own health, please discuss this with your doctor. PSA is a test commonly used
to detect prostate cancer, and any value larger than 4.0 ng per milliliter is
considered by some as cause for additional testing. The article examines
prevalence of prostate cancer among men in the control arm of a large
randomized prevention trial. Of the 9,459 men in the trial, 2,950 had
measured PSA that never exceeded 4.0, and yet 15% of these men had
prostate cancer confirmed by biopsy.
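
The point in the August 5, 2004 entry about unbalanced sample sizes (some loss of precision, but no bias) can be illustrated with a short calculation. This is a rough sketch, assuming the usual binomial standard error for a proportion. The total sample size and the sensitivity value are hypothetical; only the 16% figure comes from the entry.

```python
import math


def standard_error(proportion, n):
    """Binomial standard error of an estimated proportion based on n patients."""
    return math.sqrt(proportion * (1 - proportion) / n)


total_patients = 200        # hypothetical study size
assumed_sensitivity = 0.80  # hypothetical true sensitivity

# Sensitivity is estimated from the diseased patients only, so its
# precision depends on how many of them the study contains.
for fraction_diseased in (0.16, 0.50):
    n_diseased = round(total_patients * fraction_diseased)
    se = standard_error(assumed_sensitivity, n_diseased)
    print(f"{fraction_diseased:.0%} diseased: n = {n_diseased}, SE = {se:.3f}")
```

Under these assumptions the standard error of sensitivity is wider in the unbalanced study simply because fewer diseased patients contribute to it. The estimate is still centered on the same value.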
2002
- Stats: Likelihood ratio slide rule (October 24,
2002). The use of likelihood ratios requires some tedious
calculations. I have developed a simple slide rule that will do likelihood
ratio calculations for you.
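
The arithmetic that a likelihood ratio slide rule replaces is the odds form of Bayes theorem: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back. Here is a minimal sketch in Python. The function names and the numbers in the example are hypothetical, and this is not a description of how the slide rule itself is built.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply the odds form of Bayes theorem to a pre-test probability."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical test and patient, for illustration only.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.80)
print(f"After a positive test: {post_test_probability(0.20, lr_pos):.3f}")
print(f"After a negative test: {post_test_probability(0.20, lr_neg):.3f}")
```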
1999
- Stats: Sample size for a diagnostic study
(September 3, 1999) Dear Professor Mean, How big should a study of a
diagnostic test be? I want to estimate a sample size for the sensitivity and
specificity of a test. I guess confidence intervals would address this, but is
there a calculation analogous to a power analysis that would apply to figure
out the size of the groups beforehand? -- Jovial John
- Stats: ROC curve (August 18, 1999) Dear
Professor Mean: I was at a meeting in Belgium and the buzz statistic was ROC
Analysis. I think it stands for Receiver Operating Characteristic curve. It
seems to be used for predictive values. I seemed to be a lone ranger in not
understanding as they were showing in several presentations "by this curve
you can see this is good or bad" and they didn't look very different. Do you
have a simple explanation about ROC curves?
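
One way to see what an ROC curve shows is to build one by hand: try every possible cut-off, and at each one record the false positive rate (1 - specificity) and the true positive rate (sensitivity). Here is a minimal sketch in plain Python. The scores and disease labels are made up, the function names are hypothetical, and the area is computed with the trapezoid rule.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per cut-off.

    A patient is called test positive when score >= cut-off.
    labels are 1 for diseased and 0 for healthy.
    """
    n_diseased = sum(labels)
    n_healthy = len(labels) - n_diseased
    points = [(0.0, 0.0)]  # a cut-off so high that nobody tests positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_healthy, tp / n_diseased))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up test scores and true disease status, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
for fpr, tpr in curve:
    print(f"1 - specificity = {fpr:.2f}, sensitivity = {tpr:.2f}")
print(f"AUC = {area_under_curve(curve):.4f}")
```

A curve that hugs the upper left corner reaches high sensitivity while the false positive rate is still low. A curve close to the diagonal means the test does little better than a coin flip.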
This work is licensed under a Creative Commons Attribution 3.0 United States
License. This page was written by Steve Simon and was last modified on
2010-06-16.