[The Monthly Mean] November/December 2012 -- How do you pronounce that again? Released 2013-01-19.
New Year's Resolution for 2013. In the past couple of years, The Monthly Mean newsletter has had trouble living up to its name. I could only seem to get out a newsletter every other month. There was one stretch where the newsletter actually covered three months! I decided that in 2013, I will send out twelve newsletters and they will come out reasonably close to the month that they claim to be in. So I'm hoping, for example, to get the January newsletter out the door no later than February 5. The only way this will happen is if I shorten the newsletters. So, starting with the next newsletter, you will see only half of the articles. You might see a joke in one month, and a pithy quote instead in the following month. You might see a recommended book one month and a recommended website in a different month.
Welcome to the Monthly Mean newsletter for November/December 2012. The Monthly Mean is a newsletter with articles about Statistics with occasional forays into research ethics and evidence based medicine. If you are having trouble reading this newsletter in your email system, please go to www.pmean.com/news/201211.html. If you are not yet subscribed to this newsletter, you can sign on at www.pmean.com/news. If you no longer wish to receive this newsletter, there is a link to unsubscribe at the bottom of this email. Here's a list of topics.
--> How do you pronounce that again?
--> Sample size calculations in studies with a baseline
--> How large does Cronbach's alpha need to be?
--> Monthly Mean Article (peer reviewed): The use of continuous data versus binary data in MTC models: A case study in rheumatoid arthritis
--> Monthly Mean Article (popular press): Sure, Big Data Is Great. But So Is Intuition
--> Monthly Mean Book: Analysis of Pretest-Posttest Designs
--> Monthly Mean Definition: What is p for trend?
--> Monthly Mean Quote: It's hard to tell the extent...
--> Monthly Mean Trivia Question: This past Christmas, one of my gifts...
--> Monthly Mean Video: Ben Goldacre's Bad Evidence
--> Monthly Mean Website: Kaggle
--> Nick News: Nick builds in the snow
--> Very bad joke: The Mayan Doomsday's effect...
--> Tell me what you think.
--> Join me on Facebook, LinkedIn and Twitter
--> Permission to re-use any of the material in this newsletter
--> How do you pronounce that again? There are certain names in Statistics that are commonly mispronounced. Here are three of them.
Akaike. The Japanese statistician who invented the Akaike Information Criterion. Pronounced ah-kah-ee-kay.
Huynh. Coinventor of a degrees of freedom adjustment in repeated measures analysis of variance. Pronounced win.
Likert. Inventor of the five point scale used in many questionnaires. Pronounced lick-ert (not like-ert!).
Are there any others that you can think of?
Did you like this article? Visit http://www.pmean.com/category/TeachingResources.html for related links and pages.
--> Sample size calculations in studies with a baseline
Many research studies evaluate all patients at baseline and then randomly assign the patients to groups, pass out the real stuff to one group and sugar pills to the other group, and then re-evaluate everybody at the end of the study. The sample size calculations for these types of studies are a bit tricky.
One of the reasons that you measure the patients at baseline is that you are interested in the change or improvement that a specific intervention might produce. The change in a measure is almost always going to be less variable than the measurement itself. Think about it. Sometimes someone very sick at baseline gets miraculously better at the end of the study. Sometimes someone just barely ill is knocking on death's door at the end of the study. But these are rare occurrences. Most of the time, the sickest people at baseline are among the sickest at the end of the study and the least ill people at baseline are among the least ill at the end.
So changes are typically (but not always) less variable than the range of values at baseline or at the end of the study. This gives you more precision and might allow you to use a smaller sample size.
You also might measure at baseline to use that value as a covariate. The covariate will also improve precision and might allow you to use a smaller sample size.
As a simple illustration of the increased precision, consider a study of acupuncture that appeared in the March 27, 2004 issue of the British Medical Journal.
Acupuncture for chronic headache in primary care: large, pragmatic, randomised trial. Vickers AJ, Rees RW, Zollman CE, McCarney R, Smith CM, Ellis N, Fisher P, Van Haselen R. BMJ 2004; 328(7442): 744. Available at http://bmj.bmjjournals.com/cgi/content/full/328/7442/744
In this study, patients were randomly assigned to either normal standard of care or normal care plus additional visits to an acupuncturist. After three months, the control group had an average headache score of 23.7 (SD 16.8) and the acupuncture group had a score of 18.0 (SD 14.8). A simple confidence interval for the difference in means is 2.0 to 9.4, which establishes that the acupuncture group did significantly better.
The authors considered a more complex model where the three month measurements were adjusted using the baseline measures as a covariate. The resulting confidence interval, 1.6 to 6.3, is shifted a bit towards zero because the control group had slightly higher headache scores at baseline. But notice also that the interval is much narrower. Using the baseline decreased the width of the confidence interval by about a third.
If you are planning a study with baseline measures, you should try to account for the greater precision you get by having a baseline. Otherwise, you would end up with an unnecessarily large sample size. This is not a trivial consideration. A reduction of one third in the width of the confidence interval, such as the one seen above, would cut your required sample by more than half.
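The arithmetic behind that claim can be checked with a short sketch. It assumes only that, for a fixed design, the width of a confidence interval is proportional to the standard error, so the sample size needed for a given width scales with the square of the width ratio. The variable names are mine; the confidence limits are the ones quoted above.

```python
# Confidence limits quoted above from the acupuncture trial.
unadjusted = (2.0, 9.4)   # simple difference in means
adjusted = (1.6, 6.3)     # baseline used as a covariate

def width(interval):
    low, high = interval
    return high - low

# With the sample size held fixed, CI width is proportional to the standard error.
width_ratio = width(adjusted) / width(unadjusted)

# The sample size needed to reach a given width scales with the variance,
# which is the square of the standard error.
sample_ratio_observed = width_ratio ** 2
sample_ratio_one_third = (2 / 3) ** 2

print(f"width ratio: {width_ratio:.2f}")
print(f"relative sample size (observed widths): {sample_ratio_observed:.2f}")
print(f"relative sample size (exactly one third narrower): {sample_ratio_one_third:.2f}")
```

Both ratios come out below one half, which is where "more than half" comes from.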
To factor in the greater efficiency of a study with baseline measures, you need to specify
-> the standard deviation of the change score,
-> the within subject variation, or
-> the intraclass correlation.
Typically, these numbers are hard to find. In a pinch, I have performed these calculations using a range of intraclass correlations between 0.7 and 0.9. The problem with this is that the sample size is highly sensitive to small changes in the intraclass correlation.
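Here is a rough sketch of that sensitivity. It makes several assumptions that are not in the article: the intraclass correlation is treated as the correlation rho between baseline and follow-up, the standard deviation is the same at both time points, the usual normal approximation for comparing two means applies, and the planning values (SD of 16, difference of 5, 80% power) are purely hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two means (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * (z * sd / delta) ** 2)

def change_score_factor(rho):
    # Var(post - pre) = 2 * sd^2 * (1 - rho), relative to Var(post) = sd^2
    return 2 * (1 - rho)

def ancova_factor(rho):
    # Residual variance of post given pre = sd^2 * (1 - rho^2)
    return 1 - rho ** 2

SD, DELTA = 16.0, 5.0  # hypothetical planning values

print("ignoring the baseline:", n_per_group(SD, DELTA))
for rho in (0.7, 0.8, 0.9):
    n_change = n_per_group(SD * math.sqrt(change_score_factor(rho)), DELTA)
    n_ancova = n_per_group(SD * math.sqrt(ancova_factor(rho)), DELTA)
    print(f"rho = {rho}: change score n = {n_change}, covariate n = {n_ancova}")
```

Under these assumptions the change score variance at rho = 0.9 is one third of what it is at rho = 0.7, so a modest error in the assumed correlation moves the sample size a great deal. The covariate factor is never larger than the change score factor, since 1 - rho^2 = (1 - rho)(1 + rho) and 1 + rho is at most 2.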
By the way, I like to use change scores in my analyses because they are easy to explain and to interpret. Change scores, however, are generally considered to be inferior to using the baseline as a covariate.
Did you like this article? Visit http://www.pmean.com/category/SampleSizeJustification.html for related links and pages.
--> How large does Cronbach's alpha need to be? A while back I got a question that is fairly difficult to answer. What would you consider a Cronbach alpha of .60 to be in terms of "label" (i.e., fair, poor, etc.)? In general, I don't think much of Cronbach's Alpha. Everyone runs Cronbach's Alpha on their data because it is an easy thing to do and it shows that they are sincere in trying to assess the validity and reliability of their instrument. It doesn't matter that in most cases, Cronbach's Alpha does not directly address the major concerns about your data. You have to show that you are doing SOMETHING.
Cronbach's Alpha is a measure of how well each individual item in a scale correlates with the sum of the remaining items. It measures consistency among individual items in a scale. Streiner and Norman offer this advice on Cronbach's Alpha.
It is nearly impossible these days to see a scale development paper that has not used alpha, and the implication is usually made that the higher the coefficient, the better. However, there are problems in uncritically accepting high values of alpha (or KR-20), and especially in interpreting them as reflecting simply internal consistency. The first problem is that alpha is dependent not only on the magnitude of the correlations among items, but also on the number of items in the scale. A scale can be made to look more 'homogenous' simply by doubling the number of items, even though the average correlation remains the same. This leads directly to the second problem. If we have two scales which each measure a distinct construct, and combine them to form one long scale, alpha would probably be high, although the merged scale is obviously tapping two different attributes. Third, if alpha is too high, then it may suggest a high level of item redundancy; that is, a number of items asking the same question in slightly different ways. -- pages 64-65, Health Measurement Scales A Practical Guide to Their Development and Use. Streiner DL, Norman GR (1989) New York: Oxford University Press, Inc.
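Streiner and Norman's first point, that alpha rises with the number of items even when the average correlation stays put, can be seen with the standardized form of alpha. This is a minimal sketch, and the average inter-item correlation of 0.3 is a made-up value chosen only for illustration.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha for k items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

MEAN_R = 0.3  # hypothetical average inter-item correlation

for k in (5, 10, 20):
    print(f"{k} items: alpha = {standardized_alpha(k, MEAN_R):.2f}")
```

Each doubling of the scale length pushes alpha higher, although the items are no more closely related to one another than before.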
After all these thoughtful warnings, they say that alpha should be above 0.7, but not much higher than 0.9. They cite a classic text, Nunnally 1978, though the third edition of the book, published in 1994, might be better. I do not have this book, but I've seen Nunnally cited a lot. Google lists over fifty thousand matches for the search terms: Nunnally 1978 Cronbach's alpha.
But note also the commentary at
which provides some context for this cut-off.
G. David Garson offers another opinion about what corresponds to a good value for Cronbach's Alpha:
The widely-accepted social science cut-off is that alpha should be .70 or higher for a set of items to be considered a scale, but some use .75 or .80 while others are as lenient as .60. That .70 is as low as one may wish to go is reflected in the fact that when alpha is .70, the standard error of measurement will be over half (0.55) a standard deviation. -- (This used to be on the web, but has disappeared.)
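The 0.55 figure in that quote follows from the classical test theory relationship between reliability and the standard error of measurement, SEM = SD * sqrt(1 - reliability). A small sketch, treating alpha as the reliability (an assumption on my part):

```python
import math

def sem_in_sd_units(reliability):
    """Standard error of measurement as a fraction of the scale's SD."""
    return math.sqrt(1 - reliability)

for alpha in (0.60, 0.70, 0.80, 0.90):
    print(f"alpha = {alpha:.2f}: SEM = {sem_in_sd_units(alpha):.2f} SD")
```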
The Wikipedia says that
As a rule of thumb, a proposed psychometric instrument should only be used if an α value of 0.8 or higher is obtained on a substantial sample. However the standard of reliability required varies between fields of psychology: cognitive tests (tests of intelligence or achievement) tend to be more reliable than tests of attitudes or personality. There is also variation within fields: it is easier to construct a reliable test of a specific attitude than of a general one, for example. -- en.wikipedia.org/wiki/Cronbach%27s_alpha
A smattering of other web pages seems to claim that a value as low as 0.6 might be okay for an exploratory study.
By the way, if you ever get a ridiculously small value for Cronbach's alpha, possibly even a negative value, check first for a coding error. Perhaps some of your items needed to be reverse scored.
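To see how a single item left in its original direction can wreck alpha, here is a small sketch using the usual variance formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The six respondents and four items are invented for illustration, and the fourth item is assumed to be negatively worded on a 1 to 5 scale.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows are respondents, columns are items."""
    k = len(rows[0])
    items = list(zip(*rows))                      # one tuple per item
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])  # variance of the sum score
    return k / (k - 1) * (1 - item_var / total_var)

def reverse_item(rows, index, low=1, high=5):
    """Reverse-score one item on a low..high scale."""
    return [
        [low + high - v if j == index else v for j, v in enumerate(r)]
        for r in rows
    ]

# Hypothetical 1-5 responses; the fourth item is negatively worded.
responses = [
    [5, 4, 5, 1],
    [4, 4, 4, 2],
    [3, 3, 4, 3],
    [2, 3, 2, 4],
    [1, 2, 1, 5],
    [4, 5, 4, 1],
]

alpha_as_entered = cronbach_alpha(responses)
alpha_reversed = cronbach_alpha(reverse_item(responses, 3))

print(f"alpha with item 4 as entered: {alpha_as_entered:.2f}")
print(f"alpha with item 4 reversed:   {alpha_reversed:.2f}")
```

With these made-up numbers, alpha is negative as entered and climbs above 0.9 once the fourth item is flipped.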
Did you like this article? Visit http://www.pmean.com/category/MeasuringAgreement.html for related links and pages.
--> Monthly Mean Article (peer reviewed): Susanne Schmitz, Roisin Adams, Cathal Walsh. The use of continuous data versus binary data in MTC models: A case study in rheumatoid arthritis. BMC Medical Research Methodology. 2012;12(1):167. Estimates of relative efficacy between alternative treatments are crucial for decision making in health care. When sufficient head to head evidence is not available Bayesian mixed treatment comparison models provide a powerful methodology to obtain such estimates. While models can be fit to a broad range of efficacy measures, this paper illustrates the advantages of using continuous outcome measures compared to binary outcome measures. [Accessed on November 20, 2012]. http://www.biomedcentral.com/1471-2288/12/167/abstract.
Did you like this article? Visit http://www.pmean.com/category/ModelingIssues.html for related links and pages.
--> Monthly Mean Article (popular press): Steve Lohr, Sure, Big Data Is Great. But So Is Intuition. The New York Times, December 29, 2012. Excerpt: "At the M.I.T. conference, a panel was asked to cite examples of big failures in Big Data. No one could really think of any. Soon after, though, Roberto Rigobon could barely contain himself as he took to the stage. Mr. Rigobon, a professor at M.I.T.'s Sloan School of Management, said that the financial crisis certainly humbled the data hounds. 'Hedge funds failed all over the world,' he said."
Did you like this article? Visit http://www.pmean.com/category/ModelingIssues.html for related links and pages.
--> Monthly Mean Book: Peter Bonate (2000), Analysis of Pretest-Posttest Designs. Chapman & Hall/CRC, ISBN: 1584881739. Description: There are a lot of very strident articles claiming that there is only one good way to analyze data from a pretest-posttest design. This book is a refreshing change because it outlines ALL the available approaches and discusses the underlying assumptions and the reasons why you might consider each analysis.
Did you like this book? Visit http://www.pmean.com/category/HypothesisTesting.html for related links and pages.
--> Monthly Mean Definition: What is p for trend? I got an inquiry from a colleague about the words "p for trend" that appear in many research papers. Her impression was that "p for trend" was what you used when no other test statistic produced a statistically significant result. Well, mostly no, but there is some truth in what she said.
A "p for trend" test could be anything, but it always involves an ordinal independent variable. You could think of it as a way of testing for a dose-response relationship. I generally like the "p for trend" tests, as they make more efficient use of the data than other tests, such as ANOVA and Chi-square. They have more power because they take advantage of the ordinal information in the independent variable. So the skeptical comment above is actually a testament to a test that has enough power to declare statistical significance where other less powerful tests fail.
There are two common ways to run a test that looks for trends. One is to convert the multiple degree of freedom categorical variable in your statistical model (e.g., linear regression or logistic regression) to a single degree of freedom continuous variable. In different programs, this is done differently. In SAS, you take the variable out of the CLASS statement, but keep it in the model. In SPSS, some procedures have a CATEGORICAL button that you can click on to modify the nature of the variable. In R, you remove the call to the factor function so that the variable is treated as numeric.
Another is to use a specialized non-parametric test. In a two way classification table, you would use the Cochran-Armitage test instead of the Chi-squared test. In other settings the nonparametric correlations, such as Kendall's tau and Spearman's rho might be used. There are lots of equivalencies between some of these tests and relationships to the Pearson correlation as well.
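As an illustration of where the extra power comes from, here is a sketch that computes both the ordinary Chi-square test and the Cochran-Armitage trend test on the same 2 by 3 table. The counts are hypothetical, the dose scores 0, 1, 2 are my own choice, and the p-values use closed forms that hold only for 1 and 2 degrees of freedom so that no statistics library is needed.

```python
import math

def pearson_chi2(events, totals):
    """Pearson chi-square for a 2 x k table of events and non-events."""
    p_bar = sum(events) / sum(totals)
    stat = 0.0
    for x, n in zip(events, totals):
        stat += (x - n * p_bar) ** 2 / (n * p_bar)
        stat += ((n - x) - n * (1 - p_bar)) ** 2 / (n * (1 - p_bar))
    return stat

def cochran_armitage(events, totals, scores):
    """Cochran-Armitage trend statistic, referred to a 1 df chi-square."""
    big_n = sum(totals)
    p_bar = sum(events) / big_n
    t = sum(s * (x - n * p_bar) for s, x, n in zip(scores, events, totals))
    s1 = sum(n * s for s, n in zip(scores, totals))
    s2 = sum(n * s * s for s, n in zip(scores, totals))
    var_t = p_bar * (1 - p_bar) * (s2 - s1 ** 2 / big_n)
    return t ** 2 / var_t

def p_value_1df(stat):
    # upper tail of a 1 df chi-square, via the normal distribution
    return math.erfc(math.sqrt(stat / 2))

def p_value_2df(stat):
    # upper tail of a 2 df chi-square has a closed form
    return math.exp(-stat / 2)

# Hypothetical counts: 50 patients at each of three increasing doses.
events = [10, 15, 20]
totals = [50, 50, 50]
scores = [0, 1, 2]

p_overall = p_value_2df(pearson_chi2(events, totals))
p_trend = p_value_1df(cochran_armitage(events, totals, scores))

print(f"Chi-square test (2 df): p = {p_overall:.3f}")
print(f"p for trend (1 df):     p = {p_trend:.3f}")
```

In this made-up table the event rate rises in equal steps, so the two test statistics are essentially the same number. The trend test spends only one degree of freedom on it, which is why its p-value is smaller.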
Now, I'm all in favor of using tests for trends, but they need to be specified prior to data collection. They are a lot like one-tailed tests of hypotheses. You can't run the tests after noting a strong dose-response relationship without ruining their statistical properties. So if someone only converts to a test for trend when the originally planned test fails to reject, they are cheating. It is not unlike someone who switches from a two-sided hypothesis to a one-sided hypothesis after looking at the data.
Did you like this article? Visit http://www.pmean.com/category/HypothesisTesting.html for related links and pages.
--> Monthly Mean Trivia Question: This past Christmas, one of my gifts was an interesting clock. It was labelled in radians. Where you'd normally expect to see a 12, there is a value of 2pi. In place of the 1 is pi/6. What time would it be if the big hand was on pi/2 and the little hand was on 4pi/3? First person to answer correctly by email gets mentioned in the next issue of The Monthly Mean.
--> Monthly Mean Quote: It's hard to tell the extent of a flu outbreak because most of the victims just snivel away unhappily in the privacy of their own homes. The Google site solves this problem by tracking the number of times people search for flu-related terms online. Does this make sense to you, people? If we could determine what was going on in the world by the most popular searches, wouldn't the universe be run by mischievous kittens and Kate Middleton? --Gail Collins, as quoted at The New York Times, January 11, 2013. http://www.nytimes.com/2013/01/12/opinion/collins-the-flu-who-knew.html
--> Monthly Mean Video: BBC Radio 4. Ben Goldacre's Bad Evidence. Actually, this is an audio file rather than a video, because radio is radio. Excerpt: "In this programme, the medic and author of Bad Science, Ben Goldacre, sets out to explore the potential for putting RCTs at the heart of the policy-making process, arguing that not only can they reveal if our existing policies are effective but RCTs have the potential to transform the way we create and implement social policy across the country, from education to health, from welfare to crime." [Accessed on January 3, 2013]. http://www.bbc.co.uk/programmes/b01phhb9.
Did you like this video? Visit http://www.pmean.com/category/CriticalAppraisal.html for related links and pages.
--> Monthly Mean Website: Kaggle. Description: "Kaggle is an arena where you can match your data science skills against a global cadre of experts in statistics, mathematics, and machine learning. Whether you're a world-class algorithm wizard competing for prize money or a novice looking to learn from the best, here's your chance to jump in and geek out, for fame, fortune, or fun." http://www.kaggle.com/
Did you like this website? Visit http://www.pmean.com/category/ModelingIssues.html for related links and pages.
--> Nick News: Nick builds in the snow. We had a moderate size snowfall in late December, and it was a rare treat. I don't think we had anything more than a dusting of snow the previous winter. Nick was out right away building several creations out of snow.
Nick built this huge snowman while we were at Grandma's shoveling her driveway.
Here's a second smaller snowman.
Meanwhile, back at our place, a huge herd of dinosaurs tromped through the snow in our front yard. These dinosaurs have two toes in front and one in back.
If you look closely at the tracks you will see vicious claws at the front of each toe.
Finally, Nick built an igloo of sorts. Here it is a few days out and starting to melt. He built up the sides with big snow balls then filled the interior with packed snow. Then he dug out a cave in the packed snow.
Here is Nick working his way into the igloo.
When he gets all the way in, you can barely see him. You'll notice some flags on top of the fort. We had some wires and electrical lines marked in our front yard a while back and these flags were left over from that work. So this igloo was actually the Fort for the Army of the Buried Cable.
--> Very bad joke: "The Mayan Doomsday's effect on survival outcomes in clinical trials" Title of a research article published by Paul Wheatley-Price, Brian Hutton, and Mark Clemons in CMAJ December 11, 2012 vol. 184 no. 18 doi: 10.1503/cmaj.121616. The title is humorous enough by itself, but read the full article at http://www.cmaj.ca/content/184/18/2021.full
Here's a bonus Dilbert comic strip. I can't find the copyright policy for this Scott Adams comic strip, so I am including the link only.
--> Tell me what you think. How did you like this newsletter? Give me some feedback by responding to this email. Unlike most newsletters where your reply goes to the bottomless bit bucket, a reply to this newsletter goes back to my main email account. Comment on anything you like, but I am especially interested in answers to the following three questions.
--> What was the most important thing that you learned in this newsletter?
--> What was the one thing that you found confusing or difficult to follow?
--> What other topics would you like to see covered in a future newsletter?
If you send a comment, I'll mention your name and summarize what you said in the next newsletter. It's a small thank you and acknowledgement to those who take the time to help me improve my newsletter. If you send feedback and you want to remain anonymous, please let me know.
I received feedback from six people. The article on data is/data are drew a lot of commentary. Bart Holland mentioned that agenda and propaganda, both plural nouns by their Latin origins, are now treated as singular. He also commented on the transition in the 1920s of certain nouns to verbs, such as "impact" and "contact" and notes that today almost any noun can be "verbed."
David McArthur provided a spirited defense of data as plural, and I can't paraphrase his comments and still do them justice. He said "Your "data is" article prompts me to retort with "darn tootin them's plural" for the following specific reasons: 1) While a dataset or dataframe or matrix or list is a container of information (and it is indeed correct to think of a given dataset in the singular), to consider the contents inside that container in any statistical sense demands that we think seriously about the shape of the data inside, and "shape" comes explicitly from considering plural pieces of information. Some students seem to conceive "data" in a stubborn, singular, monolithic sense that indeed makes it harder for them to drill into the utility of 5-point or 12-point or 16-point descriptive summaries, but when faced with the notion that the dataset can be parsed in multiple ways exactly because data are multiple in form, things start to brighten up. 2) Any robust approach to statistical analysis is premised on the reality that data elements even within the same single variable have varying contributions to the overall story (see issues like leverage or downweighting) and that mentally singularizing those elements -- as if they all mean the same -- does a major injustice to the whole. And 3) the famous rubber ruler approach to designing instruments and making measurements reinforces the underlying notion that data must be plural to make sense (that is, you are free to use my magic rubber ruler to your heart's delight to make everyone in the room the identical height, though you'll surely end up with singular data with no real meaning). "Data" are plural." He also followed up with the well known quote about anecdotes and data "The plural of anecdote is not data."
Larry George ponders briefly whether the word "Statistics" is singular or plural.
Larry George and an anonymous reader liked the NNT calculation from rates article. Ed Gracely, though, offers an important caveat. He points out that any rate where the numerator is larger than the denominator is already pretty persuasive evidence of a serious problem with the type of extrapolation required to calculate the NNT from that rate.
Closely related to the idea of NNTs for rates is a suggestion by Larry George to look at recurrent events. I usually think of infections as a good example of recurrent events. You can only die once, but you can pick up an infection multiple times. Anyway, there are lots of complications when the recurrence is correlated. This might require a discussion of frailty models, among other things. As Larry George correctly notes, there's also a lot of literature on this problem from the perspective of Engineering.
A recent FDA proposal on unique identifiers for medical devices raises a host of interesting questions and Larry George thinks this might be worth a comment in a future article.
The anonymous reader also liked the explanation of logit/probit models, though Larry George found it a bit confusing. Hey, I found it confusing also.
Mike Roberts just wrote to say "thank you." I always appreciate emails like this. It helps make the hassle of writing the newsletter more bearable.
One last comment. Many of the comments sent here and in emails to previous newsletters offer excellent suggestions for future article topics. Keep in mind, though, that sometimes my lack of knowledge in a particular area may make it difficult for me to write well on that topic. Also, even when I do know that topic well, some topics just aren't easy to talk about in an abbreviated format like a newsletter. Even with those two caveats, please keep those suggestions coming!
--> Join me on Facebook, LinkedIn, and Twitter. I'm just getting started with social media. My Facebook page is www.facebook.com/pmean, my page on LinkedIn is www.linkedin.com/in/pmean, and my Twitter feed name is @profmean. If you'd like to be a Facebook friend, LinkedIn connection (my email is mail (at) pmean (dot) com), or tweet follower, I'd love to add you. If you have suggestions on how I could use these social media better, please let me know.
--> Permission to re-use any of the material in this newsletter. This newsletter is published under the Creative Commons Attribution 3.0 United States License, http://creativecommons.org/licenses/by/3.0/us/. You are free to re-use any of this material, as long as you acknowledge the original source. A link to or a mention of my main website, www.pmean.com, is sufficient attribution. If your re-use of my material is at a publicly accessible webpage, it would be nice to hear about that link, but this is optional.