P.Mean: Archive organized by date (created 2010-01-06)

This page lists files created in calendar year 2010. Also look at the archives for 2009 and for 2008. You can also browse through an archive of pages organized by topic. Archives for my old website (StATS) start at the Archive 2008 page. Archives from earlier years can be reached from there.

July 2010

  1. P.Mean: Standard operating procedures for a statistical consulting center (created 2010-07-30). I asked a question on one of the American Statistical Association message boards about setting up a consulting service at the University of Missouri-Kansas City (UMKC), where I work part-time. I wanted to develop some SOPs (Standard Operating Procedures) for this center that would supplement the guidance already available on the web. I asked if anyone else had SOPs (or anything similar) that I could look at so I wouldn't re-invent the wheel. I got a lot of responses.
  2. P.Mean: When should research in a given area end? (created 2010-07-26). Someone asked a rather philosophical question: is there ever an end to research in a given area? Will there ever be a "last word" on a research topic? Here's what I wrote in response.
  3. P.Mean: Sample chapter: The first three steps in selecting an appropriate sample size (created 2010-07-24). As I mentioned in an earlier webpage, I am talking to some publishers about writing a second book. The working title is "Jumpstart Statistics: How to Restart Your Stalled Research Project." Here's a tentative chapter from that book. It is not quite complete yet, but I'm hoping to finish it soon. One of your most critical choices in designing a research study is selecting an appropriate sample size. A sample size that is either too small or too large will be wasteful of resources and will raise ethical concerns.
  4. P.Mean: Tentative table of contents for my second book (created 2010-07-24). As I mentioned in an earlier webpage, I am talking to some publishers about writing a second book. The working title is "Jumpstart Statistics: How to Restart Your Stalled Research Project." Here's a tentative table of contents.
  5. P.Mean: Jumpstart Statistics, a proposal for my second book (created 2010-07-23). I want to talk to some publishers about writing a second book. Here is what I will propose to them.
  6. P.Mean: Salary survey for Biostatisticians (created 2010-07-21). I am working part-time at UMKC in the Department of Informatic Medicine and Personalized Health. They like me and want me to increase my hours from 10 hours a week (25% time) to something more. I'll talk to them about this, but at the same time, I want to point out that my salary is not competitive with my peers. Here's a table from a recent survey on salaries, published in the Amstat News.
  7. P.Mean: What is principal components analysis? (created 2010-07-19). I was asked to help someone who was reviewing a paper that used principal components analysis (PCA) as part of the statistical methodology. I have not yet seen the article, so I could only offer very general advice.
  8. P.Mean: Another counter-intuitive probability problem (created 2010-07-04). A recent article in Science News rekindled the two children problem and offered an odd twist. Here's the simple version. Suppose you have two children, one of whom is a boy. What is the probability that both children are boys? The obvious but incorrect choice is 1/2. The correct answer is 1/3. How does this work?
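
The 1/3 answer to the simple version of the two children problem in entry 8 can be checked by listing every possible family. This is only a minimal sketch, and it assumes that "one of whom is a boy" means "at least one is a boy" and that boys and girls are equally likely and independent. The function name is made up for illustration.

```python
from fractions import Fraction
from itertools import product

def prob_both_boys_given_at_least_one_boy():
    # Four equally likely families, in birth order: BB, BG, GB, GG (assumption).
    families = list(product("BG", repeat=2))
    # Condition on the information given: at least one child is a boy.
    at_least_one_boy = [f for f in families if "B" in f]
    # Among those families, count the ones where both children are boys.
    both_boys = [f for f in at_least_one_boy if f == ("B", "B")]
    return Fraction(len(both_boys), len(at_least_one_boy))

print(prob_both_boys_given_at_least_one_boy())  # 1/3
```

Three families survive the conditioning (BB, BG, GB) and only one of them has two boys. The tempting answer of 1/2 comes from treating the problem as if you knew which child was the boy.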

    June 2010 (8 entries)
     
  9. P.Mean: Resources using Stack Overflow (created 2010-06-30). A bunch of Internet resources fell into my lap all at once. Some of them relate to a new technology (Stack Overflow/Stack Exchange) that allows people to pose questions like an Internet email discussion group, but it is web-based and has some of the capabilities associated with blogs and wikis.
  10. P.Mean: The SPSS t-test is confusing (created 2010-06-29). I have always disliked how SPSS (now IBM SPSS) presented the output from their independent samples t-test. I want to explain why it is confusing and show you an alternative based on the general linear model.
  11. P.Mean: Classic references in Statistics (created 2010-06-29). A prominent statistician, Christian Robert, listed some classic research papers in Statistics that he wanted to present to his students in a special readings class. This was commented on by another prominent statistician, Andrew Gelman. I'm not a prominent statistician, but that won't stop me from adding my two cents.
  12. P.Mean: What I use for talks instead of Powerpoint (created 2010-06-28). Someone on LinkedIn asked a question about what technologies people use for their presentations (laptop, flipchart, or whiteboard). For most of my presentations, I use none of these technologies. Instead I create a webpage of my presentation and then print it and hand out copies.
  13. P.Mean: The futility of small sample sizes for evaluating a binary outcome (created 2010-06-16). I'm helping out with a project that involves a non-randomized comparison of two groups of patients. One group gets a particular anesthetic drug and the other group does not. The researcher wants to compare rates of hypotension, respiratory depression, apnea, and hypoxia. I suggested using continuous outcomes like O2 saturation levels rather than discrete events like hypoxia, but for a variety of reasons, they cannot use continuous outcomes. Their original goal was to collect data on about 20 patients in each group.
  14. P.Mean: An example of a bad survey (created 2010-06-11). I was asked to fill out an Internet survey to define my "consulting needs." That's a rather strange invitation, and sounds almost like a cheap way to develop business leads. But it was a request through LinkedIn, so I thought it was worth filling out. I want to try to build my contacts at LinkedIn, and filling out a short survey seemed like a small price to pay to get a potential lead for my own consulting business. When I went to the webpage with the actual survey, though, I was shocked and disappointed with what I found.
  15. P.Mean: An interesting alternative to power calculations (created 2010-06-09). Someone on the MedStats Internet discussion group mentioned an alternative to power calculations called accuracy in parameter estimation (AIPE). It looks interesting. Here are some relevant references.
  16. P.Mean: Minimum sample size needed for a time series prediction (created 2010-06-08). Someone asked what minimum sample size is needed in a time series model to forecast future observations. Strictly speaking, you can forecast with two observations. Draw a straight line connecting the two points and then extend that line as far as you want in the future. But you wouldn't want to do that. So a better question might be what is the minimum number of data points that you would need in order to provide a good forecast of the future.

    May 2010 (8 entries)
     
  17. P.Mean: What is the premier conference for statistical consulting (created 2010-05-28). Someone asked what the premier conference for statistical consulting is. That's a rather ambiguous question, because different people will interpret terms like "premier conference" and "statistical consulting" differently. The answer, however, is pretty unambiguous. In North America, it would have to be the Joint Statistical Meetings (JSM).
  18. P.Mean: Lessons learned the hard way: don't presume to know how your software handles missing value codes (created 2010-05-28). I'm working on an interesting project that involves summing up rvu's (resource value units) across certain records for a given patient. Some of the rvu's are missing. How should the program handle these missing rvu's? We discussed this by email and agreed to ignore missing rvu's in the sum. This is effectively the same as replacing the missing rvu's with zero. There are two cases worth worrying about, though, and handling those cases made me realize just how tricky missing values are.
  19. P.Mean: How I got started in my career as an independent statistical consultant (created 2010-05-24). LinkedIn has a question and answer board, and one of the questions inspired me to write up the story of how I got started in my career as an independent statistical consultant. Here's the original question: I'm very curious as to what events or conversations enabled you to change direction in your career. What thought process did you go through? What resources did you use or uncover?
  20. P.Mean: How do I handle criticism (created 2010-05-21). Someone asked how I handle criticism. To be honest, I don't get criticized all that much. Possibly it is that I do very little that deserves criticism, and possibly, people are intimidated by the area I work in (unjustifiably intimidated, by the way, but many people are just plain scared of numbers). It is also important to note that most people don't like to share negative opinions directly. They certainly will tell others, of course, if something is wrong, but it takes some boldness and some bravery to confront a person directly.
  21. P.Mean: How to avoid charges of plagiarism (created 2010-05-15). I'm not an expert on this, but I got a question about how to avoid charges of plagiarism in a thesis, especially the sections of the thesis that reviewed existing research and theoretical background. Here's how I responded.
  22. P.Mean: Withdrawing from a study and taking your data with you (created 2010-05-15). Someone asked me what the phrase "you can withdraw from the study at any time" really means. Can a research subject withdraw and take their data with them (that is, ask that their data be expunged from the database)? What if they raise the objection after the data analysis is done, because they don't like the results of the study? Can they ask for their data to be expunged then? What if they raise the objection after the data is published?
  23. P.Mean: Lessons learned the hard way: don't throw good money after bad (created 2010-05-14). I am helping out with data management for a project involving 19 million records from an insurance database. The file is too big to be read into R in one piece, so I decided to read in successive segments of 100,000 records and then write them out again as separate files. This was a big mistake and showed me the importance of the saying: "Don't throw good money after bad."
  24. P.Mean: More discussion on instrumental variables (created 2010-05-03). I attended the May meeting of the KUMC Statistics Journal Club. The topic of discussion was a paper outlining the properties and applications of instrumental variables.

    April 2010 (7 entries)
     
  25. P.Mean: My life so far: fails to meet expectations (created 2010-04-21). I'm learning how to use LinkedIn, and there are some people on that site who ask general philosophical questions. Some are a bit silly but they are still fun to answer. One person asked people to apply the traditional performance evaluation categories (Exceeds expectations, Meets expectations, Fails to meet expectations) to their own lives. So here is what I wrote.
  26. P.Mean: Interpreting p-values in a published abstract, part 1 (created 2010-04-14). In one of my recent webinars, I asked people to read the following abstract and interpret the p-values presented within. The Outcome of Extubation Failure in a Community Hospital Intensive Care Unit: A Cohort Study. Seymour CW, Martinez A, Christie JD, Fuchs BD. Critical Care 2004, 8:R322-R327 (20 July 2004) Introduction: Extubation failure has been associated with poor intensive care unit (ICU) and hospital outcomes in tertiary care medical centers. Given the large proportion of critical care delivered in the community setting, our purpose was to determine the impact of extubation failure on patient outcomes in a community hospital ICU. Methods: A retrospective cohort study was performed using data gathered in a 16-bed medical/surgical ICU in a community hospital. During 30 months, all patients with acute respiratory failure admitted to the ICU were included in the source population if they were mechanically ventilated by endotracheal tube for more than 12 hours. Extubation failure was defined as reinstitution of mechanical ventilation within 72 hours (n = 60), and the control cohort included patients who were successfully extubated at 72 hours (n = 93). Results: The primary outcome was total ICU length of stay after the initial extubation. Secondary outcomes were total hospital length of stay after the initial extubation, ICU mortality, hospital mortality, and total hospital cost. Patient groups were similar in terms of age, sex, and severity of illness, as assessed using admission Acute Physiology and Chronic Health Evaluation II score (P > 0.05). Both ICU (1.0 versus 10 days; P < 0.01) and hospital length of stay (6.0 versus 17 days; P < 0.01) after initial extubation were significantly longer in reintubated patients. ICU mortality was significantly higher in patients who failed extubation (odds ratio = 12.2, 95% confidence interval [CI] = 1.5–101; P < 0.05), but there was no significant difference in hospital mortality (odds ratio = 2.1, 95% CI = 0.8–5.4; P < 0.15). Total hospital costs (estimated from direct and indirect charges) were significantly increased by a mean of US$33,926 (95% CI = US$22,573–45,280; P < 0.01). Conclusion: Extubation failure in a community hospital is univariately associated with prolonged inpatient care and significantly increased cost. Corroborating data from tertiary care centers, these adverse outcomes highlight the importance of accurate predictors of extubation outcome. It is a bit dangerous to read only the abstract, of course, but this was intended for a general illustration.
  27. P.Mean: Quiz about p-values (created 2010-04-14). In one of my webinars, I offered the following quiz question: A research paper computes a p-value of 0.45. How would you interpret this p-value? 1. Strong evidence for the null hypothesis; 2. Strong evidence for the alternative hypothesis; 3. Little or no evidence for the null hypothesis; 4. Little or no evidence for the alternative hypothesis; 5. More than one answer above is correct; 6. I do not know the answer. This is actually a bit of a trick question.
  28. P.Mean: Using entropy and the surprisal value to measure the degree of agreement with the consensus finding (created 2010-03-02). One of the research problems that I am working on involves evaluation of a subjective rating system. I have been using information theory to try to identify objects where the evaluators agree well and objects where the evaluators do not agree well. I also am working on identifying objects that an individual rater does poorly on. The method is to measure when the surprisal of the category that a rater selected is much lower than the entropy (the average surprisal across all raters).
  29. P.Mean: What makes a good website (created 2010-04-07). Someone posed a series of questions about what makes a perfect website design. I am not a big fan of "design" and tried to make that point in my responses.
  30. P.Mean: Should I learn R instead of SAS (created 2010-04-05). I got a question from a statistician beginning her career asking whether she should learn SAS or R. That's a very personal question and there is no perfect answer. Here is what I wrote.
  31. P.Mean: Dealing with a large text file that crashes your computer (created 2010-04-02). At a meeting, a colleague was describing a text file that he had received that had crashed his system. No way, I thought, could a simple text file crash your system. I offered to investigate and he was right. The text file crashed my system too, and repeatedly. Here's what I did to figure out how a simple text file could crash your computer.

    March 2010 (6 entries)
     
  32. P.Mean: What to say when any data analysis is pointless (created 2010-03-25). Someone on the MEDSTATS email discussion group asked for help. They were trying to establish a normal range or reference interval for a set of observations involving gastric emptying. The sample size, 14, was much too small to produce reliable results, but it got worse than that. For one of the outcomes, the result was fourteen zeros. What can you do with such a data set? What can you say? That's a difficult question, and here is how I would approach such a problem.
  33. P.Mean: Calculating weights to correct for over and under sampling (created 2010-03-22). Someone asked how to use weights to adjust for the fact that certain strata in a study were recruited more vigorously than other strata. For example, suppose you sampled at four communities and noted the age distribution as 0-14 years, 15-39 years, and 40+ years. How would you adjust for differential age distributions?
  34. P.Mean: Ordinal surprisals (created 2010-03-20). Closely related to the concept of ordinal entropy are ordinal surprisals. The surprisal is the negative log base 2 of the probability, and if you multiply the probabilities by the surprisals and add them up, you get entropy. Can you define an ordinal surprisal in such a way that when you multiply the ordinal surprisals by the probabilities, you get the ordinal entropy?
  35. P.Mean: Can sex be an outcome variable (created 2010-03-16). Someone asked whether it was legitimate to use sex (gender) as a dependent variable or outcome variable in a logistic regression model. It seems wrong, on the face of it, to think that various factors can influence whether we are male or female. It actually is perfectly fine to use sex as an outcome variable. Here is how I would justify its use.
  36. P.Mean: Ordinal entropy (created 2010-03-11). I have been using the concept of entropy to evaluate a sperm morphology classification system and to identify aberrant records in large fixed format text files. Some of the data I have been using in these areas is ordinal with three levels, normal, borderline, and abnormal. In all of my work so far, I have treated all three categories symmetrically. So, for example, the entropy of a system where 50% of the probability is associated with normal and 50% is associated with borderline is 1. The entropy of a system where 50% of the probability is associated with normal and 50% is associated with abnormal is also 1. It has always bothered me a bit because it seems that the second case, where the probabilities are placed at the two extremes, should have a higher level of entropy. Here is a brief outline of how I think entropy ought to be redefined to take into account the ordinal nature of a variable.
  37. P.Mean: Finding duplicate records in a 19 million record database (created 2010-03-02). I was asked to help find duplicate records in a large database (19 million records). The number of duplicates was suspected to be small, possibly around 90. My colleague's approach was running PROC FREQ in SAS on the "unique" id and then looking for ids that have a frequency greater than 1. That did not work--it took too long or it overloaded the system, or both. So I wanted to look at alternatives for identifying duplicate records that would do this more efficiently.
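
The entropy figures quoted in entries 34 and 36 can be reproduced in a few lines. This is only a rough sketch of the standard (non-ordinal) definitions given there, with surprisal as the negative log base 2 of the probability and entropy as the probability-weighted sum of surprisals. The function names and the ordering of the three categories are assumptions made for illustration, and the sketch does not attempt the ordinal redefinition that those pages propose.

```python
from math import log2

def surprisal(p):
    # Surprisal is the negative log base 2 of the probability.
    return -log2(p)

def entropy(probs):
    # Entropy is the probability-weighted sum of the surprisals.
    # Categories with zero probability contribute nothing and are skipped.
    return sum(p * surprisal(p) for p in probs if p > 0)

# Probabilities listed in the order normal, borderline, abnormal (assumed order).
normal_borderline = [0.5, 0.5, 0.0]
normal_abnormal = [0.5, 0.0, 0.5]

print(entropy(normal_borderline), entropy(normal_abnormal))  # 1.0 1.0
```

Both splits give an entropy of 1, because the standard definition ignores how far apart the occupied categories are. That is the symmetry that entry 36 finds unsatisfying for ordinal data.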

    February 2010 (9 entries)
     
  38. P.Mean: Is intuition real? (created 2010-02-25). Someone asked if intuition is real. My hunch is that intuition may be real, but it is grossly overrated.
  39. P.Mean: Abstract submitted to Missouri Regional Life Sciences Summit (created 2010-02-13). Yesterday, I submitted the following abstract for a poster session in the Missouri Regional Life Sciences Summit. I'll find out on Monday if it will be accepted. "Slipped deadlines and sample size shortfalls in clinical trials: a proposed remedy using a Bayesian model with an informative prior distribution."
  40. P.Mean: Meta-analysis for a single mean estimate (created 2010-02-11). Someone noted that the usual meta-analysis is carried out for studies of two treatment groups, usually for a difference in means. What if you had several studies estimating not a difference in means, but just a single mean? Could you conduct a meta-analysis in this situation?
  41. P.Mean: Exponential interpolation (created 2010-02-11). Someone wanted an exponential interpolation formula. It's not quite a statistics question, but it caught my interest.
  42. P.Mean: Fan page for The Monthly Mean (created 2010-02-11). I've been getting some advice about Facebook. One suggestion was to set up a "fan page". There are some differences between being a "friend" on Facebook and being a "fan".
  43. P.Mean: Humility is a good thing for researchers to have (created 2010-02-08). I've been writing a series of articles about the seven deadly sins of researchers. One of these sins is pride. I might need to talk about the alternative to pride, which is humility. I believe that researchers should adopt a humble outlook. Humility is often misunderstood as a bad thing. It is not.
  44. P.Mean: Consulting remotely versus consulting in person (created 2010-02-08). Someone was asking whether there is a trend in consulting to demand a local presence rather than allowing a consultant to work remotely. I was unable to comment on work trends, as I have only been an independent consultant for 14 months. I did point out, however, some of the issues associated with remote consulting.
  45. P.Mean: What are the characteristics of a good statistical consultant (created 2010-02-07). Someone was considering a career as a statistical consultant. Besides building up a network and gaining experience, what traits would be necessary to be successful in such a career?
  46. P.Mean: Proposed poster for the Missouri Regional Life Sciences Summit (created 2010-02-03). I am preparing a poster for the Missouri Regional Life Sciences Summit. The poster guidelines are a bit unusual in that there is only room for a four foot by four foot square poster. Normally, these posters can be much wider. The tentative title is "Slipped deadlines, sample size shortfalls, and a proposed Bayesian solution using an informative prior distribution" and here is a proposed abstract.

    January 2010 (7 entries)
     
  47. P.Mean: Facebook account (created 2010-01-25). Several people have been encouraging me to set up an account on Facebook. I did it this evening and two hours later, I had two friends.
  48. P.Mean: Abstracts for a possible upcoming talk (created 2010-01-20). I might be asked to give a talk in February and I wanted to offer two possible choices. Here are the titles and abstracts of those talks.
  49. P.Mean: SPSS or Stata? (created 2010-01-19). I am an SPSS user. Some of my friends are choosing to leave SPSS and learn Stata. What are the advantages of Stata over SPSS?
  50. P.Mean: Masters or Phd in Statistics? (created 2010-01-19). Someone asked me about careers in Statistics and if you get the best career with a Masters degree or a PhD. That's a very subjective choice and individual preferences should weigh strongly in your choice.
  51. P.Mean: Power calculations for comparison of Poisson counts across two groups (created 2010-01-11). Suppose you want to compare Poisson count variables across two groups. How much data would you need to collect? It's a tricky question and there are several approaches that you can consider.
  52. P.Mean: Where can I find free online textbooks (created 2010-01-07). Someone was away from their personal library for a while and needed a free online statistics reference book. With a free textbook, you get what you pay for, of course, but there are some exceptions.
  53. P.Mean: What is residual confounding (created 2010-01-06). Residual confounding is a frequent explanation for unusual research findings. Before I define the term and show an example, I need to address a more basic issue. The term "confounding" is used frequently but often without careful consideration of the true definition of the term. I tend to shy away from this term and typically use "covariate imbalance" instead.

Creative Commons License This work is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2010-07-30. Need more information? I have a page with general help resources. You can also browse for pages similar to this one at