P.Mean >> Category >> Unusual data (created 2007-06-20). 

These pages describe data analyses that do not fit easily into the more traditional categories. If I accumulate enough pages on the same general topic, I will create a new category. Also see Category: Modeling issues and Category: Statistical theory. Other entries about unusual data can be found on the unusual data page at the StATS website.

2010

  1. P.Mean: More discussion on instrumental variables (created 2010-05-03). I attended the May meeting of the KUMC Statistics Journal Club. The topic of discussion was a paper outlining the properties and applications of instrumental variables.
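The core idea behind instrumental variables can be shown with a minimal sketch, assuming a single instrument z, a single exposure x, and an outcome y. With one instrument, the IV estimate of the slope is cov(z, y) / cov(z, x), while ordinary least squares uses cov(x, y) / var(x). The numbers below are made up (not from the journal club paper) so that an unmeasured confounder u affects both x and y but is uncorrelated with z; the function names are hypothetical.

```python
def covariance(a, b):
    """Sample covariance with an n - 1 denominator."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def ols_slope(x, y):
    """Ordinary least squares slope of y on x: cov(x, y) / var(x)."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Instrumental variables slope for one instrument: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


# Hypothetical data: the instrument z shifts x, and the confounder u shifts
# both x and y. By construction z and u are uncorrelated.
z = [-1, -1, 1, 1]
u = [1, -1, 1, -1]
x = [zi + ui for zi, ui in zip(z, u)]        # exposure
y = [2 * xi + ui for xi, ui in zip(x, u)]    # true effect of x on y is 2

print("OLS slope:", ols_slope(x, y))    # pulled away from 2 by the confounder
print("IV slope: ", iv_slope(z, x, y))  # uses only the variation in x driven by z
```

With these numbers the OLS slope comes out at 2.5 and the IV slope at 2, the value used to generate y. The IV estimate only works because z is related to x and unrelated to u; with real data neither condition is guaranteed.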

2009
     
  2. P.Mean: Generating multinomial random variables in Excel (created 2009-11-23). Someone asked how to generate six random integers subject to the condition that their sum equal a fixed value, x. This is a classic description of a multinomial distribution. The question did not say so, but I assumed that each random integer had to have the same distribution. That forces the probability vector for the multinomial to be (1/6, 1/6, 1/6, 1/6, 1/6, 1/6).
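The original question was about Excel, but the idea carries over to any language. A minimal Python sketch, assuming nonnegative integers and equal probabilities of 1/6: assign each of the x units to one of six cells chosen uniformly at random, then count the cells. The function name `multinomial_six` is hypothetical.

```python
import random
from collections import Counter


def multinomial_six(x, rng=random):
    """Draw six nonnegative integers that sum to x.

    Each of the x units is assigned to one of six equally likely cells,
    so the vector of cell counts follows a multinomial distribution with
    n = x and probabilities (1/6, ..., 1/6).
    """
    cells = rng.choices(range(6), k=x)  # one uniform cell label per unit
    counts = Counter(cells)
    return [counts.get(i, 0) for i in range(6)]


rng = random.Random(2009)
draw = multinomial_six(30, rng)
print(draw, sum(draw))  # the six counts always add up to 30
```

Each count on its own is binomial with n = x and p = 1/6. This loop costs time proportional to x; if NumPy is available, `numpy.random.default_rng().multinomial(x, [1/6] * 6)` draws from the same distribution in a single call.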

Other resources:

All of the material above this paragraph is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2010-05-06. The material below this paragraph links to my old website, StATS. Although I wrote all of the material listed below, my ex-employer, Children's Mercy Hospital, has claimed copyright ownership of this material. The brief excerpts shown here are included under the fair use provisions of U.S. copyright law.

2008

  1. Stats: Bootstrap estimates of the standard error (June 20, 2008). A regular correspondent (JU) on the MEDSTATS email discussion group asked about using the bootstrap to estimate the standard error of the mean in a simple case with 9 data values. He wanted to know why the bootstrap community commonly uses n instead of n-1 in the variance denominator. It seemed to him that n-1 would produce an unbiased estimate of the standard error, and he wanted to know whether that was true only in this special case or in general. He quoted Efron and Tibshirani's book, where they say that for most purposes either method works well.
  2. Stats: A brief overview of instrumental variables (April 14, 2008). People will often ask me questions that are outside my area of expertise. Yes, I know you're shocked to hear this, but there are lots of areas of statistics where I only have a vague understanding. One of these questions was about instrumental variables. I could only offer a vague explanation, but I hope that is better than no explanation at all.
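The n versus n-1 question in the bootstrap entry can be made concrete. For the sample mean, the "ideal" bootstrap standard error (the limit as the number of resamples grows) equals sqrt(sum((x - mean)^2) / n) / sqrt(n), which uses n rather than n-1 in the variance denominator; it is smaller than the usual s / sqrt(n) by a factor of sqrt((n - 1) / n). The sketch below uses nine hypothetical data values (not JU's) and compares a Monte Carlo bootstrap with both formulas; the function names are hypothetical.

```python
import math
import random
import statistics


def bootstrap_se_mean(data, n_boot=20000, rng=random):
    """Monte Carlo bootstrap estimate of the standard error of the mean."""
    n = len(data)
    boot_means = [
        statistics.fmean(rng.choices(data, k=n))  # resample with replacement
        for _ in range(n_boot)
    ]
    return statistics.stdev(boot_means)


def plugin_se_mean(data):
    """Ideal bootstrap SE: variance with an n denominator, divided by n."""
    n = len(data)
    m = statistics.fmean(data)
    return math.sqrt(sum((v - m) ** 2 for v in data) / n) / math.sqrt(n)


def usual_se_mean(data):
    """Textbook SE: sample SD (n - 1 denominator) divided by sqrt(n)."""
    return statistics.stdev(data) / math.sqrt(len(data))


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical values
rng = random.Random(2008)
print("Monte Carlo bootstrap:", bootstrap_se_mean(data, rng=rng))
print("Plug-in (n):          ", plugin_se_mean(data))
print("Usual (n - 1):        ", usual_se_mean(data))
```

With these values the plug-in SE is about 0.861 and the usual SE about 0.913, and the Monte Carlo bootstrap lands near the first. The gap, a factor of sqrt(8/9), or about 0.943, shrinks as n grows, which fits the remark that either choice works for most purposes.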

2007

2006
     
  3. Stats: Parametric tests for a ratio (October 27, 2006). Dear Professor Mean, I computed a variable, Y3, which is the ratio of two other variables, Y1 and Y2. Can I use a parametric test on this ratio?
  4. Stats: The problem with ranking ordinal scales (June 29, 2006). When I was young and naive, I thought that anytime you encountered ordinal data, it would make the most sense to use a test statistic based on ranks, such as the Mann-Whitney-Wilcoxon test or the Kruskal-Wallis test. Unfortunately, the ranks can sometimes distort the true nature of an ordinal scale. I thought that I had provided an example of how ranks can distort things, but I could not find it this morning when someone asked a question relating to ordinal scales. So here is the example again.
  5. Stats: Randomization tests for paired data (January 24, 2006). The randomization test offers a lot of flexibility for analyzing data in ways well beyond what traditional tests might offer. Here's a simple example from the Chance Data Sets web page.
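For paired data, one common form of the randomization test works on the within-pair differences: if the treatment has no effect, each difference is equally likely to be positive or negative, so you can flip signs and see how often the mean difference is as far from zero as the one observed. A minimal sketch that enumerates all 2^n sign patterns, which is practical only for small n; the differences below are hypothetical, not the Chance data, and the function name is hypothetical.

```python
from itertools import product


def paired_randomization_test(diffs):
    """Exact two-sided randomization (sign-flip) test for paired differences.

    Under the null hypothesis each difference is equally likely to carry
    either sign, so we enumerate all 2**n sign patterns and count how many
    give a mean difference at least as far from zero as the observed one.
    """
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs))) / n
        if stat >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
        total += 1
    return extreme / total


diffs = [1.2, 0.4, 2.1, -0.3, 1.5, 0.9, 0.7, 1.1]  # hypothetical
print(paired_randomization_test(diffs))
```

For larger samples the same idea is usually carried out with a few thousand random sign patterns instead of the full enumeration.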

2005

2004
     
  6. Stats: Outcomes research (November 24, 2004). Someone asked me for a simple definition of outcomes research. I hemmed and hawed and could not come up with a good definition. It turns out that the Agency for Healthcare Research and Quality has a nice definition.
  7. Stats: Report cards (August 27, 2004). I'm working on a project looking at some outcomes that might eventually become part of a report card or benchmarking system. This is an area fraught with controversy and it needs to be handled very carefully. Here are a few references that I have accumulated that address some of these issues.
  8. Stats: Randomization test (July 14, 2004). I received some data from a project where the outcome measure was the degree of improvement after a treatment, with values of -1 (slight decline), 0 (no change), 1 (slight improvement), 2 (moderate improvement), and 3 (large improvement). The two treatments had quite different results. The old therapy had eight patients, three of whom showed a slight decline and five of whom showed no change. Among the eight patients in the new therapy, one showed no change, three showed a slight improvement, six showed moderate improvement, and two showed a large improvement. There are several approaches that you could try with this data. Even though I did not have a problem with computing averages, I was a bit nervous about the t-test. This data is clearly non-normal, and with the sample sizes as small as they are, I'd be worried about whether the t-test would be valid. An interesting alternative is the randomization test.
  9. Stats: McNemar's Test (June 17, 2004). I received an email asking how to test two correlated proportions to see if one proportion is significantly larger than another. This is a classic application of McNemar's test.
  10. Stats: Analyzing percentage data (May 24, 2004). I received one of those difficult-to-answer questions: how do I analyze my data when the outcome variable is a percentage? That depends a lot on the context of the problem. The first thing to look at is whether the percentage involves counts of some type, and if so, whether you know the numerator and denominator. Alternatively, the percentage might be the ratio of two continuous measurements.
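McNemar's test, mentioned in the June 2004 entry, uses only the discordant pairs. Call b the number of pairs that were "yes" under the first condition and "no" under the second, and c the number with the reverse pattern. A minimal sketch, assuming you already have b and c, that returns the chi-square statistic (b - c)^2 / (b + c) without a continuity correction, its large-sample p-value, and an exact two-sided p-value from the binomial distribution with probability 1/2. The function name and counts are hypothetical.

```python
import math


def mcnemar(b, c):
    """McNemar's test from the two discordant cell counts b and c.

    Returns (chi-square statistic, approximate p-value, exact p-value).
    Under the null hypothesis, b is Binomial(b + c, 1/2).
    """
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs")
    chi2 = (b - c) ** 2 / n
    # Chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_approx = math.erfc(math.sqrt(chi2 / 2))
    # Exact two-sided p-value: double the smaller binomial tail, capped at 1.
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    return chi2, p_approx, p_exact


print(mcnemar(10, 2))  # hypothetical: 10 pairs changed one way, 2 the other
```

The concordant pairs (yes/yes and no/no) do not enter either calculation; they carry no information about which proportion is larger.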

2003

2002

2001
     
  11. Stats: Parametric versus nonparametric tests (July 30, 2001). Dear Professor Mean: When should I use a parametric test versus a non-parametric test?

2000
     
  12. Stats: Outliers (January 28, 2000). Dear Professor Mean: I have recently conducted a survey of attitudes toward research from a professional group. There are some outliers (+/- 3 SD) that I would eliminate, but others conducting the research with me feel that this might be a minority view and should not be eliminated from the dataset. Are there any views or references that I should read to confirm my view, or theirs?
  13. Stats: Composite scores (January 27, 2000). Dear Professor Mean: I have developed a method to distinguish among several products that we need to buy so our company can make a good purchasing decision. I created a composite score which is a weighted average of several different indicators of quality. I want to use statistics to determine when two different products have significantly different composite scores.
  14. Stats: Mixture models (January 27, 2000). Dear Professor Mean: I have read a journal article where the authors used a mixture model. What is this?
  15. Stats: Physician Performance Data (January 27, 2000). Dear Professor Mean: Producing statistics of physician performance or group performance or whatever seems to be one of the great growth industries in medicine. Graphs of performance in just about anything seem to be produced, usually with something that looks at first glance like a normal distribution (and almost never with any statistical addenda). But I would like to know whether we can use them sensibly as anything other than pictures. In particular, when I am one of the subjects of the analysis, how do I interpret my own performance?
  16. Stats: Splines (January 27, 2000). Dear Professor Mean: Can you send me a basic definition of splines?
  17. Stats: Bootstrap (January 26, 2000). Dear Professor Mean: I've heard a lot about how the bootstrap is going to revolutionize statistics. How does the bootstrap work?
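One way to apply the +/- 3 SD rule from the outlier question without settling the argument in advance is to flag values rather than delete them, so they can be inspected and the analysis run both with and without them. A minimal sketch, with hypothetical function names and data. It also shows a quirk of the rule: in a sample of n values, no single point can lie more than (n - 1) / sqrt(n) sample SDs from the mean, so with n = 10 or fewer the 3 SD rule can never flag anything.

```python
import math
import statistics


def flag_beyond_k_sd(data, k=3.0):
    """Return indices of values more than k sample SDs from the mean.

    Values are flagged, not removed, so the decision to exclude them
    can be made (and reported) separately.
    """
    m = statistics.fmean(data)
    s = statistics.stdev(data)
    return [i for i, v in enumerate(data) if abs(v - m) > k * s]


def max_possible_z(n):
    """Largest |z| any single value can reach in a sample of size n."""
    return (n - 1) / math.sqrt(n)


data = [10] * 19 + [100]  # hypothetical survey scores with one extreme value
print(flag_beyond_k_sd(data))   # index of the extreme value
print(max_possible_z(10) < 3)   # with n = 10, nothing can reach 3 SD
```

Note that the mean and SD used for flagging are themselves pulled toward the extreme values, which is one reason a fixed SD cutoff is a rough screen rather than a rule for deletion.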

1999
     
  18. Stats: Injury index creation (September 23, 1999). Dear Professor Mean: I want to create an injury index that describes the severity of an injury to a child. This would include information about the type of injury, the location of the injury, the age of the child, etc. What's the best way to do this?
  19. Stats: Chi-square (September 3, 1999). Dear Professor Mean: Can the Chi-squared test be used for anything besides categorical data?
  20. Stats: Page's test (September 3, 1999). Dear Professor Mean: I have recently come across a statistical test (Page's L test), with which I am unfamiliar. Does anyone either have information about this test or know where I might find information about it?
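Page's L test is a test for an ordered alternative in a randomized block design, a close relative of the Friedman test. Within each block the k treatments are ranked, the ranks are summed by treatment to give R_1, ..., R_k (listed in the hypothesized order), and L = 1*R_1 + 2*R_2 + ... + k*R_k. Large values of L support the predicted ordering. A minimal sketch with hypothetical data and function names; the normal approximation assumes no ties within blocks and uses E(L) = n k (k + 1)^2 / 4 and Var(L) = n k^2 (k + 1)(k^2 - 1) / 144 for n blocks. With only a handful of blocks, exact tables of L are safer than the approximation.

```python
import math


def average_ranks(values):
    """Rank values from 1 to k, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j share ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def page_l_test(blocks):
    """Page's L statistic with a one-sided normal approximation.

    `blocks` is a list of rows, one per block (e.g. subject); each row holds
    the k treatment responses in the hypothesized increasing order.
    """
    n = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    L = sum((j + 1) * r for j, r in enumerate(rank_sums))
    mean_L = n * k * (k + 1) ** 2 / 4
    var_L = n * k ** 2 * (k + 1) * (k ** 2 - 1) / 144
    z = (L - mean_L) / math.sqrt(var_L)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return L, z, p_one_sided


# Hypothetical data: 4 subjects, 3 conditions expected to increase.
blocks = [
    [2.1, 3.4, 3.9],
    [1.8, 2.9, 2.5],
    [2.5, 3.1, 4.2],
    [1.9, 2.0, 3.3],
]
print(page_l_test(blocks))
```

Here the rank sums are 4, 9, and 11, so L = 4 + 18 + 33 = 55, compared with an expected value of 48 under no ordering.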
