These pages describe data analysis that does not fit easily into the more
traditional categories of data analysis. If I get a sufficient number of pages
on the same general topic, I will create a new category. Also see Category:
Modeling issues, Category: Statistical theory. Other entries about unusual data
can be found in the unusual data page at the StATS website.
2010
- P.Mean: More discussion on
instrumental variables (created 2010-05-03). I attended the May meeting of
the KUMC Statistics Journal Club. The topic of discussion was a paper
outlining the properties and applications of instrumental variables.
2009
- P.Mean: Generating multinomial
random variables in Excel (created 2009-11-23). Someone asked how to
generate six random integers subject to the condition that the sum of those
random integers had to equal a value, x. This is a classic description of a
multinomial distribution. Unstated in the question, but assumed by me, was
that each random integer had to have the same distribution. That forces the
probability vector for the multinomial to be (1/6, 1/6, 1/6, 1/6, 1/6, 1/6).
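The multinomial entry above can be illustrated with a short sketch. The original page works in Excel; this version uses Python instead, and the function name `equal_multinomial`, the total of 20, and the seed are all illustrative assumptions rather than anything taken from that page. It drops each of the x units into one of six equally likely categories, which is one simple way to draw from a multinomial with probability vector (1/6, ..., 1/6).

```python
import random
from collections import Counter


def equal_multinomial(total, k=6, rng=None):
    """Draw k non-negative integers that sum to `total`.

    Each of the `total` units is assigned independently to one of k
    equally likely categories, so the counts follow a multinomial
    distribution with probability 1/k for every category.
    """
    rng = rng or random.Random()
    counts = Counter(rng.randrange(k) for _ in range(total))
    # Counter returns 0 for categories that never occurred.
    return [counts[i] for i in range(k)]


# Hypothetical example: six random integers that must sum to x = 20.
draw = equal_multinomial(20, rng=random.Random(1))
print(draw, sum(draw))
```

The six counts always sum to the requested total by construction, because every unit lands in exactly one category.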
Other resources:
- Kanji GK. One hundred statistical tests. 3rd ed. Thousand Oaks,
Calif: Sage Publications; 2006.
Description: Gopal Kanji lists specific details of many statistical
tests, some quite obscure. This book is for students who want more
mathematical details.
- Gunnes N, Seierstad T, Aamdal S, et al. Assessing quality of life in
a randomized clinical trial: Correcting for missing data. BMC Medical
Research Methodology. 2009;9(1):28. Available at:
http://www.biomedcentral.com/1471-2288/9/28 [Accessed May 20, 2009].
Excerpt: Use of proper methodology developed for analysing data subject
to missingness is necessary to reduce potential estimation bias. The quality
of life of patients receiving radiation therapy with concurrent chemotherapy
(docetaxel) appears somewhat worse than that of patients receiving radiation
therapy alone in the period during which treatment is given. The conclusions
are robust for the choice of statistical methods.
- Westaby S, Archer N, Manning N, et al. Comparison of hospital episode
statistics and central cardiac audit database in public reporting of
congenital heart surgery mortality. BMJ. 2007;335(7623):759. Available
at:
http://www.bmj.com/cgi/content/abstract/335/7623/759 [Accessed March 4,
2009]. Description: One of the more lively debates in medicine
today is the use of report cards to summarize performance of hospitals and/or
individual physicians. This paper takes individual statistics compiled by
hospitals (hospital episode statistics) and compares them to a centralized
database. There are large discrepancies between the two, and the authors
suggest that individual hospitals should spend the effort to more rigorously
collect and validate their data.
- Micheloud F. Jean Paul Benzécri's Correspondence Analysis.
Available at:
http://www.micheloud.com/FXM/COR/E/index.htm [Accessed March 4, 2009].
Excerpt: This paper is an introduction to correspondence analysis, a
statistical method allowing to analyze and describe graphically and
synthetically big contingency tables, that is tables in which you find at
the intersection of a row and a column the number of individuals who share
the characteristic of the row and that of the column.
- Rigdon E. Frequently Asked Questions about SEM. Available at:
http://www2.gsu.edu/~mkteer/semfaq.html [Accessed March 4, 2009]. Description: This is the first place you should
look if you have questions about Structural Equation Models.
- Wikipedia. Instrumental variable. Available at:
http://en.wikipedia.org/wiki/Instrumental_variable [Accessed March 4,
2009].
Excerpt: In statistics and econometrics, an instrumental variable (IV, or
instrument) can be used to produce a consistent estimator of a parameter
when the explanatory variables (covariates) are correlated with the error
terms. Such correlation can be caused by endogeneity, by omitted covariates,
or by measurement errors in the covariates. In this situation, ordinary
linear regression produces biased and inconsistent estimates. However, if an
instrument is available, consistent estimates may still be obtained. An
instrument is a variable that does not itself belong in the explanatory
equation, that is correlated with the suspect explanatory variable, and that
is uncorrelated with the error term.
- Kenny DA. SEM: Instrumental Variables. Available at:
http://davidakenny.net/cm/iv.htm
[Accessed March 4, 2009]. Excerpt: One way of identifying models that cannot be
estimated by using multiple regression is through the use of instrumental
variables. For path analysis, the disturbance must not be correlated with
each causal variable. There are three reasons why such a correlation might
exist: * Spuriousness (Third Variable Causation): A variable causes both the
endogenous variable and one its causal variables and that variable is not
included in the model. * Reverse Causation (Feedback Model): The endogenous
variable causes, either directly or indirectly, one of its causes. *
Measurement Error: There is measurement error in a causal variable.
- Michel Chavance, Sylvie Escolano, Monique Romon, et al. Latent
variables and structural equation models for longitudinal relationships: an
illustration in nutritional epidemiology. BMC Medical Research
Methodology. 2010;10(1):37. Abstract: "BACKGROUND: The use of structural
equation modeling and latent variables remains uncommon in epidemiology
despite its potential usefulness. The latter was illustrated by studying
cross-sectional and longitudinal relationships between eating behavior and
adiposity, using four different indicators of fat mass. METHODS: Using data
from a longitudinal community-based study, we fitted structural equation
models including two latent variables (respectively baseline adiposity and
adiposity change after 2 years of follow-up), each being defined, by the
four following anthropometric measurement (respectively by their changes):
body mass index, waist circumference, skinfold thickness and percent body
fat. Latent adiposity variables were hypothesized to depend on a cognitive
restraint score, calculated from answers to an eating-behavior questionnaire
(TFEQ-18), either cross-sectionally or longitudinally. RESULTS: We found
that high baseline adiposity was associated with a 2-year increase of the
cognitive restraint score and no convincing relationship between baseline
cognitive restraint and 2-year adiposity change could be established.
CONCLUSIONS: The latent variable modeling approach enabled presentation of
synthetic results rather than separate regression models and detailed
analysis of the causal effects of interest. In the general population,
restrained eating appears to be an adaptive response of subjects prone to
gaining weight more than as a risk factor for fat-mass increase."
[Accessed May 6, 2010]. Available at:
http://www.biomedcentral.com/1471-2288/10/37.
- Kelly PA. Overview of Computer-Intensive Statistics. Available
at:
http://www.hsrd.houston.med.va.gov/AdamKelly/resampling.html [Accessed
March 4, 2009]. Description: This page provides a nice overview of the
permutation test, randomization test, Monte Carlo estimation, bootstrapping,
the jackknife, and Markov Chain Monte Carlo methods.
- Walters S, Campbell M. The use of bootstrap methods for analysing
health-related quality of life outcomes (particularly the SF-36). Health
and Quality of Life Outcomes. 2004;2(1):70. Available at:
http://www.hqlo.com/content/2/1/70 [Accessed March 4, 2009].
Description: The article provides an illustrative example of how to use the
bootstrap method.
- Li P. The Zoo of Loglinear Analysis. Available at:
http://facultystaff.richmond.edu/~pli/psy538/loglin02/index.html
[Accessed March 4, 2009].
Excerpt: "Loglinear Analysis is a multivariate extension of Chi Square.
You use Loglinear when you have more than two qualitative variables. Chi
Square is insufficient when you have more than two qualitative variables
because it only tests the independence of the variables. When you have more
than two, it cannot detect the varying associations and interactions between
the variables. Loglinear is a goodness-of-fit test that allows you to test
all the effects (the main effects, the association effects and the
interaction effects) at the same time."
All of the material above this paragraph is licensed under a
Creative Commons Attribution 3.0 United States License. This page was written by
Steve Simon and was last modified on
2010-05-06. The material
below this paragraph links to my
old website, StATS. Although I wrote all of the material
listed below, my ex-employer, Children's Mercy Hospital, has claimed copyright
ownership of this material. The brief excerpts shown here are included under
the fair use provisions of U.S. copyright law.
2008
- Stats: Bootstrap estimates
of the standard error (June 20, 2008). A regular correspondent (JU) on
the MEDSTATS email discussion group asked about using the bootstrap to
estimate the standard error of the mean in a simple case with 9 data values.
He wanted to know why the commonly used approach in the bootstrap community
was to use n instead of n-1 in the variance denominator. It seemed to him
that n-1 would produce an unbiased estimate of the standard error, and he
wanted to know if that was true just in this special case or true in general.
He quoted the book by Efron and Tibshirani, who felt that for most purposes
either method would work well.
- Stats: A brief overview of
instrumental variables (April 14, 2008). People will often ask me
questions that are outside my area of expertise. Yes, I know you're shocked
to hear this, but there are lots of areas of statistics where I only have a
vague understanding. One of these questions was about instrumental
variables. I could only offer a vague explanation, but I hope that is better
than no explanation at all.
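The n versus n-1 question in the bootstrap entry above can be explored with a minimal Python sketch. The nine data values, the number of resamples, and the seed are hypothetical choices made for illustration; they are not the correspondent's data. The sketch compares the bootstrap standard error of the mean with the two textbook formulas, one dividing the variance by n and one by n-1.

```python
import random
import statistics


def bootstrap_se_mean(values, n_boot=5000, seed=0):
    """Estimate the standard error of the mean by resampling with replacement."""
    rng = random.Random(seed)
    n = len(values)
    means = [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)]
    return statistics.stdev(means)


# Hypothetical sample with 9 data values (not from the original question).
values = [3.1, 4.7, 5.0, 5.6, 6.2, 6.8, 7.4, 8.9, 10.3]
n = len(values)

se_plugin = statistics.pstdev(values) / n ** 0.5  # variance divisor n
se_usual = statistics.stdev(values) / n ** 0.5    # variance divisor n-1
se_boot = bootstrap_se_mean(values)

print(f"divisor n:   {se_plugin:.4f}")
print(f"divisor n-1: {se_usual:.4f}")
print(f"bootstrap:   {se_boot:.4f}")
```

With enough resamples the bootstrap value should settle near the divisor-n formula, since resampling treats the observed values as the whole population. The two formulas differ only by a factor of sqrt(n/(n-1)), which is about 1.06 when n is 9.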
2007
2006
- Stats: Parametric tests for a
ratio (October 27, 2006). Dear Professor Mean, I computed a variable,
Y3, which is the ratio of two other variables, Y1 and Y2. Can I use a
parametric test on this ratio?
- Stats: The problem with ranking ordinal scales (June 29, 2006).
When I was young and naive, I thought that anytime you
encountered ordinal data, it would make the most sense to use a test
statistic based on ranks, such as the Mann-Whitney-Wilcoxon test or the
Kruskal-Wallis test. Unfortunately, the ranks can sometimes distort the true
nature of an ordinal scale. I thought that I had provided an example of how
ranks can distort things, but I could not find it this morning when someone
asked a question relating to ordinal scales. So here is the example again.
- Stats: Randomization tests for
paired data (January 24, 2006). The randomization test offers a lot of
flexibility for analyzing data in ways well beyond what traditional tests
might offer. Here's a simple example from the Chance Data Sets web page.
2005
2004
- Stats: Outcomes research (November
24, 2004). Someone asked me for a simple definition of outcomes
research. I hemmed and hawed and could not come up with a good definition.
It turns out that the Agency for Healthcare Research and Quality has a nice
definition.
- Stats: Report cards (August 27, 2004).
I'm working on a project looking at some outcomes that might eventually
become part of a report card or benchmarking system. This is an area fraught
with controversy and it needs to be handled very carefully. Here are a few
references that I have accumulated that address some of these issues.
- Stats: Randomization test (July 14,
2004). I received some data from a project where the outcome measure was
the degree of improvement after a treatment, with values of -1 (slight
decline), 0 (no change), 1 (slight improvement), 2 (moderate improvement),
and 3 (large improvement). The two treatments had quite different results.
The old therapy had eight patients, three of whom showed a slight decline
and five of whom showed no change. Among the eight patients in the new
therapy, one showed no change, three showed a slight improvement, six showed
moderate improvement, and two showed a large improvement. There are several
approaches that you could try with this data. Even though I did not have a
problem with computing averages, I was a bit nervous about the t-test. This
data is clearly non-normal, and with the sample sizes as small as they are,
I'd be worried about whether the t-test would be valid. An interesting
alternative is the randomization test.
- Stats: McNemar's Test (June 17, 2004). I
received an email asking how to test two correlated proportions to see if
one proportion is significantly larger than another. This is a classic
application of McNemar's test.
- Stats: Analyzing percentage data (May 24,
2004). I received one of those difficult-to-answer questions: how do I
analyze my data when the outcome variable is a percentage? That depends a
lot on the context of the problem. The first thing to look at is whether the
percentage involves counts of some type, and if so, do you know the
numerator and denominator. Instead, the percentage might be the ratio of two
continuous measurements.
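The randomization test entry from July 14, 2004 lends itself to a small sketch. The scores below use the counts exactly as listed in that excerpt; note that the counts listed for the new therapy add up to twelve patients. The function name, the number of shuffles, the seed, and the choice of a two-sided test on the difference in means are assumptions made for illustration, not necessarily what the original page did.

```python
import random
from statistics import fmean

# Improvement scores built from the counts listed in the excerpt above.
old = [-1] * 3 + [0] * 5                      # old therapy
new = [0] * 1 + [1] * 3 + [2] * 6 + [3] * 2   # new therapy, counts as listed


def randomization_test(a, b, n_perm=10000, seed=0):
    """Two-sided Monte Carlo randomization test on the difference in means."""
    rng = random.Random(seed)
    observed = fmean(b) - fmean(a)
    pooled = a + b  # new list, so the inputs are not modified
    hits = 0
    for _ in range(n_perm):
        # Reassign the group labels at random by shuffling the pooled scores.
        rng.shuffle(pooled)
        diff = fmean(pooled[len(a):]) - fmean(pooled[:len(a)])
        if abs(diff) >= abs(observed) - 1e-12:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)


observed, p_value = randomization_test(old, new)
print(f"observed difference in means: {observed:.3f}")
print(f"approximate p-value: {p_value:.5f}")
```

The test makes no normality assumption. It only asks how often a random relabeling of the patients produces a difference at least as large as the one observed.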
2003
2002
2001
- Stats: Parametric versus nonparametric
tests (July 30, 2001). Dear Professor Mean: When should I use a
parametric test versus a non-parametric test?
2000
- Stats: Outliers (January 28, 2000).
Dear Professor Mean: I have recently conducted a survey of attitudes toward
research from a professional group. There are some outliers (+/- 3SD) that I
would eliminate, but others conducting the research with me feel that this
might be a minority view, and should not be eliminated from the
dataset. Any views or references that I should read to confirm my view,
or theirs?
- Stats: Composite scores (January 27,
2000). Dear Professor Mean: I have developed a method to distinguish
among several products that we need to buy so our company can make a good
purchasing decision. I created a composite score which is a weighted average
of several different indicators of quality. I want to use statistics to
determine when two different products have significantly different composite
scores.
- Stats: Mixture models (January 27, 2000).
Dear Professor Mean: I have read a journal article where the authors used a
mixture model. What is this?
- Stats: Physician Performance Data
(January 27, 2000). Dear Professor Mean: Producing statistics of
physician performance or group performance or whatever seems to be one of
the great growth industries in medicine. Graphs of performance in just about
anything seem to be produced - usually with something that looks at first
glance like a normal distribution (and almost never with any statistical
addenda). But I would like to know whether we can use them sensibly as
anything other than pictures? In particular when I am one of the subjects of
the analysis how do I interpret my own performance?
- Stats: Splines (January 27, 2000).
Dear Professor Mean: Can you send me a basic definition of splines?
- Stats: Bootstrap (January 26, 2000).
Dear Professor Mean: I've heard a lot about how the bootstrap is going to
revolutionize statistics. How does the bootstrap work?
1999
- Stats: Injury index creation (September 23,
1999). Dear Professor Mean: I want to create an injury index that
describes the severity of an injury to a child. This would include
information about the type of injury, the location of the injury, the age of
the child, etc. What's the best way to do this?
- Stats: Chi-square (September 3, 1999).
Dear Professor Mean: Can the Chi-squared test be used for anything besides
categorical data?
- Stats: Page's test (September 3, 1999).
Dear Professor Mean: I have recently come across a statistical test (Page's
L test), with which I am unfamiliar. Does anyone either have information
about this test or know where I might find information about it?