P.Mean >> Category >> Statistical theory (created 2007-06-16).

These pages describe some of the more mathematical and/or technical aspects of Statistics. Also see Category: Statistical computing, Category: Unusual data.


19. P.Mean: What does it really mean to say that a mean of a large number of variables is approximately normal (created 2013-01-14). Someone was looking at the Wikipedia page for the normal distribution and noted a comment that read "Normal distributions are extremely important in statistics, and are often used in the natural and social sciences for real-valued random variables whose distributions are not known.[1][2] One reason for their popularity is the central limit theorem, which states that, under mild conditions, the mean of a large number of random variables independently drawn from the same distribution is distributed approximately normally, irrespective of the form of the original distribution." What does this mean exactly?
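
As a rough illustration of the quoted claim (a sketch of my own, not taken from the linked page), the simulation below averages draws from a strongly skewed distribution and watches the skewness of those averages shrink toward zero, the value for a normal distribution. The exponential distribution, the sample sizes, the seed, and the helper names `sample_means` and `skewness` are all assumptions chosen for the example.

```python
import random
import statistics

def sample_means(n_per_mean, n_means, seed=0):
    """Return n_means averages, each computed from n_per_mean Exponential(1) draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n_per_mean))
        for _ in range(n_means)
    ]

def skewness(xs):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean((x - m) ** 3 for x in xs) / s ** 3

# A single exponential draw is heavily right-skewed; averages of many draws are much less so.
for n in (1, 5, 50):
    means = sample_means(n, 10000)
    print(n, round(statistics.fmean(means), 3), round(skewness(means), 3))
```

The center of the averages stays near 1 for every n, while the skewness falls as n grows, which is one concrete sense in which the mean is "approximately normal."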


18. P.Mean: Reviewing how the binomial and negative binomial distributions work (created 2012-05-17). When you look at the binomial distribution and the negative binomial distribution side by side, they look almost identical. But the subtle differences are important. I was working on some problems involving these two distributions and thought it might be helpful to review their properties. These properties are indeed well known, but I wanted to get comfortable with them before I started tackling some more complex alternatives to these two distributions.
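
A minimal sketch of the two probability functions side by side may help show how close they are. It assumes one common parameterization of the negative binomial (counting failures before the r-th success); other texts count total trials instead. The function names are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in a fixed number n of independent trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def negative_binomial_pmf(k, r, p):
    """P(exactly k failures occur before the r-th success)."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k

# Same powers of p and (1 - p); only the counting term differs, because the
# negative binomial requires the final trial to be a success.
k, r, p = 3, 4, 0.3
print(binomial_pmf(r, k + r, p))
print(negative_binomial_pmf(k, r, p))
print(r / (k + r) * binomial_pmf(r, k + r, p))  # matches the negative binomial value
```

Under this parameterization the negative binomial probability is the binomial probability of r successes in k + r trials multiplied by r / (k + r).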

17. P.Mean: A simple example of change of variable (created 2012-05-15). I need to review some basic mathematical statistics in order to understand some of the Bayesian accrual models that I am developing. One of those things, which is actually quite easy but which I seem to have some trouble with, is the method known as "change of variable." This is a method that allows you to characterize the probability distribution of a random variable that is transformed by a simple function. I wanted to illustrate how this works for a simple, but not trivial, case, just to prove to myself that this works.
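
To make the method concrete, here is a small sketch using an example of my own choosing (not necessarily the case worked on the linked page): X uniform on (0, 1) and Y = X squared. The inverse transformation is x = sqrt(y), so the density of Y is the density of X at sqrt(y) times |dx/dy| = 1 / (2 sqrt(y)). The function names, interval, and seed are assumptions.

```python
import math
import random

def density_of_square(y):
    """Density of Y = X**2 for X ~ Uniform(0, 1), obtained by change of variable."""
    if not 0 < y < 1:
        return 0.0
    f_x = 1.0                           # Uniform(0, 1) density at x = sqrt(y)
    jacobian = 1 / (2 * math.sqrt(y))   # |d/dy sqrt(y)|
    return f_x * jacobian

def integrate(f, a, b, steps=10000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Check the derived density against a direct simulation of Y = X**2.
rng = random.Random(1)
draws = [rng.random() ** 2 for _ in range(100000)]
simulated = sum(0.25 < y < 1 for y in draws) / len(draws)
print(integrate(density_of_square, 0.25, 1), simulated)
```

Both numbers estimate P(0.25 < Y < 1), which for this example is sqrt(1) - sqrt(0.25).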


16. What is regression to the mean? (July/August 2011)


15. P.Mean: Poem to help you remember the quotient rule (created 2010-11-26). I was working on some derivatives that involved a fraction, and the formula is a bit tricky to remember. There was a short poem that I learned a long time ago for the derivative of a fraction, and I can't find it anywhere on the Internet. There are some variants that are close, but nothing quite like the poem I remember. Everything important has to be found somewhere on the Internet, so I am posting the poem here. If anyone can attribute this poem to the original source, please let me know.


14. P.Mean: Rotating locations (created 2009-11-02). Someone asked about holding a series of meetings with subgroups of people and wanted to ensure during any round of the meetings that people would meet at a different location than the previous round and with a different mix of people. So on the first round of meetings, Allen, Barb, Charlie, and Denise would meet at location E and Fred, Gina, Harry, and Iona would meet at location J. On the next round, you'd mix things up so that it wasn't the same four people at the same location.

13. P.Mean: What is the effect of an unmeasured covariate? (created 2009-06-09). Suppose you want to conduct an analysis of covariance, but you have data on some but not all of the covariates. What do you miss out on because of the unmeasured covariate? To understand this, we need to venture into the world of partitioned matrices.

Other resources:

Approximation Theorems of Mathematical Statistics. Description: Serfling's book provides all the mathematical theory needed to establish that something is asymptotically normal. This book is for students who want more mathematical details.

All of the material above this paragraph is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2017-06-15. The material below this paragraph links to my old website, StATS. Although I wrote all of the material listed below, my ex-employer, Children's Mercy Hospital, has claimed copyright ownership of this material. The brief excerpts shown here are included under the fair use provisions of U.S. Copyright laws.


12. Stats: Can the standard deviation be more than half of the range? (June 22, 2007). Dear Professor Mean, I was trying to work with some simple data sets to see how large I could make the standard deviation relative to the range. I know the standard deviation can never be larger than the range, but I can't seem to get it to be larger than half the range.
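
A quick way to explore this question is to compare the two usual versions of the standard deviation on small data sets. The sketch below uses made-up data sets of my own; the divide-by-n version is capped at half the range, while the divide-by-(n-1) version can exceed it when n is small.

```python
from statistics import pstdev, stdev

def half_range(xs):
    """Half of the distance between the largest and smallest value."""
    return (max(xs) - min(xs)) / 2

# Hypothetical data sets with half the values at each extreme.
for data in ([0, 1], [0, 0, 1, 1], [0, 0, 0, 1, 1, 1]):
    # pstdev divides by n; stdev divides by n - 1.
    print(data, half_range(data), pstdev(data), round(stdev(data), 4))
```

With the values split evenly between the two extremes, the divide-by-n standard deviation equals half the range exactly, and the divide-by-(n-1) version is larger by a factor of sqrt(n / (n - 1)).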

11. Stats: Compound interest and powers (February 11, 2007). In some of my mathematical calculations, I end up computing an expression that involves a number very close to one raised to a very large power. This term can often be approximated by an exponential function, but I can never quite remember the relationship. An example involving compound interest may help me remember better in the future.
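
The relationship in question is that (1 + x/n) raised to the power n approaches exp(x) as n grows. A minimal sketch, with a hypothetical 5% annual rate and compounding frequencies chosen only for illustration:

```python
import math

def compound(rate, periods):
    """Growth factor after one year at the given annual rate, compounded `periods` times."""
    return (1 + rate / periods) ** periods

rate = 0.05  # hypothetical annual interest rate
for periods in (1, 12, 365, 10**6):
    print(periods, compound(rate, periods))

# Continuous compounding is the limiting case.
print("limit", math.exp(rate))
```

As the number of compounding periods increases, the base gets closer to one and the power gets larger, and the result settles on the exponential.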


10. Stats: Mathematical and statistical challenges (December 13, 2006). A regular poster on the EDSTAT-L list (DR) mentioned an interesting page on the IBM website, www.research.ibm.com/ponder, that offers a monthly puzzle on mathematical topics.


9. Stats: Testing for bimodality (May 3, 2005). I have talked about this topic before and it is a rather tricky thing. A recent discussion of tests of bimodality on edstat-l, though, yielded a few promising leads relating to the Dip test, which is described in: The Dip Test of Unimodality, Hartigan JA, Hartigan PM. The Annals of Statistics, v13(1):70-84, 1985.

8. Stats: A surprising application of the harmonic mean (February 1, 2005). The radio show, Car Talk, has a puzzle that they read every week on the show. Usually, it is some unusual or unexpected problem with automobiles, but Ray and Tom Magliozzi also will toss in a mathematical puzzle from time to time. A recent Car Talk puzzler, www.cartalk.com/content/puzzler/transcripts/200505/index.html, discusses a family that has two cars, one which gets 10 miles per gallon and the other gets 100 mpg.
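
To see why the harmonic mean shows up with fuel economy, here is a small sketch. It assumes (my assumption, not a detail from the puzzler) that both cars are driven the same distance; the 100-mile figure is hypothetical and cancels out of the result.

```python
from statistics import harmonic_mean

mpg = [10, 100]
miles_per_car = 100  # hypothetical distance, the same for both cars

# Combined fuel economy is total miles divided by total gallons.
gallons_used = sum(miles_per_car / m for m in mpg)
combined_mpg = len(mpg) * miles_per_car / gallons_used

print(combined_mpg)          # total miles / total gallons
print(harmonic_mean(mpg))    # same value
print(sum(mpg) / len(mpg))   # arithmetic mean, which overstates the combined figure
```

The combined figure sits much closer to 10 than to 100 because the less efficient car burns most of the fuel.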


7. Stats: Simpson's Paradox (December 22, 2004). Someone wrote to the Evidence Based Health email discussion group about a theoretical situation where someone had an estimated risk of disease based on a study that showed the degree to which factors a, b, and c might influence disease status. Suppose in a different study, a factor d was shown to double the risk of disease. What could you then say about the probability of disease among a patient who has a, b, c, and d? You would think that someone with a, b, c, and d should have a greater risk of disease than someone with just a, b, and c. The answer, unfortunately, is that nothing is predictable here, and it is possible for someone with a, b, c, and d to have a lower risk even though a study looking at d showed a doubling of risk. You have to watch out for Simpson's Paradox.

6. Stats: Searching for bimodality (August 4, 2004). One of the people I work with is always looking for hidden subgroups in his data. For him, this is a starting point for exploring for genetic variations. That's an admirable activity, but it is a remarkably difficult thing to do in practice. The first step is to see if the distribution of values for some measurement has a bimodal distribution. A second mode is an indication of a subgroup of patients that may have a genetic variation from the rest of the patients.

5. Stats: Missing values (June 22, 2004). Someone here at the hospital asked me how to do a reliability analysis on a 20-item measure where a large number of participants left a single item blank. There are several approaches that work, but you need to exercise a bit of caution.

4. Stats: Degrees of freedom, Part 2 (April 15, 2004). I received an email inquiry about degrees of freedom. I explain the concept briefly, but this person wanted a more detailed answer to the question, why do we use n-1 in the calculation of the standard deviation and not n?


3. Stats: Maximum likelihood estimation (May 6, 2003). Dear Professor Mean: What is maximum likelihood estimation and how does it work?


2. Stats: Stein's paradox (January 27, 2000). Dear Professor Mean: What is "Stein's Paradox?"


1. Stats: Degrees of Freedom (September 3, 1999). Dear Professor Mean, In your Simple Descriptive Statistics class, you described the standard deviation as the square root of the average squared deviation. If it is an average, how come we divide by the degrees of freedom (n-1) rather than n? Is this just a conspiracy among statisticians to make this stuff harder to understand?
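
One way to see what dividing by n-1 buys you is to enumerate every possible sample from a tiny population and average the two versions of the variance. This is a sketch with a made-up population of my own, sampling two values with replacement; it is not taken from the linked page.

```python
from itertools import product
from statistics import fmean, pvariance, variance

population = [1, 2, 3]  # hypothetical population
true_variance = pvariance(population)

# All 9 equally likely samples of size 2 drawn with replacement.
samples = list(product(population, repeat=2))

avg_divide_by_n = fmean(pvariance(s) for s in samples)          # divides by n
avg_divide_by_n_minus_1 = fmean(variance(s) for s in samples)   # divides by n - 1

print(true_variance, avg_divide_by_n, avg_divide_by_n_minus_1)
```

Averaged over all samples, the n-1 version recovers the population variance, while the divide-by-n version comes in too low because deviations are measured from the sample mean rather than the true mean.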
