The SPSS t-test is confusing (created 2010-06-29).


I have always disliked how SPSS (now IBM SPSS) presented the output from their independent samples t-test. I want to explain why it is confusing and show you an alternative based on the general linear model.

First let me show you the dialog boxes that you use for the independent samples t-test. If you select ANALYZE | COMPARE MEANS | INDEPENDENT-SAMPLES T TEST from the menu, you will get the following dialog box.

The dependent or outcome variable goes into the TEST VARIABLE(S) field and the grouping variable indicating which of two groups the outcome variable belongs to goes into the GROUPING VARIABLE field. In this data set, Price is the outcome variable and CustomBuild is the variable that tells you which group a data value belongs to. Here, the t-test will be used to examine the (patently obvious) claim that custom built houses cost more, on average, than regular houses. When you fill the two fields with these variables, here's what you get.

This illustrates my first complaint. The CustomBuild variable has two possible values, "Yes" and "No", but for some reason SPSS cannot figure this out. It does just fine for more complex procedures, such as ANOVA. This is surprising: surely figuring out what the two possible categories are cannot be a more difficult task than figuring out the possible categories among an unspecified number of categories. Actually, I suspect this is historical, a throwback to the days when running t-tests among all possible pairs of groups was still a common choice.

To define what the two levels are, you need to click on the DEFINE GROUPS button. This brings up another dialog box where you tell SPSS what it couldn't figure out.

When you click on the CONTINUE button, SPSS now recognizes the two groups.


Another complaint, and more serious, is the lack of options available. If you click on the options button, here is what you see.

Notice that the only things you can do are to change the confidence level and to select between a bad but tolerable missing value choice (exclude cases analysis by analysis) and an even worse choice in most settings (exclude cases listwise). I know it is too much to expect a choice for multiple imputation, but there are other options here that I wish I had control over. It turns out that the output in SPSS is very, very confusing, and I would have liked to suppress some of the stuff I don't like.

Here's what the output looks like when printed (it is slightly different on the screen because more information can fit horizontally on the screen).

The first few numbers are useful. You see that there are 27 custom built houses and 90 regular houses. The means are $145,000 and $95,000 respectively. The standard deviations are $48,000 and $25,000 respectively. Let's not comment on the standard errors right now.

If you were compiling a table of descriptive statistics, these first few numbers would be included in that table. Here's one way of presenting that data.

                             Custom-built  Regular
                             (n=27)        (n=90)

Price (thousands of dollars) 145 +/- 48    95 +/- 25

I always prefer to present the standard deviation rather than the standard error, but if the journal that you are writing for seems to cite the standard error more commonly, please follow their convention.
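If you ever need the same descriptive table outside of SPSS, a groupby in Python gets you there. This is just a sketch: the data below is synthetic, generated to resemble the summary statistics above (27 custom-built houses around $145K with SD $48K, 90 regular houses around $95K with SD $25K); the real SPSS housing file is not reproduced here, and the column names are my own invention.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing data (NOT the real SPSS file).
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "price": np.concatenate([rng.normal(145, 48, 27),   # custom-built
                             rng.normal(95, 25, 90)]),  # regular
    "custom_build": ["Yes"] * 27 + ["No"] * 90,
})

# Group sizes, means, and standard deviations, like the first SPSS table.
summary = df.groupby("custom_build")["price"].agg(["count", "mean", "std"])
print(summary.round(1))
```

The `std` column here is the sample standard deviation (the one I recommend reporting), not the standard error.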

The next set of numbers really bothers me. It presents the Levene's test for homogeneity. Homogeneity is an assumption required by the traditional t-test. Homogeneity means that the two populations that you are sampling from have the same population variance (or equivalently the same standard deviations). Heterogeneity occurs when the population variance of one group is larger or smaller than the population variance of the second group. The assumption of homogeneity is not a truly critical assumption. The traditional t-test does just fine if there is a moderate amount of heterogeneity, especially when the sample sizes are the same in each group.

I do not recommend Levene's test and I would love to find a way to suppress it. But that is not an option. Please ignore the F value (24.8) and the p-value (.000). This p-value is frequently and mistakenly presented as the p-value of the t-test, but it is not.

The problem with Levene's test is that it is overly sensitive, especially for large sample sizes, and can detect trivial amounts of heterogeneity. Levene's test is also highly influenced by the assumption of normality. So if Levene's test is bad, what should you do when you are worried that the variances might be unequal? I recommend that you not worry about this unless there is a strong a priori reason to believe that the variances might be unequal. Or I recommend that you assess heterogeneity in a qualitative fashion (is the larger standard deviation more than three times as big as the smaller one? Is there evidence of heterogeneity in previous, closely related studies?).
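For the curious, here is what Levene's test and the qualitative SD-ratio check look like in Python. The data is synthetic (drawn to resemble the group summaries above); nothing here reproduces the actual SPSS file.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Two hypothetical samples with genuinely different spreads.
custom = rng.normal(145, 48, 27)   # wider spread
regular = rng.normal(95, 25, 90)   # narrower spread

# Levene's test for equality of variances -- this is the statistic SPSS
# prints alongside the t-test; a small p-value flags heterogeneity.
w, p = stats.levene(custom, regular)
print(f"Levene W = {w:.2f}, p = {p:.4f}")

# The qualitative check suggested above: ratio of the larger sample
# standard deviation to the smaller one.
sds = sorted([custom.std(ddof=1), regular.std(ddof=1)])
ratio = sds[1] / sds[0]
print(f"SD ratio = {ratio:.2f}")
```

A ratio well under three, with no prior evidence of heterogeneity, is the kind of informal reassurance I mean, without handing the decision to a hypersensitive formal test.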

The right-hand side of the second table, combined with the third table, presents another area of confusion. SPSS reports two t-statistics (7.2 and 5.2), two degrees of freedom (116 and 30.6), two p-values (both, thankfully, are .000, but they can sometimes differ), two mean differences (they are always the same value, here $50,000), two standard errors ($7,000 and $5,200), and two confidence intervals ($36,000 to $64,000 and $30,000 to $69,000). Which t-test do I use, and how do I report it?

I recommend that you always use the results in the first row (equal variances assumed) and ignore the results in the second row (equal variances not assumed), unless there is a strong a priori reason to believe that there is serious heterogeneity. This would come from previous studies using the same outcome variable.

Now other people prefer to always use the second row, because it requires two assumptions rather than the three assumptions needed for the first row. Others will let the results of Levene's test dictate whether to use the equal variances assumed row or the equal variances not assumed row.
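The two rows correspond to two different tests: the pooled-variance t-test and Welch's t-test. A quick sketch in Python (again on synthetic data resembling the group summaries, not the real file) shows both side by side:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
custom = rng.normal(145, 48, 27)
regular = rng.normal(95, 25, 90)

# "Equal variances assumed" row: the classical pooled-variance t-test.
t_pooled, p_pooled = stats.ttest_ind(custom, regular, equal_var=True)

# "Equal variances not assumed" row: Welch's t-test, which uses
# separate variance estimates and the Satterthwaite approximate df.
t_welch, p_welch = stats.ttest_ind(custom, regular, equal_var=False)

print(f"pooled: t = {t_pooled:.2f}, p = {p_pooled:.4g}")
print(f"Welch:  t = {t_welch:.2f}, p = {p_welch:.4g}")
```

With a difference this large, both rows lead to the same conclusion, which is often the case; the confusion SPSS creates is out of proportion to how often the choice actually matters.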

Now why oh why did SPSS not allow us to make the choice as to which t-test to display? I would love to suppress that second row as it just causes needless confusion and anxiety.

How should you report these results? It depends. Always report the confidence interval. Some journals and reviewers like to see the p-value also. A few places might encourage you to present the t-statistic and the degrees of freedom. Here's an example of how you might report these results as text in the results section.

There is a statistically significant difference in average house prices (95% CI $36,000 to $64,000, p<0.001, t=7.2, df=116).
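If you want to see where those reported pieces come from, the confidence interval for the mean difference under the equal variances assumption is easy to build by hand. A sketch on synthetic data (the resulting numbers come from this toy sample, not the SPSS output above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
custom = rng.normal(145, 48, 27)
regular = rng.normal(95, 25, 90)

diff = custom.mean() - regular.mean()
n1, n2 = len(custom), len(regular)

# Pooled variance, its standard error, and the pooled-test df.
sp2 = ((n1 - 1) * custom.var(ddof=1)
       + (n2 - 1) * regular.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
dof = n1 + n2 - 2

# 95% confidence interval for the mean difference.
tcrit = stats.t.ppf(0.975, dof)
lo, hi = diff - tcrit * se, diff + tcrit * se
print(f"difference = {diff:.1f}, 95% CI ({lo:.1f}, {hi:.1f}), df = {dof}")
```

Everything in the reporting sentence (interval, t, df) drops straight out of these quantities.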

When I teach people how to use SPSS, I deliberately avoid showing them the independent samples t-test. Instead, I encourage people to use a more complicated procedure, the general linear model. Note that this is not the same as the generalized linear model, an unfortunate terminological confusion that rivals sensitivity and specificity for its ability to confuse people.

The general linear model lives up to the adjective in its name as it is a very general procedure. It can conduct an independent samples t-test, as we shall see, but it also is capable of producing an analysis of variance model, a linear regression model, an analysis of covariance model and many others. It is a "one stop shopping" source for many of your basic data analysis needs.

Select ANALYZE | GENERAL LINEAR MODEL | UNIVARIATE from the SPSS menu. Here's the dialog box you get.

The outcome variable goes in the DEPENDENT VARIABLE field. The group variable which tells you whether a data value goes in the first or the second group goes in the FIXED FACTORS field. For the independent samples t-test, you would leave the rest of the fields empty. Here's what the dialog box looks like for our example.

There are two options that I suggest. Click on the OPTIONS button to get this dialog box.

Check the DESCRIPTIVE STATISTICS and PARAMETER ESTIMATES options here and then click on the CONTINUE button in this dialog box and the OK button in the previous dialog box.

Here is what the output looks like (again, this is a printed version but it should be pretty close to what appears on the screen).

The first table lists the sample sizes and the second table lists the means and standard deviations. The sample sizes are repeated a second time (huh?) but this is not too confusing. The third table (tests of between-subjects effects) is only of interest in a multi-factor ANOVA model and we can safely ignore it here. The last table presents rows for the intercept and CustomBuild=No. The intercept row presents the average price in the reference category (CustomBuild=Yes), the standard error ($6,000), the t-statistic for testing whether this mean is zero (23.7, but this is an irrelevant test in this and most other settings), and a p-value and confidence limits for the average price in the reference category (.000 and $132,000 to $158,000, respectively).

The second row (CustomBuild=No) presents the exact same mean difference, standard error, t-statistic, p-value, and confidence limits as the independent samples t-test output did, with one big exception. The independent samples t-test computed the mean difference as custom-built minus regular, while the general linear model computed it as regular minus custom-built. There is no obvious reason to prefer one way of subtracting versus the other. Some people prefer to subtract the control mean from the treatment mean, some prefer to subtract the smaller mean from the larger mean, but these are arbitrary choices and there is no way that a computer could recognize all of the different conventions that people use.

But the result is effectively the same. To say that custom-built houses cost about $50,000 more on average is not really any different than saying that regular houses cost about $50,000 less.
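The equivalence between the general linear model and the t-test is easy to verify outside SPSS too. Here is a sketch using the statsmodels formula interface on synthetic data (column names and coding are my own; statsmodels picks "No" as the reference category because it sorts first, so the sign convention here happens to match the t-test rather than the SPSS GLM output):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(5)
df = pd.DataFrame({
    "price": np.concatenate([rng.normal(145, 48, 27),
                             rng.normal(95, 25, 90)]),
    "custom": ["Yes"] * 27 + ["No"] * 90,
})

# General linear model: regress price on the grouping factor.
fit = smf.ols("price ~ C(custom)", data=df).fit()
slope = fit.params["C(custom)[T.Yes]"]     # mean difference, Yes - No
slope_t = fit.tvalues["C(custom)[T.Yes]"]

# Classical pooled t-test on the same data.
yes = df.loc[df.custom == "Yes", "price"]
no = df.loc[df.custom == "No", "price"]
t_stat, p_val = stats.ttest_ind(yes, no, equal_var=True)

# The factor's slope IS the mean difference, and its t-statistic
# matches the pooled t-test exactly.
print(f"GLM slope = {slope:.1f}, GLM t = {slope_t:.2f}, t-test t = {t_stat:.2f}")
```

Once you see this equivalence, adding a covariate to the formula turns the same model into an analysis of covariance, which is exactly the risk adjustment mentioned above.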

Now the second set of output is not really that much less confusing than the first set, but the big advantage is that once you understand how to read the general linear model output for an independent samples t-test, you are that much closer to understanding the general linear model output for ANOVA, regression, analysis of covariance, and so forth. The general linear model also allows you to do things like risk adjustment.