P.Mean: Bad scaling choices for the SPSS ROC curve (created 2012-04-09).
I was helping a colleague with an ROC curve in SPSS, and when he drew the curve, I couldn't believe what I saw.
I've added some annotations in purple. Notice how the vertical axis (Sensitivity) has values of 0 and 1 at the lower left and upper left corners. This is good.
But the horizontal axis is all messed up. The corners are offset by about 20% in either direction from the minimum and maximum values. This is a horrible default, for two reasons.
First, it treats the two axes differently. An ROC curve should be square, since the two quantities being graphed are on the same scale and have (for most examples) the same minimum and maximum values.
Second, the scaling on the horizontal axis is EXTREMELY wasteful of space. There are times when you want a bit of extra room below the minimum and above the maximum values, because it keeps plotting symbols from being clipped or from intersecting an axis. But 20%??? Who in heaven's name decided that a probability scale needs room all the way down to -0.2 and all the way up to 1.2?
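Here is a small Python sketch of the arithmetic behind this complaint. It assumes the margin is a fixed fraction of the data range, added at both ends. That assumption matches the -0.2 to 1.2 limits described above, but it is not taken from SPSS documentation. The function names padded_limits and data_share are hypothetical.

```python
def padded_limits(lo, hi, pad_fraction):
    """Axis limits that add pad_fraction of the data range below the
    minimum and above the maximum (an assumed model of the margin)."""
    span = hi - lo
    return lo - pad_fraction * span, hi + pad_fraction * span


def data_share(pad_fraction):
    """Fraction of the axis length actually covered by the data when the
    same padding is added at both ends."""
    return 1.0 / (1.0 + 2.0 * pad_fraction)


# A probability scale runs from 0 to 1.
for label, pad in [("ROC horizontal axis", 0.20),
                   ("Kaplan-Meier axes", 0.05),
                   ("no margin", 0.0)]:
    lo, hi = padded_limits(0.0, 1.0, pad)
    print(f"{label}: {lo:.2f} to {hi:.2f}, "
          f"data fills {data_share(pad):.0%} of the axis")
```

With a 20% margin at each end, the data get only 1/1.4, or about 71%, of the horizontal axis. A 5% margin still leaves about 91%.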
What's worse, you can't change the scaling. Double-click on the graph to call up the graph editor, and then right-click on the horizontal axis to get the Properties dialog box.
SPSS somehow thinks that the two margins are already at 0%. If you try to change these numbers, nothing happens. This is a wretched state of affairs.
Compare this to the Kaplan-Meier curve.
In this graph, SPSS places a 5% margin at each end of both the vertical and horizontal axes. That's reasonable, and you can remove these margins if you like.
The boxplot in SPSS is a little weird.
There is no margin at the bottom of the graph, but a generous margin (10%) at the top. That's not my first choice for margins, but it's not bad and it is easy to change.
The scaling in the ROC curve module is clearly a bug. It appears on two different computers, so I suspect it is a general bug rather than one specific to a single computer.
Epilog: I tried dumping the data values (sensitivity, 1-specificity) to Excel and then reading them back into SPSS. It turns out that this doesn't work either, because SPSS can't do a decent job of interpolating.
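Why would an ROC curve have several sensitivity values at a single 1-specificity value? The minimal Python sketch below computes empirical ROC points and shows how this happens. The scores and labels are hypothetical and are not the data from this article. The sketch assumes that a higher score means "more likely diseased" and that a case is called positive when its score is at or above the cutoff.

```python
def roc_points(scores, labels):
    """Empirical ROC points (1 - specificity, sensitivity), lowering the
    cutoff one distinct score at a time. labels: 1 = diseased, 0 = healthy."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]  # cutoff above every score: nobody called positive
    for cutoff in sorted(set(scores), reverse=True):
        # Every case scoring exactly at this cutoff now becomes positive.
        for score, label in zip(scores, labels):
            if score == cutoff:
                if label == 1:
                    tp += 1
                else:
                    fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical data: 5 diseased and 3 healthy patients.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1, 1, 0, 1, 1, 0, 0, 1]
example_points = roc_points(scores, labels)
for x, y in example_points:
    print(f"1-specificity = {x:.3f}, sensitivity = {y:.3f}")
```

Each diseased case moves the curve straight up, leaving 1-specificity unchanged. A run of diseased cases therefore produces a vertical segment: several points share one X value and have different Y values. In this toy example, three points sit at X = 0 and three at X = 1/3.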
Here are the data points.
I wanted to draw a straight line interpolation connecting all these data points, but SPSS cannot do this. When SPSS encounters multiple Y values for a single X value, it forces any interpolation line through the average of those Y values.
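To see why forcing the line through the average does so much damage, here is a rough Python imitation of that behavior. It is not SPSS's actual code. It groups points by X and keeps one point at the mean Y. The point list is hypothetical and is ordered along the ROC path.

```python
from collections import defaultdict


def average_ties(points):
    """Replace all points that share an x value with a single point at the
    mean of their y values, sorted by x."""
    groups = defaultdict(list)
    for x, y in points:
        groups[x].append(y)
    return [(x, sum(ys) / len(ys)) for x, ys in sorted(groups.items())]


# Hypothetical ROC points (1 - specificity, sensitivity), in path order.
points = [(0.0, 0.0), (0.0, 0.25), (0.0, 0.5),
          (0.5, 0.5), (0.5, 0.75), (0.5, 1.0), (1.0, 1.0)]

print(average_ties(points))  # [(0.0, 0.25), (0.5, 0.75), (1.0, 1.0)]
```

The averaged line no longer starts at (0, 0), and it cuts diagonally through both vertical segments. Drawing the points in the order listed, without sorting and averaging by X, keeps the vertical segments intact.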
This is a linear interpolation. It looks almost like a curve near the point (0.5, 0.7), but if you look closely, there are actually four very short line segments.
This is a stair-step interpolation. There are actually three variations on the stair-step interpolation, but none of them work. You could fix this by removing all but the maximum Y value at each X value, but at this point I was so disgusted with SPSS that I switched to R. It turns out that R had its own issues, but I eventually got it to work.
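The fix mentioned above can be sketched in Python. First, keep only the maximum Y value at each X value. Then draw a stair step that moves horizontally and then vertically. This is an illustration under stated assumptions, not the SPSS or R code the author used. The helper names keep_max_per_x and stair_step are hypothetical. The sketch also assumes that the ROC path only ever moves up or to the right. Tied scores between diseased and healthy cases create diagonal segments, which a stair step cannot reproduce.

```python
def keep_max_per_x(points):
    """For each distinct x value, keep only the largest y value."""
    best = {}
    for x, y in points:
        if x not in best or y > best[x]:
            best[x] = y
    return sorted(best.items())


def stair_step(points):
    """Vertices of a step path: move horizontally at the current height to
    the next x value, then vertically to the next y value. Repeated
    vertices are skipped."""
    path = [points[0]]
    for (_, y0), (x1, y1) in zip(points, points[1:]):
        for vertex in ((x1, y0), (x1, y1)):
            if vertex != path[-1]:
                path.append(vertex)
    return path


# Hypothetical ROC points (1 - specificity, sensitivity), in path order.
points = [(0.0, 0.0), (0.0, 0.25), (0.0, 0.5),
          (0.5, 0.5), (0.5, 0.75), (0.5, 1.0), (1.0, 1.0)]

trimmed = keep_max_per_x(points)
# Keeping only the maximum drops the (0, 0) corner, so add it back first.
steps = stair_step([(0.0, 0.0)] + trimmed)
print(trimmed)  # [(0.0, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(steps)    # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

The resulting vertices are exactly the corners of the original up-and-right path.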
I don't want to pick on SPSS too much. Every program has its idiosyncrasies, and there are reasons why a program like SPSS might prefer to interpolate to the average value when there are multiple Y values for a single X value. When I get over my anger, I'll show you how to fix this in SPSS, and maybe I'll share the R code as well.
This page was written by Steve Simon and is licensed under the Creative Commons Attribution 3.0 United States License.