P.Mean: Ordinal surprisals (created 2010-03-20).

Closely related to the concept of ordinal entropy is the ordinal surprisal. The surprisal is the negative log base 2 of a probability, and if you multiply each probability by its surprisal and add up the products, you get the entropy. Can you define an ordinal surprisal in such a way that when you multiply the ordinal surprisals by the probabilities and add up the products, you get the ordinal entropy?

First, let's review the concept of surprisal. If an event has probability p, then the surprisal of the event is -log2(p). An event with probability 1/32 has a surprisal of 5 and is as surprising as getting five consecutive heads when flipping a coin. The surprisal does not change on the basis of the probabilities of other events. Suppose, for example, that there are three events with probabilities p1, p2, and p3. The surprisal associated with the first event is -log2(p1) and does not change with differing values of p2 and p3.
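As a quick check of the arithmetic above, here is a minimal Python sketch. The function name surprisal is just an illustrative choice, not something from the original page.

```python
import math

def surprisal(p):
    """Surprisal, in bits, of an event with probability p (0 < p <= 1)."""
    return -math.log2(p)

# An event with probability 1/32 is as surprising as five heads in a row.
print(surprisal(1 / 32))
print(surprisal(0.5 ** 5))

# The surprisal of the first event depends only on p1,
# not on how the remaining probability is split between p2 and p3.
for p1, p2, p3 in [(0.25, 0.75, 0.0), (0.25, 0.0, 0.75), (0.25, 0.5, 0.25)]:
    print(p1, p2, p3, surprisal(p1))
```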

The image above shows regions where the surprisal for p1 is less than 1 (red), between 1 and 2 (orange), between 2 and 3 (yellow), etc. A surprisal of 2 occurs on the border between the orange and yellow regions. This corresponds to p1=0.25, regardless of the values of p2 and p3.

Similarly, the surprisal for p2 equals 2 when p2=0.25, regardless of the values of p1 and p3.

And, of course, the surprisal for p3 equals 2 when p3=0.25, regardless of the values of p1 and p2.

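Now how would you define the surprisal for ordinal entropy? The ordinal entropy is built from the cumulative probabilities and their complements. For k ordered categories, let Pi = p1 + ... + pi be the cumulative probability through category i. For each i from 1 to k-1, calculate

ei = -Pi log2(Pi) - (1-Pi) log2(1-Pi)

and add up all the e's.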
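Here is a rough sketch of that calculation in Python. It assumes that each e is the two-outcome entropy, -P log2(P) - (1-P) log2(1-P), of a cumulative probability P and its complement, and that 0 times log2(0) is treated as 0. The function names are illustrative.

```python
import math
from itertools import accumulate

def binary_term(c):
    """Assumed form of one e: -c*log2(c) - (1-c)*log2(1-c), with 0*log2(0) = 0."""
    return sum(-x * math.log2(x) for x in (c, 1 - c) if x > 0)

def ordinal_entropy(probs):
    """Add up the e's over the cumulative probabilities P1 .. P(k-1)."""
    cumulative = list(accumulate(probs))[:-1]  # drop the last one, which is 1
    return sum(binary_term(c) for c in cumulative)

# Same probabilities, different ordering: splitting between the two extremes
# gives more ordinal entropy than splitting between two adjacent categories.
print(ordinal_entropy([0.5, 0.0, 0.5]))
print(ordinal_entropy([0.5, 0.5, 0.0]))
```

Under these assumptions, the first distribution gives an ordinal entropy of 2 and the second gives 1, even though the ordinary entropy is 1 in both cases.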

The ordinal surprisal is simply the sum of the surprisals associated with the respective cumulative probabilities. For example, the ordinal surprisal for p3, the probability of the third of five ordinal categories, is the sum of the surprisals associated with p2+p3+p4+p5, p3+p4+p5, p1+p2+p3, and p1+p2+p3+p4. This list seems confusing, but it helps to review a display of all the cumulative probabilities and their complements.

p1 versus p2+p3+p4+p5
p1+p2 versus p3+p4+p5
p1+p2+p3 versus p4+p5
p1+p2+p3+p4 versus p5

In each row, take the surprisal of whichever of the two, the cumulative probability or its complement, includes p3.
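The rule in the paragraph above can be sketched in Python as follows. The function name ordinal_surprisals and the use of zero-based category indices are choices made for this illustration only.

```python
import math
from itertools import accumulate

def ordinal_surprisals(probs):
    """Ordinal surprisal for every category of an ordered distribution."""
    cumulative = list(accumulate(probs))[:-1]  # P1 .. P(k-1)
    result = []
    for j in range(len(probs)):
        total = 0.0
        for i, c in enumerate(cumulative):
            # Row i compares categories 0..i with categories i+1..k-1.
            # Keep whichever side contains category j.
            side = c if j <= i else 1 - c
            total += -math.log2(side) if side > 0 else math.inf
        result.append(total)
    return result

# Five equally likely categories. For the third category (index 2) the rows
# contribute the surprisals of 0.8, 0.6, 0.6, and 0.8.
values = ordinal_surprisals([0.2] * 5)
print(values[2])
```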

If you define the surprisals this way, it is not difficult to show that the sum of the products of the probabilities and the ordinal surprisals will equal the ordinal entropy.
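A numerical spot check of this claim, written as a self-contained sketch. It assumes the ordinal entropy is the sum of -P log2(P) - (1-P) log2(1-P) over the cumulative probabilities P, and the helper names are made up for this example. It is a check on a few distributions, not a proof.

```python
import math
from itertools import accumulate

def neg_log2(x):
    """Surprisal of x, with an infinite surprisal at zero."""
    return -math.log2(x) if x > 0 else math.inf

def check_identity(probs):
    """Return (ordinal entropy, probability-weighted sum of ordinal surprisals)."""
    cumulative = list(accumulate(probs))[:-1]
    entropy = sum(x * neg_log2(x)
                  for c in cumulative for x in (c, 1 - c) if x > 0)
    weighted = 0.0
    for j, p in enumerate(probs):
        if p == 0:
            continue  # a zero-probability category contributes nothing
        s = sum(neg_log2(c if j <= i else 1 - c)
                for i, c in enumerate(cumulative))
        weighted += p * s
    return entropy, weighted

for probs in ([0.1, 0.2, 0.3, 0.4], [0.5, 0.0, 0.5], [0.2] * 5):
    entropy, weighted = check_identity(probs)
    print(probs, entropy, weighted, math.isclose(entropy, weighted))
```

The two numbers agree because each row's surprisal for the cumulative probability is counted once for every category on that side of the split, so it ends up weighted by the cumulative probability itself, and likewise for the complement.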

This graph shows the ordinal surprisal values for p1. Notice that a surprisal of 2 corresponds to anything from a 25% probability to a 50% probability, depending on how the remainder of the probability is distributed across the other two categories. A probability of 25% yields a surprisal of 2 if the remainder of the probability is concentrated at the adjacent category. But when the remaining probability is concentrated at the other extreme, the surprisal of 2 corresponds to a much larger probability, 50%.
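To trace that range numerically, here is a small sketch for three categories. It uses the fact that the ordinal surprisal of the first category is -log2(p1) - log2(p1+p2), so a surprisal of 2 requires p1 times (1-p3) to equal 1/4. The function name is illustrative.

```python
import math

def ordinal_surprisal_first(p1, p2, p3):
    """Ordinal surprisal of the first of three ordered categories."""
    return -math.log2(p1) - math.log2(p1 + p2)

# Walk along the surprisal-2 contour by fixing p3 and solving for p1.
for p3 in (0.0, 0.25, 0.5):
    p1 = 0.25 / (1 - p3)
    p2 = 1 - p1 - p3
    print(p3, p1, p2, ordinal_surprisal_first(p1, p2, p3))
```

With p3 = 0 the contour sits at p1 = 0.25, and with p3 = 0.5 (so p2 = 0) it sits at p1 = 0.5, matching the two ends of the range described above.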

The graph above shows the ordinal surprisals for p2. This pattern is very surprising. Any probability from 25% down to 0% could represent a surprisal of 2, depending on how the remaining probability is distributed between the two extreme categories. The point at the middle bottom of the graph seems a bit odd. The ordinary surprisal for a probability of zero is infinite, but in the world of ordinal data, a middle category can't be seen as too surprising, no matter how small its probability, when the remaining probability is split evenly between the two outside categories.
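The two ends of that range can be checked directly. This sketch assumes three categories, where the ordinal surprisal of the middle category is -log2(p1+p2) - log2(p2+p3), and the function name is made up for the example.

```python
import math

def ordinal_surprisal_middle(p1, p2, p3):
    """Ordinal surprisal of the middle of three ordered categories."""
    lower = p1 + p2  # cumulative probability that includes p2
    upper = p2 + p3  # complement of p1, which also includes p2
    return -math.log2(lower) - math.log2(upper)

# p2 = 0 with the extremes split evenly: the ordinal surprisal is finite.
print(ordinal_surprisal_middle(0.5, 0.0, 0.5))    # 2.0
# p2 = 0.25 with all remaining probability at one extreme.
print(ordinal_surprisal_middle(0.0, 0.25, 0.75))  # 2.0
```

By contrast, the ordinary surprisal -log2(p2) has no finite value at p2 = 0.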

Here are the ordinal surprisals for p3, which are a mirror image of the ordinal surprisals for p1.

The relationship between ordinal surprisals and ordinal entropy will help identify particular levels of an ordinal variable that do not "fit" well with the bulk of the data. By highlighting these values, you can discover interesting features of your data. I hope to show some examples of this in future data analyses.

This work is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2017-06-15.