
In a previous webpage, I discussed the concepts of joint entropy, conditional entropy, and information. The information shared by two measurements is zero if the two measurements are statistically independent, and it increases as the degree of dependence (either positive or negative) increases. I thought it would be helpful to visualize this relationship graphically.

Consider a two by two table such as the one shown below.

| | x=0 | x=1 | Tot |
|---|---|---|---|
| y=0 | 0.1 | 0.3 | 0.4 |
| y=1 | 0.4 | 0.2 | 0.6 |
| Tot | 0.5 | 0.5 | 1.0 |

You can compute the entropy of the four cell probabilities.

$$e(X,Y) = -0.1\log_2(0.1) - 0.3\log_2(0.3) - 0.4\log_2(0.4) - 0.2\log_2(0.2) = 1.84$$

This is the joint entropy of the row and column variables combined. You can also compute the marginal entropy of the row totals

$$e(Y) = -0.4\log_2(0.4) - 0.6\log_2(0.6) = 0.97$$

and the column totals

$$e(X) = -0.5\log_2(0.5) - 0.5\log_2(0.5) = 1.00$$

The difference between the sum of the two marginal entropies and the joint entropy is defined as the information shared between the two variables.

$$i(X,Y) = e(X) + e(Y) - e(X,Y) = 1.00 + 0.97 - 1.84 = 0.13$$

Consider a different 2 by 2 table, where the rows and columns are statistically independent.

| | x=0 | x=1 | Tot |
|---|---|---|---|
| y=0 | 0.2 | 0.2 | 0.4 |
| y=1 | 0.3 | 0.3 | 0.6 |
| Tot | 0.5 | 0.5 | 1.0 |

For this table, the joint entropy is 1.97 and the marginal entropies are 1.00 and 0.97. So in this example, the information would be

$$i(X,Y) = 1.00 + 0.97 - 1.97 = 0$$

For the original example, you can also compute information by comparing the entropies of the conditional probabilities to the entropy of the unconditional probabilities.

| | x=0 | x=1 | Tot |
|---|---|---|---|
| y=0 | 0.25 | 0.75 | 1.00 |
| y=1 | 0.67 | 0.33 | 1.00 |
| Tot | 0.50 | 0.50 | 1.00 |
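The conditional probabilities in this table come from dividing each row of the original table by its row total. Here is a minimal Python sketch of that step, assuming the table is stored as a list of rows; the name `conditional_rows` is just an illustrative label, not a library function.

```python
# Joint probabilities from the original table.
# Rows are y=0 and y=1; columns are x=0 and x=1.
joint = [[0.1, 0.3],
         [0.4, 0.2]]

def conditional_rows(table):
    """Divide each row by its total, giving p(X | Y=y) for each row y."""
    result = []
    for row in table:
        total = sum(row)
        result.append([cell / total for cell in row])
    return result

cond = conditional_rows(joint)
for y, row in enumerate(cond):
    print(f"y={y}:", [round(p, 2) for p in row])
# y=0: [0.25, 0.75]
# y=1: [0.67, 0.33]
```

Each row of `cond` now sums to 1, which is why the Tot column in the table above reads 1.00.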

$$e(X \mid Y=0) = -0.25\log_2(0.25) - 0.75\log_2(0.75) = 0.81$$

$$e(X \mid Y=1) = -0.67\log_2(0.67) - 0.33\log_2(0.33) = 0.91$$

You need a bit of caution here: you shouldn't just compute a straight average of these two values. The second row has a higher probability and so should be given slightly greater weight.

$$\begin{aligned}
e(X \mid Y) &= p(Y=0)\,e(X \mid Y=0) + p(Y=1)\,e(X \mid Y=1) \\
&= 0.4 \times 0.81 + 0.6 \times 0.91 \\
&= 0.87
\end{aligned}$$
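To see how much the weighting matters, the following Python sketch computes both the weighted conditional entropy and the straight average for the original table. The helper name `entropy` and the variable names are illustrative assumptions. Because the code works from unrounded probabilities, its weighted value (about 0.875) differs slightly from the 0.87 obtained above from two-decimal inputs.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Joint probabilities: rows are y=0 and y=1; columns are x=0 and x=1.
joint = [[0.1, 0.3],
         [0.4, 0.2]]

row_totals = [sum(row) for row in joint]            # p(Y=0), p(Y=1)
row_entropies = [entropy([c / t for c in row])      # e(X|Y=0), e(X|Y=1)
                 for row, t in zip(joint, row_totals)]

# Weight each row's entropy by the probability of that row.
weighted = sum(t * e for t, e in zip(row_totals, row_entropies))
# The straight average ignores the 40/60 split between the rows.
unweighted = sum(row_entropies) / len(row_entropies)

print(round(weighted, 3), round(unweighted, 3))
```

The two values differ only slightly here because the row probabilities (0.4 and 0.6) are close to equal; with a more lopsided split the straight average would be further off.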

$$i(X,Y) = e(X) - e(X \mid Y) = 1.00 - 0.87 = 0.13$$

Now 0.13 seems like a small value for information, but it actually isn't. There is quite a bit of shared information between X and Y. If you don't know Y, X is split evenly. Knowing Y, the split changes to 3 to 1 in favor of X=1 (when Y=0) or to 2 to 1 in the opposite direction (when Y=1). A mosaic plot can help illustrate this.
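As a numerical cross-check, here is a minimal Python sketch that computes the information both ways for the original table and for the independent table. The function names `information` and `information_via_conditional` are illustrative labels, not part of any library. At full precision both routes give about 0.1245 for the original table; the 0.13 above comes from carrying two-decimal intermediate values.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

def information(table):
    """Sum of the marginal entropies minus the joint entropy."""
    cells = [c for row in table for c in row]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return entropy(row_totals) + entropy(col_totals) - entropy(cells)

def information_via_conditional(table):
    """e(X) minus the probability-weighted conditional entropy e(X|Y)."""
    col_totals = [sum(col) for col in zip(*table)]
    cond = 0.0
    for row in table:
        t = sum(row)
        if t > 0:  # skip empty rows to avoid dividing by zero
            cond += t * entropy([c / t for c in row])
    return entropy(col_totals) - cond

dependent = [[0.1, 0.3], [0.4, 0.2]]     # original table
independent = [[0.2, 0.2], [0.3, 0.3]]   # rows and columns independent

print(f"{information(dependent):.4f}")                  # about 0.1245
print(f"{information_via_conditional(dependent):.4f}")  # same value
print(f"{abs(information(independent)):.4f}")           # zero, up to rounding error
```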

A horizontal line is drawn slightly above the middle to reflect the 40/60 split between the first and second rows of the table. This splits the square into two regions. The first region is split vertically at the conditional probability for the first row (0.25). The second region is split at the conditional probability for the second row (0.67).

For a table with independence (such as the second table above), the lines for the two conditional probabilities are aligned, because both rows are split at the same value (0.5).
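The positions of the lines in a mosaic plot can be computed directly from the table. The sketch below only calculates where the splits fall and does not draw anything; the function name `mosaic_splits` and its return layout are assumptions made for illustration.

```python
def mosaic_splits(table):
    """Return (horizontal, vertical) split positions for a 2x2 table.

    horizontal: p(Y=0), the share of the square given to the first row.
    vertical:   p(X=0 | Y=y) for each row y, where that row's region is split.
    """
    row_totals = [sum(row) for row in table]
    horizontal = row_totals[0] / sum(row_totals)
    vertical = [row[0] / t for row, t in zip(table, row_totals)]
    return horizontal, vertical

dependent = [[0.1, 0.3], [0.4, 0.2]]     # original table
independent = [[0.2, 0.2], [0.3, 0.3]]   # rows and columns independent

for name, table in [("dependent", dependent), ("independent", independent)]:
    horizontal, vertical = mosaic_splits(table)
    print(name, round(horizontal, 2), [round(v, 2) for v in vertical])
```

For the original table the two vertical splits fall at 0.25 and 0.67, while for the independent table both fall at 0.5, which is the alignment described above.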