What is a point biserial correlation?
The point biserial correlation is a measure of association between a continuous variable and a binary variable. It is constrained to be between -1 and +1.
Calculation of the point biserial correlation
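Assume that X is a continuous variable and Y is binary, with values 0 and 1. Compute the point biserial correlation using the formula

$$ r_{pb} = \frac{\bar{X}_1 - \bar{X}_0}{s_X} \sqrt{\frac{n_1 n_0}{n^2}} $$

where $\bar{X}_1$ and $\bar{X}_0$ are the means of X among the observations with Y=1 and Y=0, $n_1$ and $n_0$ are the numbers of observations with Y=1 and Y=0, $n = n_1 + n_0$, and $s_X$ is the standard deviation of X over all n observations, computed with divisor n.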
This is mathematically equivalent to computing the traditional (Pearson) correlation between X and Y with Y coded as 0 and 1, and it is interpreted the same way. The point biserial correlation is positive when large values of X are associated with Y=1 and small values of X are associated with Y=0. It is negative when the reverse holds.
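As a quick check of that equivalence, here is a minimal Python sketch. It uses the version of the formula r = (mean of X when Y=1 minus mean of X when Y=0) / s * sqrt(n1*n0/n^2), with s computed using divisor n (`statistics.pstdev`). The function names `point_biserial` and `pearson` and the data values are hypothetical, made up for illustration only.

```python
import math
from statistics import mean, pstdev


def point_biserial(x, y):
    """Point biserial correlation between continuous x and binary y (0/1)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if any(v not in (0, 1) for v in y):
        raise ValueError("y must be coded 0 and 1")
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]  # X values where Y = 1
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]  # X values where Y = 0
    n1, n0 = len(x1), len(x0)
    n = n1 + n0
    s = pstdev(x)  # standard deviation of all X values, divisor n
    return (mean(x1) - mean(x0)) / s * math.sqrt(n1 * n0 / n**2)


def pearson(x, y):
    """Ordinary Pearson correlation, used here only as a cross-check."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustration data: X is continuous, Y is coded 0/1.
x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.8, 2.5, 4.9]
y = [0, 0, 0, 1, 1, 1, 0, 1]

r_pb = point_biserial(x, y)
r = pearson(x, y)
print(f"point biserial: {r_pb:.4f}  Pearson: {r:.4f}")  # the two agree
```

The Pearson version never looks at the groups separately, yet it gives the same number, because with Y coded 0/1 the covariance of X and Y reduces to the difference in group means times the group proportions.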
Examples
The first example uses three variables from a postural sway data set. FB represents postural sway in the forward-backward direction and is continuous. SS represents postural sway in the side-side direction and is also continuous. AGE_GRP represents the age group (0=Young, 1=Elderly) and is binary.
FB and SS show a strong positive correlation with each other and a moderate positive correlation with age group.
Postural sway correlations.
Comparison of the point biserial correlation to boxplots
This is a boxplot of FB sway for each age group.
This is a boxplot of SS sway for each age group. In both this and the previous boxplot, the elderly age group tends to have higher sway scores than the young group. Even so, there is still a large amount of overlap between the groups, which is why the point biserial correlations are only moderately positive.
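The link between overlap and the size of the correlation can be seen with two small, made-up data sets (hypothetical values, not the postural sway data). In both sets the group means are 10 for Y=0 and 12 for Y=1; only the spread within each group differs. This sketch reuses the divisor-n form of the formula, and the function name `point_biserial` is hypothetical.

```python
import math
from statistics import mean, pstdev


def point_biserial(x, y):
    """Point biserial correlation: x continuous, y coded 0/1 (divisor-n SD)."""
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    n1, n0 = len(x1), len(x0)
    n = n1 + n0
    return (mean(x1) - mean(x0)) / pstdev(x) * math.sqrt(n1 * n0 / n**2)


# Hypothetical scores; group means are 10 (Y=0) and 12 (Y=1) in both sets.
y = [0, 0, 0, 1, 1, 1]
little_overlap = [9.5, 10.0, 10.5, 11.5, 12.0, 12.5]  # groups do not overlap
much_overlap = [7.0, 10.0, 13.0, 9.0, 12.0, 15.0]     # groups overlap heavily

r_little = point_biserial(little_overlap, y)  # about 0.93
r_much = point_biserial(much_overlap, y)      # about 0.38
print(f"little overlap: {r_little:.2f}, much overlap: {r_much:.2f}")
```

The same two-point difference in means produces a much smaller correlation once the within-group spread makes the groups overlap.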
The next few examples show correlations using data from a breastfeeding study I was involved with.
In this study, the point biserial correlation between exclusive breastfeeding at discharge and distance from the hospital is -0.06.
There is little or no association between distance and breastfeeding. Exclusive breastfeeders lived at a wide range of distances from the hospital, and so did non-exclusive breastfeeders.
The point biserial correlation between exclusive breastfeeding at discharge and mother’s age is 0.37.
Exclusive breastfeeders were more likely to have older mothers, and non-exclusive breastfeeders were more likely to have younger mothers. A large overlap between the two groups still remains, as indicated by the only moderately positive correlation.
The point biserial correlation between exclusive breastfeeding at discharge and age at discharge is -0.27.
Exclusive breastfeeders were more likely to have shorter hospital stays (younger ages at discharge), and non-exclusive breastfeeders were more likely to have longer stays.
Again, the two groups still overlap considerably, which is why the correlation is only weakly negative.
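The sign of a point biserial correlation depends on which group is coded 1. A minimal sketch with made-up ages at discharge (hypothetical values, not the study data) shows that reversing the 0/1 coding flips the sign and leaves the size unchanged. The function name `point_biserial` and the variable names are hypothetical.

```python
import math
from statistics import mean, pstdev


def point_biserial(x, y):
    """Point biserial correlation: x continuous, y coded 0/1 (divisor-n SD)."""
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    n1, n0 = len(x1), len(x0)
    n = n1 + n0
    return (mean(x1) - mean(x0)) / pstdev(x) * math.sqrt(n1 * n0 / n**2)


# Hypothetical ages at discharge in days (made up, not the study data).
age_at_discharge = [2, 3, 2, 4, 6, 5, 7, 3]
exclusive = [1, 1, 1, 1, 0, 0, 0, 0]        # 1 = exclusive breastfeeding
non_exclusive = [1 - v for v in exclusive]  # same data, coding reversed

r = point_biserial(age_at_discharge, exclusive)               # negative
r_reversed = point_biserial(age_at_discharge, non_exclusive)  # same size, positive
print(f"{r:.3f} {r_reversed:.3f}")
```

Because recoding only changes the sign, the magnitude is what measures the strength of the association; the sign just records which group tends to have the larger values of X.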
This work is licensed under a Creative Commons Attribution 3.0 United States License. It was written by Steve Simon on 2005-08-18, edited by Steve Simon, and was last modified on 2010-04-01.