P.Mean: A false sense of frugality (created 2008-12-17).

A while back I received a data set that was very well documented, but there was one thing I wish the data entry person had not done. The demographic data was listed as 45f, 52m, 22m, 21f, etc. This was obvious shorthand for a 45-year-old female, a 52-year-old male, and so forth.

When you squeeze both pieces of information into the same cell, you lose the ability to compute simple statistics. Most statistical software programs, for example, will not know to drop the last letter before computing the average age, or to ignore the first two digits when computing the percentages of males and females.

In Excel, I used the LEFT function to extract the leftmost two digits into a separate cell and the RIGHT function to extract the last character into a different new cell. I could have avoided this, though, if the data entry person had not tried to squeeze two pieces of data into the same cell.
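
If you would rather do the split outside of Excel, here is a minimal sketch of the same idea in Python. It is not what I actually did, just an illustration. The names raw, pattern, and split_entry are made up for this example, and the four values are the ones quoted above rather than the full data set. It uses a regular expression instead of a fixed two-digit cut, on the assumption that some ages might have one or three digits.

```python
import re
from statistics import mean

# Illustrative entries in the combined "age + sex" style described above.
raw = ["45f", "52m", "22m", "21f"]

# One or more digits (the age) followed by a single m or f (the sex).
# Assumption: entries never contain anything else besides stray spaces.
pattern = re.compile(r"^\s*(\d+)\s*([mf])\s*$", re.IGNORECASE)

def split_entry(entry):
    """Split a string like '45f' into (45, 'f'); reject anything unexpected."""
    match = pattern.match(entry)
    if match is None:
        raise ValueError(f"unrecognized entry: {entry!r}")
    return int(match.group(1)), match.group(2).lower()

# Two separate columns, which is how the data should have been entered.
ages, sexes = zip(*(split_entry(e) for e in raw))

mean_age = mean(ages)
pct_female = 100 * sexes.count("f") / len(sexes)

print("mean age:", mean_age)
print("percent female:", pct_female)
```

Raising an error on an unrecognized entry is a deliberate choice here. A cell that does not fit the pattern is better found during cleaning than silently dropped from the average.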

This work is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2010-04-01.