Can sex be an outcome variable? (created 2010-03-16)

Someone asked whether it was legitimate to use sex (gender) as a dependent variable or outcome variable in a logistic regression model. It seems wrong, on the face of it, to think that various factors can influence whether we are male or female. It actually is perfectly fine to use sex as an outcome variable. Here is how I would justify its use.

If you took a probability course back in college, they probably talked about balls and urns. An urn, by the way, is a fancy word for container. So if an urn had 10 black balls and 20 white balls, what is the probability that the first two balls drawn without replacement are white? Or suppose that you drew two white balls, what's the probability that the next ball drawn will also be white?
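Both urn questions can be worked out with a few lines of exact arithmetic. This is only a minimal sketch; the helper name `prob_all_white` is made up for illustration.

```python
from fractions import Fraction

def prob_all_white(white, black, draws):
    """Probability that `draws` balls drawn without replacement are all white."""
    p = Fraction(1)
    for i in range(draws):
        # After i white balls are removed, white - i remain among white + black - i.
        p *= Fraction(white - i, white + black - i)
    return p

# Urn with 20 white and 10 black balls: first two draws both white.
p_two_white = prob_all_white(20, 10, 2)        # (20/30) * (19/29)

# Given two white balls are already out, 18 white remain among 28 balls.
p_third_given_two = prob_all_white(18, 10, 1)

print(p_two_white, float(p_two_white))              # 38/87, about 0.437
print(p_third_given_two, float(p_third_given_two))  # 9/14, about 0.643
```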

In the urn model, drawing two white balls does not physically change the color of any of the remaining balls. The colors of the balls are immutable. But you can still talk about a probability distribution for the balls because they are being drawn randomly from the urn.

You can even reverse the time arrow in probability calculations using an urn model. Suppose an urn has an unknown number of black and white balls. Draw one ball and throw it away without looking at it. Now draw ten more balls and note their colors. The data from these ten balls can help you determine the probability that the first ball was white.
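To make the "reversed time arrow" concrete, here is a rough sketch using Bayes' rule. It needs assumptions that the example above leaves open: the urn is taken to hold 50 balls, every possible number of white balls (0 through 50) is treated as equally likely beforehand, and 7 of the 10 later balls are supposed to be white. Those numbers, and the function name, are hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_first_was_white(total, later_white, later_drawn):
    """P(unseen first ball was white | later_white of later_drawn later balls were white).

    Assumes a uniform prior on the number of white balls in an urn of
    `total` balls (an illustrative assumption); needs total >= later_drawn + 1.
    """
    later_black = later_drawn - later_white
    joint = {True: Fraction(0), False: Fraction(0)}
    for white in range(total + 1):
        black = total - white
        prior = Fraction(1, total + 1)
        for first_white in (True, False):
            p_first = Fraction(white if first_white else black, total)
            if p_first == 0:
                continue
            # Composition of the urn after the first ball is thrown away.
            w_left = white - (1 if first_white else 0)
            b_left = black - (0 if first_white else 1)
            # Hypergeometric probability of the colors seen in the later draws.
            p_later = Fraction(
                comb(w_left, later_white) * comb(b_left, later_black),
                comb(total - 1, later_drawn),
            )
            joint[first_white] += prior * p_first * p_later
    return joint[True] / (joint[True] + joint[False])

posterior = prob_first_was_white(total=50, later_white=7, later_drawn=10)
print(posterior)  # 2/3
```

With no later draws at all, the same function returns 1/2, so seeing seven white balls out of ten is what moves the probability for the discarded ball up to 2/3, even though its color was fixed before any of the ten were drawn.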

Think of a bunch of people in an urn and the same concept applies. The probability that a person drawn from the "does not stop and ask for directions when lost" urn is male is approximately 98%. Since sex has a probability distribution, it can be modeled using logistic regression.
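As a small illustration, here is a sketch of a logistic regression with sex as the outcome and a single yes/no predictor. With one binary predictor and no empty cells, the maximum likelihood fit can be written in closed form from the 2x2 table, so no fitting library is needed. The counts are invented purely for illustration (chosen so that the "does not ask" group matches the 98% figure), and the variable names are hypothetical.

```python
import math

# Hypothetical counts, invented for illustration: (male, female) in each group.
counts = {
    "asks": (20, 30),
    "does_not_ask": (49, 1),
}

def log_odds(male, female):
    """Log odds of being male within one group."""
    return math.log(male / female)

# Model: logit P(male) = b0 + b1 * x, where x = 1 for "does not ask", 0 for "asks".
b0 = log_odds(*counts["asks"])                 # intercept: log odds in the x = 0 group
b1 = log_odds(*counts["does_not_ask"]) - b0    # slope: log odds ratio between groups

def prob_male(x):
    """Fitted probability of being male for predictor value x (0 or 1)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

print(round(prob_male(0), 3), round(prob_male(1), 3))  # 0.4 0.98
```

The fitted probabilities simply reproduce the proportion of males in each urn. Nothing in the calculation says that failing to ask for directions changes anyone's sex.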

Now if you think in terms of independent variables being "causes" and dependent variables being "effects" then using sex as an outcome variable doesn't make much sense. After all, that time in 1987 when I was lost and asked for directions didn't suddenly turn me into a female. But we know that just because an independent variable is a good predictor of a dependent variable doesn't mean that the independent variable "caused" the dependent variable.

So go ahead and use sex (or any other immutable variable like race or ethnicity) as a dependent variable if you like. It doesn't lead to logical contradictions as long as you don't cling to the concept of independent variables as causes.