** 40+ years. How would you adjust for differential age distributions?**

The key calculation is the sampling probability. Let n_{ij} represent the
number of patients sampled in community i and age stratum j. Let N_{ij}
represent the total number of patients in the population in community i and
age stratum j. The probability of sampling, p_{ij}, is n_{ij}/N_{ij}. The
inverse of this probability, 1/p_{ij} = N_{ij}/n_{ij}, is a useful quantity.
It tells you how many people in the population are represented by a single
individual in the sample. So if the sample size is 100 and there are 2 million
people in the population, each person in the sample represents 20,000 people
in the population.
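
As a minimal sketch of this calculation, the Python snippet below computes 1/p_{ij} = N_{ij}/n_{ij} for each stratum. The stratum counts and the function name `sampling_weights` are hypothetical, chosen only for illustration; only the 100 out of 2 million figures come from the text.

```python
# Hypothetical counts keyed by (community i, age stratum j); not taken from the text.
sampled = {(1, "0-14"): 40, (1, "15-39"): 35, (1, "40+"): 25}
population = {(1, "0-14"): 5000, (1, "15-39"): 3000, (1, "40+"): 2000}


def sampling_weights(n, N):
    """Return the inverse sampling probability N_ij / n_ij for every stratum."""
    weights = {}
    for key, n_ij in n.items():
        if n_ij <= 0:
            # A stratum with nobody sampled has no defined weight.
            raise ValueError(f"no sampled patients in stratum {key}")
        weights[key] = N[key] / n_ij
    return weights


weights = sampling_weights(sampled, population)
for key, w in weights.items():
    print(key, round(w, 1))

# The example from the text: 100 sampled out of 2 million.
overall = sampling_weights({"all": 100}, {"all": 2_000_000})
print(overall["all"])  # each sampled person represents 20,000 people
```

Each weight is simply the number of people in the population that one sampled patient stands in for.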

If you weight the data by the inverse of p_{ij}, you give greater weight to
those strata where you undersample, because each person sampled from such a
stratum represents a larger number of individuals in the population.
Similarly, you give less weight to those strata where you oversample.
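
To illustrate the effect of the weights, here is a small Python sketch that compares an unweighted prevalence with one weighted by N_{ij}/n_{ij}. All of the numbers (sample sizes, population sizes, and case counts) are made up for illustration, and `weighted_prevalence` is a hypothetical helper, not a function from any package.

```python
# Hypothetical strata for one community:
# stratum -> (n sampled, N in population, cases observed in the sample)
strata = {
    "0-14": (40, 5000, 4),
    "15-39": (35, 3000, 7),
    "40+": (25, 2000, 10),
}


def weighted_prevalence(strata):
    """Prevalence estimate where each sampled patient counts N/n times."""
    numerator = 0.0
    denominator = 0.0
    for n, N, cases in strata.values():
        w = N / n                 # inverse sampling probability for this stratum
        numerator += w * cases    # weighted number of cases
        denominator += w * n      # weighted number of patients (equals N)
    return numerator / denominator


total_cases = sum(cases for _, _, cases in strata.values())
total_sampled = sum(n for n, _, _ in strata.values())
unweighted = total_cases / total_sampled
weighted = weighted_prevalence(strata)

print(f"unweighted: {unweighted:.3f}")
print(f"weighted:   {weighted:.3f}")
```

In this made-up data the oldest stratum is oversampled and has the highest prevalence, so the unweighted estimate (0.21) is higher than the weighted one (about 0.19).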

Suppose you don't know the total number of patients in the population, but
you do know the relative proportions in each community. Say that in community
1, the age group 0-14 years constituted 40% of your sample, but you know that
in the population for community 1, the age group 0-14 years makes up 50% of
the community. Now let p_{ij} be the proportion of sampled patients in
community i who belong to age stratum j, relative to the total number sampled
in community i across all strata. Let P_{ij} be the proportion of the
population in community i who belong to stratum j. If you weight the data by
P_{ij}/p_{ij}, you will give greater weight to those patients who are
undersampled (P_{ij} > p_{ij}) and lesser weight to those patients who are
oversampled (P_{ij} < p_{ij}). You will give weight 1 to those patients who
are sampled in proportion to the population (P_{ij} = p_{ij}). In the above
example, assign a weight of 0.5/0.4 = 1.25 to the age group 0-14 years.
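
The proportion-based weights can be sketched the same way in Python. Only the 0-14 figures (40% of the sample, 50% of the population) come from the example above; the other two strata and the function name `proportion_weights` are assumptions added so that the proportions sum to one.

```python
# Sample counts for community 1. Only the 0-14 share (40 of 100) is from the text.
sample_counts = {"0-14": 40, "15-39": 35, "40+": 25}
# Known population proportions P_ij. Only the 0-14 value (0.50) is from the text.
population_props = {"0-14": 0.50, "15-39": 0.30, "40+": 0.20}


def proportion_weights(sample_counts, population_props):
    """Return P_ij / p_ij per stratum, where p_ij is the sample proportion."""
    total = sum(sample_counts.values())
    weights = {}
    for stratum, n in sample_counts.items():
        p = n / total                              # sample proportion p_ij
        weights[stratum] = population_props[stratum] / p
    return weights


prop_weights = proportion_weights(sample_counts, population_props)
for stratum, w in prop_weights.items():
    print(stratum, round(w, 3))

# Sanity check: the weighted sample size equals the actual sample size.
weighted_n = sum(sample_counts[s] * prop_weights[s] for s in sample_counts)
print(round(weighted_n, 6))
```

Unlike the weights N_{ij}/n_{ij}, these weights do not scale the sample up to the population size. They only rebalance the strata within a community, so the weighted sample size stays equal to the number of patients actually sampled.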