
An Application of the Biprobit Heckman Selection Model to Correct Estimates of HIV Prevalence from Sample Surveys

September 2012 CSSS Working Paper #119



1 Background
This is a very succinct summary of how we applied the Heckman selection model approach, used to correct estimates of HIV prevalence from sample surveys, to HIV biomarker data recently collected at the Agincourt HDSS in South Africa. A sex-age-stratified sample was drawn from the 30,000 individuals aged 15+ who were alive and resident in the DSS in 2010. For the sampled individuals, the survey proceeded as follows:
1. attempt to make contact. Result: found or not-found
2. for those who were found, attempt to interview. Result: interviewed or not
3. for those who were interviewed, attempt to collect biomarkers. Result: biomarkers or not
4. for those who provided biomarkers, test biomarkers. Result: positive or negative
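The four steps above can be sketched as a simple classification of each sampled person by the last stage reached. This is only an illustrative sketch: the `SampledPerson` class, its field names, and the five example records are hypothetical and are not taken from the Agincourt data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampledPerson:
    """Hypothetical record of one sampled individual's path through the survey."""
    found: bool
    interviewed: Optional[bool] = None    # None if the person was never found
    tested: Optional[bool] = None         # None if the person was never interviewed
    hiv_positive: Optional[bool] = None   # None if no biomarkers were collected


def classify(p: SampledPerson) -> str:
    """Return the final survey outcome for one person."""
    if not p.found:
        return "not-found"
    if not p.interviewed:
        return "found but not-interviewed"
    if not p.tested:
        return "interviewed but not-tested"
    return "positive" if p.hiv_positive else "negative"


# Made-up records, one per possible outcome (not survey data).
sample = [
    SampledPerson(found=False),
    SampledPerson(found=True, interviewed=False),
    SampledPerson(found=True, interviewed=True, tested=False),
    SampledPerson(found=True, interviewed=True, tested=True, hiv_positive=True),
    SampledPerson(found=True, interviewed=True, tested=True, hiv_positive=False),
]

counts = Counter(classify(p) for p in sample)
# People whose HIV status is unknown because they dropped out at some stage.
unobserved = sum(n for outcome, n in counts.items()
                 if outcome not in ("positive", "negative"))
```

Only the last two outcomes carry an observed HIV status; every other outcome marks a point where selection may have occurred.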
Consequently, there are three decision points at which the sample is subdivided. At each of these, unmeasured factors could have produced a selection effect, so that the 'selected' fraction of the sample is systematically different from the 'not-selected' fraction. At each stage we can use a Heckman selection model to attempt to identify and correct for the selection bias. We attempt to do this methodically so that we can predict the HIV status of everyone in the original sample. Working down the list above, there are three subgroups of the sample whose HIV status is not observed:
1. not-found
2. found but not-interviewed
3. interviewed but not-tested
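For orientation, the generic textbook form of a probit model with sample selection (the bivariate probit version of the Heckman model) can be written as below. This is a sketch of the standard formulation, not necessarily the exact specification used here: the symbols $s_i$, $y_i$, $z_i$, $x_i$, $\gamma$, $\beta$ and $\rho$ are notation assumed for illustration, and the covariates entering each equation are not specified in this section.

```latex
\begin{align*}
s_i^{*} &= z_i^{\top}\gamma + u_i, & s_i &= \mathbf{1}[\,s_i^{*} > 0\,] \\
y_i^{*} &= x_i^{\top}\beta + \varepsilon_i, & y_i &= \mathbf{1}[\,y_i^{*} > 0\,]
  \quad \text{(observed only if } s_i = 1\text{)} \\
(u_i, \varepsilon_i) &\sim \mathcal{N}_2\!\left(
  \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}
\right)
\end{align*}
```

Here $s_i$ indicates selection at a given stage (for example, being found) and $y_i$ is the binary outcome that is seen only for those selected. The correlation $\rho$ captures the unmeasured factors shared by the two equations: if $\rho = 0$ there is no selection on unobservables, while $\rho \neq 0$ means an outcome model fitted only to the selected fraction would be biased.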