Clustering South African Households Based on their Asset Status Using Latent Variable Models

Tyler Harris McCormick

October 2012 CSSS Working Paper #121

Abstract

Since 2001, the Agincourt Health and Demographic Surveillance System has conducted a biannual household asset survey to quantify household socio-economic status (SES) in a rural population living in northeast South Africa. The data contain binary, ordinal and nominal items. We aim to describe the SES landscape in the study population by clustering the households into homogeneous groups based on their asset status.
A model-based approach to clustering, based on latent variable models, is proposed. Binary and ordinal items are modeled using item response models. For nominal survey items, a factor analysis model, similar in nature to a multinomial probit model, is used. Both model types have an underlying latent variable structure; this similarity is exploited and the models are combined to produce a hybrid model capable of handling mixed data types. Further, a mixture of the hybrid models is considered to provide clustering capabilities within the context of mixed binary, ordinal and nominal response data. The proposed model is termed the mixture of factor analyzers for mixed data (MFA-MD).
The MFA-MD model is applied to the SES data to cluster households into homogeneous groups. The model is estimated within the Bayesian paradigm, using a Markov chain Monte Carlo algorithm. Intuitive groupings result, providing insight into the different socio-economic strata within the Agincourt region.
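
The shared latent variable structure described above can be illustrated with a small generative sketch. This is not the authors' implementation, and it does not perform the Bayesian estimation. It only shows, under stated assumptions, how one set of latent Gaussian values could give rise to binary, ordinal and nominal responses, with a mixture component selecting group-specific parameters. All function names, the unit-variance noise, the fixed thresholds and the rule "baseline category when every nominal latent value is negative" are assumptions made for illustration.

```python
import random

def latent_values(rng, mu, loadings, theta):
    """Assumed latent structure: z_d = mu_d + lambda_d . theta + N(0, 1) noise."""
    return [m + sum(l * t for l, t in zip(lam, theta)) + rng.gauss(0.0, 1.0)
            for m, lam in zip(mu, loadings)]

def ordinal_response(z, thresholds):
    """Item-response style link: the observed level is the number of
    thresholds lying below z. A binary item is the one-threshold case."""
    return sum(1 for t in thresholds if z > t)

def nominal_response(z_vec):
    """Multinomial-probit style link with K-1 latent values for K categories:
    category 0 (baseline) if all latent values are negative, else 1 + argmax."""
    best = max(range(len(z_vec)), key=lambda k: z_vec[k])
    return 0 if z_vec[best] < 0 else best + 1

def simulate_household(rng, weights, groups, thresholds, n_nominal_dims):
    """Draw a group label, then a latent trait, then the observed items.
    groups[g] = (mu, loadings) holds hypothetical group-specific parameters."""
    g = rng.choices(range(len(groups)), weights=weights)[0]
    mu, loadings = groups[g]
    theta = [rng.gauss(0.0, 1.0) for _ in loadings[0]]  # latent trait
    z = latent_values(rng, mu, loadings, theta)
    n_ord = len(thresholds)
    ordinal = [ordinal_response(z[j], thresholds[j]) for j in range(n_ord)]
    nominal = nominal_response(z[n_ord:n_ord + n_nominal_dims])
    return g, ordinal, nominal

# Hypothetical setup: one binary item, one 3-level ordinal item, and one
# 3-category nominal item (two latent dimensions), with a 1-d latent trait.
rng = random.Random(0)
thresholds = [[0.0], [0.0, 1.0]]
groups = [
    ([-1.0, -0.5, -1.0, -1.0], [[0.8], [0.6], [0.3], [0.2]]),  # "low" group
    ([1.0, 1.5, 0.5, 1.0], [[0.8], [0.6], [0.3], [0.2]]),      # "high" group
]
sample = [simulate_household(rng, [0.6, 0.4], groups, thresholds, 2)
          for _ in range(5)]
```

Each element of `sample` is a tuple of the simulated group label, the list of binary and ordinal responses, and the nominal category. Clustering in the sense of the abstract amounts to inferring the group labels from the observed responses alone, which this sketch does not attempt.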

Keywords: clustering, mixed data, item response theory, Metropolis-within-Gibbs