Using Network Structure to Estimate Latent Features in Hard-to-Reach Populations

We propose network-based statistical models for learning about groups that are difficult to reach using standard surveys, such as the homeless or individuals with HIV/AIDS. Rather than sampling these individuals directly, we reach them through their social networks, using questions asked on standard surveys. Specifically, we use questions of the form "How many X's do you know?", where X represents the population of interest, and present two statistical models for inferring unobserved features. Our first model, akin to a block model in the complete-network literature, leverages known information about respondents and some common populations (people named Michael, for example) to estimate demographic profiles and population sizes in hard-to-reach populations. We next propose a latent space model in which the propensity for an individual to know members of a given group is independent, given the positions of the individual and the group in a latent "social space." This framework is similar in spirit to latent space models for complete networks (Hoff 2005). We then estimate the relative homogeneity of hard-to-reach groups and describe variation in the propensity for interaction between respondents and population members. We also demonstrate how our method can be used for network inference outside of hard-to-reach populations, making information about more complicated network structure available to the many researchers who cannot practically or financially collect data from the entire network.