Alleviating Ecological Bias in Poisson Models using Optimal Subsampling: The Effects of Jim Crow on Black Illiteracy in the Robinson Data

Jonathan C Wakefield

November 2012, CSSS Working Paper #122



In many situations, data are available at the group level but one wishes to estimate the individual-level association between a response and an explanatory variable. Unfortunately, this endeavor is fraught with difficulties because of the ecological level of the data. The only reliable solution to such ecological inference problems is to supplement the ecological data with individual-level data. In this paper we illustrate the benefits of gathering individual-level data in the context of a Poisson modeling framework. Additionally, we derive optimal designs that allow the individual samples to be chosen so that information is maximized. The methods are illustrated using Robinson's classic data on illiteracy rates. We show that the optimal design produces accurate inference with respect to estimation of relative risks, with ecological bias removed.

Keywords: Ecological bias, Combining information, Sample design