Ecological Inference for 2 x 2 Tables

A fundamental problem in many disciplines, including political science, sociology and epidemiology, is to examine the association between two binary variables within a series of 2 x 2 tables, when only the margins are observed, and one of the margins is fixed. Various approaches to such ecological inference problems have been proposed, generating a great deal of controversy. Much of this controversy stems from the assumptions, often not explicitly stated, that underlie the proposed approaches. A number of these approaches impose a hierarchy in which the pair of probabilities that characterize each table is assumed to arise from a distribution. Under a number of assumptions, the likelihood is a convolution of binomial distributions and is awkward to work with, and so a variety of approximations have been utilised. We work directly with the convolution likelihood and provide computational schemes for sampling from the posterior corresponding to this likelihood. A number of approximations that are useful for tables with large margins are also described, and the convolution likelihood is related to previous approximations. We suggest a non-hierarchical baseline model that may be applied to each table separately in order to quantify the amount of information on the pair of probabilities that is provided by the data in each table alone. This model provides a starting point for analysis, and also allows other proposed models, such as common probabilities across areas, or the imposition of a hierarchy, to be critically assessed. In particular, the information that is purely a function of the hierarchical prior may be examined. We also clarify the importance of the design, that is, the distribution over areas of the numbers in the fixed margin of the table. Registration/race data from 275 US counties are used to illustrate the methods. A number of extensions are outlined, including the consideration of multi-way tables, spatial dependence and area-specific (contextual) variables.
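
As an illustration of the convolution likelihood referred to above, the following is a minimal sketch for a single table, not the paper's implementation. It assumes that the two unobserved cell counts are independent binomials, Y0 ~ Binomial(n0, p0) and Y1 ~ Binomial(n1, p1), where n0 and n1 are the fixed margin and only the total y = Y0 + Y1 is observed. The function names and the symbols n0, n1, y, p0 and p1 are hypothetical choices made for this example.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability P(K = k) for K ~ Binomial(n, p)."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def convolution_likelihood(y, n0, n1, p0, p1):
    """P(Y0 + Y1 = y) for independent Y0 ~ Bin(n0, p0), Y1 ~ Bin(n1, p1).

    Assumption for this sketch: n0 and n1 are the fixed margin of the
    table and y is the observed margin; the internal cell y0 is unobserved.
    """
    # The unobserved cell y0 must satisfy 0 <= y0 <= n0 and 0 <= y - y0 <= n1.
    lo = max(0, y - n1)
    hi = min(n0, y)
    # Sum over every internal table that is consistent with the margins.
    return sum(
        binom_pmf(y0, n0, p0) * binom_pmf(y - y0, n1, p1)
        for y0 in range(lo, hi + 1)
    )
```

One quick sanity check of the sketch is that when p0 and p1 are equal, the convolution collapses to a single binomial with n0 + n1 trials. The direct sum has as many terms as there are admissible internal tables, which gives some intuition for why approximations become attractive when the margins are large.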