Logistic Regression with Distributed Databases

Considerable effort has gone into understanding how to protect the privacy of individual information in a single database, and various solutions have been proposed, depending on the nature of the data, the ways in which the database will be used, and the precise form of privacy protection being offered. Once data are merged across sources such as government agencies or competing business establishments, however, the problem becomes far more complex: the linked individual files raise privacy issues that go well beyond those considered for the data within any individual source. The talk gives a brief overview of statistical disclosure limitation methods and their link to privacy-preserving data mining techniques. It also outlines an approach that permits a full statistical analysis of the combined data sets without actually combining them. We focus on logistic regression, but the method and tools described apply in essentially the same way to other statistical models as well.
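As a rough illustration of what "analysis without combining" can mean, the sketch below fits a logistic regression by Newton-Raphson when the records are split by rows across two parties. It is not the method presented in the talk. It assumes a horizontal partition (every party holds the same variables for different individuals), and it uses a plain sum where a real protocol would presumably use a secure summation step, so that no party sees another's contribution. All names and the toy data are hypothetical.

```python
import math

def local_summaries(X, y, beta):
    """One party's contribution: gradient and information matrix of the
    logistic log-likelihood on its own rows. Only these aggregates are shared."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return grad, info

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit(parties, p, iters=25):
    """Newton-Raphson using only the summed per-party aggregates.
    ASSUMPTION: the plain sums below stand in for a secure summation protocol."""
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for X, y in parties:
            g, h = local_summaries(X, y, beta)
            grad = [a + b for a, b in zip(grad, g)]
            info = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(info, h)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical toy data: columns are (intercept, covariate).
agency_a = ([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0]], [0, 1, 0])
agency_b = ([[1.0, 2.5], [1.0, 3.0], [1.0, 3.5], [1.0, 4.0]], [1, 0, 1, 1])

beta_split = fit([agency_a, agency_b], p=2)
pooled = (agency_a[0] + agency_b[0], agency_a[1] + agency_b[1])
beta_pooled = fit([pooled], p=2)
```

Because the log-likelihood is a sum over records, the summed gradients and information matrices equal those of the pooled data, so beta_split and beta_pooled agree up to floating-point rounding. The inverse of the summed information matrix at convergence would likewise give the usual standard errors.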

(Joint research with Stephen Fienberg and Yuval Nardi)