Propensity score matching aims to gain for an observational study various benefits characteristic of experiments, only some of which can be directly observed. If it is successful in those of its aims that are observable, this suggests it has succeeded in the remaining ones: we cite visible successes as evidence of likely success elsewhere. Yet existing theory pertinent to matching supports such connections only vaguely. That theory seems to require exact matching on the true propensity score, even in the absence of hidden bias. In practice, the best one can do is to match approximately, and on estimated scores.
There are at least two problems with this state of affairs. First, there is confusion about how best to match: Must one match as closely as possible on the best estimate of the propensity score? What precisely is the role of balance, and how much of it does one need? Second, we lack a basis for inference with propensity-matched data that requires neither an added model of the data-generating process nor the pretense of exact propensity score matching.
This paper develops a novel large-sample account of permutation-type causal inferences with propensity-matched data. Rather than relying on a specific estimation or matching technique, it puts the more nearly verifiable of propensity matching's aims in a central role, thus clarifying their contributions to the integrity of inferences about treatment effects.