Designs and Analyses of Case-Control and Case-Cohort Studies: Applications to Prognosis of Wilms Tumor Patients

Two-phase or double sampling was introduced by Neyman as a technique for drawing efficient stratified samples. We consider such designs primarily for the case of a binary outcome variable. When the strata and the phase-two sampling fractions depend on both outcome and covariates, care must be taken in analyzing data from the resulting biased sample. The standard survey sampling (Horvitz-Thompson) approach weights the log-likelihood contributions by the inverse sampling fractions. Only recently have "nonparametric maximum likelihood" estimation (NPMLE) procedures become available (Scott and Wild 1997, Breslow and Holubkov 1997). Large-sample theory for the NPMLE procedures confirms that they have a semiparametric efficient influence function and achieve the information bounds derived by Robins, Hsieh and Newey (1995).
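To make the Horvitz-Thompson weighting concrete, the following is a minimal sketch of inverse-sampling-fraction weighted logistic regression, fitted by Newton-Raphson. It is an illustration under stated assumptions, not the analysis used in this work: the function names, the simulated cohort, the four outcome-by-covariate strata, and the sampling fractions are all hypothetical, and phase-two selection is done by independent Bernoulli draws rather than fixed-size sampling.

```python
import numpy as np


def weighted_logistic_fit(X, y, w, n_iter=50, tol=1e-10):
    """Maximize sum_i w_i * loglik_i(beta) for a logistic model.

    w_i is the inverse of the phase-two sampling fraction of subject i,
    so each sampled subject stands in for 1/fraction phase-one subjects.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y - p))                     # weighted score
        info = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def demo(seed=0, n=20000):
    """Simulate a two-phase sample and return the weighted estimate.

    All numbers here are assumptions chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    true_beta = np.array([-3.0, 1.0])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

    # Strata depend on both outcome and covariate: index = 2*y + 1{x > 0}.
    stratum = 2 * y + (x > 0).astype(int)
    # Hypothetical fractions: sparse sampling of controls, all cases kept.
    fractions = np.array([0.05, 0.2, 1.0, 1.0])
    frac = fractions[stratum]
    keep = rng.random(n) < frac

    return weighted_logistic_fit(X[keep], y[keep], 1.0 / frac[keep])
```

Calling `demo()` returns the weighted estimate of the intercept and slope from the phase-two sample alone. The weighted estimator is consistent but generally less efficient than the NPMLE procedures discussed above, which is the motivation for the latter.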

Using data from the National Wilms Tumor Study, and simulations based on these data, we demonstrate the advantages of careful selection of the phase-two sample and use of an efficient analysis method. The basic principles include: (i) fine stratification of the phase-one sample using outcome and available covariates; (ii) selection of a "balanced" rather than a simple case-control sample at phase two; and (iii) estimation via nonparametric maximum likelihood. Double sampling in the context of survival analysis leads to the exposure-stratified case-cohort design considered by Borgan, Langholz, Samuelsen and Goldstein (2000). See also Lin (2000). Simulations based on the Wilms tumor data confirm that strategies (i) and (ii) are also advantageous in this context.
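As a rough illustration of principle (ii), the sketch below allocates a fixed phase-two sample size across the phase-one strata. It assumes one common reading of "balanced", namely equal numbers drawn from every outcome-by-covariate stratum, with small strata sampled completely and the shortfall spread over the remaining strata. That reading, the function name `balanced_allocation`, and the stratum counts are assumptions made for illustration, not a definition taken from this work.

```python
def balanced_allocation(sizes, total):
    """Split `total` phase-two subjects as evenly as possible across strata.

    sizes[i] is the phase-one count in stratum i. No stratum is asked for
    more subjects than it contains; any shortfall is redistributed evenly
    among the strata that still have subjects left.
    """
    sizes = list(sizes)
    alloc = [0] * len(sizes)
    remaining = min(total, sum(sizes))
    open_idx = [i for i, s in enumerate(sizes) if s > 0]
    while remaining > 0 and open_idx:
        share = max(remaining // len(open_idx), 1)
        for i in list(open_idx):
            if remaining == 0:
                break
            take = min(share, sizes[i] - alloc[i], remaining)
            alloc[i] += take
            remaining -= take
            if alloc[i] == sizes[i]:
                open_idx.remove(i)  # stratum exhausted
    return alloc


# Hypothetical phase-one counts for four outcome-by-covariate strata.
phase_one = [1000, 50, 300, 40]
phase_two = balanced_allocation(phase_one, 400)
fractions = [m / n for m, n in zip(phase_two, phase_one)]
```

The resulting `fractions` are the stratum-specific sampling fractions whose inverses would serve as weights in a Horvitz-Thompson analysis, or as known design quantities in a likelihood-based one.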

Portions of this work are joint with N. Chatterjee and J. Wellner.