We develop a latent curve model for longitudinal data that captures underlying stochastic variation in an interpretable manner. The approach decomposes the variation into data-adaptive components, which we call proto-splines. These proto-splines are linear combinations of basis functions chosen to reflect important features of the process under investigation. Our approach can be viewed as a hybrid of principal components analysis and the usual basis function approach to functional data analysis. The resulting components are designed to be more interpretable than principal components and more flexible than those employed in the basis function approach. Our proto-spline model class extends the scope of traditional mixed effects models while retaining their emphasis on estimating components of variance, which often have substantive meaning. We prove that the parameter estimates are consistent, asymptotically efficient, and asymptotically Gaussian. We present an application to the analysis of wage inequality based on the National Longitudinal Survey of Young Men.
Joint work with Marc Scott, NYU
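
To make the idea of components built as linear combinations of basis functions concrete, the following is a minimal simulation sketch, not the estimator described in the abstract. It assumes a truncated linear basis with a single knot, two components, independent Gaussian scores, and white measurement noise. All names (truncated_linear_basis, simulate_curves, implied_covariance, theta, score_sd, noise_sd) are hypothetical and chosen only for illustration.

```python
import numpy as np


def truncated_linear_basis(t, knots):
    """Basis matrix with columns 1, t, and (t - knot)_+ for each knot (assumed basis)."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t] + [np.clip(t - k, 0.0, None) for k in knots]
    return np.column_stack(cols)


def simulate_curves(n_subjects, t, knots, theta, score_sd, noise_sd, rng):
    """Simulate curves as random-score combinations of basis-built components."""
    basis = truncated_linear_basis(t, knots)      # shape (T, p)
    components = basis @ theta                    # shape (T, K): one column per component
    # Subject-specific scores: independent Gaussians, one standard deviation per component.
    scores = rng.normal(size=(n_subjects, theta.shape[1])) * score_sd
    noise = rng.normal(scale=noise_sd, size=(n_subjects, len(t)))
    return scores @ components.T + noise, components


def implied_covariance(components, score_sd, noise_sd):
    """Covariance across time points implied by the simulation model above."""
    score_var = np.diag(np.asarray(score_sd, dtype=float) ** 2)
    n_times = components.shape[0]
    return components @ score_var @ components.T + noise_sd ** 2 * np.eye(n_times)


rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 12)
knots = [0.5]
# Hypothetical coefficients: component 1 is a level shift,
# component 2 is a slope that reverses direction at the knot.
theta = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, -2.0]])
score_sd = np.array([1.0, 0.5])
noise_sd = 0.1

y, components = simulate_curves(200, t, knots, theta, score_sd, noise_sd, rng)
cov = implied_covariance(components, score_sd, noise_sd)
```

In this sketch the squared entries of score_sd play the role of components of variance: each one measures how much between-subject variation is attributed to the corresponding component, which is the kind of quantity a mixed effects analysis would aim to estimate.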