Imputing Race and Ethnicity in State Administrative Data: Challenges and Future Directions

Lizzy Pelletier


Administrative microdata hold promise for demographic and social scientific research, but some administrative records lack information on race and ethnicity. This paper describes data integration and imputation methods for adding individual-level ethnoracial data to administrative records. We aim to understand the limits of Bayesian Improved Surname Geocoding (BISG) methods, which combine information on residential address and last name to predict a set of ethnoracial group membership probabilities. Using Washington State administrative data, we compare BISG estimates to self-reported racial and ethnic identity from birth certificates and public assistance program data. We find that BISG predictions have low accuracy, especially for individuals identifying as Black, American Indian or Alaska Native, or Native Hawaiian or Pacific Islander. Furthermore, we use state employment records to show that BISG imputation yields both underestimates and overestimates of earnings and employment statistics. We explore potential BISG alternatives, notably multinomial logistic regression and machine learning approaches, and assess how these methods compare in overall accuracy and in the estimation of important metrics.
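
As a rough illustration of the BISG idea described above, the sketch below combines a surname-based prior with a geography-based likelihood using Bayes' rule. It is a minimal sketch and not the authors' implementation: the function name `bisg_posterior`, the group labels, and all probabilities are hypothetical, and it assumes surname and residential location are conditionally independent given group membership.

```python
def bisg_posterior(p_group_given_surname, p_geo_given_group):
    """Combine surname and geography evidence with Bayes' rule.

    Assumes surname and location are conditionally independent
    given group membership (an assumption of this sketch).
    """
    # Unnormalized posterior: P(group | surname) * P(geo | group)
    unnormalized = {
        group: p_group_given_surname[group] * p_geo_given_group[group]
        for group in p_group_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no group has nonzero probability")
    # Normalize so the group membership probabilities sum to 1
    return {group: value / total for group, value in unnormalized.items()}


# Made-up illustrative inputs, not real census figures.
surname_probs = {"A": 0.6, "B": 0.3, "C": 0.1}        # P(group | surname)
geo_probs = {"A": 0.001, "B": 0.004, "C": 0.002}      # P(block | group)

posterior = bisg_posterior(surname_probs, geo_probs)
```

In this toy case the surname alone favors group A, but the geographic evidence shifts most of the probability to group B, which shows how the two sources of information interact.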

Authors: Elizabeth Pelletier, Jennifer Romich, Sofia G. Ayala


Lizzy Pelletier is a PhD candidate studying public policy at the Evans School of Public Policy & Governance at the University of Washington, and an NIH-supported Data Science and Demography trainee through the Center for Studies in Demography and Ecology at UW. Her research examines how public policy shapes economic inequality, instability, and wellbeing, with a current focus on paid leave policies. Her work with large administrative microdata also explores how tools from data science and computational demography can be used to make these records more useful to social scientists.