The POLITY data were introduced in the 1970s to measure the regime characteristics of historical and contemporary countries (Gurr, 1974). These data have subsequently been used in a wide variety of empirical studies of the structure, behavior, and evolution of political regimes. Since the inception of the Polity project, the data have been coded by a small group of subject matter experts using historical and journalistic information. In this study, we develop an approach to coding the major dimensions of the POLITY data by processing a large corpus of text about political events within and between countries. We test these ideas on monthly data from January 2000 through December 2017 and then apply the estimated models to out-of-sample data for the months from January 2018 through April 2019. The results illustrate the accuracy of text-based data combined with a machine learning approach to coding the POLITY data. The accuracy we obtained matches or exceeds levels typically found with subject matter experts.
Turning Polity Upside Down
Michael D. Ward
Room 409