Major, Minor and Internal Categories of PNAS Research Reports

The Proceedings of the National Academy of Sciences (PNAS) is one of the world's most cited multidisciplinary scientific journals. PNAS publishes research reports in the Physical, Biological, and Social Sciences. The journal's official classification structure is reflected in the topic labels that authors submit with their manuscripts, and these labels largely correspond to traditionally established disciplines within the Physical, Biological, and Social Sciences. Focusing on articles in the Biological Sciences, we explore their internal soft classification structure, based only on semantic decompositions of abstracts and bibliographies, and compare it with the formal discipline classifications.

Our hierarchical model assumes a fixed number of internal categories, each characterized by multinomial distributions over words (in abstracts) and references (in bibliographies). The soft classification of each article is given by the proportions of the article's content that come from each category. Using eight internal categories in the model, we find that most traditional disciplines have major soft classification components in more than one internal category.
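The generative side of such a model can be illustrated with a minimal sketch. It is not the authors' implementation: the Dirichlet draws used for the membership proportions and for the category distributions, the vocabulary and reference pool sizes, and all function and variable names are assumptions made for illustration. Only the eight categories and the use of per-category multinomials over words and references come from the description above. The sketch simulates one article and does not show how the model is fitted to data.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw one symmetric Dirichlet sample by normalizing gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_items(theta, dists, n_items, rng):
    """For each item, pick a category from theta, then an item from that
    category's multinomial distribution."""
    categories = range(len(theta))
    items = []
    for _ in range(n_items):
        k = rng.choices(categories, weights=theta)[0]
        item = rng.choices(range(len(dists[k])), weights=dists[k])[0]
        items.append(item)
    return items


K = 8            # number of internal categories (from the abstract)
VOCAB_SIZE = 50  # hypothetical vocabulary size
REF_POOL = 30    # hypothetical number of distinct cited references

rng = random.Random(0)

# One multinomial over words and one over references per category.
word_dists = [sample_dirichlet(0.5, VOCAB_SIZE, rng) for _ in range(K)]
ref_dists = [sample_dirichlet(0.5, REF_POOL, rng) for _ in range(K)]

# Soft classification of a single article: proportions over the K categories.
# The Dirichlet prior here is an assumption, not stated in the abstract.
theta = sample_dirichlet(0.5, K, rng)

words = sample_items(theta, word_dists, 40, rng)  # abstract word ids
refs = sample_items(theta, ref_dists, 10, rng)    # bibliography reference ids

print("membership proportions:", [round(p, 3) for p in theta])
```

In this sketch the vector `theta` plays the role of the article's soft classification: an article whose `theta` puts substantial weight on several categories draws its words and references from several category-specific distributions.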

This is joint work with Stephen Fienberg and John Lafferty, Carnegie Mellon University.