In many applications, we face the challenge of modeling the hidden interactions between multiple observations (e.g. discovering clusters of points in space or learning topics in documents). An added difficulty is that our datasets often have empirical distributions which are heavy-tailed (e.g. problems in natural language processing). In other words, even though we have large datasets, we are often in a sparse regime where a large fraction of our items have only been observed a few times (e.g. Zipf's law, which implies that regardless of how large our corpus of text is, a large fraction of the words in our vocabulary will be observed only a few times). The question we consider is how to learn a model of our data when our dataset is large and yet sparse.
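As a rough illustration of this sparsity claim (not from the paper), the following minimal sketch samples a corpus from an assumed Zipfian word distribution and reports what fraction of the observed vocabulary appears only a handful of times; the vocabulary size, corpus size, and exponent are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 50_000      # assumed vocabulary size
corpus_size = 1_000_000  # assumed number of observed tokens
s = 1.1                  # assumed Zipf exponent (rank-k word frequency ~ 1/k^s)

# Zipfian probabilities over word ranks 1..vocab_size
ranks = np.arange(1, vocab_size + 1)
probs = ranks ** (-s)
probs /= probs.sum()

# Draw a corpus and count how often each word appears
tokens = rng.choice(vocab_size, size=corpus_size, p=probs)
counts = np.bincount(tokens, minlength=vocab_size)

observed = counts > 0
rare = (counts > 0) & (counts <= 5)
print(f"words observed at all:        {observed.sum()}")
print(f"fraction observed <= 5 times: {rare.sum() / observed.sum():.2f}")
```

Even with a million tokens, most observed words fall in the rarely-seen tail, which is the sparse regime the section describes.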
We provide an algorithm for learning certain natural latent variable models applicable to this sparse regime, making connections to a body of recent work in sparse random graph theory and community detection. We also discuss the implications for practice.