
Exploring the housing crisis with ggplot2 and plyr

ggplot2 is a new data visualisation package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to provide a powerful and flexible system for creating data graphics. In practice, ggplot2 produces beautiful, hassle-free plots and takes care of fiddly details like drawing legends.

In this talk you'll see ggplot2 in action, exploring a dataset of nearly half a million house sales in the Bay Area. I'll start with the basics, histograms and scatterplots, and then discuss how these plots can be enhanced with aesthetics and faceting to dig deeper into the data, answering progressively more complicated questions.

Graphics work best in conjunction with other analytic tools, so I'll also show you how the plyr package can be used to create rich summary statistics, exploring how the housing bubble has affected cities in the Bay Area differently. I'll connect these summaries to census data and speculate on who the bubble has affected most.