
Data Flow in the Real World: Why Scripting, SQL, and Sociology Go So Well Together

In a perfect world, data would come in tables, preferably in SPSS or SAS format. While this is still true of survey research, a growing interest in research based on online activities (email, blog posts, and Usenet, to name just a few) leads to new issues and challenges in the ways that data can be accessed, processed, and used. I am a member of a team consisting of several sociologists; my last few projects have been based on online and internet-based data sets.

In my talk, I will argue that the current set of off-the-shelf tools is not (and probably will not be, any time soon) adequate to the task of collecting and analyzing data on internet behavior, and so researchers will be forced to roll their own.

I will further argue that important decisions about data use and integrity occur throughout the process, and must be directly reflected in the computer code that drives it. Sociologists should strongly consider acquiring at least basic competence in both scripting languages and database languages. Finally, I will provide a brief overview of some of these languages.

Danyel Fisher is a researcher at Microsoft Research in the Community Technologies Group. His PhD work in Computer Science at UC Irvine examined social networks within email; he has since worked on projects involving online deliberation, email reciprocity, and the identification of social types.